Yesterday I responded to an emergency callout to a customer with 800 users running a single Exchange 2010 SP3 UR3 multi-role server on Windows Server 2008 R2 SP1. The server would not boot and was simply blue screening. We were not able to access Windows in any way, even by booting into safe mode, and had no indication as to why the failure occurred as we could not access the server event logs. The Exchange 2010 SP3 server was running on top of a VMware vSphere 5.1 clustered environment hosted on shared storage.
This server was one I set up a couple of years back, so it followed my standard multi-role Exchange server build, which consists of two or more NTFS volumes:
- Volume 1 (SYSTEM) contains the operating system, page file and Exchange system files.
- Volume 2 (Logs + Database) contains the Exchange 2010 database and log files.
Note: This server only had two volumes; however, additional volumes can be added if additional databases are required.
As the system volume was corrupt and no longer booting, it needed to be either rebuilt or restored from backup. The database/log volume was assumed to be fine, and the plan was to simply re-attach the database and log files once the system volume containing Windows and Exchange Server was restored. We did have a full backup of the Exchange 2010 server, covering both the system volume and system state, taken on the weekend with Backup Exec 2010 R3 SP3. As a result we had two methods for bringing this Exchange 2010 SP3 server back online:
- Recover the server's system volume and system state from the last backup taken with Backup Exec and relink it to the database/log volume.
- Recover the server by performing an Exchange Recover Server installation, which uses the Setup /m:RecoverServer switch to reconnect a newly installed Exchange server to the existing configuration stored in the Active Directory configuration partition.
Backup Exec 2010 R3 SP3 Issues
I have been working with Backup Exec for over 8 years now, going back to when it was owned by a company named Veritas. Every time I have had to perform a full system restore of a failed server, it has been a cumbersome process riddled with challenges. After so many years of having this product on the market, you would think the "Backup" and "Restore" processes would be completely ironed out and bulletproof; after all, the primary purpose of this product is to back up and restore data. However, this, my friends, is what makes Backup Exec "special": the ability to cause companies pain by not performing these tasks.
Despite having used Backup Exec to recover servers in the past, given my history with the product I put what I knew aside and followed the Symantec online documentation exactly, to give myself the best chance of a successful restore. For servers running Windows Server 2008 or Windows Server 2008 R2, Symantec has published a knowledge base article on its support site describing how to restore a remote Windows 2008 computer which has completely failed. This is the most important support article they could publish online, as it documents the steps to restore a failed Windows server, again the whole point of a backup and restore program. As a result you would think the instructions in this particular article would be 100% accurate and carefully reviewed by the Symantec support team. Unfortunately, this is not the case: the article below contains technical mistakes that prevent a successful restore of a Windows 2008 server.
http://www.symantec.com/business/support/index?page=content&id=TECH86323
The instructions in the above article, entitled "Disaster Recovery restore steps for a remote Windows 2008 computer", state to:
- Build a new Windows Server 2008 computer by performing a fresh install
- Provide it the same computer name as before
- Ensure it has the same disk configuration as the previous system
- Do not join the new computer to an Active Directory domain, instead leave it as Workgroup.
Following these instructions exactly, I was unable to proceed with the restore at step 10, further down in the documentation. At this point we raised a support case with Symantec Gold Support to assist us with restoring the server. The Symantec support engineer who took the case advised us that the instructions documented online were incorrect and that the server needed to be joined to the domain.
After joining the computer to the domain I was able to proceed with the restore task. The task ran successfully until it reached 91%, where the following error was generated:
V-79-57344-782
The job failed with the following error: Unable to swap out active registry hive with new data
After rebooting the server being restored, Windows Boot Manager was unable to boot the server, as the failed restore had left it in a corrupt state.
At this point the Symantec engineer asked us to retry the process. It failed again, with exactly the same result. By now the company had been without email for over 12 hours due to the length of time Backup Exec takes to restore data.
Symantec then ran a tool on the Backup Exec media server to collect a bunch of logs for analysis and advised us that they would require 48 hours to review why the server restore was failing. When we asked them, "So you expect us to be without email for 48 hours?", their response was yes, there is nothing we can do.
Exchange 2010 Recover Server Installation
At this point I advised my customer that we needed to forget about Backup Exec and proceed with a native Exchange 2010 Recover Server installation using Setup /m:RecoverServer, something I had faith in. We rebuilt the Windows Server 2008 R2 server, gave it the same host name, re-joined it to the domain, and installed all Windows updates along with the Exchange 2010 prerequisites.
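If I were to turn the rebuild checklist above into something repeatable, it might look like the rough Python sketch below. It only models the four conditions mentioned in this post (same host name, domain joined, Windows updates, Exchange prerequisites); the class, field and function names are hypothetical, and the values would have to be filled in by hand or by your own tooling, as nothing here queries a real server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RebuiltServer:
    """State of the freshly rebuilt Windows server (hypothetical model)."""
    host_name: str
    domain_joined: bool
    windows_updates_installed: bool
    exchange_prerequisites_installed: bool


def preflight(original_host_name: str, server: RebuiltServer) -> List[str]:
    """Return the problems to fix before attempting the Recover Server install."""
    problems = []
    # Windows computer names are not case sensitive, so compare them that way.
    if server.host_name.lower() != original_host_name.lower():
        problems.append("host name differs from the failed server")
    if not server.domain_joined:
        problems.append("server is not joined to the domain")
    if not server.windows_updates_installed:
        problems.append("Windows updates are not installed")
    if not server.exchange_prerequisites_installed:
        problems.append("Exchange 2010 prerequisites are not installed")
    return problems


if __name__ == "__main__":
    # Example values only; "EXCH01" is a made-up server name.
    rebuilt = RebuiltServer("EXCH01", True, True, False)
    for problem in preflight("exch01", rebuilt):
        print("Fix before recovery:", problem)
```

An empty list means the rebuilt server matches everything listed in this post; it does not guarantee that the recovery itself will succeed.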
One thing I noticed is that the Exchange 2010 SP3 media would not let me run Setup /m:RecoverServer; it errored out. In our case we had to run the Recover Server installation from Exchange 2010 SP2 media and then, once that install had finished, install Exchange 2010 SP3 followed by the latest update rollup.
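To make the install order explicit, here is a small Python sketch that simply writes the sequence down as data and prints it as a numbered run sheet. The only command shown is the Setup /m:RecoverServer switch already mentioned in this post; the Step type and function names are hypothetical, and the service pack and update rollup steps deliberately carry no command because I have not listed them here.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    source: str             # which media or package the step runs from
    action: str             # what the step does
    command: Optional[str]  # None where this post does not give a command


def recovery_plan() -> List[Step]:
    """Install order that worked for us, as described in this post."""
    return [
        Step("Exchange 2010 SP2 media", "Recover Server installation",
             "Setup /m:RecoverServer"),
        Step("Exchange 2010 SP3 media", "Install the service pack", None),
        Step("Latest update rollup", "Install the update rollup", None),
    ]


def describe(plan: List[Step]) -> List[str]:
    """Render the plan as numbered lines for a run sheet."""
    lines = []
    for number, step in enumerate(plan, start=1):
        line = f"{number}. {step.action} [{step.source}]"
        if step.command:
            line += f": {step.command}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(describe(recovery_plan())))
```

Keeping the order as a plain list makes it easy to paste into a change record, and it is obvious at a glance that the Recover Server step comes before the service pack.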
We had the server up within 2 hours of starting this procedure.
Summary
There is no denying that the restore procedure documented under http://www.symantec.com/business/support/index?page=content&id=TECH86323 should have worked for servers backed up using a Backup Exec agent.
This being said, it is important to note that Backup Exec has better methods for backing up and recovering servers, which this customer is not currently using. In virtual environments running VMware ESX it is recommended that companies present the VMware datastores running Virtual Machine File System (VMFS) to the Backup Exec media server. Whilst Windows cannot read VMFS, Backup Exec can, allowing it to directly back up the virtual machine hard disks (VMDKs) and configuration files. This backup method provides faster backups and more reliable restores, as Backup Exec no longer needs to rely on Backup Exec agents. Instead it can communicate directly with vCenter to snapshot virtual machines for backup purposes and restore entire virtual machines back to their original state.
In addition to virtual machine level backups, Symantec Backup Exec 2012 also offers an alternative way of recovering servers backed up using a Backup Exec agent. Backup Exec 2012 provides recovery media that companies can boot from, making it possible to restore a server directly from an agent-based backup. This means companies no longer need to:
- Install Windows Server from Microsoft Installation Media
- Provide the server the same host name and drive letters
- Join the server to the domain
- Deploy the Backup Exec Agent
- Start the Recovery Process
For companies which have already made a significant investment in Backup Exec, before throwing away that investment with Symantec, I advise reviewing your backup strategy to ensure it suits the infrastructure within your company.