Legitimate Disaster Recovery
Defining “Disaster”
There is a fine line between ‘backups’ and ‘disaster recovery’. If you ask 10 people in IT what exactly defines a ‘disaster’, you will more than likely get 10 different answers. I like to think of a ‘disaster’ as a production system that is currently not in production. It does not matter how the server went offline, just that it is offline. When a production system is offline, all or part of the business is offline, which hurts the business. I consider that a disaster from an IT perspective. Most business people don’t even care what a server is as long as there are no problems.
Defining “Recovery”
“I don’t care what it takes, just get my system back online!” is a phrase that many IT people hear when a system is offline. Recovery processes vary by organization and are a primary focus of the IT field. I view “recovery” as bringing the production system, or a replacement, back online. Full recovery happens when the business can continue operations as well as before, or better.
Let’s take a look at Microsoft Windows and the primary components of backup and recovery.
Microsoft Windows Disaster Recovery
There are two primary components for the backup and recovery of a Windows system: the system state and the data (file system).
Here are some components contained in the Windows System State:
- Boot files (Boot.ini, NTLDR, NTDetect.com)
- Registry – Including Applications and COM settings
- SYSVOL – Group Policy and Logon Scripts
- Active Directory NTDS.DIT (Domain Controllers)
- Certificate Store (If the service is installed)
Here are some examples of the components of the data (file system) portion:
- Microsoft Office Documents (.doc, .xls, .ppt, etc)
- Application and application installation files (.exe, .msi, etc)
- Backup files such as database dumps (.bak)
To be fully recoverable, both the system state and the data must be available and restorable. To capture a system state manually in Windows Server 2003, you can run the ntbackup utility provided by Microsoft. In Windows Server 2008, you can capture the system state manually by using the wbadmin tool from the command line.
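As a rough illustration, the two manual captures could be wrapped in a small helper like the one below. This is only a sketch: the function name, the job name, and the example targets are assumptions made for illustration, and the helper only builds the command line so you can review it before running it on the server in question.

```python
from __future__ import annotations


def system_state_backup_command(server_version: str, target: str) -> list[str]:
    """Build the command that captures the system state.

    server_version: "2003" or "2008" (hypothetical labels used by this sketch).
    target: a .bkf file path for ntbackup, or a volume such as "E:" for wbadmin.
    """
    if server_version == "2003":
        # ntbackup writes the system state into a .bkf backup file.
        return ["ntbackup", "backup", "systemstate",
                "/J", "System State Backup",  # job name (illustrative)
                "/F", target]
    if server_version == "2008":
        # wbadmin writes the system state to a backup target volume.
        return ["wbadmin", "start", "systemstatebackup",
                f"-backupTarget:{target}", "-quiet"]
    raise ValueError(f"unsupported server version: {server_version}")


if __name__ == "__main__":
    # Print the commands instead of running them; on the server itself,
    # the list can be handed to subprocess.run(cmd, check=True).
    print(" ".join(system_state_backup_command("2003", r"D:\Backups\systemstate.bkf")))
    print(" ".join(system_state_backup_command("2008", "E:")))
```

Both commands need to run from an elevated prompt on the server being backed up, and the target should have enough free space for the capture.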
Backup Solutions at Donet
For those of you who are new to the idea of data de-duplication, it is something that completely changes the game. EMC’s Avamar grid is a leading backup solution that we use internally and offer to our customers. I could talk for days about how data de-duplication has changed backup technology forever, and I’m sure it will be covered in numerous other posts. Instead of traditional tape backups (how we all miss those) or just a robocopy of the data, we can install an Avamar client on a machine and take data directly from the system itself. Avamar has the option to back up the system state within the operating system for full recovery in the event of any disaster.
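For readers who have not seen de-duplication before, the general idea can be sketched in a few lines. This is not how Avamar is implemented; the class name, the fixed-size chunks, and the use of SHA-256 are all assumptions chosen to keep the illustration short. The point is simply that a chunk of data already in the store is never stored a second time.

```python
import hashlib


class ChunkStore:
    """Toy de-duplicating store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def backup(self, data):
        """Store data and return the list of chunk digests needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only chunks that have never been seen take up new space.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe of digests."""
        return b"".join(self.chunks[digest] for digest in recipe)


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    monday = store.backup(b"AAAABBBBAAAA")   # the repeated "AAAA" is stored once
    tuesday = store.backup(b"AAAACCCC")      # only "CCCC" is new
    print(len(store.chunks), "unique chunks stored")
    print(store.restore(monday) == b"AAAABBBBAAAA")
```

A second backup of mostly unchanged data adds only the chunks that changed, which is why nightly backups of a whole server can stay small.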
With the system state and the data backed up, we can rebuild the system in a couple of hours or less. Traditionally, backup and recovery involved ordering new hardware, waiting for it to arrive, re-installing the OS and applications, and then importing any data that was backed up, a process that usually spanned several days. Now Donet can bring up another machine and restore the system state and file system within hours, provided another machine is readily available.
But here is where it gets really exciting: converting physical machines to virtual machines (VMs). Consider the following scenario:
At 3:00 P.M. on a Saturday, a traditional server crashes due to disk failure. I immediately see the alerts from our monitoring system and contact all necessary parties. By 3:15 P.M. the decision has been made to create a VM and bring up the system as soon as possible. I can spin up a VM with an operating system installed in 20 minutes. At 3:40 P.M. we begin the system state restore and data restore, which take approximately 40-50 minutes. At 4:30 P.M. I contact Donet’s networking team to complete IP configuration (approximately 20 minutes), and just like that we have a fully functional machine in less than 2 hours. Post-production tests are then performed to verify the integrity of all systems and to troubleshoot as needed. Keep in mind these actions can be done remotely from the comfort of my own home using our existing infrastructure.
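The arithmetic behind that timeline can be checked with a short sketch. The step labels and the calendar date are assumptions made for illustration; the durations come from the scenario, using the upper end of the restore estimate and counting the five minutes between the VM being ready and the restore starting as slack.

```python
from datetime import datetime, timedelta


def recovery_timeline(start, steps):
    """Return (label, begin, end) tuples for steps that run back to back."""
    timeline = []
    current = start
    for label, minutes in steps:
        end = current + timedelta(minutes=minutes)
        timeline.append((label, current, end))
        current = end
    return timeline


# Any Saturday will do; only the clock time matters here.
CRASH = datetime(2024, 1, 6, 15, 0)

STEPS = [
    ("Alerts, calls, decision to build a VM", 15),
    ("Spin up VM with OS (20 min + 5 min slack)", 25),
    ("System state and data restore (worst case)", 50),
    ("IP configuration by the networking team", 20),
]

timeline = recovery_timeline(CRASH, STEPS)
total = timeline[-1][2] - CRASH

if __name__ == "__main__":
    for label, begin, end in timeline:
        print(f"{begin:%H:%M} - {end:%H:%M}  {label}")
    print("Total downtime:", total)
```

Even with the slower restore estimate, the system is back in production 110 minutes after the crash, which is inside the two-hour window.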
This is legitimate disaster recovery.
Tags: Avamar, backup, disaster recovery, downtime, EMC, recovery, Windows