The Ultimate Disaster Recovery Plan Checklist

Prepare for emergencies with an actionable disaster recovery plan. Use our checklist to build robust procedures, define roles and update protocols for business continuity.

Dave Raffo, MSP News Editor

June 25, 2024


Introduction to Disaster Recovery Planning

Disaster recovery (DR) is the process of restoring an organization’s data, systems and applications to resume business following a natural or human-caused disaster. This process requires a great deal of planning, whether a company conducts its own disaster recovery or enlists a partner that provides DR as a service (DRaaS).

Understanding the Need for a Robust DR Plan

Most organizations can't afford to lose critical data and records for more than a few days. Many can’t afford to be offline even for a few hours. And all organizations must be prepared for disruptions. Disasters that can disrupt operations include:

  • Acts of nature: floods, fires, hurricanes, earthquakes, tornadoes, blizzards, pandemics, etc.

  • External attacks: active shooters, terrorist strikes, cyberattacks and more.

  • Internal issues: accidental human error or intentional acts by insiders.

  • Technology failures: power outages, equipment failure, etc.

These disasters can physically damage or restrict access to buildings, damage or destroy IT equipment, and interrupt supply chains. A disaster can result in the loss of key IT infrastructure or even entire data centers. It can also sever connections to a service or cloud provider, knock out crucial software applications and communications systems, and impact the ability to access, back up and restore data.

A robust DR plan can prevent the loss of critical data, minimize production delays and even limit damage to a company’s reputation while protecting its bottom line. A DR plan is a road map for streamlining an organization’s response when disaster strikes so it can recover faster and keep data secure. A disaster recovery checklist is the first step in creating a DR plan.

It’s important to know that disaster recovery goes far beyond the practice of backing up and restoring data. A backup makes copies of data that can be used to replace data that is deleted, destroyed or unavailable for any other reason. DR is an end-to-end plan for running a business efficiently during and immediately after a disaster.

A DR plan involves all aspects of a company. Besides providing a way to bring technology back online, a good DR plan also covers logistics: Can employees function temporarily from outside their offices and without some key infrastructure until operations are fully restored? The plan should also incorporate existing security and incident response protocols so the response is coordinated.

The Essential Components of a Disaster Recovery Checklist

Any good DR plan involves the following steps:

  • Identifying critical systems that you need to get back online quickly.

  • Determining a way to bring those systems back up fast enough to meet your needs.

  • Putting backup sites and systems in place so the business can keep operating until full restoration after an outage.

  • Identifying outside groups such as vendors, contractors and other business partners whose help you will require.

To accomplish those goals, an organization needs a disaster recovery checklist to follow. The following is a good framework for such a checklist, although each bullet point might include several steps:

  • Lay the foundation for your DR plan. Define clear recovery objectives (RTO and RPO), and inventory hardware, software and data critical to operations.

  • Establish the framework. Determine key personnel roles and responsibilities, and designate primary and alternative disaster recovery sites. Protect your assets with a strategic data backup and storage plan.

  • Regularly test your disaster recovery plan. Schedule periodic reviews and practice drills, and revise the plan based on the results. Make sure to update the plan as new technologies and risks emerge.


Laying the Foundation for Your DR Plan

Define Clear Recovery Objectives (RTO and RPO)

A good DR plan includes clearly defined recovery time objectives and recovery point objectives. These can vary depending on the criticality of your applications and types of data. A recovery time objective (RTO) is the goal for the maximum time it should take to restore operations after downtime. A recovery point objective (RPO) is the maximum amount of data, measured in time, that you can afford to lose: the longest acceptable gap between the last usable backup and the outage. For example, if you can only afford to lose one day’s worth of data for a specific application, the RPO for that app is 24 hours.

The best way to explain the difference between RPO and RTO is: RPO measures how long ago your last backup occurred before the outage, while RTO identifies the amount of time you can allow to pass before you recover. When defining RPO, you must ask how much data loss you can accept and still return to business. Your RTO depends on how long your business can go without its systems and data. Both are expressed in units of time, but RTO limits downtime while RPO limits the amount of data lost.
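The distinction can be illustrated with a little date arithmetic. The following is a minimal sketch, not a prescribed method: the function names, timestamps and objective values are all hypothetical. It treats the RPO as a limit on the gap between the last backup and the outage, and the RTO as a limit on the time from outage to restoration.

```python
from datetime import datetime, timedelta

def data_loss_window(last_backup: datetime, outage_start: datetime) -> timedelta:
    """Span of data at risk: everything written since the last good backup."""
    return outage_start - last_backup

def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """How long the service was unavailable."""
    return service_restored - outage_start

def meets_objectives(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one incident against the RPO and RTO for an application."""
    return {
        "rpo_met": data_loss_window(last_backup, outage_start) <= rpo,
        "rto_met": downtime(outage_start, service_restored) <= rto,
    }

# Hypothetical incident: backup at 02:00, outage at 20:00, restored at 01:00 next day.
result = meets_objectives(
    last_backup=datetime(2024, 6, 24, 2, 0),
    outage_start=datetime(2024, 6, 24, 20, 0),
    service_restored=datetime(2024, 6, 25, 1, 0),
    rpo=timedelta(hours=24),  # assumed objective: lose at most one day of data
    rto=timedelta(hours=4),   # assumed objective: be back online within four hours
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

In this made-up incident, 18 hours of data are at risk, which is within the 24-hour RPO, but the five hours of downtime miss the four-hour RTO.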


Inventory Hardware, Software and Data Critical to Operations

A business recovery plan requires an inventory of hardware and software assets, and critical data required to return to business. You can’t know what you need to protect and restore unless you know what you have.

The inventory should include computers (servers, desktops, laptops, wireless devices), monitors, printers, storage systems (disk, solid-state, tape drives), communications, networking and telecom devices, heating/cooling systems and all critical business applications. The complete list should incorporate critical software assets and the hardware required to run them.
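One way to keep such an inventory usable is to record which hardware each critical application or data set depends on. The sketch below is only an illustration under assumed names: the `Asset` fields and the example asset names are hypothetical, and a real inventory would likely live in an asset management tool rather than in code.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    kind: str                  # "hardware", "software" or "data"
    critical: bool = False
    runs_on: list[str] = field(default_factory=list)  # hardware this asset needs

def hardware_needed_for_critical(inventory: list[Asset]) -> set[str]:
    """Return the hardware required to run every critical software or data asset."""
    known_hardware = {a.name for a in inventory if a.kind == "hardware"}
    needed = set()
    for asset in inventory:
        if asset.critical and asset.kind != "hardware":
            for hw in asset.runs_on:
                # A dependency missing from the inventory is itself a gap in the plan.
                if hw not in known_hardware:
                    raise ValueError(f"{asset.name} depends on unlisted hardware {hw!r}")
                needed.add(hw)
    return needed

# Hypothetical inventory entries.
inventory = [
    Asset("db-server-01", "hardware"),
    Asset("file-server-01", "hardware"),
    Asset("lobby-printer", "hardware"),
    Asset("order-system", "software", critical=True, runs_on=["db-server-01"]),
    Asset("customer-records", "data", critical=True,
          runs_on=["db-server-01", "file-server-01"]),
    Asset("lunch-menu-app", "software", runs_on=["file-server-01"]),
]

print(sorted(hardware_needed_for_critical(inventory)))
# ['db-server-01', 'file-server-01']
```

The check that raises on unlisted hardware reflects the point above: you can't protect what you haven't inventoried.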

Establishing the Framework

Determine Key Personnel Roles and Responsibilities

An organization must have many areas of expertise on its DR planning team. Depending on the size of a company, these planning roles can be handled by groups or individuals. C-suite executives, IT teams and business units should all be involved at a minimum. These areas must be addressed:

  • Crisis management

  • Business continuity

  • Impact assessment and recovery

  • IT applications monitoring

  • Business units’ input

A crisis communications team must be part of a disaster recovery checklist. The CEO or another C-suite executive should lead this team, with senior executives filling other team roles. The team should identify all of the DR plan stakeholders and ensure they know their responsibilities when disaster strikes. It should also take all types of threats into consideration, craft responses, look at best-case and worst-case scenarios, and plan accordingly.

The crisis communications team should set up notification systems and determine the messages that will go out during a disaster. You want multiple types of communication, such as emails, texts and other company-wide channels. These communications channels should also be part of regular DR tests.

Designate Primary and Alternative Disaster Recovery Sites

A DR plan will often include alternative locations for employees to work, particularly those whose office location is impacted. Organizations with IT infrastructure mainly on-premises also will need alternative IT sites. Recovery sites are characterized as cold, warm and hot.

  • Cold sites include IT infrastructure but no software or data. These sites have low costs for setup and maintenance, but are only useful for applications with long RTO times.

  • Warm sites have IT infrastructure and a set of pre-installed software applications for critical data. These sites cost more than cold sites but have faster recovery times.

  • Hot sites are fully functional data centers that are similar – sometimes identical – to the organization’s primary site. This is the most expensive option because it doubles the cost of infrastructure, but it provides the fastest failover and failback times (see the Role of Failover and Failback section below).

You can also use a public cloud as a DR site. Organizations can replicate data to the cloud to match their on-premises environments. The cloud then serves as the alternate site, and it can be managed by the cloud provider. Using the cloud for DR can have lower upfront costs than maintaining an on-premises DR site, and the recovery process can be faster and automated. However, costs will rise as more data and applications move to the cloud. In addition, certain vertical industries face strict compliance rules that require them to keep data inside their own data centers.


Role of Failover and Failback

Failover and failback are key processes in disaster recovery. Both depend on replication: data is replicated from the primary site to a secondary site ahead of a failover, and replicated back to the primary site for failback. Failover moves production from the primary site to the secondary (recovery) site, enabling the secondary site to function as the production site while the primary site is offline. Because failover should be triggered automatically when a system goes down, it requires constant monitoring of the primary system. The failback process returns production to the primary site after restoration. As with other key processes in DR planning, you must conduct regularly scheduled failover and failback tests.
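The monitoring-and-trigger logic can be sketched as a small state machine. This is a simplified illustration, not how any particular DR product works: the `FailoverController` class, the three-failure threshold and the simulated health checks are all assumptions, and a real system would also have to verify that replication is current before switching sites.

```python
class FailoverController:
    """Tracks primary-site health checks and decides which site runs production."""

    def __init__(self, failure_threshold: int = 3):
        # Assumed policy: fail over after this many consecutive failed checks.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active_site = "primary"

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one monitoring result and return the site now serving production."""
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if (self.active_site == "primary"
                    and self.consecutive_failures >= self.failure_threshold):
                self.active_site = "secondary"  # automatic failover
        return self.active_site

    def fail_back(self) -> str:
        """Return production to the primary site once it is healthy again."""
        if self.consecutive_failures > 0:
            raise RuntimeError("primary site is not healthy yet")
        self.active_site = "primary"
        return self.active_site

# Simulated monitoring results: the primary fails four checks, then recovers.
controller = FailoverController(failure_threshold=3)
checks = [True, False, False, False, False, True]
history = [controller.record_health_check(ok) for ok in checks]
print(history)
# ['primary', 'primary', 'primary', 'secondary', 'secondary', 'secondary']
print(controller.fail_back())  # primary
```

Note that failover happens automatically here, while failback is a deliberate call made after the primary is restored, which mirrors the description above.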

Protecting Your Assets

Develop a strategic data backup and storage plan. While data backup and disaster recovery are different, backups are an important part of the DR process. A good DR plan requires a reliable backup of critical data. Regular backups and a sound restoration method are necessary to get a company up and running after a disaster. It’s important to make daily backups of data and other systems that change frequently, and weekly backups for resources that don’t change often.
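The daily and weekly cadence described above can be checked mechanically. The following is a rough sketch under assumed names: the resource names, timestamps and the "frequent"/"infrequent" labels are hypothetical, and it only flags backups that are overdue rather than running them.

```python
from datetime import datetime, timedelta

# Assumed policy from the guidance above: daily for frequently changing
# data, weekly for resources that don't change often.
BACKUP_INTERVALS = {
    "frequent": timedelta(days=1),
    "infrequent": timedelta(weeks=1),
}

def overdue_backups(resources: dict, now: datetime) -> list[str]:
    """Return names of resources whose last backup is older than their interval."""
    overdue = []
    for name, (change_rate, last_backup) in resources.items():
        if now - last_backup > BACKUP_INTERVALS[change_rate]:
            overdue.append(name)
    return sorted(overdue)

# Hypothetical resources: (how often the data changes, time of last backup).
now = datetime(2024, 6, 25, 12, 0)
resources = {
    "orders-db": ("frequent", datetime(2024, 6, 23, 23, 0)),   # 37 hours ago
    "email": ("frequent", datetime(2024, 6, 25, 1, 0)),        # 11 hours ago
    "os-images": ("infrequent", datetime(2024, 6, 20, 1, 0)),  # about 5 days ago
    "archive": ("infrequent", datetime(2024, 6, 10, 1, 0)),    # about 15 days ago
}

print(overdue_backups(resources, now))  # ['archive', 'orders-db']
```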

Backed up data can be stored on-premises or at offsite locations such as public clouds. The DR team must include people who understand the company’s backup process, including the methods and requirements for restoring data from backups.

A good DR plan also needs to keep in mind compliance with industry regulations. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and General Data Protection Regulation (GDPR) include restrictions on how and where an organization can store specific types of data. Some regulations also impose penalties for loss of certain types of data and mandate DR testing.

Testing and Maintenance

Regularly Test Your Disaster Recovery Plan

Disaster recovery testing is a crucial – and often overlooked – piece of any valid DR plan. Don't ignore DR testing, even if it can be expensive and labor intensive. You don’t want to wait until disaster strikes to find out whether you can recover necessary assets.

The first step of DR testing is to document the entire process. After you've done that, you should review it regularly. While you should test the entire process at least once per year, you can also conduct more frequent tests for individual parts of the DR plan on a monthly or even weekly basis. You should update your DR plan accordingly based on the results of these tests.

You should also document each test, including any problems your business encounters and the steps required to fix those problems. Your documentation should include exactly what was tested and who conducted the tests.
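A test log along these lines could be kept as structured records. This is a minimal sketch with hypothetical field names and sample entries; it records what was tested, who ran the test, and any problems with their fixes, and it flags when the yearly full test is overdue.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class DRTestRecord:
    test_date: date
    scope: str          # what was tested, e.g. "full" or "backup-restore"
    conducted_by: str   # who conducted the test
    # Maps each problem found to the steps that fixed it ("" if still unresolved).
    problems: dict[str, str] = field(default_factory=dict)

def full_test_overdue(records: list[DRTestRecord], today: date) -> bool:
    """True if no full test has been run in the past 365 days."""
    full_dates = [r.test_date for r in records if r.scope == "full"]
    return not full_dates or today - max(full_dates) > timedelta(days=365)

def unresolved_problems(records: list[DRTestRecord]) -> list[str]:
    """List problems that have no documented fix yet."""
    return [p for r in records for p, fix in r.problems.items() if not fix]

# Hypothetical log entries.
records = [
    DRTestRecord(date(2023, 5, 1), "full", "IT operations"),
    DRTestRecord(date(2024, 6, 1), "backup-restore", "Storage team", {
        "restore exceeded RTO": "",
        "stale contact list": "updated call tree",
    }),
]

print(full_test_overdue(records, today=date(2024, 6, 25)))  # True
print(unresolved_problems(records))  # ['restore exceeded RTO']
```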

You want to test hardware, software, processes and people when preparing for a disaster. Everyone involved must prove they know their roles in case of a disaster, and that they can fulfill their responsibilities.


You should have several types of tests in your disaster recovery plan checklist. These range from simple reviews where stakeholders read through the plan (highlighting areas that may have changed or the responsibilities of newer employees) to full-blown tests where the main systems are taken down and must be recovered. Another popular type of DR test is a tabletop test, where key members are asked how they would react to a specific type of disaster. Tabletop tests should include representatives of all departments. It's a good way to find gaps in the plan, or to suggest stronger response methods.

You also want to test key technical processes on a regular basis. For example, you can restore systems to an alternate site even if they are still functioning properly, which is a good way to test your backup recovery. Or you can migrate servers, virtual machines or your entire system to another site and migrate back to make sure the process can be completed in a timely manner. You can use the public cloud as a migration target for virtual machines and data.

Incorporate Vendor and Supplier Agreements Into Your Plan

No DR plan should overlook vendors and suppliers that provide critical technology and services. Using third parties can increase a company’s risk if not managed properly. You want to make sure any outside company that stores or manages your critical infrastructure and data has robust data recovery procedures in place. Remember that any third party in close geographical proximity to your physical location will often suffer from the same natural disasters or widespread, human-caused outages as your company.

Your third-party contracts should include service-level agreements (SLAs) to ensure your systems and data are protected and will meet your RTO and RPO requirements. If you use an outside backup service, it should have multiple locations for your most critical data.
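Comparing a vendor's commitments against your own objectives is simple arithmetic once both are written down. The sketch below assumes a hypothetical representation in which an SLA states committed RTO and RPO values as durations; actual SLAs are contract language and rarely map this cleanly.

```python
from datetime import timedelta

def sla_gaps(requirements: dict, vendor_sla: dict) -> list[str]:
    """List the objectives where the vendor's commitment is missing or too slow."""
    gaps = []
    for objective in ("rto", "rpo"):
        committed = vendor_sla.get(objective)
        if committed is None:
            gaps.append(f"{objective.upper()}: not covered by the SLA")
        elif committed > requirements[objective]:
            gaps.append(
                f"{objective.upper()}: vendor commits to {committed}, "
                f"need {requirements[objective]}"
            )
    return gaps

# Hypothetical figures for one application and one vendor.
requirements = {"rto": timedelta(hours=4), "rpo": timedelta(hours=24)}
vendor_sla = {"rto": timedelta(hours=8), "rpo": timedelta(hours=12)}

gaps = sla_gaps(requirements, vendor_sla)
for gap in gaps:
    print(gap)
```

With these made-up numbers, the vendor's RPO commitment is stricter than required, but its eight-hour RTO commitment falls short of the four-hour requirement and is reported as a gap.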

You also want to check with outside vendors and providers to make sure they update their systems and processes, and conduct regular DR tests. Make sure you understand their DR plan well enough to ensure it will meet your needs if disaster strikes.

Disaster Recovery-as-a-Service (DRaaS) Options

Many organizations turn over all or part of their DR planning to service providers. DRaaS can be fully or partially managed by the provider.

With managed DRaaS, the provider handles all the DR planning and implementation. As with any other IT vendor, however, you need to stay connected with the DRaaS provider to make sure its infrastructure, software and policies are up to date and meet your organization’s needs.

Partially managed or assisted DRaaS splits the DR responsibilities between the provider and your organization. The provider might handle specific applications or sites, while your team implements other parts of the DR plan.
