Creating a Fail-safe Data Centre Disaster Recovery Plan

Whether you host the bulk of your infrastructure in your own data centre, a third-party data centre or a mix of both, you need a data centre disaster recovery plan. These plans differ from the traditional disaster recovery plan as they take into account the actual data centre, its location, infrastructure and environmental systems, amongst other things. A DR plan, whilst still necessary, deals more with the actual IT systems that could be disrupted during an outage or event and the process of recovering them. The right data centre services provider can help you create your plan whether it’s for your own data centre or a third-party facility.

Check the operations

Depending on your arrangement, this exercise could involve input from internal data centre teams, external data centre providers and other resources. Anything related to the data centre infrastructure you use should be considered in your assessment. This will include building and floor plans, environmental features and network configuration documents. If you’re using a third-party facility, they might already have their own data centre disaster recovery plan. If you’re relying on your own facility, you’ll need to assess the biggest potential issues that could affect the data centre, such as security breaches and power outages, which types of disruptions have affected you in the past and what the current process is for addressing them.

Depending on the information you uncover, you might need to retest certain procedures and redefine the maximum outage time you can bear. You’ll also want to ensure you know which key staff will need to be available to respond to data centre incidents and whether they need any additional training or retraining. In addition, outline the response procedures of any third-party providers and note whether they ran smoothly the last time they were used.

Know your gaps

Mining this information will give you a current-state picture of what is missing from your data centre disaster recovery strategy. It will also help you identify the most pressing risk scenarios, whether they are related to nature, security or human error. Aim to list these potential situations in order of impact, seriousness and probability to help formulate the proper steps and procedures for responding to them. Then outline how to achieve your desired future state of data centre readiness and what this will require in terms of resourcing, staff training and budget.
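One way to order risk scenarios is to score each one and sort the list. The sketch below is only an illustration: the 1 to 5 scales, the scenario names and scores, and the choice to fold seriousness into the impact score and multiply it by probability are all assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class RiskScenario:
    name: str
    impact: int       # hypothetical scale: 1 (minor) to 5 (severe)
    probability: int  # hypothetical scale: 1 (rare) to 5 (frequent)

    @property
    def score(self) -> int:
        # Assumed scoring rule: impact multiplied by probability.
        return self.impact * self.probability


def rank_scenarios(scenarios):
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


# Illustrative scenarios only; replace with the findings of your own assessment.
scenarios = [
    RiskScenario("Power outage", impact=4, probability=3),
    RiskScenario("Security breach", impact=5, probability=2),
    RiskScenario("Flood", impact=5, probability=1),
    RiskScenario("Human error", impact=2, probability=4),
]

for scenario in rank_scenarios(scenarios):
    print(f"{scenario.score:>2}  {scenario.name}")
```

The highest-scoring scenarios are the ones to write response procedures for first.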

Once the plans are reviewed and the next steps actioned, test the data centre disaster recovery plan, make any tweaks needed and then implement it. Going forward, you’ll want to schedule regular audits of the plan to ensure it still meets your business’s needs and reflects the current state of your data centre assets and arrangement.

Remember to enlist the help of your data centre services provider or hosting facility in creating your data centre disaster recovery plan. Their expertise will help ensure all your bases are covered so you have the most protection from potential outages and incidents.

The importance of preparing for disaster

By Arthur Shih

“My data is in the cloud, why do I need to worry about disaster recovery or backups?”

It’s a question that I get asked a lot.

I love the concept of cloud computing. Period. I love the fact that I can access my information at any time, from anywhere, on any device. I especially love the fact that I just have to consume the services and the content, without really worrying about the underlying infrastructure and what it does. What concerns me, though, is how people often think of it as a silver bullet – once their data is in the cloud, everything becomes someone else’s problem, including disaster recovery.

A true disaster isn’t when your service is down. A disaster is when your service is down, but a competitor’s service is still running. It’s times like these that potential customers move from one service to another, and more often than not, they never come back.

When you buy into a pure Infrastructure-as-a-Service cloud environment (like AWS or Rackspace), you are purchasing computational and storage resources to use as required. The beauty of it is you can buy it when you need it and have it immediately. What a lot of people forget, however, is that this fancy IaaS cloud you have just purchased still runs on physical hardware, and is stored in physical locations. As much as virtualisation and replication have lessened the impact of hardware failures, they have not removed them completely. A disaster may still strike and outages will happen. Think Hurricane Sandy. If your IaaS was being provided out of a data centre located within the areas that lost power, then your service provider would most likely have had a failure, and your systems would have gone down. And in the case of Hurricane Sandy, you would have had no idea when they would be back up again – unless you had a disaster recovery plan.

There are of course other events that may cause outages – Amazon has had many well-publicised failures that have nothing to do with natural disasters. The main thing to understand is that it’s not a matter of if it will fail, but when.

“But I’m in the cloud, so isn’t everything load-balanced and copied across geographies automatically?”

That’s the second question I get asked after discussing the first point above. In some cases, yes, your data can be stored in multiple locations, but it is something you need to actively purchase on top of your original cloud service.

When thinking about disaster recovery, there are four key questions in my opinion that you need to answer:

  1. How long can I survive without my IT Systems?
  2. Once my systems are running again, how much information can I afford to have lost compared to the point in time when the system went down?
  3. Which systems are really the ones that I need up and running?
  4. How do my users then access these systems?

Also, never forget the human element of a disaster. You may have the best disaster recovery plan in the world, but in the event of a disaster, it’s guaranteed that your staff will be more worried about getting home to check on their families than how to respond to customers’ emails.

Arthur Shih is Datacom’s Cloud Solutions Manager.

Why Disaster Recovery Should Be Standard Operating Procedure

Not that long ago, disaster recovery often simply entailed organisations making back-ups of critical files and a staff member taking tape back-ups home “just in case” anything happened.

Of course, the IT field has evolved considerably since those simpler times. Yet, many organisations aren’t keeping pace with current disaster recovery technology and standards. Failing to update your approach to disaster recovery doesn’t just leave your organisation at risk during a catastrophe; it puts your organisation at risk every day. Taking a holistic approach to disaster recovery planning, with the help of a DR or data centre services provider, will help you cover each minute detail so your organisation is ready to withstand the simplest and worst of unplanned incidents.

The Full Scope of Disaster Recovery

Disaster recovery is not simply what you do when the disaster strikes, but what you do to mitigate risk and ensure business continuity for technology and the related processes. CSO Online defines disaster recovery as the “planning and processes that help organisations prepare for disruptive events — whether those events might include a hurricane or simply a power outage caused by a backhoe in the parking lot.” Preparing for such an event, whether it be hurricane or errant backhoe, means creating and maintaining a solution that covers:

  • Scalability that accounts for new processes and data beyond planned growth
  • Redundancy of critical servers and infrastructure — particularly for customer-facing processes
  • Failover systems that continue business operations if a disaster strikes
  • Secure back-ups that aren’t harmed in an emergency and can be retrieved as soon as possible
  • Vetting all SaaS programs to ensure vendors meet your disaster recovery standards
  • Written and known procedures for your staff to follow in a disaster, and for end-users if their workflow changes during the process of an event

Creating a Disaster Recovery Plan

Like most large-scale IT projects, the process of crafting a disaster recovery plan will demand two very important elements:

  • A large majority of your staff’s time and resources, likely meaning an adjustment in their day-to-day duties that could hamper operations
  • Experts who are familiar with the best disaster recovery technology and protocols, particularly if you’re in a highly regulated industry

Instead of continually postponing planning until your internal resources are available and the stars have aligned, you can rely on a disaster recovery partner to guide your business through the process. Besides freeing your staff, a team of experts will help you:

  • Assess how all business processes — inside IT and within your organisation — will be affected during a disaster
  • Audit your infrastructure, technology and technology vendors to determine gaps in disaster recovery, including redundancy and failovers
  • Draft plans for everyone in your organisation that explain any alternate workflows, work locations or different technology to use during a disaster
  • Manage the implementation of disaster recovery technology and plans

If your fellow executives question the cost of this project, show them a cost-benefit analysis comparing how much business you’ll lose during days of downtime without a plan against how much you’ll lose if a catastrophe strikes and you are properly prepared. It will likely make them start humming a different tune. But most importantly: Start planning now. After all, you never know when that backhoe will strike.
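The arithmetic behind such a comparison can be sketched in a few lines. Everything here is hypothetical: the daily revenue, downtime estimates and plan cost are placeholder figures, and the model assumes losses scale linearly with days of downtime, which a real analysis would need to refine.

```python
def downtime_loss(revenue_per_day: float, days_down: float) -> float:
    """Revenue lost while systems are unavailable (simple linear assumption)."""
    return revenue_per_day * days_down


def plan_benefit(revenue_per_day: float,
                 days_down_without_plan: float,
                 days_down_with_plan: float,
                 plan_cost: float) -> float:
    """Net saving from having a plan: positive means the plan pays for itself."""
    loss_without = downtime_loss(revenue_per_day, days_down_without_plan)
    loss_with = downtime_loss(revenue_per_day, days_down_with_plan) + plan_cost
    return loss_without - loss_with


# Placeholder figures for illustration only.
net = plan_benefit(
    revenue_per_day=50_000,
    days_down_without_plan=5,
    days_down_with_plan=0.5,
    plan_cost=100_000,
)
print(f"Net benefit of the plan for one incident: {net:,.0f}")
```

With these example figures, five days of downtime costs 250,000, while half a day of downtime plus the cost of the plan comes to 125,000.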

2 Crucial Components of Your Disaster Recovery Plan

IT disaster recovery planning requires a lot of detailed work covering software, infrastructure, processes and people to ensure your systems are recoverable during a disruptive incident so you minimise data loss and business risk. There are two areas that Datacom has seen even the most seemingly prepared organisations overlook in IT disaster recovery planning. With the help of the right disaster recovery provider, you can get a handle on these components of IT disaster recovery planning early on to ensure your plan will work in your organisation’s time of crisis.

1. Minimise both technical and human single points of failure

The failure of one link in the chain is all your IT disaster recovery planning needs to fall apart. Your DR provider will help you evaluate your infrastructure and equipment to identify, and then fix, single points of failure in their design. These fixes will include ensuring the failure of single servers or network switches won’t make systems unavailable or result in data loss.

Don’t forget with IT disaster recovery planning that people can also be single points of failure. Ensure you have multiple people trained for and designated to the same roles on your disaster recovery team in case one of them can’t react or fulfil the duty when an actual disaster happens. It’s ideal if these individuals are geographically dispersed so a disaster in one location doesn’t affect all the related personnel.
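A simple inventory check can flag where only one component or one person covers a role. The following is a rough sketch; the role names, device names and staff names are invented for illustration, and a real audit would also need to consider dependencies between components.

```python
from collections import Counter


def single_points_of_failure(assignments):
    """Return the roles that are covered by exactly one component or person.

    `assignments` is a list of (role, name) pairs.
    """
    counts = Counter(role for role, _ in assignments)
    return sorted(role for role, n in counts.items() if n == 1)


# Hypothetical inventories, for illustration only.
infrastructure = [
    ("core switch", "sw-01"),
    ("core switch", "sw-02"),
    ("database server", "db-01"),
]
people = [
    ("DR coordinator", "Alex"),
    ("DR coordinator", "Sam"),
    ("storage recovery", "Jo"),
]

print(single_points_of_failure(infrastructure))  # ['database server']
print(single_points_of_failure(people))          # ['storage recovery']
```

The same check works for equipment and for people, which reflects the point that both can be single points of failure.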

2. Document, test, update and repeat

You are undertaking IT disaster recovery planning to minimise risk, but if your plan isn’t tested, documented and corrected, you actually increase risk. Good IT disaster recovery planning includes the full range of disaster recovery TLC, including design, backup and recovery, testing, monitoring, documenting gaps and corrective actions.

Every step of IT disaster recovery planning needs to be physically documented to ensure the DR team can execute on the plan when the time comes. And all documents need to be stored in multiple, universally accessible locations so they can be retrieved regardless of a disaster’s location. Tests of the entire disaster recovery plan need to be conducted every six months to a year so it can be tweaked if needed. A good disaster recovery provider will be able to conduct these tests, in addition to testing smaller recovery processes related to the plan monthly or quarterly. Plans should be updated depending on the results of these tests and when new systems and critical software are incorporated into the organisation.

Remember that your disaster recovery plan will be a living, breathing document that will change as your business needs shift. Continue to go over your plan in detail, testing and revising with the help of your disaster recovery provider.

4 Indications You Have a Good Disaster Recovery Provider

Not all disaster recovery services are created equal, and the differences between them could mean the difference between a solution that works and one that fails during a catastrophic incident. Deciphering the differences between good and bad disaster recovery services takes more than a technical eye. Organisations must also look at what the disaster recovery provider delivers from a business impact perspective.

1. They help you identify what’s critical to protect

You might go into a disaster recovery plan thinking you should try to protect everything. But putting everything but the kitchen sink in your disaster recovery plan can significantly increase your costs. What are the core systems you need to get up and running during a disaster? What are your most critical applications? A good disaster recovery provider will help you identify what your highly critical, moderately critical or non-critical systems are and prioritise them for your disaster recovery plan. The disaster recovery provider will then analyse your recovery point objective and recovery time objective so you know exactly what you will be able to recover in what time frame.

2. They help you determine how you want disaster recovery to play out

After you’ve decided what you need to make recoverable, you need to identify the mechanisms for doing it. Do you want automatic disaster recovery? Which scenarios will activate disaster recovery and which won’t? Who are the individuals in your organisation that decide these answers? A good disaster recovery provider will guide you along this path and ensure your plans map to your business requirements.

3. They give you configuration and recovery site options

There are different ways and places to configure your disaster recovery site. A cold site is a low-fuss, low-cost option for organisations that simply want to set up new equipment in a data centre during a disaster. Hot sites offer the most protection and the least disruption during a disaster, with all critical data duplicated at another data centre site that can be up and running within the time frame you need. Look for a data centre provider that offers these options in addition to inter-data centre capability between local and regional nodes and high-availability disaster recovery available within the cloud.

4. They monitor and test your plan

How do you know your disaster recovery plan will actually work? You can reduce the risk of disaster recovery failure by choosing a provider that will monitor, test and document gaps and put corrective actions in your plan. A provider with different types of monthly and annual tests that suit your business needs can help you feel more secure about your disaster recovery investment. The best providers will have a managed service offering that addresses disaster recovery maintenance so all aspects of your plan are serviced through one local contact and contract.

Deciphering Disaster Recovery from Business Continuity Planning

Disaster recovery and business continuity planning are two related terms that organisations often confuse. And this confusion isn’t something to be taken lightly. Muddling disaster recovery and business continuity planning can, at the very least, inhibit organisations from staying competitive and, in the worst-case scenario, force organisations out of business. Talking with an IT provider experienced at developing and implementing both disaster recovery and business continuity planning will help you determine the best approach to protect your organisation from potential data loss and business risk.

Keeping technology covered with disaster recovery

Disaster recovery is basically what it sounds like — it allows your business to continue operating from a technical standpoint after a disruptive occurrence. This includes backup activities and ensuring systems can start back up offsite. Statistics from 2007 reveal that only 6 per cent of organisations that suffer catastrophic data loss remain in business. A properly followed disaster recovery plan can prevent such dire financial straits, in addition to unacceptable downtime and customer data loss.

A disaster recovery plan will involve a business impact analysis and guide the appropriate approach to systems, data and networks that are critical to the business. Some of the considerations that will go into a disaster recovery plan include how quickly businesses will need systems to be available — one hour or one day, for instance — in the event of a disaster, critical processes that must be included and backup procedures.
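These considerations can be checked against your current backup arrangements. The sketch below assumes the worst-case data loss equals the interval between backups and uses made-up targets and timings; the function name and its parameters are hypothetical, not part of any particular tool.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare current arrangements with recovery point and time objectives."""
    # Assumption: anything written since the last backup could be lost.
    worst_case_data_loss = backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Example figures only: nightly backups, a four-hour restore,
# and targets of one hour of data loss and eight hours of downtime.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=4),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=8),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this example the systems would be restored in time, but nightly backups could lose far more than an hour of data, so the backup procedure is the gap to close.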

Focusing on people with business continuity planning

Disaster recovery covers technology and some processes essential to operations. But what about your organisation’s most valuable resources: your people and critical partners’ employees? Business continuity planning looks at the whole picture of how your enterprise will continue in the face of any minor or major change.

Consider business continuity planning the IT version of strategic succession planning, with the addition of technology and operations to people. Disaster recovery is an important component to a successful business continuity plan, but it’s only one part. You must consider all challenges in your business continuity plan, ranging from how employees communicate during a disaster or small technological hiccup to who will keep things running smoothly if a network administrator takes a sick day. This business continuity planning includes the simplest of components, such as having all employees’ contact information, to more technical aspects involving how they will be able to continue working.

Developing disaster recovery and business continuity planning approaches is, as you’ve already guessed, an involved process that will benefit from expert advice. Both are considerations that you can’t afford to ignore when it comes to keeping your business going in the event of an outage or disaster.

4 Steps to Creating a Viable Business Continuity Plan

Your organisation might need a business continuity plan (BCP) to ensure that the right people and processes are in place to allow your business to continue functioning during an outage or disaster. Without a BCP in place, you might have the right disaster recovery technology to be able to access critical systems and data, but not a plan for actually activating the human element needed to see it through.

There’s no substitute for working with BCP experts to devise a comprehensive plan. But you can begin the BCP process with your team and fellow senior managers. Follow these four steps to take the guesswork out of drafting a BCP.

BCP Step 1: Identify the roles and responsibilities personnel will assume. Start your business continuity plan by listing key personnel — those with intimate knowledge of systems and processes and, perhaps most importantly, can-do attitudes. Ensure they’re assigned the most important duties and have proper communications equipment. See who amongst other employees can perform as back-ups in your business continuity plan. Create a list of all processes and functions that must be covered in your business continuity plan, including step-by-step directions to complete tasks. More than likely, you’ll notice how many processes or functions have no back-ups. Create a cross-training schedule for the BCP to mitigate this risk.

BCP Step 2: Establish the main risks and response times. When crafting your business continuity plan, know where you’re most vulnerable in the event of a disaster or even a key employee stepping down. Whether the vulnerabilities lie in technology, processes or job functions, you’ll need to set two very important objectives in your BCP: your recovery point objective (RPO) and recovery time objective (RTO). The former is the maximum amount of data loss you can sustain during a disaster or disruption. The latter is the maximum amount of time you can afford to take to come back online and resume normal operations. Because your RTO is largely dependent upon data centre and disaster recovery providers, check your service level agreements to ensure they’re up to snuff for your business continuity plan.

BCP Step 3: Lock down all the contingency plans. It may take some time to find alternate locations fully stocked with the computers, network infrastructure, process necessities and communications devices your staff can work from during an event. But as you work towards a fully functioning contingency location for your business continuity plan, you can:

  • Know your current contingency equipment options and what you’ll need to fill the gaps
  • Set an emergency meeting location every employee can report to
  • Place emergency kits in your offices and in the emergency meeting location
  • Establish who can work from home and, in the event of an emergency, how they know to work from home until notified otherwise

BCP Step 4: Ensure the plan is communicated and that people can communicate. A BCP requires many interconnected — and disparate — parts. But they all must reside in one final version that the right people can access. Once you know employees can access the business continuity plan, ensure they also have the communications devices they’ll need as well as a phone-tree list so information is passed quickly and methodically. Be sure to include in your business continuity plan contact information for external contacts so your organisation can communicate with partners and vendors.

Remember: a BCP is never “done.” Think of it as a living document. Because business always changes, so must your business continuity plan to keep your organisation afloat.

3 Things Every Data Centre Must Offer

All data centres are not created equal. Some offer unacceptable security measures. Others might not provide adequate failover ability. Others might store your data offshore, which could breach federal, state or industry privacy regulations.

Vetting a data centre is, simply put, one of the most critical IT decisions your department will make, one that affects nearly every business unit. Ideally, your data centre will provide a secure, central location so your organisation can access, store and use data that’s available anywhere. As you research data centre providers, whether it’s for Infrastructure-as-a-Service cloud or disaster recovery, keep these things as top priorities.

1. The data capacity to meet your organisation’s evolving needs: Consider how your business planning may affect data needs over the foreseeable future. For example, is your business planning to launch new products or services that will require operational changes? Can you think of a handful of departments yearning for a data solution to solve their problems? What is the data transmission capacity? Your provider should ensure it can properly forecast capacity to prevent issues with rapid scaling and make it easy to scale from both technical and financial perspectives.

2. Multiple hosting and storage options and room to grow: The rack space you need on Day 1 of your data centre service contract likely won’t be the same as on Day 403. Your data centre should allow room to expand and offer a range of hosting and service options to enable business growth and agility. This might include an ability to go from co-location to a fully-managed cloud infrastructure environment if you wish, or setting up a disaster recovery or production site.

3. Support 24 hours a day, every day: How would you feel if your data was essentially abandoned after hours? A critical test of your prospective data centre provider is whether they provide hands-on maintenance monitoring around the clock to protect your infrastructure and applications.

This extends beyond simple phone support to encompass a 24×7, purpose-built data centre facility with backup power generation, uninterruptible power supplies, and redundant systems for cooling and telecommunications links. And the operations team doing the monitoring support must be actual IT professionals, not security guards. A data centre provider with local, accredited IT technicians who can maintain and troubleshoot the data centre environment will help you sleep better at night.

This list may not cover the full gamut of requirements potential data centre providers must meet. But if they can’t pass these initial tests, they’re in the wrong class.

Datacom’s Strong Partnerships & Tech Expertise Help RSL Care Secure Patient Data

A not-for-profit provider of retirement accommodation and aged care services throughout Queensland and New South Wales, RSL Care determined in mid-2011 that it needed a highly redundant, highly available disaster recovery solution to protect the information of the 25,000 Australians it serves each year in the case of an outage, disaster or other event.

RSL Care had three separate infrastructure stacks servicing different requirements in its data centre. Through its disaster recovery solution, the organisation wanted to reduce its data centre storage footprint to save money and maximise the efficiency of its environment over five years.

RSL Care already had a relationship with Datacom, having worked with them on a number of projects which included reviewing RSL Care’s IT infrastructure and an assessment of the requirements for a new electronic file solution. After reviewing proposals from three separate IT integrators to find the right one to handle its disaster recovery project, they chose Datacom to implement the disaster recovery solution as they knew they would have direct access to the business leaders and technical leads should questions, changes or concerns crop up.

The solution

RSL Care determined that the FlexPod, a solution that brings together Cisco Unified Computing System, Cisco Nexus data centre switches and NetApp FAS storage into a single-architecture data centre platform, made the most sense for its disaster recovery goals. The FlexPod data centre configuration is built on Cisco Validated Designs, which involves solutions that have already gone through a design, test and documentation phase to ensure successful deployment by customers, reduce risk and boost efficiencies.

Part of the FlexPod’s uniqueness is that it lets IT integrators build customised data centre solutions for clients as long as they stay within the design parameters and source technologies from Cisco’s partners. FlexPod implementations can involve a variety of vendors besides NetApp and Cisco. RSL Care’s particular design for disaster recovery also involved VMware’s vCenter Site Recovery Manager 5.

The implementation

Datacom’s implementation enabled RSL Care to consolidate its three storage stacks into just one and create a second data centre to serve as the disaster recovery site. The result was a highly available, redundant disaster recovery solution. The disaster recovery project also involved:

  • Consolidation of two different hypervisors to a single-hypervisor platform
  • An infrastructure upgrade of core networking from 1 gigabit to 10 gigabit at both data centre sites
  • Redundant fabrics, switches, fibre, firewalls, power and cooling
  • Automated failover

Although the FlexPod was a fairly seamless disaster recovery solution to integrate, the Datacom team had a few challenges involving setting up an additional data centre to serve as the disaster recovery site. The disaster recovery implementation involved seven different design documents and could allow no downtime during the cutover from the existing data centre storage environment to the new one. This meant the disaster recovery project had to temporarily integrate the existing storage setup into the new one to keep everything up and running.

The results

The FlexPod was only implemented in June of 2012 and RSL Care has already realised the disaster recovery benefits, including streamlining across the backup environment, reduction in storage footprint and increased data centre storage efficiency.

RSL Care has also managed to reduce its recovery point objective to 15 minutes in the production environments and its recovery time objective from 12 weeks to mere hours. Using NetApp storage allows RSL Care to use less space and power in its data centre.

“In our business case, we stated we wanted to reduce the time it took to do backups — we did that,” says Ian Youngson, RSL Care Technology Manager. “We have also reduced our dependencies on tape for backup. We were using it nightly, and it’s now just weekly or monthly. We also wanted to increase resilience, and we have by having a second data centre site.”

Datacom Transforms Wilmar Australia’s IT Environment with a Move to the Cloud

Talk about a short deadline.

In 2010, sugar, ethanol and energy producer Wilmar Australia needed to transform its entire IT environment and rebrand its technology image as part of its divestment from parent company CSR. The catch: the project needed to be done in under a year for 1,500 employees in 31 locations to meet the divestment schedule.

The project involved leveraging the latest Microsoft technology for a desktop upgrade while outsourcing core IT functions and moving Wilmar Australia’s IT infrastructure to Infrastructure as a Service. Wilmar Australia also wanted a brand makeover; it saw itself as developing into a more engaging, dynamic organisation and wanted its technology to reflect that.

A sweet solution

A lot of IT solutions providers might’ve balked at that lofty task. Not Datacom.

Wilmar Australia’s new vision aligned very closely with Datacom’s approach to doing business – delivering enduring performance through fresh thinking. These shared values, along with Datacom’s technology knowledge and commitment to building strong partnerships with its customers, made choosing an IT solutions provider easier for Wilmar Australia.

The eventual solution for Wilmar Australia’s IT needs was a Microsoft stack running on Datacom’s IaaS cloud. The list of tasks involved to complete the project included:

  • Migrating from Windows XP to Windows 7
  • Migrating from Office 2003 to Office 2010
  • Adding messaging with Exchange 2010
  • Establishing a new Active Directory 2008
  • Updating legacy unified communications systems using Cisco and Microsoft Lync 2010
  • Incorporating document management with SharePoint 2010
  • Establishing database management and consolidation using SQL 2010
  • Integrating with BizTalk
  • Establishing security via Threat Management Gateway
  • Creating full disaster recovery

Datacom also helped move 240 of Wilmar Australia’s servers and over 90 of their critical business applications to the cloud.

Better technology, better business

Wilmar Australia’s employees across locations are now able to leverage the latest technology to collaborate and communicate better. With the latest operating system and office and communications tools, each employee is able to complete his or her tasks and assignments with renewed efficiency and simplicity.

Brendan O’Kane, General Manager of Information Services at Wilmar Australia, says the Datacom migration has set the organisation on a course toward greater agility and innovation.

“We have given people new tools and better access to help them do their jobs more efficiently and take a lot of noise out of the business,” he says. “The IT strategy is now supporting the business.

“The solution is a much better fit to our DNA than where we came from. We needed to be able to support change and innovation in a fast-moving environment, and the Datacom cloud allows us to do that.”