Multi-Cloud Disaster Recovery
Planning for disaster recovery is a common point of confusion for companies employing cloud computing, especially when it involves multiple cloud providers. If the topic sounds daunting, we get it. With our experts' advice, we hope to temper your apprehensions while also raising some important considerations for your organization before or after going multi-cloud.
It’s taxing enough to ensure data protection and disaster recovery (DR) when all data is located on-premises. But today many companies have data on-premises as well as with multiple cloud providers, a hybrid strategy that may make good business sense but can create challenges for those tasked with data protection. Before we delve into the details, let’s define the key terms.
What is multi-cloud?
Multi-cloud is the utilization of two or more cloud providers to serve an organization’s IT services and infrastructure. A multi-cloud approach typically consists of a combination of major public cloud providers, namely Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
Organizations choose the best services from each cloud provider based on cost, technical requirements, geographic availability, and other factors. For example, a company might use Google Cloud for development and test, AWS for disaster recovery, and Microsoft Azure for business analytics.
Multi-cloud differs from hybrid cloud, which refers to computing environments that mix on-premises infrastructure, private cloud services, and a public cloud.
Who uses multiple clouds?
- Regulated industries – Many organizations run different business operations in different cloud environments. This may be a deliberate strategy of optimizing their IT environments based on the strengths of individual cloud providers or simply the product of a decentralized IT organization.
- Media and Entertainment – Today’s media and entertainment landscape is increasingly composed of relatively small and specialized studios that meet the swelling content-production needs of the largest players, like Netflix and Hulu. Multi-cloud solutions enable these teams to work together on the same projects, access their preferred production tools from various public clouds, and streamline approvals without the delays associated with moving large media files from one site to another.
- Transportation and Autonomous Driving – Connected car and autonomous driving projects generate immense amounts of data from a variety of sensors. Car manufacturers, public transportation agencies, and rideshare companies are among those motivated to take advantage of multi-cloud innovation: keeping data accessible across multiple clouds without the risk of significant egress charges and slow transfers, while maintaining the freedom to leverage the optimal public cloud services for each project.
- Energy Sector – Multi-cloud adoption can help lower the significant costs associated with finding and drilling for resources. Engineers and data scientists can use machine learning (ML) analytics to identify the most promising places to prospect for oil, to gauge the environmental risks of new projects, and to improve safety.
Multi-cloud disaster recovery pain points:
- Not reading before you sign. Customers may face issues if they fail to read the fine print in their cloud agreements. The cloud provider is responsible for its own infrastructure, but customers are responsible for protecting their applications and data. There are many reasons for application downtime that are not covered under cloud SLAs, so business-critical workloads need high availability and disaster recovery protection software as well.
- Developing a centralized protection policy. A centralized protection policy must cover all data, no matter where it lives. Each cloud provider has its own way of accessing, creating, moving, and storing data, with different storage tiers, so it can be cumbersome to create a disaster recovery plan that spans clouds; a sketch of such a policy appears after this list.
- Reporting. This is important for ensuring protection of data in accordance with the service-level agreements that govern it. Given how quickly users can spin up cloud resources, it can be challenging to make sure you’re protecting each resource appropriately and identifying all data that needs to be incorporated into your DR plan.
- Test your DR plan. Customers must thoroughly vet and test their DR strategy. A multi-cloud strategy compounds the need for testing. Some providers may charge customers for testing, which reinforces the need to read the fine print of the contract.
- Resource skill sets. Finding an expert in one cloud can be challenging; with multi-cloud you will need either expertise in each cloud or the rare individual who is proficient in several.
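To make the policy question concrete, here is a minimal sketch in Python of what a centralized, cloud-agnostic protection policy might look like. All names, fields, and figures are illustrative placeholders rather than any particular product's schema; the point is that one record format can describe workloads in every cloud, which makes coverage gaps easy to detect.

```python
from dataclasses import dataclass

# Illustrative policy record; field names are placeholders, not a product schema.
@dataclass(frozen=True)
class ProtectionPolicy:
    workload: str             # application or data set name
    cloud: str                # "aws", "azure", or "gcp"
    backup_interval_min: int  # how often backups are taken, in minutes
    rpo_min: int              # maximum tolerable data loss, in minutes
    rto_min: int              # maximum tolerable downtime, in minutes

POLICIES = [
    ProtectionPolicy("orders-db", "aws", 5, 5, 15),
    ProtectionPolicy("analytics", "azure", 60, 60, 240),
    ProtectionPolicy("dev-test", "gcp", 1440, 1440, 1440),
]

def find_unprotected(policies, inventory):
    """Flag (cloud, workload) pairs discovered in any cloud with no policy."""
    covered = {(p.cloud, p.workload) for p in policies}
    return [item for item in inventory if item not in covered]

# In practice the inventory would come from each provider's APIs.
inventory = [("aws", "orders-db"), ("azure", "analytics"), ("gcp", "new-ml-pipeline")]
print(find_unprotected(POLICIES, inventory))  # -> [('gcp', 'new-ml-pipeline')]
```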
Overcoming the multi-cloud DR challenge
Meeting these challenges requires companies to develop a data protection and recovery strategy that covers numerous issues. Try asking yourself the following strategic questions:
- Have you defined the level of criticality for all applications and data? How much will a few minutes of downtime for critical applications cost your organization in end-user productivity, customer satisfaction, and IT labor? (A rough cost model follows these questions.)
- Will data protection and recovery be handled by IT or application owners and creators in a self-service model?
- Have you planned for data optimization, using a variety of cloud- and premises-based options?
- How do you plan to recover data? Will you restore data to cloud-based virtual machines, or use a backup image as the source of recovery?
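As a back-of-the-envelope aid for the first question above, the sketch below models downtime cost as lost revenue plus lost end-user productivity plus IT labor. Every figure is a placeholder to be replaced with your own numbers.

```python
# A rough, illustrative downtime cost model. All inputs are placeholders.

def downtime_cost(minutes_down, revenue_per_min, affected_users,
                  user_cost_per_min, it_staff, it_rate_per_min):
    """Estimate the direct cost of an outage: lost revenue, lost
    end-user productivity, and IT labor spent on recovery."""
    lost_revenue = minutes_down * revenue_per_min
    lost_productivity = minutes_down * affected_users * user_cost_per_min
    it_labor = minutes_down * it_staff * it_rate_per_min
    return lost_revenue + lost_productivity + it_labor

# Example: a 15-minute outage of a critical application.
print(downtime_cost(15, revenue_per_min=500, affected_users=200,
                    user_cost_per_min=0.75, it_staff=4, it_rate_per_min=1.25))
# -> 9825.0  (7500 revenue + 2250 productivity + 75 IT labor)
```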
Obtain the right multi-cloud DR solution
The biggest key to success in data protection and recovery in a multi-cloud scenario is ensuring you have visibility into all of your data, no matter how it's stored. Data protection and recovery tools enable you to define which data and applications should be recovered in a disaster scenario and how to do it – whether from a backup image or by moving data to a newly created VM in the cloud, for example.
The tool should help you orchestrate the recovery scenario and, importantly, test it. If it is well integrated with your data backup tool, it can also let you use backups as a source of recovery data, even when that data is stored in different locations – like multiple clouds. Our most recent SIOS webinar discusses this same point; watch it here if you're interested. SIOS DataKeeper lets you run your business-critical applications in a flexible, scalable cloud environment, such as Amazon Web Services (AWS), Azure, and Google Cloud Platform, without sacrificing performance, high availability, or disaster protection. SIOS DataKeeper is available in the AWS Marketplace and is the only Azure-certified high availability software for WSFC offered in the Azure Marketplace.
High Availability & the Cloud: The More You Know
Disaster Recovery Made Simple
Heard the term disaster recovery (DR) thrown around often? DR is a strategy and set of policies, procedures, and tools that ensure critical IT systems, databases, and applications continue to operate and remain available to users when a man-made or natural disaster strikes. It typically involves moving application operation to a redundant DR environment that is geographically separated from the primary environment. While the IT team owns the disaster recovery strategy, DR is an important component of every organization's Business Continuity Plan – the strategy and set of policies, procedures, and tools that ensure business operations continue through an interruption in service.
It may sound confusing at first. But we’ve collected some quick facts to make disaster recovery simple to understand:
Point 1. Implement an IT disaster recovery plan (DRP)
A DRP is the written form of this strategy: the policies, procedures, and tools that ensure critical IT systems, databases, and applications continue to operate and remain available to users when disaster strikes the organization's primary computing environment.
Point 2. Ensure Geographic Separation
An essential part of application disaster recovery is ensuring that a redundant, geographically separated application environment is available, along with efficient block-level replication and/or clustering software that can fail over operation to it in the event of a disaster. If your application runs in a cloud, your cluster should be able to fail over across cloud regions and availability zones for disaster recovery; a basic zone-spread check is sketched below.
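As one illustration, the following sketch – assuming an AWS deployment, the boto3 SDK, and a hypothetical `cluster` tag on each node – checks whether a cluster's running nodes actually span more than one availability zone. A full DR posture would also include a cross-region copy, which this check does not cover.

```python
import boto3
from collections import defaultdict

def az_spread(cluster_name, region="us-east-1"):
    """Count running instances per availability zone for a tagged cluster.
    The 'cluster' tag and region are assumptions for this sketch."""
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(
        Filters=[{"Name": "tag:cluster", "Values": [cluster_name]},
                 {"Name": "instance-state-name", "Values": ["running"]}]
    )
    zones = defaultdict(int)
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            zones[instance["Placement"]["AvailabilityZone"]] += 1
    return dict(zones)

zones = az_spread("sql-cluster")  # hypothetical cluster name
if len(zones) < 2:
    print(f"WARNING: all cluster nodes are in a single zone: {zones}")
else:
    print(f"Cluster nodes are spread across zones: {zones}")
```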
Point 3. Test, test, and test some more
In a recent Spiceworks survey, 59 percent of organizations indicated they had experienced one to three outages (that is, any interruption to normal levels of IT-related service) over the course of one year; 11 percent had experienced four to six, and 7 percent seven or more. In short, a DR event is nearly inevitable. Conduct regular testing so you know exactly what will happen when one occurs.
Point 4. Understand Your Risk
The disaster in DR does not need to be a full-fledged hurricane, tornado, flood, or earthquake that impacts your business. Disasters come in many forms, including a cyber-attack, fire, theft, or vandalism. In fact, simple human error still rates among the leading causes of IT data center downtime. In short, a disaster is any crisis that results in extended system downtime and/or major data loss that impacts your IT infrastructure, data center, and business.
Point 5. Ensure Your DRP has a Checklist
It should list critical IT systems and networks, prioritized by their recovery time objectives (RTOs), and document the steps needed to restart, reconfigure, and recover them. Employees should know where to locate the DRP and how to execute basic emergency steps in the event of an unforeseen incident. A sketch of such a prioritized checklist follows.
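One simple way to keep the checklist actionable is to record it in machine-readable form, sorted by RTO so the most urgent recoveries come first. The systems, RTO values, and steps below are invented for illustration.

```python
# Illustrative DRP checklist: each system with its RTO (in minutes) and
# recovery steps, printed in order of urgency. All entries are placeholders.

SYSTEMS = [
    {"name": "payments-db", "rto_min": 15,
     "steps": ["fail over cluster to DR node", "verify replication state"]},
    {"name": "file-shares", "rto_min": 240,
     "steps": ["restore latest backup image", "remap DNS"]},
    {"name": "intranet", "rto_min": 1440,
     "steps": ["rebuild VM from template", "restore content backup"]},
]

for system in sorted(SYSTEMS, key=lambda s: s["rto_min"]):
    print(f"{system['name']} (RTO {system['rto_min']} min)")
    for i, step in enumerate(system["steps"], 1):
        print(f"  {i}. {step}")
```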
Point 6. Substantiate DRPs through testing
Testing a DRP identifies deficiencies and provides opportunities to fix problems before a disaster occurs. Testing can offer proof that the plan is effective and that it will enable you to meet recovery point and recovery time objectives (RPOs and RTOs). Since IT systems and technologies are constantly changing, DR testing also helps ensure a disaster recovery plan stays up to date.
Choose a failover clustering technology that makes DR testing simple by facilitating fast, reliable switchover of application operation to DR nodes and back.
When you look at those statistics, you know you are living on borrowed time if you don't have a disaster recovery plan in place. The SIOS disaster recovery solution is a multi-site, geographically dispersed cluster that meets RPOs and RTOs with ease. What makes SIOS different from many other DR providers is that one solution meets both high availability and disaster recovery needs. To learn more about our DR solutions, check out the insights page here.
Reproduced with permission from SIOS
RTO vs. RPO: Learning the Difference to Achieve Your Operational Goals
In addition to 99.99% uptime, high availability environments also need to meet stringent recovery time and recovery point objectives (RTO and RPO, respectively). RTO and RPO are two key parameters that businesses should define before creating their business continuity and disaster recovery plans. Both metrics help to design the recovery process and to define the recovery time limits, the frequency of backups, and the recovery procedures. Although RTO and RPO may seem alike, there are core differences you should consider. Read on to understand the difference between RTO and RPO.
To be clear, RTO is a measure of the time elapsed from application failure to restoration of application operation: it dictates how much time you have to recover after disaster strikes. RPO, on the other hand, is a measure of how up-to-date the data is once application availability has been restored: the maximum amount of data loss that can be tolerated when a failure happens. A worked example follows.
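A small worked example makes the two measures concrete. The timestamps below are invented: the application fails at 9:30, service is restored at 10:15, and the last recoverable backup was taken at 2:00.

```python
from datetime import datetime

# Invented incident timeline for illustration.
last_good_backup = datetime(2024, 3, 1, 2, 0)    # last recoverable data point
failure          = datetime(2024, 3, 1, 9, 30)   # application goes down
restored         = datetime(2024, 3, 1, 10, 15)  # service is back for users

actual_downtime  = restored - failure          # compare against the RTO target
actual_data_loss = failure - last_good_backup  # compare against the RPO target

print(f"Downtime: {actual_downtime}")           # 0:45:00 -> must fit your RTO
print(f"Data loss window: {actual_data_loss}")  # 7:30:00 -> must fit your RPO
```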
Things to consider for evaluating your disaster recovery plan
First, it’s important to define the criticality of the application and its associated data to core business operations. How much does a minute of downtime or data loss for this application cost the company? Next, consider the potential set of disasters against which you would like to protect your organization. Some disasters that require data recovery and backup include:
- Data loss: This may be as simple as someone deleting a folder, or as complex as a case of ransomware or an infected database.
- Application loss: This occurs when security changes, updates, or system configuration changes negatively impact services.
- System loss: This includes hardware failures and virtual server crashes.
- Datacenter loss: This includes data centers that are on-premises and in public clouds.
- Business location loss: In this instance, a disaster might include an electrical outage, fire, flooding, or even a chemical spill outside the building. The business facilities require recovery to an alternate location.
Reducing an organization’s RPO and RTO
It’s important to consider the RTO and RPO as they apply to different types of data. Organizations that do a file-level backup of a database, rather than investing in an offsite virtual environment, will see longer recovery times and limits on how recently updated the data will be once recovered.
Consider the possible disasters, match them with the data sets that need to be protected, and then identify the recovery objectives. These steps will give you the information you need to build tactical backup solutions that meet your recovery time objective and recovery point objective; a sketch of this matching step follows.
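The matching step can be as simple as the sketch below: for each data set, compare the backup interval against the target RPO, since data created after the last backup is what a restore cannot recover. All names and values are illustrative.

```python
# Illustrative check: can each data set's backup interval meet its target RPO?
DATA_SETS = {
    # name: (backup_interval_min, target_rpo_min)
    "customer-db":   (5, 5),
    "media-archive": (1440, 60),  # daily backups cannot meet a 1-hour RPO
    "logs":          (60, 240),
}

for name, (interval, rpo) in DATA_SETS.items():
    meets_rpo = interval <= rpo
    print(f"{name}: interval={interval}m target_rpo={rpo}m -> "
          f"{'OK' if meets_rpo else 'GAP: back up more often or replicate'}")
```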
What is RTO and RPO in SQL Server?
SQL Server allows users to set up automated log backups that are restored on a standby server. With log shipping, users can recover a fairly recent copy of the database, depending on the RTO and RPO of that process. Those RTO and RPO requirements are set by users based on their needs, budget, and any network limitations.
However, SQL Server RTO and RPO are not necessarily straightforward. In many cases, the process isn't as fast as a client may imagine: they may have an ideal RPO in mind, but slow network speeds or an incorrectly configured backup can throttle the process. In addition, restoring a log backup in this way can involve transferring large amounts of data, which can easily exceed the acceptable RTO. The sketch below shows one way to monitor the resulting exposure on the standby server.
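For a sense of how this exposure can be watched in practice, here is a minimal monitoring sketch. It assumes the standby server exposes SQL Server's built-in log shipping monitor table (msdb.dbo.log_shipping_monitor_secondary) and that the pyodbc driver is installed; the connection details are placeholders.

```python
import pyodbc
from datetime import datetime

# Connection string values are placeholders for this sketch.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=standby-server;DATABASE=msdb;Trusted_Connection=yes;"
)

row = conn.cursor().execute(
    "SELECT secondary_database, last_restored_date "
    "FROM msdb.dbo.log_shipping_monitor_secondary"
).fetchone()

# Effective RPO exposure: anything committed on the primary after the last
# restored log backup would be lost if the primary failed right now.
lag = datetime.now() - row.last_restored_date
print(f"{row.secondary_database}: last restore completed {lag} ago")
```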
Since SQL Server is typically a business-critical application, customers can easily justify HA/DR protection for it – usually in the form of a failover cluster that can fail over across cloud availability zones and regions for disaster recovery. This can be accomplished easily by adding SIOS DataKeeper to a Windows Server Failover Clustering environment or by using SIOS Protection Suite in a Linux environment. Both solutions deliver not only 99.99% availability but also an RPO of zero and an RTO of mere seconds.
Now that you know…
Ultimately, data loss prevention for business continuity is a crucial requirement for any business. Take the time to consider how you will meet your RTO and RPO goals, no matter how large or small your business is, or what internal IT operations you support. SIOS high availability clusters deliver an RPO of zero and an RTO of mere minutes.
Learn more about SIOS DataKeeper for Windows or SIOS Protection Suite for Linux
To request a free trial, let us know here.
Reproduced from SIOS