SIOS SANless clusters

Webinar: Ensuring High Availability in a Multi-Cloud Environment: Lessons from the CrowdStrike Outage

November 4, 2024 by Jason Aw

Register for the On-Demand Webinar

Businesses increasingly use multiple cloud service providers to maintain flexibility and scalability; however, recent incidents like the CrowdStrike outage highlight that even top systems can encounter issues, particularly with updates and security patches. This webinar discusses best practices for implementing multi-cloud High Availability (HA) solutions to keep your mission-critical applications operational during unexpected disruptions. It also covers strategies to prevent downtime from system misconfigurations or problematic patches, ensuring you can effectively manage your cloud infrastructure.

Watch the on-demand webinar to discover how to achieve HA in your environment and minimize preventable downtime.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: High Availability and DR, multi-cloud, SQL Server, Windows

Multi-Cloud High Availability for Business-Critical Applications

February 12, 2023 by Jason Aw

Cloud computing has become ubiquitous over the last decade, with 99% of organizations using at least one public or private cloud according to the Flexera 2021 State of the Cloud Report. While AWS, Microsoft Azure, and GCP are the top three public cloud providers today, many organizations—whether by design or by accident—have adopted a multi-cloud strategy that allows them to pick and choose which cloud services are most compelling and best suited to their unique business requirements. According to the Flexera report, 92% of enterprises today have a multi-cloud strategy and use an average of 2.6 public and 2.7 private clouds, including Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) offerings.


What is multi-cloud?

A multi-cloud is simply an environment that consists of two or more public and/or private clouds (including SaaS, PaaS, and IaaS). The different services in a multi-cloud environment may interoperate (in which case it might be a hybrid cloud) or may not (essentially operating as separate cloud silos). Remember, although all hybrid clouds are multi-clouds, not all multi-clouds are hybrid clouds.


The Evolution (and Wide Adoption) of Multi-Cloud as a Strategy

A multi-cloud environment consists of a combination of any two or more public or private cloud offerings including SaaS, PaaS, and IaaS. Thus, an organization’s multi-cloud strategy may consist of an enterprise workload running on Amazon Elastic Compute Cloud (EC2) and using Microsoft 365 for email and back-office applications. Or an organization may connect a custom database hosted in a private cloud to Salesforce, a public cloud SaaS offering.

A hybrid cloud environment consists of a mix of on-premises, private cloud, and public cloud environments. According to the Flexera report, 80% of enterprises have a hybrid cloud strategy. Multi-cloud environments often evolve as a result of shadow IT, in which different departments procure cloud services to meet their individual needs without necessarily consulting a centralized IT department. For example, your marketing team may have started using Salesforce long before IT deployed its first workload in AWS, while your HR and finance departments were busy adding Workday and Concur to the mix of SaaS applications that your organization now depends on. Or perhaps you have application development teams that work on different projects across the globe. One development team may prefer Azure DevOps, whereas another team may prefer the open source tools in AWS. Thus, your multi-cloud strategy may have evolved purely by accident—which isn’t necessarily a bad thing.

Your different departments are empowered to select best-of-breed solutions to meet their needs while your app dev teams can maximize productivity and reduce time-to-market working in their preferred development environments.

Multi-cloud environments also evolve by design, for example, due to regulatory requirements, mergers and acquisitions, or to implement high availability and disaster recovery strategies.

Regulatory language can be vague and confusing. For example, the Financial Conduct Authority (FCA) regulations on outsourcing IT state that firms must be able to “know how they would transition to an alternative service provider and maintain business continuity.” This statement implies that regulated firms need to at least plan for a secondary cloud environment. Given the risk-averse nature of many heavily regulated firms, these types of issues have led many to adopt a multicloud strategy.

Integrating IT systems and consolidating data centers and cloud environments after a merger or acquisition is a significant challenge. There are a number of factors that can complicate this challenge, including existing contracts with cloud providers or co-location providers. Similar to consolidating physical data centers, consolidating cloud workloads can be a major effort that doesn’t deliver significant business value, so it’s frequently delayed for higher-priority projects.

Finally, multi-cloud strategies are often adopted to support high availability and disaster recovery requirements. In evaluating major public cloud outages across AWS and Azure, most outages are typically limited to a single cloud region at a time (and are most commonly software-related).

More and more organizations (34% according to the Flexera report) have taken the added step of deploying their mission-critical workloads across multiple public cloud providers. This can be much easier for static workloads, such as websites and applications that can run independently of one another. For distributed systems, such as databases and directory services (for example, Active Directory), multi-cloud disaster recovery can be far more challenging.

Understanding Unique Challenges in Multi-Cloud Environments

Multi-cloud environments are more complex and thus more challenging to manage than single cloud deployments. Some unique challenges in multi-cloud environments include:

• End-to-end visibility: Ensuring complete visibility is a challenge in any IT environment—and it’s exponentially more complex and challenging in a highly dynamic multi-cloud environment. However, end-to-end visibility is critical to troubleshooting performance issues and bottlenecks, securing your digital footprint, and identifying single points of failure in mission-critical systems and applications.

• Security and identity management: Ransomware and other cybersecurity threats are top of mind for every IT leader today. While moving to a public cloud platform generally improves the security posture of an organization by shifting certain security responsibilities (such as data center and physical security) to the public cloud provider and providing on-demand access to services like encryption and network segmentation, it can also make it easier to make costly mistakes. For example, network misconfigurations can be common—thousands of data breaches have been caused by improperly configured AWS S3 storage buckets. Identity management is yet another challenge. For example, Azure Active Directory may be quite familiar to organizations that have previously used Active Directory in their on-premises environments, but extending identity management beyond Azure to AWS, GCP, and SaaS offerings (such as Salesforce, ServiceNow, Workday, and others) can introduce new challenges. A short audit sketch following this list shows one way to catch the S3 misconfiguration described here.

• Application and data portability: The ability to dynamically move applications and data across different public cloud platforms in a hybrid (multi-cloud) environment is key to many multi-cloud strategies. Although public cloud providers don’t necessarily build their services to restrict application and data portability, they don’t necessarily work together to facilitate this capability and there may be costs involved. Different cloud providers also use different technologies for their various service offerings.

• Multi-cloud silos: If organizations don’t plan and design their multi-cloud deployments for application and data portability, they can end up with siloed applications and storage, essentially re-creating a common problem in traditional on-premises data center environments, across multiple cloud platforms. At the very least, organizations need multi-cloud security and management tools that allow them to effectively manage their risks and usage/costs across different cloud platforms.
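
To make the S3 misconfiguration risk above concrete, here is a minimal audit sketch in Python using boto3. It assumes boto3 is installed and AWS credentials are configured; treating "all public-access settings blocked" as the pass condition is an illustrative policy, not a complete security baseline.

# audit_s3_public_access.py - flag S3 buckets whose public access is not fully blocked.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        fully_blocked = all(cfg.values())
    except ClientError as err:
        # No public-access-block configuration at all is the risky default worth flagging.
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            fully_blocked = False
        else:
            raise
    print(f"{name}: {'ok' if fully_blocked else 'REVIEW: public access not fully blocked'}")

Running a check like this on a schedule, in every account you operate, is one small step toward the multi-cloud security tooling discussed below.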

According to the Flexera 2021 State of the Cloud Report, 81% of organizations cite security as the top challenge in their cloud deployments, followed by managing cloud spend (79%). Yet only 42% of organizations use multi-cloud cost management tools and only 38% use multi-cloud security tools.

Addressing High Availability and Disaster Recovery in Multi-Cloud Environments

While there are many challenges to multi-cloud deployments, they can provide additional availability and disaster recovery options, especially in the event of a major cloud outage. If your organization is pursuing a multi-cloud strategy, you should work with a trusted, cloud-agnostic partner to help you design and implement your multi-cloud deployment using a holistic approach.

For high availability and disaster recovery, you also need a cloud-agnostic technology solution that spans your multi-cloud environment, irrespective of the cloud platforms you use. You always want to avoid a scenario where your high availability solution causes more downtime in your environment than a standalone solution. Early versions of SQL Server clustering presented this conundrum—to add disk space, you had to incur downtime that wouldn’t have occurred on a standalone solution. While failing over something like a static website can be trivial, moving a multi-tier application stack is extremely complicated in terms of networking and data synchronization. You also need to avoid failing over to a less secure cloud environment that has potentially been misconfigured due to a lack of understanding of the nuances between different security solutions across cloud providers.

So What Should I Do?

In every public cloud, there are a handful of services that can increase costs quickly. These services are charged according to usage-based pricing and can mean steep cost increases after only a few days. One way to mitigate this risk is to ensure you’re taking advantage of the cost monitoring services and alerts that are in each of your cloud platforms.
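
As an illustration, here is a minimal sketch of such an alert on AWS, using boto3 to create a CloudWatch billing alarm. It assumes billing metrics are enabled on the account (they surface only in us-east-1) and that an SNS topic already exists; the threshold and topic ARN are placeholders.

# billing_alarm.py - alert when month-to-date estimated charges pass a threshold.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-over-500-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                 # billing metrics only update every few hours
    EvaluationPeriods=1,
    Threshold=500.0,              # placeholder: alert once charges pass $500
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder ARN
)

Azure and GCP offer equivalent budget alerts; the point is to have one in every cloud you pay for.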

While multi-cloud deployments aren’t for all organizations, many will go down this path. Networking and security are among your biggest technical hurdles, and managing governance and costs are key functional challenges. Testing is critical to ensure your multi-cloud cluster solution works. It’s important to use a high availability clustering solution that enables simple switchover and switchback, to understand how each of your applications will behave during failover, and, most importantly, to test that failover regularly to uncover any networking or data hurdles.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: High Availability, multi-cloud

Multi-Cloud Disaster Recovery

October 30, 2021 by Jason Aw

If this topic sounds confusing, we get it. Planning for disaster recovery is a common point of confusion for companies employing cloud computing, especially when it involves multiple cloud providers. With our experts’ advice, we hope to temper your apprehensions while also raising some important considerations for your organization before or after going multi-cloud.

It’s taxing enough to ensure data protection and disaster recovery (DR) when all data is located on-premises. But today many companies have data on-premises as well as with multiple cloud providers, a hybrid strategy that may make good business sense but can create challenges for those tasked with data protection. Before we delve into the details, let’s define the key terms.

What is multi-cloud?

Multi-cloud is the utilization of two or more cloud providers to serve an organization’s IT services and infrastructure. A multi-cloud approach typically consists of a combination of major public cloud providers, namely Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

Organizations choose the best services from each cloud provider based on costs, technical requirements, geographic availability, and other factors. This may mean that a company uses Google Cloud for development/test, while using AWS for disaster recovery, and Microsoft Azure to process business analytics data.

Multi-cloud differs from hybrid cloud, which refers to computing environments that mix on-premises infrastructure, private cloud services, and a public cloud.

Who uses multiple clouds?

  • Regulated industries – Many organizations run different business operations in different cloud environments. This may be a deliberate strategy of optimizing their IT environments based on the strengths of individual cloud providers or simply the product of a decentralized IT organization.
  • Media and Entertainment – Today’s media and entertainment landscape is increasingly composed of relatively small and specialized studios that meet the swelling content-production needs of the largest players, like Netflix and Hulu. Multi-cloud solutions enable these teams to work together on the same projects, access their preferred production tools from various public clouds, and streamline approvals without the delays associated with moving large media files from one site to another.
  • Transportation and Autonomous Driving – Connected car and autonomous driving projects generate immense amounts of data from a variety of sensors. Car manufacturers, public transportation agencies, and rideshare companies are among those motivated to take advantage of multi-cloud innovation, blending accessibility of data across multiple clouds, without the risks of significant egress charges and slow transfers, while maintaining the freedom to leverage the optimal public cloud services for each project.
  • Energy Sector – Multi-cloud adoption can help lower the significant costs associated with finding and drilling for resources. Engineers and data scientists can use machine learning (ML) analytics to identify places that merit more resources to prospect for oil, to gauge environmental risks of new projects, and to improve safety.

Multi-cloud disaster recovery pain points:

  • Not reading before you sign. Customers may face issues if they fail to read the fine print in their cloud agreements. The cloud provider is responsible for its computer infrastructure, but customers are responsible for protecting their applications and data. There are many reasons for application downtime that are not covered under cloud SLAs. Business critical workloads need high availability and disaster recovery protection software as well.
  • Developing a centralized protection policy. A centralized protection policy must be created to cover all data, no matter where it lives. Each cloud provider has its unique way of accessing, creating, moving and storing data, with different storage tiers. It can be cumbersome to create a disaster recovery plan that covers data across different clouds.
  • Reporting. This is important for ensuring protection of data in accordance with the service-level agreements that govern it. Given how quickly users can spin up cloud resources, it can be challenging to make sure you’re protecting each resource appropriately and identifying all data that needs to be incorporated into your DR plan. A tagging audit like the sketch after this list is one simple way to start.
  • Test your DR plan. Customers must fully screen and test their DR strategy. A multi-cloud strategy compounds the need for testing. Some providers may charge customers for testing, which reinforces the need to read the fine print of the contract.
  • Resource skill sets. Finding an expert in one cloud can be challenging; with multi-cloud you will either need to find expertise in each cloud or the rare individual with expertise in multiple clouds.
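
On the reporting point, a tagging audit is one simple, scriptable starting place. The sketch below covers a single cloud (AWS EC2) via boto3; the "dr-tier" tag key is a hypothetical convention, and a real multi-cloud report would repeat the same idea per provider.

# dr_coverage_report.py - list compute instances not yet classified for the DR plan.
import boto3

ec2 = boto3.client("ec2")
untagged = []

for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if "dr-tier" not in tags:  # no criticality tag means no DR classification
                untagged.append(instance["InstanceId"])

print(f"{len(untagged)} instance(s) missing a dr-tier tag: {untagged}")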

Overcoming the multi-cloud DR challenge

Meeting these challenges requires companies to develop a data protection and recovery strategy that covers numerous issues. Try asking yourself the following strategic questions:

  • Have you defined the level of criticality for all applications and data? How much money will a few minutes of downtime for critical applications cost your organization in  end user productivity, customer satisfaction, and IT labor?
  • Will data protection and recovery be handled by IT or application owners and creators in a self-service model?
  • Did you plan for data optimization, using a variety of cloud- and premises-based options?
  • How do you plan to recover data? Restoring data to cloud-based virtual machines or using a backup image as the source of recovery?
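
To put a number on the first question, here is a back-of-the-envelope calculation. Every figure is an assumption to be replaced with your own estimates of hourly productivity, revenue, and labor impact.

# downtime_cost.py - rough cost of an outage for one critical application.
HOURLY_COSTS = {
    "lost end-user productivity": 8_000.0,      # assumed $/hour
    "lost revenue / customer impact": 25_000.0, # assumed $/hour
    "IT labor for recovery": 1_200.0,           # assumed $/hour
}

def downtime_cost(minutes: float) -> float:
    """Estimated total cost of an outage lasting the given number of minutes."""
    return sum(HOURLY_COSTS.values()) * minutes / 60.0

for m in (5, 30, 240):
    print(f"{m:>4} min outage = ~${downtime_cost(m):,.0f}")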

Obtain the right multi-cloud DR solution

The biggest key to success in data protection and recovery in a multi-cloud scenario is ensuring you have visibility into all of your data, no matter how it’s stored. Third-party tools enable you to define which data and applications should be recovered in a disaster scenario and how to do it, whether from a backup image or by moving data to a newly created VM in the cloud, for example.

The tool should help you orchestrate the recovery scenario and, importantly, test it. If the tool is well integrated with your data backup tool, it can also allow you to use backups as a source of recovery data, even if the data is stored in different locations, like multiple clouds. Our most recent SIOS webinar discusses this same point. SIOS DataKeeper lets you run your business-critical applications in a flexible, scalable cloud environment, such as Amazon Web Services (AWS), Azure, and Google Cloud Platform, without sacrificing performance, high availability, or disaster protection. SIOS DataKeeper is available in the AWS Marketplace and is the only Azure-certified high availability software for WSFC offered in the Azure Marketplace.

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Azure, Cloud, disaster recovery, GCP, Google Cloud Platform, multi-cloud, public cloud

Managing a Real-Time Recovery in a Major Cloud Outage

January 19, 2019 by Jason Aw

Disasters happen, making sudden downtime a reality. But there are things all customers can do to survive virtually any cloud outage.

Stuff happens. Failures—both large and small—are inevitable. What is not inevitable is extended periods of downtime.

Consider the day the South Central US Region of Microsoft’s Azure cloud experienced a catastrophic failure. A severe thunderstorm led to a cascading series of problems that eventually knocked out an entire data center. In what some have called “The Day the Azure Cloud Fell from the Sky,” most customers were offline, not just for a few seconds or minutes, but for a full day. Some were offline for over two days. While Microsoft has since addressed the many issues that led to the outage, the incident will long be remembered by IT professionals.

That’s the bad news. The good news: there are things all Azure customers can do to survive virtually any outage, whether it’s a single server failing or an entire data center going offline. In fact, Azure customers who implement robust high-availability and/or disaster recovery provisions, complete with real-time data replication and rapid, automatic failover, can expect to experience no data loss and little or no downtime whenever catastrophe strikes.

Managing The Cloud Outage

This article examines four options for providing disaster recovery (DR) and high availability (HA) protections in hybrid and purely Azure cloud configurations. Two of the options are specific to the Microsoft SQL Server database, which is a popular application in the Azure cloud; the other two options are application-agnostic. The four options, which can also be used in various combinations, are:

  • The Azure Site Recovery (ASR) Service
  • SQL Server Failover Cluster Instances with Storage Spaces Direct
  • SQL Server Always On Availability Groups
  • Third-party Failover Clustering Software

RTO and RPO 101

Before describing the four options, it is necessary to have a basic understanding of the two metrics used to assess the effectiveness of DR and HA provisions: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Those familiar with RTO and RPO can skip this section.

RTO is the maximum tolerable duration of an outage. Online transaction processing applications generally have the lowest RTOs, and those that are mission-critical often have an RTO of only a few seconds. RPO is the maximum period during which data loss can be tolerated. If no data loss is tolerable, then the RPO is zero.

The RTO will normally determine the type of HA and/or DR protection needed. Low recovery times usually demand robust HA provisions that protect against routine system and software failures, while longer RTOs can be satisfied with basic DR provisions designed to protect against more widespread, but far less frequent disasters.

The data replication used with HA and DR provisions can create the need for a potential tradeoff between RTO and RPO. In a low-latency LAN environment, where replication can be synchronous, the primary and secondary datasets can be updated concurrently. This enables full recoveries to occur automatically and in real-time, making it possible to satisfy the most demanding recovery time and recovery point objectives (a few seconds and zero, respectively) with no tradeoff necessary.

Across the WAN, by contrast, forcing the primary to wait for the secondary to confirm the completion of updates for every transaction would adversely impact performance. For this reason, data replication in the WAN is usually asynchronous. This can create a tradeoff between accommodating RTO and RPO that normally results in an increase in recovery times. Here’s why: to satisfy an RPO of zero, manual processes are needed to ensure all data (e.g., from a transaction log) has been fully replicated on the secondary before the failover can occur. This extra effort lengthens the recovery time, which is why such configurations are often used for DR and not HA.
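
The tradeoff can be made concrete with a toy model. This is not a benchmark: the round-trip times and the in-flight write count below are assumed values, chosen only to show how synchronous replication taxes every write while asynchronous replication leaves unacknowledged writes exposed at crash time.

# replication_tradeoff.py - toy model of the latency vs. data-loss tradeoff.
LAN_RTT_MS = 0.5    # assumed same-site round trip
WAN_RTT_MS = 40.0   # assumed cross-region round trip

def synchronous(writes: int, rtt_ms: float):
    # Every write waits for the secondary's ack: zero loss, full RTT added per write.
    return rtt_ms, 0

def asynchronous(writes: int, rtt_ms: float, in_flight: int = 50):
    # Writes return immediately; unacknowledged in-flight writes are lost on a crash.
    return 0.0, min(writes, in_flight)

for name, fn, rtt in [("sync over LAN", synchronous, LAN_RTT_MS),
                      ("sync over WAN", synchronous, WAN_RTT_MS),
                      ("async over WAN", asynchronous, WAN_RTT_MS)]:
    added_latency, exposed = fn(10_000, rtt)
    print(f"{name:>14}: +{added_latency:5.1f} ms per write, {exposed} writes exposed at crash")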

Azure Site Recovery (ASR) Service

ASR is Azure’s DR-as-a-service (DRaaS) offering. ASR replicates both physical and virtual machines to other Azure sites, potentially in other regions, or from on-premises instances to the Azure cloud. The service delivers a reasonably rapid recovery from system and site outages, and also facilitates planned maintenance by eliminating downtime during rolling software upgrades.

Like all DRaaS offerings, ASR has some limitations, the most serious being the inability to automatically detect and failover from many failures that cause application-level downtime. Of course, this is why the service is characterized as being for DR and not for HA.

With ASR, recovery times are typically 3-4 minutes depending, of course, on how quickly administrators are able to manually detect and respond to a problem. As described above, the need for asynchronous data replication across the WAN can further increase recovery times for applications with an RPO of zero.

SQL Server Failover Cluster Instance with Storage Spaces Direct

SQL Server offers two of its own HA/DR options: Failover Cluster Instances (discussed here) and Always On Availability Groups (discussed next).

FCIs afford two advantages: The feature is available in the less expensive Standard Edition of SQL Server, and it does not depend on having shared storage like traditional HA clusters do. This latter advantage is important because shared storage is simply not available in the cloud—from Microsoft or any other cloud service provider.

A popular choice for storage in the Azure cloud is Storage Spaces Direct (S2D), which supports a wide range of applications, and its support for SQL Server protects the entire instance and not just the database. A major disadvantage of S2D is that the servers must reside within a single data center, making this option suitable for some HA needs but not for DR. For multi-site HA and DR protections, the requisite data replication will need to be provided by either log shipping or a third-party failover clustering solution.

SQL Server Always On Availability Groups

While Always On Availability Groups is SQL Server’s most capable offering for both HA and DR, it requires licensing the more expensive Enterprise Edition. This option is able to deliver a recovery time of 5-10 seconds and a recovery point of seconds or less. It also offers readable secondaries for querying the databases (with appropriate licensing), and places no restrictions on the size of the database or the number of secondary instances.

An Always On Availability Groups configuration that provides both HA and DR protections consists of a three-node arrangement with two nodes in a single Availability Set or Zone, and the third in a separate Azure Region. One notable limitation is that only the database is replicated and not the entire SQL instance, which must be protected by some other means.

In addition to being cost-prohibitive for some database applications, this approach has another disadvantage: being application-specific, it requires IT departments to implement other HA and DR provisions for all other applications. The use of multiple HA/DR solutions can substantially increase complexity and costs (for licensing, training, implementation, and ongoing operations), making this another reason why organizations increasingly prefer application-agnostic third-party solutions.

Third-party Failover Clustering Software

With its application-agnostic and platform-agnostic design, failover clustering software is able to provide a complete HA and DR solution for virtually all applications in private, public, and hybrid cloud environments, on both Windows and Linux.

Being application-agnostic eliminates the need for having different HA/DR provisions for different applications. Being platform-agnostic makes it possible to leverage various capabilities and services in the Azure cloud, including Fault Domains, Availability Sets and Zones, Region Pairs, and Azure Site Recovery.

As complete solutions, the software includes, at a minimum, real-time data replication, continuous monitoring capable of detecting failures at the application level, and configurable policies for failover and failback. Most solutions also offer a variety of value-added capabilities that enable failover clusters to deliver recovery times below 20 seconds with minimal or no data loss to satisfy virtually all HA/DR needs.
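
The skeleton underneath any such product looks roughly like the sketch below: an application-level probe plus a configurable failover policy. This is a minimal illustration, not SIOS’s implementation; promote() is a hypothetical hook you would wire to real orchestration (promote the replica, move the virtual IP, update DNS), and the hostnames are placeholders.

# failover_monitor.py - minimal monitor-and-failover loop.
import socket
import time

FAILURE_THRESHOLD = 3   # consecutive failed probes before failover (~15 s detection)
PROBE_INTERVAL_S = 5

def check_health(host: str, port: int, timeout: float = 2.0) -> bool:
    """Probe the service port; a real monitor would also run a test transaction."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def promote(node: str) -> None:
    print(f"failing over to {node}")  # hypothetical hook: promote replica, move VIP

def monitor(primary: str, standby: str, port: int = 5432) -> None:
    failures = 0
    while True:
        failures = 0 if check_health(primary, port) else failures + 1
        if failures >= FAILURE_THRESHOLD:          # policy threshold reached: fail over
            promote(standby)
            primary, standby = standby, primary    # the standby is now the primary
            failures = 0
        time.sleep(PROBE_INTERVAL_S)

if __name__ == "__main__":
    monitor("db-primary.example.com", "db-standby.example.com")  # placeholder hosts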

Making It Real

All four options, whether operating separately or in concert, have roles to play in making the continuum of DR and HA protections more effective and affordable for the full spectrum of enterprise applications, from those that can tolerate some data loss and extended periods of downtime to those that require real-time recovery to achieve five-nines uptime with minimal or no data loss.

To survive the next cloud outage in the real world, make certain that whatever DR and/or HA provisions you choose are configured with at least two nodes spread across two sites. Also be sure to understand how well the provisions satisfy each application’s recovery time and recovery point objectives, as well as any limitations that might exist, including the need for manual processes to detect all possible failures and trigger failovers in ways that ensure both application continuity and data integrity.

About Jonathan Meltzer

Jonathan Meltzer is Director, Product Management, at SIOS Technology. He has over 20 years of experience in product management and marketing for software and SaaS products that help customers manage, transform, and optimize their human capital and IT resources.

Reproduced from RTinsights

Filed Under: News and Events Tagged With: Azure, Cloud, cloud outage, cybersecurity, microsoft azure, multi-cloud, recovery, server failover, SQL, storage
