SIOS SANless clusters

Availability SLAs: FT, High Availability and Disaster Recovery – Where to start

May 14, 2022 by Jason Aw

It’s fair to say that in this modern era where many aspects of our lives are technology-driven, we live in a very instantaneous world.  For example, at a click of a button, our weekly grocery order arrives on our doorstep.  We can instantly purchase tickets for events or travel.  Or even these days, order a brand-new car without having to go anywhere near a showroom and deal with a pushy salesperson. We are spoilt in this world of convenience.

But let’s spare a thought for all the vendors and the service providers who must underpin this level of service.  They have to maintain a high level of investment to ensure that their underlying infrastructures (and specifically their IT infrastructures) are built and operated in a way where they can support this “always-on” expectation.  Applications and databases have to be always running, to meet both customer demand and maximise company productivity and revenue.  The importance of IT business continuity is as critical as it’s ever been.

Many IT availability concepts are floated about such as fault tolerance (FT), high availability (HA) and disaster recovery (DR).  But this can raise further questions.  What’s the difference between these availability concepts?  Which of them will be right for my infrastructure?  Can they be combined or interchanged? The first and foremost step for any availability initiative is to establish a clear application/database availability service level agreement (SLA).  This then defines the most suitable availability approach.

What is an SLA?

To some extent, we all know what an SLA is, but for this discussion, let’s make sure we’re all on the same wavelength. The availability SLA is a contract between a service provider and its end-user. It defines the level of application/database uptime and accessibility the provider must deliver, and outlines the penalties (usually financial) if the agreed-upon service levels are not met.  In the IT world, the SLA is forged from two measures of criticality to the business: Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).  Very simply, the RTO defines how quickly application operation must be restored after a failure. The RPO defines how current our data needs to be after a recovery, in other words how much data loss can be tolerated. Once you have identified these metrics for your applications and databases, they will define your SLA.  The SLA is measured as a percentage, so you may come across terms such as 99.9% or 99.99% available.  These percentages express how much uptime IT will guarantee for the application in a given year; 99.9%, for example, leaves a budget of roughly 8 hours and 46 minutes of downtime per year. In general, more protection means more cost. It’s therefore critical to estimate the cost of an hour of downtime for the application or database and to use the SLA as a tool for selecting a solution that makes good business sense.
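To make the percentages concrete, here is a minimal Python sketch that converts an availability figure into a yearly downtime budget. The function name `allowed_downtime` is made up for this illustration, and the calculation assumes an average year of 365.2425 days; a different year length shifts the results slightly.

```python
from datetime import timedelta

# Assumption: an average Gregorian year of 365.2425 days.
MINUTES_PER_YEAR = 365.2425 * 24 * 60


def allowed_downtime(availability_percent: float) -> timedelta:
    """Return the yearly downtime budget implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(minutes=MINUTES_PER_YEAR * unavailable_fraction)


# Print the budget for the availability levels most often quoted in SLAs.
for sla in (99.0, 99.9, 99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime(sla)}")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the cost of protection tends to climb so quickly between levels.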

Once we have our SLA, we can make a business decision about which type of solution – FT, HA, DR, or a combination thereof — is the most suitable approach for our availability needs.

What is Fault Tolerance (FT)?

FT delivers a very impressive availability SLA of 99.999%.  In real-world terms, an FT solution will guarantee no more than about 5.25 minutes of downtime in one year.  Essentially, two identical servers are run in parallel to each other, processing transactions on both servers at the same time in an active-active configuration in what’s referred to as a “lockstep” process. If the primary server fails, the secondary server continues processing, without any interruption to the application or any data loss.  The end-user will be blissfully unaware that a server failure has occurred.

This sounds fantastic!  This sounds superb!  Why would we need anything else?  But hold on…as awesome as FT sounds on paper, there are some caveats to consider.

The “lockstep” process is a strange beast.  It’s very fussy about the type of server hardware it can run on, particularly in terms of processors.  This limited hardware compatibility list forces FT solutions to sit in the higher end of the cost bracket, which could very much be in the hundreds of thousands of dollars by the time you factor in two or more FT clusters with associated support and services.

Software Error Vulnerability

FT solutions are also designed with hardware fault tolerance in mind and don’t pay much attention to any potential application errors.  Remember, FT solutions are running the same transactions and processes at the same time, so if there’s an application error on the primary server, this will get replicated on the secondary server too.

What is High Availability (HA)?

For most SLAs, FT is simply too expensive to purchase and manage for average use cases.  In most cases, HA solutions are a better option. They provide nearly the same level of protection at a fraction of the cost.  Deployed in an Active-Standby configuration, HA solutions provide a 99.99% SLA, which equates to about 52 minutes of downtime in one year.  The SLA is lower than FT’s because there is a small period of downtime while operations switch over from the Active server to the Standby server.  OK, this is not as impressive as an FT solution, but for most IT requirements, HA meets SLAs, even for supercritical applications such as CRM and ERP systems.

Equally important, High Availability solutions are more application agnostic, and can also manage the failover of servers in the event of an application failure as well as hardware or OS failures. They also allow a lot more configuration flexibility.  There is no FT-like hardware compatibility list to deal with, as on most occasions they will run on any platform where the underlying OS is supported.

How does Disaster Recovery (DR) fit into the picture?

Like FT and HA, DR can also be used to support critical business functions. However, DR complements FT and HA rather than replacing them.  Fault Tolerance and High Availability are focussed on maintaining uptime on a local level, such as within a datacentre (or cloud availability zone).  DR delivers a redundant site or datacentre to fail over to in the event a disaster hits the primary datacentre.

What does it all mean?

At the end of the day, there’s no wrong or right availability approach to take.  It boils down to the criticality of the business processes you’re trying to protect and the basic economics of the solution.  In some scenarios, it’s a no-brainer.  For example, if you’re running a nuclear power plant, I’d feel more comfortable that the critical operations are being protected by an FT system. Let’s face it, you probably don’t want any interruptions in service there.  But for most IT environments, critical uptime can also be delivered with HA at a much more digestible price point.

How to choose: FT, HA and DR? 

  • First and foremost, understand your business operations in detail and identify the cost of downtime.
  • Once your SLAs are established, weigh up the costs of the availability solution of choice against the cost of any potential downtime.
  • When choosing your availability solution, look at ease of deployment and ease of use, as these will also impact the overall TCO of the availability solution.
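The weighing-up in the second bullet can be sketched as simple arithmetic. The Python below is only an illustration under stated assumptions: the function names are made up, all monetary figures are hypothetical, and it takes the worst case in which each option uses its entire yearly downtime budget.

```python
HOURS_PER_YEAR = 365.2425 * 24  # assumption: average Gregorian year


def expected_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Worst-case yearly downtime cost if the full downtime budget is used."""
    downtime_hours = HOURS_PER_YEAR * (100 - availability_percent) / 100
    return downtime_hours * cost_per_hour


def net_benefit(current_availability: float, target_availability: float,
                cost_per_hour: float, solution_cost_per_year: float) -> float:
    """Yearly downtime cost avoided by the upgrade, minus what the upgrade costs."""
    avoided = (expected_downtime_cost(current_availability, cost_per_hour)
               - expected_downtime_cost(target_availability, cost_per_hour))
    return avoided - solution_cost_per_year


# Hypothetical figures: downtime costs 10,000 per hour, and moving from
# 99% to 99.99% availability costs 100,000 per year.
print(round(net_benefit(99.0, 99.99, 10_000, 100_000), 2))
```

A positive result suggests the solution pays for itself on downtime alone; a negative one suggests the SLA target may be more than the application needs.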

IT systems are robust, but they can go wrong at the most inconvenient times.   FT, HA and DR are your insurance policies to protect you when delivering SLAs to customers in this instant and convenience-led world.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, Fault Tolerance, High Availability

Protect Systems from Downtime

May 1, 2022 by Jason Aw

In today’s business environment, organizations rely on applications, databases, and ERP Systems such as SAP, SQL Server, Oracle and more. These applications unify and streamline your most critical business operations. When they fail, they cost you more than just money. It is critical to protect these complicated systems from downtime.

Proven High Availability & Disaster Recovery

SIOS has 20+ years of experience in high availability and disaster recovery.  SIOS knows there isn’t a one-size-fits-all solution. Today’s data systems are a combination of on-premises, public cloud, hybrid cloud and multicloud environments. The applications themselves can create even more complexity, and configuring open-source cluster software can be painstaking, time-consuming, and prone to human error.

SIOS has solutions that provide high availability and disaster recovery for critical applications. These solutions have been developed based on our real-world experience, across different industries and use-cases. Our products include SIOS DataKeeper Cluster Edition for Windows and SIOS LifeKeeper for Linux or Windows. These powerful applications provide failover protection. The Application Recovery Kits included with LifeKeeper speed up application configuration by automating configuration steps and validating inputs.

System Protection On-Premises, in the Cloud or in Hybrid Environments

SIOS provides the protection you need for business-critical applications and reduces the complexity of managing them, whether on-premises, in the cloud, or in hybrid cloud environments. Contact us to learn more about high availability and disaster recovery for your business-critical applications.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, High Availability

How to Get the Most from Your Tech Support Call

April 13, 2022 by Jason Aw

Technical support experts share their tips on how to fast-track issue resolution

SIOS provides high availability protection for our customers’ most critical applications, databases, and ERPs. When our customers call tech support, there is no time to waste. We’ve earned a reputation (and several awards) for our HA/DR expertise and support excellence.

We’ve asked our tech support team to share the following five questions that can fast-track your issue resolution.

Fast, Accurate Diagnosis

Thorough and accurate tech support is similar to diagnosing an illness. Imagine asking your doctor to treat a headache. The human body is a complex interaction of multiple systems. The source of your problem may not be obvious or even in your head. To diagnose the issue and recommend a treatment, your doctor typically begins with questions aimed at identifying the circumstances that caused your symptoms.

Failover clustering also involves multiple systems at every layer of the IT infrastructure – network, storage, OS, application, database, and server. And like your real headache, your HA issue is often caused by something unrelated to your HA clustering software. Like your doctor, a good support professional will ask a variety of questions to characterize your issue. The more information you can provide about your support issue, the faster and more effectively it can be diagnosed and resolved.

Fast-Tracking Issue Resolution

As an IT best practice, consider logging key information and system changes as an ongoing business exercise. Having answers to the following key questions at your fingertips will speed the diagnosis and fast-track issue resolution. (It may also help you prevent issues from occurring in the first place.)

  1. Can you describe the error you are receiving? What is the exact symptom you are witnessing that is causing concern?
  2. When did it happen (the time, and the time zone you are in)?
    A typical diagnostic method is to examine log files from the machine with issues. Log files can be hundreds of lines of message strings or command output. By tracking the precise time you noticed the problematic symptoms, we can significantly narrow the log file examination.
  3. Have you uploaded the logs, or are you able to?
    Providing an explanation and description of the error, along with the timeframe in which it happened, goes a long way in diagnosis, provided the logs can be uploaded to the support ticket.  In some IT environments, uploading the logs requires using corporate-approved file sharing, while dark sites permit no electronic distribution of system logs.  If logs cannot be provided externally, be sure that the full logs are captured and archived for reference and review with the support agent as the case progresses. Applications and systems, especially those under duress, can produce exhaustive and extensive logs that can overwrite critical information.
  4. Which system was the primary cluster node at the time?
    Given the interconnected nature of clustering, it is important to inform your tech support representative of whether the cluster node you are calling about was functioning as the primary or secondary node at the time of the issue.
  5. What have you tried to do to remedy the issue?
    Great physicians know that their patients have likely tried a home remedy or over-the-counter medication prior to the visit. Knowing this information is helpful in diagnosis and treatment.  The same applies with great support technicians.  Sharing not only what you were trying to do at the time of the issue, but how you tried to resolve your errors can help them craft a better treatment and recovery plan, and make sure that their recommendations for recovery protect your critical data and applications.
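Question 2 explains why a precise time (with its time zone) matters: it narrows the log search. As a rough Python sketch of that idea, the helper below keeps only the log lines within a few minutes of the reported incident. It assumes, purely for illustration, that each log line starts with an ISO 8601 timestamp that includes a UTC offset; the function name, log format, and sample lines are all made up and do not reflect any particular product's logs.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, incident_time, window_minutes=10):
    """Yield log lines whose leading timestamp is within the window of the incident."""
    window = timedelta(minutes=window_minutes)
    for line in log_lines:
        stamp, _, _ = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        if when.tzinfo is None:
            continue  # without an offset the time cannot be compared safely
        if abs(when - incident_time) <= window:
            yield line


# Hypothetical log excerpt; timestamps are in UTC.
sample = [
    "2022-04-13T09:41:10+00:00 node1 heartbeat ok",
    "2022-04-13T09:58:02+00:00 node1 resource check failed",
    "2022-04-13T10:01:45+00:00 node2 taking over resources",
    "2022-04-13T11:30:00+00:00 node2 heartbeat ok",
]

# The customer reports "around noon" in a UTC+2 time zone, which is 10:00 UTC.
incident = datetime.fromisoformat("2022-04-13T12:00:00+02:00")
for line in lines_near(sample, incident):
    print(line)
```

Note how the reported local time only lines up with the relevant entries once the time zone is known, which is exactly why the question asks for it.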

For more than 20 years, the SIOS Customer Experience team has been helping enterprise customers implement HA/DR solutions for a wide range of use cases. We value our customers and encourage them to contact us whenever they have questions about their HA/DR.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, High Availability, Support

Two Truths and a Lie: Understanding the Real Truth About Availability

April 9, 2022 by Jason Aw

We played two truths and a lie at a company event years ago. The game involved putting forth two true statements and one untrue statement to see if you could fool the most people. The winner put forth ideas that all seemed believable or unbelievable, depending on your own personal history. Here is what was said:

  1. While growing up, my hometown had no stoplights.
  2. My grandparents met in the second grade and married in their teens.
  3. After graduation, I attended a prestigious out-of-state university in Georgia before transferring back home to attend an in-state university.

I grew up in a small community with no stoplights, so that one seemed possible, but I was skeptical. I’ve heard stories of people who met at an early age, and got married in their teens, so that was possible but also the one that I might want to flag. The third one also seemed true, but I wondered who would transfer from a prestigious out-of-state university back to the no stoplight hometown to attend an in-state college. For what seemed like an eternity the entire group reasoned and pondered which of the three statements was a lie.  And, it seemed as if no one could spot it. Several of us reasoned that if the hometown had no stoplights, would it really have a university as well?  A few took the line that it was unlikely he attended the prestigious out-of-state university, given his age, years with the company, and multiple degrees.  After final deliberation the verdict was in: the group decided that the two truths were number one and number two, and that the lie was number three.

With all the information swirling around about High Availability, you might feel like you are playing a game of “Two Truths and a Lie.”  Depending on where you look, you may find statements about availability that seem believable, but are not completely true when you dig in beneath the surface.  For example, the following widely accepted statements are not actually true:

  1. Storage availability is all that is needed for high availability

    Applications require access to data to be effective and efficient.  Your database will need to have access to storage if you are going to successfully run your enterprise.  Your other enterprise applications likewise must have access to configuration files, data stores, and transaction and error log directories to be usable.  But, while reliable, readily accessible, and performant storage is essential for all enterprise systems, websites, databases, applications, and interconnects, storage availability alone is not all that is needed for high availability.  There are more components that make up a sound, reliable, resilient high availability architecture than just storage.

  2. Platform availability is all that is needed for high availability

    With the continued development and growth of cloud computing, many enterprises searching for high availability are confused by the concept of platform availability.  Platform availability, sometimes referred to as system availability or infrastructure availability, relates to the time that the platform (hardware, network, OS, and related components) is accessible and delivers its intended IT service.  Applications and databases absolutely need compute, memory, storage, and network resources to operate properly and efficiently.  Every service or function in your data center needs a reliable place to execute its logic, and without the underlying platform, these operations are not possible.  Because of this, many consider that platform availability is all that is needed for high availability. As VP of Customer Experience, I have helped customers and partners understand the gaps between an available platform and available applications, databases, and client connectivity.  In those conversations, we have discussed real examples of platforms showing no downtime or service issues, while simultaneously the enterprise applications running within that data center or cloud infrastructure are unavailable, unstable, or inaccessible to clients due to non-platform issues.

So What’s the Real Truth?

When our co-worker shared his three statements, we all got it wrong.  His hometown was a small community whose borders were buffered by larger towns that each had a stoplight, but his own town did not have one. And, as it turned out, he graduated early and went to that well-known, prestigious out-of-state institute of technology in Georgia, before getting homesick and transferring to an in-state university back home.  So the lie was about his grandparents.  While they may or may not have met at an early age, they definitely did not meet in the second grade.

The truth about high availability is that storage availability and platform or infrastructure availability are not enough on their own.  In order to create the most robust, available, resilient, and reliable high availability infrastructure you must also include a commercial-grade solution to provide application-aware monitoring, alerting and recovery.  You’ll also want that solution to be knowledgeable of your storage’s high availability capabilities, have a strong awareness of the infrastructure’s nuances and gaps, and have the ability to leverage best practices across the entire architecture to help your applications, databases, and services achieve your business objectives.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, High Availability

Minimizing Downtime with High Availability

January 29, 2022 by Jason Aw

Downtime has become more costly than ever before for modern businesses. The ITIC 2021 Hourly Cost of Downtime Survey found that in 91% of organizations, one hour of downtime in a business-critical system, database, or application costs an average of more than $300,000, and for 18% of large enterprises, the cost of an hour of downtime exceeds $5 million.

High availability (HA) is an attribute of a system, database, or application that’s designed to operate continuously and reliably for extended periods. The goal of HA is to reduce or eliminate unplanned downtime for critical applications. This is achieved by eliminating single points of failure by incorporating redundant components and other technologies in the design of a business-critical system, database, or application.

SLAs and HA Metrics

Service-level agreements (SLAs) are used by service providers to guarantee that a customer’s business-critical systems, databases, or applications are up and running when the business needs them.

IDC has created an SLA model that defines uptime requirements at five levels as follows:

  • AL4 (Continuous Availability – System Fault Tolerance): No more than 5 minutes and 15 seconds of planned and unplanned downtime per year (99.999% or “five-nines” availability)
  • AL3 (High Availability – Traditional Clustering): No more than 52 minutes and 35 seconds of planned and unplanned downtime per year (99.99% or “four-nines” availability)
  • AL2 (Recovery – Data Replication and Backup): No more than 8 hours, 45 minutes, and 56 seconds of planned and unplanned downtime per year (99.9% or “three-nines” availability)
  • AL1 (Reliability – Hot Swappable Components): No more than 87 hours, 39 minutes, and 29 seconds of planned and unplanned downtime per year (99% or “two-nines” availability)
  • AL0 (Unprotected Servers): No availability or uptime guarantee

According to ITIC, 89% of surveyed organizations now require “four-nines” availability for their business-critical systems, databases, and applications, and 35% of those organizations further endeavor to achieve “five-nines” availability.

In addition to uptime and availability, two other important HA metrics are Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). RTO is the maximum tolerable duration of any outage and RPO is the maximum amount of data loss that can be tolerated when a failure happens. Unlike RTO and RPO metrics for disaster recovery which are typically defined in hours and days, RTO and RPO metrics for business-critical systems, databases, and applications are often only a few seconds (RTO) and zero (RPO).
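As a small illustration of how these two metrics are measured against a real incident, the Python sketch below compares one outage with its objectives. The function name, parameter names, and timestamps are all hypothetical; it simply treats recovery time as the gap between failure and restoration, and data loss as the gap between the last good copy and the failure.

```python
from datetime import datetime, timedelta


def evaluate_recovery(failed_at, restored_at, last_good_copy_at, rto, rpo):
    """Compare one incident against its recovery objectives."""
    recovery_time = restored_at - failed_at            # how long the service was down
    data_loss_window = failed_at - last_good_copy_at   # how stale the recovered data is
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical incident: failover completes in 20 seconds, and the standby
# copy was current at the moment of failure (as with synchronous replication).
failed = datetime(2022, 1, 29, 10, 0, 0)
result = evaluate_recovery(
    failed_at=failed,
    restored_at=failed + timedelta(seconds=20),
    last_good_copy_at=failed,
    rto=timedelta(seconds=30),
    rpo=timedelta(0),
)
print(result["rto_met"], result["rpo_met"])
```

With an RPO of zero, any lag at all between the last good copy and the failure counts as a miss, which is why such targets usually imply synchronous replication or shared storage.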

HA Clustering

HA clustering typically consists of server nodes, storage, and clustering software.

Traditional Clustering

A traditional, on-premises HA cluster is a group of two or more server nodes connected to shared storage (typically, a storage area network, or SAN) that are configured with the same operating system, databases, and applications (see Figure 1).

Figure 1: Traditional server clustering with shared storage

One of the nodes is designated as the primary (or active) node and the other(s) are designated as secondary (or standby) nodes. If the primary node fails, clustering allows a system, database, or application to automatically fail over to one or more secondary nodes and continue operating with minimal disruption. Since the secondary node is connected to the same storage, operation continues with zero data loss.
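The failover behaviour described above can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not how any particular clustering product works: it supposes that nodes report periodic heartbeats, that a node is considered failed once it has been silent longer than a timeout, and that the first healthy standby is promoted. The class and method names are made up, and real clusters add safeguards (such as quorum or fencing) that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    nodes: list                 # node names, in failover priority order
    timeout: float = 5.0        # seconds of silence before a node counts as failed
    primary: str = ""
    last_seen: dict = field(default_factory=dict)

    def __post_init__(self):
        # Default to the first node as the primary (active) node.
        self.primary = self.primary or self.nodes[0]

    def heartbeat(self, node, now):
        """Record that a node reported in at time `now` (seconds)."""
        self.last_seen[node] = now

    def is_alive(self, node, now):
        return now - self.last_seen.get(node, float("-inf")) <= self.timeout

    def check(self, now):
        """Promote the first healthy standby if the primary has gone silent."""
        if self.is_alive(self.primary, now):
            return self.primary
        for node in self.nodes:
            if node != self.primary and self.is_alive(node, now):
                self.primary = node
                break
        return self.primary


cluster = Cluster(["node-a", "node-b"])
cluster.heartbeat("node-a", now=0)
cluster.heartbeat("node-b", now=0)
print(cluster.check(now=4))     # node-a is still within the timeout

cluster.heartbeat("node-b", now=8)
print(cluster.check(now=8))     # node-a has been silent for 8 seconds
```

The timeout is the trade-off to notice: a shorter value means faster failover but more risk of switching over because of a brief network hiccup.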

However, the use of shared storage in the traditional clustering model creates several challenges, including:

  • The shared storage itself is a single point of failure that can potentially take all of the connected nodes in the cluster offline.
  • SAN storage can also be costly and complex to own and manage.
  • Shared storage in the cloud can add significant, unnecessary cost and complexity and some cloud providers don’t even offer a shared storage option.

SANless Clustering

SANless or “shared nothing” clusters (see Figure 2) address the challenges associated with shared storage. In these configurations, every cluster node has its own local storage. Efficient host-based, block-level replication is used to synchronize storage on the cluster nodes, keeping them identical. In the event of a failover, secondary nodes access an identical copy of the storage used by the primary node.

Figure 2: HA clustering with SANless or “shared-nothing” storage
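To give a feel for what "block-level" synchronization means, here is a deliberately simplified Python sketch. It models each node's local storage as a byte array, compares the two copies block by block, and copies only the blocks that differ. This is an illustration under stated assumptions, not the mechanism of any specific product: real host-based replication typically tracks writes as they happen rather than scanning whole volumes, and the names and tiny block size are made up.

```python
BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def replicate(source: bytearray, target: bytearray, block_size: int = BLOCK_SIZE) -> int:
    """Copy every block that differs from source to target; return blocks copied."""
    if len(source) != len(target):
        raise ValueError("volumes must be the same size")
    copied = 0
    for offset in range(0, len(source), block_size):
        block = source[offset:offset + block_size]
        if target[offset:offset + block_size] != block:
            target[offset:offset + block_size] = block
            copied += 1
    return copied


primary = bytearray(b"AAAABBBBCCCC")
standby = bytearray(primary)      # the standby starts as an identical copy

primary[4:8] = b"XXXX"            # an application writes to the second block
blocks_sent = replicate(primary, standby)
print(blocks_sent, standby == primary)
```

Because only changed blocks cross the network, the replication traffic scales with the amount of data written rather than with the size of the volume.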

Clustering Software

Clustering software lets you configure your servers as a cluster so that multiple servers can work together to provide HA and prevent data loss. A variety of clustering software solutions are available for Windows, Linux distributions, and various virtual machine hypervisors. However, each of these solutions limits your flexibility and deployment options and introduces various challenges such as technical complexity and expensive licensing.

Don’t Wait for Disaster to Strike

HA is crucial for business-critical systems, databases, and applications. But with the myriad platforms available, complexity ramps up significantly. That’s why an application-aware solution makes so much sense. What you need is a trusted partner who has extensive expertise in high availability—a partner like SIOS, which has the technological know-how to ensure that your business stays up and running.

Don’t wait for an outage or disaster to find out if you have the resiliency your business needs. Schedule a personalized demo today at https://us.sios.com to see what SIOS can do for your business.

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Clustering, disaster recovery, High Availability
