Reproduced with permission from SIOS
Epicure, Canada’s leading direct sales company, sells healthy, easy-to-prepare food products through a network of over 16,000 consultants. The company relies on two websites for its critical business operations. Their public website provides company and product information, recipes, blogs, and enrollment information to its customers and to people interested in becoming a consultant. Their internal website provides consultants with important information about products and enables them to place all of their orders. “Our websites are vital to our business,” said Russell Born, Senior Network Infrastructure Administrator at Epicure.
Both of Epicure’s websites run on a single server using two instances of SQL Server Standard Edition, one for each website. As the company expanded its products and services, the Epicure IT department needed to update the websites and ensure that both of them would continue to operate in the event of failures or disasters. It decided to move both websites from a third-party hosted facility to its on-premises data center and to use the Amazon Web Services (AWS) EC2 cloud for disaster recovery. “By bringing the sites in-house, we could ensure that our websites would deliver excellent user experiences for both our customers and consultants as our business continues to grow,” said Born.
As part of this website update process, Epicure IT staff wanted an efficient, cost-effective way to provide high availability and disaster protection for both websites while continuing to run them on two instances of SQL Server Standard Edition.
“We didn’t want the added expense of moving to SQL Server Enterprise Edition if we could provide HA and DR with the more cost-effective Standard Edition,” Born said.
Using SIOS DataKeeper Cluster Edition software, Epicure IT staff created a two-node SANless cluster in an active-passive failover configuration that enables each SQL Server instance to fail over independently. One cluster node is in the Epicure on-premises data center and the second node runs in an AWS EC2 instance. Epicure IT staff created and configured the SIOS SANless cluster using the software’s intuitive graphical user interface.
The SIOS software provided Epicure with an easy, cost-efficient way to provide HA and DR protection for its business-critical SQL Server applications without the cost and complexity of building out a remote DR site or purchasing costly SAN storage or SQL Server Enterprise Edition licenses. “The SIOS software has allowed us to create a hybrid solution that provides the cost savings of running on-premises and the reliability and flexibility of running in the cloud,” said Born. “Because we know that if there is a website outage, it will failover automatically, our IT team can now focus their attention on other priorities to strengthen our business.”
SIOS software provides high availability in AWS environments, enabling a leading pre-owned vehicle company to move all of its IT systems to the cloud.
Gulliver International is a leading pre-owned car company based in Tokyo with 420 locations throughout Japan. Over the next four years, the company plans to expand into a global business with 1600 stores worldwide. To ensure its IT infrastructure can accommodate this rapid growth, the company is migrating all of its internal systems to AWS and promoting a company-wide “cloud-first” policy for all new applications.
“Moving our systems to the cloud will give us flexibility and scalability we need to grow quickly and cost-efficiently, while continuously providing excellent service to our customers,” said Manabu Tsukishima, IT Manager, Gulliver International.
To ensure the success of its cloud-first initiative, Gulliver needed to protect its business-critical applications from downtime in a cloud environment, where traditional failover clusters are not possible.
“We would not consider moving our applications to the cloud without an efficient, easy-to-implement high availability solution,” said Tsukishima. Gulliver chose to use SIOS DataKeeper software, which is sold in Japan by SIOS Technology, Inc.
SIOS DataKeeper software enables Gulliver to use Windows Server Failover Clustering (WSFC) to build a failover cluster in a cloud environment, where traditional shared-storage clusters are not possible.
SIOS software uses efficient, real-time replication to synchronize storage between servers operating as a WSFC cluster in an AWS environment.
Using SIOS software, Gulliver can configure two servers operating as a cluster across separate Amazon Availability Zones.
Just as in a traditional physical environment, if there is a failure on the primary server in the AWS cloud within one Availability Zone, WSFC moves the application to the second server located in another Amazon Availability Zone, providing full disaster tolerance and recovery in the cloud.
“We are extremely pleased with the value that SIOS DataKeeper software brings to our company’s cloud-first initiative,” said Tsukishima. With SIOS DataKeeper software, Gulliver can move to the cloud without adding complexity or disruption to existing operations.
“By enabling us to use a clustering configuration in the cloud in the same way we would in a physical environment, SIOS DataKeeper software made it possible for us to migrate to AWS without sacrificing application protection or changing the configuration of our existing system at all.”
About 30 percent of Gulliver’s existing on-premises systems have been migrated to AWS without any changes to the company’s system administration or added complexity.
As Gulliver continues to execute its expansion plan, it will soon need to protect even larger volumes of data and a wider range of applications. To meet this need, it will continue to use SIOS DataKeeper software as it migrates systems to the cloud. As a Standard Consulting Partner of APN (AWS Partner Network), SIOS is committed to continuing to provide high availability systems that operate on AWS.
During a recent meeting, a customer asked a question about High Availability (HA) and the need for a quorum/witness. Their question was, “What is the best way to deploy quorum/witness?” The answer is simple: there is no single best way to deploy quorum. To understand why, let’s start by defining three key terms: a witness resource, a quorum resource, and a split-brain scenario.
In a normal cluster environment, the protected application runs on the primary node in the cluster. In the event of a failure of the application or of that primary node, the clustering software moves the application to a secondary or remote node, which assumes the role of primary. At any given time, there is only one primary node.
Split brain is a condition that occurs when members of a cluster are unable to communicate with each other, but are in a running and operable state, and subsequently take ownership of common resources simultaneously. In effect, you have two bus drivers fighting for the steering wheel. Split-brain, due to its destructive nature, can cause data loss or data corruption and is best avoided through use of fencing, quorum, witness, or a quorum/witness functionality for cluster arbitration.
In most cluster managers, quorum is maintained when:
In most cluster managers, quorum is lost when:
A witness resource is a server, network endpoint, or device that is used to achieve and maintain quorum when a cluster has an even number of members. A cluster with an odd number of members, using cluster majority, does not need a witness resource because all members of the cluster serve to arbitrate majority membership.
A quorum resource is a resource (device, system, block storage, file storage, file share, etc.) that serves as a means of arbitrating the cluster state and membership. In some cluster managers, quorum is a resource within the cluster that aids or is required for any cluster state and cluster membership decisions. In other cluster managers, quorum functions as a tie-breaker to avoid split-brain.
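The tie-breaking role of a witness can be illustrated with a small vote-counting sketch. It assumes one common model, simple majority voting in which every cluster member and the witness (if configured) hold one vote each. The function name `has_quorum` and its parameters are hypothetical and do not correspond to any particular cluster manager's API.

```python
def has_quorum(reachable_members: int, total_members: int,
               witness_configured: bool = False,
               witness_reachable: bool = False) -> bool:
    """Return True if this partition holds a strict majority of all votes.

    Assumption (not taken from any specific product): each member has one
    vote, and a configured witness adds exactly one more vote.
    """
    total_votes = total_members + (1 if witness_configured else 0)
    my_votes = reachable_members
    if witness_configured and witness_reachable:
        my_votes += 1
    return 2 * my_votes > total_votes


# Two-node cluster, no witness: after a split each side sees only itself,
# so neither side holds a majority.
print(has_quorum(1, 2))               # False
# Two-node cluster with a witness: only the side reaching the witness wins.
print(has_quorum(1, 2, True, True))   # True
print(has_quorum(1, 2, True, False))  # False
# Three-node cluster, no witness: the two-member side keeps quorum.
print(has_quorum(2, 3))               # True
```

Under this model, an even-sized cluster that splits in half leaves no side with a majority, which is why the extra witness vote matters there and is unnecessary for odd-sized clusters.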
Given the critical nature of quorum it is essential that HA architectures deploy quorum/witness resources properly, and fortunately (or unfortunately) there is no single, best way to deploy quorum. There are several factors that may shape the way in which your witness and quorum resources behave. These factors include:
Deploying in an on-premises data center where additional storage devices (such as Fibre Channel storage), power control devices or connections, or traditional STONITH devices are present gives customers options for quorum and witness functionality that may not be available in the cloud. Likewise, cloud and hybrid environments differ in what can be deployed and in which failure scenarios quorum is meant to prevent. Additionally, latency requirements may limit which types of devices and resources are available for a quorum/witness configuration.
Recovery objectives are also important to consider when designing your quorum and witness resources. In an example two-node cluster (node A and node B), when node A loses connectivity to node B, what is the highest priority for recovery? If the witness/quorum resources are in the same network as node A, this could result in node A remaining online but severed from clients, while node B is unable to assess quorum and take over. Likewise, if the quorum device lived only in the region, data center, or network with node B, a loss of connectivity could result in a failover of resources to a defunct network or data center, or away from a functional and operational primary node.
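The two placement scenarios above can be walked through with a toy model. This is a minimal sketch, assuming a three-vote majority scheme (node A, node B, and one witness) and a failure that severs the link between the two sites. The names `may_run_resources` and `outcome` are hypothetical, and a real cluster manager weighs many more signals.

```python
MAJORITY = 2  # 2 of 3 votes: node A, node B, and the witness


def may_run_resources(sees_peer: bool, sees_witness: bool) -> bool:
    """A running node counts its own vote plus every voter it can reach."""
    votes = 1 + int(sees_peer) + int(sees_witness)
    return votes >= MAJORITY


def outcome(witness_site: str) -> dict:
    """Sever the link between site A and site B.

    The witness lives in exactly one of the two sites, so only the node
    in that site can still reach it after the split.
    """
    return {
        "A": may_run_resources(sees_peer=False, sees_witness=(witness_site == "A")),
        "B": may_run_resources(sees_peer=False, sees_witness=(witness_site == "B")),
    }


print(outcome("A"))  # {'A': True, 'B': False}
print(outcome("B"))  # {'A': False, 'B': True}
```

With the witness beside node A, A keeps the resources even if its network is the one cut off from clients. With the witness beside node B, the resources move to B even if A was healthy. The model only shows that the witness location decides the winner; which outcome is acceptable depends on your recovery priorities.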
The redundancy of the data center or region is also an important factor in your HA topology with quorum/witness. If your data center has only two levels of redundancy, you must understand the tradeoff between placement of the quorum/witness in the same data center as the primary or standby cluster node. If the data center has more than two redundant tiers, such as a third availability zone or access to a second region, this option would provide a higher level of redundancy for the cluster.
Understanding your true disaster recovery requirements is also a major factor in your design. If your cluster manager software requires access to the quorum/witness in order to recover from a total data center outage (or region failure) then you’ll need to understand this impact on your design. Many high availability software packages have tools or methods for this scenario, but if your software does not, your design and placement of quorum/witness may need to accommodate this reality.
An additional quorum/witness server is typically not required when the cluster contains an odd number of nodes. However, using only two nodes in a cluster, or deploying a DR node that is not always available, may change your architecture. As VP of Customer Experience, I have worked with customers who have deployed three-node architectures but, for cost savings, automate periodic shutdown of the third server.
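The effect of periodically shutting down a third node can be estimated with simple arithmetic. The sketch below assumes static vote counts, where the majority threshold is based on the configured number of voters and not on the number currently online. Some cluster managers adjust votes dynamically, which this sketch does not model. The function names are hypothetical.

```python
def votes_needed(configured_votes: int) -> int:
    """Strict majority of the configured voters."""
    return configured_votes // 2 + 1


def spare_votes(configured_votes: int, online_votes: int) -> int:
    """How many more voters can fail before quorum is lost.

    A negative result means quorum is already lost.
    """
    return online_votes - votes_needed(configured_votes)


print(spare_votes(3, 3))  # 1: all three nodes up, one failure is tolerated
print(spare_votes(3, 2))  # 0: third node shut down, no further failure is tolerated
print(spare_votes(3, 1))  # -1: a single remaining node has lost quorum
```

Under these assumptions, a three-node cluster whose third node is powered off behaves like a two-node cluster without a witness for the duration of the shutdown, which is worth factoring into the cost-saving decision.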
The final factor to mention on quorum/witness is the cluster manager and operating system. Not all HA software and cluster managers are equal when it comes to deployment of quorum/witness or arbitration of quorum status. Some clustering software requires shared disks for arbitration; others are more flexible, allowing shares (NFS, SMB, EFS, Azure Files, and S3). Being aware of what your cluster manager requires, and the modes that it supports with regard to quorum (simple majority, witness, file share, etc.), will impact not only what you deploy, but how you deploy.
The single best way to deploy a quorum/witness server is to understand your vendor’s definition of quorum/witness and their available options, know your requirements, factor in the limitations or opportunities presented by your data center (or cloud environment) and architect the solution that provides your critical systems the highest level of protection against split-brains, false failovers, and downtime.
-Cassius Rhue, VP, Customer Experience
Reproduced from SIOS