SIOS SANless clusters

White Paper: High Availability Clusters in VMware vSphere without Sacrificing Features or Flexibility

October 22, 2022 by Jason Aw

Six key facts you should know about high availability protection in VMware vSphere

Many large enterprises are moving important applications from traditional physical servers to virtualized environments, such as VMware vSphere, to take advantage of key benefits such as configuration flexibility, data and application mobility, and efficient use of IT resources. Realizing these benefits with business-critical applications, such as SQL Server or SAP, can pose several challenges.

This paper explains these challenges and highlights six key facts you should know about HA protection in VMware vSphere environments that can save you money.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: clusters, High Availability

New Options for High Availability Clusters, SIOS Cements its Support for Microsoft Azure Shared Disk

September 24, 2022 by Jason Aw

Microsoft introduced Azure Shared Disk in Q1 of 2022. Shared Disk allows you to attach a managed disk to more than one host. Effectively, this means that Azure now has the equivalent of SAN storage, enabling highly available clusters to use shared disk in the cloud!

A major advantage of using Azure Shared Disk with a SIOS LifeKeeper cluster hierarchy is that you no longer need either a storage quorum or a witness node to avoid so-called split-brain, which occurs when communication between nodes is lost and several nodes are potentially changing data simultaneously. Fewer nodes mean less cost and complexity.

SIOS has introduced an Application Recovery Kit (ARK) for our LifeKeeper for Linux product, called the LifeKeeper SCSI-3 Persistent Reservations (SCSI3) Recovery Kit, that allows Azure Shared Disks to be used in conjunction with SCSI-3 reservations. This ARK guarantees that a shared disk is only writable from the node that currently holds the SCSI-3 reservations on that disk.

When installing SIOS LifeKeeper, the installer will detect that it’s running in Microsoft Azure and automatically install the LifeKeeper SCSI-3 Persistent Reservations (SCSI3) Recovery Kit to enable support for Azure Shared Disk.

Resource creation within LifeKeeper is straightforward (Figure 1). Once locally mounted, the Azure Shared Disk is simply added into LifeKeeper as a file-system type resource. LifeKeeper will assign it an ID (Figure 2) and manage the SCSI-3 locking automatically.

Figure 1: Creation of /sapinst within LifeKeeper.

Figure 2: /sapinst created and extended to both cluster nodes.

SCSI-3 reservations guarantee that Azure Shared Disk is only writable on the node that holds the reservations (Figure 3). In a scenario where cluster nodes lose communication with each other, the standby server will come online, causing a potential split-brain situation. However, because of the SCSI-3 reservations, only one node can access the disk at a time, which prevents an actual split-brain scenario. Only one system will hold the reservation, and it will either become the new active node (in this case the other will reboot) or remain the active node. Nodes that do not hold the Azure Shared Disk reservation will simply end up with the resource in a “Standby” state because they cannot acquire the reservation.

Figure 3: Output from LifeKeeper logs when trying to mount a disk that is already reserved.
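The reservation behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `SharedDisk` class, the `bring_in_service` function, and the node names are hypothetical, and the code does not issue real SCSI-3 commands or reflect how LifeKeeper is implemented. It simply shows why a single exclusive reservation leaves at most one writer.

```python
class SharedDisk:
    """Toy model of a shared disk with one exclusive write reservation (hypothetical)."""

    def __init__(self):
        self.holder = None  # name of the node holding the reservation, if any

    def reserve(self, node):
        """Grant the reservation only if it is free or already held by this node."""
        if self.holder is None or self.holder == node:
            self.holder = node
            return True
        return False

    def write(self, node, data):
        """Reject writes from any node that does not hold the reservation."""
        if self.holder != node:
            raise PermissionError(f"{node} does not hold the reservation")
        return len(data)


def bring_in_service(disk, node):
    """Return the state a node ends up in after trying to take the resource."""
    return "active" if disk.reserve(node) else "standby"


disk = SharedDisk()
# Both nodes try to go active after losing contact with each other.
print(bring_in_service(disk, "node-a"))  # first node acquires the reservation
print(bring_in_service(disk, "node-b"))  # second node cannot, so it stays in standby
```

In this model the second node never gets write access, which mirrors the point made above: the split-brain is only potential, because the disk itself arbitrates who may write.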

Microsoft’s definition of Azure Shared Disks: https://docs.microsoft.com/en-us/azure/virtual-machines/disks-shared

At present, SIOS supports Locally-redundant Storage (LRS), and we’re working with Microsoft to test and support Zone-Redundant Storage (ZRS). Ideally, we’d like to know when there is a ZRS failure so that we can fail over the resource hierarchy to the node closest to the active storage.

SIOS currently expects Azure Shared Disk support to arrive in its next release, LifeKeeper 9.6.2 for Linux, in Q3 2022.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: clusters, High Availability

Introduction To Clusters – Part 2

November 23, 2021 by Jason Aw

What Types of Clusters Are There and How Do They Work?

An Overview of HA Clusters and Load Balancing Clusters

Clustering helps improve the reliability and performance of software and hardware systems by creating redundancy to compensate for unforeseen system failure. If a system is interrupted by a hardware or software failure or a natural disaster, the interruption can have a major impact on business and revenue, and getting things back up and running costs crucial time and money.

This is where clustering comes in. There are three main types of clustering solutions – HA clusters, load balancing clusters, and HPC clusters. Which type will best increase system availability and performance for your business? Let’s have a look at the three types of clustering solutions in more detail below.

What is HA Clustering?

High Availability clustering, also known as HA clustering, is effective for mission-critical business applications, ERP systems, and databases, such as SQL Server, SAP, and Oracle, that require near-continuous availability.

HA clustering can be divided into two types: active-active configuration and active-standby configuration.

Let’s take a look at the difference between these two HA clustering types.

HA Clustering Type 1: Active-Active Configuration

In the active-active configuration, processing is performed on all nodes in the cluster. For example, in the case of two-node clustering, both nodes are active. If one node stops, its processing will be taken over by the other.

However, if each node is operating at close to 100% and one node stops, it will be difficult for another node to take on the additional processing load. Therefore, capacity planning with a margin is important for HA clustering.
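The capacity margin mentioned above can be worked out with simple arithmetic. The sketch below assumes that load is spread evenly and that all nodes have equal capacity; the function name `max_safe_utilization` is hypothetical and used only for illustration.

```python
def max_safe_utilization(nodes, tolerated_failures=1):
    """Highest per-node utilization at which the surviving nodes can absorb
    the full load, assuming equal capacity and evenly spread load."""
    if not 0 <= tolerated_failures < nodes:
        raise ValueError("tolerated_failures must be between 0 and nodes - 1")
    return (nodes - tolerated_failures) / nodes


# Two active nodes: each must stay at or below half of its capacity.
print(f"{max_safe_utilization(2):.0%}")
# Four active nodes tolerating one failure: three quarters of capacity each.
print(f"{max_safe_utilization(4):.0%}")
```

Under these assumptions, a two-node active-active cluster in which each node runs well above 50% cannot absorb a failure without being overloaded.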

HA Clustering Type 2: Active-Standby Configuration

Let’s use our two-node example again. In the active-standby configuration, one node is configured as the active node and the other node is configured as the standby node. The active node and the standby node exchange signals called “heartbeats” to constantly check whether they are operating normally.

If the standby node cannot receive the heartbeat of the active node, the standby node determines that the active node has stopped and takes over its processing. This mechanism is called “failover.” Conversely, the mechanism that recovers the stopped node and transfers processing back to it is called “failback.”

In an active-standby configuration, recovery from a failure is relatively easy because it requires only a simple switch from the active node to the standby node. However, the resources of the standby node sit idle while the active node is operating normally, and that cost needs to be considered.
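The heartbeat and failover mechanism described above can be sketched in a few lines. This is a minimal illustration, not how any particular clustering product works: the `StandbyNode` class and its timeout value are assumptions, and timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class StandbyNode:
    """Hypothetical standby node that promotes itself when heartbeats stop."""
    timeout: float               # seconds without a heartbeat before failover
    last_heartbeat: float = 0.0  # time the last heartbeat was received
    role: str = "standby"

    def on_heartbeat(self, now):
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def check(self, now):
        """Fail over if the active node has been silent for longer than the timeout."""
        if self.role == "standby" and now - self.last_heartbeat > self.timeout:
            self.role = "active"
        return self.role


standby = StandbyNode(timeout=3.0)
standby.on_heartbeat(now=0.0)
print(standby.check(now=2.0))  # heartbeat is recent, so the node stays in standby
standby.on_heartbeat(now=2.0)
print(standby.check(now=6.0))  # four seconds of silence exceeds the timeout: failover
```

A real cluster would also need a way to confirm that the active node has truly stopped, rather than merely become unreachable, before taking over.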

Two Components of HA Clustering: Application and Storage

For an HA cluster to be effective, two areas need to be addressed: application orchestration and storage protection. Clustering software monitors the health of the application being protected and, if it detects an issue, moves operation of that application over to the standby node. The standby node needs access to the most up-to-date version of the data, preferably identical to the data that the primary node was accessing before the incident. This can be accomplished in two ways: shared storage or shared-nothing storage. In the shared storage model, both cluster nodes access the same storage, typically a SAN. In shared-nothing (aka SANless) configurations, local storage on all nodes is mirrored using replication software.

Clustering software products vary widely in their ability to monitor and detect issues that may cause application failure and in their ability to orchestrate failovers reliably. Many clustering products only detect whether the application server is operational, but do not detect a wide range of software, services, network, and other issues that can cause application failure.

Application Awareness is Essential

Similarly, complex ERP and database applications have multiple component parts that have to be stored on the correct server or instance, started up in the right order, and brought online in accordance with complex best practices. Choose clustering software that includes application recovery kits, specialized software designed specifically to maintain best practices for application- and database-specific requirements.

There are multiple ways to configure an HA Cluster:

Traditional Two Node Clusters with Shared Storage

 

Two servers are clustered with shared storage.

Two Node SANless Cluster

Clusters can be configured using a local LAN and high-speed synchronous block-level replication.

Real-time replication can be used to synchronize storage on the primary server with storage on a standby server located in the same data center, in your disaster recovery site, or both. This allows you to build high availability and disaster recovery configurations flexibly, with two nodes or multiple nodes. SIOS block-level replication is highly optimized for performance. You can even use high-speed locally attached storage, such as PCIe flash storage devices, on your physical servers to achieve very low-cost, high-performance, high availability configurations. Your data is protected on the flash device, and your application is protected too.
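The idea behind synchronous block-level replication can be shown with a toy model. The sketch below is an assumption-laden illustration, not the SIOS implementation: the `MirroredVolume` class is hypothetical, and both copies live in memory rather than on separate servers. The key property it demonstrates is that a write is acknowledged only after both copies hold the block.

```python
class MirroredVolume:
    """Toy synchronous mirror: each write lands on both copies before it is acknowledged."""

    def __init__(self, block_count):
        self.primary = [b""] * block_count  # local storage on the active node
        self.replica = [b""] * block_count  # local storage on the standby node

    def write(self, index, data):
        """Apply the block to both copies, then acknowledge the write."""
        self.primary[index] = data
        self.replica[index] = data  # in a real system this crosses the network
        return True                 # acknowledgement returned only after both writes

    def in_sync(self):
        """True when the standby copy is identical to the primary copy."""
        return self.primary == self.replica


volume = MirroredVolume(block_count=4)
volume.write(0, b"orders")
volume.write(3, b"invoices")
print(volume.in_sync())    # the standby already holds every acknowledged block
print(volume.replica[3])   # data the standby would serve after a failover
```

Because every acknowledged block is already on the standby, a failover in this model starts from data identical to what the primary was using.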

 

SAN-based cluster with a third node

Third Node for Disaster Protection

This configuration uses a SAN-based cluster and adds a third, SANless node in a remote data center or the cloud to achieve full disaster recovery protection. In the event of a disaster, the standby remote physical server is brought into service automatically with no data loss, eliminating the hours needed for restoration from backup media.

What is a Load Balancing Cluster?

A load balancing cluster uses a load balancer to distribute processing across multiple nodes so that they operate as a single system with improved performance. While it can isolate a failed node to prevent the failure from affecting the entire system, the load balancer itself is a critical single point of failure, so this is not a high availability option: if the load balancer fails, the entire system stops. Load balancing is only effective for applications such as web servers.
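As a rough illustration of how a load balancer distributes work and isolates a failed node, here is a minimal round-robin sketch. The `LoadBalancer` class and the node names are hypothetical, and health status is set by hand rather than by real health checks.

```python
from itertools import cycle


class LoadBalancer:
    """Hypothetical round-robin balancer that skips nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)  # endless rotation over all nodes

    def mark_down(self, node):
        """Isolate a failed node so it stops receiving requests."""
        self.healthy.discard(node)

    def route(self):
        """Return the next healthy node in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        for node in self._ring:
            if node in self.healthy:
                return node


balancer = LoadBalancer(["web-1", "web-2", "web-3"])
balancer.mark_down("web-2")
print([balancer.route() for _ in range(4)])  # requests alternate between web-1 and web-3
```

Note that the `LoadBalancer` object itself is the single point of failure discussed above: if it stops, no request reaches any node, however many of them are healthy.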

What is HPC Clustering?

You can also use clustering for performance instead of high availability. High-Performance Computing clusters, or HPC clusters, combine the processing power of multiple (sometimes thousands of) nodes to get the CPU performance needed in CPU-intensive environments such as scientific and technological environments requiring large-scale simulations, CAE analysis, and parallel processing.

Are you ready to find the right HA clustering solution for your business?

Learn more about SIOS High Availability clustering here.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: clusters

Disaster Recovery Made Simple

October 22, 2021 by Jason Aw

Heard the term disaster recovery (DR) thrown around often? DR is a strategy and set of policies, procedures, and tools. It ensures critical IT systems, databases, and applications continue to operate and be available to users when a man-made or natural disaster happens. It typically involves moving application operation to a redundant DR environment that is geographically separated from the primary environment. While the IT team owns the disaster recovery strategy, DR is an important component of every organization’s Business Continuity Plan. The latter is a strategy and set of policies, procedures, and tools to ensure business operations continue through an interruption in service.

It may sound confusing at first. But we’ve collected some quick facts to make disaster recovery simple to understand:

Point 1. Implement an IT disaster recovery plan (DRP)

A DRP is a strategy and set of policies, procedures, and tools that ensure critical IT systems, databases, and applications continue to operate and be available to users when a disaster strikes the organization’s primary computing environment. While the IT team owns the disaster recovery strategy, DR is an important component of every organization’s Business Continuity Plan.

Point 2. Ensure Geographic Separation

An essential part of application disaster recovery is ensuring there is a redundant, geographically separated application environment available. You need efficient block-level replication, clustering software, or both, so that operation can fail over to that environment in the event of a disaster. If your application is running in a cloud, your clustering environment should fail over across cloud regions and availability zones for disaster recovery.

Point 3. Test, test, and test some more

In a recent Spiceworks survey, 59 percent of organizations indicated they had experienced one to three outages (that is, any interruption to normal levels of IT-related service) over the course of one year. 11 percent have experienced four to six. 7 percent have experienced seven or more. In short, a DR event is nearly inevitable. Be sure you conduct regular testing to ensure you know exactly what will happen when it does.

Point 4. Understand Your Risk

The disaster in DR does not need to be a full-fledged hurricane, tornado, flood, or earthquake that impacts your business. Disasters come in many forms, including a cyber-attack, fire, theft, or vandalism. In fact, simple human error still rates among the leading causes of IT data center downtime. In short, a disaster is any crisis that results in a down system for a long duration and/or major data loss on a large scale that impacts your IT infrastructure, data center, and your business.

Point 5. Ensure Your DRP has a Checklist

The checklist should include critical IT systems and networks, prioritized by their recovery time objective (RTO). Document the steps needed to restart, reconfigure, and recover systems and networks. Employees should know where to locate the DRP and how to execute basic emergency steps in the event of an unforeseen incident.

Point 6. Substantiate DRPs through testing

DR testing should identify deficiencies and provide opportunities to fix problems before a disaster occurs. Testing can offer proof that the plan is effective and that it will enable you to meet recovery point and recovery time objectives (RPOs and RTOs). Since IT systems and technologies are constantly changing, DR testing also helps ensure a disaster recovery plan is up to date.

Choose a failover clustering technology that makes DR testing simple by facilitating fast, simple, reliable switchover of application operation to DR nodes and back.
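The checklist and testing points above can be tied together in a short sketch: order systems by RTO, then compare DR test measurements against each system's objectives. Everything in this example is illustrative. The `System` class, the function names, the system names, and all of the numbers are made up and do not come from the survey or from any SIOS product.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rto_minutes: float  # recovery time objective: maximum acceptable downtime
    rpo_minutes: float  # recovery point objective: maximum acceptable data loss


def recovery_order(systems):
    """List systems with the tightest RTO first, as a DRP checklist might."""
    return sorted(systems, key=lambda s: s.rto_minutes)


def meets_objectives(system, recovery_minutes, data_loss_minutes):
    """Check one DR test result against the system's RTO and RPO."""
    return (recovery_minutes <= system.rto_minutes
            and data_loss_minutes <= system.rpo_minutes)


# Illustrative values only.
systems = [
    System("reporting", rto_minutes=240, rpo_minutes=60),
    System("order-database", rto_minutes=15, rpo_minutes=0),
    System("email", rto_minutes=60, rpo_minutes=15),
]

print([s.name for s in recovery_order(systems)])
# A test in which the database came back in 12 minutes with no data loss:
print(meets_objectives(systems[1], recovery_minutes=12, data_loss_minutes=0))
```

Recording each test's measured recovery time and data loss in this form makes it easy to see when a change in the environment has pushed a system past its objectives.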

When you look at those statistics, you know you are living on borrowed time if you don’t have a disaster recovery plan in place. The SIOS disaster recovery solution is a multi-site, geographically dispersed cluster that meets RPOs and RTOs with ease. What makes SIOS different from many other DR providers is that it offers one solution that meets both high availability and disaster recovery needs. To learn more about our DR solutions, check out the insights page here.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: cluster, clusters, disaster recovery, RPO, RTO

Beginning Well is Great, But Maintaining Uptime Takes Vigilance

September 28, 2021 by Jason Aw

Author Isabella Poretsis states, “Starting something can be easy, it is finishing it that is the highest hurdle.” It is great to have a kickoff meeting. It is invigorating and exciting. Managers and leaders look out at the greenfield with excitement, and optimism is high. But this moment of kickoff, and even the Champagne-popping moment of a successful deployment, are just the beginning. Maintaining uptime requires ongoing vigilance.

High availability and the elusive four nines of uptime for your critical applications and databases aren’t momentary occurrences, but rather, a constant endeavor to end the little foxes that destroy the vineyard.  Staying abreast of threats, up-to-date on the updates, and properly trained and prepared is the work from which your team “is never entitled to take a vacation.”
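To put “four nines” in concrete terms, the downtime budget can be computed directly. The sketch below assumes that four nines means 99.99% availability measured over a 365-day year; the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(availability, days=365):
    """Total downtime, in minutes, permitted by an availability target over a period."""
    return (1 - availability) * days * 24 * 60


# Four nines (99.99%) over a 365-day year leaves under an hour of downtime.
print(f"{allowed_downtime_minutes(0.9999):.2f} minutes per year")
# Three nines (99.9%) for comparison.
print(f"{allowed_downtime_minutes(0.999):.1f} minutes per year")
```

With a budget of roughly 53 minutes for the whole year, a single unplanned outage can use it up, which is why the ongoing vigilance described here matters.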

For those who want to stay vigilant in maintaining uptime, here are five tips:

1. Monitor the Environment 

Very little in enterprise software still follows the “set it and forget it” mindset.  Everything, since the day you uncorked the grand opening champagne to now, has been moving toward a state of decline.  If you aren’t monitoring the servers, workloads, network traffic, and hardware (virtual or physical), you may lose uptime and stability.

2. Perform Maintenance

One thing that I have always noticed in more than twenty years of software development and services is that all software comes with updates.  Apply them.  Remember to execute sound maintenance policies, including taking and verifying backups. One tech writer suggested the only update you regret is the one you failed to make.

3. Learn Continuously

My first introduction to high availability came when I unplugged one end of the Token Ring for a server in our lab as an intern, fresh from the CE-211 lab.  The administrator was in my face in minutes.  After an earful, he gave me an education.  Ideally, you and your team want to learn without taking down your network, but you do absolutely want to keep learning.  Look into paid courses on existing technology, new releases, emerging infrastructure.  Check your vendors for courses and items related to your process, environment, software deployments and company enterprise.  Free courses for many things also exist if money is an issue.

4. Multiply the learning

In addition to continuous learning, make a plan to multiply the learning.  As the VP of Customer Experience at SIOS, I have seen the tremendous difference between teams who share their learning and those who don’t.  Teams that share their learning avoid gaps in knowledge that compromise uptime.  The best way to know that you learned something is to teach it to somebody else. As you learn, share the learning with team members to reduce the risk of downtime due to error and, for that matter, vacation.

5. End well . . . before the next beginning

All projects, servers, and software have an ending.  End well.  Decommission correctly.  Begin the next phase, deployment, or software relationship well by closing up loose ends and documenting what went well, what did not, and what to do next.  Treat your existing vendors well.  You just may need them again later.  Understand the existing systems and high availability solutions before proceeding with a new deployment.  This proper ending helps you begin again from a better starting place, headed towards a stronger outcome.

Keeping the system highly available is a continuous process.  “Set it and forget it” is a nice catchphrase, but the reality is that uptime takes vigilance, continual monitoring, proper maintenance, and constant learning.

-Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Application availability, clusters, disaster recovery, High Availability
