March 4, 2025
Nodes and Clusters: The Building Blocks of High Availability

I wanted to spend some time reviewing the terms “nodes” and “clusters.” In this blog, I will explain how SIOS uses these and related terms and what they specifically mean. They may be standard terms in the world of distributed computing, but if you are new to the field, you may wonder exactly what they mean.

What Are Nodes in Distributed Computing?

When I started with SIOS, I noticed that “node” and “cluster” were everyday words that you would hear many times a day. I kept asking myself, “Why are they using the word ‘node’?” From the context, it sounded as though they meant server, so why say node? The answer is that a node can be a server, but it can also be a client computer or a peer; it is essentially any component used to perform computing duties and route traffic. In Amazon Web Services (AWS), a node can be a virtual machine implemented as an EC2 instance. You can install and run software on it, and it can have a network interface through which you communicate with it and through which it connects to other nodes. When you SSH into an AWS EC2 instance, the client computer from which you launch your SSH session is an example of a client node, and you are connecting to an EC2 server instance node. A node can be a physical machine on-premises or a virtual machine (VM).

Understanding Clusters: How Nodes Work Together

Let’s move on to the term “cluster.” This word might make one think of things that are stuck together. In the distributed computing world, it means nodes that are linked together to form a combined resource that can handle a bigger task than a single node can.
At SIOS, we have special cluster protection software on each node that monitors the volumes and can launch failover operations when problems are detected, or respond when a user intentionally takes resources in and out of service. You might link nodes together in a cluster to perform automatic backups. You could run a database server on a separate node to isolate the computing power, disk I/O, and data from other operations.

The Role of Redundancy in High-Availability Clusters

Clusters can also provide redundancy to allow services to remain up when one node fails. Redundancy of operation is not a new concept. The days of running any vital operation on a single server with no redundancy are hopefully well behind us. For example, in the blade-computing world, redundancy is provided in a blade server configuration by running two computing modules within the same unit. The server firmware handles the failover/switchover logic. Power supplies and the rack KVM are shared among the server hardware for cost savings. Facility operators may add more hardware to a server incrementally to handle extra load. This allows an operator to right-size their system and purchase or build it using standardized components from the rack manufacturer. This provides a scaling mechanism similar to, though more limited than, that of the cloud world, the difference being that it is all hosted in one box. On-premises rack hardware such as this can be used to construct clustered nodes.

Cloud-Based Clusters vs. On-Premises Clusters

Cloud clusters benefit from all of the attributes of redundancy built into rack server equipment, as they are basically discrete VMs that run on shared data-center hardware owned by the cloud provider.
However, cloud clusters also permit the customer to spread a cluster over different locations, intentionally slicing their computing needs into VMs running in different physical buildings in different areas of the cloud provider’s data centers. This provides enormous resiliency to single-site outages. A cluster implemented in the cloud using servers in various locations can tolerate complete power loss at one location.

Nodes and Clusters Explained

Some questions that come up:

Q. Is a cluster the same as a node?
A. No, a node is typically one component that can perform computing duties. A cluster consists of 2 or more nodes.

Q. What is a 3-node cluster?
A. A 3-node cluster is a cluster of 3 nodes with communication paths between each pair of nodes. In this odd-numbered configuration, one of the nodes is typically a so-called ‘witness’ node and may not perform other work. In the event of a partial network failure, where a node is unable to communicate with its peer, the two main server nodes may not be able to determine which of them should take control (this phenomenon is called ‘split-brain’). A witness node can report which nodes it can see in service, providing the data needed to resolve the split-brain: one node is brought up as active and the other is put into standby mode, restoring correct control of the nodes.

Q. What is a 2-node cluster?
A. A 2-node cluster is a cluster of 2 nodes with one or more communication paths between them. This is typically used to run services on a primary node and have the second node on standby.

Q. How many nodes make a cluster?
A. 2 or more nodes make a cluster.

Maximizing High Availability with Nodes and Clusters

In summary, clusters are formed from nodes; a node is an independent computing module with networking capabilities. Be aware of the benefits of putting your nodes in different physical locations to guard against downtime in one area.
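The witness idea from the 3-node answer can be sketched in a few lines of Python. This is only an illustration of the concept, not how SIOS software implements it: the function name `resolve_active`, the node names, and the rule of preferring the current primary when the witness can see both servers are assumptions made for the example.

```python
from typing import Optional, Set


def resolve_active(primary: str, standby: str, witness_sees: Set[str]) -> Optional[str]:
    """Decide which server node should be active during a suspected split-brain.

    primary, standby: hypothetical names of the two server nodes.
    witness_sees: the server nodes the witness can currently reach.
    Returns the node that should be active, or None if the witness
    can reach neither server and no safe choice exists.
    """
    if primary in witness_sees:
        # The primary is still visible to the witness, so it keeps control
        # and the other node stays in (or moves to) standby.
        return primary
    if standby in witness_sees:
        # Only the standby is visible, so it is the one to bring up as active.
        return standby
    # The witness sees neither node: it has no data to resolve the conflict.
    return None


if __name__ == "__main__":
    print(resolve_active("node-a", "node-b", {"node-b"}))
```

Here the two servers cannot reach each other, and the witness can only see `node-b`, so `node-b` is chosen as the active node. A real cluster product would combine this kind of vote with its own health checks and fencing rules.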
Contact SIOS today to learn how our clustering solutions can help you optimize high availability and minimize downtime.

Author: Paul Scrutton

Reproduced with permission from SIOS
February 23, 2025
Updating LifeKeeper for Linux: A Checklist for Success

Keeping your LifeKeeper for Linux software updated is essential for maintaining high availability (HA), system security, performance, and compatibility. This blog will guide you through a structured process for performing software updates with minimal risk. Following these steps can ensure a smooth update process.

1. Check the Support Matrix

Before proceeding with an update, consult SIOS’s support matrix: docs.us.sios.com/spslinux/9.9.0/en/topic/sios-protection-for-linux-support-matrix

This document provides essential compatibility information, including:
Failing to verify compatibility can result in conflicts or degraded system performance. If your setup isn’t supported, consider upgrading related components or delaying the update.

2. Create a Runbook

A runbook is your detailed guide to executing the update process. It minimizes confusion and ensures every step is accounted for. Key elements should include:
Keep the runbook accessible to all team members involved in the process.

3. Take a Backup of the Hierarchy

Before performing a LifeKeeper or OS upgrade, create a backup of your LifeKeeper hierarchies on all nodes. To create a backup, run the following command:

    /opt/LifeKeeper/bin/lkbackup -c

The backup will be created in a file called:

    /opt/LifeKeeper/config/archive.<date-time-stamp>.tar.gz

4. Test in a QA Environment

Always test updates in a QA or staging environment before deploying them in production. This step allows you to:
Document any issues that arise and adjust your runbook accordingly.

5. Execute the Update on Your Production Systems

With preparation complete, proceed with the update:
6. Validate and Monitor Post-Update

After the update, perform thorough validation:
Best Practices for a Successful LifeKeeper Update

To keep the process clear and simple, we recommend implementing one update or patch at a time and testing its impact before moving on to the next. This approach isolates the effect of each change, making it easier to identify the cause of any problem and avoid complications.

As part of the OS upgrade process, we recommend rerunning the LifeKeeper for Linux setup script to ensure all configurations are updated and compatible with the new environment. This helps prevent potential issues and confirms that everything is functioning correctly after the upgrade.

Reach out to support@us.sios.com or open a case in the Support Portal if you have any questions prior to upgrading: https://supportportal.us.sios.com/User/Login

Author: Bill Darnell, Senior Product Support Engineer at SIOS Technology Corp.

Reproduced with permission from SIOS
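As a companion to the backup step (step 3) of this checklist, a runbook might include a small pre-flight check that a recent backup archive actually exists before the update begins. The following is a minimal sketch under stated assumptions: the function name `newest_backup` and the 24-hour freshness threshold are hypothetical, and the only details taken from the post are the archive directory and the `archive.<date-time-stamp>.tar.gz` naming pattern.

```python
import glob
import os
import time
from typing import Optional


def newest_backup(config_dir: str, max_age_hours: float = 24.0) -> Optional[str]:
    """Return the newest archive.*.tar.gz in config_dir if it is fresh enough.

    config_dir: directory to search (the post names /opt/LifeKeeper/config).
    max_age_hours: assumed freshness threshold; adjust to your runbook.
    Returns the archive path, or None if no archive exists or the newest
    one is older than the threshold.
    """
    pattern = os.path.join(config_dir, "archive.*.tar.gz")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Pick the most recently modified archive.
    newest = max(candidates, key=os.path.getmtime)
    age_hours = (time.time() - os.path.getmtime(newest)) / 3600.0
    return newest if age_hours <= max_age_hours else None


if __name__ == "__main__":
    found = newest_backup("/opt/LifeKeeper/config")
    print(found if found else "No recent backup archive found; run the backup first.")
```

The check would need to run on every node, since the post advises backing up the hierarchies on all nodes.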
February 18, 2025
Webinar: Disaster Recovery Strategies for a Disaster-Prone World

Register for the On-Demand Webinar

In the face of increasing risks from natural disasters, cyberattacks, and disruptions, businesses must prioritize disaster recovery (DR) to ensure resilience and uninterrupted operations. This webinar covers essential DR strategies, including best practices for creating comprehensive plans that align with business objectives and enable swift recovery. Explore the benefits and challenges of cloud-based DR solutions, the role of automation in minimizing downtime, and methods for conducting practical risk assessments to mitigate potential threats. Watch to gain insights that strengthen your organization’s preparedness and response capabilities.

Reproduced with permission from SIOS
February 14, 2025
Achieving High Availability for SAP HANA

Countless businesses rely on SAP ERP systems for their mission-critical, high availability applications. However, with the 2027 deadline for transforming these systems to the new HANA environment looming, it’s vital that these enterprises consider how they will achieve high availability under the new regime, ideally before they’re faced with unplanned downtime.

Businesses should start planning for this change early, as achieving the top-tier “five nines” standard of high availability (99.999% uptime) under the HANA environment comes with many challenges. Fortunately, these can be overcome with well-designed architecture and the right technical expertise.

This Database Trends & Applications article, written by Ian Allton, Solutions Architect at SIOS Technology Corp., and published in their Big Data Quarterly, looks to help enterprises make the right start in their transition to HANA by running through the three steps to achieving best practices for high availability with an SAP HANA database.

Reproduced with permission from SIOS
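To put the “five nines” figure mentioned in this post into concrete terms, the arithmetic can be worked out in a few lines of Python. This is a back-of-the-envelope sketch: the function name `allowed_downtime_minutes` is hypothetical, and a 365-day year is assumed.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for a given availability target.

    availability_percent: uptime target, for example 99.999 for "five nines".
    days: length of the period considered (assumed 365 for one year).
    """
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * 24 * 60


if __name__ == "__main__":
    # Five nines leaves roughly 5.26 minutes of downtime per 365-day year.
    print(round(allowed_downtime_minutes(99.999), 2))
```

By comparison, the same calculation for 99.9% uptime gives about 525.6 minutes, or nearly nine hours, per year.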
February 4, 2025
Establishing a Software-Based, High-Availability Failover Strategy for Disaster Mitigation and Recovery

Cloud outages happen; don’t let them disrupt your operations. In Disaster Recovery Journal, Dave Bermingham shares how SANless clustering ensures seamless failover across multi-cloud environments, delivering flexibility, efficiency, and uninterrupted service.
Reproduced with permission from SIOS