SIOS APAC Portal

November 28, 2021	Four Reasons To Use An Avoidance Strategy In High Availability Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes Simple Steps for Deployment in SIOS Protection Suite Cluster Environment Avoiding something – we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks and scurried down the next aisle and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measures of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why? Four reasons to use an avoidance strategy in High Availability Better Performance (minimizing server overload) One reason to use avoidance strategies in HA is to increase application and server performance. Consider the case of three servers running production workloads, let’s call them Server Alpha, Server Beta, Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. In the event of a failure of Server Alpha, a failover to Server Beta would traditionally occur. However, because server Beta is already running a large workload, the resulting additional application load might result in an undesirable server overload and poor performance for both applications. So it might be wise to deploy an avoidance strategy to make sure that Server Gamma is chosen as the failover target. Performance Optimization Consider again the scenario of three servers, Alpha, Beta, and Gamma. 
Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. In the event of a failure of Server Alpha and Server Beta, a failover will occur to the cost-optimized server, Gamma. However, this server is not scaled to handle peak workloads, nor the workloads of both Server Alpha and Server Beta at the same time. In this instance, an avoidance strategy can be used to optimize performance by automatically moving one or both of the workloads from Server Gamma as soon as another host is available. HA Optimization HA Optimization is another scenario for deploying avoidance strategies. Like the performance optimization strategy, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications are optimized to provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Service) and ERS (enqueue replication services) instance residing on the same server for extended periods of time because of the risk of lost locks and canceled jobs. To prevent this from occurring you can use an avoidance strategy that causes the ERS and ASCS instances to always run on opposite cluster nodes. Consider the case of three servers running production workloads, let’s call them Servers Alpha, Beta, Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Beta (ERS) and Server Alpha (ASCS). If Beta crashes, you wouldn’t want the ERS resource running on the same node as the ASCS instance. To ensure this operation, you can deploy an avoidance strategy that automatically checks first and ensures the two applications are on separate servers, and maintain SAP ASCS/ERS best practices for lock failover. 
DR Avoidance

Suppose you have two data centers, City Alpha and City Beta, which are about 70 miles apart with most of your clients centrally located between them. However, due to recent changes in internal organizations, mergers/closures and acquisitions, and governance requirements, your IT team has to add a third data center that is located in City Gamma, which is about 350 miles from Alpha and Beta. Now the resources which were primarily protected in Alpha and Beta are also extended to the Gamma location. Given that most of the users and teams are near the Alpha and Beta locations and even the most extreme users are located in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, DR avoidance seeks to optimize performance, in/out regional data costs, latency, and client access by avoiding the DR node should only one node within either region fail. It would also ensure that even if both nodes fail at different times, failover always occurs to the other node in the cluster or data center before moving to DR.

So, how do you deploy an avoidance strategy?

Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of the SIOS Protection Suite for Linux, you can use a number of built-in methods including:

Resource prioritization

In the event of a failure, resources will fail over to the server where they have the lowest remaining priority number (priority one being the most preferred) and cascade to any additional servers (Alpha, Beta, and Gamma). Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources/servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG could have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma.
If customers wanted to optimize the use of the environment, then Resource.HR could have a priority of three (3) on Server Beta and Resource.MFG could have a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in-service (be restored) on Server Beta. SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.

Policy or affinity rules

Policy rules can also be used to prevent a resource recovery from occurring on a given server, thereby allowing a resource to avoid a specified server that may be running a more critical or resource-intensive workload. Typical policies include:

- Constraint policies that will block an application from a specific server by default
- Resource policies that will block an application from a server that does not have sufficient resources
- Temporal policies that define a time period that resources are allowed or disallowed from a system
- Custom policies that define preferred servers or possible application ownership abilities within the cluster

The SIOS Protection Suite for Linux CLI allows users to specify policy rules which can disable failover to a specific resource for a specified server, provide temporal policies guarding failures, disable failures of a specific application type, constraint policies, and custom policies.

Specific Avoidance Resources

The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method will allow the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method would provide the greatest flexibility.
In this example, app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail to the next available node based on priorities without any collocation restrictions. For additional examples of avoidance strategies and resources, consider the SIOS Protection Suite for Linux documentation. If a customer has two applications, app1 and app2, that they require to run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script. Reproduced from SIOS
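Returning to the app1/app2/app3 example in the avoidance article above, the combination of per-server priorities and "avoid whenever possible" behavior can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how SIOS Protection Suite implements it: the `PRIORITIES` and `AVOID` tables and the `choose_target` function are hypothetical names invented for this sketch, and the priority values are made up.

```python
from typing import Dict, Optional, Set

# Hypothetical priority table: a lower number means a more preferred server.
PRIORITIES: Dict[str, Dict[str, int]] = {
    "app1": {"Alpha": 1, "Beta": 2, "Gamma": 3},
    "app2": {"Beta": 1, "Alpha": 2, "Gamma": 3},
    "app3": {"Gamma": 1, "Alpha": 2, "Beta": 3},
}

# Hypothetical avoidance rules: app1 and app2 avoid each other, app3 is unrestricted.
AVOID: Dict[str, Set[str]] = {"app1": {"app2"}, "app2": {"app1"}}


def choose_target(
    resource: str, running: Dict[str, Set[str]], alive: Set[str]
) -> Optional[str]:
    """Pick the server on which `resource` should be restored.

    `running` maps each server to the resources currently in service on it.
    Surviving servers are tried in priority order; a server hosting a resource
    that `resource` avoids is skipped if any conflict-free server exists.
    """
    candidates = sorted(
        (server for server in PRIORITIES[resource] if server in alive),
        key=lambda server: PRIORITIES[resource][server],
    )
    avoided = AVOID.get(resource, set())
    for server in candidates:
        if not (running.get(server, set()) & avoided):
            return server
    # "Whenever possible": with no conflict-free server left, fall back to the
    # best-priority survivor rather than leaving the resource out of service.
    return candidates[0] if candidates else None
```

With Server Alpha down, app2 on Beta, and app3 on Gamma, `choose_target("app1", ...)` skips Beta (its priority-two server) because app2 is there and returns Gamma. If Gamma were also down, it would fall back to Beta so that app1 still comes in service.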
November 23, 2021	Introduction To Clusters – Part 2

What Types of Clusters Are There and How Do They Work? An Overview of HA Clusters, and Load Balancing Clusters

Clustering helps improve reliability and performance of software and hardware systems by creating redundancy to compensate for unforeseen system failure. If a system is interrupted due to hardware or software failure or natural disaster, this can have a major impact on business and revenue, wasting crucial time and expense to get things back up and running. This is where clustering comes in. There are three main types of clustering solutions: HA clusters, load balancing clusters, and HPC clusters. Which type will best increase system availability and performance for your business? Let’s have a look at the three types of clustering solutions in more detail below.

What is HA Clustering?

High Availability clustering, also known as HA clustering, is effective for mission-critical business applications, ERP systems, and databases, such as SQL Server, SAP, and Oracle, that require near-continuous availability. HA clustering can be divided into two types: active-active configuration and active-standby configuration. Let’s take a look at the difference between these two HA clustering types.

HA Clustering Type 1: Active-Active Configuration

In the active-active configuration, processing is performed on all nodes in the cluster. For example, in the case of two-node clustering, both nodes are active. If one node stops, its processing will be taken over by the other. However, if each node is operating at close to 100% and one node stops, it will be difficult for the other node to take on the additional processing load. Therefore, capacity planning with a margin is important for HA clustering.

HA Clustering Type 2: Active-Standby Configuration

Let’s use our two-node example again.
In the active-standby configuration, one node is configured as the active node and the other node is configured as the standby node. The active node and the standby node exchange signals called “heartbeats” to constantly check whether they are operating normally. If the standby node cannot receive the heartbeat of the active node, the standby node determines that the active node has stopped and will take over the processing of the active node. This mechanism is called “failover”. Conversely, the mechanism that recovers the stopped active node and transfers the processing back to it is called “failback.” In an active-standby configuration, when a failure occurs, the simple switch from the active node to the standby node makes recovery relatively easy. However, you need to consider that the resources of the standby node sit idle while the active node is operating normally.

Two Components of HA Clustering: Application and Storage

For an HA cluster to be effective, two areas need to be addressed: application orchestration and storage protection. Clustering software monitors the health of the application being protected and, if it detects an issue, moves operation of that application over to the standby node. The standby node needs access to the most up-to-date versions of data, preferably identical to the data that the primary node was accessing before the incident. This can be accomplished in two ways: shared storage or shared-nothing storage. In the shared storage model, both cluster nodes access the same storage, typically a SAN. In shared-nothing (aka SANless) configurations, local storage on all nodes is mirrored using replication software. Clustering software products vary widely in their ability to monitor and detect issues that may cause application failure and in their ability to orchestrate failovers reliably.
Many clustering products only detect whether the application server is operational, but do not detect a wide range of software, services, network, and other issues that can cause application failure.

Application Awareness is Essential

Similarly, complex ERP and database applications have multiple component parts that have to be stored on the correct server or instance, started up in the right order, and brought online in accordance with complex best practices. Choose clustering software that includes specialized software, called application recovery kits, designed specifically to maintain best practices for the application/database-specific requirements.

There are multiple ways to configure an HA Cluster:

Traditional Two Node Clusters with Shared Storage

Two servers are clustered with shared storage.

Two Node SANless Cluster

Clusters can be configured using local LAN and high speed synchronous block-level replication. Real-time replication can be used to synchronize storage on the primary server with storage on a standby server located in the same data center, in your disaster recovery site, or both. This allows you to build high availability and disaster recovery configurations flexibly, with two nodes or multiple nodes. SIOS block-level replication is highly optimized for performance. You can even use high-speed locally attached storage, such as PCIe flash storage devices, on your physical servers to achieve very low cost, high performance, high availability configurations. Both your application and your data on the flash device are protected.

SAN-based Cluster with a Third Node for Disaster Protection

This configuration uses a SAN-based cluster and adds a third, SANless node in a remote data center or the cloud to achieve full disaster recovery protection. In the event of a disaster, the standby remote physical server is brought into service automatically with no data loss, eliminating the hours needed for restoration from backup media.
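The heartbeat-based failover described for the active-standby configuration can be sketched as a small state machine on the standby node. This is a minimal illustration under stated assumptions, not the behavior of any particular clustering product: the `StandbyMonitor` class, its `timeout_s` parameter, and the use of caller-supplied timestamps are all choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Standby-side view of the active node's heartbeat (illustrative only)."""

    timeout_s: float              # assumed: silence longer than this means "stopped"
    last_heartbeat: float = 0.0   # timestamp of the most recent heartbeat
    active: bool = False          # True once this standby has taken over

    def on_heartbeat(self, now: float) -> None:
        # Record that the active node was alive at time `now`.
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True only on the call that triggers a failover."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            self.active = True    # take over processing from the active node
            return True
        return False
```

Timestamps are passed in rather than read from a clock so the logic is easy to test. A real product would also monitor application, service, and network health, as the section notes, rather than relying on server heartbeats alone.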
What is a Load Balancing Cluster?

Load balancing clustering is a mechanism in which multiple nodes operate as a single system, with a load balancer distributing processing across the nodes to improve performance. While it can isolate a failed node to prevent node failure from affecting the entire system, the load balancer is a critical single point of failure risk and not a high availability option. It is only effective for applications such as web server load balancing. If the load balancer itself fails, the entire system stops.

What is HPC Clustering?

You can also use clustering for performance instead of high availability. High-Performance Computing clusters, or HPC clusters, combine the processing power of multiple (sometimes thousands of) nodes to get the CPU performance needed in CPU-intensive environments such as scientific and technological environments requiring large-scale simulations, CAE analysis, and parallel processing.

Are you ready to find the right HA clustering solution for your business? Learn more about SIOS High Availability clustering.

Reproduced with permission from SIOS
November 18, 2021	Introduction To Clusters – Part 1 Introduction To Clusters – Part 1 What is clustering in the first place? Clustering technology is a technology that allows you to connect multiple servers to act as a single functional unit. Types of clustering You can cluster servers for several purposes. For example, you can combine the processing power of multiple small servers for high performance. You can also distribute processing work to multiple nodes using a load balancer for added efficiency. High availability (HA) clustering is a process of combining server nodes to protect important applications from downtime and data loss. In a traditional shared storage failover cluster, a primary node and secondary or remote node share the same storage. HA Clustering High availability (HA) clustering is a mechanism that reduces downtime by eliminating single points of failure (SPOF). In an HA cluster, important applications are run on a primary node which is connected to one or more secondary or remote nodes in a cluster. Clustering software monitors the health of the application, server, and network. In the event of a failure on the primary node, it moves application operations over to a secondary node in a process called a failover, where operation continues. High Availability Application high availability is a measure of how much time in a given year an application is available and operational. In general, HA clusters provide 99.99% (Four nines) availability or a little more than 52 minutes of downtime over the course of a given year. It is important to note that in a traditional HA cluster, all of the cluster nodes are connected to the same shared storage – typically a SAN. In this way, after a failover, the secondary node is accessing the same data as the primary node and operation can continue. SANless cluster synchronizes local storage using host-based block level replication. 
SANless Clusters However, many companies prefer to use a SANless cluster for several reasons. First, shared storage represents a critical single point of failure. Second, shared storage is often not an option in public cloud environments. Third, SANs can sometimes impede performance of database applications, such as SQL Server, Oracle, and SAP. Instead of shared storage, these companies use efficient, host-based, block-level replication to synchronize local storage on all cluster nodes. In the event of a failover, the secondary node is connected to local storage with an identical copy of the primary storage. This not only eliminates the SAN SPOF risk but also enables the addition of fast disk (SSD) to local on-premises storage for cost-efficient high performance. SANless clustering also enables companies to migrate on-premises HA environments to the cloud with minimal effort or disruption of ongoing business processes. Reproduced from SIOS
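The "four nines" figure quoted in the Part 1 article above follows from simple arithmetic: a year has 365 × 24 × 60 = 525,600 minutes, and 0.01% of that is about 52.56 minutes. A short sketch of the calculation, assuming a non-leap year (the function name is made up for this example):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# 99.99% ("four nines") allows roughly 52.56 minutes of downtime per year,
# while 99.9% ("three nines") allows roughly 525.6 minutes (about 8.8 hours).
four_nines = downtime_minutes_per_year(99.99)
three_nines = downtime_minutes_per_year(99.9)
```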
November 13, 2021	Clustering Software for High Availability and Disaster Recovery Clustering Software Clustering Software for High Availability and Disaster Recovery What is Clustering Software? Clustering software lets you configure your servers as a grouping or cluster so that multiple servers can work together to provide availability and prevent data loss. Each server maintains the same information – operating systems, applications, and data. If one server fails, another server immediately picks up the workload. IT professionals rely on clustering to eliminate a single point of failure and minimize the risk of downtime. In fact, 86 percent of all organizations are operating their HA applications with some kind of clustering or high availability mechanism in place.[1] Types of Cluster Management Software There are a variety of cluster management software solutions available for Windows and Linux distributions. Examples include: Windows Server Failover Clustering (WSFC), SUSE Linux Enterprise High Availability Extension, Red Hat Cluster Suite, Oracle Real Application Clusters (RAC), and SIOS software. Except for SIOS, these products support a single operating system or require expensive SAN hardware, constraining flexibility and deployment options. Moreover, Linux open-source HA extensions require a high degree of technical skill, creating complexity and reliability issues that challenge most operators. SIOS products uniquely protect any Windows- or Linux-based application operating in physical, virtual, cloud or hybrid cloud environments and in any combination of site or disaster recovery scenarios. Applications such as SAP and databases, including Oracle, SQL Server, DB2, SAP HANA and many others, benefit from SIOS software. The “out-of-the-box” simplicity, configuration flexibility, reliability, performance, and cost-effectiveness of SIOS products set them apart from other clustering software. 
How SIOS Clustering Software Provides High Availability for Windows and Linux Clusters If you are running a critical application in a Windows or Linux environment, you may want to consider SIOS Technology Corporation’s high availability software clustering products. In a Windows environment, SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism. While WSFC manages the software cluster, SIOS performs the replication to enable disaster protection and ensure zero data loss in cases where shared storage clusters are impossible or impractical, such as in cloud, virtual, and high-performance storage environments. In a Linux environment, the SIOS Protection Suite for Linux provides a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters. Whether you are in a Windows or Linux environment, SIOS products free your IT team from the complexity and challenges of computing infrastructures. They provide the intelligence, automation, flexibility, high availability, and ease-of-use IT managers need to protect business-critical applications from downtime or data loss. With over 80,000 licenses sold, SIOS is used by many of the world’s largest companies. Here is one case study that discusses how a leading Hospital Information Systems (HIS) provider deployed SIOS DataKeeper Cluster Edition to improve high availability and network bandwidth in their Windows cluster environment. How One HIS Provider Improved RPO and RTO With SIOS DataKeeper Clustering Software This leading HIS provider has more than 10,000 U.S.-based health care organizations (HCOs) using a variety of its applications, including patient care management, patient self-service, and revenue management. 
To support these customers, the organization had more than 20 SQL Server clusters located in two geographically dispersed data centers, as well as a few smaller servers and SQL Server log shipping for disaster recovery (DR). The organization has a large customer base and vast IT infrastructure and needed a solution that could handle heavy network traffic and eliminate network bandwidth problems when replicating data to its DR site. The organization also needed to improve its Recovery Point Objective (RPO) and Recovery Time Objective (RTO) to reduce the volume of data at risk and get IT operations back up and running faster after a disaster or system failure. RPO is the maximum amount of data loss that can be tolerated when a server fails, or a disaster happens. RTO is the maximum tolerable duration of any outage. To address these challenges, this organization chose SIOS DataKeeper Cluster Edition, which provides seamless integration with WSFC, making it possible to create SANless clusters. Once SIOS DataKeeper Cluster Edition passed the organization’s stringent POC testing, the IT team deployed the solution in the company’s production environment. The team deployed SIOS across a three-node cluster comprised of two SAN-based nodes in the organization’s primary, on-premises data center and one SANless node in its remote DR site. The SIOS solution synchronizes replication across the three nodes in the cluster and eliminates the bandwidth issues at the DR site, improving both RPO and RTO and reducing the cost of bandwidth. Today, the organization uses SIOS DataKeeper Cluster Edition to protect their SQL Server environment across more than 18 cluster nodes. See the full case study to learn more. 
How SIOS Clustering Software Works SIOS software is an essential part of your cluster solution, protecting your choice of Windows or Linux environments in any configuration (or combination) of physical, virtual and cloud (public, private, and hybrid) environments without sacrificing performance or availability. If you need fast, efficient, replication to transfer data across low-bandwidth local or wide area networks, SIOS DataKeeper protects business-critical Windows environments, including Microsoft SQL Server, Oracle, SharePoint, Lync, Dynamics, and Hyper-V from downtime and data loss in a physical, virtual, or cloud environment. SIOS Protection Suite for Linux supports all major Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, and Oracle Linux and accommodates a wide range of storage architectures. To see how SIOS clustering software works to protect Windows and Linux environments, request a demo or get a free trial. Learn more about: SAP clustering SQL Server clustering Oracle clustering Linux clustering Check out recent blog posts about our clustering products. References https://searchdomino.techtarget.com/definition/application-clustering https://www.itprotoday.com/cloud-computing/clustering-software https://searchwindowsserver.techtarget.com/definition/Windows-Server-failover-clustering [1] SIOS in partnership with ActualTech Research, (2018) The State of Application High Availability Survey Report Reproduced from SIOS
November 8, 2021	Disaster Recovery Fundamentals Disaster Recovery Fundamentals Disaster recovery overview Disaster recovery refers to the ability to quickly restore/repair a system and minimize damage in the event of a sitewide or even regional failure. Disaster recovery is a crucial part of business continuity management and having a robust disaster recovery protocol in place will help prevent unnecessary data loss and expense associated with system downtime. What constitutes the ‘disaster’ part of disaster recovery? This could refer to a natural disaster such as earthquakes, floods, etc. but also a wide range of events such as “fire,” “terrorism,” “unauthorized intrusion,” “large-scale hacking,” and “long-term large-scale power outages.” Anything that has the potential to cause catastrophic damage to an IT system if it were to fail. The Real Impact of System Failure In addition to potential physical damages and data loss associated with a system failure, the lack of a disaster recovery plan can cause unrecoverable revenue loss for businesses. For every minute of system downtime, this means lost sales and opportunities, potential negative customer experience, tarnished business reputation and high expense in emergency IT repair. The Importance of Disaster Recovery For a company that provides mission-critical services, building a business continuity system that can handle unexpected system downtime is essential. Having the ability to prevent failure in the first place, and to quickly recover in the event a local failure or even a sitewide or regional disaster occurs will help to protect data, maintain rapport with customers, and save time and potentially devastating financial loss. It’s important to recognize that catastrophic system failure is something that will happen, not something that may happen, so putting a proper disaster recovery plan in place will protect your business. 
Disaster Recovery Challenges

While a disaster recovery protocol is essential, it is not without its challenges to set up and implement. Here are some common barriers to proper disaster recovery implementation:

Challenge 1: Geographic separation
The essence of disaster protection is keeping systems and data in a location that is geographically separated from the primary data center or cloud instance so that, in the event of a disaster or cloud outage, the secondary systems can be brought online and operation can continue.

Challenge 2: Network bandwidth requirements
Replicating data to an offsite location for disaster recovery can mean added network bandwidth requirements and latency issues.

Challenge 3: Data volume continues to increase
The storage capacity requirements on the disaster recovery site will increase over time. A proper disaster recovery plan needs to establish “protection priority” to clarify which data should be protected and optimize available storage resources.

Challenge 4: Recovery procedure at the time of recovery
If a system goes down due to a disaster, service recovery is required. Often, companies find their data is scattered in multiple locations and there aren’t standardized procedures for recovery, resulting in immense loss of time and expenses. Developing a clear, standardized restoration procedure will eliminate this headache and allow for quick action when it matters most.

Data backup vs Availability Protection

Traditionally, data backup (essentially a process of making a copy of data and applications and moving it to an offsite location) has been performed for the purpose of protecting data in case of IT equipment failure and for recordkeeping/archiving in compliance with regulatory requirements such as HIPAA (the Health Insurance Portability and Accountability Act). To recover operation, any servers, storage, and other hardware, as well as networking affected by the incident, need to be replaced or repaired.
Servers have to be configured and applications have to be restored, brought back online and connected to recovered data. These steps can take months. Without an availability protection process in place, recovery operations with backup alone can be a time-consuming and expensive process. Availability processes keep fully operational systems ready to take over in the event of a disaster, enabling resumption of service in minutes. Here are some other common reasons an effective disaster recovery plan is important:

Disaster recovery indicators

The main metrics for disaster recovery are “RPO” and “RTO”.

RPO (Recovery Point Objective)
RPO indicates how far back in time, measured from the moment the disaster occurs, data recovery is guaranteed. If “RPO = < 5 minutes When aiming for “RPO = 0 (zero data loss)”, an availability protection mechanism such as failover clustering is required.

RTO (Recovery Time Objective)
RTO is an index that shows how much time your business can allow to pass from initial downtime to restoration of operation. If your “RTO = 1 month or more”, you may be able to handle data recovery by only doing remote backup and securing a substitute device. But if your “RTO = within minutes”, failover clustering is required.

Selecting a Disaster Recovery Method

When determining the right disaster recovery method for your business, consider these important factors:

- Criticality of business processes and tolerance for impact
- The data type and capacity that you want to protect
- Recovery requirements (your RPO and RTO)
- Budget

Focus on Business Impact

While IT departments take the technical lead in developing disaster recovery measures for IT systems, business owners must consider the impact and extent of system outages (the “business impact of each system stop”) to ensure the least harmful impact to the business.

Protected data type (data integrity)

It is important to classify the type and importance of protected data.
For data that does not require very precise consistency (such as file servers), a simple primary storage backup may be sufficient. On the other hand, ERP systems and databases such as SQL Server, Oracle, and SAP have multiple services and parts that need to be located on specific servers, started up in specific orders, and managed according to a variety of application-specific best practices. They typically require high availability protection and an application-aware clustering solution to orchestrate failover. —————————————————————————————————————— Key Disaster Recovery Terms Remote backup – essentially keeping a copy of applications and data in a geographically separated remote location. Synchronous Storage Mirroring Keeping a local and remote copy of storage synchronized for DR protection. In this method, data is written to local storage and immediately replicated to remote storage. The local storage is not “committed” until the process of writing data to the remote location has been completed. This process keeps both locations identical, eliminating discrepancies that may result if data-in-transit at the time of an event fails to write on the remote location. Data integrity is guaranteed between the primary and backup sites. Asynchronous Storage Mirroring. This method writes data to the local storage then replicates it to the remote location. It enables greater network utilization efficiency and reduced bandwidth contention when geographic separation causes latency. “Cold standby” and “hot standby” Cold standby A process of keeping a copy of data or secondary system offline in case of disaster. If the primary system goes down, the systems and software have to be manually started up – in some cases configured – and data has to be restored before operation can continue. Hot standby This is a process of keeping secondary systems operational and switching over to them in the event of downtime on the primary system. 
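The difference between synchronous and asynchronous mirroring described above can be illustrated with a toy model. It tracks how many written blocks exist locally but not yet at the remote site, which is the data that would be lost (the effective RPO exposure) if the primary site failed at that moment. The `MirroredVolume` class and its methods are hypothetical names for this sketch. The model ignores network latency and failures, and does not represent how any replication product is implemented.

```python
from collections import deque
from typing import Deque, List, Optional


class MirroredVolume:
    """Toy model of a locally written volume mirrored to a remote site."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.local: List[str] = []
        self.remote: List[str] = []
        self.in_flight: Deque[str] = deque()  # written locally, not yet replicated

    def write(self, block: str) -> None:
        if self.synchronous:
            # Synchronous: the remote copy is written before the local write
            # is committed, so both sites stay identical.
            self.remote.append(block)
            self.local.append(block)
        else:
            # Asynchronous: commit locally now, replicate later.
            self.local.append(block)
            self.in_flight.append(block)

    def drain(self, max_blocks: Optional[int] = None) -> None:
        """Replicate queued blocks to the remote site (all of them if None)."""
        pending = len(self.in_flight)
        count = pending if max_blocks is None else min(max_blocks, pending)
        for _ in range(count):
            self.remote.append(self.in_flight.popleft())

    def blocks_at_risk(self) -> int:
        """Blocks that would be lost if the primary site failed right now."""
        return len(self.local) - len(self.remote)
```

In this model a synchronous volume always reports zero blocks at risk, at the cost of every write waiting on the remote site. An asynchronous volume acknowledges writes immediately, but carries a backlog until `drain` catches up.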
Disaster Recovery Method Cost Comparisons The smaller the RPO and RTO, the shorter the downtime, but the cost will increase accordingly. Considering the cost and asset value of each type of data, it is necessary to find the optimum method for what level of protection is required. A balance between in-house implementation and outsourcing of services will impact costs. To learn more about high availability and disaster recovery solutions at SIOS, click here. Reproduced from SIOS
