SIOS SANless clusters: High-availability Machine Learning monitoring

February 13, 2024	The Challenges of Using Amazon EBS Multi-Attach for Microsoft Failover Clusters

Overview of Amazon EBS and Microsoft Failover Clusters

Amazon EBS Multi-Attach volumes and Microsoft Failover Clusters are powerful tools in the world of cloud computing and data management. However, integrating these two technologies can be fraught with challenges. This blog post delves into why using Amazon EBS Multi-Attach for Microsoft Failover Clusters is often not the best choice.

The Single AZ Constraint for Robust Failover Clusters

A key limitation of Amazon EBS volumes is their confinement to a single Availability Zone (AZ). For robust failover clusters, deploying instances across multiple AZs is a recommended best practice, something EBS volumes cannot directly support.

High Availability SLA Concerns

While EBS volumes offer a 99.9% availability SLA, this falls short of the 99.99% commonly expected for high availability solutions. AWS does guarantee this higher SLA when deploying instances across multiple AZs, a benefit not extended to single-AZ deployments.

Cost Implications of IO2 Volumes

Windows Failover Clusters with multi-attach EBS volumes necessitate the use of IO2 volumes, which are approximately nine times more expensive than GP3 volumes of similar size and performance. This cost difference is significant, especially for large-scale deployments.

Complexity in AWS Cluster Configuration

Building a cluster in AWS with nodes in the same AZ requires the division of the AZ into multiple subnets to support different virtual IP addresses (VIPs) in the Windows cluster. This complexity, along with the inability to share a single VIP across cluster nodes, adds to the configuration challenges.

SIOS DataKeeper: A Superior Alternative

SIOS DataKeeper emerges as a superior solution, allowing clusters that span subnets while providing the desired 99.99% availability SLA.
Not only does it offer more flexible storage options, including the use of GP3 disks, but it is also far more cost-effective. Clusters using SIOS DataKeeper with GP3 disks can be around 20% of the cost of similar IO2-based clusters, with enhanced availability.

Superior High Availability with SIOS

The use of Amazon EBS Multi-Attach volumes in Microsoft Failover Clusters presents several significant challenges, from limited AZ deployment options and lower availability SLAs to higher costs and increased configuration complexity. SIOS DataKeeper offers a compelling alternative, balancing cost, flexibility, and reliability more effectively. For organizations seeking high availability and cost efficiency, exploring options beyond EBS Multi-Attach is a prudent strategy. Contact SIOS for more information.

Reproduced with permission from SIOS
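To make the gap between a 99.9% and a 99.99% availability SLA concrete, the following minimal sketch converts an availability percentage into the downtime it permits. The 30-day month and 365-day year are assumptions chosen for illustration, and the function name is hypothetical; this is plain arithmetic, not an AWS or SIOS tool.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, that an SLA permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        per_month = allowed_downtime_minutes(sla, 30)   # assumed 30-day month
        per_year = allowed_downtime_minutes(sla, 365)   # assumed 365-day year
        print(f"{sla}%: {per_month:.2f} min/month, {per_year / 60:.2f} h/year")
```

Under these assumptions, 99.9% permits roughly 43 minutes of downtime in a month, while 99.99% permits a little over 4 minutes.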
February 5, 2024	Video: Application High Availability Will Become Universal | Predictions From SIOS Technology

SIOS Technology is a high availability (HA) and disaster recovery (DR) solutions company providing application availability for mission-critical databases, applications, and services for its customers across Windows and Linux systems and a variety of cloud platforms. Cassius Rhue, VP of Customer Experience at SIOS Technology, shares his 2024 predictions.

As reliance on applications continues to rise, there will be increasing pressure on IT teams to deliver efficient high availability and disaster recovery for applications that were traditionally considered non-essential, in addition to mission-critical ones. Due to this shift, we will likely see an expansion of high-availability software solutions and services to meet this expectation.

With more companies expanding into the cloud and across different operating systems, more teams are also expected to cover a diverse set of operating systems, applications, and cloud platforms. Teams will be looking for applications and solutions that are consistent across these different operating systems and cloud environments to reduce complexity and improve cost efficiency. HA solutions will also need to be consistent across operating systems and cloud environments, and we will see a drive toward cloud-agnostic HA.

Companies need HA and DR solutions to be simple, automated, quick, and intelligent. As more organizations migrate to the cloud, they will need to ensure they do not lose data in the process. HA solutions will need to bridge the gap between the old systems and the more modern ones.
2024 will see an increased focus on data retention, security access controls, and permissions, prompting organizations to integrate more enhanced security measures into their high availability and disaster recovery solutions, services, and strategies.

As the volume of data that is being collected continues to increase, organizations will also need more information about why failures have occurred. Automation and orchestration tools will likely play a central role in streamlining root cause analysis and providing intelligent responses.

SIOS Technology will continue to focus on its customers in the coming year, helping them avoid and reduce downtime and ensuring their data and applications are available when the business needs them most. The company will continue to optimize its solution, providing additional adjacent services to benefit its customers, as well as helping application providers and cloud providers form an effective HA strategy.

Reproduced with permission from SIOS
February 2, 2024	Mitsubishi Motors Moves Critical Systems to the Cloud with LifeKeeper for High Availability Protection

When Mitsubishi Motors Corporation revamped its warehouse management system in three locations with new cloud-based systems, it needed a new way to provide high availability without adding complexity or slowing performance.

“Even if a problem occurs, LifeKeeper automatically fails over from the primary server node to a secondary system in an instant, and operation continues without any noticeable delays for the users, saving IT time and eliminating service interruption for customers,” said Hiromasa Tsuboshima, Manager, Business IT Department, Global IT Division, Mitsubishi Motors.

The Environment

Each Mitsubishi Motors warehouse relies on a management system that handles orders and inventory for automobile parts and accessories sold by dealers, such as floor mats and roof racks. It manages the receipt of parts and supplies, shipment management for domestic and international orders from dealers, and inventory management and allocation within the warehouse location itself.

The legacy systems ran on aging, on-premises server hardware that was increasingly prone to problems, which wasted IT time on troubleshooting and caused frequent interruption of operations. The existing systems used the hardware manufacturer’s proprietary redundancy to reduce downtime. If a problem arose with a legacy system, IT personnel would have to manually stop the system and switch operation to redundant hardware until the problem was fixed, a process requiring two to four hours of an IT person’s time.

Mitsubishi Motors must ensure that any parts or accessories ordered within a defined acceptance period are delivered to the dealerships the next day.
Therefore, even short periods of downtime for these mission-critical systems could have a significant impact on the business. The warehouse management system plays a critical role in ensuring that all orders are processed in time to meet delivery schedules. For example, to ensure next-day delivery of an order entered at 4:29 PM, the warehouse management system has to process and display it by 4:40 PM so that it can be put on the last truck or flight of the day. “We need to recover in less than 10 minutes,” said Iwasaki.

The Challenge

Hiromasa Tsuboshima, Manager of the Business IT Department, Global IT Division, Mitsubishi Motors Corporation, said, “Our existing systems at three of our six warehouses were on hardware from 2012. We needed to replace them with new systems that would eliminate the drain on IT resources and reduce negative impact on operations.”

Finding a high availability solution for their new cloud-based warehouse systems was critical to the success of the project. Satoshi Iwasaki, member of the Business IT Department in the Global IT Division at Mitsubishi Motors Corporation, said, “According to company-wide policy, we need to migrate from legacy on-premises systems to the public cloud whenever we build a new system.”

High Availability Software

When migrating the warehouse management systems to a public cloud, Mitsubishi Motors consulted an outside IT consultant who recommended SIOS LifeKeeper for Linux for high availability. “In our past experience, we always used hardware solutions for high availability,” said Mr. Iwasaki. “I had a lot of in-depth questions for the SIOS representative about using software for HA, and SIOS provided accurate, complete answers, which built my trust in SIOS LifeKeeper.”

Another key factor in deciding to select LifeKeeper was the optional LifeKeeper Professional Service, which provides an application-aware recovery kit (ARK) tailored to Mitsubishi’s specific warehouse system requirements.
The SIOS ARKs enable LifeKeeper to monitor the entire application stack for potential downtime issues. They also orchestrate the application failover in accordance with best practices for smooth operation on the secondary node. “We were able to customize and develop LifeKeeper to meet our requirements, and SIOS was able to respond to all of our requests,” said Mr. Iwasaki.

Fast, Automatic Failover

“Even if a problem occurs, LifeKeeper automatically fails over from the primary server node to a secondary node in an instant, and operation continues without any noticeable delays for the users. It saves IT time and eliminates service interruption for customers,” said Mr. Tsuboshima.

Mr. Tsuboshima is in charge of overseeing some of the systems in the Global IT Division. Before the upgrade project, he used to receive failure alerts at all hours of the night that required his immediate attention. Today, in the event of a failure, he simply receives a notification of the failover and the systems continue to operate without intervention. The SIOS solution has saved Mr. Tsuboshima and the rest of the IT team many hours of valuable time and eliminated disruptions to service.

The Results

The benefits of moving the warehouse management system to the cloud while ensuring high availability with LifeKeeper were evident in response to the 2020 pandemic. “Having our systems in the cloud enabled us to manage the systems remotely. If we had stayed on the old, on-premises system, we would have faced significant added risk of coming into the office during the COVID-19 emergency to fix issues or manage the systems,” said Mr. Iwasaki.

Although Mitsubishi Motors continues to shift to the public cloud, many of its systems still use mainframes. “As we consider moving our mission-critical systems away from these host systems and into the cloud, we will look to LifeKeeper for high availability protection,” said Mr. Iwasaki.
“We will be recommending it to the company in the future.”

Learn more about SIOS LifeKeeper for Linux. Learn more about SIOS Protection Suite, including SIOS LifeKeeper, SIOS DataKeeper, and SIOS application recovery kits.

Reproduced with permission from SIOS
January 30, 2024	How to Set Up a DataKeeper Cluster Edition (DKCE Cluster)

What is a DKCE cluster?

DKCE is an acronym for DataKeeper Cluster Edition. DKCE is SIOS software that combines DataKeeper with features of Windows Failover Clustering to provide high availability through migration-based data replication.

The steps to create a DKCE cluster

For this example, I will set up a three-node cluster with the third node maintaining node majority.

Step 1: You must have DataKeeper installed on two of your three systems to set up a DKCE cluster. Follow our quick start guide to complete this install: https://docs.us.sios.com/dkce/8.10.0/en/topic/datakeeper-cluster-edition-quick-start-guide

Step 2: Add the servers that you plan on managing in Server Manager. This will need to be done on all servers you plan on adding to the cluster.
On your server, navigate to Server Manager.
Click “Add other servers to manage”.
I added the servers by name here. To do it this way, you will need to verify your system name and IP entries in the hosts file, located here: C:\Windows\System32\drivers\etc\hosts
After all servers have been added, you can verify by navigating to “All servers” in Server Manager.

Step 3: You may notice a WinRM error. To bypass it, run the following command in PowerShell as an administrator. It adds the servers in your cluster as trusted hosts and will need to be run on every system in your cluster.
Set-Item WSMan:\localhost\Client\TrustedHosts -Value '<name of server 1>,<name of server 2>'

Step 4: Install Failover Clustering. Follow these steps to install failover clustering.

Step 5: Navigate to the Failover Cluster Manager.

Step 6: Click “Create Cluster”.

Step 7: Next, add the servers that should be in the cluster and click “Add” after each entry.
Step 8: Check that the list of selected servers looks as expected.

Step 9: Choose “Run all tests” for the validation test, and click “Next”.

Step 10: Once the tests have been completed, click “Finish”.

Step 11: Name your cluster (I have named this one “Cluster1”) and click “Next”.

Step 12: Verify that “Add all eligible storage to the cluster” is checked, and click “Next”.

Step 13: Once step 12 is completed, click “Finish”.

Step 14: In Failover Cluster Manager, the cluster will initially be offline. I will be assigning an unused IP to bring it online.
In “Cluster Core Resources”, right-click on the IP address resource and select Properties.
In the properties panel, my subnet mask is /28, so I will choose an available IP within the range: 12.0.0.14. Click “Apply”.
In “Cluster Core Resources”, right-click on the cluster and select “Bring Online”.
The resources should now be online.

Step 15: Navigate to DataKeeper.

Step 16: Right-click on Job and click “Create Job” to begin creating our first mirror.
Give the job a name (I am naming my job “job1”) and click “Create Job”.
Choose the source and volume to replicate data from. I have chosen Box1 as my source, and volume D. Click “Next”.
Next, choose a server and a volume to be the target. I have chosen Box2, and volume D.
A prompt will appear asking whether to auto-register the volume you have created as a WSFC volume. Select “Yes” to make this volume highly available.
In DataKeeper you can now see that the volume is currently mirroring.

Step 17: In Failover Cluster Manager, navigate to Storage, then Disks. You will see that the volume you auto-registered is now a WSFC volume.

Step 18: Let’s verify the Owners that should be checked. Right-click on the volume, and click “Properties”. Since I need the third node to act as a witness and maintain node majority, Box3 will need to be unchecked (or remain unchecked).

Step 19: Now we can test a migration through Failover Cluster Manager.
Navigate to File Explorer and create a new text file in the volume that is currently being mirrored. Do this on your source.
Navigate to Failover Cluster Manager, click “DataKeeper Volume D”, and select “Move Available Storage” from the Actions pane.
Select “Best Possible Node”. This should automatically migrate the volume to your target.
In Failover Cluster Manager, verify that the owner of “DataKeeper Volume D” is now the target node.
Navigate to DataKeeper to verify that your target is now the source, and vice versa.

Successful DKCE Cluster Setup

You have completed the setup of a DKCE cluster. SIOS provides resources and training for all our products.

Reproduced with permission from SIOS
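The address choice in step 14 can be sanity-checked with a short sketch using Python's standard `ipaddress` module. It derives the /28 network that contains 12.0.0.14 and lists the usable host range. The helper names are hypothetical, and the check covers only the subnet arithmetic: it cannot tell whether an address is already in use or reserved in your environment.

```python
import ipaddress


def usable_hosts(cidr: str) -> list[ipaddress.IPv4Address]:
    """List the host addresses in an IPv4 network (network and broadcast excluded)."""
    network = ipaddress.ip_network(cidr, strict=True)
    return list(network.hosts())


def is_usable(ip: str, cidr: str) -> bool:
    """Check whether ip is a usable host address inside the given network."""
    return ipaddress.ip_address(ip) in usable_hosts(cidr)


if __name__ == "__main__":
    # Derive the enclosing network from the address and prefix length used in step 14.
    network = ipaddress.ip_interface("12.0.0.14/28").network
    hosts = usable_hosts(str(network))
    print(f"{network}: {hosts[0]} to {hosts[-1]} ({len(hosts)} usable addresses)")
    print("12.0.0.14 usable:", is_usable("12.0.0.14", str(network)))
```

With a /28 prefix the network holds 16 addresses, 14 of which are usable hosts, and 12.0.0.14 is the last of them.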
January 24, 2024	Ensuring Access To Critical Educational Applications

Education and information technology (IT) are increasingly inextricable. Whether the IT in question is an application supporting a classroom whiteboard, the database supporting a university registration system, a learning management system (LMS), or the building maintenance system controlling student access to the labs, dorms, and dining halls, if key components of your IT infrastructure suddenly go dark, neither teachers, administrators, nor students can accomplish what they are there to accomplish. The mission of the institution is interrupted. If the interruptions are too frequent, and the experiences of students, teachers, and administrators suffer, the reputation of the institution itself can suffer as well.

An IT infrastructure designed to ensure the high availability (HA) of applications crucial to the educational experience can minimize the risk of disruption and reputational loss that could occur if for any reason these systems become unresponsive. In this instance, an HA infrastructure is defined as one capable of ensuring the availability of key applications no less than 99.99% of the time. Put another way, that means that your critical applications won’t be unexpectedly offline for more than about four minutes per month.

How do you achieve HA? That question is readily answered, but it is not the only question you need to ask. Just as important is this: Which applications are so critical that they warrant an HA configuration?

At its heart, an IT infrastructure configured for HA has one or more sets of secondary servers and storage subsystems that are housed in a geographically distinct location (which could be a remote data center if your primary server resides on-premises, or a separate availability zone [AZ] if your servers reside in the cloud).
If something causes the applications running on the primary server to stop responding, the HA software managing your application will immediately fail over the application to the secondary server, where your critical applications will start up again from the point at which the primary server stopped responding.

Depending on the size and performance characteristics of the primary server you plan to replicate, that secondary server may be costly, so it’s unlikely you’re going to configure all your academic applications for HA. Once you determine which applications warrant the investment in HA, you’ll know where you need to build out an HA environment.

Choices for Achieving High Availability

Once you’ve chosen the applications you intend to protect, your options for achieving HA become clearer. Are they running on Windows or Linux? Does your database management system (DBMS) have built-in support for an HA configuration? If so, what are its limitations? If your critical applications are running on Windows and SQL Server, for example, you could enable HA using the Availability Group (AG) feature of SQL Server itself. Alternatively, you could configure HA using a third-party SANless clustering tool, which offers options that the AG services in SQL Server do not.

If you’re trying to protect database servers from multiple vendors, or if some of your critical applications run on Windows while others run on Linux, an HA solution that supports multiple DBMS and OS platforms will make HA easier to manage. A single cluster solution that accommodates diverse DBMS and OS platforms is simpler to run than several database-native HA services handled side by side.
Ensuring High Availability via Database-Native HA Solutions

If you’re using a database-native HA solution, such as the AG feature of SQL Server, the software will synchronously replicate all the data in your primary SQL Server database to an identical instance of that database on the secondary server. If something causes the primary server to stop responding, the monitoring features in the AG component will automatically cause the secondary server to take over. Because the AG feature has replicated all the data in real time, the secondary server can take over immediately and there is virtually no interruption of service or loss of data.

Many database-native HA tools operate in a similar manner. There are a few caveats, though, when considering a database-native approach:

If the HA services are bundled into the DBMS itself, they may replicate only the data associated with that DBMS. If other critical data resides on your primary server, that will not be replicated to the secondary server in a database-native HA scenario.

There may be other limitations on what the database-native services will replicate as well. If you use the Basic AG functionality that is bundled into SQL Server Standard Edition, for example, each AG can replicate only a single SQL database to a single secondary location. You could create multiple Basic AGs if your applications involve multiple SQL databases, but you cannot control whether each AG fails over at the same time in a failover situation, and problems may arise if they do not. One way around this limitation would be to use the Always On AG functionality bundled into SQL Server Enterprise Edition, which enables the replication of multiple SQL databases to multiple secondary servers, but that can get very expensive from a licensing perspective if your applications don’t otherwise use any of the features of SQL Server Enterprise Edition.
Other database-native HA solutions may have similar constraints, so be sure to understand them before investing in such an approach.

Ensuring High Availability via SANless Clustering

As an alternative to the database-native approach to HA, you could use a third-party tool to create a SANless cluster. Just as in the AG configuration described above, the SANless clustering software automates the synchronous replication of data from the primary to the secondary server; it also orchestrates the immediate failover to the secondary server if the primary server becomes unresponsive. Because failover takes only seconds, administrator, faculty, and student access to your critical applications will remain virtually uninterrupted.

The critical differences between SANless clustering and a database-native approach lie in the practical details. The SANless clustering approach is database agnostic. It replicates any data on a designated storage volume. That could include multiple databases from multiple vendors, text files, video files, or any other educational asset whose availability is important. This can save an institution a considerable amount of money if a database-native approach to HA would otherwise require an upgrade to a more expensive edition of the database.

Finally, as noted earlier, if you are trying to protect applications and data running in multiple operating environments, a SANless clustering approach may be more manageable than individual database-native approaches. You can use SANless clustering to ensure HA in either Windows or Linux environments, which can eliminate the complexities that could accompany the deployment of database-native approaches that differ among operating environments.

Reproduced with permission from SIOS
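The volume-level behavior described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how any SIOS product or SQL Server is implemented: the class names, the in-memory dictionary standing in for a storage volume, and the boolean health flag are all hypothetical. It shows only the two ideas from the text, that every write to the designated volume is mirrored regardless of file type, and that the standby is promoted when the active node stops responding.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server with a health flag and an in-memory stand-in for its volume."""
    name: str
    healthy: bool = True
    volume: dict[str, bytes] = field(default_factory=dict)


class MirroredPair:
    """Toy model: one active node, one standby, volume-level synchronous mirroring."""

    def __init__(self, primary: Node, secondary: Node) -> None:
        self.active = primary
        self.standby = secondary

    def write(self, path: str, data: bytes) -> None:
        """Write to the active volume and, if the standby is reachable, mirror it before returning."""
        if not self.active.healthy:
            raise RuntimeError("active node is down; call check_and_failover() first")
        self.active.volume[path] = data
        if self.standby.healthy:
            self.standby.volume[path] = data  # copy made before the write returns

    def check_and_failover(self) -> bool:
        """Promote the standby if the active node is unresponsive. Returns True on failover."""
        if self.active.healthy or not self.standby.healthy:
            return False
        self.active, self.standby = self.standby, self.active
        return True


if __name__ == "__main__":
    pair = MirroredPair(Node("primary"), Node("secondary"))
    pair.write("registration.db", b"database pages")  # any kind of data on the volume
    pair.write("lecture.mp4", b"video bytes")
    pair.active.healthy = False                       # simulate the primary going dark
    pair.check_and_failover()
    print(pair.active.name, sorted(pair.active.volume))
```

Because the mirror works on the volume rather than on one database engine, the promoted node holds both the database file and the video file in this example. A real product would also need to handle resynchronization after the failed node returns, which this sketch leaves out.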