Azure Archives - SIOS SANless clusters

Hitachi Moves Leading Insurance Company’s Mission Critical System to Azure Ensuring High Availability

May 9, 2023 by Jason Aw Leave a Comment

Hitachi Moves Leading Insurance Company’s Mission Critical System to Azure Ensuring High Availability

SIOS DataKeeper chosen for its ability to enable data replication on Azure

A major insurance company wanted to migrate its mission-critical system called ‘channel system’ to the cloud. Channel system included both an in-house system, used by its employees and call centers, and a third-party system used by its sales agencies. The system is positioned as the core system of the company’s business – even a short system failure and downtime would impact the business largely.

The channel system operated in an on-premises, virtualized platform provided by Hitachi. However, the hardware was aging and support for the middleware was ending. This led the insurance company to launch a project to update the infrastructure and move the system to the public cloud to cut running costs. Hitachi took on the responsibility for configuring the system for this project.

The Environment

The channel system’s architecture mostly consisted of Windows software. “We decided to adopt Azure as the cloud platform as we could confirm that there was a high affinity between the system architecture and Microsoft Azure, and it would be the most cost-effective option. There were some things that cannot be changed, but we focused on placing first priority on the customer’s needs,” said Takuro Nishino, System Engineer at Hitachi.

The Challenge

One of the company’s requirements for the new infrastructure was to build a cluster configuration. Additionally, the database, which was built with a cluster configuration using DBMS, along with the job management middleware ‘JP1’ had to be configured in a cluster node in order to secure high availability (HA) even after the cloud migration. The existing system used WSFC (Windows Server Failover Clustering) to configure the HA cluster using Windows Server functions with a shared storage (SAN).

However, as of 2019, it was not possible to have a SAN-based configuration in Azure. Changing the HA software was considered as an option, but, “Considering the impact on the system, the priority was to move the system to the cloud while maintaining the application and infrastructure architecture as much as possible.” said Nishino. Hitachi decided to look for a solution that will both maintain the WSFC cluster configuration and replicate data without the SAN, while using the same HA software.

The Evaluation

After some research, Hitachi found SIOS DataKeeper, a data replication software that can build a cluster and integrate with WSFC on Azure. DataKeeper is an Azure-certified product, which enables you to synchronously replicate data from the active node to the standby node. By using DataKeeper with WSFC, the company would be able to make JP1 redundant. Hitachi also officially supported and had experience with this configuration. “DataKeeper was the only solution that made our project feasible – migrating to the cloud while maintaining the cluster configuration, without changing the HA software,” Nishino recalled.

Along with Hitachi, Hitachi Solutions, having a track record of implementing DataKeeper, took part in this project by creating detailed design documents and construction procedure manuals. The actual work of implementing DataKeeper was done by members of the Insurance company’s IT department.

The Solution

Hitachi Solutions created the basic designs and detailed designs after defining the requirements, and also provided templates to make it easier for the insurance company to implement the system. During this process, thanks to DataKeeper’s easy-to-understand features such as the parameter settings. Hitachi Solutions was able to create a procedure manual without any difficulty by simply customizing the default values to suit the customer environment. Hitachi Solutions also created a DataKeeper testing environment within its own local environment to identify any issues in advance – this process helped them create accurate procedure manuals and configure the production environment smoothly.

The Results

In October 2022, the insurance company’s system migration to Azure was completed. Since the migration, the HA cluster configuration – DataKeeper and WSFC – has been operating steadily. The company agrees that achieving an HA cluster configuration this way, without changing the system architecture or HA software, was the best way to migrate the system to Azure.

Hitachi and Hitachi Solutions were able to succeed in the insurance company’s cloud migration project by integrating DataKeeper within the HA cluster configuration. “The fact that we were able to introduce DataKeeper in our customer’s project was a great achievement. We’d like to apply our knowledge and achievements gained from this project in other Azure projects as well,” said Kenta Otsuka, Sales at Hitachi. “In fact, we have proposed DataKeeper to other clients who are also considering a migration of their systems to Azure. The experience of working with SIOS Technology through Hitachi Solutions and the skills and knowledge we gained will definitely contribute to our business in the future.”

“When the project began, Azure didn’t have a shared disk function, so DataKeeper was the only option. Even though SAN-based configuration is currently available in Azure, based on the fact that we were able to achieve a smooth migration and stable operation in this project by implementing DataKeeper, DataKeeper will continue to be a solution when redundancy is required in Azure migration and configuration projects,” said Satoshi Noguchi, System Engineer at Hitachi.

Download case study here

Reproduced with permission from SIOS

High Availability Options for SQL Server on Azure VMs

February 28, 2023 by Jason Aw Leave a Comment

High Availability Options for SQL Server on Azure VMs

Microsoft Azure infrastructure is designed to provide high availability for your applications and data. Azure offers a variety of infrastructure options for achieving high availability, including Availability Zones, Paired Regions, redundant storage, and high-speed, low-latency network connectivity. All of these services are backed by Service Level Agreements (SLAs) to ensure the availability of your business-critical applications. This blog post will focus on high availability options when running SQL Server in Azure Virtual Machines.

Azure Infrastructure

Before we jump into the high availability options for SQL Server, let’s discuss the vital infrastructure that must be in place. Availability Zones, Regions, and Paired Regions are key concepts in Azure infrastructure that are important to understand when planning for the high availability of your applications and data.

Availability Zones are physically separate locations within a region that provides redundant power, cooling, and networking. Each Availability Zone consists of one or more data centers. By placing your resources in different Availability Zones, you can protect your applications and data from outages caused by planned or unplanned maintenance, hardware failures, or natural disasters. When leveraging Availability Zones for your SQL Server deployment, you qualify for the 99.99% availability SLA for Virtual Machines.

Regions are geographic locations where Azure services are available. Azure currently has more than 60 regions worldwide, each with multiple Availability Zones. By placing your resources in different regions, you can provide even greater protection against outages caused by natural disasters or other significant events.

Paired Regions are pre-defined region pairs that have unique relationships. Most notably, paired Regions replicate data to each other when geo-redundant storage is in use. The other benefits of paired regions are region recovery sequence, sequential updating, physical isolation, and data residency. When designing your disaster recovery plan, it is advisable to use Paired Regions for your primary and disaster recovery locations.

Using Availability Zones and Paired Regions in conjunction with high availability options such as Availability Groups and Failover Cluster Instances, you can create highly available, resilient SQL Server deployments that can withstand a wide range of failures, minimizing downtime.

SQL Server Availability Groups and Failover Cluster Instances

SQL Server Availability Groups (AGs) and SQL Server Failover Cluster Instances (FCIs) are both high availability (HA) and disaster recovery (DR) solutions for SQL Server, but they work in different ways.

An AG is a feature of SQL Server Enterprise edition that provides an HA solution by replicating a database across multiple servers (called replicas) to ensure that the database is always available in case of failure. AGs can be used to provide HA for both a single database and multiple databases.

SQL Server Standard Edition supports something called a Basic AG. There are some limitations to Basic AGs in SQL Server. Firstly, a Basic AG only supports a single database. You need an AG for each database and the associated IP address and load balancer if you have more than one database. Additionally, Basic AGs do not support read-only replicas. While Basic AGs provide a simple way to implement HA for a single database, they may not be suitable for more complex scenarios.

On the other hand, a SQL Server FCI is a Windows Server Failover Cluster (WSFC) that provides an HA solution by creating a cluster of multiple servers (called nodes) that use shared storage. In the event of a failure, the SQL Server instance running on one node can fail over to another.

In SQL Server 2022 Enterprise Edition, the new Contained Availability Groups (CAG) address some of the AG limitations by allowing users to create system databases to CAG, which can then be replicated. CAG eliminates the need to synchronize things like SQL logins and SQL Agent jobs manually.

Availability Groups and Failover Cluster Instances have their own pros and cons. AGs have advanced features like readable secondaries and synchronous and asynchronous replication. However, AGs require the Enterprise Edition of SQL Server, which can be cost-prohibitive, particularly if you don’t need any other Enterprise Edition features.

FCIs protect the entire SQL Server instance, including all user-defined databases and system databases. FCIs make management easier since all changes, including those made to SQL Server Agent jobs, user accounts and passwords, and database additions and deletions, are automatically reconciled on all versions of SQL Server, not just SQL 2022 with CAG. FCIs are available with SQL Server Standard Edition, which makes it more cost-effective. However, FCIs require shared storage, which presents challenges when deploying in environments that span Availability Zones, Regions, or hybrid cloud configurations. Read more about how SIOS software enables high availability for SQL servers.

Storage Options for SQL Server Failover Cluster Instances

Regarding storage options for SQL Server Failover Cluster Instances that span Availability Zones, there are three options: Azure File Share, Azure Shared Disk with Zone Redundant Storage, and SIOS DataKeeper Cluster Edition. There is a fourth option, Storage Spaces Direct (S2D), but that is limited to single AZ deployments, so clusters based on S2D would not qualify for the 99.99% SLA and would be susceptible to failures that impact and entire AZ.

Azure File Share

Azure File Share with zonal redundancy (ZRS) is a feature that allows you to store multiple copies of your data across different availability zones in an Azure region, providing increased durability and availability. This data can then be shared as a CIFS file share, and the cluster connects to it using the SMB 3 protocol.

Azure Shared Disk

Azure Shared Disk with Zone Redundant Storage (ZRS) is a shared disk that can store SQL Server data for use in a cluster. SCSI persistent reservations ensure that only the active cluster node can access the data. If a primary Availability Zone fails, the data in the standby availability zone becomes active. Shared Disk with ZRS is only available in the West US 2, West Europe, North Europe, and France Central regions.

SIOS DataKeeper Cluster Edition

SIOS DataKeeper Cluster Edition is a storage HA solution that supports SQL Server Failover Clusters in Azure. It is available in all regions and is the only FCI storage option that supports cross Availability Zone failover and cross Region failover. It also enables hybrid cloud configurations that span on-prem to cloud configurations. DataKeeper is a software solution that keeps locally attached storage in sync across all the cluster nodes. It integrates with WSFC as a third-party storage class cluster resource called a DataKeeper volume. Failover Cluster controls all the management of the DataKeeper volume, making the experience seamless for the end user. Learn more about SIOS DataKeeper.

Summary

In conclusion, Azure provides various infrastructure options for achieving high availability for SQL Server deployments, such as Availability Zones, Regions, and Paired Regions. By leveraging these options, in conjunction with high availability solutions like Availability Groups and Failover Cluster Instances, you can create a highly available, resilient SQL Server deployment that can withstand a wide range of failures and minimize downtime. Understanding the infrastructure required and the pros and cons of each option is essential before choosing the best solution for your specific needs. It’s advisable to consult with a SQL and Azure expert to guide you through the process and also review the Azure documentation and best practices. With the proper planning and implementation, you can ensure that your SQL Server deployments on Azure are always available to support your business-critical applications.

Reproduced with permission from SIOS

SAP on Azure High Availability Best Practices

November 23, 2022 by Jason Aw Leave a Comment

SAP on Azure High Availability Best Practices

In the following video, Bala Anbalagan, senior SAP architect for Microsoft with 20 years of experience in SAP, explains the best practices for configuring high availability to protect SAP solutions in Azure. He also reviews the mistakes often made when implementing HA solutions in the cloud and key factors that users should know about when configuring SIOS LifeKeeper.

Configuring SAP High Availability Solutions in the Cloud

Bala explains that every SAP user should remember that a high availability solution is indispensable, especially in the cloud. Any cloud provider will need to make changes in their environments. Even though they have high service levels for their hardware infrastructure, there will be brief periods downtime that can bring your SAP systems down completely.

It is also critical that users configure SAP HA properly. The main purpose of installing HA solutions is to protect against downtime, but if you don’t do it properly, you are just wasting time and money, regardless of the cloud you’re running in. It is essential to follow the configuration rules of your cloud provider. If you misconfiguration your HA or fail to test failover and failback, it can result in a business disruption when you are least expecting it – particularly during a period of high-utilization.
SIOS LifeKeeper can detect errors during the configuration process. For example, it sends warnings if you only configure a single communication channel, as you always want a redundant communication channel, or a secondary network connection, between the nodes in the HA cluster. If you use SIOS DataKeeper, it will also show warnings if something is wrong with the configuration during the replication process.

What makes configuring SIOS straightforward?

SIOS has a pretty straightforward configuration process. Basically, you just need LifeKeeper installed in each of your cluster nodes and you use different types of SIOS application-specific recovery kits (ARK) modules (that come with LifeKeeper) depending on the application you want to recover. Also, the process is very easy to follow with a straightforward GUI – intelligence is built in, and you don’t need to change the details of the GUI. It automatically detects most of the information, further simplifying the set up process.

Knowing which ARK to use and how to use it is important in the configuring process. The ARK is a software module that provides application-specific intelligence to the LifeKeeper software. SIOS provides separate ARKs for different applications. For example, for SAP HANA, you install the SIOS SAP HANA ARK to enable LIfeKeeper to automate configuration steps, detect failures and manage a reliable failover for SAP HANA while maintaining SAP’s best practices.

Biggest Mistakes in Implementing HA for SAP in Azure

Users commonly implement HA for SAP solutions in Azure with the same process as they do in an on-premises environment. They need to change their mindset. Always make sure to follow the recommendations provided by the cloud provider, that is, read documents and keep the parameters as recommended by the cloud providers.

Another common mistake is adding too much complexity. Some customers put everything into a single cluster, but clusters should be separated for different servers. Making a cluster too large adds unnecessary complexity and potential risk.

Thorough testing in every aspect is critical when it comes to HA clustering. Testing HA configurations before going live as well as periodically (and frequently) are the best things you can do to prevent unexpected downtime.

Learn more about SAP high availability best practices in the video below or contact us to for more information about implementing high availability and disaster recovery for your essential applications in the cloud.

Reproduced with permission from SIOS

How to use Azure Site Recovery (ASR) to replicate a Windows Server Failover Cluster (WSFC) that uses SIOS DataKeeper for cluster storage

October 14, 2022 by Jason Aw Leave a Comment

How to use Azure Site Recovery (ASR) to replicate a Windows Server Failover Cluster (WSFC) that uses SIOS DataKeeper for cluster storage

Intro

So you have built a SQL Server Failover Cluster Instance (FCI), or maybe an SAP ASCS/ERS cluster in Azure. Each node of the cluster resides in a different Availability Zone (AZ), or maybe you have strict latency requirements and are using Placement Proximity Groups (PPG) and your nodes all reside in the same Availability Set. Regardless of the scenario, you now have a much higher level of availability for your business critical application than if you were just running a single instance.

Now that you have high availability (HA) covered, what are you going to do for disaster recovery? Regional disasters that take out multiple AZs are rare, but as recent history has shown us, Mother Nature can really pack a punch. You want to be prepared should an entire region go offline.

Azure Site Recovery (ASR) is Microsoft’s disaster recovery-as-a-service (DRaaS) offering that allows you to replicate entire VMs from one region to another. It can also replicate virtual machines and physical servers from on-prem into Azure, but for the purpose of this blog post we will focus on the Azure Region-to-Region DR capabilities.

Setting up Azure Site Recovery

We are going to assume you have already built your cluster using SIOS DataKeeper. If not, here are some pointers to help get you started.

Failover Cluster Instances with SQL Server on Azure VMs

SIOS DataKeeper Cluster Edition for the SAP ASCS/SCS cluster share disk

We are also going to assume you are familiar with Azure Site Recovery. Instead of yet another guide on setting up ASR, I suggest you read the latest documentation from Microsoft. This article will focus instead on some things you may not have considered and the specific steps required to fix your cluster after a failover to a different subnet.

Paired Regions

Before you start down the DR path, you should be aware of the concept of Azure Paired Regions. Every Region in Azure has a preferred DR Region. If you want to learn more about Paired Regions, the documentation provides a great background. There are some really good benefits of using your paired region, but it’s really up to you to decide on what region you want to use to host your DR site.

Cloud Witness Location

When you originally built your cluster you had to choose a witness type for your quorum. You may have selected a File Share Witness or a Cloud Witness. Typically either of those witness types should reside in an AZ that is separate from your cluster nodes.

However, when you consider that, in the event of a disaster, your entire cluster will be running in your DR region, there is a better option. You should use a cloud witness, and place it in your DR region. By placing your cloud witness in your DR region, you provide resiliency not only for local AZ failures, but it also protects you should the entire region fail and you have to use ASR to recover your cluster in the DR region. Through the magic of Dynamic Quorum and Dynamic Witness, you can be sure that even if your DR region goes offline temporarily, it will not impact your production cluster.

Multi-VM Consistency

When using ASR to replicate a cluster, it is important to enable Multi-VM Consistency to ensure that each cluster node’s recovery point is from the same point in time. That ensures that the DataKeeper block level replication occurring between the VMs will be able to continue after recovery without requiring a complete resync.

Crash Consistent Recovery Points

Application consistent recovery points are not supported in replicated clusters. When configuring the ASR replication options do not enable application consistent recovery points.

Keep IP Address After Failover?

When using ASR to replicate to your DR site there is a way to keep the IP address of the VMs the same. Microsoft described it in the article entitled Retain IP addresses during failover. If you can keep the IP address the same after failover it will simplify the recovery process since you won’t have to fix any cluster IP addresses or DataKeeper mirror endpoints, which are based on IP addresses.

However, in my experience, I have never seen anyone actually follow the guidance above, so recovering a cluster in a different subnet will require a few additional steps after recovery before you can bring the cluster online.

Your First Failover Attempt

Recovery Plan

Because you are using Multi-VM Consistency, you have to failover your VMs using a Recovery Plan. The documentation provides pretty straightforward guidance on how to do that. A Recovery Plan groups the VMs you want to recover together to ensure they all failover together. You can even add multiple groups of VMs to the same Recovery Plan to ensure that your entire infrastructure fails over in an orderly fashion.

A Recovery Plan can also launch post recovery scripts to help the failover complete the recovery successfully. The steps I describe below can all be scripted out as part of your Recovery Plan, thereby fully automating the complete recovery process. We will not be covering that process in this blog post, but Microsoft documents this process.

Static IP Addresses

As part of the recovery process you want to make sure the new VMs have static IP addresses. You will have to adjust the interface properties in the Azure Portal so that the VM always uses the same address. If you want to add a public IP address to the interface you should do so at this time as well.

Network Configuration

After the replicated VMs are successfully recovered in the DR site, the first thing you want to do is verify basic connectivity. Is the IP configuration correct? Are the instances using the right DNS server? Is name resolution functioning correctly? Can you ping the remote servers?

If there are any problems with network communications then the rest of the steps described below will be bound to fail. Don’t skip this step!

Load Balancer

As you probably know, clusters in Azure require you to configure a load balancer for client connectivity to work. The load balancer does not fail over as part of the Recovery Plan. You need to build a new load balancer based on the cluster that now resides in this new vNet. You can do this manually or script this as part of your Recovery Plan to happen automatically.

Network Security Groups

Running in this new subnet also means that you have to specify what Network Security Group you want to apply to these instances. You have to make sure the instances are able to communicate across the required ports. Again, you can do this manually, but it would be better to script this as part of your Recovery Plan.

Fix the IP Cluster Addresses

If you are unable to make the changes described earlier to recover your instances in the same subnet, you will have to complete the following steps to update your cluster IP addresses and the DataKeeper addresses for use in the new subnet.

Every cluster has a core cluster IP address. What you will see if you launch the WSFC UI after a failover is that the cluster won’t be able to connect. This is because the IP address used by the cluster is not valid in the new subnet.

If you open the properties of that IP Address resource you can change the IP address to something that works in the new subnet. Make sure to update the Network and Subnet Mask as well.

Once you fix that IP Address you will have to do the same thing for any other cluster address that you use in your cluster resources.

Fix the DataKeeper Mirror Addresses

SIOS DataKeeper mirrors use IP addresses as mirror endpoints. These are stored in the mirror and mirror job. If you recover a DataKeeper based cluster in a different subnet, you will see that the mirror comes up in a Resync Pending state. You will also notice that the Source IP and the Target IP reflect the original subnet, not the subnet of the DR site.

Fixing this issue involves running a command from SIOS called CHANGEMIRRORENDPOINTS. The usage for CHANGEMIRRORENDPOINTS is as follows.

emcmd <NEW source IP> CHANGEMIRRORENDPOINTS <volume letter> <ORIGINAL target IP> <NEW source IP> <NEW target IP>

In our example, the command and output looked like this.

After the command runs, the DataKeeper GUI will be updated to reflect the new IP addresses as shown below. The mirror will also go to a mirroring state.

Conclusions

You have now successfully configured and tested disaster recovery of your business critical applications using a combination of SIOS DataKeeper for high availability and Azure Site Recovery for disaster recovery. If you have questions, or would like to consult with SIOS to help you design and implement high availability and disaster recovery for your business critical applications like SQL Server, SAP ASCS and ERS, SAP HANA, Oracle or other business critical applications, please reach out to us.

Reproduced with permission from SIOS