failover cluster Archives - SIOS SANless clusters

Failover Cluster

December 18, 2021 by Jason Aw Leave a Comment

Failover Cluster

Failover Cluster Software Solutions: What You Need to Know

What is Failover?

The “Tell All” on Failover

Failover is the process by which a standby, redundant system, database, or network assumes operations when the primary system, database, or network fails, or primary operations are abnormally terminated. Hot failover is one of the key design principles incorporated into high availability and disaster recovery systems.

RTO is the maximum tolerable duration of any outage. Online transaction processing applications generally have the lowest RTOs, and those that are mission-critical often have an RTO of only a few seconds.

RPO is the maximum amount of data loss that can be tolerated when a failure happens. For HA, RPO is often zero to specify there should be zero data loss under all failure scenarios

Let’s Understand Failover

First, we need to discuss the difference between cold, warm, and hot standby servers:

A cold server (sometimes referred to as a cold failover) is one that is not connected to the primary server but is available and turned on only when the primary server goes down. With a cold server, it can take considerable time to power up the standby server, which may require an updated configuration and software. This means that Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) are the longest. Cold failover is considered unacceptable for mission-critical applications.
A warm server/failover is one that periodically receives updates from the primary server through data replication and mirroring.
A hot server/failover is one that receives regular updates from the primary server and is immediately available to take over in the event of a failover. Hot failover is the most resource-intensive, results in zero data loss (zero RPO) and RPOs of no more than a few minutes and is required to support high availability for mission-critical applications.

SIOS Failover Cluster Software supports hot failovers, delivering a zero RPO and an RTO of milliseconds for high availability. The results: a system failure results in no data loss and is transparent to the user.

SIOS Failover Cluster Software

SIOS offers failover cluster software so you can build a traditional shared storage cluster for high availability or a SIOS SANless cluster that uses local storage. SIOS software uses real-time synchronous (for LAN environments) or asynchronous (for WAN environments) data replication to synchronize storage.

SIOS solutions provide high availability and disaster recovery with a single solution. This approach eliminates the cost and complexity of SAN-based replication. With the ability to replicate to multiple targets, you can configure a multi-node failover cluster with nodes located in multiple locations to protect your systems from disasters.

With SIOS clusters, you can replicate between the configurations of your choice – between SAN and SANless environments and any combination of physical, virtual, cloud, and hybrid configurations. In fact, SIOS clustering solutions are unique in the breadth of operating systems, applications, and infrastructure environments supported, including Windows, Linux, SAP, SQL Server, Oracle, AWS, Azure, and Google cloud platforms.

SIOS failover — SIOS Failover Cluster Software uses efficient block-level replication to keep local storage synchronized, enabling the secondary nodes in your cluster to continue to operate after a failover with access to the most recent data.

In a Windows environment, SIOS DataKeeper gives you the flexibility to build a Windows cluster in any combination of physical, virtual, and cloud environments. SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism.

In a Linux environment, SIOS LifeKeeper and SIOS DataKeeper provide a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters.

Let’s review one case study that talks to the benefits of one of SIOS’ solutions – SIOS DataKeeper.

A SIOS Failover Cluster Software Solution in Action

Located in The Netherlands, Van de Lande BR (VDL) manufactures a wide range of PVC and PE compression fittings and valves. Their products are used all over the world in industrial and technical installations. Manufacturing more than 4500 different products, VDL is deeply committed to product improvement and quality, making their brand the choice for builders of systems and installations for more than 50 years.

VDL has a Hyper-V environment, solid state disk (SSD) storage, and its business relies heavily on its ERP solution. With only one data processing system in place, VDL was exposed and needed a disaster recovery (DR) solution to ensure the protection and availability of its ERP, web services, and other mission-critical systems in the event of a disaster.

To deliver immediate failover and DR protection, VDL built a Windows Server Failover Clustering (WSFC) system, with one node replicating data to the other node. If the primary node fails, WSFC transfers all operations to the standby node, giving users continuous access to applications and data.

VDL also chose SIOS DataKeeper Cluster Edition to provide DR for its Hyper-V virtual machines (VM). SIOS DataKeeper is a software add-on that provides uninterrupted data access. It seamlessly integrates with WSFC to add performance-optimized, host-based, synchronous or asynchronous real-time replication of Hyper-V VMs between physical servers across both LAN and WAN connections. Working with WSFC, SIOS DataKeeper monitors the system and application health, maintains client connectivity, and makes it possible to create SANless clusters. Unlike a SAN, a SANless cluster eliminates single points of failure and reduces the cost and complexity of deploying clusters.

SIOS DataKeeper also uses WSFC to provide system administrators with a familiar and application-agnostic HA/DR solution, thereby dramatically simplifying implementation and operation.

VDL deployed two SIOS DataKeeper clusters; one two-node cluster works as a file server and iSCSI server while the other supports a SQL Server (ERP) cluster and Dynamics NAV web services. The implementation took less than one day. During the system failover test, the network services team failed over and failed back the system quickly and easily. After a thorough evaluation of the VDL server configuration and the completion of testing, the installation team confirmed that SIOS DataKeeper and a SANless cluster met all their criteria for disaster recovery, performance, and high availability of their ERP system, web services, and other mission-critical applications. The organization no longer risks data loss in the event of a failure.

One Last Thing

In addition to testing SIOS, VDL tested other solutions with unacceptable results. Comments Maurits van de Lande, ICT Manager at VDL, “We have tested our file server with both DFS replication and AlwaysOn technology. Neither delivered an automated disaster recovery solution to match SIOS DataKeeper, which fully addresses our DR requirements.”

———————————————————————————————————————————-

Regardless of your IT environment, your organization can reap the benefits of SIOS DataKeeper Cluster Edition and DataKeeper Standard Edition, both of which provide configuration flexibility, reduce data transfer costs, eliminate single points of failure, reduce complexities, and optimize network performance.

For more information, contact us or request a free trial.

References:

Reproduced from SIOS

Glossary: Failover Cluster

June 6, 2021 by Jason Aw Leave a Comment

Glossary of Terms: Failover Cluster

Definition: A failover cluster is a group of servers that are configured to work together such that if one of the servers, or nodes, fails, another node in the cluster can take over operation of its application without any downtime to deliver high availability.

Reproduced from SIOS

Configure SQL Server 2008 R2 Failover Cluster Instance On Windows Server 2008 R2 In Azure

April 24, 2019 by Jason Aw Leave a Comment

Step-By-Step: How To Configure A SQL Server 2008 R2 Failover Cluster Instance On Windows Server 2008 R2 In Azure

Intro

On July 9, 2019, support for SQL Server 2008 and 2008 R2 will end. That means the end of regular security updates. However, if you move those SQL Server instances to Azure, Microsoft will give you three years of Extended Security Updates at no additional charge. If you are currently running SQL Server 2008/2008 R2 and you are unable to update to a later version of SQL Server before the July 9th deadline, you will want to take advantage of this offer rather than running the risk of facing a future security vulnerability. An unpatched instance of SQL Server could lead to data loss, downtime or a devastating data breach.

One of the challenges you will face when running SQL Server 2008/2008 R2 in Azure is ensuring high availability. On premises you may be running a SQL Server Failover Cluster (FCI) instance for high availability, or possibly you are running SQL Server in a virtual machine and are relying on VMware HA or a Hyper-V cluster for availability. When moving to Azure, none of those options are available. Downtime in Azure is a very real possibility that you must take steps to mitigate.

In order to mitigate the possibility of downtime and qualify for Azure’s 99.95% or 99.99% SLA, you have to leverage SIOS DataKeeper. DataKeeper overcomes Azure’s lack of shared storage and allows you to build a SQL Server FCI in Azure that leverages the locally attached storage on each instance. SIOS DataKeeper not only supports SQL Server 2008 R2 and Windows Server 2008 R2 as documented in this guide, it supports any version of Windows Server, from 2008 R2 through Windows Server 2019 and any version of SQL Server from from SQL Server 2008 through SQL Server 2019.

This guide will walk through the process of creating a two-node SQL Server 2008 R2 Failover Cluster Instance (FCI) in Azure, running on Windows Server 2008 R2. Although SIOS DataKeeper also supports clusters that span Availability Zones or Regions, this guide assumes each node resides in the same Azure Region, but different Fault Domains. SIOS DataKeeper will be used in place of the shared storage normally required to create a SQL Server 2008 R2 FCI.

Create The First SQL Server Instance In Azure

This guide will leverage the SQL Server 2008R2SP3 on Windows Server 2008R2 image that is published in the Azure Marketplace.

When you provision the first instance you will have to create a new Availability Set. During this process be sure to increase the number of Fault Domains to 3. This allows the two cluster nodes and the file share witness each to reside in their own Fault Domain.

Add additional disks to each instance. Premium or Ultra SSD are recommended. Disable caching on the disks used for the SQL log files. Enable read-only caching on the disk used for the SQL data files. Refer to Performance guidelines for SQL Server in Azure Virtual Machines for additional information on storage best practices.

If you don’t already have a virtual network configured, allow the creation wizard to create a new one for you.

Once the instance is created, go in to the IP configurations and make the Private IP address static. This is required for SIOS DataKeeper and is best practice for clustered instances.

Make sure that your virtual network is configured to set the DNS server to be a local Windows AD controller. This is to ensure you will be able to join the domain in a later step.

Create The End SQL Server Instance In Azure

Follow the same steps as above. Except be sure to place this instance in the same virtual network and Availability Set that you created with the 1st instance.

Create A File Share Witness (FSW) Instance

In order for the Windows Server Failover Cluster (WSFC) to work optimally you are required to create another Windows Server instance and place it in the same Availability Set as the SQL Server instances. By placing it in the same Availability Set, you ensure that each cluster node and the FSW reside in different Fault Domains. Thereby ensuring your cluster stays on line should an entire Fault Domain go off line. This instances does not require SQL Server. It can be a simple Windows Server as all it needs to do is host a simple file share.

This instance will host the file share witness required by WSFC. This instance does not need to be the same size, nor does it require any additional disks to be attached. It’s only purpose is to host a simple file share. It can in fact be used for other purposes. In my lab environment my FSW is also my domain controller.

Uninstall SQL Server 2008 R2

Each of the two SQL Server instances provisioned already have SQL Server 2008 R2 installed on them. However, they are installed as standalone SQL Server instances, not clustered instances. SQL Server must be uninstalled from each of these instances before we can install the cluster instance. The easiest way to do that is to run the SQL Setup as shown below.

When you run setup.exe /Action-RunDiscovery you will see everything that is preinstalled

setup.exe /Action-RunDiscovery

Running setup.exe /Action=Uninstall /FEATURES=SQL,AS,RS,IS,Tools /INSTANCENAME=MSSQLSERVER kicks off the uninstall process

setup.exe /Action=Uninstall /FEATURES=SQL,AS,RS,IS,Tools /INSTANCENAME=MSSQLSERVER

Running setup.exe /Action-RunDiscovery confirms the uninstallation completed

setup.exe /Action-RunDiscovery

Run this uninstallation process again on the 2nd instance.

Add Instances To The Domain

All three of these instances will need to be added to a Windows Domain.

Add Windows Failover Clustering Feature

The Failover Clustering Feature needs to be added to the two SQL Server instances

Add-WindowsFeature Failover-Clustering

Turn Off Windows Firewall

For simplicity sake, turn off the Windows Firewall during the installation and configuration of the SQL Server FCI. Consult Azure Network Security Best Practices for advice on securing your Azure resources. Details on required Windows ports can be found here , SQL Server ports here and SIOS DataKeeper ports here, The Internal Load Balancer that we will configure later also requires port 59999 access. So be sure to account for that in your security configuration.

NetSh Advfirewall set allprofiles state off

Install Convenience Rollup Update For Windows Server 2008 R2 SP1

There is a critical update ( kb2854082) that is required in order to configure a Windows Server 2008 R2 instance in Azure. That update and many more are included in the Convenience Rollup Update for Windows Server 2008 R2 SP1. Install this update on each of the two SQL Server instances.

Format The Storage

The additional disks that were attached when the two SQL Server instances were provisioned need to be formatted. Do the following for each volume on each instance.

Microsoft best practices says the following…

“NTFS allocation unit size: When formatting the data disk, it is recommended that you use a 64-KB allocation unit size for data and log files as well as TempDB.”

Run Cluster Validation

Run cluster validation to ensure everything is ready to be clustered.

Your report will contain WARNINGS about Storage and Networking. You can ignore those warnings as we know there are no shared disks and only a single network connection exists between the servers. You may also receive a warning about network binding order which can also be ignored. If you encounter any ERRORS you must address those before you continue.

Create The Cluster

Best practices for creating a cluster in Azure would be to use Powershell as shown below. Powershell allows us to specify a Static IP Address, whereas the GUI method does not. Unfortunately, Azure’s implementation of DHCP does not work well with Windows Server Failover Clustering. If you use the GUI method you will wind up with a duplicate IP address as the Cluster IP Address. It’s not the end of the world, but you will need to fix that as I show.

As I said, the Powershell method generally works best. However, for some reason, it seems to be failing on Windows Server 2008 R2 as shown below.

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.1.0.100 -NoStorage

You can try that method and if it works for you – great! I need to go back and investigate this a bit more to see if it was a fluke. Another option I need to explore if Powershell is not working is Cluster.exe. Running cluster /create /? gives the proper syntax to use for creating clusters with the deprecated cluster.exe command.

However, if Powershell or Cluster.exe fails you, the steps below illustrate how to create a cluster via the Windows Server Failover Clustering UI, including fixing the duplicate IP address that will be assigned to the cluster.

Remember, the name you specify here is just the Cluster Name Object (CNO). This is not the name that your SQL clients will use to connect to the cluster; we will define that during the SQL Server cluster setup in a later step.

At this point, the cluster is created, but you may not be able to connect to it with the Windows Server Failover Clustering UI due to the duplicate IP address problem.

Fix The Duplicate IP Address

As I mentioned earlier, if you create the cluster using the GUI, you are not given the opportunity to choose an IP address for the cluster. Because your instances are configured to use DHCP (required in Azure), the GUI wants to automatically assign you an IP address using DHCP. Unfortunately, Azure’s implementation of DHCP does not work as expected and the cluster will be assign the same address that is already being used by one of the nodes. Although the cluster will create properly, you will have a hard time connecting to the cluster until you fix this problem.

To fix this problem, from one of the nodes run the following command to ensure the Cluster service is started on that node.

Net start clussvc /fq

On that same node you should now be able to connect to the Windows Server Failover Clustering UI, where you will see the IP Address has failed to come online.

Open the properties of the Cluster IP address and change it from DHCP to Static, and assign it an unused IP address.

Bring the Name resource online

Add The File Share Witness

Next we need to add the File Share Witness. On the 3rd server we provisioned as the FSW, create a folder and share it as shown below. You will need to grant the Cluster Name Object (CNO) read/write permissions at both the Share and Security levels as shown below.

Once the share is created, run the Configure Cluster Quorum wizard on one of the cluster nodes and follow the steps illustrated below.

Create Service Account For DataKeeper

We are almost ready to install DataKeeper. However, before we do that you need to create a Domain account and add it to the Local Administrators group on each of the SQL Server cluster instances. We will specify this account when we install DataKeeper.

Install DataKeeper

Install DataKeeper on each of the two SQL Server cluster nodes as shown below.

This is where we will specify the Domain account we added to each of the local Domain Administrators group.

Configure DataKeeper

Once DataKeeper is installed on each of the two cluster nodes, you are ready to configure DataKeeper.

NOTE – The most common error encountered in the following steps is security related, most often by pre-existing Azure Security groups blocking required ports. Please refer to the SIOS documentation to ensure the servers can communicate over the required ports.

First, you must connect to each of the two nodes.

If everything is configured properly, you should then see the following in the Server Overview report.

Next, create a New Job and follow the steps illustrated below

Choose Yes here to register the DataKeeper Volume resource in Available Storage

Complete the above steps for each of the volumes. Once you are finished, you should see the following in the Windows Server Failover Clustering UI.

You are now ready to install SQL Server into the cluster.

NOTE – At this point the replicated volume is only accessible on the node that is currently hosting Available Storage. That is expected, so don’t worry!

Install SQL Server On The First Node

On the first node, run the SQL Server setup.

Choose New SQL Server Failover Cluster Installation and follow the steps as illustrated.

Choose only the options you need.

Please note, this document assumes you are using the Default instance of SQL Server. If you use a Named Instance, you need to make sure you lock down the port that it listens on, and use that port later on when you configure the load balancer. You also will need to create a load balancer rule for the SQL Server Browser Service (UDP 1434) in order to connect to a Named Instance. Neither of those two requirements are covered in this guide. But if you require a Named Instance, it will work if you do those two additional steps.

Here you will need to specify an unused IP address

Go to the Data Directories tab and relocate data and log files. At the end of this guide we talk about relocating tempdb to a non-mirrored DataKeeper Volume for optimal performance. For now, just keep it on one of the clustered disks.

Install SQL On Second Node

Run the SQL Server setup again on the second node. Then, choose Add node to a SQL Server Failover Cluster.

Congratulations, you are almost done! However, due to Azure’s lack of support for gratuitous ARP, we will need to configure an Internal Load Balancer (ILB) to assist with client redirection as shown in the following steps.

Update The SQL Cluster IP Address

In order for the ILB to function properly, you must run run the following command from one of the cluster nodes. It SQL Cluster IP enables the SQL Cluster IP address to respond to the ILB health probe while also setting the subnet mask to 255.255.255.255 in order to avoid IP address conflicts with the health probe.

cluster res <IPResourceName> /priv enabledhcp=0 address=<ILBIP> probeport=59999  subnetmask=255.255.255.255

NOTE – I don’t know if it is a fluke. On occasion I have run this command and it looks like it works, but it doesn’t complete the job and I have to start again. The way I can tell if it worked is by looking at the Subnet Mask of the SQL Server IP Resource. If it is not 255.255.255.255 then you know it didn’t run successfully. It may simply be a GUI refresh issue. Do try restarting the cluster GUI to verify the subnet mask was updated.

After it runs successfully, take the resource offline and bring it back online for the changes to take effect.

Create The Load Balancer

The final step is to create the load balancer. In this case we are assuming you are running the Default Instance of SQL Server, listening on port 1433.

The Private IP Address you define when you Create the load balancer will be the exact same address your SQL Server FCI uses.

Add just the two SQL Server instances to the backend pool. Do NOT add the FSW to the backend pool.

In this load balancing rule, you must enable Floating IP.

Test The Cluster

The most simple test is to open SQL Server Management Studio on the passive node and connect to the cluster. Congratulations! You did everything correctly as it connects! If you can’t connect, don’t fear. I wrote a blog article to help troubleshoot the issue. Managing the cluster is exactly the same as managing a traditional shared storage cluster. Everything is controlled through Failover Cluster Manager.

Optional – Relocate TempDB

For optimal performance it would be advisable to move tempdb to the local, non replicated, SSD. But, SQL Server 2008 R2 requires tempdb to be on a clustered disk. SIOS has a solution called a Non-Mirrored Volume Resource which addresses this issue. It would be advisable to create a non-mirrored volume resource of the local SSD drive and move tempdb there. Do note, the local SSD drive is non-persistent. You must take care to ensure the folder holding tempdb and the permissions on that folder are recreated each time the server reboots.

After you create the Non-Mirrored Volume Resource of the local SSD, follow the steps in this article to relocate tempdb. The startup script described in that article must be added to each cluster node.

Reproduced with permission from Clusteringformeremortals.com

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

April 14, 2019 by Jason Aw Leave a Comment

New Azure ILB Feature Allows You To Build A Multi-Instance SQL Server Failover Cluster

At Microsoft Ignite this past September, Microsoft made some announcements around Azure. One of these announcements was the general availability of multiple VIPs on internal load balancers. Why is this so important to a SQL Server DBA? Well, up until now if you want to deploy highly available SQL Server in Azure you were limited to a single SQL Server FCI per cluster or a single Availability Group listener.

This limitation forced you to deploy a new cluster for each instance of SQL Server you wanted to protect in a Failover Cluster. It also forced you to group all of your databases into a single Availability Group if you wanted automatic failover and client redirection in your AlwaysOn AG configuration.

How To Get Out Of These Restrictions?

Those restrictions have now been lifted with these new ILB features. In this post I am going to walk you through the process of deploying a SQL Server FCI in Azure that contains two SQL Server instances. In a future post I will walk you through the same process for SQL Server AlwaysOn AG.

Let’s Start With A Multi-Instance SQL Server Failover Cluster

Build a basic, single instance SQL Server FCI in Azure as I describe in my post Deploying Microsoft SQL Server 2014 Failover Clusters in Azure Resource Manager .

That post describes the process of creating the Multi-Instance SQL Server Failover Cluster. Using DataKeeper to create the replicated volume resources used in the cluster, try creating the Internal Load Balancer (ILB) and then fixing the SQL Server Cluster IP Resource to work with the ILB. If you want to skip that process and jumpstart your configuration you can always use the Azure Deployment Template that creates a 2-Node SQL Server FCI using SIOS DataKeeper

Assuming you now have a basic two node SQL Server FCI, the steps to add a 2^nd named instance are as follows:

Create another DataKeeper Volume Resource on another volume that is not currently being used. You may need to add additional disks to your Azure instance if you have no available volumes. As part of this volume creation process the new DataKeeper Volume resource will be registered in Available Storage in the cluster. Refer to the article referenced earlier for the details.
Install a named instance of SQL Server on the first node, specifying the DataKeeper Volume that we just created as the storage location.
“Add a node” to the cluster on the second node.
Lock down the port number of this new named instance to a port that is not in use. In my example I use port 1440.

Adjust ILB To Second Instance

Next we have to adjust the ILB to redirect traffic to this second instance. Here are the steps you need to follow:

Add a frontend IP address that is identical to the SQL cluster IP address you used for the second instance of SQL Server as shown below.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

Next, we will need to add another probe since the instances could be running on different servers. As shown below, I added a probe that probes port 59998 (instead of the usual 59999). We will need to make sure the new rules reference this probe. We will also need to remember that port number since we will need to update IP address associated with this instance during the last step of this process.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

Now we need to add two new rules to the ILB to direct traffic destined for this 2^ndinstance of SQL. Of course we need to add a rule to redirect TCP port 1440 (the port I used for the named instance of SQL), but because we are now using named instances we will also need to have a port to support the SQL Server Browser Service, UDP Port 1434.

In the picture below depicting the rule for the SQL Server Browser Service, take note that the Front End IP Address is referencing the new FrontendIP address (10.0.0.201), UDP port 1434 for both the Port and Backend Port. In the pool you will need to specify the two servers in the cluster, and finally make sure you choose the new Health Probe we just created.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

We will now add a rule for TCP/1440. As show in the picture below, add a new rule for port TCP 1440, or whatever port locked down for the named instance of SQL Server. Again, be sure to choose the new FrontEnd IP Address and the new Health Probe (59998). Also, make sure the Floating IP (direct server return) is enabled.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

The Last Step

Now that the load balancer is configured, the final step is to run the PowerShell script to update the new Cluster IP address associated with this 2^nd instance of SQL Server. This PowerShell script only needs to be run on one of the cluster nodes.

# Define variables

$ClusterNetworkName = “”

# the cluster network name 
(Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)

$IPResourceName = “”

# the IP Address resource name of the second instance of SQL Server

$ILBIP = “”

# the IP Address of the second instance of SQL, 
which should be the same as the new Frontend IP address as well

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:

Get-ClusterResource $IPResourceName | 
Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59998;
SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}

# If you are using Windows Server 2008 R2 use this:

#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59998  
subnetmask=255.255.255.255

You now have a fully functional multi-instance SQL Server FCI in Azure. Let me know if you have any questions to build a Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

Reproduced from Clusteringformeremortals.com

A Guide To Configure A SQL Server Failover Cluster Instance in Azure

March 31, 2019 by Jason Aw Leave a Comment

Step-By-Step: How To Configure A SQL Server 2008 R2 Failover Cluster Instance in Azure

If you need a guide Configure A SQL Server Failover Cluster Instance in Azure, you probably are still using SQL Server 2008/2008 R2. And, want to take advantage of the extended security updates that Microsoft is offering if you move your SQL Server 2008/2008 R2 into Azure. I previously wrote about this topic in this blog post.

You may be wondering how to make sure your SQL Server Failover Cluster instance remains highly available once you make the move to Azure. Today, most people have business critical SQL Server 2008/2008 R2 configured as a clustered instance (SQL Server FCI) in their data center. When looking at Azure you have probably come to the realization that due to the lack of shared storage it might seem that you can’t bring your SQL Server FCI to the Azure cloud. However, that is not the case thanks to SIOS DataKeeper.

SIOS DataKeeper enables you to build a SQL Server Failover Cluster instance in Azure, AWS, Google Cloud, or anywhere else where shared storage is not available or where you wish to configure multi-site clusters where shared storage doesn’t make sense. DataKeeper has been enabling SANless clusters for Windows and Linux since 1999. Microsoft documents the use of SIOS DataKeeper for SQL Server Failover Cluster instance in their documentation: High availability and disaster recovery for SQL Server in Azure Virtual Machines.

I’ve written about SQL Server FCI’s running in Azure before, But I never published a Step-by-Step Guide specific to SQL Server 2008/2008 R2. The good news is that it works just as great with SQL 2008/2008 R2 as it does with SQL 2012/2014/2016/2017 and the soon to be released 2019. Also, regardless of the version of Windows Server (2008/2012/2016/2019) or SQL Server (2008/2012/2014/2016/2017) the configuration process is similar enough that this guide should be sufficient enough to get you through any configurations.

If your flavor of SQL or Windows is not covered in any of my guides, don’t be afraid to jump in and build a SQL Server FCI and reference this guide, I think you will figure out any differences and if you ever get stuck just reach out to me on Twitter @daveberm and I’ll be glad to give you a hand.

This guide uses SQL Server 2008 R2 with Windows Server 2012 R2. As of the time of this writing I did not see an Azure Marketplace image of SQL 2008 R2 on Windows Server 2012 R2, so I had to download and install SQL 2008 R2 manually. Personally I prefer this combination, but if you need to use Windows Server 2008 R2 or Windows 212 that is fine. If you use Windows Server 2008 R2 don’t forget to install the kb3125574Convenience Rollup Update for Windows Server 2008 R2 SP1. Or if you are stuck with Server 2012 (not R2) you need the Hotfix in kb2854082.

Don’t be fooled by this article that says you must install kb2854082 on your SQL Server 2008 R2 instances. If you start searching for that update for Windows Server 2008 R2 you will find that only the version for Server 2012 is available. That particular hotfix for Server 2008 R2 is instead included in the rollup Convenience Rollup Update for Windows Server 2008 R2 SP1.

PROVISION AZURE INSTANCES

I’m not going to go into great detail here with a bunch of screenshots, especially since the Azure Portal UI tends to change pretty frequently, so any screenshots I take will get stale pretty quickly. Instead, I will just cover the important topics that you should be aware of.

FAULT DOMAINS OR AVAILABILITY ZONES?

In order to ensure your SQL Server instances are highly available, you have to make sure your cluster nodes reside in different Fault Domains (FD) or in different Availability Zones (AZ). Not only do your instances need to reside in different FDs or AZs, but your File Share Witness (see below) also needs to reside in a FD or AZ that is different than that one your cluster nodes reside in.

Here is my take on it. AZs are the newest Azure feature, but they are only supported in a handful of regions so far. AZs give you a higher SLA (99.99%) then FDs (99.95%), and protect you against the kind of cloud outages I describe in my post Azure Outage Post-Mortem. If you can deploy in a region that supports AZs then I recommend you use AZs.

In this guide I used AZs which you will see when you get to the section on configuring the load balancer. However, if you use FDs everything will be exactly the same, except the load balancer configuration will reference Availability Sets rather than Availability Zones.

WHAT IS A FILE SHARE WITNESS YOU ASK?

Without going into great detail, Windows Server Failover Clustering (WSFC) requires you configure a “Witness” to ensure failover behaves properly. Windows Server Failover Clustering supports three kinds of witnesses: Disk, File Share, Cloud. Since we are in Azure a Disk Witness is not possible. Cloud Witness is only available with Windows Server 2016 and later, so that leaves us with a File Share Witness. If you want to learn more about cluster quorums check out my post on the Microsoft Press Blog, From the MVPs: Understanding the Windows Server Failover Cluster Quorum in Windows Server 2012 R2

ADD STORAGE TO YOUR SQL SERVER INSTANCES

As you provision your SQL Server instances you will want to add additional disks to each instance. Minimally you will need one disk for the SQL Data and Log file, one disk for Tempdb. Whether or not you should have a separate disk for log and data files is somewhat debated when running in the cloud. On the back end the storage all comes from the same place and your instance size limits your total IOPS. In my opinion there really isn’t any value in separating your log and data files since you cannot ensure that they are running on two physical sets of disks. I’ll leave that for you to decide, but I put log and data all on the same volume.

Normally a SQL Server 2008 R2 FCI would require you to put tempdb on a clustered disk. However, SIOS DataKeeper has this really nifty feature called a DataKeeper Non-Mirrored Volume Resource. This guide does not cover moving tempdb to this non-mirrored volume resource, but for optimal performance you should do this. There really is no good reason to replicate tempdb since it is recreated upon failover anyway.

As far as the storage is concerned you can use any storage type, but certainly use Managed Disks whenever possible. Make sure each node in the cluster has the identical storage configuration. Once you launch the instances you will want to attach these disks and format them NTFS. Make sure each instance uses the same drive letters.

NETWORKING

It’s not a hard requirement, but if at all possible use an instance size that supports accelerated networking. Also, make sure you edit the network interface in the Azure portal so that your instances use a static IP address. For clustering to work properly you want to make sure you update the settings for the DNS server so that it points to your Windows AD/DNS server and not just some public DNS server.

SECURITY

By default, the communications between nodes in the same virtual network are wide open, but if you have locked down your Azure Security Group you will need to know what ports must be open between the cluster nodes and adjust your security group. In my experience, almost all the issues you will encounter when building a cluster in Azure are either caused by blocked ports.

DataKeeper has some some ports that are required to be open between the clustered instance. Those ports are as follows:
UDP: 137, 138
TCP: 139, 445, 9999, plus ports in the 10000 to 10025 range

Failover cluster has its own set of port requirements that I won’t even attempt to document here. This article seems to have that covered. http://dsfnet.blogspot.com/2013/04/windows-server-clustering-sql-server.html

In addition, the Load Balancer described later will use a probe port that must allow inbound traffic on each node. The port that is commonly used and described in this guide is 59999.

And finally if you want your clients to be able to reach your SQL Server instance you want to make sure your SQL Server port is open, which by default is 1433.

Remember, these ports can be blocked by the Windows Firewall or Azure Security Groups, so to be sure to check both to ensure they are accessible.

JOIN THE DOMAIN

A requirement for SQL Server 2008 R2 FCI is that the instances must reside in the same Windows Server Domain. So if you have not done so, make sure you have joined the instances to your Windows domain

LOCAL SERVICE ACCOUNT

When you install DataKeeper, it will ask you to provide a service account. You must create a domain user account and then add that user account to the Local Administrators Group on each node. When asked during the DataKeeper installation, specify that account as the DataKeeper service account. Note – Don’t install DataKeeper just yet!

DOMAIN GLOBAL SECURITY GROUPS

You will be asked to specify two Global Domain Security Groups as you install SQL 2008 R2. You might want to look ahead at the SQL install instructions and create those groups now. Also, create a domain user account and place them in each of these security accounts. You will specify this account as part of the SQL Server Cluster installation.

OTHER PRE-REQUISITES

You must enable both Failover Clustering and .Net 3.5 on each instance of the two cluster instances. When you enable Failover Clustering, also be sure to enable the optional “Failover Cluster Automation Server”. This is required for a SQL Server 2008 R2 cluster in Windows Server 2012 R2.

CREATE THE CLUSTER AND DATAKEEPER VOLUME RESOURCES

We are now ready to start building the cluster. The first step is to create the base cluster. Because of the way Azure handles DHCP, we MUST create the cluster using Powershell and not the Cluster UI. We use Powershell because it will let us specify a static IP address as part of the creation process. If we used the UI, it would see that the VMs use DHCP and it will automatically assign a duplicate IP address. Therefore to avoid that situation, let’s use the Powershell as shown below.

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.100 -NoStorage

After the cluster creates, run Test-Cluster. This is required before SQL Server will install.

Test-Cluster

You will get warnings about Storage and Networking. Thankfully, you can ignore those as they are expected in a SANless cluster in Azure. However, address any other warnings or errors before moving on.

After the cluster is created, you will need to add the File Share Witness. On the third server we specified as the file share witness, create a file share and give Read/Write permissions to the cluster computer object we just created above. In this case $Cluster1 will be the name of the computer object that needs Read/Write permissions at both the share and NTFS security level.

Once the share is created, you can use the Configure Cluster Quorum Wizard as shown below to configure the File Share Witness.

INSTALL DATAKEEPER

It is important to wait until the basic cluster is created before we install DataKeeper, since the DataKeeper installation registers the DataKeeper Volume Resource type in failover clustering. If you jumped the gun and installed DataKeeper already that is okay. Simply run the setup again and choose Repair Installation.

The screenshots below walk you through a basic installation. Start by running the DataKeeper Setup.

The account you specify below must be a domain account. It must be part of the Local Administrators group on each of the cluster nodes.

When presented with the SIOS License Key manager you can browse out to your temporary key. Or if you have a permanent key, you can copy the System Host ID and use that to request your permanent license. If you ever need to refresh a key, the SIOS License Key Manager is a program that will be installed that you can run separately to add a new key.

CREATE DATAKEEPER VOLUME RESOURCE

Once DataKeeper is installed on each node you are ready to create your first DataKeeper Volume Resource. The first step is to open the DataKeeper UI and connect to each of the cluster nodes.

If everything is done correctly the Server Overview Report should look something like this.

You can now create your first Job as shown below.

After you choose a Source and Target you are presented with the following options. For a local target in the same region, the only thing you need to select is Synchronous.

Choose Yes and auto-register this volume as a cluster resource.

Once you complete this process open up the Failover Cluster Manager and look in Disk. You should see the DataKeeper Volume resource in Available Storage. At this point WSFC treats this as if it were a normal cluster disk resource.

SLIPSTREAM SP3 ONTO SQL 2008 R2 INSTALL MEDIA

SQL Server 2008 R2 is only supported on Windows Server 2012 R2 with SQL Server SP2 or later. Unfortunately, Microsoft never released a SQL Server 2008 R2 installation media that that includes SP2 or SP3. Instead, you must slipstream the service pack onto the installation media BEFORE you do the installation. If you try to do the installation with the standard SQL Server 2008 R2 media, you will run into all kinds of problems. I don’t remember the exact errors you will see. But I do recall they didn’t really point to the exact problem. You will waste a lot of time trying to figure out what went wrong.

As of the date of this writing, Microsoft does not have a Windows Server 2012 R2 with SQL Server 2008 R2 offering in the Azure Marketplace. Do bring your own SQL license if you want to run SQL 2008 R2 on Windows Server 2012 R2 in Azure. If they add that image later, or if you choose to use the SQL 2008 R2 on Windows Server 2008 R2 image, you must first uninstall the existing standalone instance of SQL Server before moving forward.

I followed the guidance in Option 1 of this article to slipstream SP3 on onto my SQL 2008 R2 installation media. You will of course have to adjust a few things as this article references SP2 instead of SP3. Make sure you slipstream SP3 on the installation media we will use for both nodes of the cluster. Once that is done, continue to the next step.

INSTALL SQL SERVER ON THE FIRST NODE

Using the SQL Server 2008 R2 media with SP3 slipstreamed, run setup and install the first node of the cluster as shown below.

If you use anything other than the Default instance of SQL Server, you will have some additional steps not covered in this guide. The biggest difference is you must lock down the port that SQL Server uses since by default a named instance of SQL Server does NOT use 1433. Once you lock down the port you also need to specify that port instead of 1433 whenever we reference port 1433 in this guide, including the firewall setting and the Load Balancer settings.

Here make sure to specify a new IP address that is not in use. This is the same IP address we will use later when we configure the Internal Load Balancer later.

As I mentioned earlier, SQL Server 2008 R2 utilizes AD Security Groups. If you have not already created them, go ahead and create them now as show below before you continue to the next step in the SQL install

Specify the Security Groups you created earlier.

Make sure the service accounts you specify are a member of the associated Security Group.

Specify your SQL Server administrators here.

If everything goes well you are now ready to install SQL Server on the second node of the cluster.

INSTALL SQL SERVER ON THE SECOND NODE

One the second node, run the SQL Server 2008 R2 with SP3 install and select Add Node to a SQL Server Failover Clustering Instance.

Proceed with the installation as shown in the following screenshots.

Assuming everything went well, you should now have a two node SQL Server 2008 R2 cluster configured that looks something like the following.

However, you probably will notice that you can only connect to the SQL Server instance from the active cluster node. The problem is that Azure does not support gratuitous ARP .Your clients probably cannot connect directly to the Cluster IP Address. Instead, the clients must connect to an Azure Load Balancer, which will redirect the connection to the active node. To make this work there are two steps: Create the Load Balancer and Fix the SQL Server Cluster IP to respond to the Load Balancer Probe and use a 255.255.255.255 Subnet mask. Those steps are described below.

CREATE THE AZURE LOAD BALANCER

I’m going to assume your clients can communicate directly to the internal IP address of the SQL cluster. Let’s go ahead to create an Internal Load Balancer (ILB) in this guide. If you need to expose your SQL Instance on the public internet, use a Public Load Balancer instead.

In the Azure portal, create a new Load Balancer following the screenshots as shown below. The Azure portal UI changes rapidly. Bbut these screenshots should give you enough information to do what you need to do. I will call out important settings as we go along.

Here we create the ILB. The important thing to note on this screen is you must select “Static IP address assignment”. Specify the same IP address that we used during the SQL Cluster installation too.

Since I used Availability Zones, I see Zone Redundant as an option. If you used Availability Sets your experience will be slightly different.

In the Backend pool be sure to select the two SQL Server instances. You DO NOT want to add your File Share Witness in the pool.

Here we configure the Health Probe. Most Azure documentation uses port 59999, so we will stick with that port for our configuration.

Then we will add a load balancing rule. In our case we want to redirect all SQL Server traffic to TCP port 1433 of the active node. It is also important that you select Floating IP (Direct Server Return) as Enabled.

RUN POWERSHELL SCRIPT TO UPDATE SQL CLIENT ACCESS POINT

Now we must run a Powershell script on one of the cluster nodes to allow the Load Balancer Probe to detect which node is active. The script also sets the Subnet Mask of the SQL Cluster IP Address to 255.255.255.255.255 so that it avoids IP address conflicts with the Load Balancer we just created.

# Define variables
$ClusterNetworkName = “” 
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of 
higher to find the name)
$IPResourceName = “” 
# the IP Address resource name 
$ILBIP = “” 
# the IP Address of the Internal Load Balancer (ILB) and SQL Cluster
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter 
-Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";
Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  
subnetmask=255.255.255.255

This is what the output will look like if run correctly.

You probably notice that the end of that script has a commented line of code to use if you are running on Windows Server 2008 R2. Running Windows Server 2008 R2? Ensure you run the code specific for Windows Server 2008 R2 at a Command prompt, it is not Powershell.

NEXT STEPS

You’re not the first if you get to this point and you still cannot connect to the cluster remotely. There are a lot of things that can go wrong in terms of security, load balancer, SQL ports, etc. I wrote this guide to help troubleshoot connection issues.

In fact, I ran into some strange issues in terms of my SQL Server TCP/IP Properties in SQL Server Configuration Manager. When I looked at the properties I did not see the SQL Server Cluster IP address as one of the addresses it was listening on. As such I had to add it manually. I’m not sure if that was an anomaly. Although it certainly was an issue I had to resolve before I could connect to the cluster from a remote client.

As I mentioned earlier, one other improvement you can make to this installation is to use a DataKeeper Non-Mirrored Volume Resource for TempDB. If you set that up please be aware of the following two configuration issues people commonly run into.

The first issue is if you move tempdb to a folder on the 1st node, you must be sure to create the exact same folder structure on the second node. If you don’t do that, when you try to failover SQL Server will fail to come online since it can’t create TempDB.

The second issue occurs anytime you add another DataKeeper Volume Resource to a SQL Cluster after the cluster is created. You must go into the properties of the SQL Server cluster resource and make it dependent on the new DataKeeper Volume resource you added. This is true for the TempDB volume and any other volumes you may decide to add after the cluster is created.

If you have any questions about this configuration or any other cluster configurations please feel free to reach out to me on Twitter @DaveBerm

Reproduced with permission from Clusteringformeremortals.com