#SANLess Clusters for SQL Server Environments Archives

Deployment of a SQL Server Failover Cluster Instance on Huawei Cloud

September 28, 2021 by Jason Aw

*DISCLAIMER: While the following completely covers the high availability portion within the scope of our product, this is a setup “guide” only and should be adapted to your own configuration.

Overview

HUAWEI CLOUD is a leading cloud service provider in China with a global footprint of datacenters around the world. It brings together Huawei’s 30-plus years of expertise in ICT infrastructure products and solutions and is committed to providing reliable, secure, and cost-effective cloud services to empower applications, harness the power of data, and help organizations of all sizes grow in today’s intelligent world. HUAWEI CLOUD is also committed to bringing affordable, effective, and reliable cloud and AI services through technological innovation.

DataKeeper Cluster Edition provides replication in a virtual private cloud (VPC) within a single region across availability zones in Huawei Cloud. In this particular SQL Server clustering example, we will launch four instances (one domain controller instance, two SQL Server instances and a quorum/witness instance) into three availability zones.

Huawei Cloud SIOS Datakeeper HA Architecture

DataKeeper Cluster Edition provides support for a data replication node outside of the cluster with all nodes in Huawei Cloud. In this particular SQL Server clustering example, four instances are launched (one domain controller instance, two SQL Server instances and a quorum/witness instance) into three availability zones. An additional DataKeeper instance is then launched in a second region, with a VPN instance in each region. Please see Configuration of Data Replication From a Cluster Node to External DR Site for more information. For additional information on using multiple regions, please see Connecting Two VPCs in Different Regions.

Huawei Cloud SIOS Datakeeper DR architecture

DataKeeper Cluster Edition also provides support for a data replication node outside of the cluster with only the node outside of the cluster in Huawei Cloud. In this particular SQL Server clustering example, WSFC1 and WSFC2 are in an on-site cluster replicating to a Huawei Cloud instance. Then an additional DataKeeper instance is launched in a region in Huawei Cloud. Please see Configuration of Data Replication From a Cluster Node to External DR Site for more information.

Huawei Cloud SIOS Datakeeper Hybrid DR Architecture

Requirements

Description	Requirement
Virtual Private Cloud	In a single region with three availability zones
Instance Type	Minimum recommended instance type: s3.large.2
Operating System	See the DKCE Support Matrix
Elastic IP	One elastic IP address connected to the domain controller
Four instances	One domain controller instance, two SQL Server instances and one quorum/witness instance
Each SQL Server	ENI (Elastic Network Interface) with four IPs: one primary ENI IP, statically defined in Windows and used by DataKeeper Cluster Edition; three IPs maintained by ECS and used by Windows Failover Clustering, DTC and SQLFC
Volumes	Three volumes (EVS and NTFS only): one primary volume (C drive) and two additional volumes, one for Failover Clustering and one for MSDTC

Release Notes

Before beginning, make sure you read the DataKeeper Cluster Edition Release Notes for the latest information. It is highly recommended that you read and understand the DataKeeper Cluster Edition Installation Guide.

Create a Virtual Private Cloud (VPC)

A virtual private cloud is the first object you create when using DataKeeper Cluster Edition.

*A Virtual Private Cloud (VPC) is an isolated private cloud consisting of a configurable pool of shared computing resources in a public cloud.

Using the email address and password specified when signing up for Huawei Cloud, sign in to the Huawei Cloud Management Console.
From the Services dropdown, select Virtual Private Cloud.

On the right side of the screen, click on Create VPC and select the region that you want to use.
Input the name that you want to use for the VPC.
Define your virtual private cloud subnet by entering your CIDR (Classless Inter-Domain Routing) block.
Input the subnet name, then click Create Now.

*A Route Table will automatically be created with a “main” association to the new VPC. You can use it later or create another Route Table.

*HELPFUL LINK:
Huawei’s Creating a Virtual Private Cloud (VPC)
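
Before entering CIDR values in the console, it can help to sanity-check them offline. The following is a minimal sketch using Python's standard ipaddress module. The two CIDR blocks are assumptions chosen to match the 15.0.1.x example addresses used later in this guide, and check_subnet is a hypothetical helper, not part of any Huawei Cloud or SIOS tool.

```python
import ipaddress

def check_subnet(vpc_cidr, subnet_cidr):
    """Confirm the subnet lies inside the VPC and report how many addresses it spans."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    if not subnet.subnet_of(vpc):
        raise ValueError(f"{subnet} is not inside {vpc}")
    return subnet.num_addresses

# Hypothetical CIDR blocks for illustration only.
size = check_subnet("15.0.0.0/16", "15.0.1.0/24")
print(size)
```

The count returned is the raw size of the block. The cloud provider may reserve some addresses in each subnet, so treat it as an upper bound when planning instance and virtual IP addresses.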

Launch an Instance

The following walks you through launching an instance into your subnet. You will want to launch two instances into one availability zone, one for your domain controller instance and one for your SQL instance. Then you will launch another SQL instance into another availability zone and a quorum witness instance into yet another availability zone.

*HELPFUL LINKS:
Huawei Cloud ECS Instances

Using the email address and password specified when signing up for Huawei Cloud, sign in to the Huawei Cloud Management Console.
From the Service List dropdown, select Elastic Cloud Server.

Click the Buy ECS button and choose the Billing Mode, Region and AZ (Availability Zone) in which to deploy the instance.
Select your Instance Type. (Note: Select s3.large.2 or larger.)
Choose an Image. Under Public Image, select the Windows Server 2019 Datacenter 64bit English image.
1. For Configure Network, select your VPC.
2. For Subnet, select the subnet that you want to use, select Manually-specified IP address and input the IP address that you want to use.
3. Select the Security Group to use, or click Edit and select an existing one.
4. Assign an EIP if you need the ECS instance to access the internet.
5. Click Configure Advanced Settings and provide a name for the ECS, use Password for Login Mode and provide a secure password for Administrator login.
6. Under Advanced Options, click Configure Now, add a Tag to name your instance and click Confirm.
Perform a final review of the instance and click Submit.

*IMPORTANT: Make a note of this initial administrator password. It will be needed to log on to your instance.

Repeat the above steps for all instances.

Connect to Instances

You can connect to your domain controller instance via Remote Login from the ECS pane.

*BEST PRACTICE: Once logged on, it is best practice to change your password.

Configure the Domain Controller Instance

Now that the instances have been created, we start by setting up the domain controller instance.

This guide is not a tutorial on how to set up an Active Directory server instance. We recommend reading articles on how to set up and configure an Active Directory server. It is very important to understand that even though the instance is running in Huawei Cloud, this is a regular installation of Active Directory.

Static IP Addresses

Configure Static IP Addresses for your Instances

Connect to your domain controller instance.
Click Start/ Control Panel.
Click Network and Sharing Center.
Select your network interface.
Click Properties.
Click Internet Protocol Version 4 (TCP/IPv4), then Properties.
Obtain your current IPv4 address, default gateway and DNS server for the network interface from Huawei Cloud.
In the Internet Protocol Version 4 (TCP/IPv4) Properties dialog box, under Use the following IP address, enter your IPv4 address.
In the Subnet mask box, type the subnet mask associated with your virtual private cloud subnet.
In the Default Gateway box, type the IP address of the default gateway and then click OK.
For the Preferred DNS Server, enter the Primary IP Address of your Domain Controller (ex. 15.0.1.72).
Click OK, then select Close. Exit Network and Sharing Center.
Repeat the above steps on your other instances.
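
The values typed into the TCP/IPv4 dialog have to agree with each other: the subnet mask follows from the subnet's prefix length, and the default gateway must sit inside the same subnet. A minimal sketch of that check is shown here. The /24 prefix and the 15.0.1.1 gateway are assumptions for illustration, and check_static_config is a hypothetical helper.

```python
import ipaddress

def check_static_config(address, prefix_len, gateway):
    """Return (problems, netmask) for a planned static IPv4 configuration."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    net = iface.network
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw not in net:
        problems.append("default gateway is outside the subnet")
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append("address is the network or broadcast address")
    if gw == iface.ip:
        problems.append("address duplicates the gateway")
    return problems, str(net.netmask)

# Hypothetical values; use the address, prefix and gateway shown for your own NIC.
problems, netmask = check_static_config("15.0.1.25", 24, "15.0.1.1")
print(problems, netmask)
```

An empty problem list only means the three values are mutually consistent. It does not confirm that the address is the one actually assigned to the instance in the console.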

Join the Two SQL Instances and the Witness Instance to Domain

*Before attempting to join a domain make these network adjustments. On your network adapter, Add/Change the Preferred DNS server to the new Domain Controller address and its DNS server. Use ipconfig /flushdns to refresh the DNS search list after this change. Do this before attempting to join the Domain.

*Ensure that Core Networking and File and Printer Sharing options are permitted in Windows Firewall.

On each instance, click Start, then right-click Computer and select Properties.
On the far right, select Change Settings.
Click on Change.
Enter a new Computer Name.
Select Domain.
Enter the Domain Name (ex. docs.huawei.com).
Click Apply.

*Use Control Panel to make sure all instances are using the correct time zone for your location.

*BEST PRACTICE: It is recommended that the System Page File is set to system managed (not automatic) and to always use the C: drive.

Control Panel > Advanced system settings > Performance > Settings > Advanced > Virtual Memory. Select System managed size, Volume C: only, then select Set to save.

Assign Secondary Private IPs to the Two SQL Instances

In addition to the Primary IP, you will need to add three additional IPs (Secondary IPs) to the elastic network interface for each SQL instance.

From the Service List dropdown, select Elastic Cloud Server.
Click the instance for which you want to add secondary private IP addresses.
Select NICs > Manage Virtual IP Address.
Click Assign Virtual IP Address, select Manual, and enter an IP address that is within the subnet range for the instance (ex. For 15.0.1.25, enter 15.0.1.26). Click OK.
Click the More dropdown on the IP address row, select Bind to Server, then select the server and the NIC to bind the IP address to.
Click OK to save your work.
Perform the above on both SQL Instances.

*HELPFUL LINKS:
Managing Virtual IP Addresses
Binding a Virtual IP Address to an EIP or ECS
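
Because each SQL instance needs three secondary addresses with distinct purposes, it is worth writing the plan down before clicking through the console. The sketch below proposes the three addresses that follow the primary IP and checks that they fall inside the subnet. The subnet and the choice of consecutive addresses are assumptions, and plan_secondary_ips is a hypothetical helper. It does not check whether an address is already in use.

```python
import ipaddress

# Order used in this guide: first secondary IP for Windows Failover Clustering,
# second for MSDTC, third for the SQL Failover Cluster.
ROLES = ("WSFC", "MSDTC", "SQLFC")

def plan_secondary_ips(primary, subnet):
    """Propose one consecutive address per role after the primary IP."""
    net = ipaddress.ip_network(subnet)
    first = ipaddress.ip_address(primary)
    candidates = [first + offset for offset in range(1, len(ROLES) + 1)]
    for ip in candidates:
        if ip not in net or ip == net.broadcast_address:
            raise ValueError(f"{ip} is not usable in {net}")
    return dict(zip(ROLES, (str(ip) for ip in candidates)))

# Hypothetical subnet; the primary IP matches the example above.
plan = plan_secondary_ips("15.0.1.25", "15.0.1.0/24")
print(plan)
```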

Create and Attach Volumes

DataKeeper is a block-level volume replication solution and requires that each node in the cluster have additional volume(s) (other than the system drive) that are the same size and same drive letters. Please review Volume Considerations for additional information regarding storage requirements.

Create Volumes

Create two volumes in each availability zone for each SQL server instance, a total of four volumes.

From the Service List dropdown, select Elastic Cloud Server.
Click the instance that you want to manage.
Go to the Disks tab.
Click Add Disk to add a new volume of the type and size you choose. Make sure you create the volume in the same AZ as the SQL server that you intend to attach it to.
Select the check box to agree to the SLA and click Submit.
Click Back to Server Console.
Attach the disk to the SQL instance if necessary.
Do this for all four volumes.

*HELPFUL LINKS:
Elastic Volume Service
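
Since the additional volumes must match in size and drive letter on every node, a quick comparison of the planned layouts can catch mistakes before the mirrors are created. This is a minimal sketch over plain dictionaries. The drive letters and sizes are made-up examples, and volume_mismatches is a hypothetical helper rather than a DataKeeper command.

```python
def volume_mismatches(node_a, node_b):
    """Compare two {drive_letter: size_gb} maps and list any differences."""
    problems = []
    for letter in sorted(set(node_a) | set(node_b)):
        size_a, size_b = node_a.get(letter), node_b.get(letter)
        if size_a is None or size_b is None:
            problems.append(f"{letter}: missing on one node")
        elif size_a != size_b:
            problems.append(f"{letter}: {size_a} GB vs {size_b} GB")
    return problems

# Hypothetical layouts for the two SQL instances.
first_node = {"D": 100, "E": 20}
second_node = {"D": 100, "E": 10}
print(volume_mismatches(first_node, second_node))
```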

Configure the Cluster

Prior to installing DataKeeper Cluster Edition, it is important to have Windows Server configured as a cluster using either a node majority quorum (if there is an odd number of nodes) or a node and file share majority quorum (if there is an even number of nodes). Consult the Microsoft documentation on clustering in addition to this topic for step-by-step instructions. Note: Microsoft released a hotfix for Windows 2008R2 that allows disabling of a node’s vote which may help achieve a higher level of availability in certain multi-site cluster configurations.
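
The odd/even rule above comes down to simple vote counting: a cluster stays up while more than half of its votes are present. The following sketch illustrates that arithmetic only. The function names are hypothetical, and it does not model dynamic quorum or per-node vote adjustments.

```python
def recommended_quorum(node_count):
    """Pick the quorum mode described above from the number of cluster nodes."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        return "Node Majority"
    return "Node and File Share Majority"

def votes_needed(node_count, has_witness):
    """Smallest number of votes that is more than half of all votes."""
    total_votes = node_count + (1 if has_witness else 0)
    return total_votes // 2 + 1

print(recommended_quorum(2), votes_needed(2, has_witness=True))
```

For the two SQL instances in this guide, adding the file share witness gives three votes, so the cluster can lose any one of them and still hold a majority. Without the witness, both nodes would have to be present.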

Add Failover Clustering

Add the Failover Clustering feature to both SQL instances.

Launch Server Manager.
Select Features in the left pane and click Add Features in the Features pane. This starts the Add Features Wizard.
Select Failover Clustering.
Select Install.

Validate a Configuration

Open Failover Cluster Manager.
Select Failover Cluster Manager, then select Validate a Configuration.
Click Next, then add your two SQL instances.

Note: To search, select Browse, then click on Advanced and Find Now. This will list available instances.

Click Next.
Select Run Only Tests I Select and click Next.
In the Test Selection screen, deselect Storage and click Next.
At the resulting confirmation screen, click Next.
Review Validation Summary Report then click Finish.

Create Cluster

In Failover Cluster Manager, click on Create a Cluster then click Next.
Enter your two SQL instances.
On the Validation Warning page, select No then click Next.
On the Access Point for Administering the Cluster page, enter a unique name for your WSFC Cluster. Then enter the Failover Clustering IP address for each node involved in the cluster. This is the first of the three secondary IP addresses added previously to each instance.
IMPORTANT! Uncheck the “Add all available storage to the cluster” checkbox. DataKeeper mirrored drives must not be managed natively by the cluster. They will be managed as DataKeeper Volumes.
Click Next on the Confirmation page.
On the Summary page, review any warnings then select Finish.

Configure Quorum/Witness

Create a folder on your quorum/witness instance (witness).
Share the folder.
1. Right-click folder and select Share With / Specific People….
2. From the dropdown, select Everyone and click Add.
3. Under Permission Level, select Read/Write.
4. Click Share, then Done. (Make note of the path of this file share to be used below.)
In Failover Cluster Manager, right-click the cluster and choose More Actions and Configure Cluster Quorum Settings. Click Next.
On the Select Quorum Configuration page, choose Node and File Share Majority and click Next.
On the Configure File Share Witness screen, enter the path to the file share previously created and click Next.
On the Confirmation page, click Next.
On the Summary page, click Finish.

Install and Configure DataKeeper

After the basic cluster is configured but prior to any cluster resources being created, install and license DataKeeper Cluster Edition on all cluster nodes. See the DataKeeper Cluster Edition Installation Guide for detailed instructions.

Run DataKeeper setup to install DataKeeper Cluster Edition on both SQL instances.
Enter your license key and reboot when prompted.
Launch the DataKeeper GUI and connect to server.

*Note: The domain or server account used must be added to the Local System Administrators Group. The account must have administrator privileges on each server that DataKeeper is installed on. Refer to DataKeeper Service Log On ID and Password Selection for additional information.

Right click on Jobs and connect to both SQL servers.
Create a Job for each mirror you will create: one for your DTC resource, and one for your SQL resource.
When asked if you would like to auto-register the volume as a cluster volume, select Yes.

*Note: If installing DataKeeper Cluster Edition on Windows “Core” (GUI-less Windows), make sure to read Installing and Using DataKeeper on Windows 2008R2/2012 Server Core Platforms for detailed instructions.

Configure MSDTC

For Windows Server 2012 and 2016, in the Failover Cluster Manager GUI, select Roles, then select Configure Role.
Select Distributed Transaction Coordinator (DTC), and click Next.

*For Windows Server 2008, in the Failover Cluster Manager GUI, select Services and Applications, then select Configure a Service or Application and click Next.

On the Client Access Point screen, enter a name, then enter the MSDTC IP address for each node involved in the cluster. This is the second of the three secondary IP addresses added previously to each instance. Click Next.
Select the MSDTC volume and click Next.
On the Confirmation page, click Next.
Once the Summary page displays, click Finish.

Install SQL on the First SQL Instance

On the domain controller server, create a folder and share it.
1. For example “TEMPSHARE” with Everyone permission.
Create a sub folder “SQL” and copy the SQL .iso installer into that sub folder.
On the SQL server, create a network drive and attach it to the shared folder on the domain controller.
- For example: “net use S: \\\TEMPSHARE”
On the SQL server the S: drive will appear. CD to the SQL folder and find the SQL .iso installer. Right click on the .iso file and select Mount. The setup.exe installer will appear with the SQL .iso installer.

F:\>Setup /SkipRules=Cluster_VerifyForErrors /Action=InstallFailoverCluster

On Setup Support Rules, click OK.
On the Product Key dialog, enter your product key and click Next.
On the License Terms dialog, accept the license agreement and click Next.
On the Product Updates dialog, click Next.
On the Setup Support Files dialog, click Install.
On the Setup Support Rules dialog, you will receive a warning. Click Next, ignoring this message, since it is expected in a multi-site or non-shared storage cluster.
Verify Cluster Node Configuration and click Next.
Configure your Cluster Network by adding the “third” secondary IP address for your SQL instance and click Next. Click Yes to proceed with multi-subnet configuration.
Enter passwords for service accounts and click Next.
On the Error Reporting dialog, click Next.
On the Add Node Rules dialog, skipped operation warnings can be ignored. Click Next.
Verify features and click Install.
Click Close to complete the installation process.

Install SQL on the Second SQL Instance

Installing the second SQL instance is similar to the first one.

On the SQL server, create a network drive and attach it to the shared folder on the domain controller as explained above for the first SQL server.
Once the .iso installer is mounted, run SQL setup once again from the command line in order to skip the Validate process. Open a Command window, browse to your SQL install directory and type the following command:

Setup /SkipRules=Cluster_VerifyForErrors /Action=AddNode /INSTANCENAME="MSSQLSERVER"

(Note: This assumes you installed the default instance on the first node)
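
The two setup commands in this guide differ only in the action and the instance name, so they can be generated from one place to avoid typing mistakes such as curly quotes pasted from a web page. This sketch only assembles the command text from the switches shown above. It does not run setup, and build_setup_command is a hypothetical helper.

```python
ALLOWED_ACTIONS = ("InstallFailoverCluster", "AddNode")

def build_setup_command(action, instance_name=None):
    """Assemble the setup command line using only the switches shown in this guide."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    parts = ["Setup", "/SkipRules=Cluster_VerifyForErrors", f"/Action={action}"]
    if instance_name is not None:
        # Straight double quotes, as a Windows command prompt expects.
        parts.append(f'/INSTANCENAME="{instance_name}"')
    return " ".join(parts)

first_node_cmd = build_setup_command("InstallFailoverCluster")
second_node_cmd = build_setup_command("AddNode", "MSSQLSERVER")
print(first_node_cmd)
print(second_node_cmd)
```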

On Setup Support Rules, click OK.
On the Product Key dialog, enter your product key and click Next.
On the License Terms dialog, accept the license agreement and click Next.
On the Product Updates dialog, click Next.
On the Setup Support Files dialog, click Install.
On the Setup Support Rules dialog, you will receive a warning. Click Next, ignoring this message, since it is expected in a multi-site or non-shared storage cluster.
Verify Cluster Node Configuration and click Next.
Configure your Cluster Network by adding the “third” secondary IP address for your SQL Instance and click Next. Click Yes to proceed with multi-subnet configuration.
Enter passwords for service accounts and click Next.
On the Error Reporting dialog, click Next.
On the Add Node Rules dialog, skipped operation warnings can be ignored. Click Next.
Verify features and click Install.
Click Close to complete the installation process.

Common Cluster Configuration

This section describes a common 2-node replicated cluster configuration.

The initial configuration must be done from the DataKeeper UI running on one of the cluster nodes. If it is not possible to run the DataKeeper UI on a cluster node, such as when running DataKeeper on a Windows Core only server, install the DataKeeper UI on any computer running Windows XP or higher and follow the instructions in the Core Only section for creating a mirror and registering the cluster resources via the command line.
Once the DataKeeper UI is running, connect to each of the nodes in the cluster.
Create a Job using the DataKeeper UI. This process creates a mirror and adds the DataKeeper Volume resource to the Available Storage.

!IMPORTANT: Make sure that Virtual Network Names for NIC connections are identical on all cluster nodes.

If additional mirrors are required, you can Add a Mirror to a Job.
With the DataKeeper Volume(s) now in Available Storage, you are able to create cluster resources (SQL, File Server, etc.) in the same way as if there were a shared disk resource in the cluster. Refer to the Microsoft documentation, in addition to the above, for step-by-step cluster configuration instructions.

Connectivity to the cluster (virtual) IPs

In addition to the Primary IP and secondary IP, you will also need to configure the virtual IP addresses in the Huawei Cloud so that they can be routed to the active node.

From the Service List dropdown, select Elastic Cloud Server.
Click one of the SQL instances to which you want to add the cluster virtual IP addresses (one for MSDTC, one for the SQL Failover Cluster).
Select NICs > Manage Virtual IP Address.
Click Assign Virtual IP Address, select Manual, and enter an IP address that is within the subnet range for the instance (ex. For 15.0.1.25, enter 15.0.1.26). Click OK.
Click the More dropdown on the IP address row, select Bind to Server, then select both the server and the NIC to bind the IP address to.
Repeat steps 4 and 5 for the MSDTC and SQLFC virtual IPs.
Click OK to save your work.

Management

Once a DataKeeper volume is registered with Windows Server Failover Clustering, all of the management of that volume will be done through the Windows Server Failover Clustering interface. All of the management functions normally available in DataKeeper will be disabled on any volume that is under cluster control. Instead, the DataKeeper Volume cluster resource will control the mirror direction, so when a DataKeeper Volume comes online on a node, that node becomes the source of the mirror. The properties of the DataKeeper Volume cluster resource also display basic mirroring information such as the source, target, type and state of the mirror.
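
The rule that the mirror direction follows the cluster resource can be pictured with a toy model: whichever node brings the volume online becomes the source, and every other node becomes a target. The class and node names below are hypothetical and exist only to illustrate that behavior. This is not DataKeeper's implementation or API.

```python
class MirroredVolumeModel:
    """Toy model: the node where the volume is online is the mirror source."""

    def __init__(self, nodes, online_node):
        self.nodes = list(nodes)
        self.source = None
        self.bring_online(online_node)

    def bring_online(self, node):
        """Simulate the cluster bringing the volume resource online on a node."""
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        self.source = node

    @property
    def targets(self):
        return [n for n in self.nodes if n != self.source]

volume = MirroredVolumeModel(["NODE1", "NODE2"], online_node="NODE1")
volume.bring_online("NODE2")  # simulated failover to the second node
print(volume.source, volume.targets)
```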

Troubleshooting

Use the following resources to help troubleshoot issues:

Troubleshooting issues section
For customers with a support contract – http://us.sios.com/support/overview/
For evaluation customers only – Pre-sales support

Additional Resources:

Step-by-Step: Configuring a 2-Node Multi-Site Cluster on Windows Server 2008 R2 – Part 1 — http://clusteringformeremortals.com/2009/09/15/step-by-step-configuring-a-2-node-multi-site-cluster-on-windows-server-2008-r2-%E2%80%93-part-1/

Step-by-Step: Configuring a 2-Node Multi-Site Cluster on Windows Server 2008 R2 – Part 3 — http://clusteringformeremortals.com/2009/10/07/step-by-step-configuring-a-2-node-multi-site-cluster-on-windows-server-2008-r2-%E2%80%93-part-3/

How To Clone Availability In The Cloud With Better Outcomes

December 30, 2020 by Jason Aw

Tips from the movies – Multiplicity

Multiplicity is a 1996 American science fiction comedy film starring Michael Keaton as Doug Kinney, a busy construction worker struggling to make time for his family and his demanding job. When a scientist offers to clone him, Doug agrees, hoping to make it easier to keep up with his schedule and commitments. But then the copies of him begin making copies of themselves. By the time the last copy is made, the point is clear. Cloning may not be all it’s cracked up to be, or at the very least comes with some strong warnings, challenges and side effects. The famous original Star Trek episode “Trouble with Tribbles” illustrates a similar point.

Like cloning on the big screen (or small), cloning in the cloud is a great tool, but not without its challenges.

Tips for how to get better outcomes when you clone availability in the cloud

1. Clone operational systems

This sounds obvious, but I have seen it happen more than once in real enterprise environments. If you clone your non-functional system, the clone will be equally non-functional and problematic when you restore it. Be sure that the clone you make was from an operational and functional system.

2. Sync data to disk and resync on restore

File system integrity is critical. If you don’t ensure your application and/or VM are in a consistent state, most vendors will not guarantee the resulting created image. Since snapshots only capture data that has been written to your volume at the time the snapshot command is issued, this might exclude any data that has been cached by any applications or the operating system. Making sure data has been properly synced to the file system is an important step, and absolutely critical in a cluster environment.

File system integrity is also critical to keep in mind when you restore from an image. If you are using data replication and you restore an image as source or target in the cluster, making sure the two nodes are in sync is paramount. Failing to do so may lead to file system errors on failover or switchover, or even potential data loss. Clone availability in the cloud to get the result you want.
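
To make the caching point concrete, here is a minimal sketch of what "synced to disk" means at the level of a single file: the program flushes its own buffer, then asks the operating system to push its cache to the storage device. The file name is a made-up example. Real applications and databases have their own quiesce or freeze procedures, which this does not replace.

```python
import os
import tempfile

def write_durably(path, data):
    """Write bytes and ask the OS to commit them to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move data from Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write its cached data to disk

# Hypothetical file in a temporary directory, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "state.bin")
write_durably(demo_path, b"committed before snapshot")
print(os.path.getsize(demo_path))
```

Data written without the flush and fsync steps may still be sitting in a cache when the snapshot command is issued, which is how a snapshot can miss recent writes.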

3. Stop your instance

Many environments do not require you to stop an instance to create an image, and some, such as AWS, will do the step of powering down the node before making the copy. However, many tools and sites recommend making sure applications are stopped and file system access is properly synced to avoid damage, loss of integrity, or creating images that have trouble starting, stopping, or running installed applications.

4. Label everything in the cloud (nodes, disks, NICs, everything)

While creating a clone is a free operation, the resulting disks and components typically are not. AWS states, for example, that you are “charged for the snapshots until you deregister the image and delete the snapshots.” When things aren’t labeled, knowing what is in use or not in use and why it was created can become problematic. It also becomes subject to the fleeting memories or poor concentration of existing team members. Label everything.

5. Prune clones and snapshots often (cost savings and headache savings)

Pruning old snapshots and clones is not only good for the cost savings, but it is also good for reducing headaches. Older snapshots run the risk of reintroducing vulnerabilities that have been addressed or resolved in newer copies. As VP of Customer Experience at SIOS Technology Corp., I saw the consequences firsthand when we worked with a customer who restored from a snapshot. They ran into several problems as they restarted the application. After troubleshooting, we determined that the clone was running an older version of security software. The cached credentials and metadata stored in the user profile were no longer in sync with the actual application data stored on the externally mounted data drives.
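
Labeling and pruning work well together: once every snapshot carries labels and a creation date, finding candidates for review is a simple filter. The sketch below runs over an in-memory inventory. The record fields, the 30-day cutoff and the snapshot IDs are all assumptions for illustration, and a real inventory would come from your cloud provider's own tooling.

```python
from datetime import datetime, timedelta

def snapshots_to_review(snapshots, now, max_age_days=30):
    """Return IDs of snapshots that are older than the cutoff or carry no labels."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        snap["id"]
        for snap in snapshots
        if snap["created"] < cutoff or not snap.get("labels")
    ]

# Hypothetical inventory and review date.
now = datetime(2021, 1, 31)
inventory = [
    {"id": "snap-a", "created": datetime(2021, 1, 20), "labels": {"owner": "dba"}},
    {"id": "snap-b", "created": datetime(2020, 11, 1), "labels": {"owner": "dba"}},
    {"id": "snap-c", "created": datetime(2021, 1, 25), "labels": {}},
]
flagged = snapshots_to_review(inventory, now)
print(flagged)
```

The function only flags items for a person to review. Deleting automatically is a separate decision, since an old snapshot may still be the only copy of something needed.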

6. Limit or restrict cloning of clones in the cloud

Lastly, not everything you do in the cloud needs to be cloned. Consider limiting the types of workloads that you will clone and restricting the number of roles that can create clones in your environment.

In the movie, when Doug’s clones sparked their own series of duplications, an already overwhelmed Doug (Michael Keaton) is forced to exert extra energy to manage his many clones while trying to hide the mess he created from his wife. Achieving clone availability in the cloud with better outcomes is not difficult. Clone carefully to avoid making more work and adding risk from a tool that was supposed to make your work easier and your environment safer.

– Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS

SIOS SANless Clusters Provide HA Protection Needed to Deploy SAP in Microsoft Azure

September 5, 2017 by Jason Aw

Case Study: Zespri – SIOS DataKeeper Cluster Edition

Protects One of the largest SAP in Azure Implementations

Azure high availability, SQL Server Clusters

WHY I SHOULD CONVERT MY #AZURE CLUSTERS TO MANAGED DISKS TODAY!

August 9, 2017 by Jason Aw

You may have heard about the recent storage outage that impacted some instances in the US East region back on March 16th. A root cause analysis of the outage is posted here.

March 16th US East Storage Outage

Customer impact: A subset of customers using Storage in the East US region may have experienced errors and timeouts while accessing their storage account in a single Storage scale unit

You might be asking, “What is a single Storage scale unit?” Well, you can think of it as a single storage cluster, or single SAN, or however you want to think about it. I don’t think Azure publishes their exact infrastructure, but you can probably assume that behind the scenes they are using Scale Out File Servers for backend storage.

So the question is, how could I have survived this outage with minimal downtime? If you read further down that root cause analysis you come across this little nugget.

Virtual Machines using Managed Disks in an Availability Set would have maintained availability during this incident.

What’s Managed Disks you ask? Well, just on February 8th Corey Sanders announced the GA of Managed Disks. You can read all about Managed Disks here. https://azure.microsoft.com/en-us/services/managed-disks/

The reason why Managed Disks would have helped in this outage is that by leveraging an Availability Set combined with Managed Disks you ensure that each of the instances in your Availability Set are connected to a different “Storage scale unit”. So in this particular case, only one of your cluster nodes would have failed, leaving the remaining nodes to take over the workload.

Prior to Managed Disks being available (anything deployed before 2/8/2017), there was no way to ensure that the storage attached to your servers resided on different Storage scale units. Sure, you could use different storage accounts for each instance, but in reality that did not guarantee that those Storage Accounts provisioned storage on different Storage scale units.

So while an Availability Set ensured that your instances reside in different Fault Domains and Update Domains to ensure the availability of the instance itself, the additional storage attached to each instance really represented a single point of failure. Although the storage itself is highly resilient, with three copies of your data and geo-redundant options available, in this case with a power failure the entire Storage scale unit went down along with all the servers attached to it.
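
The difference between the two situations can be shown with a toy placement map: if both cluster nodes keep their disks on the same Storage scale unit, one unit failure takes out both, while separate units leave a survivor. The node and unit names are hypothetical, and Azure does not expose scale unit placement in this form.

```python
def surviving_nodes(placement, failed_unit):
    """Return the nodes whose storage is not on the failed scale unit."""
    return sorted(node for node, unit in placement.items() if unit != failed_unit)

# Hypothetical placements: both nodes on one unit vs. one node per unit.
shared = {"node1": "unit-1", "node2": "unit-1"}
separate = {"node1": "unit-1", "node2": "unit-2"}

print(surviving_nodes(shared, "unit-1"))
print(surviving_nodes(separate, "unit-1"))
```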

So long story short…migrate to Managed Disks as soon as possible in order to help minimize downtime.

https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-migrate-to-managed-disks

And if you really want to minimize downtime you should consider Hybrid Cloud Deployments that span cloud providers or on-prem to cloud!

Reposted from original post by Dave Bermingham Microsoft Clustering MVP – https://clusteringformeremortals.com/2017/03/22/why-i-should-convert-my-azure-clusters-to-managed-disks-today/

Deploy 2-node File Server Failover Cluster in Azure using ARM

July 19, 2017 by Jason Aw

In this post we will detail the specific steps required to deploy a 2-node File Server Failover Cluster in a single region of Azure using Azure Resource Manager. I will assume you are familiar with basic Azure concepts as well as basic Failover Cluster concepts and will focus this article on what is unique about deploying a File Server Failover Cluster in Azure.

With DataKeeper Cluster Edition you are able to take the locally attached storage, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronously or a mix of both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as Failover Clustering is concerned, it looks, feels and smells like a Physical Disk and is used the same way a Physical Disk Resource would be used.

Pre-Requisites

You have used the Azure Portal before and are comfortable deploying virtual machines in Azure IaaS.
You have obtained a license or evaluation license of SIOS DataKeeper.

Deploying A File Server Failover Cluster Instance Using The Azure Portal

To build a 2-node File Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our File Server Cluster

Provisioning The Two Cluster Nodes (SQL1 And SQL2)

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way. There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying Servers in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Set – It is important that SQL1, SQL2 AND DC1 all reside in the same Availability Set. By putting them in the same Availability Set we are ensuring that each cluster node and the file share witness reside in a different Fault Domain and Update Domain. This helps guarantee that during both planned and unplanned maintenance the cluster will continue to be able to maintain quorum and avoid downtime.

Figure 3 – Be sure to add both cluster nodes and the file share witness to the same Availability Set

Static IP Address

Once each VM is provisioned, you will want to go into its network settings and change the IP address assignment to Static. We do not want the IP address of our cluster nodes to change.

Figure 4 – Make sure each cluster node uses a static IP
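If you would rather script this step than click through the portal, the following is a minimal sketch using the Az PowerShell module. The Az module itself is an assumption here (the walkthrough uses the portal), and the resource group and network interface names are hypothetical placeholders.

```powershell
# Assumption: the Az.Network module is installed and you are signed in (Connect-AzAccount).
# 'cluster-rg', 'sql1-nic' and 'sql2-nic' are hypothetical names; substitute your own.
$resourceGroup = 'cluster-rg'

foreach ($nicName in 'sql1-nic', 'sql2-nic') {
    $nic = Get-AzNetworkInterface -ResourceGroupName $resourceGroup -Name $nicName

    # Keep the address that is currently assigned, but mark the allocation as Static.
    $nic.IpConfigurations[0].PrivateIpAllocationMethod = 'Static'

    # Push the modified NIC object back to Azure.
    $nic | Set-AzNetworkInterface | Out-Null
}
```

The guest operating system still obtains its address through DHCP; only the allocation method on the Azure side changes.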

Storage

As far as Storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will need to add at least one additional disk to each of your cluster nodes. DataKeeper can use Basic Disk, Premium Storage or even Storage Pools consisting of multiple disks. Just be sure to add the same amount of storage to each cluster node and configure it identically. Also, be sure to use a different storage account for each virtual machine to ensure that a problem with one Storage Account does not impact both virtual machines at the same time.

Figure 5 – make sure to add additional storage to each cluster node

Create The Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few Features that need to be enabled. These features are .NET Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes. You will also need to enable the File Server Role on both nodes.

Figure 6 – enable both .NET Framework 3.5 and Failover Clustering features and the File Server role on both cluster nodes
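The same role and features can also be enabled from PowerShell. This is a minimal sketch, assuming you run it from an elevated session with administrative rights on both nodes; the feature names used are the standard Windows Server names for .NET Framework 3.5, Failover Clustering and the File Server role service.

```powershell
# Assumption: run from an elevated PowerShell session by an account
# that is a local administrator on both sql1 and sql2.
foreach ($node in 'sql1', 'sql2') {
    # NET-Framework-Core  = .NET Framework 3.5
    # Failover-Clustering = Failover Clustering feature
    # FS-FileServer       = File Server role service
    Install-WindowsFeature -ComputerName $node `
        -Name NET-Framework-Core, Failover-Clustering, FS-FileServer `
        -IncludeManagementTools
}
```

Depending on the image, installing .NET Framework 3.5 may require access to Windows Update or an installation source supplied with the -Source parameter.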

Once that role and those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster you will find that you wind up with the cluster being issued a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI when hosts use DHCP (which they have to), there is no option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. Cluster creation will usually complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run you will want to change the cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via PowerShell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.101 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command:

Test-Cluster

Figure 7 – The output of the cluster creation and the cluster validation commands

Create The File Share Witness

Because there is no shared storage, you will need to create a file share witness on another server in the same Availability Set as the two cluster nodes. By putting it in the same Availability Set you can be sure that you only lose one vote from your quorum at any given time. If you are unsure how to create a File Share Witness you can review this article http://www.howtonetworking.com/server/cluster12.htm. In my demo I put the file share witness on the domain controller. I have published an exhaustive explanation of cluster quorums at https://blogs.msdn.microsoft.com/microsoft_press/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2/
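Once the share exists, the witness can also be assigned from PowerShell. This is a minimal sketch, assuming a share named Witness has already been created on DC1 and that the cluster computer object has read/write access to it; the share name is a hypothetical placeholder, while cluster1 is the cluster name used earlier in this post.

```powershell
# Assumption: \\DC1\Witness already exists and the cluster computer object
# (cluster1$) has been granted read/write permission on the share and folder.
Import-Module FailoverClusters

# Switch the quorum model to node majority plus a file share witness.
Set-ClusterQuorum -Cluster cluster1 -NodeAndFileShareMajority '\\DC1\Witness'

# Display the resulting quorum configuration to confirm the change.
Get-ClusterQuorum -Cluster cluster1
```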

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster was created you will simply need to run the install again and do a repair installation.

Figure 8 – Install DataKeeper after the cluster is created

During the installation you can take all of the default options. The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

Figure 9 – the service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create The DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
Connect to SQL1

Connect to SQL2

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”

Give the Job a name and description.

Choose your source server, IP and volume. The IP address determines where the replication traffic will travel.

Choose your target server.

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.

You will see the new DataKeeper Volume Resource in Available Storage.

Create The File Server Cluster Resource

To create the File Server Cluster Resource, we will use PowerShell once again rather than the Failover Cluster interface. This is because when the virtual machines are configured to use DHCP, the GUI-based wizard will not prompt us to enter a cluster IP address. Instead, it will issue a duplicate IP address. To avoid this, we will use a simple PowerShell command to create the File Server Cluster Resource and specify the IP address:

Add-ClusterFileServerRole -Storage "DataKeeper Volume E" -Name FS2 -StaticAddress 10.0.0.201

Make note of the IP address you specify here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.

Create The Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPs, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

First, create a new Load Balancer

You can use a Public Load Balancer if your clients connect over the public internet. But assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify will be EXACTLY the same as the address you used to create the File Server Cluster Resource.

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is to add a backend pool. Through this process you will choose the Availability Set where your cluster VMs reside. However, when you choose the actual VMs to add to the Backend Pool, be sure you do not choose your file share witness. We do not want to redirect SMB traffic to your file share witness.

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.

And then finally, we need a load balancing rule to redirect the SMB traffic on TCP port 445. The important thing to notice in the screenshot below is that Direct Server Return is Enabled. Make sure you make that change.
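For those who prefer to script the load balancer, the frontend, probe and rule described above could be sketched with the Az PowerShell module roughly as follows. The Az module, the resource group, location, virtual network, subnet and object names are all assumptions for illustration; 10.0.0.201 is the File Server address used earlier, and the probe interval and count are illustrative values only. The -EnableFloatingIP switch corresponds to the Direct Server Return setting in the portal.

```powershell
# Assumptions: Az.Network module installed and signed in (Connect-AzAccount).
# Every name below ('cluster-rg', 'eastus', 'cluster-vnet', 'default', 'fs-*')
# is a hypothetical placeholder; substitute the values for your environment.
$rg       = 'cluster-rg'
$location = 'eastus'

$vnet   = Get-AzVirtualNetwork -ResourceGroupName $rg -Name 'cluster-vnet'
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'default'

# Frontend uses the same private IP as the File Server Cluster Resource.
$frontend = New-AzLoadBalancerFrontendIpConfig -Name 'fs-frontend' `
    -PrivateIpAddress '10.0.0.201' -SubnetId $subnet.Id

# Empty backend pool; the cluster node NICs are added to it afterwards.
$pool = New-AzLoadBalancerBackendAddressPoolConfig -Name 'fs-backend'

# Probe on port 59999 identifies the active cluster node.
$probe = New-AzLoadBalancerProbeConfig -Name 'fs-probe' -Protocol Tcp `
    -Port 59999 -IntervalInSeconds 5 -ProbeCount 2

# SMB rule on TCP 445 with floating IP (Direct Server Return) enabled.
$rule = New-AzLoadBalancerRuleConfig -Name 'smb-445' `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool -Probe $probe `
    -Protocol Tcp -FrontendPort 445 -BackendPort 445 -EnableFloatingIP

New-AzLoadBalancer -ResourceGroupName $rg -Name 'fs-ilb' -Location $location `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool `
    -Probe $probe -LoadBalancingRule $rule
```

This sketch leaves the load balancer SKU at the module default and does not add the two cluster nodes to the backend pool; that step still needs to be done, for example in the portal as described above.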


Fix The File Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note: you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255; this is not a mistake, so leave it as is. This creates a host-specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = ""   # the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""       # the IP Address resource name
$ILBIP = ""                # the IP Address of the Internal Load Balancer (ILB)

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}

# If you are using Windows Server 2008 R2 use this:
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999 subnetmask=255.255.255.255

Creating File Shares

You will find that using the File Share Wizard in Failover Cluster Manager does not work. Instead, you will simply create the file shares in Windows Explorer on the active node. Failover clustering automatically picks up those shares and puts them in the cluster.

Note that the “Continuous Availability” option of a file share is not supported in this configuration.
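As an alternative to Windows Explorer, a share could also be created from PowerShell on the active node. This is a minimal sketch and an assumption on my part rather than the method shown above; the folder path, share name and group are hypothetical placeholders, and E: is assumed to be the replicated DataKeeper volume.

```powershell
# Assumption: run on the node that currently owns the file server role,
# and E: is the DataKeeper volume used by that role.
# 'E:\Shares\Data', 'Data' and 'DOMAIN\FileUsers' are hypothetical placeholders.
New-Item -Path 'E:\Shares\Data' -ItemType Directory -Force | Out-Null

# Continuous Availability is explicitly disabled, in line with the note above.
New-SmbShare -Name 'Data' -Path 'E:\Shares\Data' `
    -ContinuouslyAvailable $false -FullAccess 'DOMAIN\FileUsers'

# Inspect the share, including the scope it was registered under.
Get-SmbShare -Name 'Data' | Select-Object Name, ScopeName, Path, ContinuouslyAvailable
```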

Conclusion

You should now have a functioning File Server Failover Cluster. If you have ANY problems, please reach out to me on Twitter @daveberm. If you need a DataKeeper evaluation key, fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key to you.

Reproduced with permission from Clusteringformeremortals