Webinar: Failover Clustering in the Cloud – Understanding Your Options

May 24, 2019 by Jason Aw

Failover Clustering in the Cloud – Understanding Your Options

Windows Server 2016 introduced Storage Spaces Direct, which allows you to build shared storage in Azure for configuring a Windows cluster in the cloud. While this sounds like a killer feature (and it can be, with the proper infrastructure), it runs into challenges in cloud environments where limited bandwidth must be shared between storage and data traffic.

In this webinar, you will learn about the configuration of clusters in the cloud, some real-world examples of problems, and alternatives for maintaining a shared storage infrastructure within Azure.

Register for the webinar: Failover Clustering in the Cloud – Understanding Your Options

Filed Under: News and Events Tagged With: Azure, Clustering, failover clustering, SQL Server

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

April 14, 2019 by Jason Aw

New Azure ILB Feature Allows You To Build A Multi-Instance SQL Server Failover Cluster

At Microsoft Ignite this past September, Microsoft made some announcements around Azure. One of these announcements was the general availability of multiple VIPs on internal load balancers. Why is this so important to a SQL Server DBA? Well, up until now, if you wanted to deploy highly available SQL Server in Azure, you were limited to a single SQL Server FCI per cluster or a single Availability Group listener.

This limitation forced you to deploy a new cluster for each instance of SQL Server you wanted to protect in a Failover Cluster. It also forced you to group all of your databases into a single Availability Group if you wanted automatic failover and client redirection in your AlwaysOn AG configuration.

How To Get Out Of These Restrictions?

Those restrictions have now been lifted with these new ILB features. In this post I am going to walk you through the process of deploying a SQL Server FCI in Azure that contains two SQL Server instances. In a future post I will walk you through the same process for SQL Server AlwaysOn AG.

Let’s Start With A Multi-Instance SQL Server Failover Cluster

Build a basic, single-instance SQL Server FCI in Azure as I describe in my post Deploying Microsoft SQL Server 2014 Failover Clusters in Azure Resource Manager.

That post describes the whole process: creating the cluster, using DataKeeper to create the replicated volume resources used in the cluster, creating the Internal Load Balancer (ILB), and then fixing the SQL Server Cluster IP Resource to work with the ILB. If you want to skip that process and jumpstart your configuration, you can always use the Azure Deployment Template that creates a 2-node SQL Server FCI using SIOS DataKeeper.

Assuming you now have a basic two node SQL Server FCI, the steps to add a 2nd named instance are as follows:

  1. Create another DataKeeper Volume Resource on another volume that is not currently being used. You may need to add additional disks to your Azure instance if you have no available volumes. As part of this volume creation process the new DataKeeper Volume resource will be registered in Available Storage in the cluster. Refer to the article referenced earlier for the details.
  2. Install a named instance of SQL Server on the first node, specifying the DataKeeper Volume that we just created as the storage location.
  3. “Add a node” to the cluster on the second node.
  4. Lock down the port number of this new named instance to a port that is not in use. In my example I use port 1440.

Adjust ILB To Second Instance

Next we have to adjust the ILB to redirect traffic to this second instance. Here are the steps you need to follow:

Add a frontend IP address that is identical to the SQL cluster IP address you used for the second instance of SQL Server as shown below.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

Next, we will need to add another probe since the instances could be running on different servers. As shown below, I added a probe that probes port 59998 (instead of the usual 59999). We will need to make sure the new rules reference this probe. We will also need to remember that port number, since we will need to update the IP address associated with this instance during the last step of this process.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

Now we need to add two new rules to the ILB to direct traffic destined for this 2nd instance of SQL. Of course, we need a rule to redirect TCP port 1440 (the port I used for the named instance of SQL), but because we are now using named instances we will also need a rule for UDP port 1434 to support the SQL Server Browser Service.

In the picture below, depicting the rule for the SQL Server Browser Service, take note that the Frontend IP Address references the new frontend IP address (10.0.0.201) and that UDP port 1434 is used for both the Port and the Backend Port. In the backend pool you will need to specify the two servers in the cluster, and finally make sure you choose the new health probe we just created.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature

We will now add a rule for TCP/1440. As shown in the picture below, add a new rule for TCP port 1440, or whatever port you locked down for the named instance of SQL Server. Again, be sure to choose the new frontend IP address and the new health probe (59998). Also, make sure Floating IP (direct server return) is enabled.

Multi-Instance SQL Server Failover Cluster With New Azure ILB Feature
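If you would rather script these load balancer changes than click through the portal, here is a rough Azure CLI equivalent of the steps above (run from a bash or Cloud Shell prompt). The resource group, load balancer, VNet, subnet, backend pool and rule names are placeholders for whatever you used in your own deployment; the IP address (10.0.0.201), probe port (59998) and SQL ports (1434/1440) match the example in this post:

# Second frontend IP that matches the cluster IP of the new SQL instance
az network lb frontend-ip create --resource-group sqlRG --lb-name sqlILB \
   --name sqlFrontEnd2 --private-ip-address 10.0.0.201 \
   --vnet-name sqlVNet --subnet sqlSubnet

# Second health probe on port 59998
az network lb probe create --resource-group sqlRG --lb-name sqlILB \
   --name sqlProbe2 --protocol tcp --port 59998

# Rule for the SQL Server Browser Service (UDP 1434)
az network lb rule create --resource-group sqlRG --lb-name sqlILB \
   --name sqlBrowserRule --protocol udp --frontend-port 1434 --backend-port 1434 \
   --frontend-ip-name sqlFrontEnd2 --backend-pool-name sqlBackendPool \
   --probe-name sqlProbe2 --floating-ip true

# Rule for the named instance itself (TCP 1440) with Floating IP (direct server return)
az network lb rule create --resource-group sqlRG --lb-name sqlILB \
   --name sqlInstance2Rule --protocol tcp --frontend-port 1440 --backend-port 1440 \
   --frontend-ip-name sqlFrontEnd2 --backend-pool-name sqlBackendPool \
   --probe-name sqlProbe2 --floating-ip true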

The Last Step

Now that the load balancer is configured, the final step is to run the PowerShell script to update the new Cluster IP address associated with this 2nd instance of SQL Server. This PowerShell script only needs to be run on one of the cluster nodes.

# Define variables

# The cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$ClusterNetworkName = ""

# The IP Address resource name of the second instance of SQL Server
$IPResourceName = ""

# The IP address of the second instance of SQL, which should be the same as the new frontend IP address
$ILBIP = ""

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59998;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}

# If you are using Windows Server 2008 R2, use this instead:
# cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59998 subnetmask=255.255.255.255

You now have a fully functional multi-instance SQL Server FCI in Azure. Let me know if you have any questions about building a multi-instance SQL Server Failover Cluster with the new Azure ILB feature.

Reproduced from Clusteringformeremortals.com

Filed Under: Clustering Simplified, Datakeeper Tagged With: Azure, failover cluster, ILB, multi instance sql server failover cluster, Multi-Instance SQL Server, SQL Server

Managing a Real-Time Recovery in a Major Cloud Outage

January 19, 2019 by Jason Aw

Managing a Real-Time Recovery in a Major Cloud Outage

Disasters happen, making sudden downtime a reality. But there are things all customers can do to survive virtually any cloud outage.

Stuff happens. Failures—both large and small—are inevitable. What is not inevitable is extended periods of downtime.

Consider the day the South Central US Region of Microsoft’s Azure cloud experienced a catastrophic failure. A severe thunderstorm led to a cascading series of problems that eventually knocked out an entire data center. In what some have called “The Day the Azure Cloud Fell from the Sky,” most customers were offline, not just for a few seconds or minutes, but for a full day. Some were offline for over two days. While Microsoft has since addressed the many issues that led to the outage, the incident will long be remembered by IT professionals.

That’s the bad news. The good news is that there are things all Azure customers can do to survive virtually any outage, from a single server failing to an entire data center going offline. In fact, Azure customers who implement robust high-availability and/or disaster recovery provisions, complete with real-time data replication and rapid, automatic failover, can expect to experience no data loss and little or no downtime whenever catastrophe strikes.


Managing The Cloud Outage

This article examines four options for providing disaster recovery (DR) and high availability (HA) protections in hybrid and purely Azure cloud configurations. Two of the options are specific to the Microsoft SQL Server database, which is a popular application in the Azure cloud; the other two options are application-agnostic. The four options, which can also be used in various combinations, are compared in the table and include:

  • The Azure Site Recovery (ASR) Service
  • SQL Server Failover Cluster Instances with Storage Spaces Direct
  • SQL Server Always On Availability Groups
  • Third-party Failover Clustering Software

Table: SIOS comparison of real-time recovery options for a cloud outage

RTO and RPO 101

Before describing the four options, it is necessary to have a basic understanding of the two metrics used to assess the effectiveness of DR and HA provisions: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Those familiar with RTO and RPO can skip this section.

RTO is the maximum tolerable duration of an outage. Online transaction processing applications generally have the lowest RTOs, and those that are mission-critical often have an RTO of only a few seconds. RPO is the maximum period during which data loss can be tolerated. If no data loss is tolerable, then the RPO is zero.

The RTO will normally determine the type of HA and/or DR protection needed. Low recovery times usually demand robust HA provisions that protect against routine system and software failures, while longer RTOs can be satisfied with basic DR provisions designed to protect against more widespread, but far less frequent disasters.

The data replication used with HA and DR provisions can create the need for a potential tradeoff between RTO and RPO. In a low-latency LAN environment, where replication can be synchronous, the primary and secondary datasets can be updated concurrently. This enables full recoveries to occur automatically and in real-time, making it possible to satisfy the most demanding recovery time and recovery point objectives (a few seconds and zero, respectively) with no tradeoff necessary.

Across the WAN, by contrast, forcing the primary to wait for the secondary to confirm the completion of updates for every transaction would adversely impact performance. For this reason, data replication in the WAN is usually asynchronous. This can create a tradeoff between accommodating RTO and RPO that normally results in an increase in recovery times. Here’s why: to satisfy an RPO of zero, manual processes are needed to ensure all data (e.g. from a transaction log) has been fully replicated on the secondary before the failover can occur. This extra effort lengthens the recovery time, which is why such configurations are often used for DR and not HA.

Azure Site Recovery (ASR) Service

ASR is Azure’s DR-as-a-service (DRaaS) offering. ASR replicates both physical and virtual machines to other Azure sites, potentially in other regions, or from on-premises instances to the Azure cloud. The service delivers a reasonably rapid recovery from system and site outages, and also facilitates planned maintenance by eliminating downtime during rolling software upgrades.

Like all DRaaS offerings, ASR has some limitations, the most serious being the inability to automatically detect and failover from many failures that cause application-level downtime. Of course, this is why the service is characterized as being for DR and not for HA.

With ASR, recovery times are typically 3-4 minutes depending, of course, on how quickly administrators are able to manually detect and respond to a problem. As described above, the need for asynchronous data replication across the WAN can further increase recovery times for applications with an RPO of zero.

SQL Server Failover Cluster Instance with Storage Spaces Direct

SQL Server offers two of its own HA/DR options: Failover Cluster Instances (discussed here) and Always On Availability Groups (discussed next).

FCIs afford two advantages: The feature is available in the less expensive Standard Edition of SQL Server, and it does not depend on having shared storage like traditional HA clusters do. This latter advantage is important because shared storage is simply not available in the cloud—from Microsoft or any other cloud service provider.

A popular choice for storage in the Azure cloud is Storage Spaces Direct (S2D), which supports a wide range of applications, and its support for SQL Server protects the entire instance and not just the database. A major disadvantage of S2D is that the servers must reside within a single data center, making this option suitable for some HA needs but not for DR. For multi-site HA and DR protections, the requisite data replication will need to be provided by either log shipping or a third-party failover clustering solution.

SQL Server Always On Availability Groups

While Always On Availability Groups is SQL Server’s most capable offering for both HA and DR, it requires licensing the more expensive Enterprise Edition. This option is able to deliver a recovery time of 5-10 seconds and a recovery point of seconds or less. It also offers readable secondaries for querying the databases (with appropriate licensing), and places no restrictions on the size of the database or the number of secondary instances.

An Always On Availability Groups configuration that provides both HA and DR protections consists of a three-node arrangement with two nodes in a single Availability Set or Zone, and the third in a separate Azure Region. One notable limitation is that only the database is replicated and not the entire SQL instance, which must be protected by some other means.

In addition to being cost-prohibitive for some database applications, this approach has another disadvantage. Being application-specific, it requires IT departments to implement other HA and DR provisions for all other applications. The use of multiple HA/DR solutions can substantially increase complexity and costs (for licensing, training, implementation and ongoing operations), which is another reason why organizations increasingly prefer application-agnostic third-party solutions.

Third-party Failover Clustering Software

With its application-agnostic and platform-agnostic design, failover clustering software is able to provide a complete HA and DR solution for virtually all applications in private, public and hybrid cloud environments, on both Windows and Linux.

Being application-agnostic eliminates the need for having different HA/DR provisions for different applications. Being platform-agnostic makes it possible to leverage various capabilities and services in the Azure cloud, including Fault Domains, Availability Sets and Zones, Region Pairs, and Azure Site Recovery.

As complete solutions, the software includes, at a minimum, real-time data replication, continuous monitoring capable of detecting failures at the application level, and configurable policies for failover and failback. Most solutions also offer a variety of value-added capabilities that enable failover clusters to deliver recovery times below 20 seconds with minimal or no data loss to satisfy virtually all HA/DR needs.

Making It Real

All four options, whether operating separately or in concert, can have roles to play in making the continuum of DR and HA protections more effective and affordable for the full spectrum of enterprise applications, from those that can tolerate some data loss and extended periods of downtime to those that require real-time recovery to achieve five-nines of uptime with minimal or no data loss.

To survive the next cloud outage in the real world, make certain that whatever DR and/or HA provisions you choose are configured with at least two nodes spread across two sites. Also be sure to understand how well the provisions satisfy each application’s recovery time and recovery point objectives, as well as any limitations that might exist, including any manual processes required to detect all possible failures and trigger failovers in ways that ensure both application continuity and data integrity.

About Jonathan Meltzer

Jonathan Meltzer is Director, Product Management, at SIOS Technology. He has over 20 years of experience in product management and marketing for software and SaaS products that help customers manage, transform, and optimize their human capital and IT resources.

Reproduced from RTinsights

Filed Under: News and Events Tagged With: Azure, Cloud, cloud outage, cybersecurity, microsoft azure, multi-cloud, recovery, server failover, SQL, storage

How To Cluster MaxDB On Windows In The Cloud

January 12, 2019 by Jason Aw

How To Cluster MaxDB On Windows In The Cloud

Recently I have had a number of customers looking for a high availability solution to cluster MaxDB on Windows in the cloud. Some customers have been in Azure and some in AWS. But regardless of the cloud platform, they all eventually find the post in the SAP Community WIKI that describes the process.

https://wiki.scn.sap.com/wiki/display/MaxDB/HowTo+-+Embed+SAP+MaxDB+in+MSCS

The Challenge

The challenge with this post in a cloud environment is that there is no shared storage (SAN) available in Azure, AWS or GCP that allows you to build a traditional shared storage cluster. The beauty of HA in the cloud is that cluster nodes typically reside miles away from each other in another data center, aka an availability zone (AZ). So even if shared storage were available, it wouldn’t make a lot of sense, since it would have to reside in a single AZ, which defeats the purpose of HA altogether.

The Solution

However, there is an answer to cluster MaxDB on Windows in the cloud. SIOS DataKeeper is a SANless clustering solution from SIOS Technology. It allows locally attached storage to be used in a Windows Server Failover Cluster, eliminating the need for a SAN. Instead, SIOS keeps locally attached disks in sync using synchronous block-level replication technology and presents this storage to WSFC as a clustered disk resource called a DataKeeper Volume.

Typical 2-node WSFC across Availability Zones with a 3rd node in a different Region

As far as the cluster is concerned, a DataKeeper Volume cluster resource looks like a shared disk. But instead of controlling disk locking (SCSI reservations), it controls the mirror direction. So in every sense of the word it is still a true WSFC, except it uses locally attached storage instead of shared storage. The locally attached storage can be anything from an EBS block device to an Azure premium disk, or even a local Storage Space with multiple disks striped together. As long as Windows sees an NTFS-formatted volume with a drive letter, and the volume size is the same on each instance, it can be used in the cluster.

DataKeeper Volume Cluster Resource

This type of cluster is commonly known as a SANless cluster. It has been around for many years, enabling geo-clusters and clusters where shared storage was not available. Database admins also love it because it enables them to use local high-speed storage devices like PCIe flash or SSD drives while still using WSFC for high availability.

SIOS also supports asynchronous replication. So if you want to add a node in a different geographic location for disaster recovery, you can build a 3-node cluster with 2 nodes in the same region but different fault domains and a 3rd node in an entirely different region, or maybe even back on-premises for disaster recovery options. Or, if you are in Azure, you can leverage Azure Site Recovery (ASR) for disaster recovery, as SIOS DataKeeper is compatible with ASR.

Both WSFC and SIOS DataKeeper are very dependent upon IP addresses staying the same. So for ASR configurations you will want to make sure you retain your IP address upon failover as described here.

https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-retain-ip-azure-vm-failover

SAP

SIOS is no stranger to high availability and disaster recovery for SAP. The SIOS Protection Suite for Linux is an SAP-certified HA solution for SAP and SAP HANA. SIOS DataKeeper is the preferred HA/DR solution for SAP ASCS on Windows in cloud environments. Providing an HA/DR solution for MaxDB on Azure further solidifies SIOS as the SAP high availability experts.

If you have questions about high availability for SAP, or want more details about how to cluster MaxDB on Windows in the cloud, do go through our other posts.

Reproduced with permission from Clusteringformeremortals.com

Filed Under: Clustering Simplified Tagged With: Azure, cluster, cluster maxdb on windows in the cloud, DataKeeper

Linux failover cluster in Microsoft Azure IaaS without shared storage

December 19, 2018 by Jason Aw

Step-By-Step: How To Configure A Linux Failover Cluster In Microsoft Azure IaaS Without Shared Storage

In this step by step guide I will take you through all steps required to configure a highly available, 2-node MySQL cluster (plus witness server) in Microsoft Azure IaaS (Infrastructure as a Service).  The guide includes screenshots, shell commands, and code snippets as appropriate.  I assume that you are somewhat familiar with Microsoft Azure and already have an Azure account with an associated subscription.  If not, you can sign up for a free account today.  I’m also going to assume that you have basic Linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, etc.

Disclaimer: Azure is a rapidly moving target.  It’s getting better and better every day!  As such, features/screens/buttons are bound to change over time so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a MySQL database highly available, you could certainly adapt this information and process to protect other applications or databases, like SAP, Oracle, PostgreSQL, NFS file servers, and more.

These are the high level steps to create a highly available MySQL database within Microsoft Azure IaaS:

  1. Create a Resource Group
  2. Create a Virtual Network
  3. Create a Storage Account
  4. Create Virtual Machines in an Availability Set
  5. Set VM Static IP Addresses
  6. Add a Data Disk to cluster nodes
  7. Create Inbound Security Rule to allow VNC access
  8. Linux OS Configuration
  9. Install and Configure MySQL
  10. Install and Configure Cluster
  11. Create an Internal Load Balancer
  12. Test Cluster Connectivity

Overview

This article describes the steps to configure a Linux failover cluster in Microsoft Azure IaaS without shared storage. It will describe how to create a cluster within a single Azure region.  The cluster nodes (node1, node2 and the witness server) will reside in an Availability Set (3 different Fault Domains and Update Domains), thanks to the new Azure Resource Manager (ARM). We will be creating all resources using the new Azure Resource Manager.

The configuration will look like this:

The following IP addresses will be used:

  • node1: 10.0.0.4
  • node2: 10.0.0.5
  • witness: 10.0.0.6
  • virtual/”floating” IP: 10.0.0.99
  • MySQL port: 3306

Create a Resource Group

First, create a Resource Group.  Your resource group will end up containing all of the various objects related to our cluster deployment: virtual machines, virtual network, storage account, etc.  Here we will call our newly created Resource Group “cluster-resources”.


Be mindful when selecting your region.  All of your resources will need to reside within the same region.  Here, we’ll be deploying everything into the “West US” region:
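If you prefer the command line to the portal, the equivalent Azure CLI command would look roughly like this (the CLI is entirely optional; the rest of this guide uses the portal):

az group create --name cluster-resources --location westus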

Create a Virtual Network (VNet)

Creating a Virtual Network would be your next step in configuring a Linux failover cluster in Microsoft Azure IaaS without shared storage. A Virtual Network is an isolated network within the Azure cloud that is dedicated to you.  You have full control over things like IP address blocks and subnets, routing, security policies (i.e. firewalls), DNS settings, and more.  You will be launching your Azure IaaS virtual machines (VMs) into your Virtual Network.

Make sure you select Resource Manager as the deployment model anytime you are given the option:

Give your new Virtual Network a name (“virtual-network”) and make sure you select the resource group that was created in the previous step (“cluster-resources”).  Your Virtual Network needs to reside in the same region as your Resource Group.  We will leave the IP Address and Subnet values as default.
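A rough Azure CLI equivalent is shown below. The 10.0.0.0/16 address space and the subnet named “default” are assumptions that simply match the portal defaults and the 10.0.0.x addresses used later in this guide:

az network vnet create --resource-group cluster-resources --name virtual-network \
   --location westus --address-prefix 10.0.0.0/16 \
   --subnet-name default --subnet-prefix 10.0.0.0/24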

 

Create a Storage Account

Before you provision any Virtual Machines, you’ll need to create a Storage Account where they will be stored.

Again, make sure you select Resource Manager as the deployment model anytime you are given the option:

Next, give your new storage account a name.  The storage account name must be unique across *ALL* of Azure.  (Every object that you store in Azure Storage has a unique URL address. The storage account name forms the subdomain of that address.)  In this example I call my storage account “linuxclusterstorage” but you’ll need to select something different as you setup your own.

Select a storage Type based on your requirements and budget.  For the purposes of this guide, I selected “Standard-LRS” (i.e. Locally Redundant) to minimize cost.

Make sure your new Storage Account is added to the Resource Group you created in Step 1 (“cluster-resources”)  in the same Location (“West US” in this example):
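The Azure CLI equivalent would be roughly as follows (remember to substitute your own globally unique storage account name):

az storage account create --resource-group cluster-resources --name linuxclusterstorage \
   --location westus --sku Standard_LRS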

Create Virtual Machines in an Availability Set

We will be provisioning 3 Virtual Machines in this guide.  The first two VMs (I’ll call them “node1” and “node2”) will function as cluster nodes with the ability to bring the MySQL database and its associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will be added to the same Availability Set, ensuring that they will end up in different Fault Domains and Update Domains.

Create “node1” VM

Create your first VM (“node1”).  In this guide we will be using CentOS 6.X:

Make sure you use the Resource Manager deployment model.  It should be selected by default:

Give the VM a hostname (“node1”) and username/password that will later be used to SSH into the system.  Make sure you add this VM to your Resource Group (“cluster-resources”) and that it resides in the same region as all of your other resources:

Next, choose your instance size.  For more information on the various instance sizes available, click here.

For the purposes of this guide, I’m using “A3 Standard” for Node1 and Node2, to minimize cost since this won’t be running a production workload.  I used an even smaller “A1 Standard” size for the witness server.  Select the instance size that makes most sense for you.

If you want to be able to connect into the VM from the outside world, set a Public IP address.  I did this so I can later SSH and VNC into the system.

IMPORTANT: By default, your VM won’t be added to an Availability Set.  On the Settings screen during VM creation, make sure you create a new Availability Set; we’ll call it “cluster-availability-set”.  Azure Resource Manager (ARM) allows you to create Availability Sets with 3 Fault Domains.  The default values here are fine:

Review your VM properties and click OK to create your first VM:

 

Create “node2” and “witness” VMs

Repeat the steps above twice to create two more VMs.  I created another “A3 Standard” size VM called “node2” and an “A1 Standard” size VM called “witness”.

The only difference here is that you’ll be ADDING these VMs to the Availability Set (“cluster-availability-set”) we just created:

It may take a little while for your 3 VMs to provision.  Once complete, you’ll see your VMs listed on the Virtual Machines screen within your Azure Portal:
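If you would rather script the VM creation, the sketch below shows a rough Azure CLI equivalent. The CentOS image URN and the password placeholder are assumptions; adjust them, and repeat the az vm create command for “node2” and “witness”, to match your own deployment:

# Create the availability set with 3 fault domains
az vm availability-set create --resource-group cluster-resources \
   --name cluster-availability-set --platform-fault-domain-count 3

# Create node1 (repeat for node2; a smaller size such as Standard_A1 is fine for witness)
az vm create --resource-group cluster-resources --name node1 \
   --image OpenLogic:CentOS:6.10:latest --size Standard_A3 \
   --availability-set cluster-availability-set \
   --vnet-name virtual-network --subnet default \
   --admin-username tony --authentication-type password --admin-password '<your-password>'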

Set VM Static IP Addresses

The VMs will be set with the following IP addresses:

  • node1: 10.0.0.4
  • node2: 10.0.0.5
  • witness: 10.0.0.6

Repeat this step for each VM.  Select your VM and edit its Network Interfaces.

Select the network interface associated with the VM, and edit IP addresses.  Select “Static” and specify the desired IP address:
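The same change can be scripted with the Azure CLI. The NIC and IP configuration names below (node1VMNic, ipconfig1) are the typical auto-generated defaults and are assumptions, so check the actual names with “az network nic list” first. Setting an explicit private IP address switches the allocation to static; repeat for node2 (10.0.0.5) and witness (10.0.0.6):

az network nic ip-config update --resource-group cluster-resources \
   --nic-name node1VMNic --name ipconfig1 --private-ip-address 10.0.0.4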

Add a Data Disk to cluster nodes

Next, we will need to add an extra disk to each of our cluster nodes (“node1” and “node2”).  This disk will store our MySQL databases, and it will later be replicated between nodes.

Note: You do NOT need to add an extra disk to the “witness” node.  Only “node1” and “node2”.

Edit your VM, select Disks and then attach a new disk:

Select a disk type (Standard or Premium SSD)  and size based on your workload.  Here I create a 10GB Standard disk on both of my cluster nodes.  As far as Host caching goes, “None” or “Read only” caching is fine. I do not recommend using “Read/Write” as there is potential for data loss:
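A rough Azure CLI equivalent for attaching a new 10GB Standard data disk to node1 is shown below (the disk name is an arbitrary placeholder; repeat for node2):

az vm disk attach --resource-group cluster-resources --vm-name node1 \
   --name node1-data1 --new --size-gb 10 --sku Standard_LRS --caching None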

Create Inbound Security Rule to allow VNC access

If your VM is part of a Network Security Group (NSG), which by default it likely is unless you disabled it during VM creation, the only port open in the “Azure firewall” is SSH (port 22).  Later in the guide, I’ll be using VNC to access the desktop of “node1” and configure the cluster using a GUI.  Create an Inbound Security Rule to open up VNC access.  In this guide port 5902 is used.  Adjust this accordingly, based on your VNC configuration.

Virtual Machines -> (select node1) -> Network interfaces -> (select NIC) -> Network security group -> (select the NSG) -> Inbound security rules -> Add
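The same inbound rule can be sketched with the Azure CLI. The NSG name below (node1-nsg) is an assumption; use whatever NSG name was generated for your VM, and adjust the port to match your VNC display:

az network nsg rule create --resource-group cluster-resources --nsg-name node1-nsg \
   --name allow-vnc --priority 1010 --direction Inbound --access Allow \
   --protocol Tcp --destination-port-ranges 5902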


Linux OS Configuration

Here is where we will leave the Azure Portal for a little while and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.  You aren’t given the root password to your Linux VMs in Azure, so once you login as the user specified during VM creation, use the “sudo” command to gain root privileges:

$ sudo su -

Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name.

Add the following lines to the end of your /etc/hosts file:

10.0.0.4    node1
10.0.0.5    node2
10.0.0.6    witness
10.0.0.99   mysql-vip

Disable SELinux

Edit /etc/sysconfig/selinux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Configure iptables so that the cluster Virtual IP will work

IMPORTANT: In order to get connectivity to the cluster Virtual IP to work, and also monitoring of the IP resource, a few iptables rules need to be set up.  Note: 10.0.0.99 is the Virtual IP we’ll be using in our cluster, and 3306 is the default port used by MySQL.

On node1 (10.0.0.4), run the following commands:

# iptables --flush
# iptables -t nat -A PREROUTING -p tcp --dport 3306 -j DNAT --to-destination 10.0.0.99:3306
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.99 -j SNAT --to-source 10.0.0.4
# service iptables save
# chkconfig iptables on

On Node2 (10.0.0.5), run the following commands:

# iptables --flush
# iptables -t nat -A PREROUTING -p tcp --dport 3306 -j DNAT --to-destination 10.0.0.99:3306
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.99 -j SNAT --to-source 10.0.0.5
# service iptables save
# chkconfig iptables on

Install and Configure VNC (and related packages)

In order to access the GUI of our Linux servers to later configure our cluster, install a VNC server on your cluster node.  In my setup I only did this on “node1”.

# yum install tigervnc-server xterm
# vncpasswd
# vi /etc/sysconfig/vncservers

      VNCSERVERS="2:root"
      VNCSERVERARGS[2]="-geometry 1024x768"

# service vncserver start
# chkconfig vncserver on

Test connectivity by opening a VNC client on your laptop/desktop and connecting to the Public IP of your cluster node.

Reboot Cluster Nodes

Reboot your cluster nodes so that SELinux is disabled, and the 2nd disk you previously added is detected. Only “node1” and “node2” need to be rebooted.

Partition and Format the “data” disk

In Step 6 of this guide (“Add a Data Disk to cluster nodes”) we did just that: added an extra disk to each cluster node to store the application data we will be protecting.  In this case it happens to be MySQL databases.

In Azure IaaS, Linux Virtual Machines use the following arrangement for disks:

  • /dev/sda – OS disk
  • /dev/sdb – temporary disk
  • /dev/sdc – 1st data disk
  • /dev/sdd – 2nd data disk
  • …
  • /dev/sdj – 8th data disk

The disk we added in Step 6 of this guide should appear as /dev/sdc.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/sda (OS) and /dev/sdb (temporary) already have disk partitions and are being used.

# fdisk -l

Disk /dev/sdb: 306.0 GB, 306016419840 bytes
255 heads, 63 sectors/track, 37204 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd3920649

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       37205   298842112   83  Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c23d3

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        3789    30432256   83  Linux
/dev/sda2            3789        3917     1024000   82  Linux swap / Solaris

 

Here I will create a partition (/dev/sdc1), format it, and mount it at the default location for MySQL, which is /var/lib/mysql.  Perform the following steps on BOTH “node1” and “node2”:

# fdisk /dev/sdc
Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): <enter>
Using default value 1
Last cylinder,  cylinders or  size{K,M,G} (1-1305, default 1305): <enter>
Using default value 1305
 
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@node1 ~]#

# mkfs.ext4 /dev/sdc1
# mkdir /var/lib/mysql

On node1, mount the filesystem:

# mount /dev/sdc1 /var/lib/mysql

Install And Configure MySQL

Next, install the MySQL packages, initialize a sample database, and set the “root” password for MySQL.

On “node1”:

# yum -y install mysql mysql-server
# /usr/bin/mysql_install_db --datadir="/var/lib/mysql/" --user=mysql
# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &
#
# # NOTE: This next command allows remote connections from ANY host. NOT a good idea for production!
# echo "update user set Host='%' where Host='node1'; flush privileges" | mysql mysql
#
# # Set MySQL's root password to 'SIOS'
# echo "update user set Password=PASSWORD('SIOS') where User='root'; flush privileges" | mysql mysql

Create a MySQL configuration file. We will place this on the data disk  (that will later be replicated – /var/lib/mysql/my.cnf).  Example:

# vi /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
 
[client]
user=root
password=SIOS

Delete the original MySQL configuration file, located in /etc, if it exists:

# rm /etc/my.cnf

On “node2”:

On “node2”, you ONLY need to install the MySQL packages.  The other steps aren’t required:

[root@node2 ~]# yum -y install mysql mysql-server

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster meaning that cluster nodes don’t have any shared storage, as is the case with Azure VMs.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (node1, node2, witness):

Download the SPS-Linux installation image file (sps.img) and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback-mount it and run the “setup” script inside as root (or first run “sudo su -” to obtain a root shell).

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “node1” and “node2”.  You do not need to install on “witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
      • lkSQL – LifeKeeper MySQL RDBMS Recovery Kit
    • This will result in the following additional RPMs installed on “node1” and “node2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm
      • steeleye-lkSQL-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (node1, node2, witness):

# cd /tmp/install/quorum
# rpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (node1, node2, witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, edit /etc/group and add the “tony” user (or whatever username you specified during VM creation) to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password in Azure:

# vi /etc/group

lkadmin:x:1001:root,tony

Open the LifeKeeper GUI

Make a VNC connection to the Public IP address of node1.  Based on the VNC and Inbound Security Rule configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“node1”).  Enter the linux userid and password specified during VM creation:

Next, connect to both “node2” and “witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

 

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

 

Create Communication Paths

Right-click on “node1” and select Create Comm Path

 

Select BOTH “node2” and “witness” and then follow the wizard.  This will create comm paths between:

  • node1 & node2
  • node1 & witness

 

A comm path still needs to be created between node2 & witness.   Right click on “node2” and select Create Comm Path.  Follow the wizard and select “witness” as the remote server:

 

At this point the following comm paths have been created:

  • node1 <—> node2
  • node1 <—> witness
  • node2 <—> witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (information on creating Azure VMs with multiple NICs can be found here, but won’t be covered in this article), you would create redundant comm paths between each server.

 

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

 

Result:

 

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d node1

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node2    TCP     10.0.0.4/10.0.0.5  ALIVE        1

witness  TCP     10.0.0.4/10.0.0.6  ALIVE        1

#/opt/LifeKeeper/bin/lcdstatus -q -d node2

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.0.0.5/10.0.0.4  ALIVE        1

witness  TCP     10.0.0.5/10.0.0.6  ALIVE        1

#/opt/LifeKeeper/bin/lcdstatus -q -d witness

 

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.0.0.6/10.0.0.4  ALIVE        1

node2    TCP     10.0.0.6/10.0.0.5  ALIVE        1

Create a Data Replication cluster resource (i.e. Mirror)

Next, create a Data Replication resource to replicate the /var/lib/mysql partition from node1 (source) to node2 (target).  Click the “green plus” icon to create a new resource:

 

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: node1
Hierarchy Type: Replicate Existing Filesystem
Existing Mount Point: /var/lib/mysql
Data Replication Resource Tag: datarep-mysql
File System Resource Tag: /var/lib/mysql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: node2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Target Disk: /dev/sdc1
Data Replication Resource Tag: datarep-mysql
Bitmap File: (default value)
Replication Path: 10.0.0.4/10.0.0.5
Mount Point: /var/lib/mysql
Root Tag: /var/lib/mysql

The cluster will look like this:

 

 

Create Virtual IP

Next, create a Virtual IP cluster resource.  Click the “green plus” icon to create a new resource:

 

Follow the wizard to create the IP resource with these selections:

Select Recovery Kit: IP
Switchback Type: Intelligent
IP Resource: 10.0.0.99
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.99

Extend the IP resource with these selections:

Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
IP Resource: 10.0.0.99
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.99

Configure a Ping List for the IP resource

By default, SPS-Linux monitors the health of IP resources by performing a broadcast ping.  In many virtual and cloud environments, broadcast pings don’t work.  In a previous step, we set “NOBCASTPING=1” in /etc/default/LifeKeeper to turn off broadcast ping checks. Instead, we will define a ping list.  This is a list of IP addresses to be pinged during IP health checks for this IP resource.   In this guide, we will add the witness server (10.0.0.6) to our ping list.

Right click on the IP resource (ip-10.0.0.99) and select Properties:

 

You will see that initially, no ping list is configured for our 10.0.0.0 subnet.   Click “Modify Ping List”:

 

Enter “10.0.0.6” (the IP address of our witness server), click “Add address” and finally click “Save List”:

 

You will be returned to the IP properties panel, and can verify that 10.0.0.6 has been added to the ping list.  Click OK to close the window:

 

Create the MySQL resource hierarchy

Next, create a MySQL cluster resource.  The MySQL resource is responsible for stopping/starting/monitoring of your MySQL database.  To create, click the “green plus” icon to create a new resource:

Follow the wizard to create the MySQL resource with these selections:

Select Recovery Kit: MySQL Database
Switchback Type: Intelligent
Server: node1
Location of my.cnf: /var/lib/mysql
Location of MySQL executables: /usr/bin
Database Tag: mysql

Extend the MySQL resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10

As a result, your cluster will look as follows.  Notice that the Data Replication resource was automatically moved underneath the database (dependency automatically created) to ensure it’s always brought online before the database:

 

Create a Dependency between the IP resource and the MySQL Database resource

Create a dependency between the IP resource and the MySQL Database resource so that they failover together as a group.  Right click on the “mysql” resource and select “Create Dependency”:

 

On the following screen, select the “ip-10.0.0.99” resource as the dependency.  Click Next and continue through the wizard:

 

At this point the SPS-Linux cluster configuration is complete.  The resource hierarchy will look as follows:

 

 

Create An Internal Load Balancer

If this was a typical on-premises cluster using either physical or virtual servers, you’d be done at this point.  Clients and Applications would connect into the Virtual IP of the cluster (10.0.0.99) to reach the active node.  In Azure, this doesn’t work without some additional configuration.

You will notice that you can’t connect to the Virtual IP from any server other than the node that is currently active.  Most cloud providers, including Azure, do not allow or support gratuitous ARPs, which is the reason you can’t connect to the Virtual IP directly.

To work around this, Azure provides a feature where you can set up an Internal Load Balancer (ILB).  Essentially, when you connect to the IP address of the ILB (which we will actually set to be the same as the cluster’s Virtual IP, 10.0.0.99), you are routed to the currently active cluster node.

Create a Load Balancer:

Give it a name, select “Internal” as the scheme, make sure your virtual network and subnet are properly selected, and assign a static IP that is the same as the cluster’s Virtual IP address.  In this example it’s 10.0.0.99:

 

Next, add a backend pool behind the load balancer.  This is how you place the two cluster VMs behind this load balancer.

Select both of your cluster nodes (node1, node2) and add them to the Backend Pool:

 

Once saved, expand the backend pool (called “ILBBackEnd” here) and you’ll see both VMs underneath along with their status and IPs.  It may take a few seconds before the screen updates:

 

Next, configure a probe for your ILB.  The probe checks the health of a service behind the ILB to determine which node to route traffic to.  Here we will specify port 3306, which is the default for MySQL:

 

Finally, complete the ILB configuration by creating a Load Balancing Rule: TCP, port 3306, and make sure you select “Enabled” for “Floating IP (direct server return)”:
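For reference, the whole ILB configuration can also be sketched with the Azure CLI. The load balancer, frontend, backend pool and probe names below are placeholders, and the NIC/ipconfig names are the typical auto-generated defaults, so treat this as an outline of the portal steps above rather than a drop-in script:

# Internal load balancer with the cluster VIP (10.0.0.99) as a static frontend IP
az network lb create --resource-group cluster-resources --name mysql-ilb --sku Basic \
   --vnet-name virtual-network --subnet default --private-ip-address 10.0.0.99 \
   --frontend-ip-name mysqlFrontEnd --backend-pool-name ILBBackEnd

# Add both cluster nodes' NICs to the backend pool
az network nic ip-config address-pool add --resource-group cluster-resources \
   --nic-name node1VMNic --ip-config-name ipconfig1 --lb-name mysql-ilb --address-pool ILBBackEnd
az network nic ip-config address-pool add --resource-group cluster-resources \
   --nic-name node2VMNic --ip-config-name ipconfig1 --lb-name mysql-ilb --address-pool ILBBackEnd

# Health probe on the MySQL port and a rule with Floating IP (direct server return) enabled
az network lb probe create --resource-group cluster-resources --lb-name mysql-ilb \
   --name mysqlProbe --protocol tcp --port 3306
az network lb rule create --resource-group cluster-resources --lb-name mysql-ilb \
   --name mysqlRule --protocol tcp --frontend-port 3306 --backend-port 3306 \
   --frontend-ip-name mysqlFrontEnd --backend-pool-name ILBBackEnd \
   --probe-name mysqlProbe --floating-ip true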

 

Test Cluster Connectivity

At this point, all of our Azure and Cluster configurations are complete!

Cluster resources are currently active on node1:

 

SSH into the witness server, “sudo su -” to gain root access.   Install the mysql client if needed:

[root@witness ~]# yum -y install mysql

Test MySQL connectivity to the cluster:

[root@witness ~]# mysql --host=10.0.0.99 mysql -u root -p

Execute the following MySQL query to display the hostname of the active cluster node:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| node1      |
+------------+
1 row in set (0.00 sec)
mysql>

Using the LifeKeeper GUI, fail over from node1 to node2.  Right-click on the mysql resource underneath node2, and select “In Service…”:

 

After failover:

 

 

After failover has completed, re-run the MySQL query.  You’ll notice that the MySQL client has detected that the session was lost (during failover) and automatically reconnects:

Execute the following MySQL query to display the hostname of the active cluster node, verifying that now “node2” is active:

mysql> select @@hostname;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    48
Current database: mysql
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
1 row in set (0.56 sec)
mysql>

Did you find this step-by-step guide to configure a Linux failover cluster in Microsoft Azure IaaS without shared storage useful? We have more here.

Reproduced with permission from Linuxclustering

Filed Under: Clustering Simplified Tagged With: Azure, High Availability, linux failover cluster in microsoft azure iaas without shared storage, MySQL
