
Archives for June 2018

Supported Services With Azure Resource Manager (ARM)

June 20, 2018 by Jason Aw Leave a Comment

Azure Service Management (Classic) or Azure Resource Manager (ARM)?

I deal with users every week who are moving business-critical workloads to Azure. The first question I usually ask is whether they are using Azure Service Management (Classic) or Azure Resource Manager (ARM).

I usually recommend ARM. It is the new way of doing things, and all new features are being developed for ARM. However, there are a few things that are not compatible with ARM yet. As time goes by, this list of unsupported features gets smaller and smaller. In the meantime, it is good to know that there is an existing document, updated on a regular basis, which lists all of the features and whether they are supported with ARM. https://azure.microsoft.com/en-us/documentation/articles/resource-manager-supported-services/

It’s a decent list, but it’s not complete. For example, App Service Environment is missing from that page; I only found out it was not yet supported in ARM from the App Service Environment page itself.

Reproduced with permission from Clustering For Mere Mortals.

Filed Under: Clustering Simplified Tagged With: Azure Resource Manager

Fix Azure ILB Connection In SQL Server AlwaysOn Failover Cluster Instance

June 19, 2018 by Jason Aw Leave a Comment

Troubleshooting Azure ILB Connection Issues In A SQL Server Failover Cluster Instance

I use the following tools when troubleshooting SQL Server Failover Cluster Instance connectivity issues, especially those pesky Azure ILB connection issues. I’ll try to update this article whenever I find a new tool.

NETSTAT

The first tool is a simple test to verify whether the SQL Cluster IP is listening on the port it should be listening on. In this case, the SQL Cluster IP address is 10.0.0.201, and it is the default instance, which listens on port 1433.

Here is the command that will help you quickly identify whether the active node is listening on that port. In our case below, everything looks normal.

C:\Users\dave.SIOS>netstat -na | find "1433"
TCP    10.0.0.4:49584         10.0.0.201:1433        ESTABLISHED
TCP    10.0.0.4:49592         10.0.0.201:1433        ESTABLISHED
TCP    10.0.0.4:49593         10.0.0.201:1433        ESTABLISHED
TCP    10.0.0.4:49595         10.0.0.201:1433        ESTABLISHED
TCP    10.0.0.201:1433        0.0.0.0:0              LISTENING
TCP    10.0.0.201:1433        10.0.0.4:49584         ESTABLISHED
TCP    10.0.0.201:1433        10.0.0.4:49592         ESTABLISHED
TCP    10.0.0.201:1433        10.0.0.4:49593         ESTABLISHED
TCP    10.0.0.201:1433        10.0.0.4:49595         ESTABLISHED

Once I can be sure SQL is listening on the proper port, I use PSPING to try to connect to the port remotely.

PSPING

PSPing is part of the PSTools package available from Microsoft. I usually download the tool and put PSPing directly in my System32 folder so I can use it whenever I want without having to change directories.

Now, assuming everything is configured properly from the ILB, cluster, and firewall perspective, you should be able to ping the SQL Cluster IP address on port 1433 from the passive server and get the results shown below…

C:\Users\dave.SIOS>psping 10.0.0.201:1433
PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
TCP connect to 10.0.0.201:1433:
5 iterations (warmup 1) connecting test:
Connecting to 10.0.0.201:1433 (warmup): 6.99ms
Connecting to 10.0.0.201:1433: 0.78ms
Connecting to 10.0.0.201:1433: 0.96ms
Connecting to 10.0.0.201:1433: 0.68ms
Connecting to 10.0.0.201:1433: 0.89ms

If things are not configured properly, you may see results similar to the following…

C:\Users\dave.SIOS>psping 10.0.0.201:1433
TCP connect to 10.0.0.201:1433:
5 iterations (warmup 1) connecting test:
Connecting to 10.0.0.201:1433 (warmup):
This operation returned because the time out period expired.
Connecting to 10.0.0.201:1433:
This operation returned because the time out period expired.
Connecting to 10.0.0.201:1433:
This operation returned because the time out period expired.
Connecting to 10.0.0.201:1433:
This operation returned because the time out period expired.
Connecting to 10.0.0.201:1433:
This operation returned because the time out period expired.

If PSPing connects but your application still has a problem connecting, you may need to dig a bit deeper. I have seen some applications, like Great Plains, also want to make a connection on port 445. If your application can’t connect but PSPing connects fine to 1433, you may need to do a network trace to see what other ports your application is trying to connect to, and then add load balancing rules for those ports as well.
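
If you don’t have a network analyzer handy, Windows has a built-in capture you can use. This is just a minimal sketch; the trace file path is arbitrary.

# Run from an elevated prompt on the client having the connection problem.
netsh trace start capture=yes tracefile=C:\temp\apptrace.etl
# ...reproduce the failing connection, then stop the capture:
netsh trace stop
# Open the resulting .etl in Microsoft Message Analyzer to see which ports
# the application is trying to reach.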

NAMED INSTANCES

Planning to use a named instance? You need to make sure you lock down your TCP service to use a static port. At the same time, you also need to add a rule to your load balancer to redirect UDP 1434 for the SQL Browser Service; otherwise you won’t be able to connect to your named instance.
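
For example, here is a rough sketch of what that rule might look like with the AzureRM PowerShell cmdlets. The load balancer and resource group names ('ILBEAST', 'SIOS-EAST') are placeholders, and the rule reuses the existing TCP 59999 health probe.

# Placeholder names; substitute your own load balancer and resource group.
$lb = Get-AzureRMLoadBalancer -Name 'ILBEAST' -ResourceGroupName 'SIOS-EAST'
$lb | Add-AzureRMLoadBalancerRuleConfig -Name 'SQLBrowser1434' -Protocol Udp `
    -FrontendPort 1434 -BackendPort 1434 `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -BackendAddressPool $lb.BackendAddressPools[0] `
    -Probe $lb.Probes[0] -EnableFloatingIP
$lb | Set-AzureRMLoadBalancer   # push the updated configuration to Azure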

FIREWALL

Opening up TCP ports 1433 and 59999 should cover all the manual steps required. When troubleshooting connection issues, though, I generally turn the Windows Firewall off to eliminate it as a possible cause of the problem. Don’t forget that Azure also has a firewall of its own, Network Security Groups; if anyone changed that from the default, it could be blocking traffic as well.
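
If you would rather leave the Windows Firewall on, a minimal sketch of the two inbound rules using the built-in NetSecurity cmdlets (Windows Server 2012 and later) looks like this; the display names are arbitrary.

New-NetFirewallRule -DisplayName 'SQL Server TCP 1433' -Direction Inbound `
    -Protocol TCP -LocalPort 1433 -Action Allow
New-NetFirewallRule -DisplayName 'Azure ILB Probe TCP 59999' -Direction Inbound `
    -Protocol TCP -LocalPort 59999 -Action Allow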

NAME RESOLUTION

Try pinging the SQL cluster name; it should resolve to the SQL Server cluster IP address. I have seen, on more than a few occasions, the DNS A record associated with the SQL cluster network name mysteriously disappear from DNS. If that is the case, go ahead and re-add the SQL cluster name and IP address as an A record in DNS.
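
Here is a quick sketch of both the check and the fix in PowerShell. The names 'sqlcluster', 'contoso.local', and 'dc1.contoso.local' are hypothetical; substitute your SQL cluster network name, your AD DNS zone, and your DNS server.

# Check the record first.
Resolve-DnsName sqlcluster.contoso.local
# If the A record is gone, re-add it (requires the DnsServer module and
# rights on the DNS server):
Add-DnsServerResourceRecordA -ZoneName 'contoso.local' -Name 'sqlcluster' `
    -IPv4Address '10.0.0.201' -ComputerName 'dc1.contoso.local'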

SQL CONFIGURATION MANAGER

In SQL Configuration Manager, you should see the SQL Cluster IP address listed with port 1433. If you installed a named instance, you will of course need to go in here, lock the port to a specific static port, and make your load balancing rules reflect that port. Because of the Azure limitation of only one ILB per availability group, I really don’t see a valid reason to use a named instance; make it easier on yourself and just use the default instance of SQL. (Update: as of Oct 2016 you CAN have multiple IP addresses per ILB, so you CAN have multiple instances of SQL installed in the cluster.)
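
If you do need to lock a named instance to a static port and prefer to script it, this sketch flips the same registry values SQL Configuration Manager edits. The instance ID 'MSSQL13.INSTANCE1' and port 1533 are hypothetical; check yours under HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL.

$tcp = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL13.INSTANCE1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll'
Set-ItemProperty -Path $tcp -Name TcpDynamicPorts -Value ''   # disable dynamic ports
Set-ItemProperty -Path $tcp -Name TcpPort -Value '1533'       # set the static port
# Restart the SQL Server service for the change to take effect.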


Reproduced with permission from Clustering For Mere Mortals.

Filed Under: Clustering Simplified Tagged With: AZURE ILB CONNECTION, Failover Cluster Instances, SQL SERVER ALWAYSON FCI CLUSTER

Azure ILB In ARM For SQL Server Failover Cluster Instances

June 15, 2018 by Jason Aw Leave a Comment

Configuring The Azure ILB In ARM For SQL Server Failover Cluster Instances Or AGs Using Azure PowerShell 1.0

In an earlier post I went into great detail about how to configure the Azure ILB in ARM for SQL Server Failover Cluster or AG resources. The directions in that article were written prior to the GA of Azure PowerShell 1.0. With the availability of Azure PowerShell 1.0, the main script that creates the ILB needs to be slightly different; the rest of the article is still accurate. If you are using Azure PowerShell 1.0 or later, the script to create the ILB described in that article should be as follows.

#Replace the values of the variables listed below
$ResourceGroupName = 'SIOS-EAST'       # Resource group in which the SQL nodes are deployed
$FrontEndConfigurationName = 'FEEAST'  # You can provide any name for this parameter
$BackendConfigurationName = 'BEEAST'   # You can provide any name for this parameter
$LoadBalancerName = 'ILBEAST'          # Provide a name for the internal load balancer object
$Location = 'eastus2'                  # Data center location of the SQL deployment
$subname = 'public'                    # Subnet name in which the SQL nodes are placed
$ILBIP = '10.0.0.201'                  # IP address for the listener or load balancer

$subnet = Get-AzureRMVirtualNetwork -ResourceGroupName $ResourceGroupName |
    Get-AzureRMVirtualNetworkSubnetConfig -Name $subname

$FEConfig = New-AzureRMLoadBalancerFrontendIpConfig -Name $FrontEndConfigurationName `
    -PrivateIpAddress $ILBIP -SubnetId $subnet.Id

$BackendConfig = New-AzureRMLoadBalancerBackendAddressPoolConfig `
    -Name $BackendConfigurationName

New-AzureRMLoadBalancer -Name $LoadBalancerName -ResourceGroupName $ResourceGroupName `
    -Location $Location -FrontendIpConfiguration $FEConfig `
    -BackendAddressPool $BackendConfig

The rest of that original article is the same, but I have just copied it here for ease of use…

Using The GUI

Now that the ILB is created, you should see it in your Resource Group in the Azure Portal.

The rest of the configuration can also be completed through PowerShell, but I’m going to use the GUI in my example.

If you want to use PowerShell, you could probably piece together the script by looking at this article. Unfortunately, that article confuses me, so I’ll figure it out some day and try to document it in a user-friendly format. For now, I think the GUI is fine for the next steps.
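
That said, if you prefer scripting, here is a rough, untested sketch of the probe and load balancing rule configured in the GUI steps below, using the same AzureRM cmdlets and variables as the ILB script above. The probe and rule names are arbitrary.

$lb = Get-AzureRMLoadBalancer -Name $LoadBalancerName -ResourceGroupName $ResourceGroupName
$lb | Add-AzureRMLoadBalancerProbeConfig -Name 'SQLProbe' -Protocol Tcp `
    -Port 59999 -IntervalInSeconds 5 -ProbeCount 2
$probe = Get-AzureRMLoadBalancerProbeConfig -Name 'SQLProbe' -LoadBalancer $lb
$lb | Add-AzureRMLoadBalancerRuleConfig -Name 'SQL1433' -Protocol Tcp `
    -FrontendPort 1433 -BackendPort 1433 -EnableFloatingIP `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -BackendAddressPool $lb.BackendAddressPools[0] -Probe $probe
$lb | Set-AzureRMLoadBalancer   # push the changes to Azure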

Let’s Get Started

Follow along with the screen shots below. If you get lost, follow the navigation hints at the top of the Azure Portal to figure out where we are.

First Step

  • Click the Backend Pools settings tab and select the backend pool; update it to include the Availability Set and the Virtual Machines. Save your changes.


  • Configure the load balancer’s probe by clicking Add on the Probes tab. Give the probe a name and configure it to use TCP port 59999. I have left the probe interval and the unhealthy threshold at the default settings, which means it will take 10 seconds (two failed probes at five-second intervals) before the ILB removes the failed node from rotation after a failover. Your clients may therefore take up to 10 seconds to be redirected to the new active node. Be sure to save your changes.


Next Step

  • Navigate to the Load Balancing Rules tab and add a new rule. Give the rule a sensible name (SQL1433 or something) and choose the TCP protocol with port 1433 (assuming you are using the default instance of SQL Server). Choose 1433 for the backend port as well. For the Backend Pool, choose the backend pool we created earlier (BE), and for the Probe, choose the probe we created earlier.

We do not want to enable Session Persistence, but we do want to enable Floating IP (Direct Server Return). I have left the idle timeout at the default setting, but you might want to consider increasing it to the maximum value: I have seen some applications, such as SAP, log error messages each time the connection is dropped and needs to be re-established.


  • At this point the ILB is configured, and only one final step needs to take place for a SQL Server Failover Cluster: we need to update the SQL cluster IP resource in exactly the same way we had to in the Classic deployment model. To do that, you will need to run the following PowerShell script on just one of the cluster nodes. Make note: SubnetMask=“255.255.255.255” is not a mistake; use the 32-bit mask regardless of what your actual subnet mask is.
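
Here is a sketch of that script, following the standard pattern Microsoft documents for cluster IP resources sitting behind an Azure ILB. The cluster network and IP resource names below are hypothetical; run Get-ClusterResource to find yours.

$ClusterNetworkName = 'Cluster Network 1'            # hypothetical; check your cluster
$IPResourceName = 'SQL IP Address 1 (sqlcluster)'    # hypothetical; check your cluster
$ILBIP = '10.0.0.201'                                # the ILB front-end IP address

Import-Module FailoverClusters

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{
    'Address'    = $ILBIP
    'ProbePort'  = 59999                 # must match the ILB health probe port
    'SubnetMask' = '255.255.255.255'     # 32-bit mask, regardless of the real subnet
    'Network'    = $ClusterNetworkName
    'EnableDhcp' = 0
}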

One Final Note

In my initial test I still was not able to connect to the SQL resource name, even after I completed all of the above steps. After banging my head against the wall for a few hours, I discovered that for some reason the SQL Cluster Name resource was not registered in DNS. I’m not sure how that happened or whether it will happen consistently, but if you are having trouble connecting, I would definitely check DNS and add the SQL cluster name and IP address as a new A record if it is not already in there.

And of course, don’t forget the good ole Windows Firewall. You will have to make exceptions for 1433 and 59999, or just turn it off until you get everything configured properly, like I did. You probably want to leverage Azure Network Security Groups instead of the local Windows Firewall anyway, for a more unified experience across all your Azure resources.

Good luck and let me know how you make out.

Head over here to see how SIOS helped companies across the globe create SQL Server Failover Clusters.

Reproduced with permission from Clustering For Mere Mortals.

Filed Under: Clustering Simplified Tagged With: SQL Server Failover Cluster

High Availability Cluster Solution For High Service Level

June 8, 2018 by Jason Aw Leave a Comment

The Effortless Way To A High Availability Cluster Solution For A High Service Level

Improve Ordering Process with High Availability Cluster Solution

The Japan Contact Lens Association has an online ordering system that takes in orders from dealers to manufacturers and agents all year round. SCSK Co., Ltd. (hereinafter referred to as SCSK) manages this system. They adopted SCSK’s infrastructure cloud service, the “USiZE Shared Model,” which offers an operating rate of 99.99%, and at the same time included LifeKeeper to secure high availability in the cloud and DataKeeper to build an operational infrastructure with a high availability cluster solution. With this software, the team can detect abnormalities not only in hardware and virtual machines but also in applications, and they have set up a system to quickly restore and continue services even in a crisis.

Contact Lens Ordering Platform That Provides 24 Hours 365 Days Service To The Cloud

The Japan Contact Lens Association supports the dissemination of contact lenses and the development of the industry. It operates a network system called “CLIOS (Contact Lens Information & Order System)”; “Web-CLIOS” is its ordering system, which lets users easily place orders from a web browser anytime, anywhere.

High Availability Function To Monitor Applications To Minimize Business Impact In Case Of Failure

The USiZE Shared Model was to serve as the new operational infrastructure for “Web-CLIOS.” It provides not only computing resources but also system operation know-how acquired over 40 years, together with ITIL, a set of best practices in IT service management that enables high-quality service. However, even this is not perfect. “To realize the service level required by the Japan Contact Lens Association, we needed a mechanism to increase the availability of the application layer,” Mr. Ishihara said.

The USiZE Shared Model is also equipped with high availability functions. For example, if a failure occurs on a host server, the virtual machines running on it can easily be failed over to another healthy host server. However, the native virtualization infrastructure can only monitor the state of the guest OS; it cannot detect the failure of an application. Tesio Abe, a senior engineer in SCSK’s cloud service department, explains, “Even though the host server and guest OS appear to be operating normally, the application running on them may freeze or go down for some reason. In the case of mission-critical services like Web-CLIOS, high availability functions that can monitor applications are essential.”

Short Failover Time With LifeKeeper and DataKeeper

To solve this problem, SCSK focused on cloud-ready high availability cluster solutions. After comparing and examining products from several vendors, they adopted LifeKeeper. Makoto Nagashima, Foundation Service Manager in SCSK’s Cloud Service Department, said the high availability cluster solution could maintain reliable operation over the long term.

Above all, it is easy to install. DataKeeper performs high-speed data mirroring between the production node and the standby node in place of shared storage, and even if an abnormality occurs on the production node, LifeKeeper can fail over in a shorter time than other products.

High Availability Brought About By “Protection Suite” Will Be Their Weapon For The Future

Although more than two years have passed since the renewal of “Web-CLIOS,” the system infrastructure has continued to operate steadily without serious trouble.

SCSK says it will actively utilize the Protection Suite in other services and projects. Ikeda said, “While the cloud migration of mission-critical systems is expected to accelerate, the advanced availability that LifeKeeper and DataKeeper bring to SCSK’s consulting/SI business and cloud services will become our weapon for the future.”

To find out more about SIOS products, go here.
To read about how SIOS helped the Japan Contact Lens Association achieve a high availability cluster solution, go here.

Filed Under: Success Stories Tagged With: High Availability, high availability cluster solution
