Troubleshooting Azure ILB Connection Issues In A SQL Server Failover Instance Cluster Connection
I use the following tools to help me deal with troubleshooting SQL Server Failover Cluster Instance Connectivity issues. Especially those pesky Azure ILB Connection Issues. I’ll try to update this article whenever I find a new tool.
The first tool is a simple test to verify whether the SQL Cluster IP is listening on the port it should be listening on. In this case, the SQL Cluster IP address is 10.0.0.201. But it is using the default instance which is port 1433.
Here is the command which will help you quickly identify whether the active node is listening on that port. In our case below everything looks normal.
C:\Users\dave.SIOS>netstat -na | find "1433" TCP 10.0.0.4:49584 10.0.0.201:1433 ESTABLISHED TCP 10.0.0.4:49592 10.0.0.201:1433 ESTABLISHED TCP 10.0.0.4:49593 10.0.0.201:1433 ESTABLISHED TCP 10.0.0.4:49595 10.0.0.201:1433 ESTABLISHED TCP 10.0.0.201:1433 0.0.0.0:0 LISTENING ESTABLISHED TCP 10.0.0.201:1433 10.0.0.4:49592 ESTABLISHED TCP 10.0.0.201:1433 10.0.0.4:49593 ESTABLISHED TCP 10.0.0.201:1433 10.0.0.4:49595 ESTABLISHED
Once I can be sure SQL is listening to the proper port, I use PSPING to try to connect to the port remotely.
PSPing is part of the PSTools package available from Microsoft. I usually download the tool and put PSPing directly in my System32 folder so I can use it whenever I want without having to change directories.
Now, assuming everything is configured properly from the ILB, Cluster and Firewall perspective, you should be able to ping the SQL Cluster IP address and port 1433 from the passive server. You will get the results shown below…
C:\Users\dave.SIOS>psping 10.0.0.201:1433 PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility Copyright (C) 2012-2014 Mark Russinovich Sysinternals - www.sysinternals.com TCP connect to 10.0.0.201:1433: 5 iterations (warmup 1) connecting test: Connecting to 10.0.0.201:1433 (warmup): 6.99ms Connecting to 10.0.0.201:1433: 0.78ms Connecting to 10.0.0.201:1433: 0.96ms Connecting to 10.0.0.201:1433: 0.68ms Connecting to 10.0.0.201:1433: 0.89ms If things are not configured properly you may see results similar to the following… C:\Users\dave.SIOS>psping 10.0.0.201:1433 TCP connect to 10.0.0.102:1433: 5 iterations (warmup 1) connecting test: Connecting to 10.0.0.102:1433 (warmup): This operation returned because the time out period expired. Connecting to 10.0.0.102:1433 (warmup): This operation returned because the time out period expired. Connecting to 10.0.0.102:1433 (warmup): This operation returned because the time out period expired. Connecting to 10.0.0.102:1433 (warmup): This operation returned because the time out period expired. Connecting to 10.0.0.102:1433 (warmup): This operation returned because the time out period expired.
If PSPing connects but yet your application is having a problem connecting, you may need to dig a bit deeper. I have seen some application like Great Plains also want to make a connection to port 445. If your application can’t connect but PSPing connects fine to 1433. Then you may need to do a network trace and see what other ports your application is trying to connect to. Your last step would be to add load balancing rules for those ports as well.
Planning to use a named instances? You need to make sure you lock down your TCP service to use a static port. At the same time, you also need to make sure you add a rule to your load balancer to redirect UDP 1434 for the SQL Browser Service. Otherwise you won’t be able to connect to your named instance.
Opening up TCP ports 1433 and 59999 should cover all the manual steps required. But when troubleshooting connection issues, I generally turn the Windows Firewall off to eliminate the firewall as a possible cause of the problem. Don’t forget. Azure also has a firewall called Network Security Groups. If anyone changed that from the default that could be blocking traffic as well.
Try pinging the SQL cluster name. It should resolve to the SQL Server cluster iP address. Although I have seen on more than a few occasions, the DNS A-record associated with the SQL Cluster network name mysteriously disappear from DNS. If that is the case, go ahead and read-ad the SQL Custer name and IP address as an A record in DNS.
SQL CONFIGURATION MANAGER
In SQL Configuration Manager, you should see the SQL Cluster IP Address listed and port 1433. If by chance you installed a Named Instance, you of course will need to go in here and lock the port to a specific port and make your load balancing rules reflect that port. Because of the Azure ILB limitation of only on ILB per AG, I really don’t see an valid reason to use a named instance. Make it easier on yourself and just use the default instance of SQL. (Update: as of Oct 2016 you CAN have multiple IP addresses per ILB, so you CAN have multiple instances of SQL installed in the cluster.)
Reproduced with permission from Clustering For Mere Mortals.