SIOS SANless clusters

Azure Site Recovery for Disaster Recovery

November 11, 2018 by Jason Aw

#IGNITE2018 Session: Ensure Application Availability With Cloud-Based Disaster Recovery, Azure Site Recovery

I’m a big fan of Azure Site Recovery for disaster recovery, so I was glad to attend the Ignite session presented today by Rochak Mittal and Ashish Gangwar.

BRK3304 – Architecting mission-critical, high-performance SAP workloads on Azure

This session on ensuring application availability with cloud-based disaster recovery was particularly informative. In one of the architecture slides, the presenters showed how an entire SAP deployment could be protected by Azure Site Recovery (ASR) and recovered in just a few minutes in the event of a disaster. Azure Recovery Plans give you explicit control over recovery, including creating dependencies between resources and invoking scripts within a VM to help complete the recovery.
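
The idea of recovering resources in dependency order can be sketched in a few lines. This is only an illustration and does not use the Azure Site Recovery API; the tier names and the dependencies between them are hypothetical, and the ordering comes from Python's standard `graphlib` module rather than from anything shown in the session.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tiers and dependencies, for illustration only.
# Each key maps to the set of items that must be recovered before it.
DEPENDENCIES = {
    "database": set(),
    "ascs": set(),
    "app_servers": {"database", "ascs"},
    "post_recovery_script": {"app_servers"},
}


def recovery_order(dependencies):
    """Return one valid recovery order: dependencies before dependents."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(recovery_order(DEPENDENCIES))
```

In this sketch the script step runs last because it depends on the application servers, which in turn wait for both the database and ASCS tiers.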

It seems like yesterday, but it was back in May 2014 that I first started assisting Microsoft with providing an HA solution for SAP ASCS in Azure. That solution uses DataKeeper to build a SANless cluster for ASCS. It still stands today as the only HA solution that also works with ASR for disaster recovery configurations such as the one shown in this demo at Ignite.

Shared disks in Azure with SIOS DataKeeper

Want to know how to ensure application availability with cloud-based disaster recovery using Azure Site Recovery? Let us know and we’ll be glad to help.
Reproduced with permission from Clusteringformeremortals.com

Filed Under: Clustering Simplified Tagged With: Azure Site Recovery, disaster recovery, Microsoft

Azure Outage Post Mortem Part 3

November 8, 2018 by Jason Aw

Concluding The Azure Outage Post-Mortem Part 3

My previous blog posts, Azure Outage Post-Mortem – Part 1 and Azure Outage Post-Mortem Part 2, made some assumptions based upon limited information coming from blog posts and Twitter. I just attended a session at Ignite which gave a little more clarity as to what actually happened. Sometime tomorrow you should be able to view the session for yourself.

BRK3075 – Preparing for the unexpected: Anatomy of an Azure outage

The official Root Cause Analysis will be published soon. In the meantime, here are some tidbits of information gleaned from the session.

The Cause

According to the session, the outage was NOT caused by a lightning strike as previously reported. Instead, the storm caused electrical sags and swells, which locked out a chiller plant in the first datacenter. During this first outage they were able to recover the chiller quickly with no noticeable impact. Shortly thereafter, there was a second outage at a second datacenter which was not recovered properly. That began an unfortunate series of events.

2nd Outage

During this outage, Microsoft states that “Engineers didn’t triage alerts correctly – chiller plant recovery was not prioritized”. There were numerous alerts being triggered at this time. Unfortunately the chiller being offline did not receive the priority it should have. The RCA as to why that happened is still being investigated.

Microsoft states that redundant chiller systems are, of course, in place. However, the cooling systems were not set to fail over automatically. Recently installed equipment had not been fully tested, so it was set to manual mode until testing had been completed.

After 45 minutes, the ambient cooling failed, hardware shut down, and the air handlers shut down because they registered what appeared to be a fire. Staff were evacuated due to the false fire alarm. During this time the temperature in the datacenter kept rising. Some hardware was not shut down properly, causing damage to some storage and networking equipment.

After manually resetting the chillers and opening the air handlers, the temperature began to return to normal. It took about 3 hours and 29 minutes before they had a complete picture of the status of the datacenter.

The biggest issue was the damage to storage. Microsoft’s primary concern is data protection, so Microsoft worked to recover data and ensure no data loss. This of course took some time, which extended the overall length of the outage. The good news is that no customer data was lost. The bad news is that it seemed to take 24-48 hours for things to return to normal, based on what I read on Twitter from customers complaining about the prolonged outage.

Assumptions

Everyone expected that this outage would impact customers hosted in the South Central Region. But what they did not expect was that the outage would have an impact outside of that region. In the session, Microsoft discusses some of the extended reach of the outage.

Azure Service Manager (ASM)

This controls Azure “Classic” resources, AKA, pre-ARM resources. Anyone relying on ASM could have been impacted. It wasn’t clear to me why this happened. It appears that South Central Region hosts some important components of that service which became unavailable.

Visual Studio Team Services (VSTS)

Again, it appears that many resources that support this service are hosted in the South Central Region. This outage is described in great detail by Buck Hodges (@tfsbuck), Director of Engineering, Azure DevOps, in this blog post:

POSTMORTEM: VSTS 4 SEPTEMBER 2018

Azure Active Directory (AAD)

When the South Central region failed, AAD did what it was designed to do and started directing authentication requests to other regions. As the East Coast started to wake up and come online, authentication traffic started picking up. Normally AAD would handle this increase in traffic through autoscaling. But the autoscaling has a dependency on ASM, which of course was offline. Without the ability to autoscale, AAD was not able to handle the increase in authentication requests. Exacerbating the situation was a bug in Office clients which gave them very aggressive retry logic and no backoff logic. This additional authentication traffic eventually brought AAD to its knees.
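
To make "backoff logic" concrete, here is a minimal sketch of capped exponential backoff with jitter, a common way to keep retrying clients from piling onto a struggling service. It is not the Office client code or any Microsoft API; the function names and the default values (`base`, `cap`) are assumptions chosen purely for illustration.

```python
import random


def backoff_ceilings(attempts, base=1.0, cap=60.0):
    """Upper bound (seconds) of each retry delay: doubles per attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(attempts)]


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return wait times (seconds) for successive retries.

    Hypothetical helper for illustration: each wait is drawn uniformly
    between 0 and that attempt's ceiling ("full jitter"), so that many
    clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, ceiling)
            for ceiling in backoff_ceilings(attempts, base, cap)]


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {attempt}: wait up to {delay:.2f}s before trying again")
```

A client without this kind of logic retries immediately and at a constant rate, which is the behavior that adds load exactly when the service can least absorb it.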

They ran out of time to discuss this further during the Ignite session. One feature they will be introducing is the ability for users to fail over Storage Accounts manually. So in the case where recovery time objective (RTO) is more important than recovery point objective (RPO), the user will be able to recover their asynchronously replicated geo-redundant storage in an alternate datacenter if Microsoft experiences another extended outage.

What You Can Do Now

Until that time, you will have to rely on other replication solutions such as SIOS DataKeeper, Azure Site Recovery, or application-specific replication solutions that can replicate data across regions and put the ability to enact your disaster recovery plan in your control.

Read more about our azure outage post mortem
Reproduced with permission from Clusteringformeremortals.com

Filed Under: Clustering Simplified Tagged With: azure outage post mortem, Microsoft

How To Avoid Split Brain On Availability Groups With SQL Server On Linux

October 30, 2018 by Jason Aw

SQL Server 2017 On Linux Availability Group Split Brain Problem

Avoid Split Brain On Availability Groups With SQL Server On Linux with this support article posted by Microsoft.

Running SQL Server on Linux can have some advantages, including cost savings on the OS if running in Azure. Do the calculations: the savings become substantial as the number of cores goes up, and you are licensing at least two servers for every cluster pair.

However, why bother saving money if the technology is not rock solid? One of the biggest issues I see with running SQL Server on Linux is the lack of a cohesive HA/DR story. On Windows, Microsoft owns the whole HA stack and SQL Server relies heavily on Windows Server Failover Clustering to support both Availability Groups and Failover Cluster Instances. This has been running well for many years and has a long track record of success stories.

When moving to Linux, Microsoft no longer owns the HA stack at the OS level. Depending upon your Linux distro, you are left trying to piece together open source solutions like Pacemaker, not to mention trying to get them to cooperate with SQL Server Availability Groups.

To avoid Split Brain On Availability Groups With SQL Server On Linux, I would much rather look to a 3rd party high availability solution like the SIOS Protection Suite for Linux (SPS-L). It gives you a tried and true HA solution for your business critical applications running on Linux.

SQL Server on Linux Cluster in Azure

Split Brain On Availability Groups With SQL Server On Linux With SIOS

SPS-L has been protecting business critical applications running on Linux since 1999. It is a full HA/DR solution that monitors and recovers the entire application stack, as well as the physical servers and network, to ensure your business critical applications are highly available, all while maintaining a 3rd copy for disaster recovery in a remote datacenter or a different geographic region of the cloud.

The other benefit of SPS-L is that it doesn’t require the Enterprise Edition of SQL Server, so there can be a significant cost savings advantage on SQL Server licenses as well. Consider SQL Server Standard Edition costs $1859 per core vs $7128 per core for SQL Server Enterprise Edition. The cost savings advantage can be significant, depending upon how many cores you need to license.
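
The per-core arithmetic is easy to run for your own core counts. The sketch below uses only the two list prices quoted above; the core counts in the loop are hypothetical examples, and it ignores any discounts or licensing rules not mentioned in this post.

```python
STANDARD_PER_CORE = 1859    # USD per core, figure quoted in this post
ENTERPRISE_PER_CORE = 7128  # USD per core, figure quoted in this post


def license_savings(cores):
    """License cost difference (USD) between Enterprise and Standard."""
    return (ENTERPRISE_PER_CORE - STANDARD_PER_CORE) * cores


if __name__ == "__main__":
    # Hypothetical core counts, for illustration only.
    for cores in (4, 8, 16):
        print(f"{cores} cores: ${license_savings(cores):,} saved")
```

At these prices the gap is $5,269 per core, so licensing 8 cores of Standard instead of Enterprise works out to $42,152 less.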

Below is a video demonstration of SPS-L protecting SQL Server running on Linux in the Azure Cloud. The demonstration shows a SQL Server Standard Edition Cluster being manually failed over between nodes in different Azure Fault Domains as well as SPS-L responding to an unexpected failure.

Want to learn other tips like avoiding split brain on Availability Groups with SQL Server on Linux? Read our blog.
Reproduced with permission from ClusteringForMereMortals.com

Filed Under: Clustering Simplified Tagged With: Linux, Microsoft, split brain on availability groups with sql server on linux, SQL Server

SQL Server on Linux High Availability Features and Considerations

August 17, 2018 by Jason Aw

SIOS will be speaking at Microsoft SQLSaturday event

Speaker Jason Aw, Strategic Business Development, SIOS Technology Corp, will be sharing in this one hour session about SQL Server on Linux High Availability Features and Considerations.

With SQL Server on Linux, Microsoft brings SQL Server’s core relational database engine to the growing enterprise Linux ecosystem. High Availability and Disaster Recovery (HADR) are critically important aspects of SQL Server. In this session we discuss the features, limitations and options for High Availability, and how SANless clustering can ensure proper functionality, availability and reliability for SQL Server on Linux, on-premises and in the Azure cloud.

Jason is a passionate IT leader with over 20 years of experience in technology, infrastructure and cloud environments, particularly with high availability and disaster recovery solutions for different architectures, including SQL database applications. For more information on SIOS, please contact us.

Track: Track 3

Level: Intermediate

You Are Invited To Join Us At SQLSaturday

SQLSaturday is a free training event for Microsoft Data Platform professionals and those wanting to learn about SQL Server, Business Intelligence and Analytics.

This event will be held on Aug 18 2018 at Microsoft Singapore Operations Pte Ltd, #22-01 One Marina Boulevard, Singapore 018989.

About The Event

Welcome to SQLSaturday

We are proud to host SQLSaturday for the 3rd consecutive year in Singapore. SQLSaturday is a training event for SQL Server and data professionals who want to learn how to elevate their careers to the next level.

Admittance to this event is free; all costs are covered by donations and sponsorship. Please register soon as seating is limited, and let friends and colleagues know about the event.

It is a great opportunity to learn from the Microsoft Product Team, Microsoft MVPs, SQL Server authors, and professionals who have been through and can relate to the obstacles you face day to day. This one-day training event is unlike any other, providing free food, training, networking, and the opportunity to win a few prizes!

Date & Time: Saturday, August 18th, 2018 – 8:30 AM to 5:30 PM

Cost: Free

Venue: #21-01 One Marina Boulevard, Singapore – 018989

Filed Under: News and Events Tagged With: High Availability, Microsoft, SQL Server, SQL Server High Availability, SQLSaturday

Microsoft Wants Your Input On The Next Version Of Windows Server

March 13, 2018 by Jason Aw

Microsoft Wants Your Input On The Next Version Of Windows Server

Windows Server has a new UserVoice page: http://windowsserver.uservoice.com/forums/295047-general-feedback with subsections:

  • Clustering: http://windowsserver.uservoice.com/forums/295074-clustering
  • Storage: http://windowsserver.uservoice.com/forums/295056-storage
  • Virtualization: http://windowsserver.uservoice.com/forums/295050-virtualization
  • Networking: http://windowsserver.uservoice.com/forums/295059-networking
  • Nano Server: http://windowsserver.uservoice.com/forums/295068-nano-server
  • Linux Support: http://windowsserver.uservoice.com/forums/295062-linux-support

This is where YOU get to provide Microsoft with your feedback directly.

Reproduced with permission from https://clusteringformeremortals.com/2015/05/12/microsoft-wants-your-input-on-the-next-version-of-windows-server/

Filed Under: Clustering Simplified Tagged With: Clustering, Linux Support, Microsoft, Nano Server, Networking, storage, UserVoice, Virtualization, Windows Server
