January 13, 2022 |
Why You Need Business Continuity Plans
Facebook, Instagram And WhatsApp Just Had A Really Bad Monday
It’s the end of the work day here on the East Coast and I see that Facebook is still unavailable. Facebook acknowledged the problem in two tweets. While we don’t know the exact cause of the downtime, or whether it was user error, some nefarious assault, or just an unexpected calamity of errors, we can already learn a few things from this outage.
Downtime Is Expensive
While we may never know the exact cost of the downtime experienced today, a few costs can already be measured. As of this writing, Facebook stock went down 4.89% today. That’s on top of an already brutal September for Facebook and other tech stocks. But what was the real cost to the company? With many brands leveraging social media as an important part of their marketing outreach, how will this outage affect future advertising spend? At a minimum, I expect advertisers to investigate other social media platforms if they have not done so already. Only time will tell, but even before this outage we had seen more competition for marketing spend from other platforms such as TikTok.
Plan For The Worst-Case Scenario
Things happen; we know that and plan for it. Business Continuity Plans (BCPs) should be written to address any possible disaster. Again, we don’t know the exact cause of this particular disaster, but I would have to imagine that a recovery time objective (RTO) of 5+ hours is not written into any BCP that sits on the shelf at Facebook, Instagram or WhatsApp. What’s in your BCP? Have you imagined every possible disaster? Have you measured the impact of downtime and defined an adequate RTO and recovery point objective (RPO) for each component of your business? I would venture to say that it’s impossible to plan for every possible thing that can go wrong.
However, I would advise everyone to revisit their BCP on a regular basis and update it to include disasters that weren’t on the radar the last time it was reviewed. Did you have a global pandemic in your BCP? If not, you may have been left scrambling to accommodate a “work from home” workforce. The point is, plan for the worst and hope for the best.
Communications In A Disaster
Communications in the event of a disaster should be its own chapter in your BCP.
A truly robust BCP must include multiple fallback means of communication. This becomes much more important as your business spreads out across multiple buildings, regions or countries. Just think about how your team communicates today. Phone, text, email and Slack might be your top four. But if they were all unavailable, how would you reach your team? If you don’t know, you may want to start investigating other options. You may not need a shortwave radio and a flock of carrier pigeons, but I’m sure there is a government agency that keeps both of those on hand for a “break glass in case of emergency” situation.
Summary
You have a responsibility to yourself, your customers and your investors to take every precaution concerning the availability of your business. Make sure you invest adequate resources in creating your BCP and that the teams responsible for business continuity have the tools they need to do their part in meeting the RTO and RPO defined in your BCP. |
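To make the RTO and RPO questions in this post concrete, here is a minimal Python sketch that checks a single incident against one component’s objectives. The `Objective` class, the `evaluate_incident` function, and every time and threshold below are hypothetical illustrations, not part of any BCP standard or tool. It assumes that any data written after the last good copy (backup or replica) and before the outage is lost.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objective:
    """Recovery objectives for one business component (values are hypothetical)."""
    component: str
    rto: timedelta  # longest tolerable time to restore service
    rpo: timedelta  # longest tolerable window of lost data


def evaluate_incident(obj: Objective, outage_start: datetime,
                      service_restored: datetime, last_good_copy: datetime) -> dict:
    """Compare one incident against a component's RTO and RPO."""
    downtime = service_restored - outage_start
    # Assumption: writes made after the last good copy and before the outage are lost.
    data_loss_window = outage_start - last_good_copy
    return {
        "component": obj.component,
        "downtime": downtime,
        "rto_met": downtime <= obj.rto,
        "data_loss_window": data_loss_window,
        "rpo_met": data_loss_window <= obj.rpo,
    }


if __name__ == "__main__":
    # Hypothetical component and incident: a 5.5-hour outage against a 1-hour RTO.
    site = Objective("customer web site", rto=timedelta(hours=1), rpo=timedelta(minutes=15))
    result = evaluate_incident(
        site,
        outage_start=datetime(2022, 1, 3, 12, 0),
        service_restored=datetime(2022, 1, 3, 17, 30),
        last_good_copy=datetime(2022, 1, 3, 11, 55),
    )
    print(result["rto_met"], result["rpo_met"])  # False True
```

Running a check like this per component, after every incident or drill, is one way to see which objectives in the BCP are realistic and which need more investment.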
January 9, 2022 |
Fixing Your Cloud Journey
In one way or another, the world-changing events of 2020 and 2021 reshaped nearly everything we knew, and high availability was no exception. Despite closures and restrictions, many IT teams traded on-prem data centers for the cloud. Now many are asking, ‘Now what?’ Here are five things to do to fix your cloud journey in 2022.
Reproduced with permission from SIOS |
January 6, 2022 |
How to Install a SIOS DataKeeper Cluster Edition License Key
Once you have installed SIOS DataKeeper Cluster Edition software and activated your license, you will need to install your license key before you can get started. This 4-minute video reviews how to install SIOS DataKeeper Cluster Edition software and demonstrates how to activate your license so you can start protecting your critical applications. Watch as a SIOS support representative demonstrates each of the three key prerequisites for installing SIOS licenses: ensuring you have the latest version and updates of SIOS DataKeeper software; using our simple license key manager to validate your activated licenses from purchased entitlements; and downloading and applying license keys and starting your SIOS DataKeeper software. The video also walks through accessing the SIOS Documentation portal, where you can find release notes, installation guides, technical documentation and information detailing SIOS DataKeeper Cluster Edition, as well as a wide range of topics on everything SIOS. View tips and convenient insights on how to complete each step quickly and simply. Now you can begin protecting your critical applications with SIOS DataKeeper clustering software.
Reproduced from SIOS
|
January 1, 2022 |
Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes
Simple Steps for Deployment in SIOS Protection Suite Cluster Environment
Avoiding something: we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks, scurried down the next aisle and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measure of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why?
Four Reasons To Use An Avoidance Strategy In High Availability
1. Better Performance (minimizing server overload)
One reason to use avoidance strategies in HA is to increase application and server performance. Consider the case of three servers running production workloads; let’s call them Server Alpha, Server Beta and Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. In the event of a failure of Server Alpha, a failover to Server Beta would traditionally occur. However, because Server Beta is already running a large workload, the additional application load might overload it and degrade performance for both applications. So it might be wise to deploy an avoidance strategy to make sure that Server Gamma is chosen as the failover target.
2. Performance Optimization
Consider again the scenario of three servers, Alpha, Beta, and Gamma.
Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. In the event of a failure of Server Alpha and Server Beta, a failover will occur to the cost-optimized server, Gamma. However, this server is not scaled to handle peak workloads, nor the workloads of both Server Alpha and Server Beta at the same time. In this instance, an avoidance strategy can be used to optimize performance by automatically moving one or both of the workloads off Server Gamma as soon as another host is available.
3. High Availability Optimization
HA optimization is another scenario for deploying avoidance strategies. Like the performance optimization strategy, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications are optimized to provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Services) and ERS (Enqueue Replication Server) instances residing on the same server for extended periods of time because of the risk of lost locks and canceled jobs. To prevent this, you can use an avoidance strategy that causes the ERS and ASCS instances to always run on opposite cluster nodes. Consider the case of three servers running production workloads: Servers Alpha, Beta and Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Beta (ERS) and Server Alpha (ASCS). If Beta crashes, you wouldn’t want the ERS resource running on the same node as the ASCS instance. To ensure this, you can deploy an avoidance strategy that automatically checks first and ensures the two applications are on separate servers, maintaining SAP ASCS/ERS best practices for lock failover.
4. DR Avoidance
Suppose you have two data centers, City Alpha and City Beta, which are about 70 miles apart, with most of your clients centrally located between them. However, due to recent organizational changes, mergers, closures and acquisitions, and governance requirements, your IT team has to add a third data center in City Gamma, which is about 350 miles from Alpha and Beta. Now the resources that were primarily protected in Alpha and Beta are also extended to the Gamma location. Given that most of the users and teams are near the Alpha and Beta locations, and even the most distant users are located in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, DR avoidance seeks to optimize performance, inbound/outbound regional data costs, latency, and client access by avoiding the DR node when only one of the Alpha and Beta nodes fails. It also ensures that even if both nodes fail at different times, failover always occurs to the other node in the cluster or data center before moving to DR.
So, how do you deploy an avoidance strategy? Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of SIOS Protection Suite for Linux, you can use a number of built-in methods, including:
1. Resource prioritization
In the event of a failure, resources fail over to the server where they have the lowest remaining priority number (that is, the highest remaining priority) and cascade to any additional servers. Consider Servers Alpha, Beta, and Gamma: Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources/servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG could have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma.
If customers wanted to optimize the use of the environment, then Resource.HR could have a priority of three (3) on Server Beta and Resource.MFG could have a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in-service (be restored) on Server Beta. SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.
2. Policy or affinity rules
Policy rules can also be used to prevent a resource recovery from occurring on a given server, thereby allowing a resource to avoid a specified server that may be running a more critical or resource-intensive workload. Typical policies include:
The SIOS Protection Suite for Linux CLI allows users to specify policy rules that can disable failover of a specific resource to a specified server, set temporal policies that guard against failures, disable recovery for a specific application type, and define constraint policies and custom policies.
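The priority and policy mechanisms described above can be modeled in a few lines. The following Python sketch is a hypothetical illustration, not the SIOS Protection Suite API or CLI: `pick_failover_target` and `blocked_by_policy` are invented names. It assumes, as in the Resource.HR example, that a lower priority number means a more preferred server, and it picks the most preferred server that is still up and not excluded by a policy rule.

```python
from typing import AbstractSet, Dict, Optional


def pick_failover_target(
    priorities: Dict[str, int],
    unavailable: AbstractSet[str],
    blocked_by_policy: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Return the eligible server with the lowest priority number (1 = most preferred)."""
    # Eligible servers are up and not excluded by a (hypothetical) policy rule.
    candidates = [
        (priority, server)
        for server, priority in priorities.items()
        if server not in unavailable and server not in blocked_by_policy
    ]
    if not candidates:
        return None  # nowhere to recover the resource
    # min() over (priority, server) tuples picks the lowest priority number.
    return min(candidates)[1]


if __name__ == "__main__":
    # Priorities for Resource.HR from the example: Alpha 1, Gamma 2, Beta 3.
    hr = {"Alpha": 1, "Gamma": 2, "Beta": 3}
    print(pick_failover_target(hr, unavailable={"Alpha"}))           # Gamma
    print(pick_failover_target(hr, unavailable={"Alpha", "Gamma"}))  # Beta
    # A policy rule keeps HR off Beta, so with Alpha and Gamma down there is no target.
    print(pick_failover_target(hr, {"Alpha", "Gamma"}, blocked_by_policy={"Beta"}))  # None
```

Giving a DR node such as City Gamma the highest priority number in the same way is one way to express DR avoidance: it is only chosen after every local node has been tried.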
The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method allows the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method provides the greatest flexibility. In this example, app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail over to the next available node based on priorities without any collocation restrictions. If a customer has two applications, app1 and app2, that they require to run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script. For additional examples of avoidance strategies and resources, consult the SIOS Protection Suite for Linux documentation.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS |
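The collocation logic behind the avoidance-script approach can be illustrated with a small Python model. This is a hypothetical sketch of the idea only: it does not reproduce how the gen/app resource or the `avoid_restore` script actually work, and the `AVOID` table and `place` function are invented for illustration. Here app1 and app2 avoid each other whenever possible, while app3 has no restriction; the same pairing could stand for the SAP ASCS and ERS instances.

```python
from typing import Dict, List, Optional, Set

# Hypothetical avoidance table: app1 and app2 should not share a server if it can be helped.
AVOID: Dict[str, Set[str]] = {"app1": {"app2"}, "app2": {"app1"}}


def place(app: str, priority_order: List[str], up: Set[str],
          placement: Dict[str, str]) -> Optional[str]:
    """Pick a server for `app`, avoiding servers that host apps it should avoid, when possible."""
    # Servers that are up, in the app's priority order.
    candidates = [s for s in priority_order if s in up]
    if not candidates:
        return None
    avoid = AVOID.get(app, set())
    # Servers currently hosting any application this app should avoid.
    occupied = {placement[other] for other in avoid if other in placement}
    for server in candidates:
        if server not in occupied:
            return server
    # No conflict-free server is up: run co-located rather than stay down.
    return candidates[0]


if __name__ == "__main__":
    placement = {"app1": "Alpha", "app2": "Beta", "app3": "Gamma"}
    # Alpha fails: app1 skips Beta (running app2) and lands on Gamma.
    print(place("app1", ["Alpha", "Beta", "Gamma"], {"Beta", "Gamma"}, placement))  # Gamma
    # Only Beta is up: app1 accepts collocation with app2.
    print(place("app1", ["Alpha", "Beta", "Gamma"], {"Beta"}, placement))  # Beta
    # Gamma fails: app3 has no restriction and follows its priorities.
    print(place("app3", ["Gamma", "Beta", "Alpha"], {"Alpha", "Beta"}, placement))  # Beta
```

Falling back to a co-located server when no conflict-free node is up reflects the “whenever possible” wording: keeping the application running takes precedence over avoidance.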
December 28, 2021 |
Windows Clustering
How to Achieve High Availability in Windows
To mitigate system downtime and ensure high availability for Windows, IT best practice recommends that you cluster servers (or nodes) so that if one node fails, one or more other nodes automatically take over processing. This is also referred to as Windows clustering. Clustering requires software that monitors the health of the primary node and initiates recovery actions if it detects an issue. HA clustering also requires a way to ensure that, in the event of a failure, the secondary node is accessing the most current version of the data in storage. In most cases, this is achieved by connecting all nodes of the cluster to the same shared storage. The cluster nodes should also be separated geographically to protect applications from sitewide and regional disasters. In Windows Server environments, Microsoft includes Windows Server Failover Clustering (WSFC) in the Windows Server platform.
What is Windows Server Failover Clustering?
With WSFC, each active node has a standby node that has the same hardware specifications and shares the same storage. A witness (typically a disk, a file share hosted on a third server, or a cloud witness) is often configured to provide a tie-breaking quorum vote, so that when nodes lose contact with one another the cluster can determine which nodes remain in control and where workloads should fail over. In addition to monitoring the health of the cluster, the nodes in a WSFC also work together to collectively provide:[1]
How SIOS DataKeeper Complements WSFC
WSFC requires shared storage to ensure all cluster nodes are accessing the most up-to-date data in the event of a failover. Companies often use expensive SAN hardware to provide data redundancy, but the SAN itself represents a single point of failure. And if you want to run your application in the cloud with the same Windows Server Failover Clustering protection, there is typically no SAN available. SIOS DataKeeper Cluster Edition seamlessly integrates with and extends WSFC and SQL Server Always On Failover Cluster Instances by eliminating the need for shared storage. It provides performance-optimized, host-based replication to synchronize local storage in all cluster nodes, creating a SANless cluster. While WSFC manages the cluster, SIOS DataKeeper performs synchronous or asynchronous replication of the storage, giving the standby nodes immediate access to the most current data in the event of a failover. SIOS DataKeeper not only eliminates the cost, complexity, and single-point-of-failure risk of a SAN, but also allows you to use the latest fast PCIe flash and SSD local storage for performance and protection in a single cost-efficient solution. With SIOS DataKeeper, you can also balance network bandwidth and CPU utilization for each application.
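The difference between synchronous and asynchronous replication can be shown with a toy model. The following Python sketch is purely illustrative and is not how SIOS DataKeeper is implemented; the `ReplicatedVolume` class and its methods are hypothetical. In synchronous mode a write is acknowledged only after the standby has it, so no acknowledged write is lost on failover; in asynchronous mode writes are acknowledged immediately and queued, so writes still in flight when the primary fails never reach the standby.

```python
from collections import deque
from typing import Deque, List


class ReplicatedVolume:
    """Toy model of host-based block replication (hypothetical, not a real product API)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.source: List[bytes] = []          # blocks on the primary node's local disk
        self.target: List[bytes] = []          # blocks on the standby node's local disk
        self.pending: Deque[bytes] = deque()   # async writes not yet applied on the target

    def write(self, block: bytes) -> None:
        self.source.append(block)
        if self.synchronous:
            # Synchronous: the write completes only once the target has it too.
            self.target.append(block)
        else:
            # Asynchronous: acknowledge now, ship to the target later.
            self.pending.append(block)

    def drain(self, max_blocks: int) -> None:
        """Ship up to max_blocks queued writes to the target (no-op in sync mode)."""
        for _ in range(min(max_blocks, len(self.pending))):
            self.target.append(self.pending.popleft())

    def fail_over(self) -> List[bytes]:
        """Primary is lost: queued writes are gone; the standby has only what reached it."""
        self.pending.clear()
        return list(self.target)


if __name__ == "__main__":
    for sync in (True, False):
        vol = ReplicatedVolume(synchronous=sync)
        for i in range(5):
            vol.write(f"block{i}".encode())
        vol.drain(3)  # the link has shipped only 3 queued blocks when the primary fails
        print("sync" if sync else "async", len(vol.fail_over()))  # sync 5, then async 3
```

The trade-off is latency: a synchronous write cannot complete faster than the round trip to the target, which is why asynchronous replication is common over long-distance links.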
In addition, SIOS DataKeeper’s Target Snapshots feature lets you run point-in-time reports from a secondary node to offload workloads that can impact performance on the primary node. This lets you query and run reports faster and make faster decisions. Working with WSFC, SIOS DataKeeper Cluster Edition protects business-critical Windows environments, including Microsoft SQL Server, SAP, SharePoint, Lync, Dynamics, and Hyper-V, using your choice of industry-standard hardware and locally attached storage in a “shared-nothing” or SANless configuration.[2] SIOS DataKeeper also provides high availability and disaster recovery protection for your business-critical applications in cloud environments, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud, without sacrificing performance.
SIOS Protection Suite – Protecting a Windows Environment Without WSFC
SIOS Protection Suite for Windows includes DataKeeper, SIOS LifeKeeper, and optional Application Recovery Kits for leading application and infrastructure operations. It is a tightly integrated clustering solution that combines high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies to protect your business-critical applications and data from downtime and disasters.
Distributed metadata and notifications
The WSFC service and node metadata/status are hosted on each node in the cluster. When changes occur on any node, updated information is automatically propagated to all other nodes.
SIOS Protection Suite does not require WSFC, as SIOS monitors the health of the application environment, including servers, operating systems, and databases. It can stop and restart an application both locally and on another cluster server at the same site or in another location. When a problem is detected, SIOS Protection Suite automatically performs the recovery actions and manages cascading and prioritized failovers.
With SIOS Protection Suite, you can use your choice of SAN or SANless clusters with a wide array of storage devices, including direct-attached storage, iSCSI, Fibre Channel, and more. SIOS Protection Suite for Windows can meet your high availability and disaster recovery needs within a single site and across multiple sites.
Popular SIOS Windows Clustering Solutions
Some of the most popular SIOS Windows clustering solutions, for SQL Server, SAP, and cloud-based environments, are discussed in more detail below.
Windows Clustering for SQL Server, SAP, S/4HANA, and Oracle
SIOS provides comprehensive SAP-certified protection for both applications and data, including high availability, data replication, and disaster recovery. To protect SAP in a Windows environment, SIOS Protection Suite includes SIOS LifeKeeper, which monitors the entire application stack. SIOS protects your Oracle Database whether you are using it with SAP or running standalone Oracle applications; you simply select the Application Recovery Kit that matches your configuration.
Windows Clustering in the Cloud
Whether you need SIOS DataKeeper to enable Windows Server Failover Clustering in the cloud, or SIOS Protection Suite for Windows for application monitoring and failover orchestration as well as efficient, block-level data replication, SIOS delivers complete configuration flexibility. SIOS allows you to create a cluster in any combination of physical, virtual, cloud, or hybrid cloud infrastructures. For example, working with WSFC, SIOS DataKeeper can:
SIOS DataKeeper Cluster Edition can provide high availability cluster protection across cloud
Conclusion
SIOS provides offerings that support a breadth of applications, operating systems, and infrastructure environments, providing a single solution that can handle all your high availability needs. Here are just a few examples that demonstrate the power of SIOS.
For more information on high availability/disaster recovery solutions to support your Windows environment, click here.
References
[1] https://www.techopedia.com/definition/24358/windows-clustering; https://searchwindowsserver.techtarget.com/definition/Windows-Server-failover-clustering
[2] A shared-nothing architecture (SN) is a distributed-computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit). https://en.wikipedia.org/wiki/Shared-nothing_architecture
Reproduced from SIOS |