How to Perform Performance Testing Replication with SIOS DataKeeper

August 10, 2024 by Jason Aw

Configuring replication for a production database can be daunting, especially if you have not done your research in advance. This post covers the trickiest aspect of setting up your environment properly: performance. Understanding these key points will put you ahead of the pack and help ensure your production go-live has no last-minute hiccups.

The first and most basic point to consider is choosing the correct mirror type for your configuration. SIOS DataKeeper offers two mirror types during mirror creation: Synchronous and Asynchronous. Each has its own benefits and drawbacks depending on your environment.

Selecting Mirror Type

Synchronous mirrors work best in LAN environments with high-speed connections and provide 1:1 write consistency at the time of commit to the primary system. However, if the network or the target storage cannot keep up with the throughput of the primary system, write speed on the primary drops in order to maintain synchronous write consistency. Synchronous mirroring is therefore not recommended for WAN or high-latency environments.

Asynchronous mirrors, by contrast, are well suited to WAN environments. They provide the same 1:1 write consistency between the nodes; the difference is that a write is committed on the primary system before it is committed on the target system. This is possible because of the bitmap, also known as an intent log. The bitmap tracks all changes on the system at the block level, and the data is sent to the target as quickly as possible through a backlog known as the write queue. The write queue can be limited by number of writes or by total MB of data. When the limit is hit, the mirror pauses and the data resynchronizes, and failover is prevented while the data is not in sync.
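
To make the write queue behavior concrete, here is a minimal Python sketch of an asynchronous mirror. It is a toy model, not DataKeeper code: the class name, the method names, and the exact pause-and-resync behavior are assumptions made purely for illustration.

```python
from collections import deque


class AsyncMirrorModel:
    """Toy model of an asynchronous mirror's write queue (not DataKeeper code)."""

    def __init__(self, max_writes, max_bytes):
        self.max_writes = max_writes   # hypothetical limit on queued writes
        self.max_bytes = max_bytes     # hypothetical limit on queued bytes
        self.queue = deque()           # writes waiting to reach the target
        self.queued_bytes = 0
        self.dirty_blocks = set()      # stand-in for the bitmap (intent log)
        self.paused = False

    def write(self, block, size):
        """Record a local write in the bitmap, then queue it for the target."""
        self.dirty_blocks.add(block)
        if self.paused:
            return                     # while paused, only the bitmap tracks it
        over_count = len(self.queue) + 1 > self.max_writes
        over_bytes = self.queued_bytes + size > self.max_bytes
        if over_count or over_bytes:
            self.paused = True         # limit hit: pause and rely on a resync
            self.queue.clear()
            self.queued_bytes = 0
            return
        self.queue.append((block, size))
        self.queued_bytes += size

    def drain_one(self):
        """The target acknowledges the oldest queued write."""
        if not self.queue:
            return
        block, size = self.queue.popleft()
        self.queued_bytes -= size
        # The block is clean only if no later write to it is still queued.
        if all(queued != block for queued, _ in self.queue):
            self.dirty_blocks.discard(block)

    def resync(self):
        """Send every dirty block to the target, then resume mirroring."""
        sent = sorted(self.dirty_blocks)
        self.dirty_blocks.clear()
        self.paused = False
        return sent
```

In this model, a burst of writes that outruns the link trips the limit and pauses the mirror, and a later resync sends only the blocks marked dirty rather than the whole volume.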

Hardware Configuration

Now that you have decided which mirror type fits your environment best, it is important to understand that DataKeeper is not magic. DataKeeper can only write and replicate as fast as your systems allow, so having hardware capable of the throughput your applications need is crucial. Here are some tips for ensuring you have the hardware needed to achieve your replication goals.

Ensure that your primary and target systems have identical storage hardware. For example, target IOPS should equal source IOPS; otherwise, the slowest component in the environment becomes the bottleneck for write speed. Matching hardware will generally give the best performance.
Store the bitmap on its own dedicated high-speed storage. Given the importance of the bitmap, this is the easiest and cheapest way to get a significant performance boost. The bitmap is very small, so a 5 or 10 GB SSD is sufficient and gives a great return in performance.
Test the standalone hardware, keeping in mind that replication introduces some overhead. For example, if your environment requires 10,000 IOPS, ensure that every node that will be part of the cluster can, at a bare minimum, sustain 10,000 IOPS standalone. If you intend to use synchronous mirroring, provision beyond the bare minimum, because maintaining synchronous consistency adds further overhead. The network also needs to be load tested to ensure it can transfer the data your replication scheme requires.
Know how to test properly. When using a test environment to verify production capabilities, mimic the production setup as closely as possible. You may not be able to set up an entire production database clone just to test replication, but the right data generation tool gives a better indication of real performance capabilities. DiskSpd is a free tool that can be used for basic testing, but for SQL workloads HammerDB is a much better indicator of real-world performance.
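
As a rough aid for the sizing advice above, the following Python sketch estimates the network throughput a write workload would need and checks measured IOPS against a requirement. The 8 KiB block size, the 20% headroom, the node names, and the measured figures are illustrative assumptions, not SIOS recommendations, and the estimate treats every I/O as a write that must be replicated.

```python
def replication_bandwidth_mbps(iops, block_size_kib):
    """Network throughput (megabits/s) needed to ship every write to the target."""
    bytes_per_second = iops * block_size_kib * 1024
    return bytes_per_second * 8 / 1_000_000


def meets_target(measured_iops, required_iops, headroom=0.0):
    """True if every node sustains the requirement plus a chosen safety margin."""
    needed = required_iops * (1 + headroom)
    return all(value >= needed for value in measured_iops.values())


if __name__ == "__main__":
    # Hypothetical standalone results for two cluster nodes.
    measured = {"primary": 12_500, "target": 11_000}
    print(replication_bandwidth_mbps(10_000, 8))          # about 655 Mbit/s
    print(meets_target(measured, 10_000))                 # bare minimum
    print(meets_target(measured, 10_000, headroom=0.2))   # with 20% headroom
```

With these sample numbers both nodes meet the bare minimum, but the target node falls short once headroom for synchronous mirroring is added.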

DiskSpd: https://github.com/microsoft/diskspd
HammerDB: https://www.hammerdb.com/
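
For a basic standalone storage test, a DiskSpd run can be scripted. The Python sketch below only builds the argument list. The flags are standard DiskSpd options, but the parameter values and the target path are assumptions to adjust for your workload. Note that `-c` creates the test file, so point it at a scratch file, never at live data.

```python
def diskspd_command(target_file, block_kib=8, seconds=60, threads=4,
                    outstanding=32, write_pct=100, file_gib=10):
    """Build a DiskSpd argument list for a random-write test (values are assumptions)."""
    return [
        "diskspd.exe",
        f"-b{block_kib}K",    # block size
        f"-d{seconds}",       # test duration in seconds
        f"-t{threads}",       # threads per target file
        f"-o{outstanding}",   # outstanding I/Os per thread
        "-r",                 # random access pattern
        f"-w{write_pct}",     # percentage of operations that are writes
        "-Sh",                # disable software caching and hardware write caching
        "-L",                 # collect latency statistics
        f"-c{file_gib}G",     # create a test file of this size
        target_file,
    ]


if __name__ == "__main__":
    # Hypothetical scratch file on the volume under test.
    print(" ".join(diskspd_command(r"E:\diskspd-test.dat")))
```

The resulting list can be passed to `subprocess.run` on a Windows node where DiskSpd is installed. Run the same test on every node so the results are comparable.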

Lastly, there is DataKeeper tuning: DataKeeper has configurable settings beyond the mirror type. Modifying these is more nuanced and is best done on the advice of the SIOS support team. However, if you have confirmed that all of the other recommendations are in place, tuning some DataKeeper parameters may provide the last boost in performance needed to meet your required metrics. Examples include increasing the number of outstanding writes allowed in the write queue or changing how often the bitmap file is flushed to disk.

Reproduced with permission from SIOS

Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes

January 1, 2022 by Jason Aw

Simple Steps for Deployment in SIOS Protection Suite Cluster Environment

Avoiding something – we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks and scurried down the next aisle and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measures of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why?

Four Reasons To Use An Avoidance Strategy In High Availability

1. Better Performance (minimizing server overload)

One reason to use avoidance strategies in HA is to increase application and server performance. Consider three servers running production workloads: Server Alpha, Server Beta, and Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. If Server Alpha fails, a failover to Server Beta would traditionally occur. However, because Server Beta is already running a large workload, the additional application load might overload the server and degrade performance for both applications. It might therefore be wise to deploy an avoidance strategy that makes sure Server Gamma is chosen as the failover target.

2. Performance Optimization

Consider again the scenario of three servers, Alpha, Beta, and Gamma. Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. In the event of a failure of Server Alpha and Server Beta, a failover will occur to the cost-optimized server, Gamma. However, this server is not scaled to handle peak workloads, nor the workloads of both Server Alpha and Server Beta at the same time. In this instance, an avoidance strategy can be used to optimize performance by automatically moving one or both of the workloads from Server Gamma as soon as another host is available.

3. High Availability Optimization

HA optimization is another scenario for deploying avoidance strategies. Like the performance optimization strategy, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Services) and ERS (Enqueue Replication Server) instances residing on the same server for extended periods of time because of the risk of lost locks and canceled jobs. To prevent this, you can use an avoidance strategy that causes the ERS and ASCS instances to always run on opposite cluster nodes. Consider three servers running production workloads: Servers Alpha, Beta, and Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Beta (ERS) and Server Alpha (ASCS). If Beta crashes, you would not want the ERS resource to come up on the same node as the ASCS instance. To guard against this, you can deploy an avoidance strategy that automatically checks first that the two instances end up on separate servers, which maintains SAP ASCS/ERS best practices for lock failover.

4. DR Avoidance

Suppose you have two data centers, City Alpha and City Beta, which are about 70 miles apart, with most of your clients centrally located between them. Due to recent internal reorganizations, mergers, closures, acquisitions, and governance requirements, your IT team has to add a third data center in City Gamma, which is about 350 miles from Alpha and Beta. The resources that were primarily protected in Alpha and Beta are now also extended to the Gamma location. Given that most of the users and teams are near the Alpha and Beta locations, and even the most distant users are located in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, DR avoidance seeks to optimize performance, inter-region data transfer costs, latency, and client access by avoiding the DR node when only one node within the primary region fails. It also ensures that even if both nodes fail at different times, failover always goes to the other node in the cluster or data center before moving to DR.

So, how do you deploy an avoidance strategy? Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of the SIOS Protection Suite for Linux, you can use a number of built-in methods including:

1. Resource prioritization

In the event of a failure, a resource fails over to the remaining server where it has the lowest priority number and then cascades to any additional servers. Consider three servers (Alpha, Beta, and Gamma): Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources and servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG would have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma. If customers wanted to optimize the use of the environment, Resource.HR could also have a priority of three (3) on Server Beta and Resource.MFG a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in-service (be restored) on Server Beta.

SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.
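
The priority example above can be sketched in a few lines of Python. This is an illustrative model of the selection rule only, not SIOS Protection Suite code. The table mirrors the priorities described in the text, and the function name is hypothetical.

```python
# Priority tables from the example above: a lower number is tried first.
PRIORITIES = {
    "Resource.HR": {"Alpha": 1, "Gamma": 2, "Beta": 3},
    "Resource.MFG": {"Beta": 1, "Gamma": 2, "Alpha": 3},
}


def failover_order(resource, failed=()):
    """Return the servers to try, in priority order, skipping failed ones."""
    table = PRIORITIES[resource]
    alive = [server for server in table if server not in failed]
    return sorted(alive, key=table.get)


if __name__ == "__main__":
    # With Alpha down, Resource.HR tries Gamma before Beta.
    print(failover_order("Resource.HR", failed={"Alpha"}))
```

Giving the shared backup server the second priority for both resources keeps a single failure from stacking two critical workloads on the same primary server.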

2. Policy or affinity rules

Policy rules can also be used to prevent a resource recovery from occurring on a given server, thereby allowing a resource to avoid a server that may be running a more critical or resource-intensive workload. Typical policies include:

- Constraint policies that will block an application from a specific server by default.
- Resource policies that will block an application from a server that does not have sufficient resources.
- Temporal policies that define a time period that resources are allowed or disallowed from a system.
- Custom policies that define preferred servers or possible application ownership abilities within the cluster.

The SIOS Protection Suite for Linux CLI allows users to specify policy rules that disable failover of a specific resource to a specified server, apply temporal policies to failure handling, disable failover for a specific application type, and define constraint and custom policies.
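
To show how such rules combine, the following Python sketch filters failover candidates through constraint, resource, and temporal checks. It is a generic model with hypothetical names and data structures. It does not reflect the SIOS Protection Suite CLI syntax, and custom policies are left out for brevity.

```python
from datetime import time


def allowed_servers(app, servers, now, blocked, min_free_gib, windows):
    """Filter failover candidates through constraint, resource, and temporal rules."""
    result = []
    for name, free_gib in servers.items():
        if (app, name) in blocked:                # constraint policy
            continue
        if free_gib < min_free_gib.get(app, 0):   # resource policy
            continue
        start, end = windows.get(name, (time.min, time.max))
        if not (start <= now <= end):             # temporal policy
            continue
        result.append(name)
    return result


if __name__ == "__main__":
    servers = {"Alpha": 64, "Beta": 8, "Gamma": 32}   # free memory in GiB
    blocked = {("app1", "Alpha")}                     # app1 may never run on Alpha
    min_free = {"app1": 16}                           # app1 needs 16 GiB free
    windows = {"Gamma": (time(18, 0), time(23, 59))}  # Gamma accepts failovers in the evening
    print(allowed_servers("app1", servers, time(20, 0), blocked, min_free, windows))
```

Here `app1` can only land on Gamma, and only during the evening window. Outside that window no server qualifies, which is a reminder to check that a set of policies never leaves a resource with nowhere to go.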

3. Specific avoidance resources

The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method allows the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method provides the greatest flexibility. In this example, app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail over to the next available node based on priorities without any collocation restrictions.

If a customer has two applications, app1 and app2, that must run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script. For additional examples of avoidance strategies and resources, see the SIOS Protection Suite for Linux documentation.
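
The avoidance behavior described here can be modeled with a short Python sketch. It illustrates the "avoid whenever possible" rule only. It is not the logic of the avoid_restore script, and all names in it are hypothetical.

```python
def pick_target(app, candidates, running, avoid):
    """Choose a failover node, preferring nodes that host none of the apps to avoid."""
    unwanted = avoid.get(app, set())
    clear = [node for node in candidates
             if not (running.get(node, set()) & unwanted)]
    # Avoidance is "whenever possible", so fall back to any candidate.
    pool = clear or candidates
    return pool[0] if pool else None


if __name__ == "__main__":
    avoid = {"app1": {"app2"}, "app2": {"app1"}}     # app1 and app2 avoid each other
    running = {"Beta": {"app2"}, "Gamma": {"app3"}}  # state after Alpha fails
    print(pick_target("app1", ["Beta", "Gamma"], running, avoid))  # Gamma
    print(pick_target("app3", ["Beta", "Gamma"], running, avoid))  # Beta
```

The candidates are assumed to be listed in priority order, so app3 simply takes the first available node, while app1 skips Beta because app2 is running there. If Beta were the only surviving node, app1 would still be restored on it rather than stay down.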

– Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

Four Reasons To Use An Avoidance Strategy In High Availability

November 28, 2021 by Jason Aw

Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes

Simple Steps for Deployment in SIOS Protection Suite Cluster Environment

Four reasons to use an avoidance strategy in High Availability

Better Performance (minimizing server overload)

Performance Optimization

HA Optimization

DR Avoidance

So, how do you deploy an avoidance strategy?

Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of the SIOS Protection Suite for Linux, you can use a number of built-in methods including:

Resource prioritization

SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.

Policy or affinity rules

- Constraint policies that will block an application from a specific server by default.
- Resource policies that will block an application from a server that does not have sufficient resources.
- Temporal policies that define a time period that resources are allowed or disallowed from a system.
- Custom policies that define preferred servers or possible application ownership abilities within the cluster.