How ESPN’s “Get Up” Morning Show Can Improve Your High Availability Strategy

September 10, 2024 by Jason Aw

ESPN is often a go-to source for information on the NFL. The network is known for running multiple shows around each game, offering extensive analysis, opinions, and assorted information about the teams, the games, the upcoming schedule, and the coaches in the league, as well as highlights of particular players.

Learn from ESPN: Enhance Your High Availability with Two Key Questions

In a segment that aired on September 13, 2021, Dan Orlovsky lamented the Chicago Bears’ usage of their rookie quarterback. In doing so, he incidentally provided a way to improve your high availability with just two questions.

Question number one, what are we doing here?

The Bears offense placed rookie quarterback Justin Fields on the field for a second-down play. The rookie quickly fired a pass for a seven-yard completion. However, instead of continuing with the rookie, the play-callers for the Bears went back to their previous QB, who promptly threw the ball to the other team. This play elicited the question from Dan Orlovsky, “What are we doing here?”

Okay, so you are not a rookie quarterback, I think. And, you are not playing for the Bears, I hope. But, if you are responsible for any part of your enterprise high availability (HA) and disaster recovery (DR) strategies, including clustering, data protection, data replication, application orchestration, governance/adherence and SLAs, then Orlovsky’s first question is a good place to start.

What are you doing?

Are you looking to deploy software and services or remediate a known availability issue?
Is your team’s central focus on planning for the future, such as moving to the cloud, or addressing some combination of different changes and requirements?
Is your goal saving cost while meeting regulatory requirements, improving client acquisition through more available resources, or optimizing and improving backend processing?

Answering this question with respect to your enterprise high availability and disaster recovery (HA/DR) strategies will clarify a great deal about the solution and architecture you implement, as well as the team and processes associated with them.

Question number two, why are we doing it?

After firing off a series of rapid comments about his frustration with when and how the Chicago Bears used their rookie quarterback, Orlovsky asked the second question, “Why are we doing it?” If, to paraphrase Orlovsky, the “what” is putting Fields in to check a box and appease the fans, then “why are we doing it?” Why just check the box when you could be using more of the dynamic rookie’s abilities to gain an advantage and win games? Why not make the change that gives you the best chance of being successful?

Of course, we aren’t the coach of the Bears, though I wouldn’t mind calling a few plays. But, the question still applies to those involved with implementing enterprise availability. Let’s go back to question number one. What was your answer for “what are we doing here?” Now, answer the question of “why are we doing it?” Some possible reasons include:

Providing assurance for data availability
Saving the business time
Saving the business money by avoiding costly unplanned downtime
Saving the business time and money by automating monitoring and recovery
Deploying an architecture to meet a stringent Service Level Agreement (SLA) or Service Level Objective (SLO)
Reducing RTO and RPO to zero
Preventing last year’s disasters from occurring again
Hedging against the unknown threats of downtime
Meeting governance requirements
Checking the box
Trying not to get fined for noncompliance
Because management said so

“Why are we doing it?” is a critical question to answer before, during, and after high availability systems have been deployed. As VP of Customer Experience, I worked with a customer whose IT Administrator and DB Administrator had vastly different reasons for “why are we doing it?” To make matters worse, the executive sponsor held a third view. Without a properly vetted and communicated “why,” the team struggled to articulate requirements, and when an incident occurred, their remediation efforts often created more conflict and questions.

There are likely a number of possible reasons why the Chicago Bears didn’t find success or properly utilize their young QB on the upcoming third down, and an equal number of opinions on what they are trying to accomplish with their franchise and veteran quarterback. But, even if Orlovsky couldn’t help Bears fans clarify what or why, he did provide the framework for how to immediately improve your HA. So, how would you answer the two questions: What are we doing for HA? Why are we doing it?

High Availability Solutions

SIOS Technology Corporation provides high availability cluster software that protects & optimizes IT infrastructures with cluster management for your most important applications. Contact us today for more information about our professional services and support.

Reproduced with permission from SIOS

6 High Availability Lessons Learned from Cybersecurity Nightmares

August 30, 2024 by Jason Aw

Recently, a security provider published best-practice advice and recommendations for companies in light of rising security threats. While security threats should receive attention from every business, this advice isn’t limited to cybersecurity; it is equally relevant to HA partners and customers with critical applications and services to protect.

Six Takeaways for HA From Recent Articles on Cybersecurity

Take IMMEDIATE steps to ramp up HA
Waiting for a downtime incident and focusing on fast recovery is a bad strategy. Preparing for and preventing downtime is a better one. Start by identifying critical Tier 1 and Tier 2 applications, databases and services. Tier 1 should include all business-critical solutions that can never be inaccessible: applications that must be available 24/7 and will cause serious business consequences if they go offline. Tier 2 applications should run as often as possible, but they are less critical, and your business can tolerate outages of up to a few hours without significant impact.
Prepare a comprehensive HA protection plan
Develop a plan for protecting the key applications, databases and services. Be sure this plan includes architecture and design documentation as well as personnel responsibilities for responding to downtime. Always prepare a process for deploying clusters in a QA or sandbox environment with an eye to documenting the activities and details into a runbook. These sandbox and QA systems can also be used for testing, training, and validating upgrades, hotfixes, and maintenance.
Recognize and address high availability risks
All organizations must recognize that no company is safe from the disruption of downtime, regardless of size or location. Small, medium and large businesses are all susceptible to disasters, whether natural or man-made. Large organizations will experience their fair share of user errors, data center failures resulting from local construction, failed infrastructure, and outages from networking components. Small and medium entities, especially those with on-prem solutions or smaller IT teams, need to add HA protection as well. While larger companies may lose more money in an outage in absolute terms, small to medium businesses are likely to suffer comparable losses relative to their size. It is important to note that moving to the cloud is not enough to eliminate these risks.
Business executives need to lead in high availability strategy
Preparing for and guarding against downtime is not just an IT team issue. Business executives need to be on board with protecting the business from known risks, exposure, and downtime threats. This means key executives and stakeholders need to proactively ask about HA coverage, plans and staffing. Business executives should also be ready to invest in preparing for the unexpected by ensuring full coverage of Tier 1 applications, databases, services, and data. They should also proactively expand coverage to Tier 2 and beyond.
The importance of ongoing communication for HA
Business leadership and admins from all areas (network, storage, compute, cloud, database, and applications) should convene frequently to discuss new and existing HA threats, new and existing challenges, and ongoing requirements. Keeping internal team discussions going to understand requirements and business continuity plans is a must. In fact, HA considerations need to become an integral part of all relevant internal planning and communications, from the C-level to the entry level. However, this communication cannot stop with in-house stakeholders. Business leaders and HA stakeholders must also review corporate posture, HA requirements, and critical findings with the HA vendor and its R&D and support teams.
Don’t wait for a disaster to review your HA solution
Companies need to review their plans, and their ability to execute those plans, frequently. This review needs to go deeper than reading the runbook, documentation, or cluster design. Review the runbook alongside hands-on exercises to validate and update it. Test the process for restoring systems after downtime, including client and business operation restoration.
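The tiering described in the first takeaway can be sketched as a small classification step. This is a minimal illustration, not a SIOS tool: the `Service` type, the service names, and the 4-hour cutoff are all assumptions (the article only says Tier 2 can tolerate outages of "up to a few hours"), and "Tier 3" stands in for the "Tier 2 and beyond" mentioned above.

```python
from dataclasses import dataclass

# Assumed threshold: the article says "up to a few hours" for Tier 2;
# 4 hours is a hypothetical stand-in for "a few".
TIER2_MAX_OUTAGE_HOURS = 4.0


@dataclass
class Service:
    name: str
    tolerable_outage_hours: float  # longest outage the business can absorb


def classify_tier(service: Service) -> int:
    """Return 1 for services that can never be offline, 2 for services that
    can absorb a few hours of outage, and 3 for everything else."""
    if service.tolerable_outage_hours <= 0:
        return 1
    if service.tolerable_outage_hours <= TIER2_MAX_OUTAGE_HOURS:
        return 2
    return 3


# Hypothetical inventory used only to exercise the function.
inventory = [
    Service("order-database", 0),
    Service("reporting-portal", 3),
    Service("internal-wiki", 24),
]
tiers = {s.name: classify_tier(s) for s in inventory}
for name, tier in tiers.items():
    print(f"{name}: Tier {tier}")
```

In practice the tolerable outage for each service would come from the business stakeholders, which is one reason the takeaways above stress executive involvement.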

Act Now to Protect Your Systems from Downtime

Similar to security recommendations, companies should take immediate action to secure their systems, solutions, applications and data from downtime and disasters. Don’t wait for a disaster to reveal gaps in your HA strategy. Contact SIOS today to enhance your HA strategy and safeguard your business against unexpected disruptions.

Reproduced with permission from SIOS

DataKeeper UI vs. Car Dashboards: A Guide to High Availability Monitoring

August 24, 2024 by Jason Aw

Besides sharing the enjoyment of using DataKeeper Cluster Edition with its high availability and disaster recovery capabilities, most of us have something else in common: we drive a car, whether electric, gasoline or hybrid. According to Forbes, 92% of households owned at least one car as of 2022.

And those millions of cars have a few things in common with DataKeeper . . .

A dashboard with status indicators.

In DataKeeper’s case, the user interface (DataKeeper.msc) has what are known as Mirror Definitions or Mirror Status.

In cars, the indicators are affectionately known as “dashboard lights,” or, as some of you Generation Xers may call them, “idiot lights.”

Let’s jump right in and talk about the similarities.

All cars have some combination of fuel, electrical and cooling systems, so the associated lights on the dashboard are usually red, yellow or green. Like a car dashboard, the DataKeeper UI uses a “traffic light” scheme in the Console Tree.

As mentioned, troubleshooting a car comes down to a few areas, e.g. battery, fuel problems, or engine overheating. DataKeeper troubleshooting can likewise be consolidated into the following areas:

Storage
Network
Other (Security, User Administration, etc.)

Referencing back to the “traffic light” identifiers in the Console Tree on the DataKeeper UI, let’s take a look “under the hood” to identify the state of the Mirror(s).

When a driver takes their car in, either to fix a problem or for regular service, the dealership’s Technical Advisor or Service Technician will plug in an OBD (On-Board Diagnostics) connector to get a general idea of where the problem may be occurring (combustion, electrical or other).

Identify Mirror Status Colors to Diagnose DataKeeper Issues

As a user who is supporting (driving) DataKeeper, your first level of triage should be to identify those “color” changes of the mirror to confirm whether Storage, Network or Other issues have been impacting users, performance, etc.

Those identifiers are affectionately called Mirror definitions. The mirror status, similar to the OBD readout, can be retrieved by running the command “emcmd . getmirrorvolinfo <drive letter>”.

Note:

To get to the “emcmd” commands (emcmd stands for Extended Mirroring Command), launch an elevated (Administrator) command prompt and run:

“cd %extmirrbase%”
- This is a shortcut to the installation path where the utilities are located, e.g. <root>\Program Files (x86)\SIOS\DataKeeper

The output will be displayed as “# hostname #”

The first integer represents the role of the mirror (1 = Source, 2 = Target).

The last integer is the mirror state. A mirror can be in one of the following states:

1 – Mirroring (Green)

2 – Mirror is Resyncing (Yellow)

3 – Mirror is Broken (Red)

4 – Mirror is Paused (Yellow)

5 – Resync is Pending (Yellow)
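The decoding described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a SIOS utility: the sample line follows the “# hostname #” shape given in this article, the host name `NODE-A` is hypothetical, and the function only looks at the first and last integer tokens. Real `emcmd` output may carry more fields than shown here.

```python
# Role and state codes as listed in the article above.
ROLES = {1: "Source", 2: "Target"}
STATES = {
    1: ("Mirroring", "Green"),
    2: ("Mirror is Resyncing", "Yellow"),
    3: ("Mirror is Broken", "Red"),
    4: ("Mirror is Paused", "Yellow"),
    5: ("Resync is Pending", "Yellow"),
}


def parse_mirror_line(line: str) -> dict:
    """Decode the first integer (role) and last integer (state) of one line."""
    numbers = [int(tok) for tok in line.split() if tok.isdigit()]
    if len(numbers) < 2:
        raise ValueError(f"expected a role and a state integer in: {line!r}")
    role_code, state_code = numbers[0], numbers[-1]
    # Codes not listed in the article are reported as "Unknown".
    state, color = STATES.get(state_code, ("Unknown", "Unknown"))
    return {
        "role": ROLES.get(role_code, "Unknown"),
        "state": state,
        "color": color,
    }


# Hypothetical sample line in the "# hostname #" shape described above.
sample = parse_mirror_line("1 NODE-A 3")
print(sample)
```

Treating unrecognized codes as "Unknown" rather than raising keeps the sketch usable if the command reports a value this article does not list.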

With several variations possible, whether in your car or in DataKeeper, you need a starting point to identify the “pain points.”

Driver: Do you share the car with your family? Did someone leave a dome light on? Did the fuel type change from 94 octane to 87?
DataKeeper Administrator: Are you the sole Support person? Do other departments have access to your cluster, e.g. Database Administrators, Infrastructure Personnel responsible for Storage, Network and other?

Looking under the Hood: Preventive Steps to Take Before You Scale your Clustered Resources

Storage

Do your homework. Infrastructure and Database Administrators want to scale their storage to meet growing demands. They are very knowledgeable about the tasks at hand, but if the steps are performed incorrectly or in the wrong order, the Mirror colors in DataKeeper could be Red, Yellow or none. See our documentation and supporting video on How to Properly Resize Your Storage.

Network

Easily avoid full resyncs. Infrastructure and database admins may want to move existing mirror traffic to a different network segment without any downtime or loss of High Availability. Can you imagine losing HA while re-creating a mirror and having to perform a FULL resync of a couple of terabytes or even a petabyte of data? See our documentation and supporting video on how this can be achieved with ease.

Other (Security)

Understand password management. Password policies in Active Directory may force password changes at prescribed intervals. When the password for the SIOS DataKeeper Service account (Active Directory) is changed, the new password is NOT propagated automatically to the service. If Administrators are unaware of the change and the SIOS DataKeeper Service is restarted, a manual update of the Service Account password is required within the Services applet (services.msc). For proper usage of SIOS DataKeeper Service Accounts, view our supporting documentation.

Managing DataKeeper Across Multiple Departments to Avoid Downtime

Your car can have multiple drivers, different Service Technicians, different road conditions and the like. DataKeeper is no different: there may be cross-functional departments responsible for Storage, Network, Security and other areas whose work could adversely impact DataKeeper. There may also be related clustered resources that have relationships or dependencies with DataKeeper. When these departments perform tasks without the DataKeeper Administrator’s knowledge, then just like the dashboard lights in your car, your mirrors will display those “traffic light” colors, most noticeably Yellow and Red.

Checklist for Monitoring DataKeeper Mirror Status and Infrastructure Changes

Review “Traffic Light” colors
Identify status of the Mirrors via emcmd . getmirrorvolinfo command
If multiple departments are involved, ask about changes or scheduled events throughout the infrastructure, whether Storage, Network, Security or other
Augment your initial triage via Event Viewer logs, ipconfig /all output, dkhealthcheck, just to name a few
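The checklist above can be read as a simple decision flow: look at the colors first, and escalate only when something is not Green. The sketch below is one hypothetical way to encode that. The function names and the severity ordering are assumptions for illustration, and the step text simply restates the checklist items.

```python
# Assumed ordering: Red is worse than Yellow, which is worse than Green.
SEVERITY = {"Green": 0, "Yellow": 1, "Red": 2}


def worst_color(colors):
    """Return the most severe color seen; Green if there are no mirrors."""
    return max(colors, key=lambda c: SEVERITY[c], default="Green")


def triage_steps(colors):
    """Return the checklist steps that apply for the observed mirror colors."""
    steps = ['Review "traffic light" colors in the Console Tree']
    if worst_color(colors) != "Green":
        steps += [
            "Run emcmd . getmirrorvolinfo for each affected volume",
            "Ask Storage, Network and Security teams about recent or scheduled changes",
            "Collect Event Viewer logs, ipconfig /all output and dkhealthcheck results",
        ]
    return steps


for step in triage_steps(["Green", "Yellow"]):
    print("-", step)
```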

Contact SIOS Support for DataKeeper Issues

If your car has gone “kaput” and left you on the side of the road, you’ll likely reach out to AAA and they’ll provide a tow truck.
If you have concerns about your clustered DataKeeper Volume Resource, contact SIOS Support. If you have a question or a non-priority issue, email us at support@us.sios.com. If your issue is urgent, call us at 877-457-5113.

Reproduced with permission from SIOS

Navigating the Transition from Nutanix Xi Leap: Why SIOS is Your Ideal Disaster Recovery Partner

July 31, 2024 by Jason Aw

Preparing for the End of Nutanix Xi Leap

As Nutanix plans to discontinue support for Xi Leap on April 16, 2025, organizations are faced with the challenge of finding a new disaster recovery (DR) solution. Rather than a setback, this is an excellent opportunity to implement a more advanced and tailored DR solution with SIOS. By starting your transition early, you can ensure a seamless migration and maintain your business continuity without interruption.

Why Early Planning for a DR Solution is Crucial

Given that a typical migration to a new DR solution can take months, beginning the transition process well ahead of time is essential. Delaying this could result in significant obstacles, putting your data and business operations at risk.

Key Features to Consider in Your New DR Solution

Customized HA/DR Solutions for Your Unique Needs

It’s important to choose a DR provider that offers flexibility and scalability tailored to your organization’s specific requirements. SIOS excels in delivering High Availability (HA) and DR software solutions for both Windows and Linux environments, providing a perfect fit for diverse business needs.

Rapid Disaster Recovery and Reliable Response

Minimizing downtime is a top priority during any disruption. SIOS’s real-time block-level replication technology ensures superior Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO), enabling swift and effective recovery during emergencies.

24/7 Expert Support

Around-the-clock support is vital to ensure continuous protection and prompt assistance. SIOS offers dedicated 24x7x365 support, staffed by knowledgeable technical experts, to give you peace of mind and reliable help whenever necessary.

Real-Time Data Replication and Cluster Integration

SIOS specializes in real-time data replication and seamless cluster integration, offering the best RTO/RPO in the industry. Our technology also simplifies the failback process after a disaster, ensuring your operations can return to normal quickly and efficiently.

The Advantage of Choosing SIOS for Your Disaster Recovery Needs

Selecting a disaster recovery partner with a proven track record and in-depth expertise is crucial. SIOS not only meets these criteria but also ensures a smooth transition from Nutanix Xi Leap, keeping your data secure and minimizing any disruption to your business operations.

Flexibility in Hosting Your DR Solution

One of the key benefits of SIOS’s solutions is the flexibility they offer. You can choose to host your disaster recovery on-premises or in the cloud, depending on what best suits your organization’s strategy. This adaptability, coupled with control over your own data, provides a higher level of security compared to traditional DRaaS solutions.

Start Your Disaster Recovery Transition with SIOS Today

As Nutanix phases out Xi Leap, seize this opportunity to upgrade to a more tailored, secure, and responsive disaster recovery solution with SIOS. Our dedication to your business continuity and operational efficiency makes us the perfect partner for your disaster recovery needs.

If you have any questions or require assistance in selecting your new disaster recovery solution, please reach out to us. Discover how SIOS can help safeguard your business-critical workloads and ensure a smooth transition from Nutanix Xi Leap.

Reproduced with permission from SIOS

CrowdStrike Downtime Debrief: Practical Ways To Use HA For Patching

July 21, 2024 by Jason Aw

As a company dedicated to protecting critical applications from downtime, we want to share some context and practical advice about IT patching policies and the role of high availability.

Patching policies have evolved significantly over the years. From a cautious approach that prioritized extensive testing to the current urgency-driven model addressing zero-day exploits, the landscape of software patch management has transformed in response to escalating cyber threats. This blog delves into this evolution, the driving forces behind these changes, and how SIOS Technology’s LifeKeeper and DataKeeper high availability (HA) solutions play a crucial role in enabling customers to balance the need for security with operational stability.

The Traditional Approach

Historically, organizations adopted a conservative stance toward patching – particularly in highly critical environments – that was driven by several factors:

Stability Concerns: Patching could potentially introduce new bugs or compatibility issues, leading to system instability.
Complex Environments: Enterprise IT environments are complex, with numerous interdependencies. A patch might fix one issue but break another, necessitating thorough testing.
Operational Downtime: Applying patches often requires system downtime, which could disrupt business operations and lead to financial losses.

In this traditional model, patches were rigorously tested in staging environments that mirrored production systems. Only after exhaustive testing and validation would patches be deployed to production. This approach minimized risks but also meant that systems remained vulnerable to known threats for extended periods.

The Shift: Zero Day Exploits Driving Immediate Patching

The emergence of zero-day exploits has fundamentally changed patching policies. Attackers exploit security flaws before the vendor is aware of them and can issue a patch. Time is of the essence. No one wants to be hacked via a vulnerability addressed in a patch that IT has been slow to apply. The increasing frequency and sophistication of these exploits have forced organizations to prioritize speed over caution.

The New Imperative: Patch Immediately

Several high-profile incidents, such as the WannaCry ransomware attack in 2017, highlighted the devastating potential of vulnerabilities that are left unpatched. WannaCry spread through systems that had not yet applied a fix Microsoft had released about two months earlier. These incidents underscored the need for immediate patching to protect against exploits that could cause significant damage.
However, this urgency comes with its own set of challenges:

Increased Risk of Downtime: Rapid deployment of patches without thorough testing can lead to system crashes and service interruptions.
Operational Strain: IT teams must work quickly to assess, test, and deploy patches, often under immense pressure.
Resource Allocation: Prioritizing patching over other IT tasks can strain resources and divert attention from other critical projects.

SIOS High Availability for Rolling Maintenance

SIOS high availability (HA) solutions are a crucial component in modern patch management strategies. SIOS clustering software is designed to ensure continuous operation, even during maintenance activities such as patching. Here’s how SIOS LifeKeeper and DataKeeper software solutions enable organizations to balance the need for security with operational stability:

Seamless Patching and Testing

Redundancy and Failover: SIOS clusters use redundancy and failover mechanisms to maintain service availability. In a SIOS environment, critical applications are run on a primary server node and “clustered” with a secondary node so that if the primary fails, the secondary is ready to automatically take over operation. This setup allows patches to be applied in a “rolling maintenance” strategy. That is, IT applies patches to the secondary node while the primary continues to handle the workload, thereby minimizing downtime. After the maintenance is complete on the secondary node, operation can be moved to the secondary node and the original primary node can be updated.
Staged Rollouts: SIOS HA architectures facilitate staged rollouts of patches. Organizations can deploy patches to a subset of servers or nodes and monitor their impact before applying them to the entire system. This staged approach helps identify and mitigate potential issues without affecting the entire infrastructure.
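The rolling maintenance sequence described above can be sketched as a small simulation. This is a hedged illustration of the ordering only: the `Node` type, `rolling_patch`, and the `health_check` callback are hypothetical names, no SIOS API is called, and in a real cluster the switchover would be performed with the cluster's own tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    active: bool  # True when this node is serving the workload


def rolling_patch(
    primary: Node,
    secondary: Node,
    new_version: str,
    health_check: Callable[[Node], bool] = lambda node: True,
) -> List[str]:
    """Patch a two-node cluster one node at a time and return an event log."""
    log = []
    # 1. Patch the standby while the primary keeps serving the workload.
    secondary.version = new_version
    log.append(f"patched {secondary.name} (standby)")
    # 2. Staged rollout: stop here if the patched standby looks unhealthy.
    if not health_check(secondary):
        log.append(f"health check failed on {secondary.name}; {primary.name} stays active")
        return log
    # 3. Move the workload to the freshly patched node.
    primary.active, secondary.active = False, True
    log.append(f"switched over to {secondary.name}")
    # 4. Patch the original primary, which is now the standby.
    primary.version = new_version
    log.append(f"patched {primary.name} (standby)")
    return log


node_a = Node("node-a", "1.0", active=True)
node_b = Node("node-b", "1.0", active=False)
events = rolling_patch(node_a, node_b, "1.1")
for event in events:
    print(event)
```

The health check between steps 1 and 3 is what keeps a bad patch off the node that is serving users: if it fails, the primary is never touched.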

Benefits of SIOS HA for Patching

Minimized Downtime: By ensuring that at least part of the system remains operational during patching, SIOS LifeKeeper and DataKeeper solutions reduce the risk of service disruptions.
Improved Testing: Staging environments within SIOS HA configurations allow for real-time testing and validation of patches without impacting the production environment.
Enhanced Security: Faster deployment of critical patches reduces the window of vulnerability to exploits, enhancing overall security posture.

Conclusion

The evolution of patching policies from a cautious, test-first approach to an urgency-driven, immediate-deployment model reflects the growing threat landscape and the need for rapid response to zero-day exploits. While this shift has introduced challenges, SIOS provides a robust framework for balancing security and stability. By leveraging SIOS’ HA solutions, organizations can ensure continuous operation, even during critical patching activities, thereby safeguarding their systems and data against emerging threats without compromising on performance and uptime.

Reproduced with permission from SIOS
