SIOS SANless clusters

Using Datadog for Amazon EC2 Monitoring? Pair with SIOS AppKeeper for Automated Remediation

December 11, 2020 by Jason Aw

Have you ever thought to yourself, “It would be nice if Datadog could monitor our Amazon EC2 services and automatically restart them when it detects a failure?”  I thought the same thing, and decided to try it out for myself.

SIOS AppKeeper monitors Amazon EC2 instances for failures and automatically restarts services, or even reboots instances, when failures are detected.  I thought to myself, “What if we combined the monitoring capabilities of Datadog with AppKeeper’s automated remediation capabilities?”

It worked, and here is how I did it.

If you are already using Datadog and are interested in trying this out for yourself, please sign up at the end of this article for access to our API.

Here are the steps I took to set up AppKeeper to receive alerts from Datadog and restart the webserver on Amazon EC2 when downtime is detected.

For this experiment, we already had a Datadog account, an AppKeeper account, and an NGINX webserver running on Amazon EC2 (using Amazon Linux 2).

How to integrate Datadog with AppKeeper to provide automated remediation

Step One: Get the Restart API Token from AppKeeper

Request the API Token for the Datadog integration from this form:

https://mk.sios.jp/BC_AppKeeper_Datadog_api_application

If you request it from the form, the token will be sent to the email address you provide.

Step Two: Create the tenant in AppKeeper

The next step was to register the AWS account to which the monitored instance belongs in AppKeeper. (AppKeeper refers to the registered AWS accounts as “tenants.”)

https://sioscoati.zendesk.com/hc/en-us/articles/900000123406-Quick-Start-Guide#h_39404cfb-4a76-450f-99c2-e197cc63e50d

Step Three: Create an IAM Role in AWS

I then created an IAM Role in AWS (you need this to set up your AppKeeper account).  Here are instructions if you are unfamiliar with this process.

Step Four:  Add the tenant in AppKeeper

The next step was to add the “tenant” in AppKeeper (AppKeeper considers an AWS account a “tenant”).  Here is a link to detailed instructions on doing this.

Step Five: Set up the Synthetics Test in Datadog

I then needed to configure Datadog’s external monitoring for the Nginx server (EC2 instance) that we want to monitor.  Here’s how to do that:

Open the Datadog dashboard and select UX Monitoring > Synthetic Tests from the menu.

Click the [New Test] button in the upper right corner and select [New API Test] to create an external monitoring test case.

Enter the following information in the form to create the test case.

  1. Choose Request Type
    Select “HTTP”.
  2. Define Request
    Set the following values.
    URL : GET http://{{ EC2 IP address }}
    Name : AppKeeper Datadog Integration Test (any name)
    Locations : Tokyo
  3. Specify test frequency
    No change.
  4. Define assertion
    Click “New Assertion” and set the following values.
    When : [status code] [is] [200]
  5. Define Alert Condition
    No change.
  6. Notify Your Team
    No change.
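For readers who prefer to keep the test definition in code, the form values above can be written down as a small Python sketch. The field names below follow my understanding of the JSON body accepted by Datadog's Synthetics API for HTTP tests and should be checked against the current API reference before use. The location ID `aws:ap-northeast-1` for Tokyo, the `tick_every` value, and the function name `build_synthetics_http_test` are assumptions made for illustration. They do not come from the steps above.

```python
import json


def build_synthetics_http_test(ec2_ip: str, name: str) -> dict:
    """Mirror the Step Five form values as a test definition (illustrative only)."""
    return {
        "name": name,
        "type": "api",
        "subtype": "http",
        "config": {
            # 2. Define Request: GET http://{{ EC2 IP address }}
            "request": {"method": "GET", "url": f"http://{ec2_ip}"},
            # 4. Define assertion: [status code] [is] [200]
            "assertions": [
                {"type": "statusCode", "operator": "is", "target": 200}
            ],
        },
        # Assumed managed-location ID for "Tokyo".
        "locations": ["aws:ap-northeast-1"],
        # The notification target is filled in later, once the webhook exists.
        "message": "",
        # Assumed frequency in seconds; the walkthrough keeps the UI default.
        "options": {"tick_every": 300},
    }


definition = build_synthetics_http_test(
    "203.0.113.10", "AppKeeper Datadog Integration Test"
)
print(json.dumps(definition, indent=2))
```

The IP address 203.0.113.10 is a documentation placeholder. Replace it with the public address of your own EC2 instance.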

Step Six: Run the Synthetics test in Datadog

Once the above inputs are completed, press “Create Test” to create the test case for external monitoring.

The results are visible and we can see that the webserver is working properly in the “Test Results” section.

That was all that had to be done to configure Synthetics monitoring using Datadog.

Step Seven: Set AppKeeper to receive Synthetics alerts

Next I had to set AppKeeper as the notification destination.  From the Datadog menu, go to Integrations and select the Integrations tab.

In the search box, enter “Webhooks” to find the Webhooks integration.

Click “Available” to enable the Webhooks integration in your Datadog account. (Once enabled, it will appear in the “Installed” column.)

Click on “Configure” to open the Webhooks integration configuration page.

In the “Webhooks” column at the bottom of the page, click “New +” to create a new Webhooks notification destination. For the parameters, enter the following:

Name : The name of the integration (any name)

URL : https://api.appkeeper.sios.com/v2/integration/{{ AWS account ID }}/actions/recover

Payload :

{
  "instanceId": "{{ EC2 Instance ID }}",
  "name": "nginx"
}

Custom Headers: Check the box and enter the following

{
  "Content-type": "application/json",
  "accept": "application/json",
  "appkeeper-integration-token": "{{ API token obtained in Step One }}"
}

When you are done, press “Save.”
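To make the webhook settings concrete, here is a minimal Python sketch that builds the same HTTP request from the URL, payload, and custom headers above. It assumes the webhook is delivered as a POST, which is my understanding of how Datadog's Webhooks integration behaves. The function name `build_recover_request` and the sample values are hypothetical. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

APPKEEPER_BASE = "https://api.appkeeper.sios.com/v2/integration"


def build_recover_request(aws_account_id: str, instance_id: str,
                          service_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the recover call configured in the webhook."""
    url = f"{APPKEEPER_BASE}/{aws_account_id}/actions/recover"
    # Same JSON body as the "Payload" field of the webhook.
    body = json.dumps({"instanceId": instance_id, "name": service_name}).encode("utf-8")
    # Same entries as the "Custom Headers" field of the webhook.
    headers = {
        "Content-type": "application/json",
        "accept": "application/json",
        "appkeeper-integration-token": token,
    }
    # POST is an assumption about how the webhook is delivered.
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_recover_request("123456789012", "i-0123456789abcdef0", "nginx", "example-token")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. With a real token and instance ID, that should trigger a recovery attempt, so only do that against a test instance.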

Step Eight: Connecting AppKeeper to the Synthetics test

Next, I had to configure AppKeeper (the registered Webhooks integration) to be called when an alert of the Synthetics monitoring occurs.

Open the test case that you set up in Step Five from UX Monitoring > Synthetic Tests in the menu.

Select “Edit test details” from the gear icon in the top-right corner, enter the following value in the “6. Notify Your Team” box, and save the changes.

@webhook-{{ Name of Webhook integration in Datadog }}

Note: You can also enable “renotify if the monitor has not been resolved” so that the recovery is retried if AppKeeper’s first attempt fails.  This is not required for testing purposes, but we recommend setting it to [10 minutes] (the minimum interval).

Setup is now complete.

Step Nine: Confirm the integration by running the test again

I then confirmed that AppKeeper would restore the webserver if Datadog detected it to be down.

Open the Synthetics monitoring test case you just set up from UX Monitoring > Synthetic Tests in Datadog.

Click “Resume Test” in the upper right corner and turn on the Synthetics monitoring.

Now Datadog will perform Synthetics monitoring at regular intervals.

The Test Results show that the server is successfully accessed.

Next, I created a pseudo-failure of the web server to test AppKeeper’s automated remediation.

Since it is difficult to cause a real failure, I simulated one by stopping the service so that the web page could no longer be viewed.  To do this, I connected over SSH to the EC2 instance where the Nginx server is installed and stopped Nginx.

sudo systemctl stop nginx

After a short wait, Datadog detected that the web server was no longer accessible.

The Synthetic Tests page in Datadog also shows that the test case has failed.

If the test case fails, Datadog will notify AppKeeper that the Synthetics monitoring has failed.

When AppKeeper receives the notification, it will automatically attempt to restart Nginx.

So, if you wait a little while, you will see Datadog’s Synthetics monitoring check pass again.
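If you would rather watch the recovery from your own machine than refresh the Datadog page, a small polling script can apply the same “status code is 200” check. This is only a sketch using the Python standard library. The function names, the polling interval, and the maximum wait are assumptions chosen for illustration, and the script is not part of Datadog or AppKeeper.

```python
import http.client
import time
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP error statuses all count as "down".
        return False


def wait_for_recovery(url: str, interval: float = 10.0, max_wait: float = 600.0,
                      probe=is_healthy, sleep=time.sleep) -> bool:
    """Poll url until it is healthy again or max_wait seconds have passed."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, calling wait_for_recovery("http://203.0.113.10") right after stopping Nginx should return True once the service has been restarted, where 203.0.113.10 stands in for your instance's address. The probe and sleep parameters exist only so the loop can be exercised without a network.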

Also, if you log in to your AppKeeper dashboard, you’ll see that the recovery has been performed.

—

In this exercise I used a web server (Nginx) as an example to automate the process of detecting a failure with Datadog and restoring the service with AppKeeper.

Similar automation could be achieved by integrating Datadog with EventBridge and Lambda or by creating custom scripts.

However, if you frequently add target instances or restart a wide variety of services, the cost and complexity of maintaining EventBridge and Lambda or scripts will increase.

AppKeeper’s proven integration with Datadog, and the ease with which you can add target instances, make it easy to add automation to your DevOps environment and reduce your downtime.

If you are currently using Datadog and would like to try out AppKeeper’s Restart API, please first sign up for our 14-day free trial here (you can purchase a subscription once you have started the free trial).  Then click here to request an evaluation token. We’ll walk you through the process and provide you with a free evaluation token to help you get started.

Apply for an evaluation token

Thank you.  I hope you will take this opportunity to learn more about SIOS AppKeeper, which provides automatic monitoring and recovery of applications running on EC2.

—  Tatsuya Hirao on the SIOS Technology technical team.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Datadog, SIOS Appkeeper

EC2 Monitoring Best Practices: Using SIOS AppKeeper to Protect NGINX Webservers on Amazon EC2

July 14, 2020 by Jason Aw

NGINX is a web server that can also act as a load balancer, reverse proxy, etc. Between them, NGINX and Apache serve more than 50% of the traffic on the web.  Today many companies are running their NGINX Open Source or NGINX Plus webservers on the Amazon EC2 environment using Amazon Linux, Red Hat Linux, or Ubuntu.

Everyone agrees that it is a best practice to monitor applications like NGINX on EC2 and respond to any systems irregularities quickly.  Users expect fast access and constant uptime for their applications.

Current choices for monitoring NGINX webservers on Amazon EC2

Many companies are deploying Amazon CloudWatch to monitor their applications, and are even creating some level of automation by developing scripts or by using AWS Lambda.  But configuring Amazon CloudWatch properly with custom metrics and setting up AWS Lambda requires a certain amount of technical expertise that may be beyond that of many companies.  And then there is the cost and effort required to maintain any scripts as the applications evolve.

Another choice is to deploy an application performance monitoring (APM) solution, such as one from New Relic, Dynatrace, Datadog, or LogicMonitor.  APM solutions are great.  They do a really good job of watching over all your systems and pinpointing what happened and why.  They create logs that can be shared with and interpreted by your development team to recreate the issue and ensure that it doesn’t happen again.  But here’s the thing:  APM solutions provide a lot of data that you have to sort through (separating “signals from the noise”) and they do nothing to recover from failures when they occur.  APM tools are only part of the solution when it comes to reducing downtime for your NGINX webservers.

But some companies don’t have the internal staff or tools to monitor their EC2 environment themselves, so they choose to outsource the task to a managed service provider (MSP).  There are some very real benefits to working with an MSP to manage your environment, such as not having to hire more staff as your environment expands, or not having to train your team on new technologies.  And the MSPs enjoy efficiencies as they can spread out their investments over many clients.  But there are downsides.  In some cases, you can be locked into high, fixed-cost contracts, and costs can rise further if issues occur and the MSP has to escalate them to get them resolved.  And you lose continuity between the team that is monitoring the environments and those responsible for building and deploying the applications.

Whether you choose to invest in an APM solution or to outsource to an MSP, you still need to think about how quickly you can recover your NGINX webservers from downtime if and when it occurs.  We’d like to propose another alternative:  automated remediation with SIOS AppKeeper.

SIOS AppKeeper:  Automated remediation for NGINX webservers on EC2

Many of our customers have chosen to use SIOS AppKeeper to protect their NGINX webservers.  While they could have chosen a standard application performance monitoring (APM) solution or a third-party monitoring solution, they chose instead to rely on AppKeeper to automatically recover services or entire EC2 instances if a failure occurs.  We will take a look at some of the reasons why and share with you a short video showing how AppKeeper works with NGINX.

SIOS AppKeeper is a SaaS service that is easy to install and configure and monitors any applications running on Amazon EC2, such as your NGINX webservers and their “nginx”, “cache manager”, and “worker” services.  When an anomaly is detected, AppKeeper automatically restarts the service, and if that doesn’t work it reboots the entire instance.  No more reading through painful logs to pinpoint the reason for the failure, or escalation to developers to restart your service or expensive outsourcing fees.  AppKeeper provides “set-it-and-forget-it” functionality so that you can rest assured knowing that your NGINX webservers are following EC2 monitoring best practices and are running properly, or will be quickly restarted if they experience any issues.
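The escalation described above (restart the service first, then reboot the instance if that does not help) can be illustrated with a short Python sketch. This is not AppKeeper's implementation. The function name `remediate` and the three callables it takes are hypothetical stand-ins for a health check, a service restart, and an instance reboot.

```python
from typing import Callable


def remediate(is_healthy: Callable[[], bool],
              restart_service: Callable[[], None],
              reboot_instance: Callable[[], None]) -> str:
    """Try the cheapest fix first and escalate only if it does not help."""
    if is_healthy():
        return "healthy"
    # First attempt: restart just the failed service.
    restart_service()
    if is_healthy():
        return "service_restarted"
    # Escalation: reboot the whole instance.
    reboot_instance()
    return "instance_rebooted" if is_healthy() else "unrecovered"
```

In a real setup the three callables would wrap whatever mechanism you use to probe and control the server. A real system would also wait between each action and the following health check, which this sketch leaves out for brevity.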

Today hundreds of companies rely on AppKeeper to keep their cloud environments running.  We invite you to check out this quick video for a demonstration of how AppKeeper protects NGINX webservers.

If you would like to try SIOS AppKeeper for yourself, we offer a 14-day free trial.  Simply click here to sign up.

Filed Under: Clustering Simplified Tagged With: NGINX, SIOS Appkeeper
