Event-Driven Ansible for Alert Notifications
Last updated
This article shows you how to receive ThousandEyes alert notifications in Event-Driven Ansible (EDA), part of Ansible Automation Platform, using custom webhooks.
Event-Driven Ansible (EDA) automatically executes actions, such as running an Ansible playbook, in response to events in a system or network. Using EDA, automation activities can be triggered based on specific events, such as a system failure, network change, or a new device being added. This approach allows for real-time response and proactive management of IT infrastructure. Instead of running Ansible playbooks manually or on a schedule, the system can react to events and make necessary adjustments automatically. This can greatly improve the efficiency and responsiveness of IT operations.
The diagram below shows the high-level architecture of integrating ThousandEyes with Event-Driven Ansible in your environment.
The key component of EDA is the rulebook. The rulebook is where event sources, rules, and actions are configured. To integrate ThousandEyes alert notifications with an Event-Driven Ansible rulebook, you can use the webhook event source plugin from the ansible.eda collection in your Ansible rulebook.
Let’s begin with the simplest Ansible rulebook to receive a webhook from the ThousandEyes platform. The code block below shows an Ansible rulebook that defines a webhook event source that listens on port 8080. The rulebook contains one rule, and the rule’s condition always passes, so this rulebook will perform an action for every event received. In this case, the action is simply printing out the event details on the console.
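As a minimal sketch, assuming the ansible.eda collection is installed on your EDA host, such a rulebook could look like the following. The play and rule names are illustrative, event.payload is defined stands in for a condition that passes for every webhook event, and the print_event action writes each matching event to the console.

```yaml
---
# simple-rulebook.yaml: print every webhook event received on port 8080
- name: Receive ThousandEyes webhooks
  hosts: all
  sources:
    # Webhook event source plugin from the ansible.eda collection
    - ansible.eda.webhook:
        host: 0.0.0.0   # listen on all interfaces
        port: 8080
  rules:
    - name: Print every event
      # Every event from the webhook source carries a payload, so this always matches
      condition: event.payload is defined
      action:
        print_event:
          pretty: true
```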
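Save this to a file named simple-rulebook.yaml and run it with the ansible-rulebook command-line tool, for example: ansible-rulebook --rulebook simple-rulebook.yaml --inventory inventory.yml --verbose (replace inventory.yml with your own Ansible inventory file).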
Now that the rulebook is running on your EDA host and it’s ready to receive webhook events on port 8080, let’s create a ThousandEyes custom webhook to send alert notifications from the ThousandEyes platform to your EDA instance.
In the ThousandEyes platform, navigate to Integrations, click + New Integration, and click Custom Webhook.
In the Add Custom Webhook Integration panel, configure the fields for your new custom webhook.
Name: EDA Webhook (or any name you choose)
URL: http://<your-eda-host>:8080
Preset Configurations: Generic
Click Test at the bottom of the Add Custom Webhook Integration panel to send a test webhook from the ThousandEyes platform to your EDA server.
You should see a success message in ThousandEyes indicating the webhook was successfully sent to EDA. On your EDA host, you should see the webhook body printed on the console.
When your test has completed successfully, click Save to save the custom webhook integration for use with alert rules later in this guide.
In the minimal example above, we showed how to send ThousandEyes alert notifications to EDA using a custom webhook. Next, let’s take a look at how we can use that webhook payload in Ansible rule conditions so that EDA can automatically perform actions in response to the event.
The example rulebook and custom webhook shown above are for demonstration purposes only and do not include any authentication. In production or other sensitive environments, you should follow information security best practices, such as placing the EDA server behind a web application firewall, network firewall, and/or secure reverse proxy with TLS and authentication.
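As a sketch of one additional layer (it complements, rather than replaces, the controls above), the webhook source plugin in the ansible.eda collection accepts optional token, certfile, and keyfile arguments so that it serves TLS and rejects requests without the expected bearer token. The port 8443, the certificate paths, and the TE_WEBHOOK_TOKEN variable (supplied, for example, through ansible-rulebook's --env-vars option) are assumptions. The custom webhook URL would then become https://<your-eda-host>:8443, and the webhook would need to send an Authorization: Bearer header carrying the same token, assuming your custom webhook configuration is set up to send one.

```yaml
---
- name: Receive ThousandEyes webhooks over TLS with a bearer token
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8443                               # assumed port for HTTPS
        token: "{{ TE_WEBHOOK_TOKEN }}"          # assumed variable; expects "Authorization: Bearer <token>"
        certfile: /etc/eda/tls/server.crt        # hypothetical certificate path
        keyfile: /etc/eda/tls/server.key         # hypothetical private key path
  rules:
    - name: Print every event
      condition: event.payload is defined
      action:
        print_event:
          pretty: true
```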
The custom webhook configuration shown above uses the built-in Generic preset body, and the rest of this article is based on that body format. However, custom webhooks are built to be flexible: you can customize the webhook body by adding, removing, or reorganizing fields within it. To learn more about custom webhooks in general, see Custom Webhooks.
When an event is received, EDA uses rules to determine if one or more actions should be executed. Each rule must contain a condition that is evaluated when an event is received. If the condition is met, the actions are executed.
Events from the webhook event source plugin include a payload field containing the body of the webhook. This allows you to use the details of the ThousandEyes alert notification in your EDA rule conditions. This section describes how to reference the webhook event’s payload to implement “if-this-then-that” logic in Ansible rulebooks.
One of the most important event properties when writing Ansible rulebook conditions for ThousandEyes alert notifications is the notification’s type: whether the notification indicates that an alert is “active” or “cleared”. An active alert notification is sent when the alert rule conditions are first met, i.e., when the alert begins. A cleared alert notification is sent when the alert rule conditions are no longer met, i.e., when the alert is resolved.
For example, if we had a rulebook that performs a remediation action in response to an alert, we would only want to perform the remediation action when the alert is active, not when it’s cleared. However, we might still want downstream notifications in both cases: e.g., posting in a Webex or Slack channel as part of a ChatOps model.
For more information on alerts and how they change from active to cleared, see the Clearing Alerts section of the Alerts article.
The example rulebook below shows two rules. The first rule matches alert notifications that are active, and executes two actions: a remediation playbook, and a notification playbook. The second rule handles alert notifications that are cleared, and executes one action: the notification playbook.
This rulebook contains two rules, one named “Active Alert” and the other named “Cleared Alert”. The condition for both rules checks the value of event.payload.type. Type “2” indicates the alert is active, and type “1” indicates the alert is cleared.
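A sketch of such a rulebook follows. It assumes the Generic preset delivers type as a string, as the quoted values above suggest (if your body sends it as a number, drop the quotes), and remediate.yaml and notify.yaml are hypothetical playbook names. The actions list runs the two playbooks in order for an active alert.

```yaml
---
- name: ThousandEyes alert notifications
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    - name: Active Alert
      # Type "2": the alert rule conditions were just met
      condition: event.payload.type == "2"
      actions:
        - run_playbook:
            name: remediate.yaml   # hypothetical remediation playbook
        - run_playbook:
            name: notify.yaml      # hypothetical ChatOps notification playbook
    - name: Cleared Alert
      # Type "1": the alert rule conditions are no longer met
      condition: event.payload.type == "1"
      action:
        run_playbook:
          name: notify.yaml
```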
Another important event property is the alert rule expression. Each ThousandEyes alert is created from an alert rule. Alert rules are configured with conditions that determine when an alert is triggered. The alert rule expression is a machine-readable representation of those alert rule conditions.
Alert rule expressions let you determine what kind of alert you are receiving in EDA. For example, an alert rule for HTTP server tests may have a condition that the HTTP response time is > 500 ms. Such an alert rule would have an expression of (responseTime > 500 ms). As another example, an alert rule for an agent-to-server test may have multiple conditions, triggering when packet loss is at least 10% and network latency is at least 50 ms. This rule would have an expression of ((loss >= 10%) && (latency >= 50 ms)).
The alert rule expression can be used in an Ansible rule condition by referencing event.payload.alert.rule.expression. The code block below shows two EDA rules that match the example ThousandEyes alert rules described above.
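A sketch of two such rules, using the expression strings exactly as quoted above. The playbook names are hypothetical. Each condition also requires an active notification (type "2") so that the cleared notification does not start the same playbook again. Because == needs a character-for-character match, confirm the exact expression string for your own alert rules before relying on it.

```yaml
---
- name: Route ThousandEyes alerts by rule expression
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    - name: Slow HTTP response
      condition: >-
        event.payload.type == "2" and
        event.payload.alert.rule.expression == "(responseTime > 500 ms)"
      action:
        run_playbook:
          name: investigate-http-latency.yaml   # hypothetical playbook
    - name: Packet loss and latency
      condition: >-
        event.payload.type == "2" and
        event.payload.alert.rule.expression == "((loss >= 10%) && (latency >= 50 ms))"
      action:
        run_playbook:
          name: investigate-network-path.yaml   # hypothetical playbook
```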
By using the alert rule expressions in your Ansible rulebook conditions, you can create automations for general use cases based on the type of alert received, without coupling your rulebook to specific test IDs or alert rule IDs. In the example shown above, different automated actions could be executed depending on the details of the event that was received.
For a full list of alert rule expressions, see the alert rule metadata documentation in the ThousandEyes API developer reference. You can also use the Alert Rules API to query the details, including the expression, for a given alert rule.
In some cases, you may want to create automations that are tightly coupled to one or more specific tests. For example, an automation action (such as an Ansible playbook) might apply only to one host (the test target) and not to your entire fleet of hosts and their respective ThousandEyes tests. In these cases, you can include the ThousandEyes test metadata as part of the EDA rule condition. Available properties include:
event.payload.alert.test.id
event.payload.alert.test.name
event.payload.alert.test.description
event.payload.alert.test.testType
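A sketch of a rule coupled to a single test. The test name "Intranet Portal HTTP", the playbook restart-intranet-portal.yaml, and the target_host variable are hypothetical. Matching on test.name rather than test.id is only a choice for this example: it avoids having to know whether the ID arrives as a string or a number in your webhook body.

```yaml
---
- name: Test-specific automation
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    - name: Remediate intranet portal
      # Only act on active alerts raised by this one ThousandEyes test
      condition: >-
        event.payload.type == "2" and
        event.payload.alert.test.name == "Intranet Portal HTTP"
      action:
        run_playbook:
          name: restart-intranet-portal.yaml
          # Hypothetical variable telling the playbook which host to act on
          extra_vars:
            target_host: intranet.example.com
```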
There are many other alert details that can be used in your Ansible rulebooks. Additional fields include timestamps, severity, and distinct measurements, if applicable. For a complete list of webhook variables, see Webhook Variables.
When you use these variables within an Ansible rulebook, be sure to prefix the names with event.payload. For example, to use the alert.targets.size webhook variable, reference event.payload.alert.targets.size in your EDA rule condition.
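A small sketch of that prefixing in a rule condition. It assumes the webhook body delivers alert.targets.size as a number, and the threshold of 1 is purely illustrative. If your body renders the value as a string, a numeric comparison like this will not match as intended.

```yaml
---
- name: Webhook variable example
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    - name: Alert with more than one target
      # Webhook variable alert.targets.size is read as event.payload.alert.targets.size
      condition: event.payload.alert.targets.size > 1
      action:
        print_event:
          pretty: true
```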
The following sections demonstrate example use cases of integrating ThousandEyes monitoring with Event-Driven Ansible automation.
Failure to renew a certificate on time can result in website downtime and other service disruptions. Certificate renewal can be time-consuming and resource-intensive, particularly if it's done manually, and is prone to human error, which could lead to a certificate not being renewed correctly or on time.
In addition to capturing performance metrics like response time and the application response code, the ThousandEyes HTTP server test inspects TLS certificates when monitoring HTTPS targets and checks for validity, including expiration. An alert rule can be configured to trigger when TLS certificates have expired or when they will expire within some number of days. By combining ThousandEyes TLS monitoring with Event-Driven Ansible, you can automate certificate renewal and prevent service disruptions.
This example is based on a ThousandEyes Enterprise Agent monitoring an internal web application with a TLS certificate from an internal certificate authority, but is also applicable to Cloud Agents, public web applications, and public certificate authorities.
Navigate to Alerts > Alert Rules and click Add New Alert Rule.
In the Add New Alert Rule dialog, choose the alert type of Web > HTTP Server.
In the Alert Conditions section, select Certificate / expires within / 14 days, as shown in the screenshot below.
Use the Tests selector to assign the rule to one or more HTTP server tests.
Click the Notifications tab, and in the Integrations section, select the custom webhook integration you had previously created.
Finally, click Create New Alert Rule to apply your changes.
Use the following rulebook to receive ThousandEyes alert notifications in EDA. This example rulebook uses the webhook event source plugin and listens on port 8080, but you can change this to whatever port fits your requirements. The rulebook contains three rules:
A rule that matches when the ThousandEyes alert notification is triggered and received, which executes an Ansible playbook to renew the TLS certificate
A rule that matches when the renewal playbook execution succeeds
A rule that matches when the renewal playbook execution fails
The first rule, which should match when the ThousandEyes alert notification is triggered, is based on the alert rule’s expression and the alert notification’s type. Specifically, this rule condition matches when the alert notification is active, not cleared, and when the alert rule expression matches certificate expiration, as in the alert rule we created above. When this first rule is matched, it executes a playbook; in this example, the playbook renews the TLS certificate on the target host. The results of the playbook execution are then fed back into the rulebook.
The second and third rules match based on the results of the playbook executed in the first rule. This allows EDA to handle both possible outcomes of the attempt to renew the certificate automatically. If the playbook succeeds, one follow-up action might be to post an informational message to a Slack or Webex channel. If the playbook fails, you might run additional playbooks to retry the renewal, or create an incident in an ITSM tool to escalate the issue and fall back to a manual process.
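The following is a sketch of such a rulebook under stated assumptions, not a reference implementation. The playbook names renew-certificate.yaml, notify-chat.yaml, and open-itsm-incident.yaml are hypothetical. The exact expression string of the certificate alert rule is not reproduced here, so it is read from a variable named cert_expiry_expression (a hypothetical name) whose value you would copy from the Alert Rules API. Finally, it relies on ansible-rulebook's post_events option returning facts that the playbook sets with set_fact as new events; here the playbook is assumed to set a hypothetical renewal_status fact to succeeded or failed.

```yaml
---
- name: Renew TLS certificates flagged by ThousandEyes
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    # Rule 1: active notification from the certificate-expiration alert rule
    - name: Renew expiring certificate
      condition: >-
        event.payload.type == "2" and
        event.payload.alert.rule.expression == vars.cert_expiry_expression
      action:
        run_playbook:
          name: renew-certificate.yaml
          # Feed facts set by the playbook back into the rule engine as events
          post_events: true
    # Rules 2 and 3: react to the outcome the playbook reports back
    - name: Certificate renewal succeeded
      condition: event.renewal_status == "succeeded"
      action:
        run_playbook:
          name: notify-chat.yaml
    - name: Certificate renewal failed
      condition: event.renewal_status == "failed"
      action:
        run_playbook:
          name: open-itsm-incident.yaml
```

To supply the variable, you could put a line such as cert_expiry_expression: "<expression returned by the Alert Rules API>" in a vars.yml file and start the rulebook with ansible-rulebook --rulebook cert-renewal.yaml --inventory inventory.yml --vars vars.yml. Inside the hypothetical renewal playbook, one way to report both outcomes is a set_fact task at the end of a block (renewal_status: succeeded) and another in its rescue section (renewal_status: failed).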
Today, many web applications are deployed behind load balancers or reverse proxies to distribute network or application traffic. This setup is designed to improve availability and reliability by balancing the load between servers and, in turn, preventing any single server from becoming a bottleneck.
However, if a backend application goes down, clients can still reach the frontend load balancer, but the load balancer cannot reach the backend. This can result in the load balancer timing out or responding with a 5XX HTTP response code. 5XX codes indicate that the server is aware it has encountered an error and cannot fulfill the request.
This example is based on ThousandEyes Cloud and Enterprise Agents monitoring a public web application behind a reverse proxy and automating remediation when that reverse proxy responds with 5XX errors indicating a loss of connection to its backend.
Navigate to Alerts > Alert Rules and click Add New Alert Rule.
In the Add New Alert Rule dialog, choose the alert type of Web > HTTP Server.
In the Alert Conditions section, select Response Code / is / server error (5xx), as shown in the screenshot below.
Use the Tests selector to assign the rule to one or more HTTP server tests. Then, click the Notifications tab, and in the Integrations section, select the custom webhook integration you had previously created.
Finally, click Create New Alert Rule to apply your changes.
Use the following rulebook to receive ThousandEyes alert notifications in EDA by webhook. This example rulebook uses the webhook event source plugin and listens on port 8080, but you can change this to whatever port fits your requirements. The rulebook contains three rules:
A rule that matches when the ThousandEyes alert notification is triggered and received, which executes an Ansible playbook to restart the web application
A rule that matches when the restart playbook execution succeeds
A rule that matches when the restart playbook execution fails
The first rule, which should match when the ThousandEyes alert notification is triggered, is based on the alert rule’s expression and the alert notification’s type. Specifically, this rule condition matches when the alert notification is active, not cleared, and when the alert rule expression matches the condition configured above, i.e., Response Code is server error (5xx). When this first rule is matched, it executes a playbook; in this example, the playbook restarts the web application on the target host. The results of the playbook execution are then fed back into the rulebook.
The second and third rules match based on the results of the playbook executed in the first rule. This allows EDA to handle both possible outcomes of attempting to restart the web application. If the playbook succeeds, one follow-up action might be to post an informational message to a Slack or Webex channel. If the playbook fails, you might run additional playbooks to remediate the 5xx server error, or create an incident in an ITSM tool to escalate the issue and fall back to a manual process.
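A sketch of this rulebook under the same kind of assumptions. The playbook names restart-web-app.yaml, notify-chat.yaml, and open-itsm-incident.yaml are hypothetical, as are the variable http_5xx_expression (which would hold the exact expression of the Response Code alert rule, copied from the Alert Rules API) and the restart_status fact. The design relies on post_events sending facts set by the restart playbook back to the rule engine as events.

```yaml
---
- name: Restart web application on 5xx alerts
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8080
  rules:
    # Rule 1: active notification from the "Response Code is server error (5xx)" alert rule
    - name: Restart web application
      condition: >-
        event.payload.type == "2" and
        event.payload.alert.rule.expression == vars.http_5xx_expression
      action:
        run_playbook:
          name: restart-web-app.yaml
          # Feed facts set by the playbook (e.g. restart_status) back as events
          post_events: true
    # Rules 2 and 3: react to the outcome of the restart
    - name: Restart succeeded
      condition: event.restart_status == "succeeded"
      action:
        run_playbook:
          name: notify-chat.yaml
    - name: Restart failed
      condition: event.restart_status == "failed"
      action:
        run_playbook:
          name: open-itsm-incident.yaml
```

As with any rule that compares against vars, the variable must be provided at startup, for example through a file passed to ansible-rulebook with --vars.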