The ThousandEyes platform allows you to configure highly customizable alert rules and assign them to tests, in order to highlight or be notified of events of interest. For those who want simplicity in alert configuration and management, the ThousandEyes platform ships with default alert rules configured and enabled for each test.
Alert notifications are delivered via email, SMS, or classic or custom webhooks, including templated webhooks for services such as PagerDuty, Slack, Cisco Webex and AppDynamics. Notification recipients are configured in the Alert Rules > Notifications tab, or through the Integrations page. An alert stays active in the ThousandEyes platform for as long as its alert rule conditions are met, but the notification that the alert is active is sent only once, at the start of the active period. Alerts can optionally be configured to send a notification once the alert is no longer active.
For email notifications, when multiple alerts are raised simultaneously, their data will be grouped into a single email notification.
Classic and custom webhooks permit users to send JSON-formatted alert data to a webhooks-enabled server via HTTP. The information can then be programmatically processed and subsequent actions taken automatically. For more information on configuring ThousandEyes alert rules with webhooks, see Alert Notifications Via Webhooks.
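As a rough illustration of the receiving side, the sketch below uses Python's standard library to accept a JSON POST and reduce it to a one-line summary. The payload keys (`ruleName`, `testName`, `active`) and the port are hypothetical placeholders chosen for this example, not the documented ThousandEyes webhook schema; consult Alert Notifications Via Webhooks for the actual fields.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alert(body: bytes) -> str:
    """Turn a JSON alert payload into a one-line summary.

    The keys read here are hypothetical placeholders, not the real schema.
    """
    payload = json.loads(body)
    rule = payload.get("ruleName", "unknown rule")
    test = payload.get("testName", "unknown test")
    state = "ACTIVE" if payload.get("active", True) else "CLEARED"
    return f"[{state}] {rule} on {test}"


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            print(summarize_alert(self.rfile.read(length)))
            self.send_response(200)
        except (ValueError, AttributeError):
            # Body was not valid JSON, or was JSON but not an object.
            self.send_response(400)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for webhook POSTs until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener; the summary function is kept separate so that follow-up actions (ticket creation, paging, and so on) can be added and tested without a running server.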
For notifications into AppDynamics, ThousandEyes sends alert notifications to an AppDynamics instance for a specific application. You can set up multiple integrations to the same instance, targeting different applications or severity levels. For more information on configuring ThousandEyes alert notifications with AppDynamics, see AppDynamics Integration.
For notifications into PagerDuty, ThousandEyes allows you to use an Escalation Policy (which defines rules for notification destinations, repeat notifications and other actions) in your PagerDuty service to receive notifications from ThousandEyes. For more information on configuring ThousandEyes alert rules with PagerDuty, see PagerDuty Integration.
For notifications into Slack, ThousandEyes allows alert data to be sent to a chat or instant-message application. Users can send notifications to the Slack channel of their choice. For more information on configuring ThousandEyes alert rules with Slack, see Slack Integration.
For notifications into ServiceNow, ThousandEyes facilitates delivery of direct notification into a ServiceNow account so it may be processed and acted upon based on workflows defined within that system. For more information on configuring alert rules to send notifications directly into the ServiceNow platform, see ServiceNow Integration.
There are two different types of alert conditions in an alert rule: global alert conditions and location alert conditions. Note that, despite the name, these alerts are not triggered based on physical location, but on the conditions that are met. A global alert is a triggering event: all the conditions set out in the alert rule have been met and the alert becomes active. A location alert is a qualifying event: an alert trigger meets only a portion of the conditions, but that is enough for it to be included in the global event.
In the example below, a global alert is triggered on the HTTP connect response alert if the following conditions are met:
- Any location conditions are met by 10% of agents associated with 8 tests for 2 of 2 times in a row.
Global and location conditions
The location conditions are:
- Connect Time is greater than or equal to 150 ms.
- Response Time is greater than or equal to 100 ms.
Once the global alert condition has been triggered, any agent which meets the location alert conditions in a single round will be included as “active” in the alert as long as the global alert remains active. When an agent no longer meets the location alert conditions, it will no longer show as “active” but will remain associated with the alert.
For example, in the image below, you can see that for the HTTP connect response alert the Panama City agent triggered the global alert first at 11:25 (see Start column). While it was still active, the Copenhagen, Palermo, and Seoul agents also met the location alert conditions at 11:40, but then became inactive once their response times dropped below the location alert conditions (as seen in the Current metric column). The alert remains active until Panama City - or the last remaining active agent - no longer meets the location alert conditions.
Panama City alert
A location alert is included within a global alert when a single alert trigger meets the location alert conditions for at least one round, regardless of the thresholds set for the global alert. An alert trigger is the element that a specific test type is set to examine, and includes:
- For Cloud and Enterprise Agent tests, the alert is triggered by agents.
- For Endpoint Agent tests, the alert is triggered by visited sites or by Endpoint Agents.
- For BGP tests, the alert is triggered by BGP monitors.
- For device tests, the alert is triggered by Interfaces.
- For Internet Insights tests, the alert is triggered by affected tests or by catalog providers.
It is important to note that location alerts trigger and clear independently from the global alert. If you see multiple location alerts triggered under a global alert, you cannot assume that all the listed location alerts met the initial alert criteria on a per-round basis. They could have been added for meeting the condition for only one round. To verify which location alerts initially triggered the global alert condition, it is best to check the test data.
It is also important to note that the only location alerts displayed in the UI at the start of an active global alert are those active at the time of the trigger. This can lead to scenarios where a flapping alert trigger contributed to the global alert condition being met, but cleared before the global alert became active. For example, imagine alert criteria that state "Any 2 agents have an error 3 out of 3 rounds," and the following occurs:
- Agent A - meets condition in rounds 1, 2, 3
- Agent B - meets condition in rounds 2 and 3
- Agent C - meets condition in round 1
In the scenario listed above, 2 agents meet the criteria in 3 out of 3 rounds: agents A and C in round 1, and agents A and B in rounds 2 and 3. At the global alert trigger, only agents A and B are listed in the location alerts, since agent C cleared before the global alert triggered, even though agent C contributed to the trigger of the alert. This only happens when the alert conditions require multiple agents to meet the alert criteria for multiple rounds in a row.
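The scenario can be replayed in a few lines of Python. This is only a sketch of the reasoning in the example, not the platform's evaluation code; the data structure and function names are made up for illustration.

```python
ROUNDS = [1, 2, 3]
# Rounds in which each agent met the location alert conditions (from the example).
met = {"A": {1, 2, 3}, "B": {2, 3}, "C": {1}}

MIN_AGENTS = 2       # "Any 2 agents ..."
ROUNDS_REQUIRED = 3  # "... 3 out of 3 rounds"


def agents_meeting(round_no):
    """Agents that met the location alert conditions in the given round."""
    return sorted(a for a, rounds in met.items() if round_no in rounds)


def global_triggers():
    """True if enough rounds each had at least MIN_AGENTS agents meeting the conditions."""
    qualifying = [r for r in ROUNDS if len(agents_meeting(r)) >= MIN_AGENTS]
    return len(qualifying) >= ROUNDS_REQUIRED


# Every agent that helped satisfy the global condition in some round.
contributors = sorted({a for r in ROUNDS for a in agents_meeting(r)})
# Only agents still meeting the conditions in the triggering round are listed.
listed_at_trigger = agents_meeting(ROUNDS[-1])

print(global_triggers())    # True
print(contributors)         # ['A', 'B', 'C']
print(listed_at_trigger)    # ['A', 'B']
```

Agent C appears among the contributors but not in the list shown at trigger time, which is exactly the discrepancy described above.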
There are two primary states of an alert: active and cleared. An alert is active when the conditions set forth in the global alert rule are met. A global alert is only cleared once all the location alerts associated with it have cleared. For more information about global and location alerts, see Global and Location Alert Conditions.
For example, if the alert conditions specify that a minimum of three agents have to meet location alert conditions in order to trigger a global alert, once those three (or more) agents are in an alert state, all of the agents must no longer meet location alert criteria for the global alert to be cleared. This means that if at least one of those agents is still in the alert state, the global alert will remain active. The global alert will not show as cleared until that last agent is no longer meeting the location alert criteria.
There are times when an alert trigger becomes associated with a global alert, but then goes offline and stops being able to send data to the ThousandEyes platform. This results in the alert staying perpetually active because the alert trigger can’t provide data to clear the alert state. ThousandEyes automatically clears these alerts if the alert trigger is unable to send data for at least 12 hours. When alerts are cleared in this way, the Current Metric field in the Alerts History page automatically updates to "N/A (No Longer Valid)", and sends you a notification of the cleared alert if you have opted into alert clearing notifications.
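The clearing rules described above can be summarized in a small model. It assumes the global alert has already triggered, and the `Trigger` class and function names are hypothetical; only the "all location alerts must clear" rule and the 12-hour cutoff come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=12)  # from the text: no data for at least 12 hours


@dataclass
class Trigger:
    name: str
    meets_conditions: bool  # result of the last round that returned data
    last_data: datetime     # when the trigger last sent data


def location_active(t: Trigger, now: datetime) -> bool:
    """A location alert stays active while its data is fresh and conditions are met."""
    if now - t.last_data >= STALE_AFTER:
        return False  # cleared automatically: "N/A (No Longer Valid)"
    return t.meets_conditions


def global_active(triggers, now: datetime) -> bool:
    """The global alert clears only once every location alert has cleared."""
    return any(location_active(t, now) for t in triggers)


now = datetime(2024, 1, 1, 12, 0)
fresh = Trigger("agent-1", True, now - timedelta(hours=1))
stale = Trigger("agent-2", True, now - timedelta(hours=13))
recovered = Trigger("agent-3", False, now)

print(global_active([fresh, recovered], now))  # True: agent-1 still in alert
print(global_active([stale, recovered], now))  # False: agent-2 silent for 13 hours
```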
Proxied agents can only collect proxied metrics. They will not collect non-proxy metrics such as packet loss, latency, and jitter. Thus, any active alerts using non-proxy conditions can never be cleared by proxied agents (for more on setting up proxy agents, see the Proxy Environments section). To avoid this, we recommend separating any tests and alert rules for proxied agents from those for non-proxied agents. Tests that collect proxy metrics should only be assigned to the proxied agents and should use proxy metric-based alert rules. Tests without any proxied agents should use non-proxy metric-based alert rules.
If you run into the above situation where an alert is not clearing due to missing data or a mismatch in proxy versus non-proxy metrics, you can manually clear the alert by un-assigning the alert rule from the test and waiting for a round of data collection. This will clear the alert. You should then further refine the alert rule to match the specific criteria you need before re-assigning it to a test, based on the guidelines above.
Current and past alerts can be viewed on the Alerts List page. The Alerts List page has two tabs, with lists arranged chronologically by Start time by default (though you may sort by other columns as well):
- Active Alerts: List of alerts currently active for any test within your account group. The tab refreshes every two minutes.
- Alerts History: List of alerts no longer active from tests in your account group.
Active alert screen
On the Active Alerts tab, you will find:
Search: Type search criteria into the field marked "Filter by test name or scope…" at the top to search for matching alerts. The number of results is shown to the right of the search field. The text you enter can match Alert ID, Alert Rule Name, Alert Type, Test ID, Test Name, Test Type, or Severity. If you enter more than one search criterion, select either All or Any in the dropdown next to the search field to specify whether the results returned should match all (AND) or any (OR) of the selected criteria. You must hit Return/Enter (not Space) between search criteria to create multiple search terms. When you search on the Alert Rule Name, the results include alerts with names that fit any of the following:
- Match at least 75% of keywords in the search text.
- Contain the search text as a phrase.
- Match the search text exactly.
Note: The search field acts exactly the same in the Alerts History tab.
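One way to read the three Alert Rule Name matching rules is sketched below. The actual matching logic is not documented here, so case-insensitivity, whitespace tokenization, and the function name are all assumptions made for illustration.

```python
def rule_name_matches(search_text: str, rule_name: str) -> bool:
    """Approximate the three Alert Rule Name matching rules (assumed semantics)."""
    query = search_text.lower().strip()
    name = rule_name.lower()
    if not query:
        return False
    # Exact match, or the search text appears as a phrase in the name.
    if query == name or query in name:
        return True
    # At least 75% of the search keywords appear as words in the name.
    keywords = query.split()
    name_words = set(name.split())
    hits = sum(1 for k in keywords if k in name_words)
    return hits / len(keywords) >= 0.75


print(rule_name_matches("http connect", "HTTP connect response"))          # True (phrase)
print(rule_name_matches("default http server rule", "Default HTTP Rule"))  # True (3 of 4 keywords)
print(rule_name_matches("bgp http", "Default HTTP Rule"))                  # False (1 of 2 keywords)
```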
Alert Rule: Name of the alert rule currently active. For a quick overview of the alert criteria, click the info icon that appears next to the alert name on hover. A tooltip appears that identifies the test type, test direction, and alert condition(s) triggered.
Alert rule tooltip
For more detailed information, click anywhere in the alert rule row; a side panel will open up.
Alert side panel
Side panel: The side panel offers metadata about the alert rule along the top, including Start date and time, Scope, Impacted Tests (if applicable) and Severity.
The table underneath offers detailed information about each alert trigger within the scope of the alert (alert triggers could be, for example, agents, monitors or catalog providers affected, depending on the test type). The table shows different columns of information depending on the test type: for example, a Prefix column appears on a BGP test alert but does not appear on an alert where agents are affected. Conversely, a Server column will appear on an HTTP server test alert, but not on a BGP test alert.
Note: You can adjust the column widths to view all relevant data within the columns.
As with the Alert List, a search field at the top allows you to search the affected alert triggers. You can search by Scope, Metrics at start, and Current metric. Note: the Metrics at start displays the alert condition triggered, while the Current metric displays the alert status.
Next to each affected alert trigger is a stack icon. This is a link to the test and the test round in which the alert trigger matched the triggering criteria for the alert rule. Clicking this link opens a new tab which takes you to the relevant Views screen, test, and test round.
At the bottom left of the side panel you will find the Alert ID, which you can copy for use within other areas of the platform, such as custom webhooks and API calls. The copy icon appears on hover.
The bottom right offers selectable page view parameters, where you can choose to view up to 50 items per page, move to the next or previous page, or move to a specific page.
Side panel pagination
The Alerts History tab provides much the same information as the Active Alerts tab, presented in the same way, with some notable exceptions described below.
The Alerts History tab lists previously triggered alerts which are currently in a "cleared" or "inactive" state or are "disabled".
Search: The search field within Alerts History acts in exactly the same way as the search field in Active Alerts. See Active Alerts for information.
Date and time selector: Click the date and time field on the top right to narrow your results to a specific time frame.
Date and time selector
The selector defaults to the Fixed Time Interval view when first opened (then to the view last used thereafter). You can select the relevant dates on the calendar itself, type in new dates and times in the date and time fields at the bottom, or select a predetermined time span from the left column (including Today, Yesterday, This Week, Previous Week, This Month and Previous Month). When you’re finished selecting dates, click Apply to view the results.
Select the Relative Time Interval view to see a wider range of time spans, from the last one hour to the last 90 days. Note: the Relative Time Interval ranges always look backwards from today. By contrast, some time spans in the Fixed Time Interval view, such as Previous Week and Previous Month, are not anchored to today’s date.
Relative Time Interval view
Alert Rule: Name of the alert rule no longer active. The table includes a column for Duration that shows how long the alert was active before clearing. Clicking the Alert Rule row will open up the side panel for more information about the alert.
Side panel: The side panel works and presents information about cleared alerts in the same way as it does for active alerts. See Active Alerts for information. The only difference is that the Duration is presented at the top alongside Start, Scope, Impacted Tests and Severity.
Once you have created an alert rule it can be assigned to any test which has the Enable box checked, on the test configuration page. By default, each test has the rule "Default <test type> Rule" assigned to it, with your account's email address configured as the recipient for email notification. To add or remove rules, click the pull-down menu below the Enable box, and select or deselect rules. To create a new rule, click the Edit Alert Rules link to access the Add New Alert Rules page, and create your rule. You will then return to the test configuration page, and use the pull-down menu to assign your new rule to the test.
Each rule has a name, a series of tests against which it is enabled, a scope of locations to which the alert rule applies, Boolean criteria defining the alert conditions, and the number of locations from which the alert conditions must be met in order to trigger an alert. The rule also can include a notification mechanism, such as a list of email recipients (recipients need not be users of ThousandEyes in order to receive email notifications), a PagerDuty Service or one or more webhooks.
Each alert rule assigned to a test is evaluated independently. For tests with multiple alert rules assigned, any alert can be triggered when alert conditions are met. A test with multiple alert rules assigned to it can show zero, one, or multiple triggered alerts depending on what alert criteria were met during a single test pass.
The image below displays the rest of the configuration options of a new alert rule:
- 1. Specify the number of agents, all/any of the following alerting conditions, and the number of test rounds the conditions must be met before alerting.
- 2. Sticky Agents: Select “any of” if you want an alert sent when any set of agents meets the alert condition(s) in consecutive rounds. Select “the same” if you want an alert sent only if the same set of agents meets the alert condition(s) across multiple rounds. For example, suppose an alert rule is configured to trigger if the same agent trips a specified threshold in three consecutive rounds. The Atlanta cloud agent trips the rule in round one, the Ashburn cloud agent trips it in round two, and the San Francisco cloud agent trips it in round three. In this scenario, the alert rule would not trigger when using sticky agents. One of Atlanta, Ashburn, or San Francisco would need to trip the rule in all three consecutive rounds to trigger the alert. In addition, keep in mind that location alerts are triggered and cleared on a single-round basis, independently of the global alert. Therefore, a location alert appearing on a rule using sticky agents does not always imply that the location was part of the set of agents that met the “same agents X out of Y times” criteria, just that the agent met the alert condition(s) at least once while the global alert was active. Note: Sticky Agents are currently only available for Cloud and Enterprise Agent alerts.
- 3. Threshold: Specify the threshold value for locations (agents, monitors, or countries, depending on rule type) that must meet the alert conditions in order to trigger this alert rule. This value will be either a number of agents/monitors/countries, or a percentage of agents/monitors/countries, as specified in the next setting. NOTE: When a percentage of agents, monitors, or countries is used, and the percentage results in a non-whole number threshold value of actual agents, monitors, or countries, the fractional part of the value is significant, so the threshold is effectively rounded up. For example, when an alert rule with a threshold of 25% of all agents is applied to 13 agents, the threshold is 3.25 agents. This threshold will require 4 agents to meet the alert criteria in order to trigger the alert rule.
- 4. Threshold units: Select either agent, monitor, or country, or percentage of agents, monitors, or countries.
- 5. Rounds (met): Select the number of test rounds that the following alert condition(s) must be met out of a total number of rounds in order to trigger the alert rule. See the Rounds (total) entry below.
- 6. Rounds (total): Select the total number of test rounds in which the Rounds (met) selection is evaluated. For example, if Rounds (met) = 2 and Rounds (total) = 3 then for every three rounds, the alert rule will trigger if the condition(s) were met twice.
- 7. Metric: Select a test metric for this condition.
- 8. Operators: The following operators are available:
- >, <, ≥, ≤ : Numerical comparisons for greater than, less than, greater than or equal to, less than or equal to. Available for all numerical (decimal and integer) metrics, such as packet loss percentage (decimal) of Network Layer tests, or Error Count (integer) of a Page load test.
- is, is not: Numeric comparison for values which are not continuous ranges (e.g. HTTP status codes) or to a fixed string value, such as the Error Type (e.g. "DNS", "Connect", "SSL").
- in, not in: Numeric or string comparison to a list of values. For example, a BGP Routing rule compares a test metric's AS number (integer) to a list of one or more AS numbers to determine if the test metric is found or not found in the list. Use a wildcard * when matching against word spaces. For example, "10*aspmx3.googlemail.com."
- is empty, is not empty: Determines whether a metric has a value or has no value.
- is incomplete: Determines whether a test completed the operations for a given metric. For example, a Path Trace alert rule is used to determine whether the path trace reached its destination, or a Page Load test fully loaded a page.
- is present: Triggered when an error condition is present.
- matches, does not match: Determines whether the POSIX regular expression in the alert rule is found within the string produced by the test metric (i.e. a substring will produce a match). For example, an alert rule for the Error metric of an HTTP Server test that uses the regular expression "certificate\s*\w*:" will alert when the test's Error Details text is "SSL certificate problem: certificate has expired", because the regular expression matches the sub-string "certificate problem:".
The operators available per type of alert rule are also shown in the table below.
- 9. Threshold: The value that the Metric setting will be compared against, using the chosen operator. Note that some operators do not have a Value field.
- 10. Add/Delete: Click the + or - icon to add or delete alert criteria for this alert rule. Criteria can be nested for some types of alert rule.
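Several of the settings above lend themselves to a short worked example: the percentage threshold, the Rounds (met) / Rounds (total) window with and without Sticky Agents, and the "matches" operator. The sketch below is an interpretation of the descriptions, not the platform's implementation. The function names are invented, percentages are assumed to be whole numbers, a single window of rounds is evaluated, and Python's `re` module stands in for a POSIX regular expression engine.

```python
import re
from itertools import combinations


def required_count(threshold, total, is_percent):
    """Number of agents (or monitors) that must meet the alert conditions."""
    if not is_percent:
        return threshold
    # Integer ceiling division: any fractional result rounds up.
    return -(-threshold * total // 100)


def rule_triggers(window, needed, rounds_met, sticky):
    """window: one set per round, holding the agents that met the conditions.

    sticky=False ("any of"): a round counts when at least `needed` agents,
    whichever they are, met the conditions.
    sticky=True ("the same"): one fixed group of `needed` agents must have met
    the conditions together in at least `rounds_met` rounds.
    """
    if not sticky:
        return sum(len(r) >= needed for r in window) >= rounds_met
    agents = sorted(set().union(*window))
    return any(
        sum(set(group) <= r for r in window) >= rounds_met
        for group in combinations(agents, needed)
    )


print(required_count(25, 13, is_percent=True))  # 4 (25% of 13 is 3.25)

rotating = [{"Atlanta"}, {"Ashburn"}, {"San Francisco"}]
print(rule_triggers(rotating, 1, 3, sticky=False))  # True
print(rule_triggers(rotating, 1, 3, sticky=True))   # False

ERROR_PATTERN = re.compile(r"certificate\s*\w*:")
match = ERROR_PATTERN.search("SSL certificate problem: certificate has expired")
print(match.group(0))  # certificate problem:
```

The sticky branch checks every candidate group of agents, which is fine for illustration but grows quickly with the number of agents.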
DNS server tests differ from other ThousandEyes tests in that multiple servers can be explicitly targeted in a single test. As a result, DNS server alert rules are evaluated on a per-server basis. That is, for each server in the DNS Servers field of the test configuration, the alert conditions are evaluated separately from all other servers in the DNS Servers field. For example, consider an alert rule that has the following alert conditions:
When assigned to a DNS server test with two servers configured as the targets, each server will be evaluated separately against the above alert condition. To trigger the alert rule, at least four agents must receive an error against the same DNS server. The alert rule would not be triggered if, for example, three agents received an error when testing the first DNS server and a fourth agent received an error when testing the second DNS server.
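Per-server evaluation amounts to grouping errors by DNS server before counting agents. The following sketch illustrates this with made-up agent and server names; the four-agent minimum comes from the example above, and everything else is an assumption.

```python
from collections import defaultdict

MIN_AGENTS = 4  # from the example: at least four agents must see an error


def servers_in_alert(errors):
    """errors: (agent, dns_server) pairs that returned an error in one round."""
    agents_by_server = defaultdict(set)
    for agent, server in errors:
        agents_by_server[server].add(agent)
    # Each server is judged on its own agent count, never on the combined total.
    return sorted(s for s, agents in agents_by_server.items() if len(agents) >= MIN_AGENTS)


# Hypothetical agent and server names.
split = [("agent-1", "ns1.example.com"), ("agent-2", "ns1.example.com"),
         ("agent-3", "ns1.example.com"), ("agent-4", "ns2.example.com")]
same = [(f"agent-{i}", "ns1.example.com") for i in range(1, 5)]

print(servers_in_alert(split))  # []
print(servers_in_alert(same))   # ['ns1.example.com']
```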
A BGP alert rule can be applied to a Routing Layer BGP test, or to a different Layer type that provides the BGP Route Visualization View. It is important to note that some alert rule conditions can be applied differently depending on which type of test the rule is assigned to. For example, a BGP test has only a single target prefix which will be evaluated against the alert Conditions. If the "Covered Prefixes" box is checked, any covered prefixes found are not evaluated against the alert Conditions except the explicit "Covered Prefix" condition.
In contrast, a non-BGP test type can have one or more targets. DNS Server tests can explicitly test multiple DNS servers. An Agent to Server target's domain name can resolve to multiple server IP addresses. When creating the BGP Path Visualization, the Prefix selector will show these multiple target prefixes, and evaluate each prefix against any BGP alert rules assigned to the test. Thus, prefixes which would be considered covered prefixes under a BGP test and not evaluated by the alert rule (unless by a "Covered Prefix" condition) are evaluated when assigned to the non-BGP test. Similarly, the "Covered Prefix" condition does not have any relevance when assigned to a non-BGP test.
BGP alert rules have a parameter named "Prefix Length", which is used to determine the length of prefixes evaluated by the rule. The "Prefix Length" can be individually configured for IPv4 and IPv6 protocols.
The default BGP alert rule will fire when 10% of monitors have less than 100% reachability.
In addition to presenting the alert in the app.thousandeyes.com UI, the ThousandEyes platform can deliver notifications of alerts through a number of services. The image below displays the Notifications configuration options of a new alert rule.
- 1. Send emails to: A list of addresses to which an alert email will be sent when the alert rule is first triggered. Addressees need not be users of the ThousandEyes platform.
- 2. Edit emails: Click this link to add email addresses to the Notifications address book.
- 3. Send an email: Check this box to send an email when the alert rule is no longer active.
- 4. Add/Remove Message: Enter text to be added to the body of the alert rule's email notification. To prevent code injection, custom messages cannot contain words or phrases wrapped in angle brackets "<like this>".
- 5. Webhooks: Webhooks-enabled web services that receive the alert notification.
- 7. Integrations: Integrations that should receive the alert notification.
Note: Alerts are active as long as your alert rule criteria are met, but any configured email notification will only occur at the beginning of the alert.
The following table shows a list of test types which are available in the ThousandEyes platform, and the test metrics and operators.
- 2. These metrics are configurable under the "Any Hop", "Last Hop", or "Hop #" entries in Path Trace alert rules. Select "Any" or "All" for multiple sub-conditions.
- 3. These metrics are accessed under the "Any Component" alert condition in Page Load Tests. Select "Any" or "All" for multiple sub-conditions.
- 4. Only BGP Routing tests provide Covered Prefix data. Do not assign a BGP alert rule with a Covered Prefix metric to a non-BGP test type that has BGP Path Visualization measurements enabled. For non-BGP test types, use an alert rule that does not include the Covered Prefix metric, and if needed create a separate BGP test and a separate alert rule with the Covered Prefix metric.
Default alert rules are defined according to the following list. Within the account group, default alert rules can be applied to tests and changed by any user having a role with the View alert rules and Edit alert rules permissions, such as the built-in Account Admin or Organization Admin roles. Most default alert rules apply to Cloud and Enterprise Agent tests, though you will also find default alert rules for Endpoint Agent tests, BGP tests and Internet Insights outages, each found on their corresponding tab within Alert Rules, and indicated in the table below by a menu path.
Default alert rules have the following common characteristics:
- Severity is set to Minor.