Creating and Editing Alert Rules
Last updated
You can configure alert rules for different conditions, covering both synthetic tests and alert sources that don't rely on tests, such as WAN Insights. This article walks you through the common parts of alert rule configuration, plus the parts that are unique to each kind of test.
As for common characteristics, each alert rule has:
A name.
A series of tests against which it is enabled.
A scope of alert triggers (such as agents or monitors) to which the alert rule applies (with the exception of Endpoint Agent scheduled tests).
Criteria defining the alert conditions.
The number of alert triggers that the alert conditions must meet in order to activate an alert.
Alert rules also include a notification mechanism, such as a list of email recipients (recipients do not need to be users of ThousandEyes in order to receive email notifications), a PagerDuty service or one or more webhooks.
Each alert rule assigned to a test is evaluated independently. For tests with multiple alert rules assigned, any alert can be triggered when alert conditions are met. A test with multiple alert rules assigned to it can show zero, one, or multiple triggered alerts depending on what alert criteria were met during a single test pass.
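The independent evaluation described above can be pictured with a minimal Python sketch. The `AlertRule` class, its fields, and the sample metric names are hypothetical illustrations and are not part of the ThousandEyes product or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """Hypothetical stand-in for an alert rule assigned to a test."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the criteria are met


def triggered_rules(rules: List[AlertRule], result: Dict[str, float]) -> List[str]:
    """Evaluate every rule on its own; one rule's outcome never affects another."""
    return [rule.name for rule in rules if rule.condition(result)]


rules = [
    AlertRule("High packet loss", lambda r: r["loss_pct"] >= 5),
    AlertRule("High latency", lambda r: r["latency_ms"] >= 200),
]

# A single test pass can trigger zero, one, or several of the assigned rules.
print(triggered_rules(rules, {"loss_pct": 0, "latency_ms": 50}))
print(triggered_rules(rules, {"loss_pct": 10, "latency_ms": 50}))
print(triggered_rules(rules, {"loss_pct": 10, "latency_ms": 350}))
```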
To create a new alert rule, click Alerts > Alert Rules. The Alert Rules page opens.
From the tabs at the top of the page, select the desired alert source:
Cloud and Enterprise Agents
Endpoint Agents
BGP Routing
Devices
Internet Insights
Then click Add New Alert Rule. The Add New Alert Rule panel opens. The image below shows the panel that opens for Cloud and Enterprise Agents.
Every Add New Alert Rule panel, whatever the alert source, opens with three sections. In the top section, you choose the type of alert you wish to configure and give it a name. The middle and bottom sections make up the Settings tab, where you specify the alert triggers (middle section) and alert conditions (bottom section). For information about the Notifications tab, see Alert Notifications.
In the top section of the panel for each new alert, you will find:
Alert Type: Select the test layer for this alert rule.
Compatible Test Types (for Cloud, Enterprise, and Endpoint Agents only): As you select the test layer in the Alert Type field, the dropdown field to the right displays the test types to which this alert rule can be assigned.
Rule Name: Specify a name for the alert rule.
The middle and bottom sections of the panel consist of the Settings tab. The middle section is where you configure your alert triggers (such as agents, monitors, or catalog providers). The fields in this section vary depending on the alert source and type, set out below.
Cloud and Enterprise Agents
Direction (only for Network: Agent to Agent and Network: Path Trace tests): Enables you to choose whether the alert triggers in the Source-to-Target, Target-to-Source, or Both (Agent to Agent) or Either (Path Trace) direction.
Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.
Agents: Select the agents to which you will assign this alert rule. The options are:
All agents: All agents will be assigned this alert rule.
All agents except: All agents will be assigned this alert rule except for the ones selected.
Specific agents: Only the selected agents will be assigned to this alert rule.
Note: Selecting All agents except or Specific agents opens another dropdown menu where you can select the agents you do or don't want to alert on.
Severity: Choose from Info, Minor, Major, and Critical.
Endpoint Agents
Real User Tests
Agents: Select the agents to which you will assign this alert rule. The options are:
All agents: All Endpoint Agents belonging to the account group will be assigned this alert rule.
Specific agents: Only the selected Endpoint Agents will be assigned to this alert rule.
Agent labels: Only the Endpoint Agents with the specified label will be assigned to this alert rule.
Note: Selecting Specific agents or Agent labels opens another dropdown menu where you can select the agents or labels you want to alert on.
Visited Sites: Select the sites for which this alert will be triggered. The options are:
Any visited site: Any site within the monitored domain set that a user visits will be assigned to this alert rule.
Specific visited sites: Only the selected visited sites will be assigned to this alert rule. If you select this option, a dropdown menu appears where you can select from a number of suggested domains or type in a custom domain.
Severity: Choose from Info, Minor, Major, and Critical.
Scheduled Tests
Tests: A dropdown menu listing all the compatible Endpoint tests set up in your account group. Select one or more tests to assign them to this alert rule.
Severity: Choose from Info, Minor, Major, and Critical.
BGP Routing
Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.
Prefix Length: A dropdown menu allowing you to specify the prefix length for both IPv4 and IPv6. The length defaults to 16-32 for IPv4 and 32-128 for IPv6.
Monitors: Select the monitors to which you will assign this alert rule. The options are:
All monitors: All monitors will be assigned this alert rule.
All monitors except: All monitors will be assigned this alert rule except for the ones selected.
Specific monitors: Only the selected monitors will be assigned to this alert rule.
Note: Selecting All monitors except or Specific monitors will open another dropdown menu where you can select the monitors you do or don't want to alert on.
Severity: Choose from Info, Minor, Major, and Critical.
Devices
Devices (for the Device alert type only): A dropdown menu listing all the monitored devices set up in your account group. Select one or more devices to assign them to this alert rule.
Interfaces (for the Interface alert type only): A dropdown menu listing all the monitored interfaces set up in your account group. Select one or more interfaces to assign them to this alert rule.
Internet Insights
Affected Tests: Select the affected tests to which you will assign this alert rule. The options are:
Any: Any affected tests will be assigned this alert rule.
Specific: Only the selected affected tests will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the affected tests you want to alert on.
Catalog Providers: Select the catalog providers to which you will assign this alert rule. The options are:
Any: Any catalog providers will be assigned this alert rule.
Specific: Only the selected catalog providers will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the catalog providers you want to alert on.
Severity: Choose from Info, Minor, Major, and Critical.
As with the alert triggers, the alert conditions vary depending on alert type, but also test type. First, we'll explain how to apply alert conditions to alert triggers (the first line under Alert Conditions in the image below); these are called global alert conditions. Then we'll explain how to set the alert conditions themselves (those items with a "-/+" next to them in the image below); these are called location alert conditions. For more information about global and location alert conditions see Global and Location Alert Conditions.
The global alert condition is where you specify how your location alert conditions will be applied to your alert triggers, including how many location alert conditions, the number or percent of alert triggers, and how many test rounds must be met before alerting. Except for Internet Insights, all alert rules have similar options for configuring global alert conditions (Internet Insights is automatically configured to have any or all conditions met by the outage network within 5 minutes). The following list will explain how to configure each configurable field as you read the global condition from left to right.
All/Any: Select All when all the specified location alert conditions must be met (AND) or select Any when any one of the specified location alert conditions must be met (OR).
When only one location alert condition is specified, the system defaults to "All" conditions. You must add at least one other location alert condition to see the dropdown options.
any of/the same (for Cloud and Enterprise Agents and BGP Routing only): Select any of if you want an alert activated when any set of alert triggers (agents or monitors) meets the alert condition(s) in consecutive rounds. Select the same if you want an alert activated only if the same set of alert triggers meets the alert condition(s) in consecutive rounds. Selecting the same is known as using "sticky triggers".
Sticky triggers: For example, an alert rule is configured for the same agent to trip a specified threshold in three consecutive rounds. If the Atlanta Cloud Agent trips the rule in round one, the Ashburn Cloud Agent trips it in round two, and the San Francisco Cloud Agent trips it in round three, an alert would not be activated. Either Atlanta, Ashburn, or San Francisco would need to trip the rule in three consecutive rounds to activate an alert.
Threshold value (not applicable to Devices): Specify the threshold value for alert triggers that must meet the alert conditions in order to trigger this alert rule. This value will be either a number of alert triggers or a percentage of alert triggers, as specified in the next setting.
Note: When a percentage of alert triggers is used, and the percentage results in a non-whole number threshold value of actual alert triggers, the fractional part of the value is significant. For example, when an alert rule with a threshold of 25% of all agents is applied to 13 agents, the threshold is 3.25 agents. This threshold will require 4 agents to meet the alert criteria in order to activate the alert rule.
Threshold units: Select either the alert trigger, or the percentage thereof. The options for each alert rule are:
Cloud and Enterprise Agents (all test types): agent(s) or % of agents.
Endpoint Agent Real User Tests - Application test type: agent(s) or % of agents.
Endpoint Agent Real User Tests - Endpoint test type: visited site(s) or % of visited sites.
For Endpoint Scheduled Tests (all test types), you select a threshold value for both number and percentage of Endpoint Agents.
BGP Routing: monitor(s) or % of monitors.
For Devices, the threshold unit is part of the location alert condition, where the options are: Any interface or Any interface matching.
Rounds (met): Select the number of test rounds that the subsequent location alert condition(s) must meet out of a total number of rounds in order to activate the alert rule. See also the Rounds (total) entry below.
Rounds (total): Select the total number of test rounds against which the Rounds (met) selection is evaluated. For example, if Rounds (met) = 2 and Rounds (total) = 3, then for every three rounds, the alert rule will activate if the condition(s) were met twice.
Time interval (for BGP Routing only): Select the time interval, in minutes, that the alert triggers on. For example, if 1 monitor must meet the location condition (e.g. reachability is less than 80%) to trigger an alert, and the time interval is set to 3, the alert triggers when 1 monitor's reachability is less than 80% for 3 consecutive minutes. If you change the time interval to 5, reachability must be less than 80% on any 1 monitor for 5 consecutive minutes before the alert triggers. Time intervals are available in 1-minute increments, starting with 1 minute up to a maximum of 180 minutes.
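The threshold, rounds, and sticky-trigger settings above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: the function names are hypothetical, each round is modeled as the set of alert triggers that met the location alert conditions, and "the same" is interpreted as requiring each counted trigger to meet the conditions in the required number of rounds.

```python
import math
from typing import List, Set


def required_triggers(threshold: int, unit: str, total_triggers: int) -> int:
    """Convert a threshold value and unit into a whole number of alert triggers.

    A percentage that yields a fractional count is rounded up, so 25% of
    13 agents (3.25) requires 4 agents.
    """
    if unit == "percent":
        return math.ceil(threshold * total_triggers / 100)
    return threshold


def rule_active(history: List[Set[str]], required: int,
                rounds_met: int, rounds_total: int, sticky: bool) -> bool:
    """Check the most recent rounds_total rounds against the global condition."""
    window = history[-rounds_total:]
    if sticky:
        # "the same": count only triggers that individually met the
        # conditions in at least rounds_met rounds of the window.
        seen = set().union(*window)
        repeated = [t for t in seen
                    if sum(t in rnd for rnd in window) >= rounds_met]
        return len(repeated) >= required
    # "any of": any set of triggers counts, round by round.
    return sum(len(rnd) >= required for rnd in window) >= rounds_met


print(required_triggers(25, "percent", 13))  # 4

# The sticky-trigger example: a different agent trips the rule in each round.
history = [{"Atlanta"}, {"Ashburn"}, {"San Francisco"}]
print(rule_active(history, 1, 3, 3, sticky=False))  # True
print(rule_active(history, 1, 3, 3, sticky=True))   # False
```

Integer arithmetic is used for the percentage so that a value such as 10% of 30 agents evaluates to exactly 3 rather than being pushed up to 4 by floating-point error.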
Location alert conditions are where you set the specific metrics on which an alert becomes active. You can set any number of metrics for an alert, though bear in mind that when All conditions must be met, each additional metric makes it less likely that the alert will activate. Location alert conditions are configured by choosing at least one metric (the test characteristic against which you're measuring change) and one operator (the type of measure). Depending on the metric, other configurable options include threshold values and units. Reading left to right, location alert conditions include the following configurable fields:
Metric: Select a test metric for this condition.
Operators: Select an operator for this condition. There are many operators to choose from, some of which are self-explanatory. Below is a selection with more explanation. For a full list of metrics, operators, and units, see the table under Available Metrics, Operators, and Units.
>, <, ≥, ≤ : Numerical comparisons for greater than, less than, greater than or equal to, and less than or equal to. Available for all numerical (integer only) measures, such as packet loss percentage on network layer tests, or error count on page load tests.
is, is not: Non-numeric comparison for values that are not continuous ranges (e.g., HTTP response codes) or that are a fixed string value, such as the error type (e.g., "DNS", "Connect", "SSL"). Also, when suffixed with "empty", determines whether a metric has a value or has no value.
in, not in: Numeric or string comparison to a list of values. For example, a BGP routing rule compares a test metric's AS number (integer) to a list of one or more AS numbers to determine if the test metric is found or not found in the list. Use a wildcard * when matching against word spaces. For example, "10 * aspmx3.googlemail.com."
is incomplete: Determines whether a test completed the operations for a given metric. For example, this operator can be used to determine whether a path trace reached its destination, or whether a page load test fully loaded a page.
is present: Used when an error condition is present.
matches, does not match: Determines whether the POSIX regular expression in the alert rule is found within the string produced by the test metric (i.e., a substring will produce a match). For example, an alert rule for the Error metric of an HTTP server test with a matches condition of "certificate\s*\w*:" will alert when the test's Error Details text is "SSL certificate problem: certificate has expired", because the regular expression matches the substring "certificate problem:". The operators available per type of alert rule are also shown in the table below.
Threshold: The value that the metric setting will be compared against, using the chosen operator. Note that some operators do not have a value field.
Unit: Often, the unit is fixed once an operator is chosen, such as threshold value, %, ms, or kbps, but sometimes you can choose the unit, such as for dynamic baselines or for device interface thresholds.
Add/Delete: Click the + or - icon to add or delete location alert criteria to this alert rule. Criteria can be nested for some types of alert rule.
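The substring behavior of the matches and does not match operators can be tried out locally. The sketch below uses Python's `re` module as a stand-in; its syntax is not identical to POSIX regular expressions, and the `condition_met` helper is a hypothetical name used only for illustration.

```python
import re


def condition_met(pattern: str, text: str, operator: str = "matches") -> bool:
    """Return True when the operator holds for the given metric text.

    re.search looks for the pattern anywhere in the text, so a substring
    is enough to produce a match.
    """
    found = re.search(pattern, text) is not None
    return found if operator == "matches" else not found


pattern = r"certificate\s*\w*:"
error_details = "SSL certificate problem: certificate has expired"

match = re.search(pattern, error_details)
print(match.group(0))                                            # certificate problem:
print(condition_met(pattern, error_details))                     # True
print(condition_met(pattern, error_details, "does not match"))   # False
```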
The following table lists the test types available in the ThousandEyes platform, along with the metrics, operators, and units available for each.
Test Layer | Alert Type | Metric | Operators | Units |
---|---|---|---|---|
Network | End-to-End (Server), End-to-End (Agent) | Packet loss | ≤, ≥ | % |
Network | End-to-End (Server), End-to-End (Agent) | Latency¹ | ≤, ≥ | ms |
Network | End-to-End (Server), End-to-End (Agent) | Jitter | ≤, ≥ | ms |
Network | End-to-End (Server), End-to-End (Agent) | Error | is present, matches, does not match | n/a |
Network | End-to-End (Agent) | Throughput | ≤, ≥ | Kbps |
Network | End-to-End (Server) | Available Bandwidth | ≤,≥ | Mbps |
Network | End-to-End (Server) | Capacity | ≤, ≥ | Mbps |
Network | End-to-End (Server) | Probe Response Type | is | TCP RST |
Network | Path Trace | Delay | ≤, ≥ | ms |
Network | Path Trace | IP Address² | in, not in | IP address or prefix |
Network | Path Trace | ASN² | in, not in | List of ASNs |
Network | Path Trace | rDNS² | in, not in | exact hostname or wildcard-based match to domain |
Network | Path Trace | MPLS Label² | is, is not | empty |
Network | Path Trace | DSCP² | is, is not | DSCP value selected from list |
Network | Path Trace | Server IP | in, not in | IP address, prefix |
Network | Path Trace | Server MSS | <, > | bytes |
Network | Path Trace | Path MTU | <, > | bytes |
Network | Path Trace | Path Length | <, > | hops |
Network | Path Trace | Trace | is incomplete | n/a |
DNS | Server, Trace, DNSSEC | Error | is present, matches, does not match | n/a |
DNS | Server | Resolution time | ≤, ≥ | ms |
DNS | Server, Trace | Mapping | in, not in | quoted <comma-separated list of mappings>; use * when matching against word spaces, for example "10*aspmx3.googlemail.com." |
Web | HTTP Server | Response code | is | any error (≥ http/400 or no response), ok (http/200), redirect (http/300 |
Web | HTTP Server | Response Header | matches, does not match | |
Web | HTTP Server | DNS time | ≤, ≥ | ms |
Web | HTTP Server | Connect time | ≤, ≥ | ms |
Web | HTTP Server | SSL negotiation time | ≤, ≥ | ms |
Web | HTTP Server | Wait time | ≤, ≥ | ms |
Web | HTTP Server | Receive time | ≤, ≥ | ms |
Web | HTTP Server | Response time¹ | ≤, ≥ | ms |
Web | HTTP Server | Total Fetch Time | ≤, ≥ | ms |
Web | HTTP Server | Throughput | ≤, ≥ | kBps |
Web | HTTP Server | Error | is present, matches, does not match | n/a |
Web | HTTP Server | Error type | is, is not | DNS, Connect, SSL, Send, Receive, Content, HTTP, Any |
Web | HTTP Server | Client SSL Alert Code | is, is not | SSL error type. E.g., Unexpected message (10), Bad Certificate (42) |
Web | HTTP Server | Server SSL Alert Code | is, is not | SSL error type. E.g., Unexpected message (10), Bad Certificate (42) |
Web | Page Load | Page load | is incomplete | n/a |
Web | Page Load | Response time | ≤, ≥ | ms |
Web | Page Load | DOM load time | ≤, ≥ | ms |
Web | Page Load | Page load time¹ | ≤, ≥ | ms |
Web | Page Load | Error Count | ≤, ≥ | # |
Web | Page Load | Domain Name³ | in, not in | quoted <comma-separated list of mappings> |
Web | Page Load | Total Fetch Time³ | ≤, ≥ | ms |
Web | Page Load | Blocked Time³ | ≤, ≥ | ms |
Web | Page Load | DNS Time³ | ≤, ≥ | ms |
Web | Page Load | Connect Time³ | ≤, ≥ | ms |
Web | Page Load | Send Time³ | ≤, ≥ | ms |
Web | Page Load | Wait Time³ | ≤, ≥ | ms |
Web | Page Load | Receive Time³ | ≤, ≥ | ms |
Web | Page Load | SSL Negotiation Time³ | ≤, ≥ | ms |
Web | Page Load | Component Load³ | is incomplete | n/a |
Web | Transaction (Classic) | Error | is present | n/a |
Web | Transaction (Classic) | Transaction Time | ≤, ≥ | ms |
Web | Transaction (Classic) | Completion | ≤, ≥ | % |
Web | Transaction (Classic) | Steps Completed | ≤, ≥, is | # |
Web | Transaction (Classic) | Any Steps meets | any, all | of the following conditions: Step Duration |
Web | Transaction (Classic) | Step # meets | any, all | of the following conditions: Step Duration |
Web | Transaction (Classic) | Any Page meets | any, all | of the following conditions: Page Duration |
Web | Transaction (Classic) | Page # meets | any, all | of the following conditions: Step Duration |
Web | Transaction | Page | | POSIX Extended Regular Expression Syntax, positive integer |
Web | Transaction | Page/Any Page > Page Load Time | ≤, ≥ | ms |
Web | Transaction | Page/Any Page > Page Load Error | | |
Web | Transaction | Page/Any Page > Response Time | ≤, ≥ | ms |
Web | Transaction | Page/Any Page > DOM Load Time | ≤, ≥ | ms |
Web | Transaction | Marker (name) | exact textual matching, case-sensitive | n/a |
Web | Transaction | Marker (presence) | | n/a |
Web | Transaction | Marker (duration) | ≤, ≥ | ms |
Web | Transaction | Assert Error | | |
Web | Transaction | Transaction Time | ≤, ≥ | ms |
Web | Transaction | Transaction Completion | | n/a |
Web | Transaction | Error | | |
Routing | BGP | Reachability | <, > | % |
Routing | BGP | Path Changes | <, > | n/a |
Routing | BGP | Origin ASN | in, not in | comma-separated list of ASNs. |
Routing | BGP | Next Hop ASN | in, not in | comma-separated list of ASNs. |
Routing | BGP | Prefix | in, not in | comma-separated list of covered prefixes |
Routing | BGP | Covered Prefix⁴ | exists, in, not in | comma-separated list of sub-prefixes |
Routing | BGP | RPKI Status | is | Valid, Invalid, NotFound |
Voice | RTP Stream | Error | is present, matches, does not match | n/a |
Voice | RTP Stream | MOS | ≤, ≥ | # |
Voice | RTP Stream | Packet loss | ≤, ≥ | % |
Voice | RTP Stream | Discards | ≤, ≥ | % |
Voice | RTP Stream | DSCP | is, is not | DSCP Values. E.g., Best Effort (0), Expedited Forwarding (46) |
Voice | RTP Stream | Latency | ≤, ≥ | ms |
Voice | RTP Stream | Packet Delay Variation | ≤, ≥ | ms |
Device | Device | Interface name | matches, doesn't match | |
Device | Device | Interface type | n/a | |
Device | Device | Exclude interfaces | n/a | |
Device | Device | IP address | matches | IP address, range, or prefix |
Device | Device | Throughput | either, in, out; ≥, >, ≤, < | Mbps, % |
Device | Device | Discards | either, in, out; ≥, >, ≤, < | pps, % |
Device | Device | Errors | either, in, out; ≥, >, ≤, < | pps, % |
Device | Device | Discards and Errors | either, in, out; ≥, >, ≤, < | pps, % |
Device | Device | Operational Status | offline, online | |
Device | Device | Admin Status | Disabled, Enabled | |
Device | Device | State | Unchanged, Changed | |
Web | API | API transaction time (Dynamic-New) | n/a | Low/Medium/High Sensitivity |
Web | API | API transaction time (Static) | ≥ | ms |
Web | API | API transaction time (Dynamic-classic) | ≥ | Std. deviations, ms, % |
Web | API | API completion | ≤ , ≥ | % |
Web | API | Step: API call time | ≥, ≤ | ms |
Web | API | Step: step completion | completed, not completed | n/a |
Web | API | Step: response time | Auto, ≥, ≤ | ms |
Web | API | Step: receive time | ≥, ≤ | ms |
Web | API | Step: assert error | is present | n/a |
¹ For some metrics, dynamic baselines can be configured. For more information, see Dynamic Baselines.
² These metrics are configurable under the "Any Hop", "Last Hop", or "Hop #" entries in path trace alert rules. Select "Any" or "All" for multiple sub-conditions.
³ These metrics are accessed under the "Any Component" alert condition in page load tests. Select "Any" or "All" for multiple sub-conditions.
⁴ Only BGP routing tests provide covered prefix data. Do not assign a BGP alert rule with a covered prefix metric to a non-BGP test type that has BGP path visualization measurements enabled. For non-BGP test types, use an alert rule that does not include the covered prefix metric and, if needed, create a separate BGP test and a separate alert rule with the covered prefix metric.
For Cloud and Enterprise Agent tests, each metric from the table above is defined in the article ThousandMetrics: What Do Your Results Mean? For Endpoint Agent tests, metrics are defined at Data Collected by the Endpoint Agent. For device tests, metrics are defined at Device Discovery Results.
Editing an alert rule follows the same configuration steps set out above for adding a new alert rule. The only difference is that to edit an alert rule, you click an existing alert rule (instead of clicking the Add New Alert Rule button). A panel appears with the current alert rule configuration; you can then change any of the field settings to your desired configuration.
When you edit an alert rule that has a currently active alert, any change to the alert rule's conditions will cause the currently active alert to clear. A new alert will be triggered after the ThousandEyes platform takes the updated alert rule into account.
In the editing pane of an alert rule, you also have the option to delete the alert rule or duplicate it. Duplicating an alert rule is an easy way to configure a new alert rule where you only want to change one or two parameters; for example, if you want to alert on the existence of an error separately from resolution time in a DNS server alert rule. You can duplicate the alert rule specifying the error condition and just change the condition to resolution time without having to configure the entire rule again from scratch.
You will find the delete and duplicate symbols (trash bin and two overlapping pages) in the bottom left of your editing pane. Tooltips appear on hover (see image below). When you click the trash bin, you are prompted to confirm you wish to delete the alert rule. When you click the overlapping pages, a fresh Add New Alert Rule pane opens with the same configuration as the current alert rule.
If an alert is throwing notifications that exceed your operational requirements, you can adjust the alert condition thresholds.
Go to Alerts > Alert Rules.
Select the name of the alert rule that you want to adjust.
On the Settings tab, in the Alert Conditions section, review the current thresholds.
Make changes to these settings to reduce the frequency of alerts, according to your requirements.
For the best way to reduce noise, try using dynamic baselines in your alert configuration instead of static thresholds. To learn more about dynamic baselines, see Dynamic Baselines.
After you adjust a noisy alert to meet your service-level expectations, the alert should begin to clear. An active alert that clears is moved to the Alert History tab. To view cleared alerts, go to Alerts > Alert List > Alerts History.
DNS server tests differ from other ThousandEyes tests in that multiple servers can be explicitly targeted in a single test. As a result, DNS server alert rules are evaluated on a per-server basis. That is, for each server in the DNS Servers field of the test configuration, the alert conditions are evaluated separately from all other servers in the DNS Servers field. For example, consider an alert rule whose alert conditions require at least four agents to receive an error.
When assigned to a DNS server test with two servers configured as the targets, each server will be evaluated independently against the above alert condition. To activate the alert rule, at least four agents must receive an error against the same DNS server. The alert rule would not be triggered if, for example, three agents received an error when testing the first DNS server and a fourth agent received an error when testing the second DNS server.
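The per-server evaluation can be sketched as follows. This is a simplified illustration: the `servers_in_alert` function, the agent names, and the server hostnames are hypothetical, and the sketch looks at a single round only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def servers_in_alert(errors: Iterable[Tuple[str, str]],
                     min_agents: int = 4) -> List[str]:
    """Return the DNS servers for which enough agents received an error.

    errors holds (agent, dns_server) pairs. Agents are counted separately
    for each server, never pooled across servers.
    """
    agents_by_server: Dict[str, Set[str]] = defaultdict(set)
    for agent, server in errors:
        agents_by_server[server].add(agent)
    return sorted(server for server, agents in agents_by_server.items()
                  if len(agents) >= min_agents)


# Three agents fail against the first server and one against the second:
# four errors in total, but no single server reaches the threshold.
split_errors = [
    ("agent-1", "ns1.example.com"),
    ("agent-2", "ns1.example.com"),
    ("agent-3", "ns1.example.com"),
    ("agent-4", "ns2.example.com"),
]
print(servers_in_alert(split_errors))  # []

# Four agents fail against the same server, so the rule activates for it.
same_server_errors = split_errors[:3] + [("agent-4", "ns1.example.com")]
print(servers_in_alert(same_server_errors))  # ['ns1.example.com']
```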
BGP alert rules can be applied to Routing tests that explicitly monitor BGP as well as any test that has the Collect BGP data option enabled. Alert rule conditions can be applied differently depending on which type of test the rule is assigned to.
The default BGP alert rule will activate when 10% of monitors have less than 100% Reachability for at least 1 minute. You can use the time selection range to customize your alert configuration.
BGP alert rules have a parameter named Prefix Length, which is used to determine the length of prefixes evaluated by the rule. The Prefix Length can be individually configured for IPv4 and IPv6 protocols.
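As a rough illustration of how a Prefix Length setting narrows the prefixes a rule evaluates, the sketch below filters prefixes with Python's standard `ipaddress` module. The default ranges are the ones stated earlier in this article (16-32 for IPv4, 32-128 for IPv6); the function name and the assumption that both bounds are inclusive are hypothetical.

```python
import ipaddress
from typing import Dict, Tuple

# Default ranges from this article, keyed by IP version.
DEFAULT_RANGES: Dict[int, Tuple[int, int]] = {4: (16, 32), 6: (32, 128)}


def in_prefix_length_range(prefix: str,
                           ranges: Dict[int, Tuple[int, int]] = DEFAULT_RANGES) -> bool:
    """Return True when the prefix length falls inside the range for its IP version."""
    network = ipaddress.ip_network(prefix, strict=False)
    low, high = ranges[network.version]
    return low <= network.prefixlen <= high  # bounds assumed inclusive


for prefix in ["10.0.0.0/8", "192.0.2.0/24", "2001:db8::/32", "2001:db8::/29"]:
    print(prefix, in_prefix_length_range(prefix))
```

With the default ranges, the /8 and /29 prefixes fall outside the evaluated lengths, while the /24 and /32 prefixes fall inside them.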
For example, a BGP test has only a single target prefix that will be evaluated against the alert conditions. If the Covered Prefixes box is checked, any covered prefixes found are evaluated only against the explicit Covered Prefix condition, not against the other alert conditions.
In contrast, a non-BGP test type can have one or more targets. DNS server tests can explicitly test multiple DNS servers. An agent-to-server test target's domain name can resolve to multiple servers' IP addresses. When creating the BGP path visualization, the Prefix selector shows these multiple target prefixes, and evaluates each prefix against any BGP alert rules assigned to the test. Thus, prefixes that would be considered covered prefixes under a BGP test and not evaluated by the alert rule (unless by a Covered Prefix condition) are evaluated when assigned to the non-BGP test. Similarly, the Covered Prefix condition does not have any relevance when assigned to a non-BGP test.