Creating and Editing Alert Rules

You can configure alert rules for a range of features, including synthetic tests as well as alerts that don’t rely on tests, such as our Insights products (Cloud, Traffic, Internet, WAN) or event detection. An alert rule is made up of three parts: the part that defines and names what the alert applies to, such as a test or an event; the part that defines what is considered an anomaly for that test or event; and the part that defines whether that anomaly constitutes an alertable issue. The latter two parts are called the location and global alert conditions, respectively. To learn more about them, including how to set them manually, see Global and Location Alert Conditions. This article explains how to create an alert rule using the default intelligent algorithmic settings, which we call quantile dynamic baselining (location condition) and adaptive alerting (global condition).

Not all alert rules currently use quantile dynamic baselining (QDB) or adaptive alerting. Only select metrics within Network & App Synthetics and Endpoint Experience alert rules offer QDB; you can find the list of supported metrics at Metrics for Dynamic Baselines. Adaptive alerting is currently available for Network & App Synthetics alert rules only.

Quantile Dynamic Baselining Overview

There are many different types of location alert condition, depending on the metric you wish to alert on, but the default location alert condition for many metrics is called quantile dynamic baselining.

What Is Quantile Dynamic Baselining?

Quantile dynamic baselining (QDB) is a method for automatically adjusting the thresholds of your alert rule metrics in real time, so that what the alert “sees” as an anomaly takes into account your real-world network environment. Not all metrics support dynamic baselining; see Metrics for Dynamic Baselines for the list of metrics that do.

What Are Quantiles?

Quantiles are data ranges within a dataset that contain a certain percentage of the data points, such as the bottom 10% or top 50%. Quantiles differ from averages because they depend on how the data points rank relative to one another rather than on how large or small individual values are; they are only concerned with how many data points fall in any given slice of the dataset. Slicing the data this way makes it easier to take more of the data into account than a simple average does, while also giving a better sense of how many data points sit outside a central “norm”, irrespective of their values.
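The difference between an average and a quantile can be seen with a few lines of Python. This is a minimal sketch using only the standard library; the latency values are made up for illustration and do not come from any real test.

```python
import statistics

# Hypothetical latency samples in milliseconds, with one extreme value at the end.
latencies_ms = [20, 21, 22, 22, 23, 24, 25, 26, 27, 500]

# The mean is pulled upward by the single 500 ms sample.
mean_ms = statistics.mean(latencies_ms)

# The median (the 50% quantile) only depends on which value sits in the middle.
median_ms = statistics.median(latencies_ms)

# quantiles(n=4) returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")

print(f"mean={mean_ms} median={median_ms} quartiles=({q1}, {q2}, {q3})")
```

Here the mean lands far above every sample but one, while the quartile cut points stay among the typical values.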

Why Use Quantile Dynamic Baselining?

  1. Made for real-world scenarios: Unlike standard deviation, which is optimized for idealized bell-curve data distributions, QDB does not make assumptions about the underlying data distribution. This makes it more adaptable and effective for detecting anomalies in real-world datasets with varying, skewed, and changing shapes.

  2. Enhanced accuracy: QDB is more robust against extreme values than the mean or standard deviation, meaning outliers can't significantly skew the baseline. This leads to more accurate baselines and alerts.

  3. Evolves with your environment: As your test environment evolves, QDB is responsive enough to adjust the baselines accordingly. This flexibility means it can accommodate gradual shifts in performance, ensuring that alerts are triggered only when there are meaningful deviations from what is considered normal at any given time.
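The robustness point in the list above can be illustrated by comparing two ways of deriving an upper threshold from the same samples. This is a sketch under stated assumptions: the quantile-based rule shown here (upper quartile plus 1.5 times the interquartile range) is one common textbook choice, not necessarily the rule QDB uses, and the sample values are invented.

```python
import statistics

def mean_plus_3_sigma(samples):
    # Classic threshold that assumes roughly bell-shaped data.
    return statistics.mean(samples) + 3 * statistics.pstdev(samples)

def upper_quartile_fence(samples):
    # Quantile-based threshold: Q3 + 1.5 * IQR (an illustrative choice).
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

clean = [20, 21, 22, 22, 23, 24, 25, 26, 27, 28]
with_outlier = clean[:-1] + [500]  # replace one sample with an extreme value

sigma_shift = mean_plus_3_sigma(with_outlier) - mean_plus_3_sigma(clean)
fence_shift = upper_quartile_fence(with_outlier) - upper_quartile_fence(clean)

print(f"mean+3sigma moved by {sigma_shift:.1f} ms, quantile fence moved by {fence_shift:.1f} ms")
```

A single extreme sample moves the standard-deviation threshold by hundreds of milliseconds, while the quantile-based threshold does not move at all for this data.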

How Quantile Dynamic Baselining Works

QDB monitors your test results over time. It constantly looks at all your recent data points and determines which results are considered “outliers”: data points too far above or below a subset of the data to be deemed “normal” test behavior. Once QDB has established what is “normal” at the current point in time, it sets the threshold beyond which test behavior isn’t considered normal, or what we call an anomaly. QDB revises its definition of “normal” and resets its thresholds constantly, dynamically adjusting for network conditions and test irregularities, so what is considered an alertable anomaly is always closer to the truth, rather than mere noise.

Importantly, QDB takes the shape of your data into account. If your test behavior is relatively flat, a small fluctuation could be considered an anomaly, while in wide-ranging datasets, a data point would have to deviate much further from the norm to be considered an anomaly. What’s more, since QDB is constantly learning and adjusting to your test and network behavior, if the shape of it slowly changes from flat to a Richter 9.0 earthquake, the baseline, or threshold, adapts accordingly.
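The idea that the same value can be an anomaly in flat data but normal in wide-ranging data can be sketched with a rolling window. The class below is a simplified illustration, not the QDB implementation: the class name, the window size, the multiplier `k`, and the "Q3 + k * IQR" rule are all assumptions made for this example.

```python
from collections import deque
import statistics

class RollingQuantileBaseline:
    """Illustrative only: flags a value when it exceeds Q3 + k * IQR
    of the most recent `window` samples."""

    def __init__(self, window=20, k=3.0):
        self.samples = deque(maxlen=window)  # old samples drop out automatically
        self.k = k

    def threshold(self):
        if len(self.samples) < 4:
            return None  # not enough history to form a baseline yet
        q1, _, q3 = statistics.quantiles(self.samples, n=4, method="inclusive")
        return q3 + self.k * (q3 - q1)

    def observe(self, value):
        # Compare against the baseline built from earlier samples, then learn the value.
        limit = self.threshold()
        is_anomaly = limit is not None and value > limit
        self.samples.append(value)
        return is_anomaly

flat = RollingQuantileBaseline()
for v in [50, 51] * 10:           # very stable test results
    flat.observe(v)

wide = RollingQuantileBaseline()
for v in [20, 80, 35, 65] * 5:    # naturally wide-ranging test results
    wide.observe(v)

print("flat threshold:", flat.threshold(), "wide threshold:", wide.threshold())
```

Because the window is bounded, the threshold is recomputed from recent samples only, so it drifts along with the data as older results age out.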

How to Activate Quantile Dynamic Baselining

QDB is the default location alert setting for many metrics within your new alerts (for the list of supported metrics, see Metrics for Dynamic Baselines), so no actual activation is required for these metrics. If you have alerts set for supported metrics from before 17 September 2024, you can update them to QDB in just a click. Simply select Dynamic (New).

Selecting Dynamic (New) location alerts

You can also change the sensitivity setting from low to high (default is medium), where low results in fewer alerts and high results in more. See Sensitivity Level for more information.

For a detailed overview of how quantile dynamic baselining works and a worked example, see Quantile Dynamic Baselining.

For an explanation of how quantile dynamic baselining works with adaptive alerting, see Adaptive Alerting and Quantile Dynamic Baselining.

Adaptive Alerting Overview

The default global alert condition for Network & App Synthetics alert rules is called adaptive alerting.

What Is Adaptive Alerting?

Adaptive alerting is an intelligent alerting system designed to help you keep track of important degradations in your test performance without overwhelming you with notifications. Adaptive alerting uses data from your location alert condition, such as quantile dynamic baselining (QDB), to analyze your test performance. For a discussion of how QDB and adaptive alerting work hand-in-hand, see Adaptive Alerting and Quantile Dynamic Baselining.

Why Use Adaptive Alerting?

  • Reduces noise: By only alerting for anomalies deemed significant based on your test’s response to real network conditions, it helps prevent alert fatigue – where you might start ignoring alerts because there are too many.

  • Improves response times: With fewer, more meaningful alerts, you can respond more quickly to genuine problems.

  • Adapts to change: As network performance changes, the alerting thresholds adapt, ensuring alerts remain relevant and effective.

  • Saves you effort: Select the metric you want alerting on and go. No need to define a dozen different parameters, and there’s no subsequent fine-tuning, either; the model does that all for you.

How Adaptive Alerting Works

Distinguishing Issues from Noise

Adaptive alerting automates and strengthens the process of determining actionable alerts. It analyzes historic test data, takes current network behavior into account, and sets alert conditions using a learning model that distinguishes between anomalies that are simply “noise” and those that are actual problems. The system automatically accounts for three main factors to determine the likelihood that an anomaly really is an issue:

  1. How often a test runs – Recognizing that network issues typically last for some time, the model leverages knowledge about your test frequency and consistently calculates the probability of an ongoing issue, regardless of how often tests run.

  2. How often an anomaly surfaces in those test runs – The model looks back over a period of days to generate an anomaly “baseline” for what normal looks like (note: this is distinct from the baseline that determines whether a test result is an anomaly, as used in quantile dynamic baselining).

  3. Which agents are detecting the anomalies – If the same agent is consistently flagging an anomaly, even if it’s within the normal range of frequency, this could also be cause for concern.

How Alerts Are Triggered

Once “normal” is established, the system only alerts you when the frequency of anomalies or their agent makeup goes outside those norms. This procedure can be summarized into two basic processes, which constantly update based on your recent test patterns and network conditions.

  1. Learning: The system regularly observes your recent test performance to understand typical patterns. This includes regular fluctuations that are not necessarily problematic, as reported by any agents.

  2. Application: The model calculates issue probability based on these observed patterns, and applies it to an issue-probability threshold (see Setting Sensitivity for more information about the thresholds). This threshold allows for flexibility and adjustment according to changes in the network environment, meaning it is not a fixed number (such as above 100 ms) but a fixed probability (percent likelihood of an issue, such as 80% likely based on recent anomaly patterns). When a threshold is breached, an alert is triggered.
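The learning and application steps above can be sketched as a probability check against a learned anomaly rate. This is a deliberately crude stand-in for the real model, which this article does not specify: the function names, the binomial assumption, and the way the score is computed are all hypothetical. Only the idea of comparing a probability (such as 80%) against a threshold comes from the text.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def should_alert(baseline_rate, recent_anomalies, recent_rounds, threshold=0.8):
    # Learning output: baseline_rate is the fraction of test rounds that
    # normally show an anomaly, estimated from recent history.
    # Application: score how unlikely the recent anomaly count would be if
    # anomalies were still occurring at that baseline rate.
    score = 1.0 - binomial_tail(recent_anomalies, recent_rounds, baseline_rate)
    return score >= threshold, score

# Hypothetical numbers: anomalies normally appear in 5% of rounds.
quiet, quiet_score = should_alert(0.05, recent_anomalies=1, recent_rounds=12)
busy, busy_score = should_alert(0.05, recent_anomalies=4, recent_rounds=12)
print(quiet, round(quiet_score, 3), busy, round(busy_score, 3))
```

The threshold is a probability rather than a metric value, so the same rule works whether the underlying metric is latency, loss, or anything else that produces anomalies.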

Setting Sensitivity

A configurable sensitivity setting (high, medium, or low) lets you determine how readily the issue-probability thresholds trigger an alert. The higher the sensitivity setting, the more alerts you receive, and vice versa; all alerts default to medium sensitivity. The sensitivity setting adjusts the issue-probability threshold up or down. For example, anomaly frequency has to be further from the norm to trigger a low-sensitivity alert, which results in fewer alerts.
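The relationship between sensitivity and threshold can be pictured as a simple lookup. The numeric values below are invented for illustration; the actual thresholds behind each sensitivity level are not stated in this article.

```python
# Hypothetical threshold values: lower sensitivity means a higher bar for alerting.
ISSUE_PROBABILITY_THRESHOLD = {"high": 0.70, "medium": 0.80, "low": 0.90}

def triggers_alert(issue_probability, sensitivity="medium"):
    # An alert fires once the estimated issue probability reaches the threshold.
    return issue_probability >= ISSUE_PROBABILITY_THRESHOLD[sensitivity]

for level in ("high", "medium", "low"):
    print(level, triggers_alert(0.85, level))
```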

How to Implement Adaptive Alerting

Adaptive alerting is the default setting for all your new Network & App Synthetics alerts. If you have alerts set from before 2 October 2024, you can update them to adaptive alerting in a few clicks; see Updating an Existing Rule to Adaptive Alerting for instructions.

For a detailed overview of how adaptive alerting works and a worked example, see Adaptive Alert Detection.

Adaptive Alerting and Quantile Dynamic Baselining

Adaptive alerting and quantile dynamic baselining (QDB) are separate processes that share a sensitivity setting when used together. QDB determines the point at which a test result for any given agent is an anomaly (a location alert condition, one of many depending on the metric), while adaptive alerting monitors the frequency with which anomalies are detected to know when to trigger an alert (the default global alert condition, regardless of which type of location alert condition identified the anomaly). Although the two processes determine different aspects of an alertable issue, when they are combined you set just one sensitivity setting for both. So, instead of setting over a dozen different alert parameters yourself in manual mode, you need only select a QDB-supported metric you want to be alerted on and these two processes do the rest, with improved precision. See Quantile Dynamic Baselining for a more detailed overview of quantile dynamic baselining, and Metrics for Dynamic Baselines for the list of metrics dynamic baselines apply to.

Adding a New Alert Rule

To create a new alert rule, navigate to Manage > Alert Rules. The Alert Rules page opens.

Alert Rules landing page

From the tabs at the top of the page, select the desired alert source:

  • Network & App Synthetics

  • Endpoint Experience

  • Routing

  • Devices

  • Internet Insights

  • WAN Insights (for information about WAN Insights alert rules, see WAN Insights and ThousandEyes Alerts).

  • Cloud Insights

  • Traffic Insights

  • Event Detection

Then click Add New Alert Rule. The Add New Alert Rule panel opens. The image below shows the panel that opens for Network & App Synthetics.

Add New Alert Rule panel

Configuring the Alert Rule

Each alert rule has the following common elements:

  • A name.

  • A series of tests against which it is enabled (for alerts that rely on tests).

  • A scope of alert triggers (such as agents or monitors) to which the alert rule applies (with the exception of Endpoint Experience Scheduled Tests, WAN Insights, and Event Detection).

  • Alert severity (excepting Device and Event Detection alerts).

  • Alert detection settings (global condition).

  • Metric settings (location condition).

Alert rules also include a notification mechanism via the Notifications tab, such as a list of email recipients (recipients do not need to be users of ThousandEyes in order to receive email notifications), a PagerDuty service or one or more webhooks. See Alert Notifications for information about setting up the notification mechanism.

Each alert rule assigned to a test is evaluated independently. For tests with multiple alert rules assigned, any alert can be triggered when alert conditions are met. A test with multiple alert rules assigned to it can show zero, one, or multiple triggered alerts depending on what alert criteria were met during a single test pass.
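Independent evaluation can be pictured as checking every assigned rule against the same test result, with no rule affecting another. The sketch below uses hypothetical rule names, metric keys, and fixed thresholds purely to show how a single test pass can yield zero, one, or several triggered alerts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class AlertRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]

def triggered_rules(rules: List[AlertRule], result: Dict[str, float]) -> List[str]:
    # Every rule sees the same result; one rule firing never suppresses another.
    return [rule.name for rule in rules if rule.condition(result)]

rules = [
    AlertRule("High latency", lambda r: r["latency_ms"] > 200),
    AlertRule("Packet loss", lambda r: r["loss_pct"] > 5),
]

print(triggered_rules(rules, {"latency_ms": 80, "loss_pct": 0}))    # no rule met
print(triggered_rules(rules, {"latency_ms": 250, "loss_pct": 0}))   # one rule met
print(triggered_rules(rules, {"latency_ms": 250, "loss_pct": 12}))  # both rules met
```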

Every new alert panel within each alert source opens with three sections. The top section is where you choose the type of alert you wish to configure and give it a name. The bottom two sections make up the Settings tab, where you specify the alert triggers (middle section) and alert conditions (bottom section).

Naming the Alert Rule

In the top section of the panel for each new alert, you find:

  • Alert Type: Select the test layer for this alert rule.

  • Compatible Test Types: Applies to features that alert against tests, such as Network & App Synthetics and Endpoint Experience. As you select the test layer in the Alert Type field, the dropdown field to the right displays the test types to which this alert rule can be assigned.

  • Rule Name: Specify a name for the alert rule.

Selecting the Alert Triggers

The middle and bottom sections of the panel consist of the Settings tab. The middle section is where you configure your alert triggers (such as agents, monitors, or catalog providers; see Alert Triggers for the full list). The fields in this section vary depending on the alert source and type, set out below.

Network & App Synthetics

  • Direction (only for Network: Agent to Agent and Network: Path Trace tests): Enables you to choose whether the alert triggers in the Source-to-Target, Target-to-Source, or Both (Agent to Agent) or Either (Path Trace) direction.

  • Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Agents: Select the agents to which you will assign this alert rule. The options are:

    • All agents: All agents will be assigned this alert rule.

    • All agents except: All agents will be assigned this alert rule except for the ones selected.

    • Specific agents: Only the selected agents will be assigned to this alert rule.

      Note: Selecting All agents except or Specific agents opens another dropdown menu where you can select the agents you do or don't want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.
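The three agent options above behave like simple set operations. The function below is a hypothetical illustration of that logic; the mode strings and agent names are invented and do not correspond to any product API.

```python
from typing import Iterable, Set

def resolve_agents(all_agents: Iterable[str], mode: str, selected: Iterable[str] = ()) -> Set[str]:
    agents, chosen = set(all_agents), set(selected)
    if mode == "all":            # All agents
        return agents
    if mode == "all_except":     # All agents except the selected ones
        return agents - chosen
    if mode == "specific":       # Only the selected agents
        return agents & chosen
    raise ValueError(f"unknown mode: {mode!r}")

agents = ["London", "Tokyo", "Sydney"]
print(sorted(resolve_agents(agents, "all_except", ["Tokyo"])))
```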

Endpoint Experience

Real User Tests

  • Agents: Select the agents to which you will assign this alert rule. The options are:

    • All agents: All Endpoint Agents belonging to the account group will be assigned this alert rule.

    • Specific agents: Only the selected Endpoint Agents will be assigned to this alert rule.

    • Agent labels: Only the Endpoint Agents with the specified label will be assigned to this alert rule.

      Note: Selecting Specific agents or Agent labels opens another dropdown menu where you can select the agents or labels you want to alert on.

  • Visited Sites: Select the sites for which this alert will be triggered. The options are:

    • Any visited site: Any site within the monitored domain set that a user visits will be assigned to this alert rule.

    • Specific visited sites: Only the selected visited sites will be assigned to this alert rule. If you select this option, a dropdown menu appears where you can select from a number of suggested domains or type in a custom domain.

  • Severity: Choose from Info, Minor, Major, and Critical.

Scheduled Tests

  • Tests: A dropdown menu listing all the compatible Endpoint tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Severity: Choose from Info, Minor, Major, and Critical.

Routing

  • Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Prefix Length: A dropdown menu allowing you to specify the prefix length range for both IPv4 and IPv6. The range defaults to 16 to 32 for IPv4 and 32 to 128 for IPv6.

  • Monitors: Select the monitors to which you will assign this alert rule. The options are:

    • All monitors: All monitors will be assigned this alert rule.

    • All monitors except: All monitors will be assigned this alert rule except for the ones selected.

    • Specific monitors: Only the selected monitors will be assigned to this alert rule.

      Note: Selecting All monitors except or Specific monitors will open another dropdown menu where you can select the monitors you do or don't want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.

Devices

  • Devices (for the Device alert type only): A dropdown menu listing all the monitored devices set up in your account group. Select one or more devices to assign them to this alert rule.

  • Interfaces (for the Interface alert type only): A dropdown menu listing all the monitored interfaces set up in your account group. Select one or more interfaces to assign them to this alert rule.

Internet Insights

  • Affected Tests: Select the affected tests to which you will assign this alert rule. The options are:

    • Any: Any affected tests will be assigned this alert rule.

    • Specific: Only the selected affected tests will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the affected tests you want to alert on.

  • Catalog Providers: Select the catalog providers to which you will assign this alert rule. The options are:

    • Any: Any catalog providers will be assigned this alert rule.

    • Specific: Only the selected catalog providers will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the catalog providers you want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.

Cloud Insights

  • Provider: Select the provider to which you will assign this alert rule. The current options are:

    • Amazon Web Services

  • Scope Types: Select the scope type to which you will assign this alert rule. The selected scope type will be monitored from both inbound and outbound directions. The options are:

    • Account

    • VPC

    • Transit Gateway

    • Transit Gateway Attachment

    • Application Load Balancer

    • Network Load Balancer

    • Availability Zone

    • Region

    • Subnet

  • Severity: Choose from Info, Minor, Major, and Critical.

Traffic Insights

  • Scope Types: Select the scope type to which you will assign this alert rule. The selected scope type will be monitored from both inbound and outbound directions. The options are:

    • Devices

    • Applications

    • Subnet Tags

    • Geolocations

  • Severity: Choose from Info, Minor, Major, and Critical.

Setting the Alert Conditions

The Alert Conditions section is where you set your global and location alert conditions: the settings that define how and when an alert is triggered. By default, your global alert condition for Network & App Synthetics alerts is set to adaptive alert detection, and for many of your location alerts your default is quantile dynamic baselining.

To learn how to set the adaptive global alert condition, see Implementing Adaptive Alerting. To set your global alert condition manually, see Setting Global Alert Conditions Manually.

Quantile dynamic baselining (QDB) is the default location alert setting for a selection of Network & App Synthetics and Endpoint Experience test metrics. See Metrics for Dynamic Baselines for the list of supported metrics. When you select any of the supported metrics and use it in conjunction with the adaptive global alert setting, there’s no other adjustment to make for the location alert setting.

When you use QDB with a manually set global alert condition, you also set the sensitivity setting for your QDB metric. See Sensitivity Level for information about setting the QDB sensitivity.

Other location alert metrics require a few more steps to set them. For these, see Setting Location Alert Conditions Manually.

Editing an Alert Rule

Editing an alert rule follows the same configuration steps set out above for adding a new alert rule. The only difference is that to edit an alert rule, you click an existing alert rule (instead of clicking the Add New Alert Rule button). A panel appears with the current alert rule configuration; you can then change any of the field settings to your desired configuration.

When you edit an alert rule that has a currently active alert, any change to the alert rule's conditions will cause the currently active alert to clear. A new alert will be triggered after the ThousandEyes platform takes the updated alert rule into account.

Duplicating and Deleting Alert Rules

In the editing pane of an alert rule, you also have the option to delete the alert rule or duplicate it. Duplicating an alert rule is an easy way to configure a new alert rule where you only want to change one or two parameters; for example, if you want to alert on the existence of an error separately from resolution time in a DNS server alert rule. You can duplicate the alert rule specifying the error condition and just change the condition to resolution time without having to configure the entire rule again from scratch.

You will find the delete and duplicate symbols (trash bin and two overlapping pages) in the bottom left of your editing pane. Tooltips appear on hover (see image below). When you click the trash bin, you are prompted to confirm you wish to delete the alert rule. When you click the overlapping pages, a fresh Add New Alert Rule pane opens with the same configuration as the current alert rule.

Delete and duplication symbols in the editing pane

Routing Alert Rules

Routing alert rules can be applied to Routing tests that explicitly monitor BGP as well as any test that has the Collect BGP data option enabled. Alert rule conditions can be applied differently depending on which type of test the rule is assigned to.

The default routing alert rule will activate when 10% of monitors have less than 100% Reachability for at least 1 minute. You can use the time selection range to customize your alert configuration.
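The monitor-percentage part of the default rule can be sketched as follows. This is an illustration only: it assumes "10% of monitors" means at least 10%, it ignores the one-minute duration requirement, and the function and monitor names are hypothetical.

```python
from typing import Mapping

def default_routing_rule_met(reachability_pct: Mapping[str, float], monitor_fraction: float = 0.10) -> bool:
    if not reachability_pct:
        return False  # no monitors reporting, nothing to evaluate
    # Count monitors reporting anything below full reachability.
    affected = sum(1 for pct in reachability_pct.values() if pct < 100.0)
    return affected / len(reachability_pct) >= monitor_fraction

monitors = {f"monitor-{i}": 100.0 for i in range(20)}
monitors["monitor-0"] = 90.0                 # 1 of 20 monitors affected (5%)
print(default_routing_rule_met(monitors))
monitors["monitor-1"] = 0.0                  # 2 of 20 monitors affected (10%)
print(default_routing_rule_met(monitors))
```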

Prefix Length

BGP alert rules have a parameter named Prefix Length, which is used to determine the length of prefixes evaluated by the rule. The Prefix Length can be individually configured for IPv4 and IPv6 protocols.
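As a rough illustration of what a prefix-length range does, the sketch below uses Python's standard `ipaddress` module to check whether a prefix falls inside the default ranges mentioned earlier (16 to 32 for IPv4, 32 to 128 for IPv6). The function name and the dictionary layout are assumptions for this example, not part of the product.

```python
import ipaddress

# Default ranges from this article, keyed by IP version: (minimum, maximum) length.
DEFAULT_PREFIX_LENGTHS = {4: (16, 32), 6: (32, 128)}

def prefix_in_scope(prefix: str, ranges=DEFAULT_PREFIX_LENGTHS) -> bool:
    network = ipaddress.ip_network(prefix)
    low, high = ranges[network.version]
    return low <= network.prefixlen <= high

for prefix in ("203.0.113.0/24", "10.0.0.0/8", "2001:db8::/32", "2000::/3"):
    print(prefix, prefix_in_scope(prefix))
```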

Covered Prefixes

For example, a BGP test has only a single target prefix that will be evaluated against the alert conditions. If the Covered Prefixes box is checked, any covered prefixes found are not evaluated against the alert conditions except the explicit Covered Prefix condition.

In contrast, a non-BGP test type can have one or more targets. DNS server tests can explicitly test multiple DNS servers. An agent-to-server test target's domain name can resolve to multiple servers' IP addresses. When creating the BGP path visualization, the Prefix selector shows these multiple target prefixes, and evaluates each prefix against any BGP alert rules assigned to the test. Thus, prefixes that would be considered covered prefixes under a BGP test and not evaluated by the alert rule (unless by a Covered Prefix condition) are evaluated when assigned to the non-BGP test. Similarly, the Covered Prefix condition does not have any relevance when assigned to a non-BGP test.
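To make the notion of a covered prefix concrete, the sketch below assumes that a covered prefix is a more-specific prefix contained within a monitored target prefix. That definition is an assumption here, since this article does not spell it out, and the function name is hypothetical. It uses the standard `ipaddress` module.

```python
import ipaddress

def is_covered_prefix(candidate: str, target: str) -> bool:
    cand = ipaddress.ip_network(candidate)
    tgt = ipaddress.ip_network(target)
    # subnet_of raises TypeError across IP versions, so compare versions first.
    # A prefix identical to the target is the target itself, not a covered prefix.
    return cand.version == tgt.version and cand != tgt and cand.subnet_of(tgt)

target = "198.51.100.0/24"
for candidate in ("198.51.100.0/25", "198.51.100.0/24", "203.0.113.0/24"):
    print(candidate, is_covered_prefix(candidate, target))
```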
