WAN Insights and ThousandEyes Alerts

The ThousandEyes Alerts feature generates alerts and notifications when specified error or network conditions are present, based on alert rules that you create. When an alert rule is triggered, a list item displays in the Alert List page in ThousandEyes. For example, you can set WAN Insights alerts to call attention to changes in path quality or capacity utilization.

WAN Insights alert rule types are Capacity and Site Quality. Within each alert rule, you can add one or more conditions, which can be combined together.

  • If you need alert conditions to trigger separately, configure two separate alert rules of the same type.

  • Alert rule types are either capacity or site quality. You can’t have a single alert rule that focuses on both.

  • Within each alert rule, you can use alert conditions to focus on specific subsets of sites, numbers of users, or specific applications.

To configure alert rules for WAN Insights, go to Alerts > Alert Rules and select the WAN Insights tab. Click Add New Alert Rule. You can find documentation for ThousandEyes alerts in the Alerts area of the ThousandEyes user interface.

If you want visibility into WAN Insights alerts outside of the ThousandEyes platform itself, you can use custom webhooks to send notifications to other systems as described in Custom Webhooks. You can also add alerts to custom Dashboards and embed those dashboards in other systems.

Notes

  • To use Capacity Alerts, you’ll need to have the bandwidth for each network device configured as described on Enter or Upload Bandwidth Data.

  • WAN Insights alerts are not currently supported on the Alerts Widget in Dashboards.

  • WAN Insights alerts are not currently supported via the ThousandEyes API.

WAN Insights Alert Types

For WAN Insights, the alert types are Site Quality and Capacity. The conditions available for each type are described below.

Site Quality Alert Conditions

Use a site quality alert to identify sites experiencing poor path quality.

For the Site Quality alert type, the available alert rule conditions are:

  • Percentage. Site quality itself, expressed as a percentage. See Understanding Quality.

  • App Class. See Application Categories. Application categories or classes are bundles of applications with similar traffic characteristics.

  • Number of Users. Minimum/maximum number of active users on a site in order for it to be considered.

  • Site ID. Use to restrict alerting to a particular site or set of sites.

For a WAN Insights alert rule of type Quality, you must add at least one rule condition specifying a percentage, which is the threshold of site quality. You cannot save a rule without a percentage condition.

Capacity Alert Conditions

Use a capacity alert to call out capacity saturation affecting particular hostnames and/or interfaces.

For the Capacity alert type, the available alert rule conditions are:

  • Percentile. Use to exclude outliers from your capacity alerts. Outliers represent conditions that may occasionally cause saturation events, but are not indicative of the most frequent or prevailing patterns of capacity utilization. See Capacity Alert Rule Percentile Explained, further down on this page.

  • Hostname. Use to focus on capacity conditions involving a particular hostname.

  • Interfaces. Choose one or more interfaces from the drop-down list with check boxes. For example, if you are using interfaces biz-internet and gold, you could choose to focus this alert rule only on capacity saturation for biz-internet.

For a WAN Insights alert rule of type Capacity, you must add at least one rule condition specifying a percentile. That condition also specifies a percentage, which is the threshold of capacity saturation. You cannot save a rule without a percentile condition.

X out of N times in a Row and WAN Insights Quality Alerts

What does “x out of n times in a row” mean for WAN Insights alerts? It’s based on a WAN Insights alert rule evaluation that occurs in a particular round.

  • The first number, x, is the minimum number of occurrences.

  • The second number, n, is the overall number of rounds you are considering.

The frequency will always be 1 out of 1 times in a row for Capacity alerts.

The term “round” here refers to different time intervals for each WAN Insights alert type, as follows:

  • For capacity data, where the Alert Type is specified as Capacity | Utilization, a WAN Insights “round” is equal to 24 hours.

  • For path quality data, where the Alert Type is specified as Quality, WAN Insights calculates quality on an hourly basis, so a single round is equal to 60 minutes. Selecting 3 out of 5 times in a row therefore means that an alert fires only if the conditions are met in at least 3 rounds. The soonest such an alert can fire is 3 hours after the conditions first occur. If you want an alert to fire sooner, lower the first number accordingly.
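The "x out of n" logic can be illustrated with a short Python sketch. This is not ThousandEyes code: the function names are hypothetical, and it assumes the rule is read as "the condition was met in at least x of the most recent n rounds".

```python
from collections import deque


def make_round_evaluator(x: int, n: int):
    """Build an evaluator for an 'x out of n' rule (hypothetical helper).

    Assumption: the alert is active when the condition was met in at
    least x of the most recent n rounds.
    """
    if not 1 <= x <= n:
        raise ValueError("x must be between 1 and n")

    window = deque(maxlen=n)  # keeps only the most recent n round results

    def record_round(condition_met: bool) -> bool:
        """Record one round's result and report whether the alert is active."""
        window.append(condition_met)
        return sum(window) >= x

    return record_round


# Quality rounds are hourly, so each call below stands for one hour.
evaluate = make_round_evaluator(x=3, n=5)
states = [evaluate(met) for met in [True, True, True]]
# Under this reading, the alert first becomes active in the third round.
```

With x=3 and n=5, the alert cannot become active before the third hourly round, which matches the 3-hour minimum described above. A 1 out of 1 rule, as used for Capacity alerts, simply follows the result of the latest round.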

WAN Insights Alert Rules Data

WAN Insights data comes from your SD-WAN, as described in WAN Insights Key Components.

  • Path quality data, as referenced in the alert rules for path quality, comes from synthetic network probes as described in Understanding Quality.

  • Capacity data has two parts. Bandwidth settings are ingested, and usage data is subsequently collected, as described in Capacity Planning.

How Often are WAN Insights Alerts Triggered or Cleared?

WAN Insights alerts are evaluated on an hourly or daily basis, depending on the alert type. ThousandEyes compares the alert rules that you create with the data collected from WAN Insights. If this comparison shows that a condition meets or exceeds an alert rule threshold, then the alert will trigger.

As a reminder, capacity rounds are daily and quality rounds are hourly.

An alert is cleared whenever data no longer matches the corresponding alert rule, or after 36 hours if no data is received.
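The clearing behavior can be sketched as a small Python function. The names are hypothetical, and the exact boundary handling (clearing at exactly 36 hours rather than just after) is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the text above: alerts clear after 36 hours without data.
NO_DATA_CLEAR_AFTER = timedelta(hours=36)


def should_clear(condition_met: Optional[bool],
                 last_data_at: datetime,
                 now: datetime) -> bool:
    """Decide whether an active alert should clear (illustrative only).

    condition_met is None when the latest round produced no data.
    """
    if condition_met is None:
        # No data: clear once the gap reaches 36 hours (assumed inclusive).
        return now - last_data_at >= NO_DATA_CLEAR_AFTER
    # Data present: clear as soon as it no longer matches the rule.
    return not condition_met


last_seen = datetime(2024, 1, 1, 0, 0)
still_active = should_clear(None, last_seen, last_seen + timedelta(hours=12))
cleared = should_clear(None, last_seen, last_seen + timedelta(hours=36))
```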

Capacity Alert Rule Percentile Explained

One of the conditions for the WAN Insights alert type of Capacity is a condition called Percentile that’s designed to exclude outliers. This percentile is not a percentage, and it’s not related to the path quality measure.

Note that the percentile condition also includes the actual capacity saturation percentage as the second value. You should always have a percentile defined for this alert type, as one of the alert rule conditions.

The percentile relates to the main Capacity Planning screen. To eliminate outliers, that screen uses percentiles, also known as utilization aggregates, which are explained in the Measurement and Aggregation section of the WAN Insights Capacity Planning page.

The alert rule for capacity lets you specify a different percentile than the one used on the Capacity Planning screen on the Settings tab, and the alert rule is based on the past 30 days. In addition to the percentile, the alert rule condition also includes a percentage which relates to the percentage of capacity used.

For example, we could create an alert rule for capacity that specified the following:

95th percentile that is greater than 80%

In this example, the alert will trigger when the 95th percentile of utilization is greater than 80% of capacity over the past 30 days.
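To make the percentile condition concrete, here is a minimal Python sketch. It assumes a nearest-rank percentile and one utilization sample per day. Both are illustrative choices, since the source does not say how WAN Insights computes the aggregate or how often it samples. All function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile (an assumed method, not the documented one)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percentile / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def capacity_alert_triggers(utilization_pct, percentile=95, threshold_pct=80):
    """True when the chosen percentile of utilization exceeds the threshold."""
    return nearest_rank_percentile(utilization_pct, percentile) > threshold_pct


# 30 hypothetical daily samples (percent of capacity) with one saturated day.
one_spike = [50] * 29 + [95]
# The same window with two saturated days.
two_spikes = [50] * 28 + [95, 95]

spike_at_95 = capacity_alert_triggers(one_spike, percentile=95)    # outlier excluded
spike_at_100 = capacity_alert_triggers(one_spike, percentile=100)  # outlier included
repeat_at_95 = capacity_alert_triggers(two_spikes, percentile=95)
```

In this sketch, a single saturated day in 30 is treated as an outlier at the 95th percentile and does not trigger the rule, but it does trigger at the 100th percentile. Two saturated days are enough to push the 95th percentile above the 80% threshold.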

Capacity Alert Use Cases

A few ways you can use WAN Insights capacity alerts:

  • Specifying an alert rule with a higher percentile takes more extreme situations into account. For example, the 100th percentile considers every data point, including outliers, whereas the 95th percentile takes the highest value that remains after excluding the topmost 5%.

  • Specifying a different alert rule with a lower percentile, say 95 instead of 98, alerts on degradation that is more pervasive and frequent.

  • You might want to have alerts for different percentiles as an advance warning.

  • Contractually, you might have an SLA defined on the 95th or 98th percentile.
