Event Detection
Last updated
ThousandEyes events automatically analyze, detect, and correlate anomalies to identify problem domains based on results from synthetic tests. These events are generated when sudden or anomalous deviations are seen for a metric, particularly across multiple tests and/or agents.
This article outlines what events are, how event detection works, the types of events, and how to drill down further from the event into the related views.
Events analyze test results from all Cloud & Enterprise Agents across all account groups in an organization.
Event detection occurs when ThousandEyes identifies that error signals related to a component (proxy, network node, AS, server, etc.) have deviated from the baselines established by event detection.
To determine this, ThousandEyes takes the test results from all account groups within an organization and analyzes that data. Noisy test results (those with too many errors in a short window) are excluded until they stabilize, and the remaining results are tagged with the components associated with each test result (for example, proxy, network, or server).
Next, ThousandEyes looks for increases in failures across the test results associated with each component, which helps determine the problem domain and which component may be at fault. When this failure rate rises beyond a pre-defined threshold (set by the algorithm), an event is triggered and an email notification is sent to the user (if they have enabled event email alerts).
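The steps above (skip noisy results, tag results with components, compare each component's failure rate against its baseline) can be sketched as follows. This is a minimal illustration under stated assumptions, not ThousandEyes' implementation: the `SyntheticResult` record, the `detect_faulty_components` function, the per-component failure-rate statistic, and the fixed `threshold` margin are all hypothetical, since the real threshold and statistic are set by the algorithm and not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticResult:
    # Hypothetical shape of one test result; not the ThousandEyes schema.
    test_id: str
    agent_id: str
    components: tuple  # components tagged on this result, e.g. ("proxy:p1", "as:64500")
    failed: bool


def detect_faulty_components(results, baselines, noisy_tests, threshold=0.2):
    """Return {component: failure_rate} for components above baseline + threshold.

    baselines:   component -> normal failure rate (0.0 to 1.0)
    noisy_tests: test IDs currently excluded as noisy
    threshold:   illustrative margin; the real threshold is set by the algorithm
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for result in results:
        if result.test_id in noisy_tests:
            continue  # noisy results are skipped until they stabilize
        for component in result.components:
            totals[component] += 1
            failures[component] += int(result.failed)

    flagged = {}
    for component, total in totals.items():
        rate = failures[component] / total
        if rate > baselines.get(component, 0.0) + threshold:
            flagged[component] = rate
    return flagged


results = [
    SyntheticResult("t1", "a1", ("proxy:p1",), True),
    SyntheticResult("t2", "a2", ("proxy:p1",), True),
    SyntheticResult("t3", "a3", ("server:s1",), False),
    SyntheticResult("t4", "a4", ("proxy:p1",), False),
]
print(detect_faulty_components(results, {"proxy:p1": 0.1}, noisy_tests={"t4"}))
# {'proxy:p1': 1.0}
```

Because failures are counted per component rather than per test, one faulty proxy shared by several tests stands out even when each individual test fails only occasionally.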
To enable email alerts for events, toggle the "Event Email Alerts" switch in the top right of the Cloud & Enterprise Agents -> Events page.
This will send an email only to the user who enabled the event email alerts.
There are three impact levels for events:
Low Impact events affect more than 1%, but less than 5%, of agents/tests across the organization.
Medium Impact events affect more than 5%, but less than 10%, of agents/tests across the organization.
High Impact events affect more than 10% of agents/tests across the organization.
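The impact levels above amount to a percentage lookup. The sketch below shows one way to express it; the `impact_level` function is hypothetical, and because the documented ranges leave exactly 5% and exactly 10% unassigned, it places those boundary values in the lower level as an assumption.

```python
def impact_level(affected, total):
    """Map the share of affected agents/tests to an impact level.

    Boundary values (exactly 5% or 10%) are not specified by the documented
    ranges; this sketch assigns them to the lower level (an assumption).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100 * affected / total
    if percent > 10:
        return "High"
    if percent > 5:
        return "Medium"
    if percent > 1:
        return "Low"
    return None  # 1% or less: below the documented Low range


for affected in (12, 7, 3, 1):
    print(affected, impact_level(affected, 100))
# 12 High
# 7 Medium
# 3 Low
# 1 None
```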
There are currently five types of events that can be generated, based on where the fault or problem domain lies.
Local network events identify when issues are seen in the network in which the agent resides.
Network outage events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) through a point of presence (PoP) or network node, or in the network in which the target is located (target-prefix issue).
Network events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) between two autonomous systems (AS).
Proxy events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) when traversing the proxy.
Server events are indicative of issues with the application/target or the server that hosts the application.
To make the cause of a server event easier to determine, and to save users from investigating the same issue separately from each affected test's perspective, server events include a Cause field showing the major error type seen across the tests that are part of the event. This includes both errors seen in the HTTP phases (such as an HTTP receive timeout) and specific response codes (such as HTTP 5xx responses).
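One way to picture a "major error type" is the most frequent error label across the event's tests. The sketch below is an assumption for illustration only: the `major_cause` and `normalize_error` helpers and the rule that groups individual 5xx codes under "HTTP 5xx" are hypothetical, and the actual method behind the Cause field is not documented here.

```python
from collections import Counter


def normalize_error(label):
    # Illustrative grouping: individual 5xx codes ("HTTP 503") count as "HTTP 5xx".
    parts = label.split()
    if len(parts) == 2 and parts[0] == "HTTP" and parts[1].isdigit() and parts[1].startswith("5"):
        return "HTTP 5xx"
    return label


def major_cause(errors_by_test):
    """Return the most common error type across the tests in a server event.

    errors_by_test: hypothetical mapping of test ID -> error label seen in that test.
    """
    if not errors_by_test:
        return None
    counts = Counter(normalize_error(label) for label in errors_by_test.values())
    cause, _ = counts.most_common(1)[0]
    return cause


print(major_cause({"t1": "HTTP 503", "t2": "HTTP 500", "t3": "HTTP receive timeout"}))
# HTTP 5xx
```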
Event summaries are currently in limited availability (LA), and are only available to customers whose ThousandEyes organization is hosted in the US cloud.
ThousandEyes uses large language models (LLMs) to summarize events, helping users (particularly first-time and inexperienced ThousandEyes users) understand the underlying issue or issues. These summaries are presented in the Event Details section of the UI, and include a breakdown of what the event is and why it was generated.
If a summary is too long for the current view, click the Read More link beside either Event Summary or Why am I seeing this? to open a pop-up modal with the full generated text.
ThousandEyes event-based LLM summaries are built with Llama. We use an LLM hosted in AWS to generate the summaries, and do not use your data (via events) to train the model. Additionally, none of your data used for these summaries is stored outside the ThousandEyes cloud instance.
These summaries adhere to the Cisco principles of responsible artificial intelligence, found here: Cisco Responsible Artificial Intelligence Principles. Cisco's responsible AI framework can be found here: Cisco Responsible Artificial Intelligence Framework.
AI-generated summaries may include errors, and should be viewed as additional information, rather than a definitive identification of the problem.
To opt out of large language model summaries, contact ThousandEyes Customer Support.
ThousandEyes event detection identifies events that have occurred multiple times in the last 30 days and marks them as recurring events, so that you can see repeat patterns and resolve them. Recurring events show a timeline at the top of their individual view, indicating when in the last 30 days the event occurred.
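The recurrence rule ("occurred multiple times in the last 30 days") can be sketched as a sliding-window count. This is a minimal illustration: `recurrence_timeline` is a hypothetical helper, it assumes the occurrences have already been matched as the same event (how ThousandEyes matches them is not documented here), and it treats two or more occurrences inside the window as recurring.

```python
from datetime import datetime, timedelta


def recurrence_timeline(occurrences, now, window_days=30):
    """Return occurrence times inside the window if the event recurred, else [].

    occurrences: start times of what is assumed to be the same event; how
    ThousandEyes matches occurrences to one another is not documented here.
    """
    cutoff = now - timedelta(days=window_days)
    recent = sorted(t for t in occurrences if cutoff <= t <= now)
    # "Recurring" here means two or more occurrences within the window.
    return recent if len(recent) >= 2 else []


now = datetime(2024, 6, 30)
starts = [datetime(2024, 5, 1), datetime(2024, 6, 10), datetime(2024, 6, 25)]
print([t.date().isoformat() for t in recurrence_timeline(starts, now)])
# ['2024-06-10', '2024-06-25']
```

The returned list corresponds to the points a timeline view would plot; the May 1 occurrence falls outside the 30-day window and is dropped.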
You can also click the Summarize Recurrence button to generate a summary of the recurrence, how often it has happened, and when it has occurred.
This button is not available if you have opted out of large language model summaries (see Opt-Out of Large Language Model Summaries).
No configuration or additional setup is needed to enable events as long as Cloud and/or Enterprise Agents are used for testing.
Event detection can be found under Cloud & Enterprise Agents in the ThousandEyes web application.
The main events page shows all events from the last 30 days by default. A number of filters are available, in addition to the search bar, to help you navigate the list of current events:
The date filter allows you to filter either by a relative or fixed time interval within the last 30 days. All data shown on the page reflects the selection made in the date filter.
The summary cards allow you to view all events of a single impact level within the previously defined date range, all currently ongoing events, or events that have recurred during the last 30 days.
The table filters allow you to further narrow your field of vision, by filtering by one or more event types or impact levels, as well as event duration and whether the events are ongoing.
The "Recurring" filter is set to "Yes" by default, ensuring all recurring events are grouped together in the events table. The other summary cards (high, medium, low, ongoing) do not consider recurrence, and only show the count for individual events.
When you select an impact card (or filter), the "Recurring" filter will automatically change to "No". You can click the filter to toggle it back to "Yes".
You can then dive deeper into an individual event by clicking its name.
Here is an example event with an LLM summary:
Here is an example event without the summary:
In both examples, the view can be broken into four parts: the Top Panel, Event Details, the Map view, and the Affected Items section.
The top panel includes the title of the event, the affected areas/domains, and a View More Events button that takes you back to the full event list.
For recurring events, a timeline view of when the event has occurred in the last 30 days is also present. You can navigate between the individual events by clicking the back and forward arrows above the timeline.
The event details section provides a summary of the event, why it was generated, the impact (see Event Impact Levels), the start date and duration, and the number of affected agents/unique tests.
If event summaries have been opted out of (see Opt-Out of Large Language Model Summaries), the event details section provides a brief summary, the impact, the start date and duration, and the number of affected agents/unique tests.
In addition, the Rate This Event link allows customers to provide detailed feedback on the usefulness of each event.
The map view shows the locations where affected agents can be found.
The affected items section provides a list of all tests and agents impacted by the event. The table shows the number of affected agents, the test target/type, and the account group. You can use the drop-down menu to filter by the affected account groups.
If the user viewing the event has permissions to view an individual test in the table, a link will be present to take the user directly to the relevant test view, drilled-down to the exact moment of the event. Tests the user does not have permission to view will show (No access).
If multiple tests have been impacted, an additional button is added to the view, allowing users to navigate to a multi-test view of the impacted tests.
For more information on multi-test views, see Multi-Service Views.
Although events (generated as part of event detection) are sent as alerts, they are different from alerts, and each serves its own purpose and use case. The table below outlines the differences between the features:
| Feature | Alerts | Events | Internet Insights |
|---|---|---|---|
| Value | Quickly know about issues seen in each test. | Know the common issue (problem domain) affecting multiple tests and/or agents. | Global view of the health of the Internet. |
| Data Source | Individual test in each account group. | All Cloud and Enterprise Agent tests across all account groups in the organization. | All Cloud and Enterprise Agent tests across all ThousandEyes customers (with anonymized data). |
| Notifications | Email, webhooks, other integrations. | Email only. | Email, webhooks, other integrations. |
| Configuration Setup | Requires alert rule configuration. | No configuration setup required; only needs Cloud and Enterprise Agent tests. | No configuration setup required; only needs an Internet Insights license. |
| Triggered By | User-defined rules. | Multiple failing signals across one or more tests that share the same problem domain. | Outage detected when many tests to the same target or network fail. |
| Triggered On | Single test type (e.g. HTTP test). | Correlation across test types and tests. | Packet loss (for network outage), or server error (for application outage). |
Event detection is included, at no additional cost, with a customer's purchase of units for Cloud and Enterprise Agents.
Event detection does not analyze Endpoint Agent data.
Event detection covers HTTP tests and network tests from all test types (except agent-to-agent and voice).
Only email notifications are available for event detection.
Event detection sets baselines for metric values. Noisy network components that fail frequently are filtered out of event detection logic.
A triggered event is definitely indicative of an issue. However, events may not catch every issue, because event detection only analyzes results from HTTP tests and network tests from all test types (except agent-to-agent and voice). The more test data a ThousandEyes customer generates, the more accurate (and valuable) events will be for them.
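The coverage rules above (Cloud and Enterprise Agents only; HTTP and network results; no agent-to-agent or voice tests) can be summarized as a simple predicate. This sketch reflects one reading of the rules, not an official check: `is_analyzed` and the string labels for agent kinds, test types, and layers are hypothetical.

```python
SUPPORTED_AGENTS = {"cloud", "enterprise"}        # Endpoint Agent data is not analyzed
EXCLUDED_TEST_TYPES = {"agent-to-agent", "voice"}
ANALYZED_LAYERS = {"http", "network"}


def is_analyzed(agent_kind, test_type, layer):
    """Return True if a result would feed event detection under this reading."""
    return (
        agent_kind in SUPPORTED_AGENTS
        and test_type not in EXCLUDED_TEST_TYPES
        and layer in ANALYZED_LAYERS
    )


print(is_analyzed("enterprise", "page-load", "network"))  # True
print(is_analyzed("endpoint", "http-server", "http"))     # False
print(is_analyzed("cloud", "voice", "network"))           # False
```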