Event Detection
Last updated
ThousandEyes events automatically analyze, detect, and correlate anomalies to identify problem domains based on results from synthetic tests. These events are generated when sudden or anomalous deviations are seen for a metric, particularly across multiple tests and/or agents.
This article outlines what events are, how event detection works, the types of events, and how to drill down further from the event into the related views.
Events analyze test results from all Cloud & Enterprise Agents across all account groups in an organization.
Event detection occurs when ThousandEyes identifies that error signals related to a component (proxy, network node, AS, server, etc.) have deviated from the baselines established by event detection.
To determine this, ThousandEyes takes the test results from all account groups within an organization and analyzes that data. Noisy test results (those that have too many errors in a short window) are removed until they stabilize, and the remaining results are tagged with the components associated with each test result (for example, proxy, network, or server).
Next, any increase in failures across the test results for each component helps determine the problem domain and which component may be at fault. When this failure rate increases beyond a predefined threshold (set by the algorithm), an event is triggered and an email notification is sent to the user (if they've enabled email alerts).
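The steps above can be pictured with a minimal Python sketch. It is not the ThousandEyes algorithm, whose details are not public; the `ResultRecord` type, the `detect_events` function, the per-component `baselines`, and the `threshold` value are all hypothetical names and numbers chosen only to illustrate the flow of filtering noisy results, grouping by component, and comparing failure rates against a baseline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRecord:
    """One test result, already tagged with a component (hypothetical shape)."""
    test_id: str
    component: str  # e.g. "proxy", "network", or "server"
    failed: bool


def failure_rates(results):
    """Return the fraction of failed results per component."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in results:
        totals[r.component] += 1
        if r.failed:
            failures[r.component] += 1
    return {c: failures[c] / totals[c] for c in totals}


def detect_events(results, baselines, noisy_tests=frozenset(), threshold=0.2):
    """Flag components whose failure rate exceeds baseline by more than threshold.

    `baselines`, `noisy_tests`, and `threshold` are illustrative assumptions,
    not values used by the real product.
    """
    # Step 1: drop results from tests considered noisy.
    stable = [r for r in results if r.test_id not in noisy_tests]
    # Step 2: compute the failure rate for each tagged component.
    rates = failure_rates(stable)
    # Step 3: keep components that deviate from their baseline.
    return sorted(
        c for c, rate in rates.items()
        if rate - baselines.get(c, 0.0) > threshold
    )


results = [
    ResultRecord("t1", "proxy", True),
    ResultRecord("t2", "proxy", True),
    ResultRecord("t3", "proxy", False),
    ResultRecord("t4", "server", False),
    ResultRecord("t5", "server", True),  # treated as noisy and excluded below
]
flagged = detect_events(
    results,
    baselines={"proxy": 0.1, "server": 0.1},
    noisy_tests={"t5"},
)
print(flagged)  # ['proxy']
```

In this toy data, two of three proxy results fail, which is well above the assumed baseline, so only the proxy component is flagged.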
To enable email alerts for events, toggle the "Event Email Alerts" switch in the top right of the Cloud & Enterprise Agents > Events page.
This will send an email only to the user who enabled the event email alerts.
There are three impact levels for events:
Low Impact events affect more than 1%, but less than 5%, of agents/tests across the organization.
Medium Impact events affect more than 5%, but less than 10%, of agents/tests across the organization.
High Impact events affect more than 10% of agents/tests across the organization.
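As a rough illustration of these bands, the sketch below maps a share of affected agents/tests to an impact level. The function name `impact_level` is hypothetical, and the handling of values that fall exactly on 5% or 10% is an assumption, since the definitions above do not say which level those boundary values belong to.

```python
def impact_level(affected, total):
    """Map the share of affected agents/tests to an impact level.

    Boundary handling is an assumption: exactly 5% is treated as Low and
    exactly 10% as Medium, because the text leaves those cases open.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    share = 100.0 * affected / total
    if share > 10:
        return "High"
    if share > 5:
        return "Medium"
    if share > 1:
        return "Low"
    return None  # 1% or less: no impact level is defined in the text


print(impact_level(3, 100))   # Low
print(impact_level(7, 100))   # Medium
print(impact_level(25, 100))  # High
```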
There are currently five types of events that can be generated, based on where the fault or problem domain lies.
Local network events identify when issues are seen with the network in which the agent resides.
Network outage events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) through a point of presence (PoP) or network node, or in the network in which the target is located (target-prefix issue).
Network events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) between two autonomous systems (AS).
Proxy events identify when traffic is unable to reach the destination due to network issues (like high packet loss/latency) when traversing the proxy.
Server events are indicative of issues with the application/target or the server that hosts the application.
To make it easier to determine the cause of a server event, and to save users from investigating the same issue separately in each affected test, server events include a Cause field that shows the major error type seen across the tests that are part of the event. This includes both errors seen in the HTTP phases (such as an HTTP receive timeout) and specific response codes (such as HTTP 5xx responses).
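One simple way to think about a "major error type" is the most frequent error across the affected tests. The sketch below assumes that interpretation; the function name `dominant_cause` and the error labels are hypothetical, and the product may derive the Cause field differently.

```python
from collections import Counter


def dominant_cause(error_types):
    """Return the most common error type across affected tests, or None."""
    # Ignore tests that reported no error (None or empty string).
    counts = Counter(e for e in error_types if e)
    if not counts:
        return None
    cause, _ = counts.most_common(1)[0]
    return cause


errors = ["HTTP receive timeout", "HTTP 5xx", "HTTP receive timeout", None]
print(dominant_cause(errors))  # HTTP receive timeout
```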
No configuration or additional setup is needed to enable events as long as Cloud and/or Enterprise Agents are used for testing.
Event detection can be found under Cloud & Enterprise Agents in the ThousandEyes web application.
The main events page shows all events from the last 30 days by default. A number of filters are available, in addition to the search bar, to help you navigate the list of current events:
The date filter allows you to filter either by a relative or fixed time interval within the last 30 days. All data shown on the page reflects the selection made in the date filter.
The summary cards allow you to view all events of a single impact level within the previously defined date range, or all currently ongoing events.
The table filters allow you to narrow the list further by filtering on one or more event types or impact levels, as well as on event duration and whether the events are ongoing.
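The combined effect of these filters can be modeled as successive conditions on a list of events. The following sketch is only an analogy for how the page narrows results; the `Event` record and the `filter_events` function are hypothetical and do not correspond to a ThousandEyes API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record used only for this illustration."""
    name: str
    event_type: str  # e.g. "Server", "Proxy", "Network"
    impact: str      # "Low", "Medium", or "High"
    start: datetime
    end: Optional[datetime] = None  # None means the event is still ongoing


def filter_events(events, since, types=None, impacts=None, ongoing_only=False):
    """Apply a date filter first, then optional type, impact, and ongoing filters."""
    selected = []
    for e in events:
        if e.start < since:
            continue
        if types and e.event_type not in types:
            continue
        if impacts and e.impact not in impacts:
            continue
        if ongoing_only and e.end is not None:
            continue
        selected.append(e)
    return selected


now = datetime(2024, 1, 31, 12, 0)
events = [
    Event("A", "Server", "High", now - timedelta(days=2)),
    Event("B", "Proxy", "Low", now - timedelta(days=10), now - timedelta(days=9)),
    Event("C", "Server", "Medium", now - timedelta(days=40), now - timedelta(days=39)),
]

last_30_days = filter_events(events, since=now - timedelta(days=30))
print([e.name for e in last_30_days])  # ['A', 'B']

ongoing_server = filter_events(
    events, since=now - timedelta(days=30), types={"Server"}, ongoing_only=True
)
print([e.name for e in ongoing_server])  # ['A']
```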
You can then dive deeper into individual events by clicking on the name of one.
The view for an individual event can be broken into four parts: the Top Panel, Event Details, the Map view, and the Affected Items section.
The top panel includes the title of the event, the affected areas/domains, and a View More Events button that takes you back to the full event list.
The event details section provides a brief summary of the event, the impact (see Event Impact Levels), the start date and duration, and the number of affected agents/unique tests.
In addition, the Rate This Event link allows customers to provide detailed feedback on the usefulness of each event.
The map view shows the locations where affected agents can be found.
The affected items section provides a list of all tests and agents impacted by the event. The table shows the number of affected agents, the test target/type, and the account group. You can use the drop-down menu to filter by the affected account groups.
If the user viewing the event has permissions to view an individual test in the table, a link will be present to take the user directly to the relevant test view, drilled-down to the exact moment of the event. Tests the user does not have permission to view will show (No access).
If multiple tests have been impacted, an additional button is added to the view, allowing users to navigate to a multi-test view of the impacted tests.
For more information on multi-test views, see Multi-Service Views.
Although events (generated as part of event detection) are sent as alerts, they are different from alerts, and each serves its own purpose and use case. The table below outlines the differences between the features:
| | Alerts | Event Detection | Internet Insights |
| --- | --- | --- | --- |
| Value | Quickly know about issues seen in each test. | Know the common issue (problem domain) affecting multiple tests and/or agents. | Global view of the health of the Internet. |
| Data Source | Individual test in each account group. | All Cloud and Enterprise Agent tests across all account groups in the organization. | All Cloud and Enterprise Agent tests across all ThousandEyes customers (with anonymized data). |
| Notifications | Email, webhooks, other integrations. | Email only. | Email, webhooks, other integrations. |
| Configuration Setup | Requires alert rule configuration. | No configuration setup required; only needs Cloud and Enterprise Agent tests. | No configuration setup required; only needs an Internet Insights license. |
| Triggered By | User-defined rules. | Multiple failing signals across one or more tests that share the same problem domain. | Outage detected when many tests to the same target or network fail. |
| Triggered On | Single test type (e.g. HTTP test). | Correlation across test types and tests. | Packet loss (for network outage), or server error (for application outage). |
Event detection is included at no additional cost as part of a customer’s purchase of units to use Cloud and Enterprise Agents.
Event detection does not analyze Endpoint Agent data.
Event detection covers HTTP tests and network tests from all test types (except agent-to-agent and voice).
Only email notifications are available for event detection.
Event detection sets baselines for metric values. Noisy network components that fail frequently are filtered out of event detection logic.
When an event is triggered, it is definitely indicative of an issue. That said, events may not catch all issues, because event detection only analyzes results from HTTP tests and network tests from all test types (except agent-to-agent and voice). The more test data a ThousandEyes customer generates, the more accurate (and valuable) events will be for them.
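The coverage rules above (no Endpoint Agent data, and no agent-to-agent or voice tests) can be summarized as a small predicate. This is a sketch only: the function name `is_covered` and the lowercase labels for agent and test types are hypothetical and are not identifiers used by ThousandEyes.

```python
# Labels below are illustrative assumptions, not official identifiers.
EXCLUDED_TEST_TYPES = {"agent-to-agent", "voice"}
COVERED_AGENT_TYPES = {"cloud", "enterprise"}


def is_covered(test_type, agent_type):
    """Return True if results of this kind would feed event detection."""
    return (
        agent_type.lower() in COVERED_AGENT_TYPES
        and test_type.lower() not in EXCLUDED_TEST_TYPES
    )


print(is_covered("http-server", "cloud"))      # True
print(is_covered("voice", "enterprise"))       # False
print(is_covered("http-server", "endpoint"))   # False
```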