Alerts Reference for Connected Devices
This reference guide provides best practices, metric definitions, and key concepts to help you optimize your alerting strategy for Connected Devices. Use these resources to fine-tune your alert rules, reduce noise, and ensure you are monitoring the metrics that matter most for your device fleet. For in-depth information about the broader alerting platform, see Alerts.
Best Practices for Alert Management
Creating an effective alerting strategy involves more than just setting thresholds. Use these best practices to reduce noise, improve response times, and ensure your alerts provide actionable insights.
Use Descriptive Naming Conventions: Give your alert rules clear, consistent names that describe their scope and purpose. A format such as [Region] - [Metric] - [Threshold] (for example, "West Coast - Latency > 100ms") makes it easy to scan your alert list and quickly identify issues.
Implement Layered Detection: Create multiple alert rules for critical services to detect issues at different stages. For example, set a "Minor" severity alert with a lower threshold to act as an early warning, and a "Critical" severity alert with a higher threshold to notify you of service-impacting outages.
Tune Thresholds to Reduce Noise: Avoid alert fatigue by tuning your thresholds and using the rolling time window logic. Instead of alerting on a single spike, configure your rule to trigger only if a condition persists (for example, 3 times in 5 intervals). This filters out transient network noise and ensures you are notified only of persistent issues.
Leverage Suppression Windows: During planned maintenance or known outages, use Alert Suppression Windows to temporarily silence notifications. This keeps your alert history clean and prevents your team from being flooded with irrelevant notifications.
Integrate with Your Workflow: Don't let alerts sit in an inbox. Use webhooks to send alert data directly to your incident management system (like ServiceNow or PagerDuty) or team chat (like Slack). This creates a closed-loop workflow where alerts automatically trigger the correct response process.
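To make the rolling time window and layered detection ideas concrete, here is a minimal Python sketch. It assumes a rule fires when a metric exceeds its threshold in at least 3 of the 5 most recent intervals. The RollingWindowRule class, the evaluate function, the 250 ms critical threshold, and the sample latency values are hypothetical illustrations, not part of the ThousandEyes platform, and the platform's own evaluation may differ in detail.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RollingWindowRule:
    """Hypothetical alert rule: fires when a metric exceeds `threshold`
    in at least `required` of the most recent `window` intervals."""

    name: str
    threshold: float
    required: int = 3
    window: int = 5

    def __post_init__(self) -> None:
        # A bounded deque automatically drops the oldest interval.
        self.history = deque(maxlen=self.window)

    def observe(self, value: float) -> bool:
        """Record one interval's measurement; return True if the rule is active."""
        self.history.append(value > self.threshold)
        return sum(self.history) >= self.required


def evaluate(rule: RollingWindowRule, samples: list[float]) -> list[bool]:
    """Feed samples to the rule in order and return its state after each one."""
    return [rule.observe(s) for s in samples]


# Layered detection (hypothetical thresholds): early warning plus critical.
minor = RollingWindowRule("West Coast - Latency > 100ms", threshold=100)
critical = RollingWindowRule("West Coast - Latency > 250ms", threshold=250)

latency_ms = [90, 140, 95, 130, 180, 300, 310, 320]
print(evaluate(minor, latency_ms))
# [False, False, False, False, True, True, True, True]
print(evaluate(critical, latency_ms))
# [False, False, False, False, False, False, False, True]
```

The single 140 ms spike does not fire either rule. The early-warning rule activates once breaches persist, and the critical rule activates only after the higher threshold has been exceeded repeatedly.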
Alert Metrics Reference
Connected Devices alerts allow you to monitor three key performance metrics. Understanding what each metric measures will help you set appropriate thresholds for your device fleet.
Latency (ms): Measures the round-trip time it takes for a packet to travel from the device to the test target and back. High latency results in "lag" or delays, which can severely impact real-time applications like online gaming, video conferencing, and remote desktop sessions.
Packet Loss (%): Measures the percentage of data packets that are lost during transmission and fail to reach their destination. Packet loss is a critical indicator of connection quality; even low levels of loss can cause buffering, slow download speeds, and poor application performance.
Jitter (ms): Measures the variation in latency between consecutive packets. High jitter means the arrival time of packets is inconsistent. This is particularly damaging for Voice over IP (VoIP) and video calls, causing choppy audio, robotic voice artifacts, or frozen video frames.
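The following Python sketch shows how the three metrics relate to a single round of probe results. It assumes each probe is recorded as a round-trip time in milliseconds, or None when the packet was lost; the summarize function and the sample values are hypothetical. Jitter is estimated here as the mean absolute difference between consecutive received round-trip times. That matches the description above but is only one common definition (RFC 3550, for example, uses a smoothed interarrival estimate), so the value the platform reports may be computed differently.

```python
from typing import Optional


def summarize(rtts_ms: list[Optional[float]]) -> dict[str, float]:
    """Hypothetical summary of one probe round.

    Each entry is a packet's round-trip time in ms, or None if it was lost.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Latency: average round-trip time of the packets that did arrive.
    latency_ms = sum(received) / len(received) if received else float("nan")
    # Jitter (one simple estimator): mean absolute change between consecutive received RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = sum(diffs) / len(diffs) if diffs else 0.0
    return {"latency_ms": latency_ms, "loss_pct": loss_pct, "jitter_ms": jitter_ms}


print(summarize([20, 24, None, 22, 28]))
# {'latency_ms': 23.5, 'loss_pct': 20.0, 'jitter_ms': 4.0}
```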
Key Alerting Concepts
To effectively manage your alerts, it is helpful to understand a few key terms used throughout the ThousandEyes platform.
Scope: The set of agents (devices) that an alert rule is monitoring. You can define the scope to include all agents running a test or limit it to a specific list of agent IDs.
Conditions: The specific logic and thresholds that determine when an alert should fire. This includes the metric (for example, Latency), the operator (for example, >), the threshold value (for example, 100 ms), and the rolling time window.
Trigger: A specific event that contributes to an alert. For example, if a single agent exceeds a latency threshold, that agent is a "trigger." An alert rule might require multiple triggers (for example, 5 agents) before it activates a global alert.
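A minimal Python sketch of how scope, conditions, and triggers fit together. The Condition and AlertRule classes, the agent IDs, and the min_triggers parameter are hypothetical names chosen for illustration, not the ThousandEyes API. The rolling time window is omitted for brevity, so each agent's latest measurements are compared directly against the condition.

```python
import operator
from dataclasses import dataclass

# Hypothetical subset of supported comparison operators.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    """Hypothetical condition: metric, operator, and threshold value."""

    metric: str  # for example, "latency_ms"
    op: str  # for example, ">"
    value: float  # for example, 100

    def matches(self, measurements: dict[str, float]) -> bool:
        return OPERATORS[self.op](measurements[self.metric], self.value)


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: scope (agent IDs), a condition, and a trigger count."""

    scope: frozenset[str]
    condition: Condition
    min_triggers: int = 1

    def triggers(self, results: dict[str, dict[str, float]]) -> set[str]:
        """Return the in-scope agents whose measurements meet the condition."""
        return {
            agent
            for agent, measurements in results.items()
            if agent in self.scope and self.condition.matches(measurements)
        }

    def is_active(self, results: dict[str, dict[str, float]]) -> bool:
        """The alert activates once enough agents are triggers."""
        return len(self.triggers(results)) >= self.min_triggers


rule = AlertRule(
    scope=frozenset({"a1", "a2", "a3"}),
    condition=Condition("latency_ms", ">", 100),
    min_triggers=2,
)
results = {
    "a1": {"latency_ms": 120},
    "a2": {"latency_ms": 130},
    "a3": {"latency_ms": 80},
    "a4": {"latency_ms": 200},  # not in scope
}
print(sorted(rule.triggers(results)))  # ['a1', 'a2']
print(rule.is_active(results))  # True
```

Agent a4 exceeds the threshold but is outside the rule's scope, so it is never counted as a trigger.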
For a more comprehensive overview of alerting fundamentals, including the difference between an alert and an event, see our main documentation on Alerting Basics.