Understanding Scores

Overview

When comparing providers, raw metric values, like milliseconds of latency, can't be meaningfully averaged across multiple destinations. For example, averaging 10 ms latency to AWS and 50 ms latency to Webex produces a 30 ms average, a figure that is both statistically misleading and contextually meaningless (latency to where?).

Provider Intelligence solves this problem by converting raw measurements into a score from 0 to 100 for each metric, which can be fairly compared across any number of destinations. These metric scores roll up billions of ThousandEyes measurements into a single, comparable overall score from 0 to 100.00 for each provider, making side-by-side comparisons objective and intuitive.

When You See Scores

How data is displayed depends on which destination view you’ve chosen:

  • Summary views (Summary View and Universal Application Summary) display scores per metric, enabling clean cross-destination comparisons.

  • Single-destination views (Webex or us-east-1, for example) display raw values per metric, such as milliseconds of latency or percent loss, where averaging is not required and precision is more useful. Trendlines are also visible in this view.

Score Types

There are four distinct score types used throughout Provider Intelligence, each independently calculated. The first three are used in part to generate the fourth: the overall score.

Each score type is described below by where it is visible, its relational status (relative or absolute), and its calculation method.

Performance Score

  • Where visible: Each metric column. Also used as a results calculation method (see below).

  • Relational status: Relative. Measures a provider against its peers for the same city and destinations. The same raw value can yield different scores in different cities depending on how peers perform.

  • Calculation method: Raw metric values for all providers sharing the same city and destinations are averaged and ranked over the time frame. Each provider's value is scaled between the best and worst in that peer group, producing a 0-100 score where 100 equals the top performer and 0 equals the bottom.

Stability Score

  • Where visible: Each metric column.

  • Relational status: Absolute. Measures the consistency of a provider's own performance over time. Independent of peer behavior; multiple providers could score 100 (perfect consistency) regardless of performance score.

  • Calculation method: The provider's metric values over the time frame are compared to its own typical (median) value. Smaller and less frequent deviations from that baseline yield a higher score, up to 100 for perfect consistency.

Performance & Stability Score

  • Where visible: Not visible. Used as a results calculation method (see below).

  • Relational status: Absolute. Combines the performance score and stability score for the same metric. The calculation penalizes imbalance between the two inputs, making this an intrinsic, provider-specific value.

  • Calculation method: Performance and stability scores are combined using an average (harmonic mean) that is weighted toward the lower of the two. A provider must score well on both dimensions to receive a high combined score; strong performance cannot mask poor consistency, or vice versa.

Overall Score

  • Where visible: Overall Score column.

  • Relational status: Absolute. An aggregated score derived from already-normalized scores; it reflects a fixed, comparable value rather than a peer ranking.

  • Calculation method: Each metric's score, as calculated using your chosen results calculation mode (see below), is weighted according to your priority ranking (highest-priority metric gets the most weight). The weighted scores are summed to produce a single 0-100.00 value. See example below.
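To make the performance score's peer-group scaling concrete, here is a minimal sketch in Python. It assumes a simple linear (min-max) scaling between the best and worst averaged values in a peer group. The function name, provider names, and latency values are hypothetical, and the handling of a peer group where every provider ties is an assumption, not documented behavior.

```python
def performance_scores(avg_values, lower_is_better=True):
    """Scale each provider's averaged raw value to 0-100 within one peer group."""
    values = avg_values.values()
    best = min(values) if lower_is_better else max(values)
    worst = max(values) if lower_is_better else min(values)
    if best == worst:
        # Assumption: if all peers are identical, everyone gets the top score.
        return {provider: 100.0 for provider in avg_values}
    # Linear scaling: the best value maps to 100, the worst to 0.
    return {
        provider: 100.0 * (worst - value) / (worst - best)
        for provider, value in avg_values.items()
    }


# Hypothetical average latency (ms) per provider in two different cities.
city_a = {"Provider A": 10.0, "Provider B": 30.0, "Provider C": 50.0}
city_b = {"Provider A": 10.0, "Provider D": 5.0, "Provider E": 15.0}

print(performance_scores(city_a))  # Provider A is the top performer here
print(performance_scores(city_b))  # Same 10 ms, but mid-pack in this city
```

In this sketch, Provider A's 10 ms scores 100 in the first city and 50 in the second, which illustrates why the performance score is relative: the same raw value is judged against a different peer group.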

For a deeper understanding of each score type and its calculation method, see Performance Scoring, Stability Scoring, Performance & Stability Scoring, and Overall Scoring.
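The way a harmonic mean penalizes imbalance between performance and stability can be seen in a short sketch. This assumes the plain two-value harmonic mean; any additional weighting the product applies is not stated here, and the function name and input values are hypothetical.

```python
def combined_score(performance, stability):
    """Harmonic mean of two 0-100 scores; it is pulled toward the lower input."""
    if performance + stability == 0:
        # Avoid division by zero when both inputs are 0.
        return 0.0
    return 2 * performance * stability / (performance + stability)


balanced = combined_score(90.0, 90.0)     # both dimensions strong
imbalanced = combined_score(100.0, 20.0)  # top performance, poor stability

print(balanced)    # 90.0
print(imbalanced)  # about 33.3, well below the arithmetic mean of 60
```

Under these assumptions, a provider with scores of 100 and 20 lands near 33 rather than 60, so strong performance does not mask poor consistency.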

Trend Indicators

In single-destination views, trend arrows (↑ ↓ →) show whether a provider's raw metrics have improved, worsened, or stayed flat over your chosen time frame. Not all metrics show trends: only those where a lower value means better performance, plus throughput, where a higher value is better. So, for metrics other than throughput*:

  • Improving = green down arrow

  • Worsening = red up arrow

  • Flat = gray horizontal arrow

*For throughput only, down is red and up is green.
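The arrow and color rules above can be expressed as a small lookup. This is only a sketch: how the product decides that a metric is "flat" is not documented here, so the comparison of a start and end value and the `tolerance` parameter are assumptions, as are the function and metric names.

```python
def trend_indicator(metric, start, end, tolerance=0.0):
    """Return (arrow, color) for a raw metric's change over the time frame."""
    delta = end - start
    if abs(delta) <= tolerance:
        return ("→", "gray")
    arrow = "↑" if delta > 0 else "↓"
    # Throughput is the only metric where a higher value is better.
    higher_is_better = metric == "throughput"
    improving = (delta > 0) == higher_is_better
    return (arrow, "green" if improving else "red")


print(trend_indicator("latency", 40.0, 25.0))       # lower latency: improving
print(trend_indicator("latency", 25.0, 40.0))       # higher latency: worsening
print(trend_indicator("throughput", 100.0, 150.0))  # higher throughput: improving
```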

Summary of Common Score Characteristics

  • Scores allow fair comparison across disparate destinations.

  • Scores range from 0 to 100 (0 to 100.00 for overall scores), where 100 is best.

  • Scores are either relative or absolute depending on the calculation method.

  • Trends show performance directionality over the time frame.

Scoring End to End

The scoring process in full:

  1. Raw data collection: Tests run continuously, measuring actual performance.

  2. Performance scoring: Normalizes relative performance between providers hourly.

  3. Stability scoring: Measures the daily consistency of each provider individually.

  4. Performance & Stability scoring (if selected): Balances performance and stability to favor strength in both areas.

  5. Overall scoring: Weighted metrics, calculated using either performance scores alone or performance and stability scores, are summed to calculate the overall score.
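The final two steps can be sketched as follows. The exact weights assigned to each priority rank are not given here, so this example assumes simple linear rank weights (for three metrics: 3/6, 2/6, 1/6) and a plain harmonic mean for the combined mode. All names, scores, and the `mode` values are hypothetical.

```python
def harmonic_mean(a, b):
    """Two-value harmonic mean; returns 0.0 when both inputs are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def rank_weights(priority):
    """Assumed weighting: the highest-priority metric gets the largest share."""
    n = len(priority)
    total = n * (n + 1) / 2
    return {metric: (n - i) / total for i, metric in enumerate(priority)}


def overall_score(performance, stability, priority, mode="performance"):
    """Weight each metric's score by priority and sum to a 0-100.00 value."""
    weights = rank_weights(priority)
    total = 0.0
    for metric in priority:
        if mode == "performance_and_stability":
            score = harmonic_mean(performance[metric], stability[metric])
        else:
            score = performance[metric]
        total += weights[metric] * score
    return round(total, 2)


# Hypothetical per-metric scores for one provider, ranked by priority.
priority = ["latency", "loss", "jitter"]
performance = {"latency": 80.0, "loss": 90.0, "jitter": 60.0}
stability = {"latency": 80.0, "loss": 90.0, "jitter": 30.0}

print(overall_score(performance, stability, priority))
print(overall_score(performance, stability, priority, mode="performance_and_stability"))
```

With these inputs, the performance-only mode yields 80.0, while the combined mode yields 76.67 because the weak jitter stability lowers that metric's contribution.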
