# Data Sources and Coverage

## Overview

Provider Intelligence's recommendations are only as good as the data underpinning them. This article explains:

* **Where the data comes from**: ThousandEyes Cloud and Enterprise Agents and third-party provider databases.
* **Coverage gaps**: Why some providers appear with "No Data" and how to interpret this.
* **Agent proxies**: How Cloud and Enterprise Agent data serves as a proxy for end-user experience.

Understanding data sources is critical for evaluating the reliability and objectivity of Provider Intelligence results.

## Primary Data Source

ThousandEyes operates a global network of Cloud and Enterprise Agents: virtual machines deployed in data centers worldwide that continuously run synthetic tests. These tests measure:

* **Network-layer metrics**: Latency, packet loss, jitter.
* **Application-layer metrics**: Time to First Byte (TTFB), HTTP response times, page load times.
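
To make the network-layer metrics concrete, the following is a minimal sketch of how latency, packet loss, and jitter could be derived from one round of probe results. The function name `summarize_probes`, the input shape, and the jitter definition (mean absolute difference between consecutive round-trip times) are assumptions for illustration, not a description of how ThousandEyes computes these values.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one round of probes.

    rtts_ms: list of round-trip times in milliseconds; None marks a lost probe.
    (Hypothetical input shape, chosen for illustration.)
    """
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)

    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Latency: average round-trip time of the probes that did come back.
    latency_ms = mean(received) if received else None

    # Jitter (one common definition, assumed here): mean absolute
    # difference between consecutive received round-trip times.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else None

    return {"latency_ms": latency_ms, "loss_pct": loss_pct, "jitter_ms": jitter_ms}


summary = summarize_probes([20.0, 22.0, None, 21.0, 25.0])
```

In this example, one of five probes is lost, so `loss_pct` is 20.0 and latency is averaged over the four probes that returned.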

### Coverage

Cloud and Enterprise Agents are deployed in hundreds of cities across six continents, providing broad geographic coverage. However, coverage is not universal:

* **High Coverage**: Major cities in North America, Europe, and parts of Asia-Pacific (for example, New York, London, Tokyo).
* **Moderate Coverage**: Secondary cities and emerging markets (for example, Bangalore, São Paulo, Johannesburg).
* **Low Coverage**: Remote or smaller cities might have limited or no agent presence.

To check current coverage:

1. Navigate to **Internet Insights > Catalog Settings > Providers** or **Internet Insights > Catalog Settings > Packages**.
2. Click any package or provider to view its **Coverage Map** (if available).

### Test Frequency and Volume

* **Granularity**: Tests run continuously, with data aggregated at 1-hour intervals.
* **Volume**: Provider Intelligence processes over 8 billion data points daily from Cloud and Enterprise Agent tests.
* **Duration**: Historical data is retained for up to 6 months.
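
As an illustration of 1-hour aggregation, the sketch below groups timestamped samples into UTC hour buckets and averages each bucket. The function name `aggregate_hourly` and the use of a simple mean are assumptions; this article does not state which aggregation function the platform applies.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def aggregate_hourly(samples):
    """Group (timestamp, value) samples into 1-hour buckets and average each.

    samples: iterable of (timezone-aware datetime, float) pairs.
    Returns {hour_start: mean_value}, keyed by the start of each UTC hour.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate the timestamp to the start of its UTC hour.
        hour_start = ts.astimezone(timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        buckets[hour_start].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


samples = [
    (datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), 20.0),
    (datetime(2024, 1, 1, 10, 35, tzinfo=timezone.utc), 30.0),
    (datetime(2024, 1, 1, 11, 15, tzinfo=timezone.utc), 40.0),
]
hourly = aggregate_hourly(samples)
```

Here the two samples taken during the 10:00 hour collapse into a single value, and the 11:00 hour keeps its single sample.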

## Third-Party Data Sources

Not all autonomous systems, each identified by an Autonomous System Number (ASN), represent providers that you can purchase internet services from. Some are:

* **Private ASNs**: Used by individual organizations for internal routing (for example, a bank's internal network).
* **Content Delivery Networks (CDNs)**: Primarily deliver content rather than sell internet access (for example, Cloudflare, Akamai).
* **Hosting Providers**: Might not offer commercial connectivity contracts (for example, AWS, Google Cloud).
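
One narrow, mechanical part of this distinction can be sketched in code: ASNs that fall in the private-use ranges reserved by RFC 6996 never identify a public provider. The helper name `is_private_asn` is hypothetical, and the check covers only the reserved numeric ranges. It cannot tell whether a publicly registered ASN belongs to a CDN, a hosting provider, or an organization that does not sell connectivity.

```python
# Private-use ASN ranges reserved by RFC 6996 (inclusive bounds).
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 16-bit private-use range
    (4200000000, 4294967294),  # 32-bit private-use range
)


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN lies in a reserved private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


# AS13335 is a publicly registered ASN; AS64512 is reserved for private use.
public_example = is_private_asn(13335)
private_example = is_private_asn(64512)
```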

To ensure Provider Intelligence only compares **commercially available providers**, ThousandEyes uses two authoritative databases:

* APNIC (Asia-Pacific Network Information Centre)
* PeeringDB

### APNIC

Website: <https://stats.labs.apnic.net/aspop>

#### What It Provides

APNIC maintains a database of ASNs with metadata including:

* Geographic presence (countries, cities).
* Provider type (ISP, CDN, hosting provider).
* Commercial availability status.

#### How Provider Intelligence Uses It

APNIC data is used to:

1. Identify which ASNs are ISPs vs. other network types.
2. Determine geographic footprint (for example, "Verizon operates in 50 US cities").
3. Filter out private or non-commercial ASNs from user-facing results.
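
The filtering step can be pictured as follows. This is a sketch under stated assumptions: the `AsnRecord` type, its field names, and the `"isp"` label are hypothetical and do not reflect the actual APNIC data format or the internal ThousandEyes pipeline. The ASNs used are from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsnRecord:
    # Hypothetical record shape; field names are illustrative only.
    asn: int
    name: str
    provider_type: str            # for example "isp", "cdn", "hosting", "private"
    commercially_available: bool


def comparable_providers(records):
    """Keep only ISPs that are marked as commercially available."""
    return [
        r for r in records
        if r.provider_type == "isp" and r.commercially_available
    ]


records = [
    AsnRecord(64496, "Example ISP", "isp", True),
    AsnRecord(64497, "Example CDN", "cdn", True),
    AsnRecord(64498, "Example Bank", "private", False),
    AsnRecord(64499, "Example Regional ISP", "isp", False),
]
kept = comparable_providers(records)
```

Only the first record survives: the CDN and the private network are excluded by type, and the regional ISP is excluded because it is not marked as commercially available.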

### PeeringDB

Website: <https://www.peeringdb.com>

#### What It Provides

PeeringDB is a community-maintained database of:

* Internet exchange points (IXPs).
* Peering relationships between ASNs.
* Provider facilities and points of presence (PoPs).

#### How Provider Intelligence Uses It

PeeringDB data is used to:

1. Validate which providers have peering relationships with major cloud providers (for example, AWS, Azure).
2. Identify providers with extensive peering, which often correlates with better performance.
3. Cross-reference APNIC data for accuracy.
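
Cross-referencing two sources can be sketched as a comparison of two lookups keyed by ASN. The function name `cross_reference`, the mapping shape, and the type labels are assumptions for illustration; they are not the schema of either database.

```python
def cross_reference(primary, secondary):
    """Compare two {asn: provider_type} mappings.

    Returns a tuple (agreed, conflicting, unverified):
      agreed      - ASNs where both sources give the same type
      conflicting - ASNs where the sources disagree, with both values
      unverified  - ASNs present only in the primary source
    """
    agreed, conflicting, unverified = {}, {}, set()
    for asn, provider_type in primary.items():
        if asn not in secondary:
            unverified.add(asn)
        elif secondary[asn] == provider_type:
            agreed[asn] = provider_type
        else:
            conflicting[asn] = (provider_type, secondary[asn])
    return agreed, conflicting, unverified


# Hypothetical classifications, using ASNs reserved for documentation.
source_a = {64496: "isp", 64497: "cdn", 64498: "isp"}
source_b = {64496: "isp", 64497: "hosting"}
agreed, conflicting, unverified = cross_reference(source_a, source_b)
```

Entries that conflict or cannot be verified are the ones a review process would need to resolve before they appear in user-facing results.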

### Legal and Attribution

ThousandEyes complies with the licensing terms of both APNIC and PeeringDB:

* **APNIC**: Re-use is permitted with attribution.
* **PeeringDB**: Data is used in accordance with PeeringDB's community guidelines.

## Agent Proxies

### The Proxy Model

Cloud and Enterprise Agents are not end-user devices. They are centralized test infrastructure deployed in data centers. However, they serve as **proxies** for end-user experience because:

* **Location**: Agents are deployed in cities where your end users are located (for example, agents in Austin represent end users in Austin).
* **Provider network path**: Agents send tests through the same provider networks that your end users would use.
* **Synthetic tests**: Tests mimic real user actions (for example, loading a web page, connecting to a cloud service).

### Limitations

* **Last-mile differences**: Cloud Agents are in data centers with high-quality connectivity. Your end users on residential or cellular networks might experience different performance.
* **Device variability**: Agents use standardized configurations. Real users have diverse devices, browsers, and provider network conditions.
* **Geographic granularity**: An agent in "Austin, TX" might not represent performance in specific neighborhoods or suburbs.

## Data Freshness and Updates

### Aggregation Schedule

Provider Intelligence data is updated on a **daily basis**:

1. **Hourly aggregation**: Agents run tests continuously, and the resulting data is aggregated at 1-hour intervals.
2. **Daily aggregation**: At midnight UTC, the platform processes the previous 24 hours of data, calculating scores and trends.
3. **Historical queries**: When you run a query, pre-aggregated scores for your selected time frame (1, 3, or 6 months) are retrieved from the database.

**Implication**: Results can lag real time by up to 24 hours. Provider Intelligence does not provide real-time data (for example, the last hour); for real-time monitoring, use **Network & App Synthetics > Views**.
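
The lag that follows from a midnight UTC aggregation run can be worked out with a short calculation. The sketch below assumes, for simplicity, that aggregation completes instantly at midnight UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone


def latest_aggregation_cutoff(now):
    """Return the most recent midnight UTC at or before `now`."""
    now_utc = now.astimezone(timezone.utc)
    return now_utc.replace(hour=0, minute=0, second=0, microsecond=0)


def data_lag(now):
    """Return how far the newest aggregated data trails behind `now`."""
    return now.astimezone(timezone.utc) - latest_aggregation_cutoff(now)


now = datetime(2024, 3, 15, 18, 30, tzinfo=timezone.utc)
cutoff = latest_aggregation_cutoff(now)
lag = data_lag(now)
```

A query run at 18:30 UTC would therefore see data up to 00:00 UTC that day, a lag of 18 hours and 30 minutes. The lag approaches 24 hours just before the next aggregation run.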

### Data Retention

* **6 months**: Standard retention for all Internet Insights customers.
* **Longer retention**: Contact ThousandEyes Sales to discuss extended retention options.


