Responsiveness (Latency under Load) Tests

The responsiveness test aims to measure the responsiveness of the internet connection under working conditions, also variously referred to as working latency, bufferbloat, or latency under load. Specifically, our responsiveness test attempts to measure the queuing latency under network congestion.

The methodology is based on the “Responsiveness Under Working Conditions” draft by the IETF IP Performance Measurement (IPPM) working group.

Test Concept

The core idea of the test is to make many latency probes (round-trip time measurements) of different kinds in both unloaded and loaded (“working”) conditions.

The loaded conditions consist of as much throughput as possible being sent through a number of TCP connections in order to create network congestion and fill intermediate network queues/buffers. The generated load is either all in the uplink direction, or all in the downlink direction.
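
As a rough illustration of this concept only, the Go sketch below opens a fixed number of uplink load-generating TCP connections while a separate loop makes simple latency probes in parallel. The server address, connection count, durations, and the use of connection-establishment time as the probe are hypothetical simplifications; the actual probe classes are described later in this document.

```go
// Conceptual sketch only: a few TCP connections push data upstream as fast
// as possible while a separate loop takes periodic latency probes.
// testServerAddr is a hypothetical placeholder, not the real test service.
package main

import (
	"fmt"
	"net"
	"time"
)

const testServerAddr = "test.example.net:8080" // hypothetical

// generateUplinkLoad writes a constant buffer to one connection until told to stop.
func generateUplinkLoad(done <-chan struct{}) {
	conn, err := net.Dial("tcp", testServerAddr)
	if err != nil {
		return
	}
	defer conn.Close()
	buf := make([]byte, 64*1024) // payload contents are irrelevant
	for {
		select {
		case <-done:
			return
		default:
			if _, err := conn.Write(buf); err != nil {
				return
			}
		}
	}
}

// probeLatency uses TCP connection establishment time as a very simple probe.
func probeLatency() (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", testServerAddr, 5*time.Second)
	if err != nil {
		return 0, err
	}
	conn.Close()
	return time.Since(start), nil
}

func main() {
	done := make(chan struct{})
	for i := 0; i < 4; i++ { // fixed number of load-generating connections
		go generateUplinkLoad(done)
	}
	deadline := time.After(10 * time.Second) // simplified "working" period
	for {
		select {
		case <-deadline:
			close(done)
			return
		case <-time.After(100 * time.Millisecond):
			if rtt, err := probeLatency(); err == nil {
				fmt.Println("probe RTT under load:", rtt)
			}
		}
	}
}
```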

Test Sequence

The test sequence has three phases:

  1. Pre-test Latency Probing

  • During this phase, the test makes regular latency measurements to establish the baseline latency from the agent to the test server. After taking measurements for a fixed amount of time, the test moves on to the next phase.

  2. Warm-Up

  • The agent communicates with the test server via a TCP connection and starts a test “session” on a control connection, which remains otherwise unused until the end of the test.

  • The agent then establishes a fixed number of TCP connections to the test server and starts either sending or receiving data as fast as possible on each. The total throughput across all connections is monitored and when it reaches a condition of stability, the warm-up is terminated, and the test moves on to the next phase.

  • The conditions of stability can differ according to the test parameters (a sketch of one such check is shown below):

    • The total throughput across all connections as measured by the agent having variance of less than 15% over the last second.

    • The total throughput across all connections as measured by the receiver (which is the server in the case of an upload test) having variance of less than 15% over the last second.

    • The total throughput across all connections as measured by the agent having variance of less than 15% over the last second, and, in addition, the last four values of the median round-trip time (each aggregated over a 250 ms window) having variance of less than 15%.

  • Based on a test parameter, the test can either fail or continue if the warm-up stage does not reach stability within a given time limit.
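
A minimal sketch of such a stability check follows. The test’s precise definition of “variance” is not specified here; treating it as the spread of the most recent total-throughput samples relative to their mean is an assumption made for illustration.

```go
// Hypothetical warm-up stability check: returns true when the most recent
// second of total-throughput samples varies by less than 15% relative to
// their mean. The interpretation of "variance" is an assumption.
package warmup

// isStable reports whether the supplied throughput samples (e.g. one sample
// per 100 ms over the last second, in bits per second) are stable.
func isStable(samplesBitsPerSec []float64) bool {
	if len(samplesBitsPerSec) == 0 {
		return false
	}
	min, max, sum := samplesBitsPerSec[0], samplesBitsPerSec[0], 0.0
	for _, s := range samplesBitsPerSec {
		if s < min {
			min = s
		}
		if s > max {
			max = s
		}
		sum += s
	}
	mean := sum / float64(len(samplesBitsPerSec))
	if mean == 0 {
		return false
	}
	return (max-min)/mean < 0.15
}
```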

  3. Working Latency Measurement

  • During this phase, the high-load traffic is maintained by the agent and server, whilst a large number of latency measurements are made.

  • Once the requested test time window has elapsed, the load-generating connections close and the latency measurements stop.

  • The agent then uses the control connection to download the server’s measured results for the test session. These include the transfer counters over time (in order to compute the received speed for upload tests) and the estimated TCP round-trip time (RTT) of the load-generating connections over time (in order to have valid RTT estimates for download tests, as only the sending side can observe the RTT).

  • The agent then computes a variety of summary statistics from the collected data on both the server and agent and outputs them for collection.
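
As an illustration of the kind of summary statistics involved (the exact percentile method used by the agent is not specified here, so nearest-rank is an assumption), a small Go sketch:

```go
// Illustrative summary statistics over a set of latency samples in
// milliseconds. Nearest-rank percentiles are an assumption.
package stats

import (
	"math"
	"sort"
)

func summarize(samplesMs []float64) map[string]float64 {
	if len(samplesMs) == 0 {
		return nil
	}
	s := append([]float64(nil), samplesMs...)
	sort.Float64s(s)
	percentile := func(p float64) float64 {
		idx := int(math.Ceil(p/100*float64(len(s)))) - 1
		if idx < 0 {
			idx = 0
		}
		return s[idx]
	}
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	return map[string]float64{
		"min": s[0], "p25": percentile(25), "p50": percentile(50),
		"p75": percentile(75), "p90": percentile(90), "p95": percentile(95),
		"p99": percentile(99), "max": s[len(s)-1],
		"mean": sum / float64(len(s)),
	}
}
```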

Latency Probes

Each kind of latency probe under a certain condition is referred to as a “class” of latency probe.

Latency probes on load-generating connections are one class and are measured as estimates of the TCP connection round-trip time. These are computed from the time between a segment being transmitted and the ACK (acknowledgement) being received. Notably, these estimates do not include the contribution of dropped packets and retransmissions.

Latency probes on separate, dedicated connections are another class, and are measured by making an HTTP request to a server with a known short response body. As an application-layer end-to-end measurement, this does include the contribution of any dropped packets and retransmissions, as well as a small contribution of processing time from the server application (which is negligible compared to practical latency values between agents and servers). The TCP connection establishment time and the application (HTTP) round-trip time are recorded and reported separately.
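
A minimal sketch of a dedicated-connection probe of this kind is shown below. The endpoint names are hypothetical placeholders, and handing a pre-established connection to Go’s HTTP client is just one way of separating the two timings.

```go
// Sketch of a dedicated-connection latency probe. The endpoint names are
// hypothetical; TCP connection establishment and the HTTP round trip are
// timed separately.
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

func probe(probeURL, hostPort string) (connect, appRTT time.Duration, err error) {
	// Time TCP connection establishment on its own.
	start := time.Now()
	conn, err := net.DialTimeout("tcp", hostPort, 5*time.Second)
	if err != nil {
		return 0, 0, err
	}
	connect = time.Since(start)

	// Hand the already-established connection to an HTTP client, so that the
	// request timing excludes connection setup.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return conn, nil
			},
			DisableKeepAlives: true,
		},
	}
	start = time.Now()
	resp, err := client.Get(probeURL)
	if err != nil {
		return connect, 0, err
	}
	io.Copy(io.Discard, resp.Body) // known short response body
	resp.Body.Close()
	appRTT = time.Since(start)
	return connect, appRTT, nil
}

func main() {
	c, a, err := probe("http://probe.example.net/small", "probe.example.net:80")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("connection establish time: %v, application RTT: %v\n", c, a)
}
```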

The different classes measured are shown in the table below.

|  | Unloaded connection | Loaded connection (“working conditions”) |
| --- | --- | --- |
| Load-generating connections | (N/A) | Connection round-trip time |
| Separate, dedicated connections | Connection establish time, application round-trip time | Connection establish time, application round-trip time |

The connection round-trip time of the load-generating connections is referred to as the “in-band” latency. It captures the effect of large buffers on the transmission path causing an excessively large send window within a single connection. For example, if two seconds of video streaming data is already sitting in transmission queues when the user moves the video cursor, it will take a minimum of two seconds to drain that data before any new video content can arrive.

The use of separate, dedicated connections with different source port numbers captures the effect of one application’s internet traffic on another, for example, a user carrying out a bulk transfer while making a video call. The amount of queuing latency incurred in probes of this kind depends on the flow-queuing method being performed on the bottleneck link.

Being able to measure and report on these two different classes of queuing latency is one of the key benefits of this test.

Test Outputs

The key outputs from this test are:

  • Estimated responsiveness value from the IETF draft in “RPM” (round-trips per minute).

  • Throughput of all load-generation threads during the latency measurement period.

  • For loaded conditions, the TCP round-trip time measurements of the load generation connections (min, 25/50/75/90/95/99th percentiles, max, mean).

For both loaded and unloaded network conditions:

  • TCP connection establishment time (min, 25/50/75/90/95/99th percentiles, max, mean) for a dedicated connection latency probe.

  • Application data round-trip time (min, 25/50/75/90/95/99th percentiles, max, mean) for a dedicated connection latency probe.

Extensions to IETF IPPM Responsiveness Draft

Quiescent Connection Latency Measurement

The IETF draft was written with user-triggered testing on a consumer device (such as a laptop, desktop, or smartphone) in mind. Our test runs in the context of the Device Agent (in this case, a router), which monitors for cross-traffic and therefore ensures the test only runs while the local network is not otherwise in use.

This means that our responsiveness test can collect baseline latency measurements to the same test server using the same protocol as under the loaded test condition. This allows insight into not only the responsiveness under working conditions, but also the increase in latency when the line is saturated.

Static Thread-Count Ramp-Up

The IETF draft was written to define a test procedure that would saturate network conditions for purposes of measuring responsiveness without any knowledge of the network that the device is operating on. For that reason, the draft defines a ramp-up procedure that progressively adds threads until the throughput and latency no longer increase.

The Device Agent runs tests according to a schedule set centrally, with knowledge of the device's ISP, package, and access technology. For this reason, the number of threads can be chosen in advance so that ramp-up completes as quickly as possible, reducing the load on the test servers.

This approach also mitigates instability caused by dynamically adding connections or worker threads on the relatively constrained computing environments that host the device agent.

Copying Reduction

Many devices on which the agent runs (such as home Wi-Fi routers) are cost-sensitive devices, designed to optimize the performance of routing traffic. As such, they are not optimized for sending traffic through TCP connections which terminate on the device itself – the application CPU, memory bus, or other components can become a bottleneck.

To make sure that our test can saturate the network link on as many devices as possible, we make several optimizations to minimize the processing of data by the test application. In the case of downlink tests, the actual application data sent over the load-generating threads is not copied out of the network stack. In the case of uplink tests, the application data that is sent is a constant buffer to avoid having to load or generate data on the fly.
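
As a simplified illustration of this approach (not the agent’s actual implementation, which additionally avoids copying received data out of the network stack), a Go sketch:

```go
// Simplified illustration of the data-handling approach: uplink traffic is
// sent from one constant buffer that is never regenerated, and downlink
// traffic is discarded as it arrives. The production agent goes further and
// avoids copying received data out of the network stack; that
// platform-specific optimization is not shown here.
package loadgen

import (
	"io"
	"net"
)

var uplinkPayload = make([]byte, 256*1024) // constant buffer, contents irrelevant

// uplinkLoad repeatedly writes the same constant buffer to the connection.
func uplinkLoad(conn net.Conn) {
	for {
		if _, err := conn.Write(uplinkPayload); err != nil {
			return
		}
	}
}

// downlinkLoad discards everything received; io.Discard reuses a small
// internal scratch buffer rather than accumulating the data.
func downlinkLoad(conn net.Conn) {
	io.Copy(io.Discard, conn)
}
```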

This means our test cannot run a higher-level application protocol such as HTTP/2 or TLS, as specified in the IETF draft, to enable multiplexed application latency probes. As a result, we do not directly measure:

  • The application layer round-trip time (RTT).

  • The TLS session establishment time.

As a proxy for the application layer RTT, we use the TCP stack estimate of the RTT. This estimate is computed by recording the average time between a TCP segment being transmitted and the sender receiving the acknowledgement. Because not every TCP segment is acknowledged, and because the estimate is averaged over time, this does not accurately capture the effect of individual packet losses and retransmissions in the same way as the IETF draft specifies; however, it is the best available estimate.
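
One way to read this kernel RTT estimate on Linux is via the TCP_INFO socket option; the sketch below shows that approach, though the agent's exact mechanism may differ.

```go
// Linux-only sketch: read the kernel's smoothed TCP RTT estimate for a
// connection via the TCP_INFO socket option. This is one way to obtain an
// in-band RTT for a load-generating connection without an application-layer
// protocol; it is not necessarily how the agent does it.
package tcprtt

import (
	"net"
	"time"

	"golang.org/x/sys/unix"
)

// kernelRTT returns the kernel's current smoothed RTT estimate for conn.
func kernelRTT(conn *net.TCPConn) (time.Duration, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	var rtt time.Duration
	var infoErr error
	err = raw.Control(func(fd uintptr) {
		info, e := unix.GetsockoptTCPInfo(int(fd), unix.IPPROTO_TCP, unix.TCP_INFO)
		if e != nil {
			infoErr = e
			return
		}
		rtt = time.Duration(info.Rtt) * time.Microsecond // tcpi_rtt is in microseconds
	})
	if err != nil {
		return 0, err
	}
	if infoErr != nil {
		return 0, infoErr
	}
	return rtt, nil
}
```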

Because the TLS session establishment times are undefined, we cannot use the exact formula from the IETF draft for the responsiveness value in RPM. However, the spirit of the draft is maintained: we still use the average of the trimmed means for each available class of latency probe under loaded conditions.
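
A sketch of that calculation follows. The trim fraction shown (discarding the top and bottom 5% of samples) is an assumption made for illustration, not necessarily the value used in practice.

```go
// Sketch of the modified responsiveness calculation: take a trimmed mean of
// each available probe class under load, average those means, and convert
// to round-trips per minute. The 5% trim fraction is an assumption.
package rpm

import "sort"

// trimmedMean discards the top and bottom trimFrac of the sorted samples
// and returns the mean of the remainder (samples in milliseconds).
func trimmedMean(samplesMs []float64, trimFrac float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	s := append([]float64(nil), samplesMs...)
	sort.Float64s(s)
	drop := int(float64(len(s)) * trimFrac)
	s = s[drop : len(s)-drop]
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	return sum / float64(len(s))
}

// responsivenessRPM averages the trimmed means of the available probe
// classes (e.g. in-band TCP RTT, dedicated-connection establish time,
// dedicated-connection application RTT) and converts to round-trips/minute.
func responsivenessRPM(classes ...[]float64) float64 {
	if len(classes) == 0 {
		return 0
	}
	total := 0.0
	for _, c := range classes {
		total += trimmedMean(c, 0.05)
	}
	avgMs := total / float64(len(classes))
	return 60000.0 / avgMs
}
```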

Responsiveness Example

We used data from our responsiveness (latency under load) test to show how a relatively unknown but common cause of high latency, bufferbloat, can badly affect video streaming, online gaming, and teleconferencing.
