Capacity Planning

Network teams periodically review network performance, including whether there is enough bandwidth to serve user needs. Increasing capacity takes several months of advance planning. The WAN Insights capacity planning tool lets you quickly identify circuits that are trending toward saturation, long before they reach it.

If the team sees sustained load above its own user-defined safety margins, it can initiate an internal process to upgrade circuit capacities with its service providers. Users may also complain about poor application performance, in which case network administrators can review bandwidth utilization as part of a broader root-cause analysis.

How Capacity Planning Works

  • Capacity data is ingested into WAN Insights from vManage. Alternatively, you can bulk upload or manually edit the maximum bandwidth directly in WAN Insights. This value should be the maximum capacity of each circuit, typically configured using a bandwidth command on the SD-WAN routers.

  • Bandwidth usage data is collected in 10-minute increments, and is displayed in WAN Insights in a series of data rollups.

  • On the top rollup, a capacity list screen prioritizes routers and interfaces that are closest to saturation. This list shows all routers across your entire SD-WAN.

    • Ingress and egress saturation are color-coded according to the warning thresholds, highlighting the routers that are closest to saturation.

    • Potentially, any line item that sees at least one 10-minute period close to saturation in the past 3 months is flagged, although you can adjust the percentiles to exclude extreme outliers.

  • A detail view shows the past 3 months of calendar data with a drill-down to 10-minute increments over a 24-hour period. Saturation for ingress and egress traffic is shown separately. The detail view shows patterns of saturation over an extended period of days or weeks.

  • The detail view also lists the “top talkers”, which are application categories (application lists in vManage) consuming the most bandwidth.

Frequency of vManage updates: Capacity data that changes in vManage is refreshed in WAN Insights once every 24 hours, unless you have edited it in WAN Insights. Modifications made in WAN Insights override any changes in vManage.

No individual users: Although you can see the number of users per application and site on a timeline in another screen in Site Details, you can’t see who is using the bandwidth.

When Saturation Exceeds Capacity

At times the capacity summary list screen might show saturation levels several orders of magnitude over capacity, which is logically impossible. This condition is caused by incorrect bandwidth data. A warning message appears, prompting you to verify the original capacity settings. Depending on where your capacity data came from, correct the capacity either in WAN Insights or in vManage.

Note that some over-saturation is possible. If the utilization value is 3000%, the bandwidth setting is most likely incorrect. However, a circuit saturation of 120% could be legitimate. While every circuit has a stated capacity, traffic can occasionally exceed it slightly if the service provider allows.
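The difference between a plausible burst and a misconfigured capacity can be illustrated with a short sketch. The function names, the Mbps unit, and the 1000% cutoff are assumptions made for illustration; they are not taken from WAN Insights, which may apply a different rule when it raises its warning.

```python
def saturation_pct(usage_mbps: float, capacity_mbps: float) -> float:
    """Return utilization as a percentage of the configured capacity."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * usage_mbps / capacity_mbps


def looks_misconfigured(pct: float, cutoff_pct: float = 1000.0) -> bool:
    """Flag values so far above 100% that the capacity setting is suspect.

    cutoff_pct is a hypothetical value chosen for illustration only.
    """
    return pct > cutoff_pct


# A 100 Mbps circuit briefly carrying 120 Mbps: a plausible burst.
burst = saturation_pct(120, 100)
# The same traffic against a capacity mistakenly entered as 4 Mbps.
suspect = saturation_pct(120, 4)

print(burst, looks_misconfigured(burst))
print(suspect, looks_misconfigured(suspect))
```

In this sketch the first circuit reports 120% and is left alone, while the second reports 3000% and is flagged for a capacity review.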

When Capacity Data is Missing

If any capacity data is missing, the capacity screens won’t show saturation levels, and a warning message prompts you to upload or enter the missing capacity data.

User-Defined Capacity Settings

The Settings tab on the capacity screen lets you enter or modify circuit capacities, restrict monitoring to office hours only, define usage aggregation percentiles for the capacity summary page, and set warning thresholds that affect the color-coded severity levels on the list and detail windows.

Where Does Bandwidth Information Come From?

To judge whether a circuit is saturated, WAN Insights needs the circuit's maximum capacity. This data can reach WAN Insights in one of the following ways:

  • Automatically ingested from vManage.

  • Bulk uploaded as a CSV file in WAN Insights. You can download the current settings, make the desired changes or additions using the ingressOverrideCapacity and egressOverrideCapacity columns, and then re-upload the edited CSV file.

Missing capacity data is visually flagged on the WAN Insights user interface.

The CSV file contains the following columns:

  • hostname

  • systemIp

  • country

  • city

  • interfaceId

  • ingressVManageCapacity - WAN Insights populates this column with the ingress capacity obtained from your vManage configuration (if any). Do not modify this column.

  • egressVManageCapacity - WAN Insights populates this column with the egress capacity obtained from your vManage configuration (if any). Do not modify this column.

  • ingressOverrideCapacity - Use this column to enter an override value for incoming traffic capacity.

  • egressOverrideCapacity - Use this column to enter an override value for outgoing traffic capacity.
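The download, edit, and re-upload cycle can be scripted. The following is a minimal sketch using Python's standard csv module. The column names come from the list above; the sample rows, the interface names, and the capacity values (including their unit and format) are made up for illustration, so check a real downloaded file before relying on them.

```python
import csv
import io

# Made-up sample standing in for a downloaded settings file.
downloaded = """hostname,systemIp,country,city,interfaceId,ingressVManageCapacity,egressVManageCapacity,ingressOverrideCapacity,egressOverrideCapacity
branch-01,10.0.0.1,US,Austin,ge0/0,100,100,,
branch-02,10.0.0.2,FR,Paris,ge0/1,,,,
"""

# Hypothetical overrides keyed by (hostname, interfaceId): (ingress, egress).
overrides = {("branch-02", "ge0/1"): ("200", "50")}

reader = csv.DictReader(io.StringIO(downloaded))
rows = list(reader)
for row in rows:
    key = (row["hostname"], row["interfaceId"])
    if key in overrides:
        # Only the override columns are touched; vManage columns stay as-is.
        row["ingressOverrideCapacity"], row["egressOverrideCapacity"] = overrides[key]

# Write the edited rows back out with the original column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
edited_csv = out.getvalue()
print(edited_csv)
```

With a real file, replace the in-memory strings with open() calls on the downloaded and edited files.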

Office Hours

The Time Window setting lets you specify whether to consider only office hours or to flag anything in a 24-hour period. Office hours are preset as 9am to 5pm in the local time zone of the router.
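To show how an office-hours filter of this kind behaves, here is a small sketch. It assumes a fixed UTC offset stands in for the router's time zone and that the window is half-open (9:00 inclusive, 17:00 exclusive). Neither detail is stated above, and the function name is hypothetical. The sketch also ignores weekends, which this section does not address.

```python
from datetime import datetime, time, timedelta, timezone

OFFICE_START = time(9, 0)   # 9am, from the text
OFFICE_END = time(17, 0)    # 5pm, from the text


def in_office_hours(ts_utc: datetime, router_utc_offset_hours: float) -> bool:
    """True if a UTC timestamp falls within 9am-5pm at the router's location.

    Assumptions: a fixed UTC offset approximates the router's time zone,
    and the window is half-open, [09:00, 17:00).
    """
    local_tz = timezone(timedelta(hours=router_utc_offset_hours))
    local = ts_utc.astimezone(local_tz)
    return OFFICE_START <= local.time() < OFFICE_END


sample = datetime(2024, 3, 4, 15, 30, tzinfo=timezone.utc)
print(in_office_hours(sample, 0))   # 15:30 local time
print(in_office_hours(sample, 9))   # 00:30 local time on the next day
```

The same 10-minute measurement counts for a router at UTC+0 but is excluded for a router at UTC+9, because the check uses each router's local clock.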

Measurement and Aggregation

This section explains how the rollups on the main capacity screen are obtained. Initial bandwidth measurements occur every 10 minutes.

On the summary list screen, the 10-minute measurements for each router and interface have to be aggregated into a single number, which then determines that item’s position in the list. WAN Insights lets you summarize this metric by its peak value. If every single measurement is taken into account, this method prioritizes the most acute 10-minute interval within the last 3 months, regardless of whether it was a one-time occurrence or a chronic problem. To avoid this, you can exclude isolated outliers with user-defined percentiles.

To limit the impact of outliers, you can set the aggregation level on the capacity Settings tab using the following percentile values:

  • 95% - uses a value from the 10-minute time series such that 95% of the data points are less than or equal to it.

  • 98% - uses a value from the 10-minute time series such that 98% of the data points are less than or equal to it.

  • 100% - summarizes the entire series of 10-minute data points by its maximum.
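The effect of the three settings can be seen in a short sketch. It uses the nearest-rank percentile, which matches the definition above (the smallest value such that the given share of data points is less than or equal to it). Whether WAN Insights uses this exact method or an interpolating one is not stated here, so treat the function and the sample data as illustrative assumptions.

```python
def aggregate(samples, percentile):
    """Nearest-rank percentile of a series of utilization samples.

    Returns the smallest sample such that at least `percentile` percent
    of all samples are less than or equal to it. `percentile` is an
    integer such as 95, 98, or 100.
    """
    if not samples:
        raise ValueError("no samples")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percentile * n / 100, avoiding float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up series: 97 quiet intervals at 40% plus three busier ones.
usage = [40] * 97 + [70, 75, 100]

print(aggregate(usage, 95))
print(aggregate(usage, 98))
print(aggregate(usage, 100))
```

For this series, the 100% setting reports the single 100% spike, the 98% setting reports 70%, and the 95% setting reports the 40% baseline, which shows how lower percentiles discount isolated peaks.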

Warning Thresholds

You can specify your own warning thresholds for moderate and severe saturation. The defaults are 80% for moderate and 85% for severe. Because of long planning cycles, however, some network teams might set a warning threshold as low as 60% saturation. These thresholds affect the color coding and heatmap displays in the capacity list and detail screens.
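As a rough illustration of how two thresholds map a saturation value to a severity level, consider the sketch below. The defaults of 80% and 85% come from the text. The level names and the choice to treat a value equal to a threshold as having reached it are assumptions.

```python
def severity(pct: float, moderate: float = 80.0, severe: float = 85.0) -> str:
    """Classify a saturation percentage against two warning thresholds.

    Assumption: a value equal to a threshold counts as having reached it.
    The returned labels are hypothetical names, not product terms.
    """
    if moderate > severe:
        raise ValueError("moderate threshold must not exceed severe threshold")
    if pct >= severe:
        return "severe"
    if pct >= moderate:
        return "moderate"
    return "ok"


print(severity(70))                             # default thresholds
print(severity(82))
print(severity(70, moderate=60, severe=75))     # earlier warning for long planning cycles
```

Lowering the moderate threshold to 60% makes the same 70% circuit show a warning, which gives a team with a long upgrade cycle more lead time.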
