WAN Insights Introductory Tour, Part 3
In the previous parts of our WAN Insights introductory tour, we focused on a proactive approach: review the top recommendations, choose one recommendation to review more closely, and drill down past the aggregated quality data to isolate poorly performing circuits and endpoint pairs. In this section, we’ll show you how to use WAN Insights to respond to user-reported quality issues.
A troubleshooting scenario seeks to resolve user complaints quickly by isolating the root cause and fixing the issue. We’re identifying sites with the lowest quality and highest user impact, regardless of whether WAN Insights has generated a recommendation or not.
Keep in mind also that:
Recommendations are only generated when there is a better path available.
Sometimes even on the recommended path, the quality is not optimal.
If it’s a recent event, users might be complaining even when there’s no recommendation in place. This troubleshooting workflow allows you to quickly identify sites with low quality, and troubleshoot those sites before the issue becomes worse.
An initial user complaint can be vague. For example, users at a particular site could report that "Voice seems bad," or "web applications are slower than normal." The first question is: is it the application itself, or is it the network? You can use WAN Insights to confirm or rule out the network as the cause of the user problem. Be sure to add your business-critical applications to WAN Insights as described in Adding Business-Critical Applications to WAN Insights.
Start by evaluating network path quality in the context of the quality thresholds that are defined per application (or per application class) in its SLA. As described in Understanding Quality, quality is an independent measure of network health. This means that quality always exists, even when there are no recommendations or no application traffic.
If any underlying circuits are consistently missing their SLA quality thresholds, the next step would be to reach out to the service providers who are responsible for those circuits.
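As a rough illustration of what "consistently missing SLA quality thresholds" means, the following sketch compares path samples against per-application thresholds for loss, latency, and jitter, and reports the fraction of samples each circuit misses. This is not a ThousandEyes API: the class names, field names, and threshold values are all hypothetical and would need to be replaced with your own SLA definitions and exported measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaThresholds:
    # Hypothetical per-application SLA limits; use your own values.
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class PathSample:
    # One measurement of a circuit at a point in time (hypothetical shape).
    circuit: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def meets_sla(sample: PathSample, sla: SlaThresholds) -> bool:
    """A sample meets the SLA only if all three measures are within limits."""
    return (
        sample.loss_pct <= sla.max_loss_pct
        and sample.latency_ms <= sla.max_latency_ms
        and sample.jitter_ms <= sla.max_jitter_ms
    )


def sla_miss_rate(samples, sla):
    """Return {circuit: fraction of its samples that violate the SLA}."""
    totals = {}
    misses = {}
    for s in samples:
        totals[s.circuit] = totals.get(s.circuit, 0) + 1
        if not meets_sla(s, sla):
            misses[s.circuit] = misses.get(s.circuit, 0) + 1
    return {circuit: misses.get(circuit, 0) / n for circuit, n in totals.items()}


# Example with made-up numbers for a voice-like application class.
voice_sla = SlaThresholds(max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
samples = [
    PathSample("mpls", 0.1, 40.0, 5.0),
    PathSample("mpls", 0.2, 45.0, 6.0),
    PathSample("biz-internet", 2.5, 80.0, 10.0),   # loss too high
    PathSample("biz-internet", 0.5, 200.0, 12.0),  # latency too high
    PathSample("biz-internet", 0.3, 60.0, 8.0),    # within limits
    PathSample("biz-internet", 0.4, 70.0, 40.0),   # jitter too high
]
miss_rates = sla_miss_rate(samples, voice_sla)
```

A circuit with a persistently high miss rate over several days is a reasonable candidate for a conversation with its service provider; what counts as "high" is a judgment call for your team.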
Let’s say that your network team support queue includes a report of Office 365 problems at site XYZ. In ThousandEyes, choose WAN Insights > Site Details. This screen shows site quality over the last 7 days, for all sites and application categories.
By default, this list is sorted by highest impact. A greater quality improvement that affects a greater number of active users translates into a greater overall impact.
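To make the idea of impact concrete, here is a minimal sketch that ranks site rows by a simple score. The exact formula WAN Insights uses is not stated here, so the product of quality improvement and active users is purely an assumption for illustration, and the `SiteRow` fields are hypothetical.

```python
from typing import NamedTuple


class SiteRow(NamedTuple):
    site: str
    app_category: str
    quality_gain: float  # hypothetical: estimated quality improvement, 0.0 to 1.0
    active_users: int


def impact_score(row: SiteRow) -> float:
    # Assumption: impact grows with both the improvement and the user count.
    return row.quality_gain * row.active_users


def sort_by_impact(rows):
    """Return rows ordered from highest to lowest assumed impact."""
    return sorted(rows, key=impact_score, reverse=True)


rows = [
    SiteRow("ABC", "Voice", 0.50, 100),
    SiteRow("XYZ", "Office 365", 0.20, 500),
    SiteRow("DEF", "Office 365", 0.05, 1500),
]
ranked = sort_by_impact(rows)
```

With these made-up numbers, a moderate improvement for 500 users outranks a large improvement for 100 users, which matches the intuition that both factors matter.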
Each row represents the quality for a particular application category and site, along with the estimated number of impacted users. The word “Available” appears if that site and application category has an active recommendation. The arrow on the left of each row expands the row to show the quality and number of users for each interface pair at the site.
Clicking on a row opens the troubleshooting details page, which is almost the same as the recommendation details page. You can use the timelines to visually confirm when the problem began, because users reporting problems might not have accurate information.
Some patterns to look for:
Is there a correlation between quality and the number of impacted users?
Is this a one-off problem, or a recurring problem?
Is the problem constant or intermittent?
Which path(s) are most affected?
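If you export or transcribe a timeline as a series of good and bad intervals, the questions above can be approximated with a few lines of code. The sketch below is only an illustration: the timeline format, the function names, and the 80% cutoff for "constant" are assumptions, not part of WAN Insights.

```python
def bad_episodes(timeline):
    """Return (start_index, length) for each run of consecutive bad intervals.

    `timeline` is a sequence of booleans, True meaning quality was below
    threshold in that interval (hypothetical format).
    """
    episodes = []
    start = None
    for i, bad in enumerate(timeline):
        if bad and start is None:
            start = i
        elif not bad and start is not None:
            episodes.append((start, i - start))
            start = None
    if start is not None:
        # The timeline ended while still in a bad run.
        episodes.append((start, len(timeline) - start))
    return episodes


def describe_pattern(timeline, constant_fraction=0.8):
    """Label a timeline as constant, one-off, intermittent, or no problem."""
    if not timeline:
        return "no data"
    episodes = bad_episodes(timeline)
    if not episodes:
        return "no problem"
    bad_fraction = sum(length for _, length in episodes) / len(timeline)
    if bad_fraction >= constant_fraction:
        return "constant"
    if len(episodes) == 1:
        return "one-off"
    return "intermittent"


one_off = describe_pattern([False, True, True, False, False])
intermittent = describe_pattern([True, False, True, False, True, False])
constant = describe_pattern([True] * 9 + [False])
```

Running the same check per path also gives a simple way to see which paths are most affected.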
The next step might be to determine who or what is responsible for this problem. For example, a long-term issue with bronze or biz-internet circuits could be a problem on the provider’s side, or it could simply be that the paths are overloaded and need more bandwidth. Drill down to the endpoint pairs to better understand the path degradation in terms of its main SLA measures (loss, latency, and jitter).
After examining the raw quality measures, the next step is to determine what can be done. For example:
If there is a recommendation, the improvement is significant, and the recommendation affects a large number of users, you can make the recommended path changes and see if things improve over the next few days.
If the issue is persistent on the same interface, you can discuss the situation with your service provider to ask if they can improve service quality.
If no alternative paths are available, consider what changes could be made in your network topology.
You can leverage ThousandEyes network and Internet intelligence to gain even more insight into factors affecting your network – particularly those portions of the network path that you don’t own.
In addition to looking into individual recommendations, you might want to investigate circuit utilization to look for saturation. If you’re experiencing poor performance, for example, across numerous biz-internet interfaces, you can visit the Capacity screen and compare utilization with the application traffic volumes shown on the Site Detail or Recommendation Detail screens. See Capacity Planning Screen and Capacity Detail Modal for more information.
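The saturation check described above comes down to comparing observed throughput with circuit capacity. The following sketch shows that arithmetic under stated assumptions: the interface names, the throughput and capacity figures, and the 80% threshold are all hypothetical, and the data would have to come from your own export of the Capacity screen.

```python
def utilization_pct(throughput_mbps: float, capacity_mbps: float) -> float:
    """Utilization as a percentage of circuit capacity."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * throughput_mbps / capacity_mbps


def saturated_interfaces(interfaces, threshold_pct=80.0):
    """Return the names of interfaces at or above the utilization threshold.

    `interfaces` maps a name to (peak_throughput_mbps, capacity_mbps).
    The 80% default is an illustrative assumption, not a product setting.
    """
    return sorted(
        name
        for name, (peak, capacity) in interfaces.items()
        if utilization_pct(peak, capacity) >= threshold_pct
    )


interfaces = {
    "biz-internet-1": (95.0, 100.0),  # 95% utilized
    "biz-internet-2": (40.0, 100.0),  # 40% utilized
    "mpls-1": (160.0, 200.0),         # 80% utilized
}
hot = saturated_interfaces(interfaces)
```

If the interfaces flagged here are the same ones showing poor quality, saturation is a likely contributor and additional bandwidth may help more than a path change.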