From 00:40 to 08:25 UTC on June 23, 2023, users connected to the Zscaler DUS1 data center through Deutsche Telekom (DT) as their local ISP reported slow network performance. The issue quickly gained attention, leading to an influx of support tickets from customers.
Zscaler customers could see the issue in the ZDX heatmap below and quickly drill into the impacted users. Zooming into Germany, we see that some users in the country were experiencing an issue, while the majority were not impacted.
ZDX Dashboard indicates the most impacted locations
With ZDX, customers can proactively identify service issues and quickly isolate them, giving IT teams confidence in the root cause while reducing mean time to resolve (MTTR) and mean time to detect (MTTD).
ZDX Score highlights Deutsche Telekom ISP slowness
A ZDX Score represents all users in an organization, across all applications, locations, and cities. You can see the score on the ZDX Admin Portal dashboard. The ZDX Score is reported on a scale of 0 to 100, with the low end indicating a poor user experience, and it adjusts to the time period and filters selected in the dashboard.
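To make the idea of a "poor experience range" concrete, here is a minimal sketch that maps a score on the 100-point scale to a named band. The band names and the thresholds `POOR_MAX` and `OKAY_MAX` are illustrative assumptions, not values taken from this article; check the ZDX documentation for the ranges your tenant uses.

```python
# Illustrative only: these thresholds are assumptions, not values from this article.
POOR_MAX = 33   # hypothetical upper bound of the "poor" band
OKAY_MAX = 65   # hypothetical upper bound of the "okay" band


def score_band(score: int) -> str:
    """Map a score on the 100-point scale to a coarse experience band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score <= POOR_MAX:
        return "poor"
    if score <= OKAY_MAX:
        return "okay"
    return "good"


if __name__ == "__main__":
    # Made-up sample scores, purely for demonstration.
    for sample in (20, 50, 90):
        print(sample, score_band(sample))
```

Keeping the thresholds as named constants makes it easy to swap in whatever ranges your own dashboard reports.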
As we look into users in Germany, we can see the ZDX Score drop into the poor experience range. That correlates to the slowness users were experiencing.
To get additional details, you can simply mouse over the ZDX Score in the poor range and select the AI-powered analyze button. In this instance, you can clearly see the “High latency between client’s egress and Zscaler, as observed during reverse Cloud Path from Zscaler cloud” as the issue.
ZDX AI-powered root cause analysis indicates the reason for the outage
Once you have an idea of what the issue is, you can drill into the page fetch times for the time period when the issue was occurring. The Cloud Path view then highlights the high latency in the Deutsche Telekom network.
ZDX Cloud Path indicates networking issues
ZDX also allows you to filter based on the ISP. In the screenshot below, you can see that the Deutsche Telekom ISP is selected and that the ZDX Score dropped, with high page fetch times for the Outlook Online application. In this case, the problem is not application-specific; the network was causing the slowness.
ZDX Dashboard shows a low ZDX Score with high page fetch times
Additionally, you can click the “compare to” button to compare the ZDX Score to the last known good score, same time one day ago, same time two days ago, same time seven days ago, or custom time. To get a sense of what a good experience is compared to a slow experience, simply select the “last known good score.”
ZDX AI-powered root cause analysis comparison mode
You will now see a side-by-side comparison of the ZDX Score with the last known good score. It’s clear that the users experienced slowness and the higher latencies are not the baseline for their specific region. Keep in mind, in some instances, latencies are higher depending on the region, so “normal” might be different depending on where your users are located.
ZDX AI-powered root cause analysis comparison mode
Within ZDX you can see the Cloud Path, a hop-by-hop view from the end device to the application, including every hop in between. The comparison mode also shows a side-by-side comparison of the Cloud Path. Here you can see the difference between the hops and clearly identify the hop that is causing the issue within the Deutsche Telekom network. Normally, users in this region don't see high latency within the DT network.
ZDX AI-powered root cause analysis comparison mode
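The side-by-side Cloud Path comparison boils down to a simple idea: line up the hops from a known-good trace and an incident trace, then look for the hop whose latency grew the most. The sketch below illustrates that idea only. It is not how ZDX implements the comparison, and the `Hop` type, the `worst_regression` helper, the hop names, and the latency figures are all hypothetical. It also assumes each value is that hop's own latency contribution rather than a cumulative round-trip time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hop:
    """One hop in a path trace (hypothetical structure for illustration)."""
    name: str
    latency_ms: float  # assumed to be this hop's own contribution, not cumulative


def worst_regression(
    baseline: list[Hop], incident: list[Hop]
) -> Optional[tuple[str, float]]:
    """Return (hop name, latency increase in ms) for the hop that regressed most.

    Only hops present in both traces are compared. Returns None if the
    traces share no hops.
    """
    base = {hop.name: hop.latency_ms for hop in baseline}
    deltas = [
        (hop.name, hop.latency_ms - base[hop.name])
        for hop in incident
        if hop.name in base
    ]
    if not deltas:
        return None
    return max(deltas, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up numbers, purely for demonstration.
    last_known_good = [
        Hop("client-gateway", 2.0),
        Hop("isp-hop-1", 6.0),
        Hop("isp-hop-2", 4.0),
        Hop("zscaler-edge", 3.0),
    ]
    during_incident = [
        Hop("client-gateway", 2.0),
        Hop("isp-hop-1", 7.0),
        Hop("isp-hop-2", 88.0),
        Hop("zscaler-edge", 3.0),
    ]
    print(worst_regression(last_known_good, during_incident))
```

Comparing against a last known good trace from the same region, rather than a fixed threshold, matters because normal latency differs from region to region.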
The ZDX comparison mode also highlights the page fetch times and application response times. You can see the high latencies when users complained of slowness.
ZDX AI-powered root cause analysis comparison mode
With ZDX alerting, our customers were proactively notified about end user problems, and incidents were opened automatically with our service desk integration long before users started to report them. From a single dashboard, customers were able to quickly identify this as a Deutsche Telekom issue, not an internal network outage, saving precious IT time.
ZDX successfully detected a Deutsche Telekom outage along with its root cause, giving our customers visibility into who was impacted, their networks, and devices, thus averting critical impact to their business.
Summary
Imagine a world where we have real-time knowledge of ongoing incidents, including their start times, impact, and severity. With this information at hand, we can proactively engage users to mitigate the impact. ZDX can take us there!
As demonstrated in the above analysis, ZDX not only detects issues with precise start and end times but also provides a comprehensive overview of the affected companies and users. Moreover, it offers valuable insights into the paths that users traverse, enabling us to pinpoint the exact problematic point, whether it be the last mile ISP or an intermediate ISP. This capability facilitates effective issue resolution and, in certain cases, even allows for rerouting to bypass the problem altogether.
These analyses are made possible by advanced AI/ML models that drive the functionality of the ZDX incident dashboards.
Try Zscaler Digital Experience today
ZDX helps IT teams monitor digital experiences from the end user perspective to optimize performance and rapidly fix offending application, network, and device issues. To see how ZDX can help your organization, please contact us.