Endpoint Agent 2.0
Redesigned a ThousandEyes monitoring product that helps IT departments ensure network quality for employees.
Background
Endpoint Agent, a new strategic initiative in the ThousandEyes product portfolio, launched around the time I joined ThousandEyes. It aims to be the must-have network operations product for enterprise IT. However, it had not engaged customers well.
I led the effort to evolve the product and address critical user experience challenges: conducting user research to reframe the problems, brainstorming design concepts to solve them, exploring interaction design to optimize the experience, and delivering the final visual design.
The design solution significantly impacted the product roadmap and has been adopted as the framework for all other monitoring products.
Role
Product designer
Theme
Business-facing Product, Data visualization
Timeframe
11/2016 - 07/2017
PRODUCT VISION
Give IT Operations Full Visibility into the Network
We use all kinds of web applications for our daily work. The digital productivity of the workforce inevitably depends on the quality of internet service. That's why the IT operations team needs to manage the digital experience of every employee.
With Endpoint Agent installed, an employee's laptop can collect network performance data for the web applications they're using.
IT departments can use the ThousandEyes platform to configure which applications are monitored and to digest the collected data into actionable insights, e.g. identifying the bottleneck behind poor network quality.
The client side of Endpoint Agent: Windows or macOS software
The server side of Endpoint Agent: a SaaS application to manage and troubleshoot
Some useful knowledge to understand this project
ISP = Internet Service Provider
Different businesses may choose different service providers (e.g. AT&T, Comcast, Verizon).
Network Path
A network path is the route data travels to reach its destination. For example, if you use Slack in the office, the network traffic travels from your laptop to Slack’s server. It sounds straightforward, but the complete path includes more steps: office network > AT&T > Comcast > Verizon > Slack’s server.
An issue at any one of the stops could break network connectivity. The key value of the ThousandEyes platform is visualizing the complete path to diagnose and troubleshoot connectivity issues.
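To make the idea concrete, here is a minimal sketch of walking a path to find the first stop that breaks connectivity. The `Hop` type, the `reachable` flag and the `first_failing_hop` helper are hypothetical names used only for illustration; they are not the ThousandEyes data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str        # a stop on the path, e.g. "office network" or "AT&T"
    reachable: bool  # assumption: True when traffic got through this stop


def first_failing_hop(path: List[Hop]) -> Optional[Hop]:
    """Return the first stop that traffic could not get through, or None."""
    for hop in path:
        if not hop.reachable:
            return hop
    return None


# Illustrative data: traffic dies at Comcast, so nothing beyond it is reachable.
path = [
    Hop("office network", True),
    Hop("AT&T", True),
    Hop("Comcast", False),
    Hop("Verizon", False),
    Hop("Slack's server", False),
]

culprit = first_failing_hop(path)
```

The first unreachable stop is the most useful one to surface, because every stop after it fails as a consequence.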
CHALLENGE
Reconcile IT Admins with the Use of Mass Data
Since the vision of this product is to help the IT operations team monitor the digital experience of the whole organization, supporting company-wide deployment is its key business goal. From the user experience perspective, collecting data from all employees in an organization also provides more reliable insights and full monitoring coverage.
However, enterprise customers expressed concerns about adopting the solution.
Hundreds of thousands of agents seem unmanageable. When an organization has hundreds of thousands of employees, customers worry about the efficiency of managing agents, since we provided poor support for batch operations.
Messy visualization prevents people from getting insights. The existing troubleshooting view, designed in 2013 for the flagship product, struggled to keep up with the rapid growth of data. In addition, employees can shut down their laptops at any time or work from home, which makes the data harder to present.
The existing visualization works well for a small amount of data (left) but looks very messy when the scale of data is large (right)
DISCOVERY
What's the ideal experience for IT operations?
The customer concerns above are obvious when looking at our existing UI. Instead of jumping straight into a solution-focused exploration of scaled agent management and visualization, I opted to build a foundational understanding of the context, motivations and broader challenges on the way to large deployments. The basic questions I was trying to answer were:
Which roles in the IT operations team are involved?
What type of information can the IT department rely on to ensure good network quality for employees?
What are the key insights that this data could provide when something goes wrong?
RESEARCH
Target Users
Together with the product manager, I did 15 customer calls with 8 organizations of different sizes, including Global 500 companies and small and medium businesses. Each call included a semi-structured interview about how customers view their jobs and about their current experience working with the product.
To draft scenarios and translate them into product features, I started by creating a set of customer archetypes. In the beginning, the two roles I defined were based on job function: IT admin vs. network engineer. But after participating in more sessions to validate them, I quickly realized this didn't work well: some small teams combine several or all roles, and some teams use other job titles for the same work. The better separation is by job responsibility.
User Story
To figure out the ideal experience, I dug deep into what people are looking for. By synthesizing the user research and customer calls, we came up with a set of user stories that summarize what people want to achieve.
Finding 1
Finding 2
Finding 3
Straightforward indicators are strongly preferred to identify issues
Guards (including Ops and SREs) are those who consume data on a daily basis to find issues. They have tons of different applications and services to look at. What they want is a very clear indicator that tells them which are good and which are bad.
However, customers mentioned that in 90% of cases, issues were identified only after network problems had been reported, which means users don't know where to start unless they're told about the problem.
DESIGN GOAL
From Data to Intelligence
The discovery led to a summary deliverable listing the high-level goals of the revamp:
Come up with a data analytics experience that provides the appropriate granularity at each stage of troubleshooting, so the visualization stays clean, digestible and insightful.
Make sure people could get insights from data effortlessly.
Improve the efficiency of agent management and make deployment plans easy to track.
As part of this redesign effort, here's a list of decisions we made to achieve the high-level goals:
Reorganize the platform's current information architecture to group the data views and settings pages for Endpoint Agent into their own menu section;
Design an "Overview" dashboard to surface problem indicators and support trend analysis;
Design a filter system that helps users flexibly control the scope of data;
Design a dynamic label system that helps users slice and dice the agent populations.
After we defined the vision for Endpoint Agent, each point above became a separate design project, added to a different design sprint in order to sync better with engineering resources.
The Design
Make discovering problems easier
Data is aggregated and compared against thresholds so that customers can get an idea of overall performance and bottlenecks. Users can easily find issues by looking for red (bad metrics) or orange (alerts).
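As a rough illustration of the aggregation-plus-threshold idea, the sketch below averages latency samples per application and maps the result to a status color. The 200 ms threshold, the use of latency as the metric, the "green" fallback and the function names are all assumptions for the example, not the product's actual rules.

```python
from statistics import mean
from typing import Dict, List, Set

# Hypothetical threshold: an average latency above this counts as a bad metric.
LATENCY_BAD_MS = 200.0


def status_for(latencies_ms: List[float], has_alert: bool) -> str:
    """Map aggregated samples to a status color for one application."""
    if mean(latencies_ms) > LATENCY_BAD_MS:
        return "red"      # bad metric
    if has_alert:
        return "orange"   # an alert is active
    return "green"        # assumed color for a healthy state


def overview(samples: Dict[str, List[float]], alerts: Set[str]) -> Dict[str, str]:
    """Build a per-application status summary."""
    return {app: status_for(values, app in alerts) for app, values in samples.items()}


# Illustrative data only.
summary = overview(
    {"Slack": [250.0, 310.0], "Email": [40.0, 60.0], "CRM": [80.0, 90.0]},
    alerts={"Email"},
)
```

With this data, Slack would show as red, Email as orange and CRM as green, so the eye goes straight to the two applications that need attention.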
Shape the Visualization to Give Clear Insights
Nodes are grouped by location and/or network, so users get a clearer visualization. Once customers narrow down to the segment of data they want to analyze and find an outage, the visualization clearly shows where the root cause is located and which business units are impacted, because the data is aggregated by node group.
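The grouping itself can be sketched as a simple bucketing step. The node fields (`id`, `location`, `network`) and the sample values are hypothetical and only illustrate collapsing many nodes into a few groups.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_nodes(nodes: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Bucket node ids by their (location, network) pair."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for node in nodes:
        groups[(node["location"], node["network"])].append(node["id"])
    return dict(groups)


# Illustrative data only.
nodes = [
    {"id": "laptop-1", "location": "San Francisco", "network": "AT&T"},
    {"id": "laptop-2", "location": "San Francisco", "network": "AT&T"},
    {"id": "laptop-3", "location": "London", "network": "Verizon"},
]

grouped = group_nodes(nodes)
```

Three nodes become two groups here; at enterprise scale the same step turns thousands of individual nodes into a handful of shapes on screen.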
Efficiency of Management
Customers can always get a high-level view of how agents are distributed across locations. At the same time, they can quickly zero in on agents that are uninstalled, out of contact, running old versions, etc., and take action in batches.
Slice and Dice Agent Population Smartly
Customers can also customize how the agent population is organized. Agents can be enrolled in a label automatically, based on arbitrary criteria, so that anyone on the team can understand the theme and purpose of the data collection.
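One way to picture a dynamic label is as a named rule that agents either match or don't. The sketch below assumes that model; the `Agent` fields, the rule format and the label names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    location: str
    os: str


# Assumption: a label is defined by a predicate over an agent's attributes.
LabelRule = Callable[[Agent], bool]


def assign_labels(agents: List[Agent], rules: Dict[str, LabelRule]) -> Dict[str, List[str]]:
    """Enroll every agent into each label whose rule it matches."""
    return {
        label: [agent.name for agent in agents if rule(agent)]
        for label, rule in rules.items()
    }


# Illustrative data only.
agents = [
    Agent("laptop-1", "San Francisco", "macOS"),
    Agent("laptop-2", "London", "Windows"),
    Agent("laptop-3", "London", "macOS"),
]
rules = {
    "London office": lambda a: a.location == "London",
    "Mac users": lambda a: a.os == "macOS",
}

labels = assign_labels(agents, rules)
```

Because membership is computed from the rule, a newly installed agent in London would join "London office" without anyone adding it by hand, and one agent can belong to several labels at once.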
BEHIND THE DESIGN
Overview Page Exploration
One major workflow change we brought to Endpoint Agent was adding an overview page that highlights problems based on the data we collect. This way, people are prompted to start with the agent populations that have network problems. In the early stage, we explored different directions to achieve this workflow.
Idea Explored: Cover Page
One option was to add a cover page for every hour, in which applications and locations with network problems would be highlighted. Customers could drill down to the details by clicking a link.
After showing these concepts to customers, we realized that this overview concept was not good enough: the user group that needs the overview most urgently usually sits in a Network Operations Center, where the data is shown on a large screen. The layout would leave a large white space on the TV screen. That's why we started brainstorming a built-in dashboard with new visualizations optimized for the large-screen scenario.
Finally, I came up with 3 new visualizations for the Endpoint Agent dashboard. One of the key visualizations we explored was the color grid, since a card view is the best way to use horizontal space.
Exploration of color grid visualization
Final Version
The final design takes full advantage of the screen size and is responsive. When we showed this visualization to customers, we got very positive feedback.