To assess the existing Feature Store product, I designed a comprehensive research strategy comprising both quantitative and qualitative methods. Foundational research helped me understand how users work with the Feature Store, while evaluative research diagnosed its usability problems. The existing KPIs, set by the business, served as the quantitative measure of evaluation, and the qualitative findings helped explain those quantitative measurements.
My team and I executed the research and affinity-mapped the data from our observations and findings. We evaluated the findings first on a per-user basis, then organized the insights into 5 core categories.
I wanted our research assets to serve, first and foremost, as signposts for user empathy. The existing experience was riddled with friction points that caused users both cognitive and physiological strain. These real-world problems were significant and needed to remain consistently in focus as we designed the new experience.
The personas are based on real users: Andela Data Scientists. For this project, I identified 4 major areas of friction and drew on the full pilot user base to unravel each one.
I created empathy maps for each user, then mapped the critical commonalities onto a summary empathy map.
The journey maps helped contextualize the core concepts from the empathy maps by overlaying the insights onto each phase of the Data Scientist's machine learning model deployment process.
The insights from user research helped frame what my team & I needed to ideate upon, as we were redesigning the Feature Store product experience.
The MVP version of the product was heavily focused on the Command Line Interface. The experience forced the user to work through their process by memory, solely by executing commands in the CLI.
The flow of the existing experience was disjointed; it didn't allow users to work linearly through the process of creating a new machine learning Feature. Users jumped between different contexts, launching each one directly from the command line interface.
Based on my contextual inquiry and a moderated usability test of the existing experience, I mapped out the process users follow to design a machine learning feature. Understanding how users actually design and deploy a new feature allowed me to align the new flow with their real process, completely linearizing it.
Even though I successfully designed the new Feature Store experience to be UI-driven, it was still important to integrate the command line interface into the product. The tab group in the top panel presents the flow of the experience linearly and allows users to switch between the UI and CLI on demand, without losing the context of the UI experience. This further reinforces the linearized flow.
Data scientists use Andela Feature Store to work with abstract mathematical and statistical concepts. To represent these concepts, I designed 3D icons that work as visual metaphors for them.
One such abstract concept is a transformation. A transformation is an operation applied to each data value that changes the shape of a distribution or relationship. The visual metaphor for transformation, seen above, is a glyph showing a transparent rectangle transforming into an opaque circle.
Adding color to the glyphs also helps them work as signposts when they are deployed in specific places within the product. These signposts ultimately become the cornerstone of helping users establish a sense of place within the experience.
By its nature, Andela Feature Store is a highly technical product; many of its interfaces are tables and code snippets. When a user clicks into a page for more detail, a subtle animated panel slides out, keeping the context of the previous screen in view. Breadcrumbs at the top of the screen reinforce this, helping users understand where they are within the experience and take the desired actions.
These functions are critical to enabling a Data Scientist to productionize a machine learning model using the feature store. The Data Scientist needs to first visualize data processing over time, then monitor the variables that affect how the data moves through the system, and finally visualize the actual flow of data through the feature store pipeline.
The screen above is designed to show a type of data processing called materialization. This bar chart displays the scheduling of materialization over time, & sets the stage for the Data Scientists to visualize the materialization's flow of data through the Andela Feature Store system.
In the original design, it wasn't possible for Data Scientists to switch between the materialization visualization and monitoring visualizations. The Data Scientists needed the ability to seamlessly switch between materializations and monitoring, so they could view the critical functions that affect materialization processing. The tab switching enables seamless and quick movement between the two visualizations, without losing the context of the feature being built.
The largest contributor to excessive cognitive load was the inability to visualize data flows as they scaled up.
To address the data-flow visualization, the first solution (seen above) was a Sankey diagram. It was a great starting point for letting users visualize the data flow between nodes, but in testing it didn't scale well: when Data Scientists needed to view the flows between tens or hundreds of nodes, the visualization became too hard to manage.
The complex data-flow visualization became a major blocker for my team. To unblock them, I called an all-hands design-thinking workshop with the full design team, along with members of the Data Science team, to ideate new solutions for the data-flow visualization. Through ideation challenges and participatory design, we arrived at a final solution.
The final solution is a fully modular and scalable data-flow visualization that uses simple animations, along with the established glyphs, to bring to life the movement of data as it flows through the pipeline.
The visualization groups nodes by type, truncates each group, and provides a search field for every node type, along with a direct link to its associated visualizations. This keeps the data-flow animation compact enough to fit on a single screen without losing any context or detail the user needs to progress through the experience. The user can pan and zoom to navigate the visualization as it grows.
To meet the needs of monitoring and visualizing a wide variety of big data, I designed a full data visualization library, capable of delivering high-level data contextualization in virtually any situation a user encounters. To ground the library in real needs, I interviewed each Data Scientist to understand the challenges of visualizing big data.
The end solution combines the visualizations users said they needed with new, easier ways to visualize the data that emerged through research.
Matrix charts compare two or more groups of elements within a single dataset. This visualization helps Data Scientists identify how prediction information is related and quickly assess the strength of those relationships.
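As a minimal sketch of the data behind a matrix chart, a correlation matrix captures the pairwise relationship strengths described above. The feature names and values here are illustrative, not taken from the product:

```python
import numpy as np
import pandas as pd

# Hypothetical prediction features (names are made up for illustration).
rng = np.random.default_rng(0)
tenure = rng.normal(24, 6, 200)
usage = tenure * 0.8 + rng.normal(0, 2, 200)  # deliberately correlated with tenure
noise = rng.normal(0, 1, 200)                 # deliberately unrelated

df = pd.DataFrame({"tenure": tenure, "usage": usage, "noise": noise})

# Each cell of the correlation matrix is the strength of the relationship
# between two variables -- exactly what a matrix chart renders as a grid.
corr = df.corr()
print(corr.round(2))
```

Rendering `corr` as a colored grid (e.g. with a heatmap) gives the matrix chart itself; strong relationships stand out as saturated cells.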
The cohort analysis enables Data Scientists to view user data over a specified unit of time. This analysis gives Andela Data Scientists the specific insights needed to build a predictive model of future customer behavior.
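The grid underlying a cohort analysis can be sketched with a small pandas aggregation. The event data below is entirely made up, and pandas is an assumption for illustration, not the product's stack:

```python
import pandas as pd

# Made-up activity events: which user was active, their signup cohort,
# and how many periods (e.g. months) after signup the activity occurred.
events = pd.DataFrame({
    "user":   ["a", "a", "a", "b", "b", "c", "c"],
    "cohort": ["2023-01", "2023-01", "2023-01",
               "2023-01", "2023-01",
               "2023-02", "2023-02"],
    "period": [0, 1, 2, 0, 1, 0, 1],
})

# Count distinct active users per cohort per period -- the grid a cohort
# chart renders, with cohorts as rows and elapsed time as columns.
cohorts = (events.groupby(["cohort", "period"])["user"]
                 .nunique()
                 .unstack(fill_value=0))
print(cohorts)
```

Dividing each row by its period-0 value would turn the counts into the retention percentages typically shown in a cohort heatmap.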
Whether summarizing data with descriptive statistics or making inferences about parameters, it is important to look at the data itself. It is hard to see any patterns in a list of hundreds of numbers, and a single descriptive statistic in isolation can be misleading, giving the wrong impression of the data. A plot of the data is therefore essential.
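A tiny numeric illustration of the point above, with invented values: two datasets can share the same summary statistic while looking nothing alike, which only a plot of the raw values would reveal.

```python
import numpy as np

# Two small made-up datasets with identical means.
steady = np.array([9.0, 10.0, 10.0, 11.0, 10.0])  # clustered around 10
spiky  = np.array([0.0, 0.0, 0.0, 0.0, 50.0])     # one extreme outlier

print(steady.mean(), spiky.mean())  # both report 10.0
print(steady.std(), spiky.std())    # spreads differ wildly
```

Reporting only the mean would make these datasets appear interchangeable; plotting them immediately shows they are not.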
Hexagonal binning is a technique commonly used in data science to understand the spread of a dataset; it is a richer alternative to a scatter plot. Binning aggregates data points into ranges represented by shapes, typically squares or hexagons, with the color or saturation of each shape representing the density of the points it contains. This makes it easier to identify clusters of data and can reveal patterns or trends. The size of the shapes can be adjusted to analyze the data at a micro or macro level.
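As a sketch of the technique, matplotlib's `hexbin` implements exactly this aggregation; the data here is synthetic, standing in for a large dataset:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

# Synthetic stand-in for a large, correlated dataset.
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 5000)
y = x * 0.5 + rng.normal(0, 1, 5000)

fig, ax = plt.subplots()
# gridsize controls the micro/macro trade-off described above:
# fewer, larger hexagons summarize; more, smaller hexagons show detail.
hb = ax.hexbin(x, y, gridsize=20, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per hexagon")  # saturation encodes density

counts = hb.get_array()  # one density value per hexagon
print(int(counts.sum()))  # every point lands in exactly one bin
```

Dense clusters show up as saturated hexagons even when thousands of overlapping points would render an ordinary scatter plot unreadable.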
Polar, or "radar", charts are a form of graph that allows visual comparison between several quantitative or qualitative aspects of a situation; when charts are drawn for several situations using the same axes (poles), the situations themselves can be compared visually.
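A minimal sketch of the construction: one angle (pole) per aspect, with each situation drawn as a closed polygon over the shared axes. The axis names and scores below are hypothetical, not taken from the product:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

# Hypothetical aspects and scores for two situations being compared.
labels = ["accuracy", "latency", "coverage", "freshness", "cost"]
situation_a = [4, 3, 5, 2, 4]
situation_b = [3, 5, 2, 4, 3]

# One evenly spaced angle per aspect; repeat the first angle to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for scores, name in [(situation_a, "situation A"), (situation_b, "situation B")]:
    values = scores + scores[:1]          # close the polygon
    ax.plot(angles, values, label=name)
    ax.fill(angles, values, alpha=0.1)    # light fill for readability
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.legend()
```

Because both polygons share the same poles, their shapes can be compared at a glance, which is the core value of the chart.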
The Sankey diagram enables Andela Data Scientists to view the complex flow of data through the Feature Store system when fewer than 50 nodes are involved.
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.
Sunburst Chart — also known as Ring Chart, Multi-level Pie Chart, and Radial Treemap — is typically used to visualize hierarchical data structures.