Andela Feature Store

Designing an enterprise-level Machine Learning platform that enables Data Scientists to make predictions at scale.

Project Details

  • Primary Practice Area: Product Design
  • Year: 2022
  • My Role: Head of UX & Product Design

About The Project

Andela sought to create a platform that enables Data Scientists & Machine Learning Engineers to build high-quality Machine Learning models at a fast pace, and to utilize existing big data to easily deploy reusable, production-level machine learning systems. The goal is to deliver an enterprise-level data platform for machine learning that automates real-time decisions at scale, manages the complete feature lifecycle, and creates high-value models rapidly.

Background

When I joined Andela, there was a basic but working MVP of the Feature Store application. As testing data came in from early pilot users, I realized we needed something more polished and scalable.

The Challenge

Rearchitect Andela Feature Store, enabling the application to be scaled and deployed into full production across the enterprise.

The Design Process

For Whom We Designed

Data Scientists & Machine Learning Engineers

Data Scientists & Machine Learning Engineers take massive amounts of data and turn them into innovative, revenue-driving ML models.

Understanding User Problems & Limitations

To assess the existing Feature Store product, I designed a comprehensive research strategy comprising both quantitative and qualitative methods. I used foundational research to better understand how users work with the Feature Store product, and evaluative research to diagnose its usability problems. The existing KPIs, set by the business, served as the quantitative measure of evaluation, and I relied on the qualitative findings to help interpret those quantitative measurements.

Research Details

  • Research Types: Exploratory, Foundational
  • Research Methods: Contextual Inquiry, User Interviews, Usability Testing, Cognitive Load Testing

Synthesizing Research Data

Affinitizing the research data

My team and I executed the research and affinitized the data from our observations & findings. We evaluated the findings first on a per-user basis, then organized the insights into 5 core categories.

Framing Key Research Insights

I wanted our research assets to first and foremost serve as signposts for user empathy. The existing experience was riddled with friction points that caused users both cognitive & physiological problems. These real-world problems were significant, & needed to remain consistently in focus as we progressed through designing the new experience.

Personas based on real users

The personas represent real users: Andela Data Scientists. For this project, I identified 4 major areas of friction, and utilized the full pilot user base to unravel each major friction point.

Summary empathy maps

I created empathy maps for each user, then mapped the critical commonalities onto a summary empathy map.

Scenario-based journey maps

The journey maps helped contextualize the core concepts from the empathy maps by overlaying the insights against each phase of the Data Scientist's Machine Learning model deployment process.

Turning Research Into Actionable Ideation Points

The insights from user research helped frame what my team & I needed to ideate upon, as we were redesigning the Feature Store product experience.

From Insights To Action

  • Users want a traditional UI to augment the CLI. The redesigned Feature Store experience needs to enable users to seamlessly switch between the UI & CLI.
  • The process of deploying a machine learning model is linear, but the existing feature store product forces users into a non-linear experience; users often lose their cognitive footing. The redesigned experience needs to be linearized, establishing parity with the Machine Learning model deployment process.
  • Users have trouble maintaining their sense of place as they progress through the experience. The redesigned experience needs to consistently reinforce the user's sense of place.
  • Users constantly worry about their KPIs, yet aren't able to consistently achieve them. The new experience needs intuitive workflows that enable users to achieve their KPIs.
  • Users rely on complex visualizations to monitor multiple aspects of the process. The redesigned Feature Store product needs an entirely new set of data visualization tools, designed specifically for the users' needs.
  • Users need a comprehensive way to visualize massive & complex data flows. The redesigned Feature Store product needs to enable users to easily monitor and visualize the flow of data through the Feature Store pipeline.
  • Users' efficiency is driven by muscle memory. The redesigned product needs to account for muscle memory as the new experience is tested.

Key Problems To Solve

How might we enable users to benefit from both a command line interface, & a graphical user interface?

The CLI is integral to deploying Machine Learning models, but it requires users to work through the experience non-linearly & from memory. Finding the right balance between the CLI & GUI will enable users to work much more efficiently.

How might we establish a clear sense of place within a data-dense product?

To help users stay grounded and find their way within tables and code snippets, I created a set of UX features that reliably orient users around their location as they progress through the experience.

How might we create recognizable icons for abstract concepts?

To help users find their way around the product, I created icons to represent abstract concepts like transformations, features, sources, & datasets. These icons needed to read as authentic to data scientists, while being memorable enough to reliably represent the concepts in multiple locations throughout the product.

How might we simplify data flows and processing?

To help data scientists visualize the data pipelines and materializations (a type of processing), I designed an extensive data visualization library, including a customized data-flow visualization that enables users to stay in context even when hundreds or thousands of nodes are displayed.

How might we drive efficiencies through memorable workflows?

Andela measures the success of the tool through a set of usability and velocity metrics, like the number of features created or the volume of data processed. To meaningfully impact these KPIs, I wanted to build muscle memory, making the users, not just the machines, faster. To that end, every design element, from workflows to icons, was designed to enable users to operate with more speed & efficiency.

Assessing The Existing MVP

The starting point

The MVP version of the product was heavily focused on the Command Line Interface. The experience forced users to work through their process from memory, solely by executing commands in the CLI.

The original non-linear Feature Store flow

The flow of the existing experience was disjointed; it didn't allow users to work linearly through the process of creating a new machine learning Feature. There was a lot of jumping between different contexts, each launched directly from the command line interface.

Linearizing the Feature Store flow

Based on my contextual inquiry and moderated usability tests of the existing experience, I mapped out the process users actually follow to design a machine learning feature. With that real-world process understood, I designed the new flow to match it step for step, completely linearizing the experience.
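As a rough illustration of what linearization means in practice, the sketch below walks a fixed, ordered set of deployment stages. The stage names are my own shorthand for the process described above, not the actual Andela Feature Store API.

```python
# Hypothetical sketch: the linearized flow as an ordered list of stages.
# Stage names are illustrative shorthand, not the product's real commands.
LINEAR_FLOW = [
    "connect_source",          # register the raw data source
    "define_transformation",   # declare how raw data becomes a feature
    "create_feature",          # name & type the resulting feature
    "materialize",             # schedule processing into the store
    "monitor",                 # watch data flow through the pipeline
]

def run_flow(stages: list[str]) -> None:
    """Walk the stages strictly in order, so the user always knows where they are."""
    for position, stage in enumerate(stages, start=1):
        print(f"Step {position}/{len(stages)}: {stage}")

run_flow(LINEAR_FLOW)
```

This mirrors how the redesigned UI presents the flow: the user's position in the interface always tracks their position in the deployment process.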

Seamlessly switching between the UI & CLI

Even though I designed the new Feature Store experience to be UI-driven, it was still important to integrate the command line interface into the product. The tab group in the top panel presents the flow of the experience linearly, & allows users to seamlessly switch between the UI and CLI on demand, without losing the context of the UI experience. This further reinforces the linearization.

Turning Abstract Concepts Into Icons

Creating recognizable icons for abstract concepts

Data scientists use Andela Feature Store to work with abstract mathematical & statistical concepts. To represent these concepts, I created 3D icons that work as visual metaphors.

The transformation example

One such abstract concept is a transformation. A transformation is a replacement that changes the shape of a distribution or relationship. The visual metaphor for transformation, seen above, is a glyph showing a transparent rectangle transforming into an opaque circle.
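To make the statistical idea concrete, here is a minimal Python sketch of my own (not taken from the product) in which a log transform replaces each value and reshapes a heavily right-skewed distribution into a roughly symmetric one:

```python
# Illustrative example of a transformation: replacing each value with its
# logarithm changes the shape of the distribution from skewed to symmetric.
import numpy as np

rng = np.random.default_rng(seed=7)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # heavily right-skewed

transformed = np.log(raw)  # the transformation: each value replaced by its log

# Pearson's second skewness coefficient: 3 * (mean - median) / std.
def skew(x: np.ndarray) -> float:
    return float(3 * (x.mean() - np.median(x)) / x.std())

print(f"raw skew:         {skew(raw):.2f}")          # noticeably positive
print(f"transformed skew: {skew(transformed):.2f}")  # close to zero
```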

Using color & icons to establish the sense of place

Adding colors to the glyphs also helps them work as signposts when they are deployed in specific places within the product. These signposts ultimately become the cornerstone of the user's sense of place within the experience.

Reinforcing The Sense Of Place

Enabling users to maintain their cognitive footing

By its nature, Andela Feature Store is a highly technical product; many of its interfaces are tables and code snippets. When a user clicks into a page for more detail, a subtle animated panel pops out from the page, keeping the context of the previous screen in view. Breadcrumbs at the top of the screen reinforce this, helping users understand where they are within the experience & take the desired actions.

Visualizing Data Processing & Monitoring

These two functions are critical to enabling a Data Scientist to productionize a machine learning model using the feature store. The Data Scientist needs to first visualize data processing over time, then monitor the variables that affect how the data moves through the system, and finally visualize the actual flow of data through the feature store pipeline.

Simplifying the visualization of complex data processing

The screen above is designed to show a type of data processing called materialization. This bar chart displays the scheduling of materialization over time, & sets the stage for the Data Scientists to visualize the materialization's flow of data through the Andela Feature Store system.
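For a sense of the data behind a chart like this, here is a hypothetical sketch that buckets materialization runs by day, one bucket per bar. The column names are illustrative assumptions, not the product's actual schema.

```python
# Hypothetical sketch: aggregate materialization runs into daily buckets,
# the shape of data a scheduling-over-time bar chart would plot.
import pandas as pd

runs = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2022-03-01 02:00", "2022-03-01 14:00",
        "2022-03-02 02:00", "2022-03-04 02:00",
    ]),
    "rows_materialized": [1_200_000, 450_000, 1_300_000, 980_000],
})

# One bar per day: total rows each day's scheduled materializations processed.
daily = runs.set_index("started_at").resample("D")["rows_materialized"].sum()
print(daily)
```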

Seamlessly switching between materialization & monitoring

In the original design, Data Scientists couldn't switch between the materialization visualization and the monitoring visualizations. They needed to move seamlessly between the two, so they could view the critical functions that affect materialization processing. Tab switching enables quick, seamless movement between the two visualizations, without losing the context of the feature being built.

Visualizing Highly Complex Data-Flows

The largest contributor to excessive cognitive load was the inability to visualize the data flows as they scaled up.

The initial solution: the sankey diagram

To address the data-flow visualization, the first solution (seen above) was a Sankey diagram. The Sankey diagram was a great starting point for visualizing the data flow between nodes. However, in testing, it didn't scale: when Data Scientists needed to view the data flows between tens or hundreds of nodes, the visualization became too hard to manage.
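For context, here is a minimal Sankey sketch along those lines. It uses plotly, an assumption on my part since the source doesn't name the charting library, and the node & link names are illustrative.

```python
# Minimal Sankey sketch: every node and every link must be enumerated,
# which is manageable at this scale but unreadable at hundreds of nodes.
import plotly.graph_objects as go

fig = go.Figure(go.Sankey(
    node=dict(label=["source_a", "source_b", "transform", "feature_x", "feature_y"]),
    link=dict(
        source=[0, 1, 2, 2],  # indices into the node label list
        target=[2, 2, 3, 4],
        value=[8, 4, 7, 5],   # relative volume of data on each link
    ),
))
fig.show()
```

With five nodes this reads cleanly; with hundreds, the link crossings overwhelm the layout, which is exactly the scaling failure we saw in testing.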

Unblocking the design team

The complex data-flow visualization became a huge blocker for my team. To unblock them, I called an all-hands design thinking workshop with the full design team, along with members of the Data Science team, to ideate new solutions for the data-flow visualization. Through ideation challenges & participatory design, we arrived at a final solution.

The final solution: custom data-flow visualization

The final solution is a fully modular & scalable data-flow visualization that uses simple animations, along with the established glyphs, to bring to life the movement of data as it flows through the pipeline.

The visualization groups the nodes by type, truncates them, and provides a search field for each node type, along with a direct link to its associated visualizations. This keeps the data-flow animation compact enough to fit on a single screen without losing any of the context or detail a user needs to progress through the experience. The user can pan and zoom to navigate the visualization as it grows.
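A hypothetical sketch of the grouping-and-truncation idea follows; the node names and the per-type display budget are illustrative assumptions, not the product's actual behavior.

```python
# Hypothetical sketch: bucket nodes by type, then show only a few named
# nodes per bucket plus a "+N more" marker, keeping the diagram compact.
from collections import defaultdict

VISIBLE_PER_TYPE = 3  # assumed display budget per node type

def group_and_truncate(nodes: list[tuple[str, str]], visible: int = VISIBLE_PER_TYPE) -> dict:
    groups: dict[str, list[str]] = defaultdict(list)
    for name, node_type in nodes:
        groups[node_type].append(name)
    compact = {}
    for node_type, names in groups.items():
        hidden = len(names) - visible
        compact[node_type] = names[:visible] + ([f"+{hidden} more"] if hidden > 0 else [])
    return compact

nodes = [(f"feature_{i}", "feature") for i in range(40)] + \
        [(f"source_{i}", "source") for i in range(2)]
print(group_and_truncate(nodes))
# {'feature': ['feature_0', 'feature_1', 'feature_2', '+37 more'],
#  'source': ['source_0', 'source_1']}
```

The per-type search field then reaches the truncated nodes without ever expanding the diagram.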

Designing A Range Of Complex Visualizations

To meet the needs of monitoring and visualizing a wide variety of big data, I designed a full data visualization library, capable of delivering high-level data contextualization in virtually any situation a user encounters. I interviewed each Data Scientist to understand the challenges of visualizing big data.

The end solution combines visualizations users explicitly asked for with new, easier ways to visualize the data that emerged through research.

The library includes:

  • Cell Matrix Chart
  • Cohort Analysis
  • Scatter Chart
  • Wide Hexagon Chart
  • Narrow Hexagon Chart
  • Prediction Chart
  • Polar Chart
  • Radar Chart
  • Rhombus Matrix
  • Sankey Diagram
  • Distribution Plot
  • Heat Map

Outcomes

  • New Revenue: $25M. Andela Feature Store generated over $25M in new revenue through newly created data monetization models.
  • Reduction In Cognitive Load: 75%. The rearchitected version of Andela Feature Store achieved a 75% reduction in cognitive load, as compared to the initial MVP.
  • Increase In Volume Of Data Processed: 125%. The rearchitected version enabled Data Scientists to process 125% more data volume per production cycle, as compared to the initial MVP.
  • Decrease In Time To Productionize ML Model: 54%. The rearchitected version reduced the time it takes to productionize a new ML Model by 54%, as compared to the initial MVP.
  • Increase In Number Of Features Created Per Cycle: 89%. The rearchitected version enabled Data Scientists to create 89% more features per production cycle, as compared to the initial MVP.

Have questions? Give me a shout!