Feature Store for Machine Learning: Comparison & Evaluation

What is a Feature Store?

Feature stores have become a critical component of the modern Machine Learning stack. They automate and centrally manage the data processes that power operational Machine Learning (ML) models in production, and allow data practitioners to build and deploy features quickly and reliably. Read more about what a feature store is and check out the additional resources below.

Feature Store Comparison

The MLOps Community has worked with vendors and community members to profile the major solutions available in the market today, based on our feature store evaluation framework.

Feast

    • History:

      Co-created by GO-JEK and Google Cloud; now governed by the Linux Foundation, with Tecton as the main contributor

    • Stand-alone vs. Platform:

      Stand-alone feature store, integrates with 3rd party MLOps platforms

    • Delivery Model:

      Open source

    • Clouds Supported:

      AWS, GCP, Azure, On-Prem

    • Service Level Guarantees:

      None

    • Support:

      N/A (open source only)

Tecton

    • History:

      Founded by the creators of Uber's Michelangelo platform

    • Stand-alone vs. Platform:

      Stand-alone feature store, integrates with 3rd party MLOps platforms

    • Delivery Model:

      Fully-managed cloud service

    • Clouds Supported:

      AWS (now), GCP and Azure (roadmap)

    • Service Level Guarantees:

      Uptime, Serving latencies

    • Support:

      24 x 7 support & response time guarantees

Amazon SageMaker Feature Store

    • History:

      Developed internally by AWS

    • Stand-alone vs. Platform:

      Part of the Amazon SageMaker platform

    • Delivery Model:

      Fully-managed cloud service

    • Clouds Supported:

      AWS

    • Service Level Guarantees:

      None

    • Support:

      24 x 7 support & response time guarantees

Databricks Feature Store

    • History:

      Created by Databricks

    • Stand-alone vs. Platform:

      Part of a broader MLOps platform

    • Delivery Model:

      Fully-managed cloud service

    • Clouds Supported:

      AWS, GCP, Azure

    • Service Level Guarantees:

      Uptime

    • Support:

      24 x 7 support & response time guarantees

Hopsworks Feature Store

    • History:

      First developed at KTH Royal Institute of Technology, now maintained by the startup Logical Clocks

    • Stand-alone vs. Platform:

      Part of the Hopsworks MLOps platform

    • Delivery Model:

      Open source, self-managed commercial, and fully-managed cloud service

    • Clouds Supported:

      AWS and Azure (managed service), GCP and on-prem (self-managed)

    • Service Level Guarantees:

      Uptime, Serving latencies

    • Support:

      24 x 7 support & response time guarantees

Qwak

    • History:

      The Qwak feature store was designed and built based on the founders' experience leading ML & data groups at AWS, Wix, and Payoneer.

    • Stand-alone vs. Platform:

      Part of a broader platform

    • Delivery Model:

      Fully-managed cloud service

    • Clouds Supported:

      AWS

    • Service Level Guarantees:

      Uptime, Serving latencies

    • Support:

      Yes

Scribble Data

    • History:

      Created by company founders Venkata Pingali and Indrayudh Ghoshal

    • Stand-alone vs. Platform:

      Stand-alone feature store

    • Delivery Model:

      Self-managed commercial or fully-managed cloud service

    • Clouds Supported:

      AWS, GCP, On-Prem

    • Service Level Guarantees:

      Uptime

    • Support:

      24 x 7 support & response time guarantees

Rasgo

    • History:

      Founded by Patrick Dougherty and Jared Parker

    • Stand-alone vs. Platform:

      Stand-alone feature store

    • Delivery Model:

      Open source and fully-managed cloud service

    • Clouds Supported:

      AWS, GCP, Azure, On-Prem

    • Service Level Guarantees:

      Uptime

    • Support:

      24 x 7 support & response time guarantees

Iguazio Feature Store

    • History:

      Created by Iguazio; includes open source components that Iguazio maintains

    • Stand-alone vs. Platform:

      Part of the Iguazio Data Science Platform

    • Delivery Model:

      Open source components, self-managed commercial, and fully-managed cloud service

    • Clouds Supported:

      AWS, GCP, Azure, On-Prem

    • Service Level Guarantees:

      Uptime

    • Support:

      24 x 7 support & response time guarantees

How to choose a feature store

Are you looking to add a feature store to your ML stack? MLOps Community, with the help of feature store vendors, has created an evaluation framework to help you choose the right product for your needs.

Criteria 1

Commercial Information

First, you need to assess whether the product’s commercial characteristics meet your needs. We recommend evaluating the following commercial criteria:

  • Delivery Model: Is the product delivered as open source software, self-managed commercial software, or a fully-managed cloud service?
  • Standalone vs. Platform: Is it a standalone feature store or part of a broader ML platform?
  • Cloud Availability: Is the product available on-premises and / or in your public cloud?
  • Pricing: What is the pricing model?
  • SLOs / SLAs: Does the vendor provide guarantees around service levels?
  • Support: Does the vendor provide 24×7 support?

Criteria 2

Feature store capabilities

You will want to make sure that the feature store fulfills all the capabilities you need across the operational data workflow. We’ve broken down the capabilities as follows:

Feature Definitions:
Does the feature store provide a framework for creating feature definitions (including the transformation logic and materialization), and can data scientists collaborate on the definitions?
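
As a rough illustration, a declarative feature definition might look like the sketch below. The class and field names are invented for this example rather than taken from any particular product; real feature stores expose comparable Python or SQL interfaces for declaring transformation logic, materialization settings, and ownership metadata.

```python
# Hypothetical, vendor-neutral sketch of a declarative feature definition.
# Real feature stores expose similar Python or SQL DSLs; all names here are invented.
from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class FeatureDefinition:
    name: str              # e.g. "user_7d_purchase_count"
    entity: str            # join key, e.g. "user_id"
    source: str            # batch or stream source the feature is derived from
    transform_sql: str     # transformation logic, versioned alongside code
    ttl: timedelta         # how long a value stays valid in the online store
    owner: str             # supports collaboration, discovery, and review
    tags: dict = field(default_factory=dict)

user_7d_purchases = FeatureDefinition(
    name="user_7d_purchase_count",
    entity="user_id",
    source="warehouse.orders",
    transform_sql="""
        SELECT user_id, COUNT(*) AS user_7d_purchase_count, MAX(ts) AS event_ts
        FROM warehouse.orders
        WHERE ts > CURRENT_TIMESTAMP - INTERVAL '7 days'
        GROUP BY user_id
    """,
    ttl=timedelta(days=7),
    owner="risk-team@example.com",
    tags={"domain": "payments"},
)
```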

Automated Transforms:
Does the feature store automatically execute the pipelines required to process the feature values, including historical backfill and fresh feature values? Do the transformations support batch, streaming and real-time data sources?
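
The pandas sketch below, with a made-up `orders` table, illustrates what automated transforms mean in practice: the same aggregation logic is replayed over historical cut-off times for backfill and re-run on recent data to keep online values fresh. A feature store schedules and executes this for you; here it is spelled out by hand.

```python
# Minimal pandas sketch (hypothetical `orders` table) of the transformation a
# feature store would run for you: the same aggregation is replayed over past
# cut-off times for backfill and re-run on recent data to keep values fresh.
import pandas as pd

def seven_day_purchase_count(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Count each user's purchases in the 7 days up to `as_of`."""
    window = orders[(orders["ts"] > as_of - pd.Timedelta(days=7)) & (orders["ts"] <= as_of)]
    return (
        window.groupby("user_id").size()
        .rename("user_7d_purchase_count").reset_index()
        .assign(event_ts=as_of)
    )

orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-06"]),
})

# Historical backfill: replay the transform at past timestamps to build history.
backfill = pd.concat(
    seven_day_purchase_count(orders, ts) for ts in pd.date_range("2024-01-02", "2024-01-07")
)

# Fresh values: run the same transform on the latest data for the online store.
latest = seven_day_purchase_count(orders, pd.Timestamp("2024-01-07"))
```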

Feature Ingestion:
How are features ingested into the online and offline store?
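
A common ingestion pattern, sketched below with pandas and a plain dict standing in for a real key-value database, is a dual write: the full history lands in the offline store for training, while only the latest value per entity is upserted into the online store for serving. The file path and column names are assumptions made for this example.

```python
# Illustrative dual-write ingestion: append history to the offline store and
# upsert only the freshest row per entity into a low-latency online store.
# The dict stands in for Redis / DynamoDB / Cassandra; writing parquet needs pyarrow.
import pandas as pd

online_store: dict[int, dict] = {}

def ingest(feature_df: pd.DataFrame, offline_path: str = "user_features.parquet") -> None:
    # Offline store: keep the full history for training-set generation.
    # (Real systems append new partitions rather than overwrite a single file.)
    feature_df.to_parquet(offline_path, index=False)

    # Online store: keep only the latest row per entity for inference-time lookups.
    latest = feature_df.sort_values("event_ts").groupby("user_id").tail(1)
    for row in latest.to_dict(orient="records"):
        online_store[row["user_id"]] = row

ingest(pd.DataFrame({
    "user_id": [1, 2],
    "user_7d_purchase_count": [3, 1],
    "event_ts": pd.to_datetime(["2024-01-07", "2024-01-07"]),
}))
```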

Storage and Feature Processing Infrastructure:
What infrastructure does the feature store use to store and process feature values?
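
Most products pair an offline store (a warehouse or data lake) for history with an online store (a low-latency key-value database) for serving, plus a compute engine for transformations. The configuration sketch below is purely illustrative; the keys and provider names are placeholders, not any product's actual settings.

```python
# Illustrative-only view of the dual-store architecture most feature stores use.
# None of these keys or provider names correspond to a real product's config format.
feature_store_config = {
    "offline_store": {"type": "warehouse_or_lake", "examples": ["BigQuery", "Snowflake", "S3 + Parquet"]},
    "online_store": {"type": "key_value", "examples": ["Redis", "DynamoDB", "Cassandra"]},
    "compute": {"batch": "Spark or warehouse SQL", "streaming": "Flink or Spark Structured Streaming"},
}
```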

Feature Sharing & Discovery:
Is there an easy way to manage, share and discover features across the organization?

Online Serving:
How are features served online at inference time?
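
At inference time the model service typically fetches precomputed values by entity key. The sketch below uses an in-memory dict as a stand-in for the online store and a hypothetical `get_online_features` helper; real feature stores expose an equivalent low-latency SDK call or REST endpoint.

```python
# Sketch of online feature retrieval inside the prediction request path.
# The dict-backed store and feature names are placeholders for a real serving API.
from typing import Any

online_store: dict[int, dict[str, Any]] = {
    1: {"user_7d_purchase_count": 3, "user_avg_order_value": 42.0},
}

def get_online_features(user_id: int, feature_names: list[str]) -> dict[str, Any]:
    """Look up precomputed feature values for one entity; missing values become None."""
    row = online_store.get(user_id, {})
    return {name: row.get(name) for name in feature_names}

features = get_online_features(1, ["user_7d_purchase_count", "user_avg_order_value"])
# `features` would feed straight into model.predict(...) during the request.
```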

Training Datasets:
How do data scientists generate point-in-time accurate training datasets from the offline store?
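
The core mechanism is a point-in-time ("as of") join: each training label is matched with the latest feature value known at that label's timestamp, so no future information leaks into the training set. The pandas sketch below, with made-up label and feature tables, shows the idea at toy scale; production feature stores run the same logic against the offline store at warehouse scale.

```python
# Worked example of a point-in-time ("as of") join using pandas.merge_asof:
# each label row is matched with the latest feature value at or before its
# timestamp, so training data never leaks information from the future.
import pandas as pd

labels = pd.DataFrame({
    "user_id":  [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-03", "2024-01-06", "2024-01-06"]),
    "label":    [0, 1, 0],
}).sort_values("event_ts")

features = pd.DataFrame({
    "user_id":  [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-04"]),
    "user_7d_purchase_count": [1, 2, 1],
}).sort_values("event_ts")

training_df = pd.merge_asof(
    labels, features,
    on="event_ts", by="user_id", direction="backward",
)
# Each label row now carries the feature value that was known at that moment.
```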

Monitoring and Alerting:
What monitoring and alerting capabilities does the feature store provide?

Security and Data Governance:
What measures are in place to protect data and control access?

Integrations:
Which 3rd party data and ML tools does the feature store integrate with?

Frequently Asked Questions

  • What is a feature store?

    A feature store is a tool (or set of tools) that handles the movement of data needed for Machine Learning. Most of the time feature stores help get your feature data into an online storage layer needed for real-time serving.

    Feature stores are also widely associated with feature registries — tools that enable developers to share features and collaborate on the critical data assets that make your machine learning models great!

  • How is a feature store different from a data warehouse?

    Features for machine learning are built and consumed in ways that require specialized tooling above and beyond what a data warehouse provides. Some examples:

    • Features often need to be queried in real-time, meaning they need to be stored in a low-latency storage layer
    • Training data generation often requires point-in-time correct joins, a query pattern that is largely unique to ML
  • When do I need a feature store?

    There are typically two things that teams use to decide they need a feature store:

    • They’re rolling out a real-time prediction use-case and need to build the data pipelines that support real-time inference
    • Their team has grown and they need a way to share work between ML teams
  • Who does a feature store benefit?

    Feature stores can benefit data scientists, data engineers, and ML engineers.

    • Data scientists benefit from being able to share their work with peers and use the features others have already built
    • Data engineers get tooling that makes it easier to build and maintain the data pipelines their ML teams need for their use cases
    • ML engineers can get models into production with less headache
  • What are the best use cases for feature stores?

    Some of the most common ML use cases that rely heavily on feature stores are:

    • Recommendation
    • Search
    • Ranking
    • Fraud detection
    • Decisioning
  • How can I build a feature store?

    Building a feature store is a complex engineering effort. Check out some open-source offerings, see what you can adopt from those technologies, and then find out what additional requirements your use cases will have. For most companies, getting a reasonable MVP out the door takes a full engineering team at least a year of effort.

  • Why use a feature store?

    You’ll want to use a feature store if:

    • Building and maintaining the data pipelines you need for ML is taking up too much of your engineering time
    • It's taking forever to get new ML models into production
    • Your team is spending lots of duplicated effort building the same features over and over again