Databricks

Commercial Information

Vendor Name

Databricks

History

Created by Databricks

Stand-alone vs. Platform

Part of a broader MLOps platform

Delivery Model

Fully Managed Cloud Service

Clouds Supported

AWS, GCP, Azure

Pricing Model

Consumption Pricing

Service Level Guarantees

Uptime

Support

24 x 7 support & response time guarantees

Feature Store Capabilities

Feature Definitions

Declarative framework for defining features (incl. transformations and materialization)

Feature ingestion jobs managed in notebooks

Feature definitions are backed by Delta tables, with a managed metadata service providing schema enforcement.
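As a rough illustration of the declarative pattern (not the actual Databricks API), a feature can be expressed as a keyed transformation over raw data. A minimal pandas sketch, where the function, table, and column names are hypothetical:

```python
import pandas as pd

def customer_order_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature definition: aggregate raw orders into
    per-customer features keyed by the primary key `customer_id`."""
    return (
        orders.groupby("customer_id")
        .agg(order_count=("amount", "size"), total_spend=("amount", "sum"))
        .reset_index()
    )

raw = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 20.0, 5.0],
})
features = customer_order_features(raw)
# One row per customer_id, with order_count and total_spend columns.
```

In the managed service, the output of such a transformation would be materialized to a Delta-backed feature table rather than held in memory.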

Automated Transforms

Automated pipeline orchestration

Managed Batch

Streaming and Real-Time Transformations

Automated backfill of historical data

Pipeline visualization
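The automated backfill above amounts to re-running the feature computation over each historical partition. A minimal Python sketch with a hypothetical per-day transform (a real pipeline would read that day's raw data and write results to the offline store):

```python
from datetime import date, timedelta

def daily_feature(day: date) -> dict:
    # Hypothetical per-day transform; stands in for reading the day's
    # raw data and computing features for it.
    return {"day": day.isoformat(), "value": day.toordinal() % 7}

def backfill(start: date, end: date) -> list:
    """Recompute features for every day in the closed range [start, end]."""
    out = []
    d = start
    while d <= end:
        out.append(daily_feature(d))
        d += timedelta(days=1)
    return out

rows = backfill(date(2023, 1, 1), date(2023, 1, 3))
# Produces one feature row per day in the range.
```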

Feature Ingestion

Spark (for batch feature ingestion)

Pandas (for batch feature ingestion)

SQL (for batch ingestion)

Spark Streaming (for streaming feature ingestion)

Storage and Feature Processing Infrastructure

Online feature data can currently be pushed asynchronously to Aurora, RDS MySQL, Azure Database for MySQL, and Azure SQL Database. Online sync for GCP is still a work in progress.

Feature Sharing and Discovery

Web UI

Searchable feature catalog with metadata

Feature discovery including transformations, data lineage, and values

Feature versioning and dependency management

Training Dataset Generation

Dataset generated from offline storage using Python SDK

Integration with MLflow artifacts automates feature retrieval for retraining when a model is linked to feature store definitions
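Generating a training set from offline storage is essentially a point-in-time join of labelled events against feature history: each label row picks up the latest feature values at or before its timestamp, never future ones. A conceptual pandas sketch (merge_asof stands in for the SDK's lookup; keys and column names are hypothetical):

```python
import pandas as pd

# Feature history in the offline store: one row per (key, timestamp).
features = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "ts": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-05"]),
    "total_spend": [30.0, 45.0, 5.0],
})

# Labelled events to train on.
labels = pd.DataFrame({
    "customer_id": [1, 2],
    "ts": pd.to_datetime(["2023-01-12", "2023-01-06"]),
    "churned": [0, 1],
})

# Point-in-time join: for each label, take the most recent feature
# value at or before the label timestamp (direction="backward"),
# avoiding leakage of future data into training.
train = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts",
    by="customer_id",
    direction="backward",
)
```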

Online Serving

Feature data can be synced (pushed) to online stores such as RDS and Azure SQL
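Conceptually, the online sync publishes the latest feature row per key into a low-latency key-value store. A minimal sketch under stated assumptions (a plain dict stands in for RDS / Azure SQL; names are hypothetical):

```python
import pandas as pd

def publish_online(features: pd.DataFrame, key: str, online_store: dict) -> None:
    """Push the latest row per key into a key-value online store.
    A dict stands in here for a real store like RDS or Azure SQL."""
    latest = features.sort_values("ts").groupby(key).tail(1)
    for row in latest.to_dict("records"):
        online_store[row[key]] = {k: v for k, v in row.items() if k != key}

features = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "ts": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-05"]),
    "total_spend": [30.0, 45.0, 5.0],
})
store = {}
publish_online(features, "customer_id", store)
# Only the latest value per key is served online.
```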

Monitoring and Alerting

Data quality monitoring
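Data quality monitoring typically boils down to per-column checks such as null counts and range violations. A minimal sketch (the column name and thresholds are hypothetical, not part of the product):

```python
import pandas as pd

def quality_report(df: pd.DataFrame, column: str, lo: float, hi: float) -> dict:
    """Minimal data-quality checks for one feature column:
    null count and out-of-range count against [lo, hi]."""
    col = df[column]
    return {
        "nulls": int(col.isna().sum()),
        "out_of_range": int(((col < lo) | (col > hi)).sum()),
        "rows": len(df),
    }

df = pd.DataFrame({"total_spend": [30.0, None, -5.0, 45.0]})
report = quality_report(df, "total_spend", lo=0.0, hi=1000.0)
# report flags one null and one out-of-range value across four rows.
```

A monitoring pipeline would run such checks on each materialization and raise alerts when thresholds are exceeded.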

Security and Data Governance

Data remains in end-user's cloud account

ACL, SSO, RBAC

Integrations

Batch: Delta tables (registered in the Hive metastore)

Streaming: ingest through Spark Streaming from Kafka, Kinesis, and Event Hubs