Meetup #124

Dataframes Are All You Need: MLOps on Easy Mode

It's often said that the hardest part of MLOps is building and maintaining your datasets. This talk covers the Dataframe as a key abstraction and explains why it is so powerful for this critical part of your MLOps workflow. The talk uses Daft (www.getdaft.io) as a running example of a Dataframe to show how flexible this interface is for heavy, complex data processing, analytics, I/O, and feeding your machine learning training pipelines.

Take-aways

1. You may not need a DAG orchestrator: a Dataframe is... actually a DAG!
2. Keep it simple: Daft and S3 may be all you really need!
3. Dataframes for MLOps: an all-in-one tool for feature engineering and analytics for ensuring data quality
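
To make the first take-away concrete, here is a minimal sketch of why a lazily evaluated Dataframe is already a DAG. It is a toy written with the standard library only, not Daft's API or implementation: the `LazyFrame` class and its `from_rows`, `with_column`, `where`, `plan`, and `collect` methods are hypothetical names chosen for illustration. Each method call adds a node that points at its parent, and nothing runs until `collect()` walks the graph.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LazyFrame:
    """Toy lazy dataframe (hypothetical, not Daft): one instance is one DAG node."""

    op: str                        # name of the operation this node performs
    parents: tuple = ()            # upstream nodes this node depends on
    fn: Optional[Callable] = None  # row-level function applied by this node
    rows: tuple = ()               # raw data, only set on source nodes

    @staticmethod
    def from_rows(rows):
        # A source node has no parents and simply holds the input rows.
        return LazyFrame(op="source", rows=tuple(rows))

    def with_column(self, name, fn):
        # Record the new column as a node; no data is touched yet.
        return LazyFrame(
            op=f"with_column({name})",
            parents=(self,),
            fn=lambda row: {**row, name: fn(row)},
        )

    def where(self, predicate):
        # Record the filter as a node; no data is touched yet.
        return LazyFrame(op="where", parents=(self,), fn=predicate)

    def plan(self):
        # Walk the graph upstream and list operations in execution order.
        upstream = [name for parent in self.parents for name in parent.plan()]
        return upstream + [self.op]

    def collect(self):
        # Execution happens only here, by recursively evaluating parents first.
        if self.op == "source":
            return [dict(row) for row in self.rows]
        rows = self.parents[0].collect()
        if self.op == "where":
            return [row for row in rows if self.fn(row)]
        return [self.fn(row) for row in rows]


images = LazyFrame.from_rows(
    [
        {"path": "a.png", "width": 640},
        {"path": "b.png", "width": 64},
    ]
)
features = (
    images
    .with_column("is_large", lambda row: row["width"] >= 256)
    .where(lambda row: row["is_large"])
)

print(features.plan())     # the recorded graph, source first
print(features.collect())  # the rows produced when the graph is executed
```

The chain here is linear, but an operation such as a join would give a node two parents, which is where the graph shape becomes visible. A real engine can inspect and optimize this recorded plan before running it, which is the sense in which the Dataframe can stand in for a separate orchestrator.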

In this episode

Jay Chia

Co-founder, Eventual

Jay is based in San Francisco and graduated from Cornell University where he did research in deep learning and computational biology. He has worked in ML Infrastructure across biotech (Freenome) and autonomous driving (Lyft L5), building large-scale data and computing platforms for diverse industries. Jay is now a maintainer of Daft: the distributed Python Dataframe for complex data.

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.

Ben Epstein

Host

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io) focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis teaching concepts in cloud computing and big data analytics.