Meetup #113

Applying DevOps Practices in Data and ML Engineering

Modern data and ML engineering needs to be agile and able to respond quickly to a changing business landscape without sacrificing the necessary data quality. DevOps revolutionized software engineering through its adoption of agile and lean practices and its emphasis on collaboration. Data engineering has the same need. We will go over how to adopt the best DevOps practices in data engineering, and what challenges arise in adopting them given the different skill sets and needs of data engineers. We will demonstrate how a new open source project, Versatile Data Kit (https://github.com/vmware/versatile-data-kit), answers those questions. Together we will create an end-to-end data pipeline and productionize it quickly and efficiently.

Take-aways

Much more efficient data engineering can be achieved if we:
- reduce dependencies between teams
- enable everyone to focus on work that requires their core skills
- automate and abstract as much of the data infrastructure and DevOps as possible

In this episode

Antoni Ivanov

Software Engineer, VMware

Antoni Ivanov is a Software Engineer specializing in scalable big data systems and data analytics infrastructure. He has worked on VMware's data analytics platform since its beginning and is a lead maintainer of the recently open-sourced project Versatile Data Kit. Versatile Data Kit has transformed data engineering at VMware, making it code-first, fully automated, and decentralized. Antoni is now working to bring it as open source software to all data practitioners in the community.

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.

Ben Epstein

Host

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io) focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis teaching concepts in cloud computing and big data analytics.