Meetup #113

Applying DevOps Practices in Data and ML Engineering

Modern data and ML engineering needs to be agile and able to respond quickly to a changing business landscape without sacrificing the necessary data quality. DevOps revolutionized software engineering by adopting agile and lean practices and by fostering collaboration. Data engineering has the same need. We will go over how to adopt the best DevOps practices in data engineering, and what challenges arise in adopting them, given the different skill sets and needs of data engineers. Together we will demonstrate how a new open source project, Versatile Data Kit (https://github.com/vmware/versatile-data-kit), answers those questions. We will create an end-to-end data pipeline and productionize it quickly and efficiently.

Take-aways

Much more efficient data engineering can be achieved if we:
- reduce dependencies between teams
- enable everyone to focus on work that requires their core skills
- automate and abstract as much of the data infrastructure and DevOps as possible