When building a platform, a good start would be to define the goals and features of that platform, knowing it will evolve. Kubernetes is established as the de facto standard for scalable platforms but it is not a fully-fledged data platform. Do ML engineers have to learn and use Kubernetes directly? They probably shouldn't. So it is up to the data engineering team to provide the tools and abstraction necessary to allow ML engineers to do their work. The time, effort, and knowledge it takes to build a data platform is already quite an achievement. When it is built, one has to maintain it, monitor it, train people to on-call rotation, implement escalation policies and disaster recovery, optimize for usage and costs, secure it and build a whole ecosystem of tools around it (front-end, CLI, dashboards). That cost might be too high and time-consuming for some companies to consider building their own ML platform as opposed to cloud offering alternatives. Note that cloud offerings still require some of those points but most of the work is already done.
In this episode
Site Reliability Engineer in Data Infrastructure, Spotify
Julien is a software engineer turned Site Reliability Engineer. He is a Google developer expert, certified Data Engineer on Google Cloud and Kubernetes Administrator, mentor for Woman Developer Academy and Google For Startups program. He is working on building and maintaining data/ML platform.
Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world, and since, has interviewed the leading names around MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations he felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews you can find him making stone stacking with his daughter in the woods or playing the ukulele by the campfire.
Vishnu Rachakonda is the operations lead for the MLOps Community and co-hosts the MLOps Coffee Sessions podcast. He is a machine learning engineer at Tesseract Health, a 4Catalyzer company focused on retinal imaging. In this role, he builds machine learning models for clinical workflow augmentation and diagnostics in on-device and cloud use cases. Since studying bioengineering at Penn, Vishnu has been actively working in the fields of computational biomedicine and MLOps. In his spare time, Vishnu enjoys suspending all logic to watch Indian action movies, playing chess, and writing.