Coffee Sessions #85

Continuous Deployment of Critical ML Applications

Finding an ML model that solves a business problem can feel like winning the lottery, but it can also be a curse. Once a model is embedded at the core of an application and used by real users, the real work begins. That's when you need to make sure that it works for everyone, that it keeps working every day, and that it can improve as time goes on. Just like building a model is all about data work, keeping a model alive and healthy is all about developing operational excellence. First, you need to monitor your model and its predictions and detect when it is not performing as expected for some types of users. Then, you'll have to devise ways to detect drift, and how quickly your models get stale. Once you know how your model is doing and can detect when it isn't performing, you have to find ways to fix the specific issues you identify. Last but definitely not least, you will now be faced with the task of deploying a new model to replace the old one, without disrupting the day of all the users that depend on it. A lot of the topics covered are active areas of work around the industry and haven't been formalized yet, but they are crucial to making sure your ML work actually delivers value. While there aren't any textbook answers, there is no shortage of lessons to learn.

Take-aways

Performance metrics can't always be directly correlated to training metrics Models suffer from a "curse of success": the more successful a model is, the more likely it is to retrain it, and the harder it becomes to retrain it without breaking someone's workflow You develop operational excellence by exercising it

In this episode

Emmanuel Ameisen

Emmanuel Ameisen

Senior ML Engineer, Stripe

Emmanuel Ameisen has worked for years as a Data Scientist and ML Engineer. He is currently an ML Engineer at Stripe, where he worked on helping improve model iteration velocity. Previously, he led Insight Data Science's AI program where he oversaw more than a hundred machine learning projects. Before that, he implemented and deployed predictive analytics and machine learning solutions for Local Motion and Zipcar. Emmanuel holds graduate degrees in artificial intelligence, computer engineering, and management from three of France’s top schools.

Twitter

LinkedIn

Demetrios Brinkmann

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world, and since, has interviewed the leading names around MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations he felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews you can find him making stone stacking with his daughter in the woods or playing the ukulele by the campfire.

Adam Sroka

Adam Sroka

Host

Dr. Adam Sroka, Head of Machine Learning Engineering at Origami Energy, is an experienced data and AI leader helping organizations unlock value from data by delivering enterprise-scale solutions and building high-performing data and analytics teams from the ground up. Adam shares his thoughts and ideas through public speaking, tech community events, on his blog, and in his podcast.