April 19, 2022

Coffee Sessions Takeaways: Continuous Deployment of Critical ML Applications

All that glitters is not gold, and our last MLOps coffee session showed that engineers are as susceptible to Shiny Object Syndrome as the rest of the population. Over-engineering, over-complicating, and a constant urge to move to the next shiny tool as soon as something gets standardized: our recent guest Emmanuel Ameisen from Stripe names all of these as serious challenges when it comes to streamlining the continuous deployment of mission-critical ML applications. With hundreds of ML projects behind him, he’s pretty much seen it all, including the human barriers that get in the way of a successful project.

Grab a cup of coffee, tea, or hot cocoa (there’s no hot drink discrimination here) and listen in because the whole session is riddled with truly helpful gems. But if you’ve only got a moment, here are our top three takeaways.

ML Engineers Like It…Complicated?

You know you’re an engineer at heart if you want to over-complicate something that could be done in a much simpler way.

Machine learning is a fast-evolving field, and we’re progressively tending towards simpler solutions. Ten years ago, you might have had to train your own model; maybe five years ago, pre-trained embeddings that worked well became available for download. And we’re still improving on that process to make things easier for everyone.

But while systems have improved, ML engineers as a group haven’t changed that much. We still like taking the complicated, build-it-from-the-ground-up route. Or as Emmanuel puts it, ‘As the fancy stuff becomes normal, it becomes boring. And when it becomes boring, we want the new fancy stuff.’

That personal quality might be one of the pivotal factors driving the rapid evolution of ML. Even so, it’s better practice to focus on your use case, KPIs, and stakeholders, and to pick the tools and systems that make the most sense for what you’re doing.

Operational Excellence & Regular Maintenance of Models

We could just as easily call this section ‘Where the Real Work Begins’. As Emmanuel says, ‘You develop operational excellence by exercising it.’ He walked us through how most teams release models. They’ll have a new use case, they’ll think of the model, make the model, be happy with it, and release it. Sometime later, they’ll decide to release an updated version and run into a series of problems: the code used to train the model is way out of date, release criteria might as well be non-existent, and no one seems to know where the data is. It’s almost like performing ML archaeology to figure it out.

Developing a system of regular engagement with the model prevents you from having to clean up big problems later, as the data rots or the assumptions you built the pipeline on stop being relevant. The cracks that appear in the system are much smaller and easier to manage when the model is maintained on a two-week rotation rather than left to sit and rot for a year or more before anyone thinks about it again.

There seems to be an assumption that once a model is deployed, it needs no maintenance. If anything, ML is the opposite of static and must be updated more often than traditional software because it depends on both code and data, and data changes.

Advice for Iteration

Automating the release process is probably safer too, as it prevents oddly specific tribal knowledge from disappearing forever.

The main model Emmanuel works on at Stripe decides whether any given transaction is allowed or blocked. For something like this, it’s not enough to be good on average. It has to be good across the board. You can imagine there’s a multitude of high-consequence potential failures here. He argues that automation makes it safer: when you have so many things to think about, specific pieces of information end up living only in the team’s heads. Automation helps safeguard against the consequences of someone leaving the team, or of needing some of that niche knowledge later when the person who had it is no longer available.
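One way to read ‘good across the board, not just on average’ is as a release check that gets written down in code rather than remembered by whoever ran the last release. The sketch below is only an illustration under our own assumptions: the slice names, the counts, and the 0.90 threshold are all made up, and nothing here describes how Stripe actually evaluates its models.

```python
from typing import Dict, List, Tuple

# Hypothetical evaluation results: slice name -> (correct predictions, total).
# Slice names and numbers are invented for illustration only.
SliceResults = Dict[str, Tuple[int, int]]


def overall_accuracy(results: SliceResults) -> float:
    """Accuracy pooled over every slice (the 'on average' view)."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


def failing_slices(results: SliceResults, min_accuracy: float) -> List[str]:
    """Names of slices whose own accuracy is below the threshold."""
    return sorted(
        name
        for name, (correct, total) in results.items()
        if total == 0 or correct / total < min_accuracy
    )


results = {
    "segment_a": (950, 1000),
    "segment_b": (930, 1000),
    "segment_c": (60, 100),  # small slice that the average hides
}

threshold = 0.90  # assumed release criterion, not a recommendation
passes_on_average = overall_accuracy(results) >= threshold
blocked_by = failing_slices(results, threshold)

print(f"passes on average: {passes_on_average}")
print(f"slices blocking release: {blocked_by}")
```

In this made-up example the pooled accuracy clears the bar while one small slice does not, which is exactly the kind of detail that tends to live in one person’s head until it is encoded as a check.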

Automation might sound a bit dark and scary in the first iteration, while you’re still learning the ropes of how things are done, but in the long run, once you’ve been around the block a few times, automation might be your best friend.

To ease yourself and your team into it, Emmanuel says ‘Suggest before you automate.’ Loop a human into the process. Write your automation, have it suggest the value it would otherwise set automatically, and do that for a few cycles. Afterwards, if the person in the loop consistently says they’re not changing anything, then you’ll feel more comfortable automating.
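Here is a minimal sketch of what ‘suggest before you automate’ could look like in practice. It assumes the automation proposes a single numeric value per cycle (say, a decision threshold) and a human either accepts it or overrides it. The class name, the streak of three cycles, and the example values are all hypothetical choices of ours, not something from the session.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestThenAutomate:
    """Count consecutive cycles where the human kept the suggested value."""

    required_streak: int = 3  # assumed number of unchanged cycles before automating
    streak: int = field(default=0, init=False)

    def record(self, suggested: float, chosen: float, tolerance: float = 1e-9) -> None:
        """Log one review cycle: what was suggested and what the human chose."""
        if abs(suggested - chosen) <= tolerance:
            self.streak += 1
        else:
            self.streak = 0  # any override resets the streak

    @property
    def ready_to_automate(self) -> bool:
        """True once the human has accepted enough suggestions in a row."""
        return self.streak >= self.required_streak


gate = SuggestThenAutomate(required_streak=3)

# (suggested, chosen) pairs from four hypothetical release cycles.
cycles = [(0.80, 0.75), (0.78, 0.78), (0.79, 0.79), (0.81, 0.81)]
for suggested, chosen in cycles:
    gate.record(suggested, chosen)

print(gate.ready_to_automate)
```

Resetting the streak on any override is a deliberately cautious choice: a single disagreement means the automation still has something to learn from the person in the loop.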

Bonus: Breaking into the Field

Likely related to an engineer’s propensity to love all that is complicated, people tend to think, “I must do the most complicated thing I can do to get hired in ML.”

But the truth flows in the other direction. Being able to show actual progress and a completed project, even if it’s something simpler, may be the game-changer. It shows you’re not only able to learn but to learn progressively. And sometimes, the most important feature of any project is simply that it’s done.

Building Machine Learning Powered Applications: Going from Idea to Product

Looking for that simple project to take on? Emmanuel’s book, Building Machine Learning Powered Applications: Going from Idea to Product, might be the thing you’re missing. You’ll get the most out of it with a hands-on approach, as it takes you through building an example ML-driven application. It covers the tools, best practices, challenges, and solutions for each step of the way, and what you learn translates to building real-world ML projects.