June 20, 2021

Start Manually, Then Automate

We’ve all seen Google’s now-infamous paper (here, ICYMI) where they make a sweeping case for automating ML pipelines — pretty much from end to end.

We’ve had a fair amount of pushback from the community on that. There’s lots in the DevOps domain that MLOps teams can learn from. But does it always make sense to automate everything in the machine learning lifecycle?

A lot of people think otherwise.

At first glance, you could assume that making ML deployments automatic would simplify processes, reduce errors and make deployments happen faster. Like anything that reaches the peak of the tech hype cycle, however, it’s easy to mistake automation for a panacea.

Of course, it has logical applications and benefits. But let’s say you’re doing something highly nuanced and specialized like drug discovery. In those cases, applying automation gets a bit tricky.

Many people have come back to us and said that, because of the complexities and unknowns that are part and parcel of some industries, keeping a level of manual work in the loop is essential.

To get more perspective on this, we’ve pulled some input from our podcasts with Luigi Patruno and Neal Lathia. They had insightful and practical things to say about ML automation, when it delivers the greatest benefits, and when it should be set aside.

Spoiler alert: Despite what Google seems to suggest, it isn’t a binary choice between manual and automated ML. Handling machine learning deployments manually makes sense in the early stages, but in the end, you will likely want to automate as many processes as you can.

Luigi Patruno of ML in Production

‘You don’t want to jump directly to automated processes at the start. With anything in life, I think you want to do a thing yourself first, so you discover any issues or pitfalls. Do that, and it’s easier to feel confident enough to automate.

‘You can’t be aware of all of the issues that might occur until you’ve gone through it in a hands-on way for a period of time.’

In the beginning, it’s better to run a deployment manually, Patruno says. Once you’ve achieved a level of confidence about what you’re doing, automation probably makes sense. But first, you need to walk through all the steps manually so you’ll know the process better.

‘When you’re building a model, the first thing you should try to do is look at the samples and try to generate the prediction yourself.

‘You’ll want to know if, for example, you’re looking at some difficult-to-parse feature vector like an image or something that you can detect personally — whatever it is you’re trying to teach the model to see.

‘Or when you’re doing error analysis, can you look at the instances that your model misclassified and figure out the patterns manually before you do some more automated things, like running PCA on a bunch of the errors and attempting to categorize them?’

Doing that, he says, will give you enough assurance that the process is likely to work on its own once you automate it.
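To make the manual error-analysis step concrete, here is a minimal sketch of what "look at the misclassified instances yourself" could look like before reaching for anything like PCA. The sample IDs, labels, and helper names are hypothetical and not from the conversation; the only assumption is that you have true labels and model predictions side by side.

```python
from collections import Counter


def misclassified(samples, y_true, y_pred):
    """Return (sample, true_label, predicted_label) for every wrong prediction."""
    return [(s, t, p) for s, t, p in zip(samples, y_true, y_pred) if t != p]


def confusion_pairs(errors):
    """Count how often each (true, predicted) pair occurs among the errors."""
    return Counter((t, p) for _, t, p in errors)


# Hypothetical toy data standing in for a real evaluation set.
samples = ["img_001", "img_002", "img_003", "img_004", "img_005"]
y_true = ["cat", "dog", "cat", "dog", "cat"]
y_pred = ["cat", "cat", "dog", "cat", "cat"]

# Step 1: list every error so a person can open and inspect each one.
errors = misclassified(samples, y_true, y_pred)
for sample, true_label, predicted_label in errors:
    print(f"{sample}: expected {true_label}, got {predicted_label}")

# Step 2: a simple tally that hints at where the patterns might be.
pairs = confusion_pairs(errors)
print(pairs.most_common())
```

Only once patterns like these are understood by hand would it make sense to swap the tally for something automated.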

‘Ultimately, do automate as much as possible. The more you leave to manual processes, the more risk there is of missing things. As long as you’ve worked out the bugs first, automation is just a way to take out the potential for human error.’

For more of Luigi’s insights check out his website here or the full conversation of the coffee session here.

Neal Lathia of Monzo Bank

Another way to approach automation is to apply it selectively. Monzo Bank’s Neal Lathia says that, in some cases, you can use automation from the outset by focusing on ‘the boring bits.’

One example he cites is the boilerplate code involved in running microservices: generating that repetitive code automatically can save time and get you to market faster.

‘At Monzo, we have over 1500 microservices, and one of the things I’ve learned is that microservices are mostly made up of boilerplate code, i.e. sections of code that have to be used repeatedly with little or no alteration.

‘That includes things like Makefiles, Dockerfiles, and readiness and liveness probes. When you’re creating a new service, the actual business logic uses far fewer lines.

‘So we’ve built tools that can generate boilerplate code for specific services. In Python, we’ve got classic cookie-cutter templates that make creating a new service more or less equivalent to cloning a template. That speeds up our development process, and it also makes all of our code more uniform.’

It’s hard to overstate the impact that has on productivity, he adds.

‘You can jump into someone else’s code and, without having to absorb any documentation, start with a high-level idea about what goes where.’
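As a rough illustration of the template idea, here is a minimal sketch that renders a Dockerfile and a Makefile for a new service from shared templates. It is not Monzo’s tooling: the template contents, the `render_service` helper, and the `payments` service name are all hypothetical, and it uses only Python’s standard-library `string.Template`.

```python
from string import Template

# Hypothetical boilerplate shared by every service; only the placeholders vary.
TEMPLATES = {
    "Dockerfile": Template(
        "FROM python:$python_version-slim\n"
        "WORKDIR /app\n"
        "COPY . /app\n"
        'CMD ["python", "-m", "$service_name"]\n'
    ),
    "Makefile": Template(
        "build:\n"
        "\tdocker build -t $service_name .\n"
    ),
}


def render_service(service_name, python_version="3.11"):
    """Return a mapping of filename to rendered boilerplate for one service."""
    values = {"service_name": service_name, "python_version": python_version}
    # substitute() raises KeyError if a template placeholder has no value,
    # and ignores values that a given template does not use.
    return {name: tpl.substitute(values) for name, tpl in TEMPLATES.items()}


files = render_service("payments")
for name, contents in files.items():
    print(f"--- {name} ---")
    print(contents)
```

Because every service starts from the same templates, the files end up in the same places with the same shape, which is the uniformity Lathia describes.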

Full conversation here.