When I first got involved in MLOps, there was a steep learning curve. One of the things that helped me climb it was to look at the parallels with DevOps. If you’re coming from a software development background, it’s one of the easiest ways to get your head around why MLOps exists.
In DevOps, you’re bringing together the programming, testing, and operational elements of software development. The goal is to take what are often siloed processes and re-work them into a logical and coherent process that can go forward with minimal hiccups.
MLOps shares similar objectives. You’re trying to build a streamlined process that links together the machine learning lifecycle’s disparate parts. There are often gaps between design, model development, and operations, so you want to unify that by stitching together data ingestion, evaluation, deployment, plus model training and retraining. Just like in DevOps, the aim is to create a single coherent process everyone works to maintain.
Without MLOps, getting machine learning models into production usually means following one of two approaches:
- The data scientist has to wear many hats and do everything from cleaning data and selecting the right model to setting up a Kubernetes cluster and running infrastructure.
- The alternative is for the data scientist to conduct a manual handover of their model to a machine learning engineer.
So far, so similar. But there are some key differences:
01 Freedom to experiment
One of the biggest differences between MLOps and DevOps is the amount of freedom you have to experiment and test to see which approach delivers the best result. In machine learning projects it’s common for data scientists to try one approach to solving a problem, and then test another one days later. You might tinker with several over the course of a few weeks or even months.
Traditional software engineering involves some experimentation too, but the experiments are typically brief and often done in isolation from the core project. You won’t often see a developer put a week’s worth of work into a particular direction, decide it’s not quite right, then abandon it to find another way.
02 Data versus code
Another key difference is that in MLOps data is involved, while in DevOps you’re principally concerned about code. A DevOps project may involve data, but the core work streams are all around getting the programming right.
The techniques and processes for managing code and managing data are very different. While there is code involved in MLOps, if you stacked the code needed for an ML project up against all the data required, the code would probably amount to a fraction of one percent of the total volume.
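To make that code-versus-data contrast concrete, here is a small sketch (my own illustration, not from the article): code is diffed line by line in Git, while large datasets are typically versioned by content hash, in the style of tools like DVC. A minimal fingerprinting helper, assuming the dataset lives in a single file, might look like this:

```python
import hashlib

def dataset_fingerprint(path, chunk_size=1 << 20):
    """Hash a dataset file so any change in the data yields a new version ID.

    Reads in chunks so files far too large for Git can still be fingerprinted.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Real data-versioning tools add remote storage, lineage, and pipelines on top, but the core idea is the same: track a cheap hash in source control and keep the heavy data elsewhere.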
We’ll delve deeper into the differences below. To get some perspective for part one, we spoke with Damian Brady, Cloud DevOps Advocate at Microsoft Australia, for his view on how the two disciplines compare. He says that from a high-level view they can look very similar, but once you drill down, there are vital distinctions.
[Watch the original video here]
How does MLOps differ from DevOps?
Damian Brady: ‘From a high-level perspective, they are almost the same thing. DevOps is about getting some kind of solution from your head into the hands of people, making sure it’s valuable, and doing that really effectively.
‘In that sense, MLOps is the same. It doesn’t matter if it’s a bit of code or it’s a database update or if it’s a predictive model. It differs in some aspects of how you actually get there and the processes involved in producing that model and then deploying that. Still, conceptually I don’t think it’s that different.
‘One of the dangers with buzzwords is that people can oversimplify and say “oh, that’s DevOps, and therefore we can use the tooling and processes that we already know”. Maybe you just apply gitflow to the machine learning project and CI, so every time anybody changes the code, you do a training run and then roll it out to production.
‘You need to have really deep pockets if you’re going to do that. You have to be aware of where the limitations are.’
Of course, when you dig deeper there are practical differences, he says, especially between machine learning and traditional software development.
‘The cadence is definitely different between traditional software and machine learning projects; there’s a lot more experimentation — at least initially.
‘Having to test what you produce against the real world before you really know whether it’s going to work as expected is more critical in machine learning and predictive modeling than it is in traditional software.
‘If you think about creating a traditional piece of software, you can test that pretty well — even give it to a QA department and do all that kind of stuff before pushing it out to production. If it doesn’t quite work as expected, in many organizations, they’ll just say “well, we met all of these requirements. It does the thing it’s supposed to do in production and people are using it, so we’ll iterate on it”.
‘I don’t think you can always say that with machine learning. The outcome you’re looking for is a better result from what you’ve generated, so you want something that’s actually, legitimately valuable to the end user, not something that checks a box.’
OK, so let’s dive a bit deeper into some of the differences between DevOps and MLOps. For another perspective, we spoke with Ryan Dawson, open-source software engineer at Seldon.
He told us why he thinks MLOps is so often misunderstood.
Ryan Dawson: ‘It can be tricky to explain if colleagues and managers are used to traditional software engineering and DevOps. Being able to explain it to a complete stranger would be ideal, but we need to at least make it clear to the other IT professionals and internal stakeholders we work with. We have to answer questions like ‘isn’t that just DevOps?’ clearly, otherwise the challenges of MLOps will continue to be underestimated.’
We wondered if most of the confusion about what MLOps means is on the business side. Do both technical and non-technical people struggle to see where the line is drawn between MLOps and DevOps?
‘I think there are different levels as to why its importance and complexity are underestimated.’
‘You still get siloed functions in organizations where people from a traditional software development background don’t get exposed to all aspects of the machine learning lifecycle. But then you also get other differences within organizations; it’s natural, for instance, for the business side to have a different focus than the software team. What we need to get across to everyone is that MLOps means doing whatever is necessary to make the whole ML build-deploy-monitor lifecycle as smooth and as safe as possible.’
People don’t realize just how varied ML is, he says. An application like a search engine is very different from sentiment analysis, which is very different from image classification, and so on. When he first got into ML, he was surprised how many people talked about building and running machine learning models as if the process and objectives were basically the same as doing mainstream software.
‘Something I started to encounter, and which actually surprised me more, was the belief among some data scientists that DevOps people already know how to run ML in production. I guess people know DevOps is quite sophisticated and assume it has a broader application than it does. ML also has a lot of complicated use cases, and while some of them are similar to DevOps, others are quite far away.’
Four critical distinctions between MLOps and DevOps
01 Training data and code together to drive fitting
In the training process, it is the fit between data and code that produces a successful model.
‘That’s a fundamental difference,’ he says. ‘It really stems from what machine learning is, versus what programming is.
‘In programming, you’re responding to inputs and giving output based on explicit rules. With machine learning, you’re capturing the rules directly from data; your patterns come from the data itself. The randomness involved in that makes it tricky. It’s partly a stochastic rather than a fully deterministic process, and that’s a huge difference from traditional software development.
‘It’s not like dealing with code that you can compile and always get the same results every time. In ML, it may be slightly different each time. There may be different nuances that you capture on this data set that weren’t there in another.’
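A toy example makes the contrast clear (this is my own sketch, not from the article): gradient descent starting from a random weight is reproducible when you pin the seed, but different seeds leave you with slightly different models, unlike compiling the same code twice.

```python
import random

def train(seed, steps=3, lr=0.1):
    """Fit w in y = w * x on tiny data, starting from a random weight."""
    rng = random.Random(seed)
    data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]  # true weight is 2.0
    w = rng.uniform(-1.0, 1.0)  # stochastic starting point
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Same seed: the run reproduces exactly.
# Different seed: a slightly different final weight, even on identical data.
```

Real training pipelines have many more sources of nondeterminism (data shuffling, dropout, GPU kernels), which is why run tracking and seed management matter so much more in MLOps than in a deterministic build.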
02 Lots of data, combined with long-running jobs
In machine learning, dealing with the data is a challenge in itself.
‘If it’s something like a neural network, it requires a huge data set, and that means long-running jobs.’
03 A model is different from an executable
‘A key part of building a model is being able to reproduce outcomes, but large data sets create a reproducibility issue’, he notes. ‘If it’s more than a gigabyte, it’s too big to fit in GitHub. A trained model is also something different from an executable; serializing a model is a different way of packaging it.’
04 Retraining may be necessary
‘You also get interesting use cases about the way that patterns get applied to new data. You may have done a great job training a model; it all fits really well against the data you’ve been supplied with. But when it goes live, you can encounter problems because the live data is different. Its distribution can change over time.
‘One example could be in fashion, where you have a model recommending clothes on an e-commerce site. If you trained it on fittings data from the summer, the live data would likely change in the winter.’
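The drift Ryan describes can be monitored with even a very simple check. This sketch (my own, with made-up numbers) flags a model for retraining when a feature’s live mean shifts well away from its training baseline:

```python
from statistics import mean, stdev

def drift_score(train_values, live_values):
    """How many training standard deviations the live mean has shifted."""
    mu, sigma = mean(train_values), stdev(train_values)
    return abs(mean(live_values) - mu) / sigma

def needs_retraining(train_values, live_values, threshold=2.0):
    """Flag the model for retraining once the shift passes a chosen threshold."""
    return drift_score(train_values, live_values) > threshold
```

Production systems would use a proper two-sample test (for example Kolmogorov–Smirnov) per feature rather than a bare mean shift, but the principle is the same: keep comparing live data against the training distribution, and let the result trigger retraining.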
[Watch the video with Ryan here]