June 25, 2021

When PyTorch meets MLflow

Written by Artem, Dimi, Laszlo and Paulo.


This article is part of the Engineering Labs series, a collection of reviews of the initiative written by each participating team. This one covers the Yelp Review Classification solution of Team 3.
If you are interested in learning more about the initiative and how to join, here you can find all the information you need.



MLOps Community 🎉 is an open, free and transparent place for MLOps practitioners to collaborate on experiences and best practices around MLOps (DevOps for ML). The Engineering Labs Initiative is an educational project, whose first lab aimed to create an MLOps example combining PyTorch and MLflow. We gave it a shot and were one of the two teams (out of four) to finish the project. Now we want to share our experience with you!


Project: Summarising What We Did

Figure 2. Metrics of Our Model in the MLflow Experiment Tracking UI

The initial task definition was quite open: all teams needed to develop an ML solution using PyTorch for model training and MLflow for model tracking. We all had more or less deep knowledge of different areas of Machine Learning, from Data Science and the underlying math to infrastructure and ML tooling, from DS project management to enterprise system architecture. So the most difficult problem for us was choosing a dataset 😜. In the end, we chose the Yelp Review dataset to train an NLP model that classifies texts as positive or negative reviews. The data includes reviews of restaurants, museums, hospitals, etc., along with the number of stars (0–5) associated with each review. We modelled this task as a binary classification problem: determining whether a review is positive (has >=3 stars) or negative (otherwise).
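The labeling rule above can be expressed as a one-line helper (the function name is ours, purely for illustration):

```python
def label_review(stars: int) -> int:
    """Map a Yelp star rating (0-5) to a binary sentiment label:
    1 for a positive review (3 or more stars), 0 for a negative one."""
    return 1 if stars >= 3 else 0
```

For example, `label_review(4)` yields 1 (positive), while `label_review(2)` yields 0 (negative).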

😎 From the MLOps perspective, the project evolved through several “stages”. First, we came up with a way of deploying the MLflow server on GCP and exposing it publicly. We also developed a nice Web UI where the user writes a review text, specifies whether he or she considers this review to be positive, and then gets the model’s response along with statistics over all past requests. Having a Web UI talking to the model via a REST API allowed us to decouple the front-end and back-end and parallelise development. Also, in order to separate the logic of collecting model inference statistics into a database from the inference itself, we decided to implement a Model Proxy service with database access, and a Model Server exposing the model via a REST API. Thus, the Model Server can be seamlessly upgraded and replicated if necessary. For automatic model upgrades, we implemented another service called Model Operator, which constantly polls the state of the model registry in MLflow and, if the release model version has changed, automatically re-deploys the Model Server.


😊 So at the end, we managed to build a pipeline with the following properties:

  • partial reproducibility: manually triggered model training pipeline running in a remote environment,
  • model tracking: all model training metadata and artifacts are stored in MLflow model registry deployed in GCP and exposed to the outside world,
  • model serving: horizontally scalable REST API microservice for model inference balanced by a REST API proxy microservice that stores and serves some inference metadata,
  • automatic model deployment: model server gets automatically re-deployed once the user changes model’s tag in MLflow model registry.

😢 Unfortunately, we didn’t have time to close the model development cycle. Namely, we didn’t implement:

  • an immutable training environment: the training Docker image should be built once and reused everywhere,
  • code versioning: we used a code snapshot, without involving a VCS,
  • data versioning: we used a dataset snapshot,
  • model lineage: only possible once code and data versioning are in place,
  • GitOps: automatically re-training the model once any input (code, data or parameters) changes,
  • model testing before deployment,
  • model monitoring and alerts (no hardware metrics, health checks, or data drift detection),
  • fancy ML tooling (hyperparameter tuning, model explainability tools, etc.),
  • business-logic features required for production (HTTPS, authentication & authorization, etc.).

Our Engineering Lab’s Solution

MLOps Architecture

Here’s a bird’s-eye view of the architecture we came up with:

Figure 3. Architecture

Note: For a complete walkthrough, check out our 13-minute presentation at Pie & AI Meetup.

As you can see, in our MLOps system MLflow plays the central role, linking all other components (training, tracking, deploying) together. In the diagram above, the green rectangles represent services implemented by our team, and the orange rectangles represent third-party services. Rectangles with yellow borders depict services that are exposed publicly. The blue area indicates Google Cloud and the Kubernetes cluster deployed in it. The light grey area indicates the outside world, and the dark grey areas indicate the Streamlit Sharing hosting environment (left) and the model training environment (right). Below we briefly discuss each component of this system.



Initially, we created a Kubernetes cluster on GCP. There, we deployed the MLflow server via publicly available Helm charts, backed by a managed PostgreSQL database as the backend store and a GCS bucket as the artifact store. The MLflow service was exposed via a LoadBalancer service that provided a public IP.

In this part, we spent some time trying to pass bucket credentials to the MLflow server, and when we finally succeeded, it turned out the server doesn’t need them: when you train a model and your Python code runs mlflow.log_model(…) to upload the model binary and its metadata, your code accesses the artifact store (the GCS bucket) directly, so it’s your code that must hold the credentials, not the MLflow server.
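A minimal configuration sketch of this gotcha (the key path and server address are placeholders, and `model` stands for the trained PyTorch model from the next section): the service-account key lives in the training environment, while the MLflow server itself never sees it.

```python
import os
import mlflow

# The training code, not the MLflow server, talks to the GCS artifact store,
# so the service-account key must be available in the *training* environment.
# Both the key path and the tracking URI below are placeholders.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
mlflow.set_tracking_uri("http://<mlflow-server-public-ip>:5000")

with mlflow.start_run():
    # `model` is the trained PyTorch model; artifacts go straight to the bucket
    mlflow.pytorch.log_model(model, artifact_path="model")
```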

Developing the model


Following the torchtext tutorial, we implemented a model consisting of two layers, an EmbeddingBag and a linear layer, followed by a sigmoid activation function (using PyTorch Lightning). Here, as usually happens in Data Science, we faced some problems with PyTorch and pickle while trying to make the model saving process smoother.
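For illustration, here is a plain-PyTorch sketch of that architecture (the real project wrapped it in a PyTorch Lightning module; the class name, vocabulary size and embedding dimension are made-up defaults):

```python
import torch
from torch import nn

class ReviewClassifier(nn.Module):
    """EmbeddingBag -> Linear -> Sigmoid, as in the torchtext tutorial."""

    def __init__(self, vocab_size: int = 20000, embed_dim: int = 64):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, 1)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        # token_ids: concatenated token indices of the whole batch,
        # offsets: start position of each review within token_ids
        pooled = self.embedding(token_ids, offsets)        # (batch, embed_dim)
        return torch.sigmoid(self.fc(pooled)).squeeze(-1)  # P(review is positive)
```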

Model Training and Experiment Tracking

[Github: jupyter · training scripts]

We used MLflow for model and experiment tracking. The model artifacts are stored in the GCS bucket, while the experiment metadata (parameters, metrics, etc.) are stored in PostgreSQL. With MLflow, you can save your training artifacts and access experiment parameters via its web interface:

Figure 4. MLflow UI for Experiment Tracking

Model Serving

[Github: server · proxy]

Following the common deployment pattern, we decided to deploy the model as a REST API endpoint. Indeed, a standalone deployment can easily be scaled up horizontally under high demand, and even scaled down to zero for the sake of saving GPU resources.

The first thing we tried was MLflow Model Serving, which, unfortunately, we failed to get working (we found the documentation rather vague and difficult to understand, and we discovered only one example on the Internet). We also considered Seldon for this task, but found its initial setup, which involves configuring a service mesh, too complicated for our POC, so we decided to implement our own REST service; at the end of the day, it’s not that difficult.

This service is based on FastAPI (basically, a Flask or Django alternative for REST with many cool perks, such as enforced REST principles and Swagger UI out of the box). The service loads the pickled PyTorch model from the GCS bucket and serves it via a simple REST API. It runs in Kubernetes as a single-replica deployment with a service providing load balancing via a static internal IP. The deployment has init containers that load the code from Git and the model artifacts from MLflow via the mlflow CLI.

This model server is backed by a model proxy, which implements some business-logic such as storing predictions results to a PostgreSQL database and calculating statistics of the model correctness rate:

$ curl -s -X POST -H "Content-Type: application/json" -d '{"text": "very good cafe", "is_positive_user_answered": true}' http://model-proxy.lab1-team3.neu.ro/predictions | jq
{
  "id": 40,
  "text": "very good cafe",
  "is_positive": {
    "user_answered": true,
    "model_answered": true
  },
  "details": {
    "mlflow_run_id": "3acade02674549b19044a59186d97db4",
    "inference_elapsed": 0.0009300708770751953,
    "timestamp": "2021-02-04T20:46:49.484379"
  }
}

$ curl -s http://model-proxy.lab1-team3.neu.ro/predictions | jq
[
  {
    "text": "very good cafe",
    "is_positive_user_answered": true,
    "is_positive_model_answered": true,
    "mlflow_run_id": "3acade02674549b19044a59186d97db4",
    "inference_elapsed": 0.0009300708770751953,
    "timestamp": "2021-02-04T20:46:49.484379",
    "id": 40
  }
]

$ curl -s http://model-proxy.lab1-team3.neu.ro/statistics | jq
{
  "statistics": {
    "correctness_rate": 0.85
  }
}
Though this service does not implement any kind of authentication, and its statistics calculation is rather simplistic (we would also consider a distributed logging system based on the ELK stack a better solution for adding business-level metadata to the model server), it serves the demo purposes well.

Model Operator


The idea was simple: we wanted to synchronise the state of the Model Server deployed in Kubernetes with the state of the MLflow Model Registry. In other words, we wanted a new model to be rolled out once the user assigns the “Production” stage to a model version. Meet the Model Operator service! It follows the Kubernetes Operator pattern: it constantly polls the MLflow server to see which model version carries the Production stage, and once that changes, it updates the deployment in Kubernetes.
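A sketch of the operator loop, assuming the registered model name is known (the `redeploy` callback stands in for patching the Model Server Deployment via the Kubernetes API; function names are ours):

```python
import time
from typing import Callable, Optional

def should_redeploy(deployed: Optional[str], current: Optional[str]) -> bool:
    """Redeploy only when some version holds the Production stage
    and it differs from the version currently served."""
    return current is not None and current != deployed

def watch_registry(model_name: str,
                   redeploy: Callable[[str], None],
                   interval_s: float = 10.0) -> None:
    """Poll MLflow and call redeploy(version) whenever Production changes."""
    from mlflow.tracking import MlflowClient  # lazy import: needs a live MLflow server
    client = MlflowClient()
    deployed: Optional[str] = None
    while True:
        versions = client.get_latest_versions(model_name, stages=["Production"])
        current = versions[0].version if versions else None
        if should_redeploy(deployed, current):
            redeploy(current)
            deployed = current
        time.sleep(interval_s)
```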

For example, suppose we want to deploy the recently trained model of version 10. In the MLflow UI, we change its Stage to Production:

Figure 5. Changing Model’s Stage Tag in MLflow UI for Model Registry

In a few seconds, the Model Operator notices the change and modifies the Kubernetes Deployment resource of the Model Server, and in a few minutes it gets re-deployed.

The current solution has a big drawback: the served model is unavailable for a few minutes during the redeployment process. This could be solved by rolling out the deployment instead of recreating it, but we ran into situations where the deployment wasn’t actually recreated, so for this POC project we decided to delete and create it explicitly; a Kubernetes Rolling Update would have worked better.

Web UI


The main user-facing entry point is the small Web UI, which talks to the Model Proxy only. The user writes a review text and specifies whether he or she thinks it is a positive or a negative review. The Web UI then makes an inference request to the Model Proxy and also asks it for the overall statistics of the model’s correctness.

Figure 6. Our Web UI

This Web UI is written with the cool tool Streamlit in some 50 lines of Python code! It is deployed on the Streamlit Sharing hosting and works pretty well.
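A condensed sketch of such a Streamlit app (the proxy URL is taken from the curl examples above; widget labels and the helper name are illustrative):

```python
import requests

PROXY_URL = "http://model-proxy.lab1-team3.neu.ro"  # endpoint from the curl examples above

def make_payload(text: str, user_says_positive: bool) -> dict:
    """Request body expected by the Model Proxy /predictions endpoint."""
    return {"text": text, "is_positive_user_answered": user_says_positive}

def main() -> None:
    import streamlit as st  # Streamlit re-runs this script on every interaction
    st.title("Yelp Review Classifier")
    text = st.text_area("Write your review")
    user_positive = st.checkbox("I think this review is positive")
    if st.button("Ask the model"):
        pred = requests.post(f"{PROXY_URL}/predictions",
                             json=make_payload(text, user_positive)).json()
        st.write("Model says positive:", pred["is_positive"]["model_answered"])
        stats = requests.get(f"{PROXY_URL}/statistics").json()
        st.write("Overall correctness rate:", stats["statistics"]["correctness_rate"])

if __name__ == "__main__":
    main()
```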

Final Considerations

🔥 Participating in MLOps Engineering Lab 1 gave us an awesome opportunity to form a team with guys from all over the world, all with different backgrounds and experience, and work together on a new project! Below we list things we learned during the work on the project.

Technical takeaways

Though PyTorch (and PyTorch Lightning) is great and has tons of tutorials and examples, pickle for Deep Learning is still a pain. You need to dance around it for a while to save and load the model. We hope that the world will eventually come to a standardised solution with an easy UX for this process.

MLflow is an awesome tool for tracking your model development progress and storing model artifacts.

  • It can be easily deployed in Kubernetes and has a nice minimalistic and intuitive interface.
  • Though we couldn’t find any good solution for authentication and role-based access control, this was out of the project’s scope anyway.
  • We also found MLflow Model Serving too difficult to run in a few hours, mostly because of the lack of clear documentation.
  • In addition, we were surprised that we couldn’t find a solution for automatically deploying the model that gets the “Production” tag in MLflow UI. Is this a viable pattern, to deploy models directly from the MLflow Server dashboard? Could this microservice be a good addition to “MLflow core functionality”?

Kubernetes is amazing! It’s terrifying at first, but terrific after a while. It enables you to deploy, scale, connect, persist your apps easily in a very clear and transparent way. However, we found it difficult to parametrize bare Kubernetes resource definitions (without using helm charts). We needed to pass a single or a few parameters to the yaml definition before applying it, and here are the ways we know how to tackle this problem:

  • Pack the set of k8s configuration files into a Helm chart (or use an alternative to Helm such as kubegen). This is the jedi way to manage complex deployments, as it gives you full flexibility, but it takes time to implement.
  • Use the k8s ConfigMap resource to configure other resources. This approach is very easy to implement (just add a resource configuration) but not flexible enough (for example, you can’t parametrize a container’s image). Still, we used it for parametrizing the Model Server configuration.
  • Another, the most “dirty” way to solve this problem is the envsubst utility. Briefly, you process your configuration yaml with a tool that syntactically replaces all occurrences of the specified environment variables with their actual values (see the example for Model Operator). Any other sed-like tool would work here as well.

Self-management takeaways

Looking back, we can say that our team suffered from a lack of communication: we started discussing system design without a single call to meet each other and understand each other’s feedback and wishes; we didn’t define a clear MVP and didn’t share a common understanding of the final goal. Nevertheless, we learned many important truths about collaboration and project planning, namely:

  • Do not try to over-plan the project from the beginning (each step in the initial plan should cover a large area of responsibility rather than be overly specific),
  • Use an iterative approach (define a clear MVP and the steps to achieve it, then distribute tasks among the team members),
  • Respect the project timeline (avoid situations where you have to write code during the last night before the deadline). This is especially hard for teams working in their free time, after work!


We would like to thank the MLOps community for the awesome atmosphere and cool insights every day! Specifically, we would like to thank the organisers (Ivan Nardini and Demetrios Brinkmann) of the MLOps Engineering Labs initiative for this cool opportunity to work together! 🎉

We’re looking forward to joining the second round of Labs and applying the knowledge we acquired during the first round. Thanks to all and see you again! 🙃