Author: Segun Adelowo
Based on my experience, here is a summary for individuals interested in getting started in Machine Learning Engineering and Machine Learning Operations, and for those who want to improve their skills.
- Model production challenges (O Model, where is thy value?)
- What is Machine Learning Engineering?
- Machine Learning Production Steps
- Career Growth for a Machine Learning Engineer
- What is Machine Learning Operations?
- Machine Learning Operations Steps
- Career Growth for a Machine Learning Operations Engineer
- Roles and Skills for MLE and MLOps
- Additional Resources
O Model, where is thy value?
ML and MLOps are still in their early years: compared to software engineering, there is no universal standard yet for doing ML and MLOps. Some of the most common reasons ML models don’t reach production, or don’t thrive once there, include:
- One major issue is that proofs of concept (POCs) are typically built with a limited scope and a specific set of data, which may not represent the real-world conditions in which the model will be deployed.
- POCs may not have been built with production considerations in mind, such as scalability, reliability, and security. Neglecting these can lead to costly downtime, data breaches, reputational damage, lawsuits, etc.
- Models require ongoing maintenance and monitoring to ensure good performance. Organizations may not have the resources or expertise to manage these models effectively in production.
- Lack of understanding and buy-in from key stakeholders within the organization. Without support from decision-makers, a POC may not be fully realized in the production environment.
Within all this chaos lies the opportunity for ML and MLOps engineering! These skills are needed to increase the percentage of model POCs that reach production, and to keep that percentage growing.
What is Machine Learning Engineering?
Machine Learning Engineering (MLE) focuses on the development and implementation of ML models and systems. The stages include the following:
- Researching the selected ML algorithms and techniques, which entails understanding a model's internals, how training duration scales with data size, parallelization, etc.
- Preprocessing and cleaning data to make it suitable for training, validation, and testing models.
- Implementing ML models and systems in a production environment, and maintaining those systems over time.
Machine Learning Production Steps
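To make the preprocessing stage a little more concrete, here is a minimal sketch in plain Python of splitting data into training, validation and test sets and then imputing missing values. The function names, the 70/15/15 split and the house-size column are hypothetical choices for illustration. The key idea it shows is that the fill value is learned from the training split only, so no information from the validation or test data leaks into training.

```python
import random
from statistics import median

def split_dataset(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows reproducibly, then split them into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_median_imputer(rows, column):
    """Learn the fill value from the training split only, to avoid leaking other splits."""
    return median(r[column] for r in rows if r[column] is not None)

def impute(rows, column, fill_value):
    """Return copies of the rows with missing values in `column` replaced by `fill_value`."""
    return [{**r, column: fill_value if r[column] is None else r[column]} for r in rows]

# Hypothetical example data: every 10th row is missing its size.
data = [{"id": i, "size_sqm": None if i % 10 == 0 else 50 + i} for i in range(100)]
train, val, test = split_dataset(data)
fill = fit_median_imputer(train, "size_sqm")
train, val, test = (impute(split, "size_sqm", fill) for split in (train, val, test))
```

The same fitted value is reused at serving time, which is one reason preprocessing is usually packaged together with the model rather than re-implemented separately.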
- Business context: Detailed documentation of the problem to be solved or the opportunity to innovate.
- Customer journey: How will the customers interact with the model predictions?
- Data Scientist Analysis Documentation: A good understanding of the Data Scientist's hypotheses and experiments.
- System architecture design: Use a software system design approach to propose possible architectures that handle data collection, model training, serving, number of users, data size, security, etc.
- Compliance: Ensure that General Data Protection Regulation (GDPR) concerns, along with any other regulatory requirements of the business's industry, are factored in.
- Orchestration: A system that runs workflows continuously, either at set intervals or when triggered by events such as new data arriving. It determines when to fetch new training data to update the model, when to train the model, when to deploy (automatically or manually) and any other workflow steps the business needs.
- Other things to consider include cloud service costs, development, quality assurance, deployment, stakeholder communication, etc.
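The orchestration step boils down to decision logic that a scheduler evaluates on every run. Below is a minimal sketch of that logic, assuming two hypothetical retraining triggers (model age and the amount of new data) and hypothetical step names. It is not how any particular orchestrator works; in practice a tool such as Airflow would handle the scheduling and execution of the steps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PipelineState:
    last_trained: datetime      # when the current production model was trained
    rows_since_training: int    # new rows that have arrived since then

def should_retrain(state, now, max_age=timedelta(days=7), min_new_rows=10_000):
    """Retrain when the model is too old or enough new training data has arrived."""
    too_old = now - state.last_trained >= max_age
    enough_new_data = state.rows_since_training >= min_new_rows
    return too_old or enough_new_data

def plan_run(state, now, auto_deploy=False):
    """Return the ordered (hypothetical) workflow steps for one triggered run."""
    steps = ["ingest_new_data", "validate_data"]
    if should_retrain(state, now):
        steps += ["train_model", "evaluate_model"]
        # Deployment can be automatic or gated behind a human approval step.
        steps.append("deploy_model" if auto_deploy else "request_manual_approval")
    return steps

state = PipelineState(last_trained=datetime(2024, 1, 1), rows_since_training=500)
print(plan_run(state, now=datetime(2024, 1, 9)))
```

Keeping the triggers explicit like this makes it easy to agree with the business on when retraining and deployment happen, and to test that policy independently of the scheduler.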
Career Growth for a Machine Learning Engineer
A good way to grow is to map your development to what companies that use ML as part of their core business do today, in addition to what your own organization uses.
The examples below are from Dropbox.
Here are resources that I use and read from time to time; I hope you find them helpful. They are listed in the order I would recommend starting with. You can jump to whatever interests you most first, but I recommend working through all of them over time, as each one fills gaps the others leave.
Feature Engineering for Machine Learning
Learn imputation, variable encoding, discretization, feature extraction, how to work with datetime, outliers, and more.
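As a quick taste of some of these techniques, here is a small plain-Python sketch of variable encoding (one-hot), discretization (binning), datetime feature extraction and outlier capping with the interquartile range (IQR). The function names, bin edges and the 1.5 × IQR fence are illustrative assumptions, not taken from the resource above; libraries such as pandas and scikit-learn offer more complete versions of each.

```python
from bisect import bisect_right
from datetime import date
from statistics import quantiles

def one_hot(values):
    """Variable encoding: one 0/1 indicator column per category (categories sorted)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def discretize(value, edges):
    """Discretization: index of the bin `value` falls into, given sorted bin edges."""
    return bisect_right(edges, value)

def date_features(d):
    """Datetime features: break a date into parts a model can use."""
    return {"year": d.year, "month": d.month,
            "day_of_week": d.weekday(), "is_weekend": d.weekday() >= 5}

def cap_outliers(values, k=1.5):
    """Outlier handling: clip values to the IQR fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

print(one_hot(["red", "blue", "red"]))
print(date_features(date(2024, 1, 6)))
print(cap_outliers([12, 10, 13, 11, 12, 100]))
```

As with imputation, the categories, bin edges and IQR fences should be learned on training data and reused unchanged in production.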
Deployment of machine learning models
Learn how to integrate robust and reliable machine learning pipelines in production.
Machine Learning Engineering with Python
You will learn the basics of:
- ML Engineering
- Machine Learning Development Process
- Model Factory
- Packaging Up
- Deployment Patterns
- Building an Example ML Microservice
- Building an Extract Transform Machine Learning Use Case
Designing Machine Learning Systems
You will learn about:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing a monitoring system to quickly detect and address issues your models might encounter in production
- Architecting an ML platform that serves across use cases
- Developing responsible ML systems
Data Engineering Zoomcamp
Learn an introduction to data engineering, workflow orchestration, data warehousing, analytics engineering, batch processing and streaming.
What is Machine Learning Operations?
Machine Learning Operations (MLOps) focuses on the operational aspects of running ML systems in production.
MLOps is based on DevOps principles and practices that increase overall workflow efficiency across the machine learning project lifecycle. The processes involved are:
- Setting up and maintaining ML infrastructure and platforms
- Automating the build, test, and deployment process for ML models (Continuous Integration, Continuous Deployment and Continuous Training)
- Managing and monitoring ML models in production
- Ensuring the reliability, scalability, and security of ML systems
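One concrete piece of automated deployment and continuous training is a quality gate: before a newly trained candidate model replaces the production model, the pipeline checks it against agreed thresholds. Here is a minimal sketch, assuming hypothetical metric names, a hypothetical F1 floor of 0.70 and a hypothetical 200 ms p95 latency budget; real pipelines would read these from configuration and evaluate on a held-out dataset.

```python
def passes_quality_gate(candidate, production, min_f1=0.70,
                        max_latency_ms=200.0, tolerance=0.0):
    """Decide whether a candidate model may be promoted; returns (approved, reasons)."""
    reasons = []
    if candidate["f1"] < min_f1:
        reasons.append(f"F1 {candidate['f1']:.3f} is below the floor of {min_f1}")
    # Do not promote a model that is worse than what is already serving traffic.
    if production is not None and candidate["f1"] < production["f1"] - tolerance:
        reasons.append("F1 is worse than the current production model")
    if candidate["p95_latency_ms"] > max_latency_ms:
        reasons.append("p95 latency exceeds the serving budget")
    return (not reasons, reasons)

approved, reasons = passes_quality_gate(
    {"f1": 0.75, "p95_latency_ms": 120.0}, production={"f1": 0.80})
print(approved, reasons)
```

Returning the reasons, not just a yes/no answer, makes a failed CI/CD run easier to explain to stakeholders.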
Machine Learning Operations Steps
In addition to the ML production steps above, there are considerations specific to MLOps:
- ML System Architecture Design: A good understanding of the system designs created by the Machine Learning Engineer to integrate the ML system with business operations.
- Monitoring Infrastructure: Evaluating tools such as MLflow, Airflow, Prometheus and Grafana to choose those that best serve the system's production requirements.
- Establish Key Performance Indicator values: These are essential for tracking the health of the ML system and models. Examples of offline metrics are model F1-score, recall and AUC; examples of online metrics are loan default rate and customer engagement rate.
- Data Engineering: Understanding the Extract, Transform and Load (ETL) systems built to collect and use data from various sources and formats, and tracking the entire process from source to destination.
- Monitor Data Quality: Essential things to track here are data accuracy, completeness, reliability, relevance and timeliness.
- Monitor Model Concept Drift: For example, suppose a model is trained on house prices from the past one to two years. If the housing market changes months after training, the relationship between the features and the price of a house may change too. The model will then predict new house prices poorly, because the relationship between the features and the output has shifted and the model is unaware of it.
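The offline metrics named in the KPI item can be computed from a labelled evaluation set. The sketch below shows precision, recall and F1 from binary predictions, and AUC computed directly from its definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count as half). It is written in plain Python for transparency; the pairwise AUC loop is quadratic, so in practice you would use a library implementation such as scikit-learn's.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0]
print(precision_recall_f1(y_true, [1, 0, 0, 1, 1]))
print(roc_auc(y_true, [0.9, 0.1, 0.4, 0.8, 0.6]))
```

Online metrics such as loan default rate depend on business events rather than a fixed test set, so they are usually computed in the analytics or monitoring stack instead.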
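Some of the data quality dimensions listed above can be checked automatically on each incoming batch. Here is a minimal sketch covering completeness (share of non-missing values), a range check as a rough proxy for accuracy, and timeliness (how recent the newest record is). The field names, valid ranges and staleness limit are hypothetical; relevance and reliability usually need domain knowledge and are not covered here.

```python
from datetime import datetime, timedelta

def data_quality_report(rows, required, valid_ranges, timestamp_field, now, max_staleness):
    """Summarise completeness, range validity and timeliness for a batch of records."""
    if not rows:
        raise ValueError("empty batch: nothing to check")
    n = len(rows)
    # Completeness: share of rows where each required field is present.
    completeness = {col: sum(r.get(col) is not None for r in rows) / n for col in required}
    # Validity (a rough proxy for accuracy): present values outside the expected range.
    out_of_range = {
        col: sum(1 for r in rows if r.get(col) is not None and not lo <= r[col] <= hi)
        for col, (lo, hi) in valid_ranges.items()
    }
    # Timeliness: has fresh data arrived recently enough?
    newest = max(r[timestamp_field] for r in rows)
    return {"completeness": completeness, "out_of_range": out_of_range,
            "is_fresh": now - newest <= max_staleness}

batch = [  # hypothetical house-listing records
    {"price": 250_000, "size_sqm": 80, "ingested_at": datetime(2024, 5, 1, 9)},
    {"price": None, "size_sqm": 95, "ingested_at": datetime(2024, 5, 1, 10)},
    {"price": 310_000, "size_sqm": -5, "ingested_at": datetime(2024, 5, 1, 11)},
    {"price": 280_000, "size_sqm": 70, "ingested_at": datetime(2024, 5, 1, 12)},
]
report = data_quality_report(batch, ["price", "size_sqm"], {"size_sqm": (10, 1000)},
                             "ingested_at", datetime(2024, 5, 1, 18), timedelta(hours=12))
print(report)
```

A report like this can feed the monitoring dashboards mentioned above, with alerts when a value crosses an agreed threshold.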
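Because concept drift changes the relationship between features and the target, one simple way to watch for it is to track the model's error once true outcomes (for example, actual sale prices) become known, and compare recent error against the error measured at training time. The sketch below does that with a rolling mean absolute error (MAE). The window size and the 1.5× alert ratio are hypothetical, and a rise in error could also come from data quality problems, so an alert is a prompt to investigate rather than proof of drift.

```python
from collections import deque

class ErrorDriftMonitor:
    """Flag possible concept drift when recent prediction error rises well above baseline."""

    def __init__(self, baseline_mae, window=100, ratio_threshold=1.5):
        self.baseline_mae = baseline_mae          # MAE measured on validation data
        self.ratio_threshold = ratio_threshold
        self.errors = deque(maxlen=window)        # only the most recent errors are kept

    def update(self, predicted, actual):
        """Record one prediction once its true outcome (e.g. the sale price) is known."""
        self.errors.append(abs(predicted - actual))

    def recent_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def drift_suspected(self):
        # Only judge once the window is full, to avoid alarms from a handful of sales.
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.recent_mae() > self.ratio_threshold * self.baseline_mae

monitor = ErrorDriftMonitor(baseline_mae=10_000, window=5)
for actual in [320_000] * 5:  # the market has moved: sales come in well above predictions
    monitor.update(predicted=300_000, actual=actual)
print(monitor.recent_mae(), monitor.drift_suspected())
```

The drawback is latency: labels such as sale prices can take weeks to arrive, so teams often pair this with checks on the input data, which can warn earlier.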
Career Growth for a Machine Learning Operations Engineer
As I mentioned above under “Career Growth for a Machine Learning Engineer”, mapping your development to what existing companies use today, in addition to what your organization uses, will go a long way; there is no standard approach yet. I have not yet seen a career path detailed enough to share, and I will update this post as soon as I find good information.
To be continued… 🙂
Success in both MLE and MLOps requires solid experience in software engineering, data management and machine learning.
Roles and Skills for MLE and MLOps
Below is a sample cross-section of the roles and skills required for MLE and MLOps positions.
Additional Resources
You will learn the practical aspects of productionizing ML services, from collecting requirements to model deployment and monitoring.
- Formulate data governance strategies and pipelines for ML training and deployment
- Get to grips with implementing ML pipelines, CI/CD pipelines, and ML monitoring pipelines
- Design a robust and scalable microservice and API for test and production environments
- Curate your custom CD processes for related use cases and organizations
- Monitor ML models, including monitoring data drift, model drift, and application performance
- Build and maintain automated ML systems
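The list above mentions monitoring data drift. Unlike the concept drift described earlier, data drift is a change in the distribution of the model's inputs, so it can be checked without waiting for labels. One common measure is the population stability index (PSI), which compares how a feature's values fall into bins in a reference sample (such as training data) versus a recent production sample. This sketch uses equal-width bins over the reference range and a small epsilon to avoid taking the log of zero; quantile-based bins are also common. A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a recent production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = list(range(100))                   # hypothetical training-time feature values
shifted = [v + 50 for v in reference]          # hypothetical production values, shifted up
print(population_stability_index(reference, reference))
print(population_stability_index(reference, shifted))
```

Each term is non-negative, so PSI is zero only when the binned distributions match and grows as they diverge.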
Practitioners Guide to Machine Learning Operations (MLOps)
Gain an overview of the machine learning operations (MLOps) life cycle, processes, and capabilities.