The year 2020 was… eh… you know it wasn’t good. However, there were some silver linings. For those of us practicing machine learning (ML), the MLOps Community was a much-needed forum. Even before the pandemic, this community would have been a very welcome development, but it arguably would not be the same without the reality that COVID thrust upon everyone. The MLOps Community was a child of 2020. In case you missed it, or would like a recap, this article aims to provide just that.
While this article is a TL;DR kinda post, I really want to point out that I’m missing a lot. So much happened in this community that I can’t cover everything. With that in mind, if you like what you read and want to know more, check out the community on Slack and YouTube.
I first learned about the community when Demetrios Brinkmann, its very talented organizer, asked me to participate in the weekly podcast he runs and hosts, which has been a regular event of the community. This isn’t a digest of every single podcast, but I’ll try to capture the overall spirit of what happened this year while pointing to what stood out to me.
MLOps is a Culture and Practice
One facet of the discussions within the community that kept coming up again and again was the approach to, or perhaps the definition of, MLOps. It is best described as an ‘engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops)’. That phrasing comes from MLOps: Continuous delivery and automation pipelines in machine learning (see the review by David Aponte here), a foundational blog post.
The idea of MLOps being a culture and practice was echoed throughout the year. MLOps is not just about tool choice — despite our intense interest in tools, the value we place on them, and our requirements for them. Consideration must be given to people and process as well as the technology stack. The forming and storming of ML teams is related to and affected by tooling choices (e.g. MLflow vs Kubeflow) along with the discussion of build vs buy.
This came across very well in the podcast with Shubhi Jain from SurveyMonkey, where he discussed the forming and storming of an ML team. The consensus-driven approach they took early on sounded like classic Scrum to me and is an approach I believe to be invaluable when starting a new team with a new mission. I wasn’t surprised to learn the team took a build-over-buy approach given their tech stack constraints and start date (seemingly a couple of years ago), in addition to the relative immaturity of MLOps tools at the time. Nor was it a surprise to learn the team’s makeup was slightly engineering-heavy and that the transfer of knowledge across data science and engineering roles was fluid. In many ways, this provides a nice skeleton upon which to build our thinking about forming and storming for MLOps.
How you choose to form and storm a team has a big impact on the decision to build or buy. I like the way Neal Lathia put it — most will choose to build rather than buy because the investment is in the team, i.e. you need skilled and talented data scientists and engineers (I’m paraphrasing here and will continue to do so throughout the post). You can’t ‘buy’ a team, but you can build one. Regardless of the tools, their learned skills, and dare I say emotional investment, are imperative to long-term success.
Shubhi’s perspective also touched on the challenges involved in building from the ground up — focusing on build over buy throughout the ML lifecycle — and gave listeners a sense of scale: their team consists of around 15 members to handle 30 to 40. Given the passage of time and the maturation of the MLOps tech stack since that team’s inception, I wonder if they would take a different approach if forming their team today. Would there be more room for buying, or at least less building?
The Simplicity and Complexity of our Identity
It’s well known that there is a gap between data ‘science’ and engineering. For example, Shubhi’s team learned through experience that a simple hand-off between ML development and deployment was problematic, so they drifted towards a more collaborative working model. I talk about this in another article here, and I reckon my perspective would sound familiar to that team. Inherently, this brings roles closer together.
It seems like another response to bridging the gap has been to create more roles and ‘daisy chain’ them. I see this as potentially problematic, as our industry struggles with the identity of those roles. For example, what is the difference between a data analyst and a data scientist? A full stack data scientist? An ML engineer? A data engineer? A cloud engineer? And perhaps most importantly, when, why and how is the distinction important?
Alexey Grigorev touched on this in his talk, where he addressed the question ‘what is a full stack data scientist?’ He advocated for an interpretation in which ‘full stack’ is not about depth of knowledge in every single step of the ML lifecycle but rather implies the ability to effectively navigate the full cycle. This interpretation suggests that generalist capabilities are essential: learning quickly, being self-sufficient, and knowing when to stop being a lone wolf. Moreover, building trust with production-level stakeholders becomes a key factor, because we generally come from outside that circle with our eager desire to run ML in production. Does any role I listed above fully fit that description? I argue no. In practice, though, Alexey is spot on, and this appreciation for a generalized interpretation of our domain may encourage knowledge sharing across individuals whose role titles suggest fixed identities that, in reality, are not.
Experience and grit matter above all else. It feels like these roles — these titles, really — better serve others. Titles carry assumptions that might help facilitate conversations for business and recruiting stakeholders, but they can get in the way if not checked against the reality of the ML lifecycle, which a strong MLOps culture and practice should help navigate.
Two Tools that may Lead to 90% Done
The previous two sections primarily focused on people and, to some extent, process — so what about tools? Well, there is almost too much that can be said. The number of ML tools has grown very quickly, and their evolution has been just as fast. With that in mind, I’ll focus on two things that stand out to me: the conversation around Kubeflow and, separately, feature stores.
There were many talks on YouTube and discussions on Slack about Kubeflow. The one that really caught my attention involved David Aronchick, co-founder of Kubeflow and former lead product manager for Google Kubernetes Engine. The discussion revolved around how Kubeflow was conceived and the focus on reducing the barrier to entry for large model development. I found this talk insightful. It gave me a better appreciation for the design choices in Kubeflow and reminded me of Machine Learning Design Patterns, which might be a useful companion book to help contextualize those choices.
Kubeflow has matured to the point that I would recommend it to clients, as long as they understand what they are adopting — which can be hard for newbies to grasp, because Kubeflow offers a grand vision: fulfilling end-to-end ML pipeline deployments. Moreover, it does this in an opinionated, open source manner, which might not work for every team. Going back to Shubhi’s example, his team was not allowed to use Kubernetes, meaning Kubeflow was definitely not an option. Some teams will struggle with certain design choices, and others may find their own technical maturity to be the barrier to entry. Regardless of your tool choice, the culture and practice of your ML team will need to mature technically as you become more reliant on what you build or maintain.
Feature stores were not talked about nearly as much as Kubeflow. However, they offer something that Kubeflow cannot. In other words, if you can provide a feature store capability plus what Kubeflow offers, that’s 90% of the ideal MLOps platform. Period. OK, fine, that makes it sound too simple, but architecturally I believe this to be accurate.
Feature stores are not always well understood, but Kevin Stumpf, CTO at Tecton, breaks them down well. They are built to serve features for ML exploration, training and prediction, and to catalog data definitions, constraints, and permissions. Additionally, they should offer the ability to ‘time travel’. The single biggest challenge that robust feature stores help address is the strict freshness requirement found in many use cases.
Why would freshness be a requirement? In short, there needs to be consistency between the data trained upon and the data used for inference. If the right features are not available or not calculated correctly, online prediction use cases are not possible.
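To make the ‘time travel’ and consistency ideas concrete, here is a toy sketch (hypothetical names throughout, not any vendor’s API) of a point-in-time feature read: a training row reads the feature value as it existed at that row’s timestamp, which is exactly what online inference would have seen at that moment, so no future information leaks into training.

```python
# Toy illustration of point-in-time ("time travel") feature reads.
# Hypothetical class and keys; real feature stores add cataloging,
# permissions, and low-latency online serving on top of this idea.
from bisect import bisect_right

class ToyFeatureStore:
    def __init__(self):
        # key -> list of (timestamp, value), kept sorted by timestamp
        self._log = {}

    def write(self, key, timestamp, value):
        self._log.setdefault(key, []).append((timestamp, value))
        self._log[key].sort()

    def read_as_of(self, key, timestamp):
        """Latest value written at or before `timestamp`, else None."""
        entries = self._log.get(key, [])
        i = bisect_right(entries, (timestamp, float("inf")))
        return entries[i - 1][1] if i else None

store = ToyFeatureStore()
store.write("user:1/avg_spend", 1, 10.0)
store.write("user:1/avg_spend", 3, 25.0)

# A training row labelled at t=2 must see the value as of t=2 (10.0),
# not the fresher value written later at t=3 (25.0).
print(store.read_as_of("user:1/avg_spend", 2))   # -> 10.0
print(store.read_as_of("user:1/avg_spend", 4))   # -> 25.0
```

The same `read_as_of` logic, run against the live clock instead of a label timestamp, is what makes training and online serving consistent, and it is also where the freshness pressure comes from: the online read is only as good as the most recent write.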
Tecton and others offer buy-instead-of-build solutions. However, as many others have mentioned, you can build one on your own. For example, Neal wrote about how this was done at Monzo, and Tomás Moreyra discusses Mercado Libre’s approach. I too have implemented my own feature store utilizing an OLAP data warehouse (e.g. BigQuery) and an orchestration tool (e.g. dbt). This met the requirements I had but frankly would not address the online freshness requirement I mentioned above — an unbounded data processing tool (e.g. Apache Beam) would be needed.
Don’t Forget the People in the Loop
In the intro I wrote, “[c]onsideration must be given to people and process…” That is not just a reference to the forming and storming of ML teams. It can also refer to the design process of choosing ML use cases, along with a better understanding of the people who will be positively, and negatively, impacted by those use cases.
Charles Radclyffe’s cautionary words highlight this well. He gives a rather well-known example, Cambridge Analytica, which did not lack technical talent but rather a moral backbone. While their data scientists and engineers were technically proficient, they failed to ask the most important question: should we be doing this? Charles speaks from experience, with a background in finance during the global financial crisis of 2007–2008 (GFC). After that event, he saw the reputation of bankers, as well as the financial markets they operate in, implode due to a lack of ethical consciousness.
Some will say oversight and regulation are the answer to both the underlying issues of the GFC and the Cambridge Analytica scandal. There is some truth in that perspective. Charles himself speaks of a ‘regulatory gift’ that greased the wheels in a previous role, suddenly removing barriers that would otherwise have taken much longer to surmount. It was useful to be able to wave around a piece of paper that gave him a mandate.
Policy can be empowering. However, if we focus on that alone, are we really putting our best foot forward? And does that mean we should focus more on the technical solutions?
Charles writes about how companies “conflate two connected but separate aspects of technology governance while ignoring genuine ethical thinking. Technologists feel most comfortable with standards and process. They tend to seek engineering solutions to problems. This is why most of the ethics conversation is confined to a discussion on “transparency,” “explainability,” and “bias” mitigation. These are critical concerns, but conveniently are also those with technological solutions. The second confusion is that between ethics and regulatory and legal issues. The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have done a lot to raise awareness that perhaps data analytics needs some reigning in — but have also lulled many commentators on the subject into feeling that the extent of the ethics debate and responsibility for its management lies with regulators and not a wider community.”
So yet again, we arrive at the single most important thing you can do as an MLOps team member — help build an MLOps culture and practice. In my humble opinion, that culture and practice should embrace the complexity of our roles without siloing individuals, effectively navigate the investment not only in ourselves but in our tech stack — leveraging advancements to gain an edge when appropriate — and foster an ethical mindset, lest we fall victim to the sin of thinking someone else will do it.