This is a collection of newsletter blurbs about some of the new MLOps tools I see in the community.
Simplify the deployment of Fast.ai models
Korey MacDougall built out FastServe, a service to convert your pre-trained fast.ai models into API endpoints. You upload a model (e.g., export.pkl), and FastServe generates an API endpoint you can integrate into your applications.
Intrigued when I saw this come through, I wanted to get the skinny on the creation from Korey himself.
“I was inspired to start Launchable.AI when I worked through the fastai course, ‘Practical Deep Learning for Coders’. I was impressed with how much they (Jeremy Howard and Rachel Thomas) had lowered the barrier to entry to applied machine learning, with both the fastai library and their courses (and especially the top-down pedagogical approach they have adopted from David Perkins). I loved their mission of democratizing access to AI. I wanted to contribute to that mission of making AI more accessible.
“With Launchable, we are particularly interested in the intersection of accessible AI and low-code entrepreneurship. I think that each of these two forces is going to empower folks without traditional tech backgrounds to build products, services, and companies with new perspectives and values, and will hopefully lead, in the next few years, to a richer technology landscape and entrepreneurial climate. Combining these two developments can provide enormous leverage for individuals and small teams to build products that a few years ago would have required mid-size teams with multiple skill sets (data science & engineering, cloud infrastructure, and UI/UX, at least). Now, with things like Bubble (on the front-end) and Peltarion (on the back-end), a single maker/hacker can build an AI-powered web-app in a day, and we think that is going to be game-changing.
“We’re trying to empower folks to build businesses and products that take advantage of these two sources of leverage. We do that partly through educational content (e.g., our YouTube channel), partly through consulting, and partly through product development.
“Our most recent product development efforts have been focused on FastServe, which is a service that simplifies the deployment of fastai models. It allows data scientists to upload a trained model and get back an API endpoint, which can then be plugged into any application.
“The idea came out of our work training and deploying models, both internally and for clients. We would train a model, spin up some infrastructure (a web server and a web application, typically FastAPI or Flask) to serve predictions from the models as an API, and then build a front-end to consume that API. As we repeated this process, again and again, we started automating some portions of it. We got to where we could deploy models very quickly, using templates and resources we’d built, but were finding that clients, especially small data science teams, didn’t have the time or expertise to maintain the infrastructure (things like SSH-ing into servers to apply updates, updating serverless functions, modifying web applications, and so on). So we built some custom dashboards and workflows for clients, to do things like update their model or spin up a new endpoint. That allowed the clients to iterate more easily and took away some of the ops headaches. We’ve had some positive feedback from this approach, and thought it worth exploring whether other folks would see value in a similar offering.
“So we built FastServe, and we’re hoping this will be of particular use to data scientists who want to leverage the development speed of fastai on the backend and of No-Code platforms like Bubble on the front-end. We think of it as the missing link for low-code AI.
“There are several other services that allow data scientists to deploy models as API endpoints, like HuggingFace’s Accelerated Inference API and the Peltarion platform (both amazing, BTW), but we’re focused specifically on simplifying the model -> API step for fastai developers, and on making that as seamless as possible, especially in the context of low-code application development. We are working now on gathering feedback from the community to see what would be the most valuable additions to the platform.
“So if anyone reading this is a fastai developer and would like to try the platform, please check out our free private beta.”
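To make the "model -> API" step Korey describes more concrete, here is a minimal sketch of the pattern in plain Python. It is not FastServe's implementation or API: `stub_predict`, `make_handler`, `call_endpoint`, the `/predict` path, and the JSON payload shape are all hypothetical names chosen for illustration, and the stub stands in for a real trained model. A production service would more likely use a web framework such as FastAPI or Flask, as mentioned above.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def stub_predict(features):
    """Stand-in for a trained model; a real service would call the model here."""
    score = sum(features) / len(features) if features else 0.0
    return {"label": "positive" if score > 0.5 else "negative", "score": score}


def make_handler(predict_fn):
    """Build a request handler that serves predict_fn over POST /predict."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(predict_fn(payload.get("features", []))).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            # Silence per-request logging so the demo stays quiet.
            pass

    return PredictHandler


def call_endpoint(url, features):
    """What a front-end (or any other client) would do to consume the endpoint."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def demo():
    """Start the server on a free local port, send one request, and shut down."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(stub_predict))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/predict"
        return call_endpoint(url, [0.9, 0.8])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` runs the full round trip locally. Everything outside `stub_predict` is the plumbing that has to be built and maintained for each deployed model, which is the part a hosted service takes off a small team's plate.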
Everybody Loves Raymon
Another Monitoring Tool?
I caught up with Karel Vanhoorebeeck about the monitoring tool their team just released. This space is getting really crowded, which is an obvious signal to me that it’s a hard problem to solve and that there is a lot of demand for a solution.
So what was the inspiration for this open source project? Karel told me a lot can go wrong post-deployment with ML systems. This is mainly due to things going wrong or changing in the data generation process. For example, the ML team may have overlooked something during model training or may have introduced a bug during integration and deployment. Someone or something may have changed the data generation process.
Some examples of how he has seen this happen in the past:
- Camera shift due to vibrations. Dust collecting on the lens.
- Data drift due to onboarding new customers on the other side of the world.
- New Android version with more privacy features leading to different data distributions.
- Some other dependency may have changed, leading to faulty data, e.g., an API change, outages, …
“The reason we founded Raymon was because we were frustrated by the lack of tooling to handle ML systems post deployment.
“Most teams set up different DevOps tools, like the Elastic Stack to collect logs and Grafana to track metrics, and combine them with some custom in-house tooling for data inspection, visualization, and troubleshooting.
“What we consider really important is easy to understand distance metrics between distributions and tunability, on which you can build flexible alerting schemes.
“These tools are often set up as an afterthought or something to tick off the to-do list. Not much thought goes into how useful they actually are, how easy they are to work with and how exactly they will help an engineer with troubleshooting.
“Only when production issues start occurring do people notice they lack this or that functionality, and they either ignore the tooling and write custom code to debug, or gradually improve and patch up the tooling.
“We have a broader focus where teams can log all relevant data and metadata related to a model prediction, like pre- and post-processing information and visualisations. This is especially useful for explainability information and richer data types like computer vision or sensor data where an image tells more than a thousand words.
“Stitching together and building your own troubleshooting and monitoring tooling takes a lot of time, requires a lot of conceptual scoping work, and unless you really get the time and budget to work on it, the usefulness and user friendliness will be… bad.
“We’re building an observability hub that is basically a place to log all kinds of information that could be relevant for the ML team. This can be the raw data received for a prediction request for example, or it can be the data after a few preprocessing steps, it can be the model output, model confidence, processing times, data after post processing, and so on. All this information is relevant for a DS, so all this information should end up in one integrated platform.”
Where Does The Open Source Library Come In?
“It’s a toolbox to collect metrics about your data and model predictions into profiles (ModelProfiles we call them). These profiles can then be used to validate data in production, which generates all kinds of data health metrics that can be logged to the Raymon hub. Using these profiles, Raymon knows what all these metrics mean and can auto-configure monitoring. Next to that, the library also allows you to log all kinds of data to our hub.
“The metrics that are collected in a profile can be anything really. The simplest case is probably if you would use them to track input feature distributions and health. We currently offer support for a (limited) set of metrics that we’ve found useful so far in structured data and computer vision. We are now working to support more metrics for more data domains. All suggestions and ideas are most welcome!”
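As a rough illustration of the idea of profiling a feature on reference data and then scoring production data against it with an easy-to-read distance metric, here is a small sketch. It does not use the Raymon library and does not mirror its API: `build_profile`, `drift_score`, and `check_feature` are hypothetical helpers, the histogram-plus-total-variation-distance approach is just one simple choice of metric, and the default threshold of 0.2 is an arbitrary, tunable assumption.

```python
from bisect import bisect_right


def _frequencies(values, edges):
    """Fraction of values in each bin; out-of-range values land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [count / len(values) for count in counts]


def build_profile(values, n_bins=10):
    """Summarize a non-empty reference sample of one numeric feature as a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    edges = [lo + i * width for i in range(1, n_bins)]  # interior bin edges only
    return {"edges": edges, "freqs": _frequencies(values, edges)}


def drift_score(profile, production_values):
    """Total variation distance between the two histograms (0 = identical, 1 = disjoint)."""
    production = _frequencies(production_values, profile["edges"])
    return 0.5 * sum(abs(p - q) for p, q in zip(profile["freqs"], production))


def check_feature(profile, production_values, threshold=0.2):
    """Score production data against the profile and flag it if it exceeds the threshold."""
    score = drift_score(profile, production_values)
    return {"score": score, "alert": score > threshold}


# Example: the reference data covers 0..99, production data has shifted up by 50.
reference = list(range(100))
profile = build_profile(reference)
unchanged = check_feature(profile, reference)
shifted = check_feature(profile, [v + 50 for v in reference])
```

Because the score is bounded between 0 and 1, the alert threshold is easy to reason about and tune per feature, which is the property the quote above highlights for building flexible alerting schemes.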
Have a look around Raymon’s GitHub here
Dude, It’s a Dud
What is Dud?
I heard about Dud (pronounced “duhd”, not “dood”) a few months ago, when its creator, Kevin Hanselman, dropped a few lines in the community about it. Curious about the project, I caught up with him to learn about the new 0.2.0 release and what exactly the tool aims to do.
Dud is a lightweight MLOps tool for versioning data alongside source code and building data pipelines. In practice, Dud extends many of the benefits of source control to large binary data. It is essentially a more focused and lighter-weight data version control tool.
It strives to be three things: simple, fast, and transparent.
Simple: Dud should never get in your way (unless you’re about to do something stupid). Dud should be less magical, not more. Dud should do one thing well and be a good UNIX citizen.
Fast: Dud should prioritize speed while maintaining sensible assurances of data integrity. Dud should isolate time-intensive operations to keep the majority of the UX as fast as possible. Dud should scale to datasets in the hundreds of gigabytes and/or hundreds of thousands of files.
Transparent: Dud should explain itself early and often. Dud should maintain its state in a human-readable (and ideally human-editable) form.
To summarize with an analogy: Dud is to DVC what Flask is to Django.
Both Dud and DVC have their strengths. If you want a “batteries included” suite of tools for managing machine learning projects, DVC can be a good fit for you. If data management is your main area of need and you want something lightweight and fast, Dud may be what you are looking for.
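For readers new to data versioning, the general technique behind tools in this category can be sketched in a few lines: store each large file once in a cache keyed by its checksum, and commit only a small, human-readable pointer file to source control. The sketch below is a generic illustration under that assumption, not Dud's actual implementation or file format. `file_checksum`, `commit_file`, `checkout_file`, the `.pointer.json` suffix, and the cache layout are all hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large binaries never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def commit_file(path, cache_dir):
    """Copy the file into a content-addressed cache and write a small pointer file."""
    path, cache_dir = Path(path), Path(cache_dir)
    checksum = file_checksum(path)
    cached = cache_dir / checksum[:2] / checksum[2:]
    if not cached.exists():  # identical content is stored only once
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, cached)
    # The pointer is small, human-readable text that can live in source control.
    pointer = path.with_name(path.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": path.name, "sha256": checksum}, indent=2))
    return pointer


def checkout_file(pointer, cache_dir):
    """Restore the data file named by a pointer file from the cache."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    cached = Path(cache_dir) / meta["sha256"][:2] / meta["sha256"][2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target
```

Checking out an older commit of the pointer file and running `checkout_file` would bring back the matching version of the data, which is how the benefits of source control get extended to files too large to commit directly.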
Check out Dud here
chas·sis | \ ˈcha-sē , ˈsha-sē also ˈcha-səs \
plural chassis\ ˈcha-sēz , ˈsha-sēz \
The supporting frame of a structure (such as an automobile or television)
Leaf springs are attached to the car’s chassis.
Also: the frame and working parts (as of an automobile or electronic device) exclusive of the body or housing
So what does this have to do with MLOps? The ultra prolific Luke Marsden, aka my old boss, hasn’t stopped creating stuff in the MLOps space since dotscience went under. And judging by what he told me last week, I don’t think he has plans to any time soon.
I caught up with Luke to get the lowdown on his new creation, Chassis.ml.
“We created chassis.ml to help bridge the gap between data scientist/ML teams and DevOps teams. Getting models into production is still one of the main challenges for companies trying to get value out of AI/ML. DevOps teams could do worse than deploying chassis to their k8s cluster to give data scientists an easy python SDK to convert their MLflow models into runnable, production-ready container images that are multi-platform.”
“While we already support MLflow, KFServing and Modzy, we’re looking to integrate with other model sources and ML runtimes, so come and get involved in #chassis-model-builder on the MLOps Community Slack.”
Check out a demo of Chassis in action here, or click here for a full on test drive.
On-Premise Cost Cutting
Right before going on vacation, I saw Alon Gubkin created a nifty tool for saving money with Kubernetes. Naturally, I had to talk with him more about the reasons for creating it and what exactly it does.
You all may remember another New Tool Tuesday I did on BudgetML. I am hoping there can be a community collaboration and these two money savers can have a superhero offspring that pays me for running Kubernetes.
So the story is really simple: the idea for this tool came from two real needs:
- At Aporia we provide an ML monitoring solution that is on-prem, meaning you can install it in your own Kubernetes cluster. We obviously needed to estimate how much our installation costs the customer, but when I tried to estimate that accurately, I found that it can be really hard.
- Some of our customers run a LOT of training jobs and model servers on Kubernetes, and if you don’t configure your cluster correctly this can cost you A LOT (especially if you use GPUs). So after some conversations with them I thought of this idea.
There are other similar tools like Kubecost (which is great!), but it works on existing clusters with everything already installed.
I wanted a tool that lets you quickly plan your cluster and estimate costs for it, without a real cluster behind the scenes and without installing anything. So I built a really small “programming language” that lets you easily define your workloads:
# My training jobs
pod(cpu: 1, memory: "2Gi", gpu: 2) * 3 +
# Some model servers
pod(cpu: 2, memory: "4Gi") * 10
And it automatically gives you the cheapest node configuration, with its price per month (e.g., 2 p3.2xlarge instances for USD $875.16).
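To give a feel for the kind of calculation involved, here is a simplified sketch in Python. It is not the tool's actual algorithm: it only considers clusters built from a single node type, it sums resource requests instead of doing real bin-packing, and it ignores system overhead. The `Pod` and `NodeType` classes, the catalog entries, and every price in it are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    cpu: float          # vCPUs requested per replica
    memory_gib: float   # memory requested per replica, in GiB
    gpu: int = 0        # GPUs requested per replica
    replicas: int = 1


@dataclass(frozen=True)
class NodeType:
    name: str
    cpu: float
    memory_gib: float
    gpu: int
    monthly_usd: float  # hypothetical price, not a real cloud quote


def nodes_needed(pods, node):
    """Lower bound on node count from summed requests, or None if a pod cannot fit."""
    needs = []
    for attr in ("cpu", "memory_gib", "gpu"):
        capacity = getattr(node, attr)
        if any(getattr(pod, attr) > capacity for pod in pods):
            return None  # at least one pod is too big for a single node of this type
        total = sum(getattr(pod, attr) * pod.replicas for pod in pods)
        if total > 0:
            needs.append(math.ceil(total / capacity))
    return max(needs, default=0)


def cheapest_plan(pods, catalog):
    """Return (monthly cost, node count, node name) for the cheapest single-type cluster."""
    plans = []
    for node in catalog:
        count = nodes_needed(pods, node)
        if count is not None:
            plans.append((count * node.monthly_usd, count, node.name))
    return min(plans) if plans else None


# The same workload as the snippet above: 3 training jobs plus 10 model servers.
workload = [
    Pod(cpu=1, memory_gib=2, gpu=2, replicas=3),
    Pod(cpu=2, memory_gib=4, replicas=10),
]
# Hypothetical catalog: names, sizes, and prices are invented for this example.
catalog = [
    NodeType("gpu-small", cpu=8, memory_gib=64, gpu=2, monthly_usd=1000.0),
    NodeType("gpu-large", cpu=32, memory_gib=256, gpu=4, monthly_usd=3500.0),
    NodeType("cpu-only", cpu=16, memory_gib=64, gpu=0, monthly_usd=300.0),
]
plan = cheapest_plan(workload, catalog)
```

With this invented catalog, the workload needs 6 GPUs in total, so the sketch picks three of the hypothetical `gpu-small` nodes. A real planner would also need to consider mixing node types and how pods actually pack onto nodes.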
Check out the tool here