June 20, 2021

New Tool Tuesday – Part II

A collection of all the latest New Tool Tuesday excerpts from our MLOps weekly newsletter. Subscribe to get them fresh in your inbox here.

Kites and Boxes

Another ML Monitoring Solution?

There is something very special about today’s “New Tool Tuesday”. My old boss, Luke Marsden, one of the founders of the MLOps community, is a co-creator.

I spoke with Luke over the weekend about the tool and why they created it. Before we got to the tool, though, the first thing he said to me was “I saw your LinkedIn post….. yeah sorry about that” 😆

Anyway, I asked him why the hell he would build a monitoring solution now, when the space is already so crowded.

“BasisAI have a product called Bedrock, which is an MLOps platform. It has a monitoring component in it, and to create boxkite we extracted that code from the proprietary platform and open-sourced it.”

Sounds a bit like cheating, go on.

What it is

“Under the hood, it’s a simple Python library: you pass it your training data, training labels, production data and production labels, and it creates Prometheus histograms from them automatically. You ship the training-time histogram with the model and feed it back into the boxkite component that runs in production. Boxkite then knows how to compare the training-time histogram with what it’s seeing in production, and it exposes a Prometheus endpoint.”
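To make the idea concrete, here is a minimal, stdlib-only sketch of the two steps Luke describes: bucket a feature into a Prometheus-style cumulative histogram (each `le` bucket counts observations less than or equal to its upper bound, ending with `+Inf`), then compare the training-time and production histograms with Kullback–Leibler (KL) divergence. This is not boxkite's API; the function names, bucket edges, sample values, and the epsilon smoothing for empty buckets are all assumptions for illustration. Which distribution goes first in D_KL(P‖Q) is also a choice, since KL divergence is not symmetric.

```python
import math
from bisect import bisect_left


def cumulative_histogram(values, bounds):
    """Bucket values like a Prometheus histogram.

    Each "le" bucket counts observations <= its upper bound, cumulatively,
    and a final "+Inf" bucket counts everything. `bounds` must be sorted.
    """
    per_bucket = [0] * (len(bounds) + 1)
    for v in values:
        # First bound >= v; values above every bound fall into "+Inf".
        per_bucket[bisect_left(bounds, v)] += 1
    labels = [str(b) for b in bounds] + ["+Inf"]
    result, running = {}, 0
    for label, count in zip(labels, per_bucket):
        running += count
        result[label] = running
    return result


def kl_divergence(p_hist, q_hist, eps=1e-9):
    """D_KL(P || Q) in nats between two cumulative histograms with identical buckets."""
    if list(p_hist) != list(q_hist):
        raise ValueError("histograms must use identical bucket boundaries")

    def probs(hist):
        cum = list(hist.values())
        # Undo the cumulative counts, then smooth so empty buckets avoid log(0).
        counts = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
        smoothed = [c + eps for c in counts]
        total = sum(smoothed)
        return [c / total for c in smoothed]

    p, q = probs(p_hist), probs(q_hist)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


bounds = [1.0, 2.0, 5.0]  # hypothetical bucket edges; pick them per feature
training = cumulative_histogram([0.5, 1.0, 1.5, 3.0, 10.0], bounds)
production = cumulative_histogram([4.0, 4.5, 6.0, 7.0, 9.0], bounds)
print(training)  # {'1.0': 2, '2.0': 3, '5.0': 4, '+Inf': 5}
print(kl_divergence(production, training))
```

Identical histograms give a divergence of zero, and the value grows as production data drifts away from what the model saw in training.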

OK, nice. But I’m still not convinced. It sounds like I could just do this with Prometheus and Grafana. Why would I need boxkite?

“One of the hardest parts was figuring out how to show the difference between the distributions in Grafana. Normally, to compare two distributions, you use a technique called Kullback–Leibler divergence, or KL divergence for short. There are loads of implementations of KL divergence in Python. One of the really clever things the BasisAI team did was to port KL divergence to Grafana.

So there’s a really hairy PromQL expression hiding in the Grafana dashboard that computes the KL divergence purely in Prometheus & Grafana.

How it’s different

None of the ML monitoring tools we’ve seen work natively with Prometheus & Grafana. We don’t believe you should be using one stack to monitor your ML and another stack to monitor your software. There should be unification between the MLOps teams & DevOps teams — and using the same tools is a key part of breaking down that wall.”

Interesting point. MLOps and DevOps working together in unison on a single stack.

“We’ve also seen a lot of monitoring tools that are pretty heavyweight. Seldon’s Alibi, for example, uses an event bus to ship all of the production inferences to a central server, which then runs statistical techniques like KL divergence in Python.

Instead, boxkite is really lightweight: it just instruments the training and exports a Prometheus-compatible histogram, which is shipped with the model. It then exposes a simple Prometheus endpoint on your service, which gets scraped just like the rest of your microservices.

Props to the BasisAI team for making something actually quite difficult seem pretty simple.”
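What "exposes a simple Prometheus endpoint" means in practice: the service answers HTTP requests (conventionally at `/metrics`) with plain text in the Prometheus exposition format, and Prometheus scrapes it like any other microservice. The sketch below renders one histogram in that format by hand, assuming the cumulative bucket counts have already been computed. The metric name and values are hypothetical and boxkite's real metric names may differ; in a real service the official `prometheus_client` library would normally generate this text for you.

```python
def render_histogram(name, help_text, cumulative, total_sum):
    """Render one histogram in the Prometheus text exposition format.

    `cumulative` maps each "le" label to its cumulative count and must
    include the "+Inf" bucket, whose value is also the observation count.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for le, count in cumulative.items():
        # Doubled braces in the f-string emit literal { } around the label set.
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {total_sum}")
    lines.append(f"{name}_count {cumulative['+Inf']}")
    return "\n".join(lines) + "\n"


text = render_histogram(
    "feature_0_production",  # hypothetical metric name
    "Production distribution of feature_0",
    {"1.0": 0, "2.0": 0, "5.0": 2, "+Inf": 5},
    total_sum=30.5,
)
print(text)
```

Because the output is just text over HTTP, the same Prometheus server and Grafana dashboards that watch the rest of the stack can pick it up, which is the unification argument Luke makes above.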

Luke set up a demo so you can check it out and kick the tyres if you feel so inclined. Check out Boxkite.ml.

Hypervectors for Hyperspace

Application Programming Interface

The long and the short of it: Hypervector is an API for writing test fixtures for data-driven features and components. It serves synthetic data, generated from distributions the user defines programmatically, over dedicated endpoints. It also lets you benchmark the output of any function or model to detect changes and regressions as part of your wider test suite.
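The core idea behind a synthetic-data fixture is easy to sketch locally: describe each column as a distribution in code, then generate rows from a fixed seed so every test run sees exactly the same data. The sketch below does that with Python's `random` module. It is not Hypervector's API (Hypervector hosts these definitions centrally and serves them over endpoints); `make_fixture` and the example columns are hypothetical.

```python
import random


def make_fixture(spec, n_rows, seed=0):
    """Generate deterministic synthetic rows from user-defined distributions.

    `spec` maps each column name to a sampler: a function that takes a
    random.Random instance and returns one value for that column.
    """
    rng = random.Random(seed)  # fixed seed -> identical fixture on every run
    return [{name: sampler(rng) for name, sampler in spec.items()} for _ in range(n_rows)]


# Hypothetical column definitions for a tabular model's input.
spec = {
    "age": lambda rng: rng.randint(18, 90),
    "income": lambda rng: rng.lognormvariate(10, 0.5),
    "segment": lambda rng: rng.choice(["a", "b", "c"]),
}
rows = make_fixture(spec, n_rows=100, seed=42)
print(rows[0])
```

Keeping the seed and the distribution definitions together is what makes the fixture reproducible across builds and machines.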

So I caught up with Jason Costello, the creator of Hypervector, to talk about how and why this tool materialized. Enter Jason.

The backstory

I started off as a data scientist around 2014 after working in applied ML research for a bit, but I found myself more interested in becoming a better engineer as time went on. I’ve been fortunate enough to work with some superb teams, and one of the areas I’ve learned the most as a developer has been in writing useful tests. I find these help me contribute with less stress & uncertainty, can often aid in my understanding of a problem, and help make the experience of fast-moving shared codebases a little less chaotic.

A surprisingly common friction point I’ve encountered in multiple data teams sits at the interface between the data scientists and the wider engineering team when it comes time to ship something.

Engineers love to verify equivalence and consistency, over and over again, with every incremental change. I remember being asked, after pushing an improvement to a model: “How do we know it’s doing what it did before, plus some bits better?” My answer would have been to explain the train-test-validate cycle and the model selection metrics we’d used to decide to ship the improvement, and I might have pulled up a Jupyter Notebook to show some graphs (and probably waved my hands about a lot).

Now I can see that was sort of missing the point. The question was more like “How do WE (the engineers) know it’s doing what it did before, plus some bits better?” Why can’t we run a test on every build of this project that ensures even seemingly unrelated changes haven’t somehow broken a small but important part of the model’s output?
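The kind of check Jason describes can be written as an ordinary unit test: run the model on a fixed input and compare its outputs, within a tolerance, to a stored benchmark. The sketch below is a minimal, hypothetical version. `model_predict` stands in for the real model, the benchmark values are inlined here only to keep the example self-contained (in practice they would be recorded once and stored centrally), and the tolerances are assumptions you would tune per model.

```python
import math


def check_against_benchmark(outputs, benchmark, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where outputs drift from the stored benchmark."""
    if len(outputs) != len(benchmark):
        raise AssertionError(f"expected {len(benchmark)} outputs, got {len(outputs)}")
    return [
        i
        for i, (got, want) in enumerate(zip(outputs, benchmark))
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


def model_predict(rows):
    # Hypothetical stand-in for the real model.
    return [0.3 * r["x"] + 1.0 for r in rows]


def test_model_matches_benchmark():
    fixture = [{"x": float(i)} for i in range(5)]
    benchmark = [1.0, 1.3, 1.6, 1.9, 2.2]  # recorded from a known-good build
    drifted = check_against_benchmark(model_predict(fixture), benchmark)
    assert not drifted, f"model outputs changed at indices {drifted}"
```

A test runner such as pytest would pick up `test_model_matches_benchmark` on every build, so an unrelated change that shifts the model's output fails loudly instead of slipping through.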

I didn’t have a great response to this at the time, and eventually settled on keeping some of the training data used to build the feature as a test resource in the project repo. Not a very elegant solution, and a real pain to maintain going forward.

Hypervector tries to help in this area by providing a set of tools you can access via Python (or REST if you’d prefer). These tools allow you to define test resources centrally for such scenarios.

It began as a side project I worked on during evenings and weekends, and I decided to focus on it full-time towards the end of 2020. You can try out the early-adopter alpha version here, and please feel free to reach out on the MLOps Slack at any time. Feedback appreciated!

Sages And Saints

In late 2017, my Greek brother Pavlos Mitsoulis was a data scientist suffering through the painful experience of configuring his own EC2 instances to train and deploy ML models. Fraught with distress, he tilted his head upwards, threw his hands in the sky, and clamored, “There must be a better way!” Sadly, MLOps was nowhere near as booming then as it is today, so with the fire of determination burning inside him, he set out to make things right for all the other data scientists who found themselves in the same predicament.

The Solution 

The idea was simple, and some even laughed at his naivety. Train and deploy models on AWS by implementing just two functions? “This is blasphemous!” they responded, and even created custom Slack emojis to call him out. Nonetheless, Pavlos persisted. Train() and predict(), train() and predict(), train() and predict(). Two functions were all he needed.
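The two-function contract is easy to picture: one function turns data into a saved model artefact, and another loads that artefact and scores new inputs, while the tooling handles everything around them. The sketch below shows that shape with a tiny least-squares model and only the standard library. It is not Sagify's actual template (Sagify generates its own files and signatures); the function signatures, the JSON artefact, and the temporary-directory usage are hypothetical.

```python
import json
import os
import tempfile


def train(input_data_path, model_save_path):
    """Fit y = a*x + b by ordinary least squares and save the coefficients."""
    with open(input_data_path) as f:
        points = json.load(f)  # expected shape: [[x, y], ...]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    with open(os.path.join(model_save_path, "model.json"), "w") as f:
        json.dump({"a": a, "b": mean_y - a * mean_x}, f)


def predict(model_save_path, xs):
    """Load the saved coefficients and score a batch of inputs."""
    with open(os.path.join(model_save_path, "model.json")) as f:
        model = json.load(f)
    return [model["a"] * x + model["b"] for x in xs]


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "train.json")
    with open(data_path, "w") as f:
        json.dump([[0, 1], [1, 3], [2, 5]], f)
    train(data_path, workdir)
    predictions = predict(workdir, [3, 10])

print(predictions)  # [7.0, 21.0]
```

Everything outside these two functions (packaging, provisioning, endpoints) is what a tool in Sagify's position takes off the data scientist's plate.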

Enter SageMaker

As Pavlos became more and more enamored with the idea of making life easier for himself and his fellow data scientists, AWS did something unexpected: they released SageMaker. “Oh no, they got there first,” he thought to himself. Then it dawned on him: SageMaker is an ML engine, not an ML platform. He could take the reins and stand on the shoulders of the beast, and it would help him get to his solution sooner! Train() and predict(), train() and predict(), like an incantation fueling his late-night coding sessions.

Sagify Is Born 

After numerous sleepless nights, and nearing defeat countless times, Pavlos rose like the phoenix from the ashes with an easy-to-use CLI MLOps tool in hand. Sagify is for all those data scientists who feel their voice falls upon deaf ears. For those data scientists who only want to focus 100% on ML: just training, tuning, and deploying models. For those already on AWS, a star is born: Sagify leverages SageMaker as a backend ML engine, so they can work smarter, not harder.

Check out Sagify here

*This is based on a true story; some creative liberties have been taken for the sake of entertainment. 🙂

Dikembe Mutombo

The One Commandment

I am a sucker for well-named products. When @Miguel Jaques mentioned he had created a new tool called Nimbo, I couldn’t help myself. I had to dive in deeper for this edition of ‘New Tool Tuesday’.

What it is

Nimbo is a command-line tool that lets you run machine learning models on AWS with a single command. It abstracts away the complexity of AWS so you can focus on building, iterating on, and delivering machine learning models.

Why it was created 

Two Edinburgh college buddies were sick of how cumbersome AWS was to use. Miguel, a PhD in machine learning, and his co-founder Juozas, a software engineer, wanted to be able to run jobs on AWS (e.g. training a neural network) as easily as running them locally.

“All in all, we didn’t like the current AWS user experience, and we thought we could drastically simplify it for the machine learning/scientific computing niche.”

Having experienced that pain, they set out to provide commands that make it easier to work with AWS: checking GPU instance prices, logging onto instances, syncing data to and from S3, launching a Jupyter notebook with one command, and more.

The lads decided to make Nimbo purely client-side, meaning that the code runs on your EC2 instances and the data is stored in your S3 buckets.

“We don’t have a server; all the infrastructure orchestration happens in the Nimbo package.”

How it works under the hood 

Nimbo uses the AWS CLI and SDK to automate the many steps of running jobs on AWS.

Some examples of this include: launching an instance (on-demand or spot instance), setting up your environment, syncing your code, pulling your datasets from S3, running the job, and when the job is done, saving the results and logs back to S3 and gracefully shutting down the instance.
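The lifecycle above has one property worth getting right: the results, logs, and shutdown steps should run even when the job itself fails, otherwise a crashed run can leave an instance billing indefinitely. The sketch below shows that pattern with `try`/`finally`, using stand-in callables instead of real AWS calls. It is not Nimbo's code; `run_job_lifecycle`, the step names, and the guarantee-on-failure behavior are assumptions of this sketch.

```python
def run_job_lifecycle(launch, setup, sync_code, pull_data, run, save_results, shutdown):
    """Drive one remote job through its lifecycle.

    Each argument is a callable; every step after `launch` receives the
    instance handle that `launch` returned.
    """
    instance = launch()
    try:
        for step in (setup, sync_code, pull_data, run):
            step(instance)
    finally:
        # Runs whether the job succeeded or raised: push results and logs,
        # then stop the instance so it doesn't keep billing.
        try:
            save_results(instance)
        finally:
            shutdown(instance)


# Usage with hypothetical stand-in steps that just record what happened.
events = []


def record(name):
    return lambda instance: events.append(f"{name}:{instance}")


def failing_run(instance):
    raise RuntimeError("training crashed")


try:
    run_job_lifecycle(
        launch=lambda: "i-demo",
        setup=record("setup"),
        sync_code=record("sync_code"),
        pull_data=record("pull_data"),
        run=failing_run,
        save_results=record("save_results"),
        shutdown=record("shutdown"),
    )
except RuntimeError:
    pass  # the job failed, but results were saved and the instance shut down

print(events)
```

Swapping each stand-in for a real AWS CLI or SDK call would give the skeleton of the automation described above.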

You can use a single command (nimbo pull results) to get the results onto your computer. One of the most annoying parts of working with AWS is IAM management and permissions, so Miguel and Juozas decided to automate that too, because no one should have to suffer through it unwillingly.

Looking forward

The guys plan to add Docker support, one-command deployments, and GCP support. Who knows, maybe they will even chat with Pavlos and the Sagify folks, since they seem to be tackling some of the same problems.

Check out Nimbo here

Puddles and Ponds

Rock Skipping on K8S

Puntastic: my man @Bogdan State from New Zealand came through the community Slack and made quite a “splash” about his new tool, Walden. What is it? Well, it’s meant to be a “data pond” that runs on Kubernetes. Data pond? We’ll get to that in a minute. This edition of New Tool Tuesday was supposed to be in last week’s newsletter, but I ended up completely underwater and had to postpone it until now. Anyway, I caught up with the Kiwi himself to get the rundown. He hopes this tool can become the big fish in a… Really, I couldn’t help myself.

The Backstory

Bogdan interned at Facebook in 2013, where he first encountered the data processing engine Presto/Trino. Because of its incredible speed, he has come to rely on it extensively as a data scientist when processing very large amounts of tabular data.

After he started his own data science consultancy, scie.nz, one of the first orders of business was getting a working Presto/Trino installation. This turned out to be more work than expected, particularly because whatever setup they created had to run on Kubernetes. After a few months of off-and-on development, reading tutorials, and chasing debugging rabbit holes, his team finally got a working install made up of MinIO, Trino and the Hive Metastore. It was then that Mr. State realized he had built a small data lake in the process: a “data pond”, if you will.
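One way to see how those three pieces fit together: files live in an S3-compatible bucket (MinIO), the Hive Metastore records which table points at which location, and Trino queries the table through its Hive connector. The sketch below builds the kind of `CREATE TABLE` statement that registers such a table, using the Hive connector's `external_location` and `format` table properties. It only generates SQL text and talks to no server; the `hive` catalog name, the `s3a://` bucket path, the schema, and the columns are hypothetical placeholders, not Walden's actual configuration.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def external_table_ddl(schema, table, columns, location, file_format="PARQUET"):
    """Build a Trino CREATE TABLE statement for a Hive-connector table whose
    files live in an S3-compatible bucket (such as MinIO)."""
    for name in [schema, table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if "'" in location:
        raise ValueError("location must not contain quotes")
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE hive.{schema}.{table} (\n{cols}\n)\n"
        f"WITH (\n  external_location = '{location}',\n  format = '{file_format}'\n)"
    )


ddl = external_table_ddl(
    "pond",
    "events",
    {"event_id": "BIGINT", "event_time": "TIMESTAMP", "payload": "VARCHAR"},
    "s3a://walden-demo/events/",  # hypothetical MinIO bucket path
)
print(ddl)
```

Once a statement like this has run, the table's metadata sits in the metastore and the data stays in object storage, so any Trino client can query it with plain SQL.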

The Name

Since this sort of deployment is meant for a small team just trying to get started with Presto/Trino, and does not deal with enterprise concerns like permissions and auditing, Bogdan and his team called it Walden. It’s a tribute to the American writer Henry David Thoreau, who exalted self-sufficiency, solitude and contemplation at Walden Pond, on whose shores he lived by himself for two years. As it turns out, Thoreau received some help from his mother, who did his laundry and dropped off food for him to eat. The lesson: self-sufficiency looks easy and romantic on paper but creates a lot of unforeseen issues! Even so, the Kiwi thinks that setting up your own mini data lake can be a useful (and inexpensive) way to get started analyzing big data, and hopes others find this work useful and build upon it!

Read all about Walden here or click here to go straight to the github.