Meetup #48

Serving ML Models at a High Scale with Low Latency

Serving machine learning models is a scalability challenge at many companies. Most applications need only a small number of models (often fewer than 100) to serve predictions. Cloud platforms that offer model serving do support hundreds of thousands of models, but they provision separate hardware for different customers. Salesforce faces a challenge that very few companies deal with: to stay cost-effective, it needs to run hundreds of thousands of models on infrastructure shared across multiple tenants.


This talk will explain how Salesforce hosts hundreds of thousands of models on a multi-tenant infrastructure to support low-latency predictions.

In this episode

Manoj Agarwal

Software Architect, Salesforce

Manoj Agarwal is a Software Architect on the Einstein Platform team at Salesforce. Salesforce Einstein was released in 2016, integrated with all the major Salesforce clouds. Today, Einstein delivers 80+ billion predictions per day across the Sales, Service, Marketing & Commerce Clouds.



Demetrios Brinkmann


Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.