Hands-on Serving Models Using KFServing
We will look inside some popular model formats, such as TensorFlow's SavedModel, PyTorch's Model Archiver, pickle, and ONNX, to understand how the weights of the neural network are saved, along with the graph and signature concepts. We will discuss the relevant resources of the deployment stack: Istio (the ingress gateway, the sidecar, and the virtual service) and Knative (the service and revisions), as well as Kubeflow and KFServing. Then we'll get into the design details of KFServing: its custom resources, the controller and webhooks, logging, and configuration. We will then spend a large part of the session on the monitoring stack: the metrics of the servable (memory footprint, latency, number of requests) as well as model metrics such as graph init/restore latencies, the optimizations, and the runtime metrics that end up in Prometheus. We will look at inference payload and prediction logging to observe drift and trigger retraining of the pipeline. Finally, a few words about the awesome community and the project's roadmap on multi-model serving and the inference routing graph.
In this episode
Theofilos Papapanagiotou
Data Science Architect, Prosus
Theo is a recovering Unix Engineer with 20 years of work experience at telcos, working on internet services, video delivery, and cybersecurity. He is also a university student for life: BSc in CS (1999), MSc in Data Coms (2008), and MSc in AI (2017). Nowadays he calls himself an ML Engineer, a role through which he expresses his passion for System Engineering and Machine Learning. His analytical thinking is driven by curiosity and a hacker's spirit. His skills span a variety of areas: Statistics, Programming, Databases, Distributed Systems, and Visualization.
Demetrios Brinkmann
Host
Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.