Coffee Sessions #63

PyTorch: Bridging AI Research and Production

Over the past few years, PyTorch has become the tool of choice for many AI developers, from academia to industry. With the state of the art evolving quickly across many AI domains, a key requirement for the software toolchain is enabling the swift transition of the latest research advances into practical applications. In this coffee session, Dmytro discusses some of the design principles that contributed to this popularity, how PyTorch navigates the inherent tension between research and production requirements, and how AI developers can leverage PyTorch and PyTorch ecosystem projects to bring AI models to their domain.

Take-aways

- Innovation in ML, and in deep learning in particular, is driven by research iterations, so the speed of trying new ideas is a key factor in the advancement of AI.
- Speed of iteration is not only about scalability and hardware resources. The time it takes to express novel ideas is often the limiting factor, so good, usable tools are a must.
- Research and production environments often optimize for different outcomes in isolation, which prevents newer AI advances from being brought to products quickly. Building a single software stack with tools for gradual transition along the research-production scale is harder but necessary.
- 'ML model is a program' is central to PyTorch and is one of the reasons for the framework's success. This idea shapes many aspects of the downstream tools and has proven particularly powerful in the deep learning domain (see the sketch after this list).
- The need for flexibility and programmability extends across the software stack and into hardware; with specialized accelerators in particular, there is a risk of overfitting to today's models.
- Usability of tools is crucial not only for research but also for performance optimization and production deployment.
- Empowering the ecosystem is an effective way to deliver end-user value across domains and stages of ML development. Examples include PyTorch-based applied libraries for various problem domains and integrations with MLOps projects.
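A minimal sketch of the 'ML model is a program' idea, not taken from the episode: the model below is ordinary Python/PyTorch code that can be run and debugged eagerly during research, then compiled with torch.jit.script into a TorchScript program that can be saved and served without Python. The model itself is illustrative.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """An ordinary Python program: attributes, control flow, and tensor ops."""

    def __init__(self, in_features: int = 16, hidden: int = 32, classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        # Data-dependent control flow is just Python; scripting preserves it.
        if h.shape[0] > 1:
            h = h - h.mean(dim=0, keepdim=True)
        return self.head(h)


model = TinyClassifier()

# Research side: run and debug eagerly, like any Python code.
logits = model(torch.randn(4, 16))

# Production side: compile the same program to TorchScript and save it,
# so it can be loaded later in a Python-free runtime (e.g. libtorch in C++).
scripted = torch.jit.script(model)
scripted.save("tiny_classifier.pt")
```

Because the model is a program rather than a static graph configuration, the same artifact moves gradually from research experimentation to production serving without a rewrite, which is the transition the takeaways describe.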

In this episode

Dmytro Dzhulgakov


Software Engineer, Technical Lead, Facebook

Dmytro Dzhulgakov is a technical lead of PyTorch at Facebook, where he focuses on framework core development and building the toolchain for bringing AI from research to production. Previously, he was one of the creators of ONNX, a joint initiative aimed at making AI development more interoperable. Before that, Dmytro built several generations of large-scale machine learning infrastructure that powered products such as Ads and News Feed.


Demetrios Brinkmann


Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.

Vishnu Rachakonda


Host

Vishnu Rachakonda is the operations lead for the MLOps Community and co-hosts the MLOps Coffee Sessions podcast. He is a machine learning engineer at Tesseract Health, a 4Catalyzer company focused on retinal imaging. In this role, he builds machine learning models for clinical workflow augmentation and diagnostics in on-device and cloud use cases. Since studying bioengineering at Penn, Vishnu has been actively working in the fields of computational biomedicine and MLOps. In his spare time, Vishnu enjoys suspending all logic to watch Indian action movies, playing chess, and writing.