Building an effective ML pipeline requires understanding the data available to you and how it's changing. Exploring a new dataset is often an iterative, interactive process that gives the engineer doing it tremendous insight into the underlying data-generating processes and the pipelines that have touched the data. Yet too often, those insights are lost when a system goes into production or when work is handed off between teams. We talk about how to capture the exploratory data analysis (EDA) done when first working with a dataset. With a clear understanding of which data characteristics mattered in crafting a dataset, it becomes possible to collaborate on and share clear expectations about the true differentiator in ML pipelines: the data that fuels them.
In this episode
James Campbell, Chief Technology Officer, Superconductive
James Campbell is the CTO at Superconductive, the company behind the open-source data quality project Great Expectations, which he co-founded in 2017. Prior to that, he spent nearly 15 years working across a variety of quantitative and qualitative analytic roles in the US intelligence community. James studied Mathematics and Philosophy at Yale and is passionate about creating tools that help communicate uncertainty and build intuition about complex systems.
Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.