Jovan and Maarten showcase Vaex, an open-source DataFrame library for Python, built for fast, interactive workflows on datasets too large to fit in the RAM of a single node. Vaex makes this possible through lazy evaluation, efficient out-of-core algorithms, memory mapping, and computational graphs, most of which work behind the scenes and out of the user's way. Using data from the New York City YellowCab taxi service, comprising 1.1 billion samples and taking up over 100 GB on disk, Jovan and Maarten show how to conduct an exploratory data analysis, complete with filtering, grouping, calculation of statistics, and interactive visualizations, on a single laptop in real time. They also demonstrate how a machine learning pipeline can be built automatically as a by-product of that exploratory analysis, using the computational graphs in Vaex.
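To make two of the ideas named above more concrete, lazy evaluation and memory mapping, here is a minimal sketch in standard-library Python. It does not use Vaex and is not a description of how Vaex is implemented. The `LazyColumn` class, its `map` and `mean` methods, the `write_column` helper, and the raw float64 file layout are all hypothetical, chosen only for illustration.

```python
import array
import mmap
import os
import tempfile


def write_column(path, values):
    """Hypothetical helper: store floats on disk as raw native float64."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


class LazyColumn:
    """Hypothetical column backed by a raw float64 file (not Vaex's API)."""

    def __init__(self, path, transform=None):
        self.path = path
        # The transform is only recorded here; no data is read yet.
        self.transform = transform if transform is not None else (lambda x: x)

    def map(self, fn):
        """Return a new lazy column; fn is composed, not applied."""
        prev = self.transform
        return LazyColumn(self.path, lambda x: fn(prev(x)))

    def mean(self, chunk_rows=4096):
        """Evaluate the recorded expression while streaming over the file."""
        total, count = 0.0, 0
        with open(self.path, "rb") as f:
            # Memory-map the file so the OS pages data in on demand.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # Views must be released before the mapping is closed.
                with memoryview(mm) as raw, raw.cast("d") as view:
                    for start in range(0, len(view), chunk_rows):
                        # Only one chunk is materialized as Python floats.
                        with view[start:start + chunk_rows] as chunk:
                            for x in chunk.tolist():
                                total += self.transform(x)
                                count += 1
        return total / count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fare.bin")
    write_column(path, [float(i) for i in range(10)])
    fare = LazyColumn(path)
    adjusted = fare.map(lambda x: 2 * x + 1)  # nothing is read from disk here
    print(adjusted.mean(chunk_rows=4))        # the file is streamed only now
```

The point of the design is that `map` costs nothing regardless of file size, and `mean` never holds more than one chunk in memory as Python objects. A real out-of-core library would add vectorized, multithreaded kernels on top of the same idea.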
In this episode
Maarten Breddels, Founder, Vaex.io / independent freelancer
Jovan, Senior Data Scientist, Tiqets
Jovan is a senior data scientist at Tiqets, where he creates predictive models and recommender systems for the e-commerce domain. Working mostly with Python in the Jupyter/PyData ecosystem, he has considerable experience in building dashboards, cluster analysis, and predictive modeling. Jovan has a Ph.D. in astrophysics, is a co-founder of vaex.io, and is interested in novel machine learning technologies and applications.
Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.
Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io) focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis teaching concepts in cloud computing and big data analytics.