Meetup #121

Let's Talk About Raw Documents

Modern ML pipelines still often need pre-processed documents. This isn't changing anytime soon; in fact, the appetite is growing. Unstructured.io is focused on extracting structured data from raw documents (PDF, PPTX, HTML, etc.). In the near term, we're more NLP-focused. Check out Unstructured.io's open-source libraries!

In this episode

Crag Wolfe

Infrastructure Team Lead, Unstructured.io

Crag is a seasoned back-end engineer with over a decade of experience at Red Hat. In his most recent role, he spent five years as the Technical Lead for a key product at an NLP startup.

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.

Ben Epstein

Host

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now a founding software engineer at Galileo (rungalileo.io), focused on building data discovery and data quality tooling for machine learning teams. Ben also works as an adjunct professor at Washington University in St. Louis, teaching concepts in cloud computing and big data analytics.