Coffee Sessions #58

10 Types of Features your Location ML Model is Missing

Machine learning on geographic data is relatively under-studied in comparison to ML on other formats like images or graphs. But geographic data is prevalent across a wide variety of domains (although many practitioners may not think of it that way). Clearly, any dataset with `latitude` and `longitude` columns can be viewed as geographic data, but also any dataset with a `zipcode`, `city`, `address`, or `county` can be construed as geographic. Demographics, weather, foot traffic, points of interest, and topographic features can all be used to enrich a dataset with any of these types of keys. In this coffee session, Anne discusses ways to simplify the process of incorporating geographic or location data into the MLOps workflow, as well as interesting trends in the geographic ML research community that will ultimately make it easier for us to learn from geography just as we do with images or graphs today.
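To make the enrichment idea concrete, here is a minimal sketch of attaching per-zip-code attributes to an existing dataset. Everything in it is an assumption for illustration: the `listings` rows, the `zip_features` lookup, the feature names, and the `enrich_by_zipcode` helper are hypothetical and do not come from the episode or from any Iggy tooling.

```python
# Hypothetical rows; none of these values come from the episode.
listings = [
    {"listing_id": 1, "zipcode": "19103", "price": 450000},
    {"listing_id": 2, "zipcode": "19003", "price": 380000},
    {"listing_id": 3, "zipcode": "00000", "price": 200000},  # no match in the lookup
]

# Hypothetical per-zip-code attributes (for example, demographics or points of interest).
zip_features = {
    "19103": {"median_income": 85000, "poi_count": 120},
    "19003": {"median_income": 95000, "poi_count": 35},
}


def enrich_by_zipcode(rows, lookup, feature_names):
    """Left-join lookup features onto rows by 'zipcode'; unmatched rows get None."""
    enriched = []
    for row in rows:
        features = lookup.get(row["zipcode"], {})
        new_row = dict(row)  # copy so the input rows are not mutated
        for name in feature_names:
            new_row[name] = features.get(name)
        enriched.append(new_row)
    return enriched


enriched = enrich_by_zipcode(listings, zip_features, ["median_income", "poi_count"])
```

Unmatched keys are kept with `None` values rather than dropped, so the model pipeline can decide how to impute them.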

Take-aways

- Lots of ML engineers are dealing with location data, but many don't even think of it that way. If you have a zip code or address or city name in your data, you could be missing out on a rich set of location features.
- Incorporating rich location features into the MLOps workflow can be a real challenge, but Iggy is building tools to cut through the data sourcing and cleaning and deliver smart location features into ML models. We want to make experimenting with new location features so easy that there's no good reason *not* to do it.
- Many ML shops have geographic data going into their models but aren't thinking of it that way; there could be hidden signals to be uncovered by adding relatively straightforward geographic features to models.
- Deep learning on geographic data is relatively under-studied in comparison to deep learning on images or graphs, but there are many parallels that we can use to jump-start the process.
- Building pipelines to add geographic data to models can seem like a large infrastructure lift, which is why some teams don't do it.
- Incorporating relatively straightforward geographic features into models can yield substantial improvements; adding "distance to the beach" or "square mileage reachable within 10 min drive" to a real estate pricing model, for example, can lead to significant decreases in model error. Unfortunately, many ML teams find it difficult to incorporate these types of geographic data into their models because the process of ingesting from geographic formats (geojson or shapefiles), projecting, and properly joining with their existing data can be a large infrastructure lift.
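As a rough illustration of a feature like "distance to the beach", the sketch below computes the great-circle (haversine) distance from a listing to the nearest of a few reference points. It is a simplification under stated assumptions, not the workflow discussed in the episode: the `beach_points` coordinates, the `listing` record, and the function names are hypothetical, and a real pipeline would more likely load coastline geometries from geojson or shapefiles and measure distance in a projected coordinate system.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_km(lat, lon, points):
    """Distance in kilometers from (lat, lon) to the closest point in `points`."""
    return min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


# Hypothetical beach access points as (latitude, longitude) pairs.
beach_points = [(39.3643, -74.4229), (38.9351, -74.9060)]

# Hypothetical listing with the coordinate columns mentioned above.
listing = {"latitude": 39.9526, "longitude": -75.1652}
listing["distance_to_beach_km"] = distance_to_nearest_km(
    listing["latitude"], listing["longitude"], beach_points
)
```

A brute-force minimum over reference points is fine for a handful of locations; for large datasets a spatial index would be the usual next step.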

In this episode

Anne Cocos

Director of Data Science, Ask Iggy

Dr. Anne Cocos currently leads data science and machine learning at Ask Iggy, Inc., a venture-backed, seed-stage startup focused on location analytics. Her team builds tools that make it simple for data scientists to leverage location information in their models and analyses. Previously she was the Director and Head of NLP and Knowledge Graph at GlaxoSmithKline, where she built algorithms and infrastructure to enable GSK’s scientists to leverage all the world’s written biomedical knowledge for drug discovery. She also worked on applied natural language processing research at The Children’s Hospital of Philadelphia Department of Biomedical Informatics. Anne completed her Ph.D. in computer science at the University of Pennsylvania, where she was supported by the Google Ph.D. Fellowship and the Allen Institute for Artificial Intelligence Key Scientific Challenges award. Before shifting her career toward artificial intelligence, Anne spent several years as an end-user of early ML-powered technologies in the U.S. Navy and at HelloWallet. Her previous degrees are from the U.S. Naval Academy, Royal Holloway University of London, and Oxford University. She currently lives just outside Philadelphia with her husband and three boys.

Demetrios Brinkmann

Host

Demetrios is one of the main organizers of the MLOps community and currently resides in a small town outside Frankfurt, Germany. He is an avid traveller who taught English as a second language to see the world and learn about new cultures. Demetrios fell into the Machine Learning Operations world and has since interviewed the leading names in MLOps, Data Science, and ML. Since diving into the nitty-gritty of Machine Learning Operations, he has felt a strong calling to explore the ethical issues surrounding ML. When he is not conducting interviews, you can find him stacking stones with his daughter in the woods or playing the ukulele by the campfire.