April 30, 2024

How Distributed LightGBM Works

This article discusses a talk by James Lamb

James Lamb is a Machine Learning Engineer at SpotHero, based in Chicago, IL. He is a LightGBM maintainer and has led several large efforts to expand access to LightGBM, including publishing that project’s R package on CRAN and integrating ‘dask-lightgbm’ into the main ‘lightgbm’ Python package.

LightGBM is a framework for supervised learning tasks (regression, classification, and ranking) on tabular data. People use it for tasks as varied as building search engines, detecting fraud, deciding whether or not to offer loans, predicting failures in industrial machinery, and forecasting demand.

James' talk introduces LightGBM, a popular gradient boosting library from Microsoft. After a high-level overview of the LightGBM algorithm, it describes strategies for distributed training of gradient boosted decision tree (GBDT) models generally, and of LightGBM models specifically. With this base established, the bulk of the talk covers the current state of LightGBM’s Dask integration.
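To make the idea of distributed GBDT training more concrete, here is a minimal sketch of one common strategy, data-parallel training: each worker holds a slice of the rows, builds a local gradient histogram for a feature, the histograms are summed across workers, and the split is chosen from the merged histogram. This is a toy illustration in pure Python, not LightGBM's C++ implementation. The function names, the four-bin feature, the sample gradients, and the simplified split score (constant hessian, no regularization) are all assumptions made for the example.

```python
from typing import List, Tuple

Row = Tuple[int, float]                      # (feature bin index, gradient)
Histogram = Tuple[List[float], List[int]]    # (gradient sum per bin, row count per bin)


def local_histogram(rows: List[Row], n_bins: int) -> Histogram:
    """Per-worker pass: sum gradients and count rows in each feature bin."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for bin_idx, grad in rows:
        grad_sum[bin_idx] += grad
        count[bin_idx] += 1
    return grad_sum, count


def merge_histograms(hists: List[Histogram]) -> Histogram:
    """Stand-in for the network reduction step: element-wise sum over workers."""
    n_bins = len(hists[0][0])
    grad_sum = [sum(h[0][b] for h in hists) for b in range(n_bins)]
    count = [sum(h[1][b] for h in hists) for b in range(n_bins)]
    return grad_sum, count


def best_split(grad_sum: List[float], count: List[int]) -> Tuple[int, float]:
    """Scan bin boundaries and return (bin, score) for the best split.

    Rows with bin <= the returned bin go left. The score G_L^2/n_L + G_R^2/n_R
    is a simplified gain that assumes a constant hessian and no regularization.
    """
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_score = -1, float("-inf")
    left_g, left_n = 0.0, 0
    for b in range(len(grad_sum) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave rows on both sides
        score = left_g ** 2 / left_n + right_g ** 2 / right_n
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin, best_score


# Two hypothetical workers, each holding a different slice of the rows.
worker_a = [(0, -1.0), (1, -1.0), (2, 1.0)]
worker_b = [(0, -1.0), (2, 1.0), (3, 1.0)]

merged = merge_histograms([local_histogram(w, 4) for w in (worker_a, worker_b)])
split = best_split(*merged)
print(split)
```

Because the merged histogram is identical to the one a single machine would build over all rows, every worker arrives at the same split without ever exchanging raw data. Only the fixed-size histograms cross the network.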

The talk explains the division of responsibilities between Dask and LightGBM’s existing distributed training framework, which is written in C++. It also covers the specific components of the Dask ecosystem that LightGBM relies on, and the challenges LightGBM faces in using Dask to wrap that existing C++ distributed training code.
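For readers who have not used the Dask integration, the following is a minimal sketch of what training through it can look like, using the `DaskLGBMRegressor` estimator from the `lightgbm` Python package. It is not code from the talk. The function name `train_distributed`, the local two-worker cluster, the random data, and the parameter values are assumptions chosen for illustration, and running it requires `dask`, `distributed`, and `lightgbm` to be installed.

```python
def train_distributed(n_workers: int = 2):
    """Train a small LightGBM regressor on a local Dask cluster (illustrative only)."""
    # Imported lazily so this snippet can be loaded even where the optional
    # dependencies (dask, distributed, lightgbm) are not installed.
    import dask.array as da
    import lightgbm as lgb
    from dask.distributed import Client, LocalCluster

    with LocalCluster(n_workers=n_workers, threads_per_worker=1) as cluster, \
            Client(cluster) as client:
        # Random data split into row-wise chunks; Dask decides which worker
        # holds each chunk. X and y use matching chunk sizes along the rows.
        X = da.random.random((10_000, 20), chunks=(2_500, 20))
        y = da.random.random((10_000,), chunks=(2_500,))

        # tree_learner="data" requests data-parallel training.
        model = lgb.DaskLGBMRegressor(
            client=client,
            n_estimators=20,
            tree_learner="data",
        )
        model.fit(X, y)

        # predict() returns a lazy Dask array; compute() materializes it.
        preds = model.predict(X).compute()

        # to_local() returns a plain LGBMRegressor that no longer needs Dask.
        return model.to_local(), preds
```

Calling `train_distributed()` starts the cluster, trains, and shuts the cluster down again. In this sketch Dask is responsible for holding the data chunks and running code on each worker, while the training itself is carried out by LightGBM, which matches the split of responsibilities the talk describes.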

Links to talk: talks

Notebooks in talk: https://github.com/jameslamb/lightgbm-dask-testing/tree/main/notebooks

James’ previous MLOps coffee talk on Building for Small Data Science Teams: https://www.youtube.com/watch?v=yAsPfhI5Jd8

Author

Morris
