Memory

LightGBM memory factors

  • https://github.com/microsoft/LightGBM/issues/562

  • https://lightgbm.readthedocs.io/en/latest/FAQ.html

Adjusting these parameters can reduce memory usage:

  • histogram_pool_size [default = -1, no limit]: set in MB; the histogram cache is roughly 20 bytes * num_leaves * num_features * num_bins

  • num_leaves [default = 31]: the histogram cache scales with the number of leaves, so lowering num_leaves lowers the RAM required

  • max_bin [default = 255]: smaller values shrink both the histograms and the binned dataset
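
As a rough worked example of the cache formula above (purely illustrative numbers, assuming the ~20-byte-per-bin estimate), 255 leaves, 100 features, and 255 bins come out to roughly 124 MB:

```python
# Back-of-envelope for the histogram-cache estimate above;
# the feature count (100) is a made-up example.
num_leaves, num_features, num_bins = 255, 100, 255
cache_bytes = 20 * num_leaves * num_features * num_bins
print(cache_bytes / 2**20)  # ~124 MB
```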

Solution to lower RAM usage in LightGBM:

  • set the histogram_pool_size parameter to the number of MB you want to use

  • approximate RAM used ≈ histogram_pool_size + dataset size

  • lower num_leaves, lower max_depth, or lower max_bin
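
A minimal sketch of applying these levers together (the file name, objective, and the specific values are placeholders, not recommendations):

```python
import lightgbm as lgb

params = {
    "objective": "binary",            # placeholder objective
    "histogram_pool_size": 1024,      # cap the histogram cache at ~1024 MB
    "num_leaves": 63,                 # fewer leaves -> fewer cached histograms
    "max_depth": 8,                   # also limits tree size
    "max_bin": 127,                   # smaller histograms and smaller binned dataset
}

train_set = lgb.Dataset("train.csv")  # hypothetical CSV, label in the first column
booster = lgb.train(params, train_set, num_boost_round=200)
```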

LightGBM: reducing memory usage

  • https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

  • https://github.com/microsoft/LightGBM/issues/6319

LightGBM training-time memory usage breaks down into:

  • raw data

  • dataset

  • model

  • other

Avoid memory usage for the raw data:

  • construct a Dataset directly from a file (either a CSV/TSV/LibSVM file or a LightGBM Dataset binary file).
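
A sketch of building the Dataset straight from a file so the raw matrix never has to be held in Python (file names and the header/label parameters are illustrative):

```python
import lightgbm as lgb

# Bin directly from disk; LightGBM parses the file itself.
dset = lgb.Dataset("train.csv", params={"header": True, "label_column": "name:target"})

# Optionally save the binned Dataset so later runs skip parsing and binning.
dset.save_binary("train.bin")
dset_from_bin = lgb.Dataset("train.bin")
```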

Reduce the memory usage of the dataset:

  • use a smaller max_bin [default = 255] or a higher min_data_in_bin [default = 3]

  • remove irrelevant features before construction

  • in Python, keep the default free_raw_data=True when constructing the Dataset, and construct and train in the same process, so the raw data copy can be freed once the Dataset is built
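
Putting these dataset-side savings together, a sketch (random data and the specific values are placeholders):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(10_000, 50).astype(np.float32)
y = np.random.randint(0, 2, size=10_000)

# Remove irrelevant features before constructing the Dataset (here: keep the first 30).
X_small = X[:, :30]

dataset_params = {"max_bin": 63, "min_data_in_bin": 10}
train_set = lgb.Dataset(X_small, label=y, params=dataset_params, free_raw_data=True)

# Construct and train in the same process; the raw copy can be freed after binning.
booster = lgb.train({"objective": "binary", **dataset_params}, train_set, num_boost_round=100)
```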

Reduce the size of the model:

  • use early stopping

  • set max_depth [default = -1, meaning no limit] (good values depend on num_leaves)

  • reduce num_leaves [default = 31]

  • reduce n_estimators (num_boost_round).

  • increase min_gain_to_split [default = 0.0]

  • increase min_data_in_leaf [default = 20]
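
A sketch combining these model-size levers with early stopping (synthetic data and placeholder values):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(5_000, 20)
y = np.random.rand(5_000)
train_set = lgb.Dataset(X[:4_000], label=y[:4_000])
valid_set = lgb.Dataset(X[4_000:], label=y[4_000:], reference=train_set)

params = {
    "objective": "regression",
    "num_leaves": 15,            # default 31
    "max_depth": 6,              # default -1 (no limit)
    "min_gain_to_split": 0.01,   # default 0.0
    "min_data_in_leaf": 50,      # default 20
}

booster = lgb.train(
    params,
    train_set,
    num_boost_round=500,                                 # upper bound on n_estimators
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=20)],  # stops when the valid score stalls
)
```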