Imbalanced¶
Imbalanced data: should I use SMOTE?¶
https://mindfulmodeler.substack.com/p/dont-fix-your-imbalanced-data
https://mindfulmodeler.substack.com/p/imbalanced-data-do-nothing-should
Problem:
- oversampling breaks calibration: the model learns an inflated base rate, so predicted probabilities no longer match real-world frequencies
- undersampling discards information from the majority class
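A minimal sketch of why oversampling breaks calibration (synthetic data, not from the posts above): duplicating the minority class inflates the base rate the model sees, so its predicted probabilities end up biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.05).astype(int)  # ~5% positives

# naive oversampling: duplicate each positive until classes are balanced
pos = np.where(y == 1)[0]
reps = (y == 0).sum() // len(pos)
y_over = np.concatenate([y, np.repeat(y[pos], reps - 1)])

print(round(y.mean(), 3))       # true base rate, ~0.05
print(round(y_over.mean(), 3))  # base rate the model now sees, ~0.5
```

A model trained on `y_over` will predict probabilities roughly ten times too high unless they are recalibrated afterwards.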
Solutions:
- weight data points: give underrepresented data points a higher weight and overrepresented ones a lower weight
- lgb: pass `sample_weight`, or set `class_weight='balanced'` for classification (sklearn API) — i.e. use cost-sensitive learning to train models
- do threshold tuning on validation data
- SMOTE mainly helps weak classifiers
- for extremely large data, undersample and assign larger weights to the sampled examples
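A sketch of the threshold-tuning step from the list above (classifier and data are made up for illustration): fit any probabilistic model, then pick the decision threshold that maximizes the target metric on held-out validation data, instead of the default 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# 95/5 imbalanced toy problem
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_val)[:, 1]

# sweep candidate thresholds on the validation set, keep the best F1
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_val, proba >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
```

Since 0.5 is in the sweep, the tuned threshold can only match or beat the default on the validation metric; the final check still belongs on a separate test set.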
LightGBM imbalanced data¶
```python
# note: class_weight is only an argument of the sklearn wrapper
# (lgb.LGBMClassifier(class_weight='balanced')); the native lgb.train
# API takes is_unbalance or scale_pos_weight in params instead
model = lgb.train(
    params={'objective': 'binary', 'is_unbalance': True},
    train_set=train_data,
    valid_sets=valid_sets,
    num_boost_round=num_boost_round,
)
```
XGBoost imbalanced data¶
https://xgboosting.com/xgboost-imbalanced-multi-class-classification-set-sample_weight-using-compute_sample_weight/
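Following the linked page's approach, a sketch of per-sample weighting with scikit-learn's `compute_sample_weight` (the labels here are synthetic; the XGBoost fit call is shown as a comment since it just consumes the weights):

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
y = rng.choice([0, 1, 2], size=300, p=[0.8, 0.15, 0.05])  # imbalanced 3-class labels

# one weight per row, inversely proportional to its class frequency
weights = compute_sample_weight(class_weight='balanced', y=y)

# pass the weights to XGBoost's sklearn wrapper:
# model = xgb.XGBClassifier().fit(X, y, sample_weight=weights)
```

With `'balanced'`, each class's total weight comes out equal, so the rare class contributes as much to the loss as the common one.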