Balancing the learning rate (or shrinkage) and the number of boosting stages (or trees) in XGBoost is critical for optimizing both training duration and predictive accuracy.
The Trade-Off
Learning Rate: Scales how much each new tree contributes to the final prediction. Smaller rates require more trees to reach the same training loss, but the more gradual updates often generalize better.
Number of Trees: Drives model complexity and training time. Too few trees can underfit, while too many can overfit and slow training; a small comparison of the two regimes is sketched below.
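To make the trade-off concrete, here is a minimal sketch that cross-validates a "large rate, few trees" model against a "small rate, many trees" model. It assumes only xgboost and scikit-learn are installed; the synthetic dataset and the two parameter pairs are illustrative choices, not values from this article.
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Illustrative synthetic regression data (assumption, not from the article)
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

fast = xgb.XGBRegressor(learning_rate=0.3, n_estimators=50, random_state=0)
slow = xgb.XGBRegressor(learning_rate=0.05, n_estimators=300, random_state=0)

for name, model in [('lr=0.3, 50 trees', fast), ('lr=0.05, 300 trees', slow)]:
    # 5-fold cross-validated mean squared error for each regime
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    print(f'{name}: CV MSE = {-scores.mean():.1f}')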
Searching the Optimal Space
Analytical Audit: Validate prior choices using learning curves, feature importances, and cross-validated mean squared error (a learning-curve sketch follows this list).
Grid Search: Exhaustively try all combinations within bounded intervals for both parameters.
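As a sketch of the analytical audit, the per-round evaluation history and the built-in feature importances can be read straight from a fitted model. This assumes a reasonably recent xgboost (the eval_metric constructor argument) and a simple train/validation split; the dataset and parameter values are illustrative assumptions.
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(learning_rate=0.1, n_estimators=200, eval_metric='rmse')
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_val, y_val)], verbose=False)

history = model.evals_result()          # per-round train/validation RMSE (learning curves)
print('final validation RMSE:', history['validation_1']['rmse'][-1])
print('top feature importances:', model.feature_importances_[:5])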
Visualizing the Relationship
When executing the grid search, plotting the learning rate and number of estimators against the chosen performance metric provides a 3D landscape view of the search space; a plotting sketch follows the code example below.
The Code Example
Here is the Python code:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_california_housing, load_digits

# Load datasets (load_boston was removed in scikit-learn 1.2,
# so the California housing data is used for the regression example)
housing = fetch_california_housing()
digits = load_digits()

# Parameter grid: learning rate versus number of boosting rounds
param_grid = {
    'learning_rate': [0.05, 0.1, 0.2, 0.3],
    'n_estimators': [50, 100, 150, 200],
}

# Grid search: 5-fold CV, MSE for regression, default accuracy for classification
clf_housing = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=5,
                           scoring='neg_mean_squared_error')
clf_digits = GridSearchCV(xgb.XGBClassifier(), param_grid, cv=5)

# Fit the models (every combination in the grid is cross-validated)
clf_housing.fit(housing.data, housing.target)
clf_digits.fit(digits.data, digits.target)

# Best parameters
print('Best parameters for California housing:', clf_housing.best_params_)
print('Best parameters for Digits:', clf_digits.best_params_)
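As mentioned above, the fitted search can then be turned into a landscape view. Here is a minimal plotting sketch, assuming matplotlib is installed and reusing clf_housing and param_grid from the code above; it rearranges cv_results_ into a learning_rate-by-n_estimators grid and shows the mean cross-validated MSE as a heatmap, where the typical diagonal pattern (a small rate with many trees roughly matching a larger rate with fewer trees) becomes visible.
import matplotlib.pyplot as plt

rates = param_grid['learning_rate']
trees = param_grid['n_estimators']
results = clf_housing.cv_results_

# Build a learning_rate x n_estimators grid of mean CV MSE
mse = np.empty((len(rates), len(trees)))
for lr, n, score in zip(results['param_learning_rate'],
                        results['param_n_estimators'],
                        results['mean_test_score']):
    mse[rates.index(lr), trees.index(n)] = -score

fig, ax = plt.subplots()
im = ax.imshow(mse, cmap='viridis')
ax.set_xticks(range(len(trees)))
ax.set_xticklabels(trees)
ax.set_yticks(range(len(rates)))
ax.set_yticklabels(rates)
ax.set_xlabel('n_estimators')
ax.set_ylabel('learning_rate')
fig.colorbar(im, label='mean CV MSE')
plt.show()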