Gradient Boosting in Python using scikit-learn
Gradient boosting has become a big part of Kaggle competition winners’ toolkits. It was first formalized by Jerome Friedman in the paper Greedy Function Approximation: A Gradient Boosting Machine. In this post we’ll take a look at gradient boosting and its use in Python with the scikit-learn library.
Gradient boosting is a boosting ensemble method.
Ensemble machine learning methods are ones in which a number of predictors are aggregated to form a final prediction, which typically has lower error than any of the individual predictors, through reduced bias, reduced variance, or both.
Ensemble machine learning methods come in two flavours: bagging and boosting.
- Bagging is a technique in which many predictors are trained independently of one another and their predictions are then aggregated (by majority vote, mean, or weighted mean). Random forests are an example of bagging.
- Boosting is a technique in which the predictors are trained sequentially, with each new predictor fit to correct the errors of the stages before it (see the sketch after this list).
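To make the contrast concrete, here is a minimal sketch comparing the two flavours in scikit-learn: a random forest (bagging) next to a gradient boosting classifier (boosting). The synthetic dataset and hyperparameters are illustrative placeholders, not tuned values from this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic dataset; size and feature count are arbitrary choices.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: trees are trained independently, predictions aggregated by vote.
bagged = RandomForestClassifier(n_estimators=100, random_state=0)
bagged.fit(X_train, y_train)

# Boosting: trees are trained sequentially, each fit to the errors so far.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0)
boosted.fit(X_train, y_train)

print("random forest accuracy:", accuracy_score(y_test, bagged.predict(X_test)))
print("gradient boosting accuracy:", accuracy_score(y_test, boosted.predict(X_test)))
```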
Gradient boosting produces an ensemble of decision trees that are, individually, weak learners. Let’s take a look at how this model works.
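As a preview of those mechanics, here is a hand-rolled sketch of gradient boosting for squared-error loss, where the negative gradient at each stage is simply the residual. This is an illustration of the idea under those assumptions, not scikit-learn’s actual implementation, and the hyperparameters are placeholder values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem; shape and noise are arbitrary choices.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # stage 0: predict the mean of y
trees = []

for _ in range(100):
    residuals = y - prediction             # error of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)                 # weak learner fit to the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("final training MSE:", np.mean((y - prediction) ** 2))
```

Predicting on new data would sum the initial mean plus the learning-rate-scaled output of every stored tree; this is essentially what scikit-learn’s GradientBoostingRegressor automates, with many refinements.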