
Ensemble Learning

Random forests are one example of ensemble learning. Ensemble learning is a machine learning technique in which multiple models are combined to produce more accurate results than any single model could achieve on its own. Each model casts its own "vote" on the outcome, whether that is classifying an image or predicting stock prices. Ensembles also allow different types of algorithms to be combined to solve a single problem.

Examples

  • Random Forests: An ensemble of decision trees, each trained on a different subset of the data. The individual trees' predictions are aggregated for a final decision (a sketch follows this list).
  • K-Means Clustering: Several k-means models, initialized with different centroids, can be combined to produce a more stable clustering.
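
As a concrete illustration of the random-forest case, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the iris dataset and hyperparameters are illustrative stand-ins) comparing a single decision tree with a forest of trees:

```python
# Minimal sketch: a random forest (ensemble of trees) vs. a single tree.
# Assumes scikit-learn is available; iris is just a stand-in dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("single tree accuracy:", tree.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```

Averaging the votes of many trees typically generalizes at least as well as any single tree, which is the ensemble effect in miniature.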

The Netflix Prize serves as a well-known case study for ensemble methods: an ensemble model that combined multiple recommender algorithms outperformed Netflix's own existing movie recommendation algorithm, earning its developers a one-million-dollar award.

Key Techniques

  • Bagging (Bootstrap Aggregating): This involves creating multiple subsets of the original dataset and training each model on a different subset. Finally, the models vote for the most probable prediction.
  • Boosting: Here, each subsequent model is adjusted to correct the mistakes made by the ensemble of existing models. This iterative approach refines the prediction capability of the ensemble.
  • Bucket of Models: Different algorithms are used in concert. Each makes its own prediction, and the final decision is made by considering all these individual predictions.
  • Stacking: Multiple models are trained on the same dataset; however, instead of simple voting, a separate meta-model is trained on their outputs to learn how best to combine their predictions (a sketch follows this list).
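
To make the stacking idea concrete, here is a minimal scikit-learn sketch (the choice of base models, meta-model, and dataset is purely illustrative, not a recommendation):

```python
# Minimal stacking sketch: base models' predictions feed a meta-model.
# Assumes scikit-learn; the specific estimators are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model (combiner)
)

print("stacked CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```

The meta-model (here a logistic regression) is fit on the base models' out-of-fold predictions, so it learns which base model to trust in which situation rather than relying on a fixed vote.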

Bagging

Generate N new training sets by random sampling with replacement (bootstrapping) from the original data. Each resampled model can be trained in parallel.

In other words, we create a collection of models that differ only in their training sets, each of which is resampled from the original training data. The advantage is that all of those models can be trained in parallel, and they then simply vote on a final result.
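
The core of bagging fits in a few lines. The sketch below (assuming numpy and scikit-learn; the tree count and dataset are arbitrary illustrative choices) draws bootstrap samples, fits one decision tree per sample, and takes a majority vote:

```python
# Minimal bagging sketch: bootstrap samples + majority vote.
# Assumes numpy and scikit-learn; model count and data are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # Sample with replacement: one bootstrap training set per model.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Each model votes on every test point; the majority class wins.
votes = np.stack([m.predict(X_test) for m in models])
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("bagged accuracy:", (majority == y_test).mean())
```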

[Figure: Bagging]

Boosting

Works in a more serial manner:

  • Each observation in the training set is assigned a weight.
  • As training proceeds, we keep refining the weights of the underlying observations.
    • E.g. the next classifier may assign a higher weight to the data points that were misclassified by the previous ones.
  • The sequence of weak learners is combined into a single, stronger learner.
  • Misclassified observations therefore carry more weight (or appear more often) in later training rounds.
  • Training is sequential; each classifier takes into account the previous ones' errors (a sketch follows this list).
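
A minimal sketch of this reweighting idea uses scikit-learn's AdaBoostClassifier, whose default weak learner is a one-level decision tree (a "stump"); the dataset and number of estimators below are illustrative assumptions:

```python
# Minimal boosting sketch: AdaBoost fits weak learners sequentially,
# upweighting the points the previous learners got wrong.
# Assumes scikit-learn; dataset and hyperparameters are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default weak learner is a depth-1 decision tree (a stump).
boost = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("boosted accuracy:", boost.score(X_test, y_test))
```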

[Figure: Boosting]

Advanced Techniques: A Word of Caution

While advanced methods such as the Bayes Optimal Classifier and Bayesian Parameter Averaging exist, they often provide little benefit over simpler techniques like bagging and boosting, especially given their added complexity.

Conclusion

Ensemble learning stands out as an effective approach for improving predictive accuracy and robustness in machine learning. Utilizing a combination of models typically results in more reliable predictions.