Decision Trees
Code | Part 2: Random Forest
Decision trees have high variance: even a small change in the training data can produce a drastically different tree, and even a tree with high training accuracy may misclassify a new datapoint. To address these challenges, a Random Forest trains many decision tree classifiers and combines their outputs. Each tree in the forest predicts the class of the new datapoint, and the class with the most votes becomes the final prediction. In practice, accuracy typically increases when a Random Forest is used in place of a single decision tree.
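As a quick sketch of this comparison, the snippet below fits a single decision tree and a Random Forest on the same data and reports both accuracies. The synthetic dataset from `make_classification` is an assumption used purely for illustration; any labeled dataset would work the same way, and the exact scores will vary with the data and random seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset (illustrative assumption; any labeled data works here)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single tree vs. an ensemble of trees trained on the same split
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=42).fit(X_train, y_train)

print("Single tree accuracy:", tree.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))
```

On most runs the forest scores at least as well as the lone tree, reflecting the variance reduction that comes from averaging many trees, though the gap depends on the dataset.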
The number of trees used in scikit-learn's Random Forest classifier is controlled by the n_estimators hyperparameter. By default its value is 100, but it can be set to any positive integer. The example below uses 50 trees.
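A minimal sketch of setting that hyperparameter is below. The `make_classification` data is an illustrative assumption; after fitting, the `estimators_` attribute holds the individual fitted trees, so its length confirms how many trees were built.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data (an assumption for the example)
X, y = make_classification(n_samples=500, random_state=0)

# n_estimators=50 builds 50 trees instead of the default 100
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

print(len(clf.estimators_))  # prints 50, the number of fitted trees
```

More trees generally reduce variance further but cost more time and memory to train and predict, so n_estimators is a trade-off between accuracy and compute.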