Decision Trees

Conclusions

  • Decision Trees can classify data that is not linearly separable (a fitting sketch follows this list)

  • Gini impurity or entropy can be used to measure how homogeneous the data classes in a node are; Gini impurity (1 - Σ pᵢ²) is the more common choice because it is less computationally intensive than entropy (-Σ pᵢ log₂ pᵢ), which requires a logarithm (both are computed by hand in a sketch below)

  • A Decision Tree stops splitting a node when its impurity reaches 0, when the number of samples in the node falls below a minimum, or when the maximum tree depth is reached

  • Overfitting often happens when a Decision Tree grows too large: a very deep tree can memorize the training samples instead of learning patterns that generalize

  • Plot the misclassification cost on training and validation data against different maximum depth values to find an acceptable max_depth (a depth-sweep sketch follows this list)

  • A Random Forest overcomes the high variance of individual Decision Trees by training many trees and predicting the class that receives the most votes (sketched last below)
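
A minimal fitting sketch for the first and third points, assuming scikit-learn (the library the max_depth parameter name above comes from); the two-moons dataset and the particular parameter values are arbitrary choices for illustration:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Two interleaved half-moons: a dataset no straight line can separate
    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(
        criterion="gini",    # impurity measure: "gini" or "entropy"
        max_depth=4,         # stop when the maximum tree depth is reached
        min_samples_leaf=5,  # stop when a leaf would hold too few samples
        random_state=0,
    )
    tree.fit(X_train, y_train)
    print("test accuracy:", tree.score(X_test, y_test))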
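
The two impurity measures from the second point, computed by hand. This is a from-scratch sketch of the formulas, not the library's internal implementation; note that Gini needs no logarithm, which is why it is cheaper:

    import numpy as np

    def gini(p):
        # Gini impurity: 1 - sum_i p_i^2 (0 for a pure node)
        p = np.asarray(p, dtype=float)
        return 1.0 - np.sum(p ** 2)

    def entropy(p):
        # Entropy: -sum_i p_i * log2(p_i) (also 0 for a pure node)
        p = np.asarray(p, dtype=float)
        p = p[p > 0]  # skip zero-probability classes to avoid log2(0)
        return -np.sum(p * np.log2(p))

    print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # maximally mixed node: 0.5, 1.0
    print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # pure node: 0.0, 0.0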
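
A depth-sweep sketch for choosing max_depth, reusing the same two-moons setup. Plotting the misclassification rate (1 - accuracy) on both splits typically shows training error falling toward 0 while validation error bottoms out and then rises as the tree overfits; the range of depths swept here is an arbitrary choice:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    depths = range(1, 16)
    train_err, val_err = [], []
    for d in depths:
        t = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_train, y_train)
        train_err.append(1 - t.score(X_train, y_train))  # misclassification = 1 - accuracy
        val_err.append(1 - t.score(X_val, y_val))

    plt.plot(depths, train_err, label="training error")
    plt.plot(depths, val_err, label="validation error")
    plt.xlabel("max_depth")
    plt.ylabel("misclassification rate")
    plt.legend()
    plt.show()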
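
Finally, the last point as code, again assuming scikit-learn. Each tree in the forest is grown on a bootstrap sample of the training data, and the forest aggregates the trees' predictions (scikit-learn averages the trees' predicted class probabilities, which behaves like majority voting); the number of trees is an arbitrary choice:

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 100 trees, each grown on a bootstrap sample of the training data;
    # aggregating their predictions smooths out the high variance of any
    # single Decision Tree
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("forest test accuracy:", forest.score(X_test, y_test))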
