Decision Trees

Conclusions

  • Decision Trees can classify data that is not linearly separable (a fitting sketch follows this list)

  • Gini impurity or entropy can be used to measure how homogeneous the data classes in a node are; Gini impurity (1 - Σ pᵢ²) is the more common choice because it is less computationally intensive than entropy (-Σ pᵢ log₂ pᵢ), which requires a logarithm (both are computed by hand in a sketch below)

  • A Decision Tree stops splitting a node when its impurity reaches 0, when the number of samples in the node falls below a minimum, or when the maximum tree depth is reached

  • Overfitting often happens when a Decision Tree grows too large: a very deep tree can memorize the training samples instead of learning patterns that generalize

  • Plot the misclassification cost on training and validation data against different maximum depth values to find an acceptable max_depth (a depth-sweep sketch follows this list)

  • A Random Forest overcomes the high variance of individual Decision Trees by training many trees and predicting the class that receives the most votes (sketched last below)
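
A minimal fitting sketch for the first and third points, assuming scikit-learn (the library the max_depth parameter name above comes from); the two-moons dataset and the particular parameter values are arbitrary choices for illustration:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Two interleaved half-moons: a dataset no straight line can separate
    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(
        criterion="gini",    # impurity measure: "gini" or "entropy"
        max_depth=4,         # stop when the maximum tree depth is reached
        min_samples_leaf=5,  # stop when a leaf would hold too few samples
        random_state=0,
    )
    tree.fit(X_train, y_train)
    print("test accuracy:", tree.score(X_test, y_test))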
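
The two impurity measures from the second point, computed by hand. This is a from-scratch sketch of the formulas, not the library's internal implementation; note that Gini needs no logarithm, which is why it is cheaper:

    import numpy as np

    def gini(p):
        # Gini impurity: 1 - sum_i p_i^2 (0 for a pure node)
        p = np.asarray(p, dtype=float)
        return 1.0 - np.sum(p ** 2)

    def entropy(p):
        # Entropy: -sum_i p_i * log2(p_i) (also 0 for a pure node)
        p = np.asarray(p, dtype=float)
        p = p[p > 0]  # skip zero-probability classes to avoid log2(0)
        return -np.sum(p * np.log2(p))

    print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # maximally mixed node: 0.5, 1.0
    print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # pure node: 0.0, 0.0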
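
A depth-sweep sketch for choosing max_depth, reusing the same two-moons setup. Plotting the misclassification rate (1 - accuracy) on both splits typically shows training error falling toward 0 while validation error bottoms out and then rises as the tree overfits; the range of depths swept here is an arbitrary choice:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    depths = range(1, 16)
    train_err, val_err = [], []
    for d in depths:
        t = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_train, y_train)
        train_err.append(1 - t.score(X_train, y_train))  # misclassification = 1 - accuracy
        val_err.append(1 - t.score(X_val, y_val))

    plt.plot(depths, train_err, label="training error")
    plt.plot(depths, val_err, label="validation error")
    plt.xlabel("max_depth")
    plt.ylabel("misclassification rate")
    plt.legend()
    plt.show()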
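
Finally, the last point as code, again assuming scikit-learn. Each tree in the forest is grown on a bootstrap sample of the training data, and the forest aggregates the trees' predictions (scikit-learn averages the trees' predicted class probabilities, which behaves like majority voting); the number of trees is an arbitrary choice:

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 100 trees, each grown on a bootstrap sample of the training data;
    # aggregating their predictions smooths out the high variance of any
    # single Decision Tree
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("forest test accuracy:", forest.score(X_test, y_test))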
