Why a Single Decision Tree Overfits, and How Random Forests Fix It

random-forest decision-trees ensemble-learning overfitting supervised-learning

Yesterday I worked through how Decision Trees pick their splits using the Gini index. Today I ran face-first into their biggest problem: a fully grown Decision Tree almost always overfits the training data. It memorizes it. The fix is one of the most elegant ideas in ML: Random Forests.

Here's why a single tree fails. The tree keeps splitting until every leaf contains a single class (or until it hits a depth limit, if you set one). On the training data, this looks perfect. On test data, it falls apart, because those leaves captured noise specific to the training set rather than the underlying pattern. Tweak the training data slightly and you can get a completely different tree. The technical name for this instability is high variance.
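To see this concretely, here is a minimal from-scratch sketch in plain Python (no ML library assumed). It grows a tree with no depth limit, picking each split by weighted Gini impurity. The dataset is hypothetical: the label is 1 when x0 + x1 > 1, and 20% of labels (in both training and test data) are flipped to simulate noise. All function names are illustrative, not from any library.

```python
import random

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def build_tree(X, y):
    """Grow a tree with no depth limit: keep splitting until every leaf is pure."""
    if len(set(y)) == 1:
        return {"leaf": y[0]}
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children (lower is better)
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        # Identical rows with conflicting labels: no split possible, use the majority
        return {"leaf": max(set(y), key=y.count)}
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build_tree([X[i] for i in left], [y[i] for i in left]),
            "right": build_tree([X[i] for i in right], [y[i] for i in right])}

def predict(tree, row):
    """Walk from the root to a leaf using the stored thresholds."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: label is 1 when x0 + x1 > 1, with `noise` of labels flipped."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(a + b > 1) ^ (rng.random() < noise) for a, b in X]
    return X, y

def accuracy(tree, X, y):
    return sum(predict(tree, row) == label for row, label in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(1000, rng)

tree = build_tree(X_train, y_train)
train_acc = accuracy(tree, X_train, y_train)
test_acc = accuracy(tree, X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")

# "Tweak the training data slightly": drop 10 rows, regrow, compare test predictions
tweaked = build_tree(X_train[10:], y_train[10:])
changed = sum(predict(tree, r) != predict(tweaked, r) for r in X_test) / len(X_test)
print(f"test predictions that changed after the tweak: {changed:.0%}")
```

Because every leaf is pure, the tree scores a perfect 1.0 on its own training rows, while test accuracy lands clearly lower. The last line shows how many test predictions flip when only 10 of the 200 training rows are removed.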

Random Forest attacks variance with two stacked ideas:

Bagging (Bootstrap Aggregating): Train $N$ trees, each on a different bootstrap sample of the training data, meaning a sample of the same size drawn with replacement. Some rows appear multiple times and others (roughly 37% of them on average) not at all. Each tree learns a slightly different version of the problem.

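Feature randomness: At every split, a tree may consider only a random subset of the features (a common default for classification is about $\sqrt{p}$ of the $p$ features). This keeps every tree from splitting on the same dominant feature early on, which would make the trees highly correlated and undo much of the benefit of bagging.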
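The two randomization steps above can be sketched with the standard library alone. This is a minimal illustration, not a library API: `bootstrap_indices` and `split_candidates` are hypothetical names, and the $\sqrt{p}$ default is an assumption that mirrors a common choice for classifiers.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one bootstrap sample for one tree."""
    return [rng.randrange(n) for _ in range(n)]

def split_candidates(n_features, rng, max_features=None):
    """Features a single split may look at; defaults to about sqrt(p) (assumed)."""
    k = max_features or max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

rng = random.Random(42)
n = 10_000
sample = bootstrap_indices(n, rng)
oob_fraction = 1 - len(set(sample)) / n  # share of rows this tree never sees
print(f"out-of-bag fraction: {oob_fraction:.3f} (theory: {(1 - 1/n) ** n:.3f})")

# Every split in every tree draws a fresh subset, e.g. 3 of 10 features
for split in range(3):
    print(f"split {split}: considers features {sorted(split_candidates(10, rng))}")
```

With large $n$ the out-of-bag fraction should land close to $(1 - 1/n)^n \approx 1/e \approx 0.37$, which is where the "roughly 37%" figure comes from. In scikit-learn, the corresponding knobs on `RandomForestClassifier` are `bootstrap` and `max_features`.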

When predicting, you take a majority vote (classification) or the average (regression) across all $N$ trees. Individual trees still make mistakes, but they make them in different directions, so many of the errors cancel out. The less correlated the trees are, the more of the error cancels.
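Here is a minimal sketch of both aggregation rules, plus a toy simulation of why "wrong in different directions" helps. A standard result: if each of $N$ predictions has variance $\sigma^2$ and any two have correlation $\rho$, their average has variance $\rho\sigma^2 + \frac{1-\rho}{N}\sigma^2$. Adding trees shrinks only the second term. The first term is a floor set by correlation, which is what feature randomness pushes down. The simulation assumes a simple error model (one noise term shared by all trees plus an independent term per tree) instead of training real trees. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter

def majority_vote(predictions):
    """Classification: the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: the mean of the trees' predictions."""
    return sum(predictions) / len(predictions)

print(majority_vote(["cat", "dog", "cat", "cat", "dog"]))  # cat
print(average([2.0, 3.0, 4.0]))  # 3.0

def ensemble_error_variance(n_trees, rho, sigma=1.0, trials=10_000, seed=0):
    """Simulate the variance of the averaged error of n_trees correlated 'trees'.

    Assumed toy error model (not real trees): each tree's error is a component
    shared by all trees plus an independent one, so every tree's error has
    variance sigma**2 and any two trees' errors have correlation rho.
    """
    rng = random.Random(seed)
    averaged = []
    for _ in range(trials):
        shared = rng.gauss(0, sigma * rho ** 0.5)
        errors = [shared + rng.gauss(0, sigma * (1 - rho) ** 0.5)
                  for _ in range(n_trees)]
        averaged.append(average(errors))
    return statistics.pvariance(averaged)

rho = 0.3
results = {}
for n_trees in (1, 10, 100):
    simulated = ensemble_error_variance(n_trees, rho)
    formula = rho + (1 - rho) / n_trees  # rho*sigma^2 + (1-rho)*sigma^2/N, sigma=1
    results[n_trees] = (simulated, formula)
    print(f"N={n_trees:3d}: simulated {simulated:.3f}, formula {formula:.3f}")
```

Going from 1 to 100 trees cuts the variance from about 1.0 toward the floor of $\rho = 0.3$, but never below it. Hard voting matches the description above. Some libraries instead average per-class probabilities across trees (scikit-learn's `RandomForestClassifier` does this), which is a close variant of the same idea.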