How Decision Trees Pick the Right Question to Ask: The Gini Index Explained
A decision tree classifies data the same way a doctor works through a differential diagnosis: by asking a sequence of yes/no questions, narrowing down the possibilities with each answer until reaching a conclusion. What makes the algorithm interesting isn't the structure (a flowchart) but the question: how does it know which question to ask first?
The tree is hierarchical. It starts at a root node: the first question. Each question splits the data into branches, and the process recurses on each branch until it reaches leaf nodes: terminal nodes that give a final prediction. At each internal node, the algorithm must pick the feature and threshold that produce the most useful split.
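To make the root/branch/leaf vocabulary concrete, here is a minimal sketch of a trained tree as a data structure and how it answers one question per level to reach a prediction. The class names `Leaf` and `Question`, the function `predict`, and the toy features and labels are all hypothetical, chosen only for illustration; the tree is built by hand rather than learned.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    """Terminal node: holds the final prediction."""
    prediction: str

@dataclass
class Question:
    """Internal node: asks 'is row[feature] <= threshold?'."""
    feature: int
    threshold: float
    yes: "Node"  # branch followed when the answer is yes
    no: "Node"   # branch followed when the answer is no

Node = Union[Leaf, Question]

def predict(node: Node, row: list[float]) -> str:
    """Start at the root and follow one branch per question until a leaf."""
    while isinstance(node, Question):
        node = node.yes if row[node.feature] <= node.threshold else node.no
    return node.prediction

# Hand-built toy tree (hypothetical features: 0 = temperature, 1 = heart rate).
tree = Question(
    feature=0, threshold=38.0,
    yes=Leaf("healthy"),
    no=Question(feature=1, threshold=100.0, yes=Leaf("flu"), no=Leaf("infection")),
)

print(predict(tree, [37.0, 80.0]))   # healthy
print(predict(tree, [39.5, 90.0]))   # flu
print(predict(tree, [39.5, 120.0]))  # infection
```

Prediction is the easy part; the hard part, and the subject of the rest of this piece, is choosing which `feature` and `threshold` each `Question` should use.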
Two standard metrics quantify "most useful":
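Information Gain measures how much a split reduces entropy (disorder). High entropy means the node holds a mix of classes; after a good split, each branch should be purer. For a node with class proportions $p_i$, split into children of sizes $n_k$ out of $n$ rows, entropy and information gain are:
$$H = -\sum_i p_i \log_2 p_i, \qquad IG = H(\text{parent}) - \sum_k \frac{n_k}{n}\, H(\text{child}_k)$$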
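A minimal sketch of both quantities, assuming class labels are given as plain Python lists; the helper names `entropy` and `information_gain` are hypothetical. It compares a split that separates the classes perfectly with one that leaves each branch as mixed as the parent.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over the classes present."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child branches."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4           # 50/50 mix: 1 bit of entropy
perfect = [["yes"] * 4, ["no"] * 4]         # both branches pure
useless = [["yes", "yes", "no", "no"]] * 2  # both branches keep the 50/50 mix

print(entropy(parent))                    # 1.0
print(information_gain(parent, perfect))  # 1.0
print(information_gain(parent, useless))  # 0.0
```

The perfect split removes all of the parent's uncertainty (gain of 1 bit), while the useless split removes none, which is why the tree would prefer the first question.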
Gini Index measures the probability that a randomly picked element would be misclassified if it were labeled at random according to the node's class distribution:
$$G = 1 - \sum_i p_i^2$$
where $p_i$ is the proportion of class $i$ in the node. A Gini of 0 means the node is perfectly pure. For two classes, Gini peaks at 0.5 (a 50/50 mix); with $k$ classes the maximum is $1 - 1/k$.
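The following sketch shows how a Gini score turns into an actual choice of question: score every candidate split by the size-weighted Gini of its two branches and keep the lowest. It assumes numeric features, "<= threshold" questions, and thresholds drawn from the values observed in the data; the names `gini`, `weighted_gini`, `best_split`, and the toy dataset are hypothetical. Many implementations instead place thresholds at midpoints between adjacent sorted values, but the greedy search is the same idea.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each branch's Gini weighted by its share of the rows."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels):
    """Try every feature and every observed value t as the question 'row[f] <= t'.
    Return (feature, threshold, score) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # sending every row one way asks nothing useful
            score = weighted_gini(left, right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

print(gini(["a", "b"]))       # 0.5: two classes, 50/50 mix
print(gini(["a", "a", "a"]))  # 0.0: pure

rows = [[2.0, 7.0], [3.0, 1.0], [1.0, 5.0], [8.0, 6.0], [9.0, 2.0], [7.0, 8.0]]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(rows, labels))  # (0, 3.0, 0.0)
```

In the toy data, asking "is feature 0 <= 3.0?" sends every "a" one way and every "b" the other, so both branches are pure and the weighted Gini is 0. A full tree builder would apply `best_split` at the root and then recurse on each branch.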