Decision Trees, Visually
Forget the formulas for a moment. Watch a handful of real data points act out Gini impurity, entropy, and information gain — and see exactly why a decision tree splits the way it does.
Here are 20 loan applicants. A bank must decide: approve or deny? Each dot is one applicant — hover to see their income, credit score, and employment.
What does “impurity” mean?
Pick an applicant, then guess their decision by pointing at another applicant at random. How often are you wrong? The messier (more impure) the group, the more often your guess misses.
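The guessing game can be played exhaustively rather than by hand. The sketch below is one possible reading of it, not the page's own code: it counts every (applicant, pointed-at applicant) pair and reports how often the two decisions differ. It assumes you may point at the same applicant you picked (sampling with replacement), and it assumes 12 approvals and 8 denials, which matches the 0.60 proportion quoted later; which class is the larger one is a guess. The name `miss_rate` is made up for illustration.

```python
from itertools import product

def miss_rate(labels):
    """Share of (applicant, pointed-at applicant) pairs whose decisions differ."""
    # Every ordered pair, with replacement: an applicant can be paired with itself.
    pairs = list(product(labels, repeat=2))
    misses = sum(1 for actual, guess in pairs if actual != guess)
    return misses / len(pairs)

# Hypothetical labels: 12 approvals, 8 denials (assumed, see the note above).
applicants = ["approve"] * 12 + ["deny"] * 8

print(f"mixed group miss rate: {miss_rate(applicants):.2f}")
print(f"pure group miss rate:  {miss_rate(['approve'] * 20):.2f}")
```

A group with only one class can never produce a wrong guess, so its miss rate is zero; the more evenly mixed the group, the higher the rate climbs.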
Gini impurity, one step at a time
Read off each class’s share of the same 20 applicants, then ask: if you picked two at random, how often would they disagree?
Start with the whole group: 20 applicants, a mix of approvals and denials.
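As a rough companion to the step-by-step calculation, here is a minimal sketch of Gini impurity computed from class shares: take each class's share, square it, add the squares, and subtract from 1. The 12/8 class counts are an assumption consistent with the 0.60 proportion quoted further down, and the function name `gini` is illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    shares = [count / n for count in Counter(labels).values()]
    return 1.0 - sum(p * p for p in shares)

# Hypothetical labels: 12 approvals, 8 denials (assumed counts).
labels = ["approve"] * 12 + ["deny"] * 8

# Shares are 0.6 and 0.4, so the impurity is 1 - (0.36 + 0.16).
print(round(gini(labels), 2))
```

The sum of squared shares is the chance that two random picks agree, so subtracting it from 1 gives the chance that they disagree.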
Entropy: counting bits of surprise
Think of surprise as the number of yes/no questions you’d need to guess an outcome: the rarer the outcome, the more questions it takes. Entropy is that surprise averaged over all applicants.
The same 20 applicants, now measured as “surprise”. Pick one at random: how surprised are we by what we observe?
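One way to make "average surprise" concrete is the sketch below. It treats the surprise of seeing a class with share p as -log2(p) bits and weights each class's surprise by how often that class comes up. The 12/8 class counts are assumed, and the function name `entropy` is illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Average surprise in bits: sum over classes of p * -log2(p)."""
    n = len(labels)
    total = 0.0
    for count in Counter(labels).values():
        p = count / n
        surprise = -math.log2(p)  # rarer class -> more bits of surprise
        total += p * surprise     # weight by how often that class is drawn
    return total

# Hypothetical labels: 12 approvals, 8 denials (assumed counts).
labels = ["approve"] * 12 + ["deny"] * 8
print(f"{entropy(labels):.2f} bits")
```

A perfectly even two-class group scores 1 bit (one yes/no question per applicant), and a pure group scores 0 bits because nothing about it is surprising.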
Zooming out: the same shape across every possible class mix
Gini and entropy measure the same idea, how mixed a group is, with different math. Here’s how they compare across every possible class proportion p, from a group that is all one class (p = 0) to all the other (p = 1).
At our dataset’s proportion (p = 0.60, or 12 of the 20 applicants in one class), Gini reads 0.48 and entropy reads 0.97 bits.
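The two curves can be tabulated with a short sketch. For two classes with proportion p, Gini impurity simplifies to 2p(1 - p), and entropy is -p log2(p) - (1 - p) log2(1 - p). The function names `gini_binary` and `entropy_binary` are illustrative.

```python
import math

def gini_binary(p):
    """Two-class Gini impurity for class proportion p."""
    return 2 * p * (1 - p)

def entropy_binary(p):
    """Two-class entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Sweep p from 0 to 1 in steps of 0.1 and print both measures side by side.
for step in range(11):
    p = step / 10
    print(f"p={p:.1f}  gini={gini_binary(p):.2f}  entropy={entropy_binary(p):.2f} bits")
```

Both measures are zero for a pure group and peak at p = 0.5, where Gini reaches 0.5 and entropy reaches 1 bit. They differ in scale and curvature, not in where they rise and fall.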
Was the split worth it?
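Drag the threshold to slice the applicants in two. Information gain measures how much impurity the split removed: the impurity before the split minus the size-weighted average impurity of the two groups after it. On the chart, that is the drop from the grey “before” bar.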
Before the split: one mixed group with its own impurity.
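Here is a minimal sketch of that calculation for a single threshold, using Gini impurity as the "before" and "after" measure. The credit scores below are invented for illustration and are not the page's dataset; only the 12/8 class mix is taken from the 0.60 proportion above, and even which class is larger is assumed. The function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares (0 for an empty group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Impurity before the split minus the size-weighted impurity after it."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - after

# Hypothetical credit scores: 12 approved applicants, then 8 denied.
scores = [720, 690, 750, 680, 710, 640, 700, 730, 660, 770, 610, 695,
          580, 620, 560, 650, 600, 540, 630, 590]
labels = ["approve"] * 12 + ["deny"] * 8

# Try a few thresholds, as if dragging the slider.
for threshold in (560, 635, 700):
    print(threshold, round(information_gain(scores, labels, threshold), 3))
```

A threshold that leaves both sides nearly pure removes most of the impurity. One that cuts off only a few applicants, or leaves both sides as mixed as the original group, removes very little.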
Growing the whole tree
A decision tree is just this idea on repeat: at every impure group, pick the split with the most information gain, then do it again on the children.
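The "on repeat" idea can be sketched as a short recursive function. This is a simplified illustration under several assumptions, not a production algorithm: it handles numeric features only, tries midpoints between consecutive distinct values as thresholds, measures impurity with Gini, and stops when a group is pure, when no split improves on it, or at a depth limit. The names `best_split`, `grow`, and `predict`, and the four sample applicants, are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) for the best split, or None."""
    n, parent, best = len(labels), gini(labels), None
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            after = len(left) / n * gini(left) + len(right) / n * gini(right)
            gain = parent - after
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively split until groups are pure or no split helps."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0 or depth >= max_depth:
        return majority  # leaf: predict the most common class
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical applicants: (income in thousands, credit score).
rows = [(30, 700), (80, 600), (40, 580), (90, 720)]
labels = ["approve", "deny", "deny", "approve"]
tree = grow(rows, labels)
print(tree)
print(predict(tree, (55, 710)))
```

Because each step only looks for the best immediate split, this greedy approach can miss patterns that pay off only after two splits, which is one reason practical implementations add further stopping and pruning rules.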
