Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently
Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups in data that nobody labeled. No answer key.
K-Means is the most common clustering algorithm. You pick K (the number of clusters) upfront. The algorithm:
- Places centroids randomly
- Assigns each point to its nearest centroid
- Moves each centroid to the average of its assigned points
- Repeats the assign and move steps until the centroids stop moving
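To make the loop concrete, here is a minimal sketch of those four steps in plain Python. It is not a production implementation: the function name `kmeans`, the `max_iters` and `seed` parameters, and the choice to initialize by sampling K data points are all my own assumptions, and an empty cluster simply keeps its old centroid.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-Means on a list of equal-length tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: place centroids randomly, here by sampling k distinct data points
    # (an assumption; other random placements are possible).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the average of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Empty cluster: keep the old centroid (one of several options).
                new_centroids.append(centroids[j])
        # Step 4: stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, k=2)` on a list of coordinate tuples returns the final centroids and one cluster label per point. Changing `seed` changes the starting centroids, which is an easy way to see the sensitivity to initialization for yourself.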
The problems: K-Means is sensitive to initial centroid placement (hence K-Means++ for smarter initialization), tends to find only roughly spherical clusters, and requires you to know K in advance. A common way to pick K is the elbow method: plot the within-cluster sum of squares against K and pick the point where the curve bends.
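To see what the elbow plot actually measures, here is a small sketch that computes the within-cluster sum of squares for a given labeling. The helper name `wcss`, the toy data, and the labelings are all assumptions: the labelings are hand-assigned stand-ins for what K-Means might return at each K, not real algorithm output.

```python
from collections import defaultdict


def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, mean))
    return total


# Toy 1-D data with two tight groups (made up for illustration).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]

# Hand-assigned labelings standing in for K-Means results at K = 1, 2, 3.
labelings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}

scores = {k: wcss(points, labels) for k, labels in labelings.items()}
print(scores)  # {1: 154.0, 2: 4.0, 3: 2.5}
```

Going from K=1 to K=2 drops the score from 154 to 4, while K=3 only shaves off another 1.5. That sharp bend at K=2 is the "elbow". On real data the bend is often much less obvious than in this toy case.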
Hierarchical Clustering doesn't need K upfront. It builds a dendrogram, a tree showing how clusters merge (or split) step by step. There are two flavors:
- Agglomerative (bottom-up): Start with every point as its own cluster. Merge the two closest clusters. Repeat until one cluster remains.
- Divisive (top-down): Start with everything in one cluster. Split recursively.
You then "cut" the dendrogram at a height that gives you the number of clusters you want.
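The agglomerative version can be sketched in a few lines of plain Python. Several things here are my assumptions rather than anything fixed by the algorithm: the function name `agglomerative`, the use of single linkage (distance between the two closest members) as the definition of "closest", and stopping early at `n_clusters` instead of building the full tree and cutting it afterwards. With single linkage the two approaches should give the same grouping, since the recorded merge heights never decrease.

```python
import math


def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage; stops at n_clusters (>= 1).

    Returns (clusters, merge_heights), where each cluster is a list of point indices.
    """
    # Start with every point as its own cluster.
    clusters = [[i] for i in range(len(points))]
    merge_heights = []

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties go to the lowest indices).
        height, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Record the dendrogram height of this merge, then merge j into i.
        merge_heights.append(height)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_heights
```

The `merge_heights` list is the raw material for a dendrogram: each entry is the height at which two branches join. This brute-force search rescans every pair on every merge, so it is only meant for small examples.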