Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently

clustering k-means hierarchical-clustering unsupervised-learning dimensionality-reduction

Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups in data that nobody labeled. No answer key.

K-Means is the most common clustering algorithm. You pick $K$ (number of clusters) upfront. The algorithm:

1. Places $K$ centroids randomly
2. Assigns each point to its nearest centroid
3. Moves each centroid to the average of its assigned points
4. Repeats steps 2 and 3 until the centroids stop moving
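
To make the loop concrete, here is a minimal NumPy sketch of those four steps. It is a sketch under assumptions rather than a reference implementation: the function name `kmeans`, the choice to seed the centroids from randomly chosen data points, the `max_iter` cap, and the toy two-blob data are all mine.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means on an (n_samples, n_features) float array."""
    rng = np.random.default_rng(seed)
    # Step 1: place k centroids randomly (assumption: at k distinct data points).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        # A centroid that ends up with no points keeps its old position.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

On data this clean, the two centroids should land near the two blob centers. Changing `seed` changes the starting centroids, which is an easy way to see the sensitivity to initialization on messier data.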

K-Means has three main problems: it is sensitive to initial centroid placement (hence K-Means++ for smarter initialization), it tends to find only roughly spherical clusters, and you have to choose $K$ in advance. A common heuristic for choosing $K$ is the elbow method: plot the within-cluster sum of squares (WCSS) against $K$ and pick the point where the curve bends, beyond which adding more clusters barely reduces the WCSS.
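
The elbow method is easier to trust once you see the numbers. The sketch below computes the within-cluster sum of squares for $K = 1$ to $6$ on toy data with three blobs, using K-Means++ seeding and a few restarts. The helper names `kmeans_pp_init` and `wcss_for_k`, the restart count, and the data are assumptions for illustration; it prints the values instead of plotting them to stay dependency-free.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-Means++ seeding: later centroids favor points far from earlier ones."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen centroid.
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to d2.
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def wcss_for_k(X, k, n_restarts=5, max_iter=100, seed=0):
    """Lowest within-cluster sum of squares found over a few K-Means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_restarts):
        centroids = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Sum of squared distances from each point to its cluster's centroid.
        best = min(best, ((X - centroids[labels]) ** 2).sum())
    return best

# Toy data: three well-separated blobs, so the bend should appear at K = 3.
rng = np.random.default_rng(0)
centers = [(0, 0), (8, 0), (4, 7)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

wcss = {k: wcss_for_k(X, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(f"K={k}: WCSS={value:.1f}")
```

The values should drop steeply up to $K = 3$ and flatten afterwards. On real data the bend is often much less obvious, which is the main weakness of the method.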

Hierarchical Clustering doesn't need $K$ upfront. It builds a dendrogram, a tree showing how points merge into clusters step by step. There are two variants:

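Agglomerative (bottom-up): Start with every point as its own cluster. Merge the two closest clusters, where "closest" depends on the linkage criterion (single, complete, average, or Ward). Repeat until one cluster remains.
Divisive (top-down): Start with everything in one cluster. Split recursively until every point is its own cluster.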

You then "cut" the dendrogram at a height that gives you the number of clusters you want.
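
A naive NumPy sketch of the agglomerative variant, assuming average linkage by default and stopping once the requested number of clusters is left. For single, complete, and average linkage, stopping there gives the same grouping as building the full dendrogram and cutting it at the height that leaves that many clusters. The function name `agglomerative`, its parameters, and the toy data are hypothetical, and the triple loop is only practical for small inputs.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage="average"):
    """Naive bottom-up clustering. Returns labels and the merge heights."""
    # How to turn point-to-point distances into a cluster-to-cluster distance.
    reduce = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]  # every point starts as its own cluster
    heights = []                             # dendrogram height of each merge
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        # Find the two closest clusters under the chosen linkage.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = reduce(dist[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        # Merge cluster b into cluster a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, heights

# Toy data: two well-separated blobs of 15 points each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(15, 2)),
    rng.normal(loc=(6, 6), scale=0.5, size=(15, 2)),
])
labels, heights = agglomerative(X, n_clusters=2)
print(labels)
```

The `heights` list records the distance at which each merge happened, which is the quantity drawn on the vertical axis of a dendrogram. A large jump between consecutive heights is a natural place to cut.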