Entry 8 of 13
ML Fundamentals Series

Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently

Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups in data that nobody labeled. No answer key.

K-Means is the most common clustering algorithm. You pick K (the number of clusters) up front. The algorithm:

  1. Places K centroids randomly
  2. Assigns each point to its nearest centroid
  3. Moves each centroid to the average of its assigned points
  4. Repeats steps 2 and 3 until the centroids stop moving
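To make the loop concrete, here is a minimal sketch of those four steps in plain Python. A few things are my own assumptions rather than part of the algorithm's definition: the function name `kmeans`, the `max_iters` and `seed` parameters, starting from K randomly chosen data points (one simple way to "place centroids randomly"), and leaving an empty cluster's centroid where it was. The toy points are made up for illustration.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-Means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: use k distinct data points as the starting centroids (an assumption).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Empty cluster: keep the old centroid (one simple convention).
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated blobs of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which blob ends up as cluster 0 depends on the random start, so only the grouping is meaningful, not the label numbers.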

The problems: K-Means is sensitive to initial centroid placement (hence K-Means++ for smarter initialization), tends to find only roughly spherical clusters, and needs K chosen in advance. A common heuristic for picking K is the elbow method: plot the within-cluster sum of squares against K and pick the point where the curve bends. The sum generally keeps shrinking as K grows, so the bend marks where adding clusters stops helping much.
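The quantity on the y-axis of an elbow plot is easy to compute by hand. The sketch below assumes a helper I am calling `wcss` and a made-up 1-D dataset that I partitioned manually for K = 1, 2, 3, so it shows the shape of the curve rather than the output of an actual K-Means run.

```python
def wcss(clusters):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for members in clusters:
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum((a - m) ** 2 for p in members for a, m in zip(p, mean))
    return total


# Hypothetical 1-D toy data, partitioned by hand for K = 1, 2, 3.
low = [(1.0,), (2.0,), (3.0,)]
high = [(11.0,), (12.0,), (13.0,)]
partitions = {
    1: [low + high],              # everything in one cluster
    2: [low, high],               # the two obvious groups
    3: [low[:2], low[2:], high],  # one group split further
}
curve = {k: wcss(clusters) for k, clusters in partitions.items()}
print(curve)  # {1: 154.0, 2: 4.0, 3: 2.5}
```

Going from K = 1 to K = 2 drops the sum from 154 to 4, while K = 3 only shaves off another 1.5. That sharp bend at K = 2 is the "elbow".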

Hierarchical Clustering doesn't need K up front. It builds a dendrogram, a tree showing how points merge into clusters step by step. There are two approaches:

  • Agglomerative (bottom-up): Start with every point as its own cluster. Merge the two closest clusters. Repeat until one cluster remains.
  • Divisive (top-down): Start with everything in one cluster. Split recursively.

You then "cut" the dendrogram at a height that gives you the number of clusters you want.
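Here is a rough sketch of the agglomerative version in plain Python. The post does not say how "closest" is measured between clusters, so I am assuming single linkage (the distance between the two nearest members). The function name `agglomerative` and the `n_clusters` parameter are my own. Instead of building the full tree and cutting it afterwards, the sketch simply stops merging once `n_clusters` remain, which for single linkage gives the same grouping as cutting the dendrogram at that level. The toy points are made up.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage (illustrative sketch)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []  # (height, merged cluster): the information a dendrogram draws

    def linkage(i, j):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in clusters[i] for b in clusters[j])

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2), key=lambda ij: linkage(*ij))
        height = linkage(i, j)
        merged = clusters[i] + clusters[j]
        merges.append((height, merged))
        # combinations always yields i < j, so delete j first to keep i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges


# Hypothetical toy data: two tight pairs and one far-away point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, n_clusters=2)
for height, merged in merges:
    print(round(height, 2), merged)
```

Each printed line corresponds to one junction in the dendrogram, with `height` being where that junction would sit on the vertical axis.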