My ML Journey

Why I'm Documenting This

Learning ML in public. Every entry is a day of studying: what clicked, what confused me, and what I'm pulling on next. Writing forces me to actually understand things, not just watch them go by. If it helps someone else figure something out, even better.

Daily notes on machine learning, deep learning, and AI.

24 entries

July 10, 2026·2 min read

LSTM Gives a Network Three Gates to Choose What to Forget; GRU Does It With Two

Vanilla RNNs forget almost everything within a few timesteps because of vanishing gradients, which defeats the entire point of using a recurrent network on long…
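To see why the signal fades, here is a minimal sketch that assumes a scalar vanilla RNN $h_t = \tanh(w h_{t-1} + u x_t)$ with hypothetical values $w = 0.5$, $u = 1$, and a constant input of 0.1. The function `hidden_state_sensitivity` (a name made up for this example) multiplies the per-step derivatives $w(1 - h_t^2)$ together. Since the derivative of $\tanh$ never exceeds 1, the product shrinks at least as fast as $0.5^T$.

```python
import math

def hidden_state_sensitivity(w, inputs, u=1.0, h0=0.0):
    """Return dh_T/dh_0 for the hypothetical scalar RNN h_t = tanh(w*h_{t-1} + u*x_t)."""
    h = h0
    grad = 1.0  # dh_0/dh_0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule for one step: dh_t/dh_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

for T in (1, 5, 10, 20):
    print(T, hidden_state_sensitivity(0.5, [0.1] * T))
```

Roughly speaking, the LSTM and GRU gates exist to give information (and its gradient) a path that is not forced through this repeated tanh-derivative product at every step.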

July 9, 2026·2 min read

Backpropagation Through Time Is Just Backprop That Remembers Every Timestep

An RNN's hidden state at time $t$ depends on the hidden state at $t-1$, which depends on $t-2$, all the way back to the start of the sequence. That chain is what…
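A minimal sketch of backpropagation through time, assuming a scalar RNN $h_t = \tanh(w h_{t-1} + u x_t)$ and a squared-error loss on the final state only. The weights, inputs, and target are hypothetical. The backward loop walks the stored states from $T$ down to 1 and adds up each timestep's contribution to $\partial L / \partial w$, because the same $w$ is reused at every step. A finite-difference estimate checks the result.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Run h_t = tanh(w*h_{t-1} + u*x_t) and keep every state for the backward pass."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def bptt_grad_w(w, u, xs, target):
    """Gradient of L = 0.5*(h_T - target)^2 with respect to the shared weight w."""
    hs = forward(w, u, xs)
    dL_dh = hs[-1] - target  # dL/dh_T
    grad_w = 0.0
    for t in range(len(xs), 0, -1):
        dpre = dL_dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        grad_w += dpre * hs[t - 1]         # w multiplied h_{t-1} at step t
        dL_dh = dpre * w                   # hand the gradient back to h_{t-1}
    return grad_w

def loss(w, u, xs, target):
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

xs, w, u, target = [0.5, -0.3, 0.8, 0.1], 0.7, 1.2, 0.4
eps = 1e-6
numeric = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
print(bptt_grad_w(w, u, xs, target), numeric)  # the two estimates should agree closely
```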

July 8, 2026·2 min read

RNNs Don't Just Process Data: They Process Data in Order, and Order Is the Point

Feedforward networks and CNNs both assume inputs are independent of each other: shuffle the rows of a dataset and a feedforward network doesn't notice. That assumption…

July 7, 2026·2 min read

LeNet Proved Deep Learning Works, One Layer at a Time

Before CNNs were a default assumption, someone had to prove the idea worked end to end on a real task. LeNet did that: a small convolutional architecture built…

July 6, 2026·2 min read

Padding and Pooling Aren't Details: They Decide What a Convolution Keeps and Throws Away

A convolution is conceptually simple: slide a small filter across the input, multiply element-wise at each position, sum the result, and that sum becomes one pixel…
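A minimal pure-Python sketch of the slide, multiply, and sum operation, plus zero padding and 2x2 max pooling. Like most deep learning libraries, it computes cross-correlation (the filter is not flipped). The image and kernel values are made up. With stride 1, an $n \times n$ input, padding $p$, and a $k \times k$ filter give an output of size $n + 2p - k + 1$.

```python
def pad(image, p):
    """Surround the image with p rows and columns of zeros."""
    width = len(image[0]) + 2 * p
    rows = [[0] * width for _ in range(p)]
    rows += [[0] * p + list(row) + [0] * p for row in image]
    rows += [[0] * width for _ in range(p)]
    return rows

def conv2d(image, kernel, padding=0):
    """Slide the kernel over the (optionally padded) image with stride 1, no kernel flip."""
    img = pad(image, padding) if padding else image
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[1, 2, 0],
         [3, 1, 4],
         [0, 2, 1]]
kernel = [[1, 1],
          [1, 1]]

feature_map = conv2d(image, kernel)             # 3x3 input, 2x2 filter -> 2x2 output
padded_map = conv2d(image, kernel, padding=1)   # zero padding keeps border pixels in play -> 4x4
print(feature_map)                              # [[7, 7], [6, 8]]
print(len(padded_map), len(padded_map[0]))      # 4 4
print(max_pool(feature_map))                    # [[8]]
```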

July 5, 2026·2 min read

A CNN Doesn't See an Image: It Sees a Stack of Filtered Patches

A fully connected network fed a raw image throws away the one thing that makes images images: spatial structure. A pixel's neighbors matter more than a pixel…

July 4, 2026·2 min read

Gradient Descent Doesn't Know Where the Minimum Is: It Only Knows Which Way Is Down

Backpropagation tells you the gradient: how sensitive the cost is to each weight, and therefore which direction to nudge it to reduce the cost. Gradient Descent is the algorithm that actually uses that gradient…
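A minimal sketch of the update rule $w \leftarrow w - \eta \, \partial C / \partial w$ on a made-up one-parameter cost $C(w) = (w - 3)^2$. The optimizer only ever calls the gradient function; it is never told that the minimum sits at $w = 3$. The learning rates and step counts are arbitrary choices for illustration.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # move downhill by a step proportional to the slope
    return w

def grad_C(w):
    # Derivative of the toy cost C(w) = (w - 3)^2
    return 2 * (w - 3)

print(gradient_descent(grad_C, w0=0.0))                                # very close to 3
print(gradient_descent(grad_C, w0=0.0, learning_rate=1.1, steps=20))   # overshoots and diverges
```

With this cost each step multiplies the distance to the minimum by $1 - 2\eta$, so $\eta = 0.1$ shrinks it by a factor of 0.8 per step, while $\eta = 1.1$ multiplies it by $-1.2$ and every step lands further away on the opposite side.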

July 3, 2026·2 min read

Backpropagation Isn't Magic: It's the Chain Rule Run Backwards

A network with random weights makes garbage predictions on the first pass. The entire question that makes deep learning work is: given how wrong an output was…
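A minimal sketch of the chain rule run backwards on the smallest possible network: one input, one weight, one bias, a sigmoid, and a squared-error loss. All values are hypothetical. The backward pass multiplies local derivatives from the loss back to $w$ and $b$, and a finite-difference estimate checks the result.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(w, b, x, y):
    # Forward pass: keep the intermediate values the backward pass needs.
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - y) ** 2
    # Backward pass: one local derivative at a time, from the loss back to the parameters.
    dL_da = a - y
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    dL_dw = dL_dz * x      # dz/dw = x
    dL_db = dL_dz          # dz/db = 1
    return loss, dL_dw, dL_db

loss, gw, gb = forward_backward(w=0.5, b=-0.2, x=1.5, y=1.0)
eps = 1e-6
numeric_gw = (forward_backward(0.5 + eps, -0.2, 1.5, 1.0)[0]
              - forward_backward(0.5 - eps, -0.2, 1.5, 1.0)[0]) / (2 * eps)
print(gw, numeric_gw)  # should agree closely
```

The gradient comes out negative here, meaning a slightly larger $w$ would lower the loss; that sign is exactly the signal gradient descent acts on.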

July 2, 2026·2 min read

Without Activation Functions, a Neural Network Is Just Linear Regression in Disguise

Stack ten linear layers on top of each other with no activation function between them and you still only have a linear model: matrix multiplication composed with…
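A quick numerical check of the claim, using made-up 2x2 weight matrices and no biases. Passing $x$ through two linear layers gives exactly the same output as one layer whose weights are the product $W_2 W_1$; putting a ReLU between them breaks that equivalence. Biases don't rescue it either: $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$ is still a single affine map.

```python
def matmul(A, B):
    """Plain matrix product A @ B for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, vi) for vi in v]

W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[3.0, -1.0], [2.0, 0.0]]
x = [1.0, 4.0]

two_linear_layers = apply(W2, apply(W1, x))   # layer 1 then layer 2, no activation
one_linear_layer = apply(matmul(W2, W1), x)   # a single layer with weights W2 @ W1
with_relu = apply(W2, relu(apply(W1, x)))     # same weights, ReLU in between

print(two_linear_layers, one_linear_layer)    # [30.5, 18.0] [30.5, 18.0]
print(with_relu)                              # [27.0, 18.0]
```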

July 1, 2026·2 min read

The Perceptron Isn't a Neuron: It's a Weighted Vote With a Threshold

Before there were multilayer networks, there was a single unit trying to prove a point: that a machine could make a decision from weighted evidence. The McCulloch…
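A minimal sketch of that weighted vote: a perceptron with a step threshold, trained with the classic perceptron learning rule on the AND function. The zero initialization, learning rate of 1, and epoch count are assumptions for the example. Because AND is linearly separable, the rule is guaranteed to settle on weights that classify all four inputs correctly.

```python
def predict(weights, bias, x):
    # Weighted vote: fire (1) only if the evidence clears the threshold.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(data, epochs=10, lr=1.0):
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            # Nudge each weight toward the correct answer, in proportion to its input.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print(weights, bias, [predict(weights, bias, x) for x, _ in AND])  # predictions: [0, 0, 0, 1]
```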

June 30, 2026·2 min read

Every Neural Network Is Just Weighted Sums Stacked on Weighted Sums

Starting a new block: Neural Networks. An Artificial Neural Network is a system loosely inspired by how the human brain processes information, and the core promise…

June 29, 2026·2 min read

Feature Selection Picks What to Keep, Feature Extraction Builds Something New

Not all features in a dataset are worth keeping. Some are redundant. Some are noise. Some slow down training without adding any predictive signal. Feature engineering…

June 28, 2026·2 min read

Ridge Shrinks Coefficients, Lasso Zeros Them Out: Why That Difference Matters

Regularization is not about making your model more accurate on training data. It is about making it less wrong on data it has never seen. When a model fits too…
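The shrink-versus-zero contrast is easiest to see in the special case where the features are orthonormal (uncorrelated and unit-scaled), because then both penalties have closed forms in terms of the ordinary least-squares coefficients $\hat\beta$. Ridge, with objective $\tfrac12\|y - X\beta\|^2 + \tfrac{\lambda}{2}\|\beta\|^2$, divides each coefficient by $1 + \lambda$; lasso, with objective $\tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$, soft-thresholds them. The coefficients and $\lambda = 1$ below are made up, and real, correlated features need an iterative solver instead.

```python
def ridge_coef(b_ols, lam):
    """Ridge under an orthonormal design (assumption): uniform shrinkage, never exactly zero."""
    return b_ols / (1.0 + lam)

def lasso_coef(b_ols, lam):
    """Lasso under an orthonormal design (assumption): soft-thresholding at lam."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # small coefficients are removed entirely

ols = [3.0, 0.8, -0.4]  # hypothetical least-squares coefficients
lam = 1.0
ridge = [ridge_coef(b, lam) for b in ols]
lasso = [lasso_coef(b, lam) for b in ols]
print(ridge)  # [1.5, 0.4, -0.2]
print(lasso)  # [2.0, 0.0, 0.0]
```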

June 27, 2026·1 min read

Accuracy Is a Lie: What the Confusion Matrix Actually Tells You

If your classifier says it's 95% accurate, be suspicious. Accuracy is the most reported metric and often the most useless one, especially when classes are imbalanced…
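A minimal sketch of the failure mode with hypothetical labels: 95 negatives, 5 positives, and a "model" that predicts negative every time. Accuracy comes out at 0.95 while recall on the minority class is zero. Precision is undefined when nothing is predicted positive; the code reports 0.0 in that case, which is one common convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

y_true = [0] * 95 + [1] * 5  # hypothetical imbalanced labels
y_pred = [0] * 100           # always predict the majority class

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if tp + fn else 0.0
precision = tp / (tp + fp) if tp + fp else 0.0  # undefined here; reported as 0.0
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```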

June 26, 2026·1 min read

MSE Penalizes Big Mistakes Harder: R² Tells You If Your Model Even Learned Anything

You've trained a regression model. It outputs numbers. How do you know if those numbers are any good? Two metrics cover this from different angles: MSE measures…
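A minimal sketch of both metrics on made-up numbers. $R^2 = 1 - SS_{res}/SS_{tot}$ compares the model against always predicting the mean, so the mean baseline scores exactly 0 and a model worse than the mean goes negative. The last comparison holds the total absolute error fixed at 2 and shows MSE doubling when that error is concentrated in one prediction instead of spread over two.

```python
def mse(y_true, y_pred):
    """Mean of squared residuals: large errors dominate because they are squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: the fraction of variance explained relative to the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
model = [2.5, 5.5, 7.0, 9.0]
mean_baseline = [6.0] * 4
one_big_miss = [3.0, 5.0, 7.0, 11.0]      # a single error of 2
two_small_misses = [4.0, 5.0, 7.0, 10.0]  # two errors of 1

print(mse(y_true, model), r2(y_true, model))                     # 0.125 and roughly 0.975
print(r2(y_true, mean_baseline))                                 # 0.0
print(mse(y_true, one_big_miss), mse(y_true, two_small_misses))  # 1.0 0.5
```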

June 25, 2026·1 min read

PCA Doesn't Remove Features: It Finds Better Ones

When you have a dataset with 50 features, many ML models struggle. More features mean more noise, slower training, and harder interpretation. The naive solution…
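A minimal sketch of PCA on made-up 2D points lying close to the line $y = x$. It centers the data, builds the 2x2 covariance matrix, and uses power iteration (one simple way to get the top eigenvector; library implementations typically use an SVD instead) to find the direction of greatest variance. Projecting onto that direction replaces the two original features with one new one. An eigenvector's sign is arbitrary, so another implementation may return the same direction negated.

```python
import math

def first_principal_component(points, iters=100):
    """Top eigenvector of the 2x2 sample covariance matrix, found by power iteration."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    v = (1.0, 0.0)  # arbitrary starting direction
    for _ in range(iters):
        # Multiplying by the covariance matrix stretches v most along the high-variance direction.
        v = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(v[0], v[1])
        v = (v[0] / norm, v[1] / norm)
    return v, centered

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
pc1, centered = first_principal_component(points)
new_feature = [x * pc1[0] + y * pc1[1] for x, y in centered]  # one number per point instead of two
print(pc1)          # roughly (0.71, 0.71): the diagonal direction
print(new_feature)
```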

June 24, 2026·1 min read

Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently

Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups…
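A minimal sketch of K-Means (Lloyd's algorithm) on made-up 2D points with $k = 2$. The initial centroids are hand-picked to keep the run deterministic; real implementations usually pick them randomly or with k-means++ and rerun several times. Each iteration alternates an assignment step and an update step, and no labels are used anywhere.

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm: alternate assignment and update steps."""
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids)
print(clusters)
```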

June 23, 2026·1 min read

K-Nearest Neighbors Has No Training Phase: And That's the Whole Point

Every algorithm I've studied so far learns during training: it adjusts weights, builds trees, finds hyperplanes. KNN (K-Nearest Neighbors) doesn't. There is no…
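A minimal sketch of the idea with a hypothetical `KNN` class: `fit` only stores the data, and every distance computation happens inside `predict_one`. It uses Euclidean distance (`math.dist`, Python 3.8+) and a plain majority vote; `Counter.most_common` breaks ties by first-seen order. The points and labels are made up.

```python
import math
from collections import Counter

class KNN:
    """Hypothetical minimal k-nearest-neighbors classifier."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No training phase: "fitting" just memorizes the data.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # All the work happens here: rank every stored point by distance to x.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        nearest_labels = [self.y[i] for i in order[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]

X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNN(k=3).fit(X, y)
print(model.predict_one((1.5, 1.5)), model.predict_one((6.5, 6.2)))  # a b
```

The trade-off is the mirror image of the other algorithms: "training" is instant, but each prediction scans the whole stored dataset.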

June 22, 2026·1 min read

SVMs Don't Learn Patterns: They Find the Best Boundary Between Them

Support Vector Machines approach classification differently from everything I've studied so far. Decision Trees ask questions recursively. Logistic Regression…

June 21, 2026·1 min read

Why One Decision Tree Always Overfits: And How Random Forests Fix It

Yesterday I worked through how Decision Trees pick their splits using the Gini Index. Today I ran into their biggest problem face-first: a fully grown Decision Tree…

June 20, 2026·1 min read

How Decision Trees Pick the Right Question to Ask: The Gini Index Explained

A decision tree classifies data the same way a doctor does a differential diagnosis: by asking a sequence of yes/no questions, narrowing down possibilities with…
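A minimal sketch of the calculation: Gini impurity $G = 1 - \sum_k p_k^2$ for a node, and the size-weighted impurity of the two children for a candidate split. The labels and the two candidate splits are made up; a tree builder would score every candidate this way and keep the one with the lowest weighted impurity.

```python
from collections import Counter

def gini(labels):
    """1 - sum of squared class proportions: 0 for a pure node, at most 0.5 with two classes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Impurity of a split: each child's Gini weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["yes"] * 5 + ["no"] * 5
candidates = {
    "mostly separated": (["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4),
    "perfectly separated": (["yes"] * 5, ["no"] * 5),
}
print(gini(parent))  # 0.5
for name, (left, right) in candidates.items():
    print(name, round(split_gini(left, right), 3))  # 0.32, then 0.0
best = min(candidates, key=lambda name: split_gini(*candidates[name]))
print(best)  # perfectly separated
```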

June 19, 2026·1 min read

Logistic Regression Isn't Regression: It's Classification Through a Probability Trick

The name is confusing. Logistic Regression sounds like it predicts continuous values the way Linear Regression does, but it's actually a classification algorithm…
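A minimal sketch of the probability trick: compute the same weighted sum linear regression would, squash it through the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ into $(0, 1)$, and threshold at 0.5. The weights and bias are hand-picked for illustration; in practice they are learned by minimizing log loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # linear score, unbounded
    return sigmoid(z)                                     # probability of class 1

def predict(weights, bias, x, threshold=0.5):
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

weights, bias = [2.0, -1.0], -0.5  # hypothetical parameters
for x in [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]:
    print(x, round(predict_proba(weights, bias, x), 3), predict(weights, bias, x))
```

Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the decision boundary is still the straight line $w \cdot x + b = 0$; the sigmoid only turns distance from that line into a probability.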

June 18, 2026·1 min read

Linear Regression Is Just Finding the Best Straight Line: Here's What That Actually Means

Linear regression is the first algorithm most people learn and often the most underestimated. At its core, it does one thing: fits a straight line through data…
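A minimal sketch of ordinary least squares for a single feature, using the closed-form solution: slope $= \mathrm{cov}(x, y) / \mathrm{var}(x)$, and the fitted line passes through $(\bar x, \bar y)$. The data points are made up and roughly follow $y = 2x$.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: how x and y move together, divided by how much x moves on its own.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx  # forces the line through the point of means
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.99 0.05
```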

June 17, 2026·1 min read

The Map Before the Territory: How ML Splits Into Supervised and Unsupervised

Before you can understand any specific ML algorithm, you need to understand why ML splits into two completely different philosophies, and what that distinction…