Entry 2 of 13
ML Fundamentals Series

Linear Regression Is Just Finding the Best Straight Line: Here's What That Actually Means

Linear regression is the first algorithm most people learn and often the most underestimated. At its core, it does one thing: fits a straight line through data points to model the relationship between an input and a continuous output. But what "fits a straight line" actually means mathematically is where it gets interesting.

The equation is $y = mx + b$: slope times input plus intercept. In ML notation you'll see $y = \theta_1 x + \theta_0$, which is the same thing. The model learns the values of $m$ (slope) and $b$ (intercept) from data. Once you have them, prediction is just plugging in $x$.
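To make "plugging in $x$" concrete, here is a minimal sketch in Python. The function name `predict` and the parameter values are made up for illustration; in practice $m$ and $b$ would be learned from data.

```python
def predict(x, m, b):
    """Return the line's prediction y = m * x + b for a single input x."""
    return m * x + b


# Hypothetical parameters, chosen only for illustration (not learned from data).
m, b = 2.0, 1.0

prediction = predict(3.0, m, b)  # 2.0 * 3.0 + 1.0
print(prediction)
```

With a slope of 2 and an intercept of 1, an input of 3 gives a prediction of 7.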

The real question is: which line? There are infinitely many lines you could draw through a scatter plot. Linear regression picks the one that minimizes the error between predicted and actual values. Each data point has a residual: the gap between what the model predicted ($\hat{y}$) and what actually happened ($y$).

$$\text{Residual} = y - \hat{y}$$

To find the best line, you minimize the Sum of Squared Residuals:

$$SSR = \sum_i (y_i - \hat{y}_i)^2$$
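As a rough illustration of how SSR ranks candidate lines, the sketch below scores two lines against the same points. The data and both candidate lines are invented for this example, and `ssr` is a hypothetical helper name.

```python
def ssr(xs, ys, m, b):
    """Sum of squared residuals for the line y = m * x + b."""
    # Residual for each point: actual value minus predicted value.
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals)


# Made-up data points, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0]

ssr_a = ssr(xs, ys, m=2.0, b=1.0)  # candidate line A: residuals 0, 0, 0, 1
ssr_b = ssr(xs, ys, m=1.0, b=3.0)  # candidate line B: residuals -1, 0, 1, 3

print(ssr_a, ssr_b)  # 1.0 11.0
```

Line A has the smaller SSR, so by this criterion it is the better fit of the two.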

Squaring serves two purposes: negatives and positives don't cancel each other out, and large errors get penalized harder than small ones (a residual of 10 becomes 100; a residual of 1 stays 1). Choosing the line that minimizes the SSR is called the Least Squares Method.
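The text above doesn't say how the minimizing line is actually found. For a single input, one standard closed-form solution sets the slope to the covariance of $x$ and $y$ divided by the variance of $x$, and picks the intercept so the line passes through the point of means. The sketch below assumes that formula; the data and the names `fit_line` and `ssr` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one input variable (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx             # slope
    b = y_mean - m * x_mean   # intercept: line passes through (x_mean, y_mean)
    return m, b


def ssr(xs, ys, m, b):
    """Sum of squared residuals for the line y = m * x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up data points, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0]

m, b = fit_line(xs, ys)
best = ssr(xs, ys, m, b)

# Nudging either parameter away from the fitted values should not lower the SSR.
nudged_slope = ssr(xs, ys, m + 0.1, b)
nudged_intercept = ssr(xs, ys, m, b + 0.1)

print(round(m, 3), round(b, 3), round(best, 3))
```

On this data the fitted line is approximately $y = 2.3x + 0.5$ with an SSR of about 0.3, and both nudged lines score worse.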