OLS Derivation
Step 1: The Model
We assume a linear relationship between \(x\) and \(y\): $$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$ For each observation, the model produces a predicted value \(\hat{y}_i\). The question is: how do we choose \(\hat{\beta}_0\) and \(\hat{\beta}_1\)?
Step 2: Defining Error
The error (or residual) for observation \(i\) is the gap between what we observed and what we predicted: $$e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$ A good line makes these errors small. But errors can be positive or negative, so we cannot just add them up. The sum of raw errors can be zero even for a terrible fit.
Step 3: Why Square the Errors?
Squaring solves the sign problem and penalizes large errors more than small ones: $$e_i^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$ Summing over all observations gives the Sum of Squared Errors: $$\text{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$ This is our objective function. We want to find the \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that make SSE as small as possible.
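The objective is easy to compute directly. A minimal Python sketch, using made-up data that lies exactly on \(y = 2x\):

```python
def sse(b0, b1, xs, ys):
    # Sum of squared errors for the candidate line y-hat = b0 + b1*x
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Toy data lying exactly on y = 2x
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

print(sse(0.0, 2.0, xs, ys))  # perfect fit: 0.0
print(sse(0.0, 1.0, xs, ys))  # flatter line: 30.0
```

Any candidate line plugs into the same function; minimizing it over `b0` and `b1` is exactly the problem the next steps solve.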
Step 4: Minimize — Partial Derivative w.r.t. β0
Take the partial derivative of SSE with respect to \(\beta_0\) and set it to zero: $$\frac{\partial \text{SSE}}{\partial \beta_0} = -2\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0$$ Divide both sides by \(-2n\) and rearrange: $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$ The intercept is pinned by the means. The OLS line always passes through the point \((\bar{x}, \bar{y})\).
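The first-order condition in Step 4 says the residuals sum to zero whenever the intercept is set from the means. A small numeric check, with made-up data and an arbitrary slope:

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

b1 = 1.7                 # any slope (arbitrary choice for illustration)
b0 = ybar - b1 * xbar    # intercept pinned by the means
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))
print(abs(residual_sum) < 1e-9)  # True -- residuals cancel exactly
```

Note this holds for *any* slope, not just the OLS one; the intercept condition alone forces the line through \((\bar{x}, \bar{y})\).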
Step 5: Minimize — Partial Derivative w.r.t. β1
Take the partial derivative with respect to \(\beta_1\) and set it to zero: $$\frac{\partial \text{SSE}}{\partial \beta_1} = -2\sum_{i=1}^{n}x_i(y_i - \beta_0 - \beta_1 x_i) = 0$$ Substitute the expression for \(\hat{\beta}_0\) from Step 4 and simplify.
Step 6: Solve for β1
After substitution and algebra: $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}$$ The slope is the ratio of how \(x\) and \(y\) move together (covariance) to how much \(x\) varies on its own (variance); the \(1/n\) factors in the covariance and variance cancel, which is why the raw sums give the same answer.
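In code, the slope formula is a short reduction over centered data. A sketch with hypothetical points lying exactly on \(y = 2x + 1\):

```python
def ols_slope(xs, ys):
    # beta1-hat = sum of cross-deviations / sum of squared x-deviations
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    var = sum((x - xbar) ** 2 for x in xs)
    return cov / var

print(ols_slope([1, 2, 3, 4], [3, 5, 7, 9]))  # 2.0
```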
Step 7: Back-Substitute for β0
Plug \(\hat{\beta}_1\) back into the intercept formula from Step 4: $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$ We now have closed-form expressions for both coefficients. No iteration required.
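Steps 4 through 7 assemble into a complete fit in a few lines. A minimal sketch, using illustrative data that lies exactly on \(y = 1 + 2x\):

```python
def ols_fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Step 6: slope from centered sums
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    # Step 7: back-substitute for the intercept
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

No loop over candidate lines, no learning rate: the closed form lands on the minimum in one pass.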
◆ THE BLUEPRINT
What You're Looking At

A step-by-step derivation of the OLS estimators. This is the calculus-based proof that the familiar regression coefficients minimize the sum of squared errors.

Key Result
$$\hat{\beta}_1 = \frac{\text{Cov}(X,Y)}{\text{Var}(X)} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$
Why This Matters

These closed-form solutions exist because SSE is a convex quadratic in the parameters. No iterative optimization is needed. The Loss Landscape tab lets you see this convexity directly.

Related Topics
Your Line
Contour Plot
◆ THE BLUEPRINT
What You're Looking At

The SSE (or MSE) is a function of two variables: the intercept and the slope. Because the loss is quadratic in these parameters, the surface is bowl-shaped (convex) with a single global minimum.

Key Insight
$$\text{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$

The contour lines are ellipses. The OLS solution sits at the center. Every other combination of intercept and slope produces a higher loss.
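Convexity also means the closed-form solution can be checked numerically: perturbing the OLS coefficients in any direction can only raise the loss. A brute-force sketch with made-up data:

```python
def sse(b0, b1, xs, ys):
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

# Closed-form OLS coefficients (the center of the ellipses)
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
best = sse(b0, b1, xs, ys)

# Every perturbed guess on a small grid costs at least as much
for db0 in (-0.5, 0.0, 0.5):
    for db1 in (-0.2, 0.0, 0.2):
        assert sse(b0 + db0, b1 + db1, xs, ys) >= best
print("OLS is the grid minimum")
```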

How to Interact

Drag the sliders and watch the ember dot move on both the contour and 3D plots. Click "Snap to OLS" to jump to the minimum. The metric cards show exactly how far off your current guess is.

Related Topics
Optimal Lines Under Each Metric
Loss Function Shapes
Metric Comparison
Takeaway
◆ THE BLUEPRINT
What You're Looking At

Two different lines fitted to the same data. The MSE-optimal line minimizes the average squared error. The MAE-optimal line minimizes the average absolute error. With clean data, they are nearly identical. With outliers, they diverge.

Key Equations
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$
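Both metrics are one-liners over the residuals. A sketch showing how a single large residual dominates MSE but not MAE (hypothetical predictions):

```python
def mse(ys, yhats):
    return sum((y - yh) ** 2 for y, yh in zip(ys, yhats)) / len(ys)

def mae(ys, yhats):
    return sum(abs(y - yh) for y, yh in zip(ys, yhats)) / len(ys)

ys    = [1.0, 2.0, 3.0, 13.0]   # last observation is an outlier
yhats = [1.0, 2.0, 3.0,  3.0]   # residuals: 0, 0, 0, 10

print(mse(ys, yhats))  # 25.0 -- the single residual of 10 contributes 100
print(mae(ys, yhats))  # 2.5
```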
Why They Differ

MSE squares the errors. A residual of 10 costs 100 under MSE but only 10 under MAE. Large errors dominate MSE, so the MSE line bends toward outliers. MAE treats all errors linearly and is more robust.

This mirrors a basic fact about location estimates: the mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations. The MSE-optimal line is therefore a mean-like fit (it always passes through \((\bar{x}, \bar{y})\)), and the MAE-optimal line is a median-like fit. This is why the mean is sensitive to outliers but the median is not.
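The mean/median contrast behind the two lines is easy to see directly. A sketch using Python's stdlib `statistics` module with illustrative data:

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
print(mean(clean), median(clean))      # both 3

outlier = [1, 2, 3, 4, 100]            # one point moved far away
print(mean(outlier), median(outlier))  # mean jumps to 22, median stays 3
```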

◆ THE BLUEPRINT
The Challenge

Position a line using only the intercept and slope sliders. Try to make the residuals as small as possible. Your only feedback is the SSE of your current line. The OLS solution is hidden until you reveal it.

Strategy

Start with the slope. Tilt the line to follow the general trend. Then adjust the intercept to shift the line up or down through the center of the data. Watch the residual segments shrink as you improve.

The Three Metrics
$$\text{SSE} = \sum e_i^2 \qquad \text{MSE} = \frac{1}{n}\sum e_i^2 \qquad \text{MAE} = \frac{1}{n}\sum |e_i|$$
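All three readouts come from the same residuals; only the aggregation differs. A minimal sketch with hypothetical data and a deliberately rough guess:

```python
def line_metrics(b0, b1, xs, ys):
    # Residuals of the current guess, then the three aggregations
    es = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in es)
    n = len(es)
    return {"SSE": sse, "MSE": sse / n, "MAE": sum(abs(e) for e in es) / n}

print(line_metrics(0.0, 1.0, [1, 2, 3, 4], [2, 4, 6, 8]))
# {'SSE': 30.0, 'MSE': 7.5, 'MAE': 2.5}
```

SSE and MSE always rank candidate lines identically (they differ by the constant factor \(1/n\)); MAE can rank them differently when outliers are present.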