Forge the Fit
Training vs Test Error
Error at Current Degree
◆ THE BLUEPRINT
What You're Looking At

A polynomial model fit to noisy data. The degree slider controls model complexity. Low degree = rigid line. High degree = wiggly curve that chases every data point.

The Model
\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \cdots + \hat{\beta}_d x^d\)

Each additional degree adds a free parameter. More parameters means more flexibility to fit the training data.

Why Training Error Always Decreases

A degree-d polynomial has d+1 parameters, and every degree-d polynomial is also a degree-(d+1) polynomial with its top coefficient set to zero. The least-squares fit at degree d+1 can therefore never do worse on the training data than the fit at degree d. Training MSE is a monotonically non-increasing function of degree.
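
This is easy to check numerically. A minimal sketch with NumPy, assuming a sine truth, Gaussian noise, and a fixed seed (all illustrative choices, not the app's exact data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)  # noisy training data

train_mse = []
for d in range(1, 9):
    coeffs = np.polyfit(x, y, d)          # least-squares fit of degree d
    resid = y - np.polyval(coeffs, x)
    train_mse.append(np.mean(resid ** 2))

# Each entry is <= the one before it: training MSE never increases with degree.
print(train_mse)
```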

Why Test Error Has a U-Shape

At low degree the model is too rigid (underfitting). At high degree the model memorizes noise (overfitting). The best test error sits in between.
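
The U-shape shows up with just three degrees on held-out data. A sketch under assumed settings (sine truth, noise level, and seed are illustrative): an underfit line, a mid-degree fit, and an overfit high-degree polynomial.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda t: np.sin(2 * np.pi * t)        # assumed true function

x_tr = np.linspace(0, 1, 15)               # small training set
y_tr = f(x_tr) + rng.normal(0, 0.3, 15)
x_te = np.linspace(0.05, 0.95, 200)        # held-out grid
y_te = f(x_te) + rng.normal(0, 0.3, 200)

test_mse = {}
for d in (1, 4, 12):
    coeffs = np.polyfit(x_tr, y_tr, d)
    test_mse[d] = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)

# Degree 1 underfits, degree 12 chases the noise, degree 4 sits
# near the bottom of the U.
```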

How to Interact

Slide the degree from 1 to 15 and watch both curves. Toggle "Show True Function" to see where the model agrees with reality. Click "New Sample" several times at a high degree to see how much the fit changes. That instability is variance.

Related Topics
KNN Fit
Training vs Test Error
Error at Current K
◆ THE BLUEPRINT
What You're Looking At

K-Nearest Neighbors regression. For each test point, find the K closest training points and average their y-values. No model is fit in the traditional sense. The training data IS the model.

The Prediction Rule
\(\hat{f}(x_0) = \frac{1}{K} \sum_{x_i \in N_K(x_0)} y_i\)

\(N_K(x_0)\) is the set of K training points closest to \(x_0\).
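The rule is one line of arithmetic. A minimal 1-D sketch with toy data (illustrative only):

```python
import numpy as np

def knn_predict(x0, x_train, y_train, k):
    """Average the y-values of the k training points nearest to x0."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

# The K = 3 neighbors of x0 = 2.1 are x = 2, 3, 1, so the
# prediction is (4 + 9 + 1) / 3.
print(knn_predict(2.1, x_train, y_train, 3))  # 4.666...
```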

K Controls Flexibility

K=1: the prediction is the y-value of the single nearest training point. Zero training error. Extremely wiggly. High variance.

K=n: prediction equals the global mean of all training y-values. A flat line. High bias, zero variance.

The error plot uses 1/K on the x-axis so flexibility increases left to right, matching the polynomial tab.
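Both extremes can be checked directly. A sketch assuming distinct 1-D inputs (ties would complicate the K=1 case) and illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 20))          # distinct training inputs
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)

def knn(x0, k):
    return y[np.argsort(np.abs(x - x0))[:k]].mean()

# K = 1: each training point is its own nearest neighbor -> zero training error.
k1_train_mse = np.mean([(knn(xi, 1) - yi) ** 2 for xi, yi in zip(x, y)])

# K = n: every prediction is the global mean of the y-values -> a flat line.
flat = [knn(x0, len(x)) for x0 in (0.1, 0.5, 0.9)]
```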

Same Tradeoff, Different Method

Polynomials control complexity via degree. KNN controls complexity via K. Both produce the same U-shaped test error curve. The bias-variance tradeoff is universal.
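One way to see the parallel is to sweep K on held-out data, just as the polynomial tab sweeps degree. A sketch under assumed settings (sine truth, Gaussian noise, illustrative seed):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda t: np.sin(2 * np.pi * t)   # assumed true function
n = 40
x_tr = np.sort(rng.uniform(0, 1, n))
y_tr = f(x_tr) + rng.normal(0, 0.3, n)
x_te = rng.uniform(0, 1, 500)
y_te = f(x_te) + rng.normal(0, 0.3, 500)

def knn(x0, k):
    return y_tr[np.argsort(np.abs(x_tr - x0))[:k]].mean()

test_mse = {k: np.mean([(knn(x0, k) - y0) ** 2
                        for x0, y0 in zip(x_te, y_te)])
            for k in (1, 5, n)}

# The middle K should win: K too small -> variance, K too large -> bias.
```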

How to Interact

Try K=1 and click "New Sample" several times. The fit changes dramatically each time. That is variance. Now set K equal to half the sample size. The fit barely changes between samples. That is low variance, but the model may miss the true shape (high bias).

Toggle "Overlay Best Polynomial" to compare the two methods on the same data.

Related Topics
Bias-Variance Decomposition
Stacked Decomposition
What This Means
◆ THE BLUEPRINT
What You're Looking At

This tab runs many simulations. Each one generates a fresh training set from the same true function, fits a polynomial of each degree, and evaluates it on a fixed test grid. The result is the expected test error broken into its three components.

The Fundamental Equation
\(\text{E}\left[(y - \hat{f}(x))^2\right] = \text{Bias}^2(\hat{f}(x)) + \text{Var}(\hat{f}(x)) + \sigma^2\)

What Each Term Means

\(\text{Bias}^2 = \left[\text{E}[\hat{f}(x)] - f(x)\right]^2\): how far the average prediction is from the truth. High bias means the model is systematically wrong.

\(\text{Var} = \text{E}\left[\left(\hat{f}(x) - \text{E}[\hat{f}(x)]\right)^2\right]\): how much the prediction changes across different training sets. High variance means the model is unstable.

\(\sigma^2\): irreducible noise in the data. No model can remove this.

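
The decomposition can be verified numerically at a single test point. A sketch assuming a sine truth, Gaussian noise, and degree-3 polynomial fits (all illustrative choices, not the app's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda t: np.sin(2 * np.pi * t)   # assumed true function
sigma = 0.3                           # noise standard deviation
n, degree, reps = 30, 3, 2000
x0 = 0.25                             # fixed test point

preds = np.empty(reps)
for r in range(reps):                 # one fresh training set per rep
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    preds[r] = np.polyval(np.polyfit(x, y, degree), x0)

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
var = preds.var()                     # variance of the fit at x0
decomposed = bias2 + var + sigma**2

# Direct Monte Carlo estimate of the expected test error at x0:
y0 = f(x0) + rng.normal(0, sigma, reps)
direct = np.mean((y0 - preds) ** 2)
# `decomposed` and `direct` should agree up to simulation noise.
```
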
The Tradeoff

Simple models (low degree) have high bias and low variance. Complex models (high degree) have low bias and high variance. The optimal complexity minimizes the sum.

This is not just a polynomial concept. The same tradeoff governs every supervised learning method.

How to Interact

Change the true function to see how shape affects the optimal complexity. Increase noise to raise the irreducible floor and watch the optimal degree shift left. Increase sample size to reduce variance at every complexity level.

Related Topics
Residuals vs Fitted
Random scatter around zero = good fit. Patterns (curves, fans) = the model is missing something.
Model Performance
◆ THE BLUEPRINT
The Anvil

This is the unguided workbench. Start by studying the raw data. Then pick a model type, tune it, and try to minimize test MSE.

Challenge Mode

Toggle Challenge Mode to hide the true function and test error. Try to find the best model using only training error and residual patterns. Then reveal the answer.

What to Try

Start with a sine wave, low noise, and a polynomial. Find the degree that minimizes test error. Now switch to KNN and find the K that matches. Compare the two test MSEs.

Try a step function with high noise. Polynomials struggle here. KNN adapts more naturally to discontinuities.

Other Forge Tools