
Bias-Variance Trade-off

When people talk about underfitting and overfitting, you will often hear about the Bias-Variance Trade-off. In machine learning, this trade-off helps us understand how well a model is doing and what steps can improve its performance. Think of it like throwing darts at a dartboard, where hitting the bullseye means making a perfect prediction.

Dartboards representing Bias and Variance

  • Bias: Measures how far off your predictions are from the correct values overall, i.e. how far the mean of your predicted values sits from the "real" answer.
  • Variance: Measures how spread out, how scattered, your predictions are around their own average (a small simulation just after this list makes both ideas concrete).
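
As a concrete illustration of these two definitions, here is a minimal sketch that simulates repeated predictions of a single true value and reports the bias (offset of the mean prediction) and the variance (spread of the predictions). The true value and the noise levels are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

true_value = 10.0     # the "real" answer (made up for the example)
n_predictions = 1000  # imagine repeating the prediction many times

# A model that is consistent but systematically off: high bias, low variance
high_bias_preds = rng.normal(loc=13.0, scale=0.5, size=n_predictions)

# A model that is right on average but scattered: low bias, high variance
high_variance_preds = rng.normal(loc=10.0, scale=3.0, size=n_predictions)

def bias_and_variance(predictions, truth):
    bias = predictions.mean() - truth  # offset of the mean prediction from the truth
    variance = predictions.var()       # spread of the predictions around their own mean
    return bias, variance

for name, preds in [("high bias", high_bias_preds), ("high variance", high_variance_preds)]:
    b, v = bias_and_variance(preds, true_value)
    print(f"{name:>13}: bias = {b:+.2f}, variance = {v:.2f}")
```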

When fitting your data, you often trade off between bias and variance, which leads to either overfitting or underfitting the data.

  • If your model is underfitting, it has "high bias".
  • If your model is overfitting, it has "high variance".

Your model will perform well if you strike the right balance between bias and variance.

Graphs showing Overfitting and Underfitting

Here is another way to view it. On the left we have a straight line with low variance relative to the observations, but the bias (error) at each individual point is high. On the right we have high variance but low bias, since the curve passes through every point exactly. This is an example of trading one for the other: the right-hand fit accepts high variance in exchange for low bias.

The ideal situation is a line like that in the middle.
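
To make the left / middle / right picture concrete, the sketch below fits polynomials of increasing degree to made-up noisy sine data: degree 1 typically underfits, a moderate degree sits near the sweet spot, and a very high degree chases every training point, which usually shows up as a low training error but a higher test error.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up data: a smooth underlying curve plus noise
def true_fn(x):
    return np.sin(2 * np.pi * x)

x_train = np.sort(rng.uniform(0, 1, 40))
x_test = np.sort(rng.uniform(0, 1, 40))
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 4, 15):  # underfit (left), reasonable (middle), overfit (right)
    # Polynomial.fit rescales x internally, which keeps high-degree fits stable
    poly = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    print(f"degree {degree:>2}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```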

Again, in the context of decision boundaries for classification:

Decision boundary Overfitting and Underfitting

Or for a different visualisation:

Decision Boundary

The Maths Behind It

At the end of the day, you are not out to reduce just bias or just variance; you want to reduce error. The error of your model is a function of both bias and variance:

\text{Error} = \text{Bias}^2 + \text{Variance}

You want to minimize error, not just bias or variance.
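
For completeness, here is the standard decomposition of the expected squared error of a prediction \hat{f}(x) for a target y = f(x) + \varepsilon with noise variance \sigma^2. The noise term is often dropped, as in the formula above, because no model can reduce it.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```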

Tying Back to Algorithms

  • In k-Nearest Neighbors (KNN), increasing the value of K smooths things out, reducing variance but possibly increasing bias (see the sketch after this list).
  • Decision Trees are prone to overfitting (high variance). Random Forests mitigate this by averaging many decision trees, trading a small increase in bias for a large reduction in variance.
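
As a rough sketch of the KNN point, assuming scikit-learn is available, the snippet below trains KNN with different values of K on a synthetic two-class dataset. A very small K tends to score near-perfectly on the training set but worse on the test set (high variance), while a very large K smooths the decision boundary and can start to underfit (high bias).

```python
# How K affects bias and variance in KNN (assumes scikit-learn is installed)
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data with some noise
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for k in (1, 15, 101):  # very flexible, moderate, very smooth
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K = {k:>3}: train accuracy = {knn.score(X_train, y_train):.2f}, "
          f"test accuracy = {knn.score(X_test, y_test):.2f}")
```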

Understanding the Bias-Variance trade-off helps you make better decisions in model selection and tuning, ultimately leading to more accurate and reliable models.