Artificial Intelligence 🤖
Practical Aspects of Deep Learning
Bias & Variance

Bias & Variance in Deep Learning

When training models in deep learning, we encounter the bias-variance trade-off again, which influences whether our model overfits or underfits our data.

If a model performs poorly on the training set, it’s likely suffering from high bias, indicating underfitting. To assess variance, we compare the training error with the development (dev) set error. A significant increase in error from the training to the dev set suggests high variance, meaning the model is overfitting and not generalizing well.

Consider the following scenarios:

  • High Variance (Overfitting): The model achieves a 1% error rate on the training set but an 11% error rate on the dev set, indicating poor generalization.
  • High Bias (Underfitting): Both training and dev set errors are relatively high, at 15% on the training set and 14% on the dev set.
  • High Bias and High Variance: The model has a 15% training error that rises to a 30% dev set error.
  • Optimal Performance: The model reaches an ideal balance with a 0.5% training error and only a 1% dev set error.
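
These rules of thumb translate directly into a small diagnostic helper. Below is a minimal sketch in plain Python; the `baseline_error` argument (standing in for human-level or Bayes optimal error) and the 2% gap threshold are illustrative assumptions, not canonical values.

```python
def diagnose(train_error, dev_error, baseline_error=0.0, gap_threshold=0.02):
    """Rough bias/variance diagnosis from error rates given as fractions.

    baseline_error stands in for human-level / Bayes optimal error;
    gap_threshold is an illustrative cut-off, not a canonical value.
    """
    avoidable_bias = train_error - baseline_error  # distance from the baseline
    variance = dev_error - train_error             # generalization gap

    diagnosis = []
    if avoidable_bias > gap_threshold:
        diagnosis.append("high bias (underfitting)")
    if variance > gap_threshold:
        diagnosis.append("high variance (overfitting)")
    return diagnosis or ["looks fine"]

# The four scenarios from the list above:
print(diagnose(0.01, 0.11))    # ['high variance (overfitting)']
print(diagnose(0.15, 0.14))    # ['high bias (underfitting)']
print(diagnose(0.15, 0.30))    # ['high bias (underfitting)', 'high variance (overfitting)']
print(diagnose(0.005, 0.01))   # ['looks fine']
```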

The baseline for comparison is often the human error rate or the Bayes optimal error, which is the lowest possible error rate for a given problem. Ultimately, what matters is the total error, to which both bias and variance contribute:

$$\text{Error} = \text{Bias}^2 + \text{Variance}$$

A complex model will typically have high variance but low bias, whereas a simple model will tend to have high bias but low variance. The key is to find the sweet spot that minimizes total error.
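
To make the trade-off concrete, here is a small sketch (assuming numpy and scikit-learn are installed) that uses polynomial degree as a stand-in for model complexity: a degree-1 fit underfits (high bias), while a degree-15 fit tends to overfit (high variance). The synthetic dataset and the chosen degrees are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=60)
y = np.sin(x) + rng.normal(scale=0.3, size=x.shape)  # noisy ground truth
x_train, y_train = x[:40, None], y[:40]               # 40 training points
x_dev, y_dev = x[40:, None], y[40:]                    # 20 dev points

for degree in (1, 4, 15):
    # Higher degree = more complex model = lower bias, higher variance.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    dev_mse = mean_squared_error(y_dev, model.predict(x_dev))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, dev MSE {dev_mse:.3f}")
```

The sweet spot is typically a middle degree: low-degree fits show high error on both sets (bias), while very high-degree fits show a large gap between training and dev error (variance).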

Fixing High Bias & Variance

Knowing whether your model is affected by high bias or high variance guides you towards solutions (a combined code sketch follows the two lists below):

  • To reduce high bias:

    • Increase the size of your neural network by adding more hidden units or layers.
    • Choose a different model architecture that captures the complexity of the data better.
    • Train the model for more iterations.
    • Experiment with different optimization algorithms that might converge better.
  • To reduce high variance:

    • Acquire more training data, if possible.
    • Implement regularization techniques, which can penalize complexity.
    • Consider changing the model architecture to one that fits the data distribution more closely.
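
A hedged sketch of both kinds of fixes, assuming PyTorch is available: widening and deepening the network targets high bias, while dropout and L2 regularization (via the optimizer's weight_decay argument) target high variance. The layer sizes, dropout rate, and learning rate are illustrative values, not tuned recommendations.

```python
import torch
import torch.nn as nn

def build_model(n_features=20, hidden_units=64, hidden_layers=2, dropout=0.0):
    """Build a simple fully connected regressor; all sizes are illustrative."""
    layers, width = [], n_features
    for _ in range(hidden_layers):
        layers += [nn.Linear(width, hidden_units), nn.ReLU()]
        if dropout > 0:
            layers.append(nn.Dropout(dropout))  # regularization: combats high variance
        width = hidden_units
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

# High bias? Increase capacity (more units/layers) and train for more iterations.
bigger_net = build_model(hidden_units=256, hidden_layers=4)

# High variance? Keep capacity but regularize: dropout plus L2 weight decay.
regularized_net = build_model(hidden_units=256, hidden_layers=4, dropout=0.3)
optimizer = torch.optim.Adam(regularized_net.parameters(), lr=1e-3, weight_decay=1e-4)
```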

Previously, the bias-variance trade-off implied a compromise: reducing one typically increased the other. Deep learning offers more flexibility to address both issues simultaneously, largely because we can build larger networks and train them on large datasets. In practice, training a bigger neural network rarely hurts performance as long as it is properly regularized; the main cost is computation time and resources.