Overfitting, bias-variance and learning curves
- Overfitting: When the statistical model contains more parameters than the data can justify. This means it will tend to fit noise in the training data and so may not generalize well to new examples. The hypothesis function is too complex.
- Underfitting: When the statistical model cannot adequately capture the structure of the underlying data. The hypothesis function is too simple.
A graph of training and testing error versus model complexity provides a good illustration of the trade-off between underfitting and overfitting. The best model is the one that provides a hypothesis function closest to the true relationship within the data; such a model produces a minimum in testing error. To the left of this minimum, simpler models underfit; to the right, more complex models overfit.
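To make this concrete, here is a minimal sketch of that curve using scikit-learn, sweeping polynomial degree as a stand-in for model complexity. The synthetic sine target, the noise level and the train/test split sizes are all assumptions chosen for illustration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data: a smooth "true relationship" plus Gaussian noise
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)

X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

# Sweep model complexity (polynomial degree) and record both errors
for degree in range(1, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error should fall steadily as the degree grows, while testing error typically falls and then rises again; the minimum of the test error marks the best trade-off between underfitting and overfitting.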
What do we actually mean by model complexity? It can be related to a great many things and so is difficult to quantify. It is related to the hyperparameter choices for the algorithm, such as the strength of the regularization term in linear regression, the maximum tree depth for decision trees, or the number of layers in a neural network. These are usually tuned on a validation set (or by cross-validation) rather than on the test set. However, model complexity is also related to the number of features and the size of the dataset: typically, the more features are used, the more complex the model becomes.
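As a sketch of how such a hyperparameter might be tuned, the snippet below sweeps the regularization strength of a ridge regression using cross-validation. The dataset parameters and the alpha grid are arbitrary assumptions; larger alpha means stronger regularization and hence an effectively simpler model:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Synthetic regression data; sizes and noise chosen only for illustration
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Sweep the regularization strength and score each setting by 5-fold CV
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             scoring="neg_mean_squared_error", cv=5)
    print(f"alpha={alpha:7.2f}  CV MSE={-scores.mean():.2f}")
```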
- Bias is the contribution to the total error from the simplifying assumptions built into the method we choose.
- Variance is the contribution to the total error due to sensitivity to noise in the data. For example, a model with high variance is likely to produce quite different hypothesis functions when given different training sets that share the same underlying structure but contain different noise (see the sketch after this list).
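The variance idea can be illustrated by fitting the same model class to many training sets that share the same underlying structure but differ only in their noise, then measuring how much the fitted hypotheses disagree at a fixed query point. This is a synthetic sketch; the sine target, noise level, polynomial degrees and query point are all assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 30).reshape(-1, 1)
f_true = np.sin(2 * np.pi * X).ravel()  # fixed underlying structure
x_query = np.array([[0.5]])             # fixed point at which to compare hypotheses

for degree in [1, 3, 9]:
    preds = []
    # Many training sets: same structure, different noise realizations
    for _ in range(200):
        y = f_true + rng.normal(0, 0.2, len(f_true))
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, y)
        preds.append(model.predict(x_query)[0])
    preds = np.array(preds)
    # Spread of the predictions across datasets reflects the model's variance
    print(f"degree={degree}  mean prediction={preds.mean():+.3f}  "
          f"variance={preds.var():.4f}")
```

The simpler model should give nearly the same prediction on every dataset (low variance, but possibly biased away from the true value), while the high-degree model's predictions should scatter much more widely across datasets.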