Population variance versus sample variance

One nuance with standard deviation and variance is the distinction between population variance and sample variance. If you're working with a complete set of observations, you just take the average of all the squared differences from the mean, and that's your variance.

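However, if you're sampling your data, that is, if you're taking a subset of the data just to make computing easier, you have to do something a little bit different. Instead of dividing by the number of data points in the sample, you divide by that number minus 1. The sample's own mean sits closer to the sampled points than the mean of the full dataset does, so dividing by the plain count tends to underestimate the true variance.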
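To see the effect of the two divisors, here is a minimal simulation sketch. It is not from the original text: the population (the integers 1 to 100), the sample size of 5, the number of trials, and the seed are all assumptions chosen for illustration. It uses `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) from the Python standard library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "complete" population: the integers 1..100 (an assumption).
population = list(range(1, 101))
true_variance = statistics.pvariance(population)  # divides by N

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(10000):
    # Draw a small sample (with replacement) from the population.
    sample = random.choices(population, k=5)
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)

print(f"population variance:          {true_variance:.2f}")
print(f"average estimate, / n:        {avg_n:.2f}")
print(f"average estimate, / (n - 1):  {avg_n_minus_1:.2f}")
```

Averaged over many small samples, the divide-by-n estimate should come out noticeably below the population variance, while the divide-by-(n - 1) estimate should land close to it.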

For example, take the data we were studying for people standing in a line. We take the sum of the squared differences from the mean and divide by 5, the number of data points we had, to get 5.04.

$$\sigma^{2} = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04$$

If we were to look at the sample variance instead, which is designated by $S^{2}$, we would divide the sum of the squared differences by 4, that is, $(n-1)$. This gives us the sample variance, which comes out to 6.3.

$$S^{2} = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 4 = 6.3$$

So again, if this were a sample taken from a larger dataset, you would divide by the number of data points minus 1. If it were the complete dataset, you would divide by the actual number of data points.
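The arithmetic above can be reproduced with a short Python sketch. The raw data points are not given in this section, so the list `[1, 4, 5, 4, 8]` is an assumption: it is one dataset whose squared differences from the mean match the values 11.56, 0.16, 0.36, 0.16, and 12.96 used here. The results are cross-checked against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics

# Assumed data points: not stated in the text, but they reproduce the
# squared differences 11.56, 0.16, 0.36, 0.16, 12.96 used above.
data = [1, 4, 5, 4, 8]

mean = sum(data) / len(data)                     # 4.4
squared_diffs = [(x - mean) ** 2 for x in data]  # squared differences from the mean
total = sum(squared_diffs)                       # about 25.2

pop_var = total / len(data)         # population variance: divide by N
samp_var = total / (len(data) - 1)  # sample variance: divide by N - 1

print(round(pop_var, 2))   # 5.04
print(round(samp_var, 2))  # 6.3

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

The only difference between the two results is the divisor: the same sum of squared differences is divided by 5 in one case and by 4 in the other.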

As we've seen, population variance is usually designated as sigma squared ($\sigma^{2}$), with sigma ($\sigma$) as the standard deviation. It is the sum, over every data point $X$, of the squared difference between $X$ and the mean $\mu$, divided by $N$, the number of data points. We can express it with the following equation:

$$\sigma^{2} = \frac{\sum (X - \mu)^{2}}{N}$$
  • $X$ denotes each data point
  • $\mu$ denotes the mean
  • $N$ denotes the number of data points

Sample variance is similarly designated as $S^{2}$, with the following equation:

$$S^{2} = \frac{\sum (X - M)^{2}}{N - 1}$$
  • $X$ denotes each data point
  • $M$ denotes the mean of the sample
  • $N-1$ denotes the number of data points minus 1
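The two equations translate almost directly into code. The following is a minimal sketch rather than a reference implementation: the function names and the `observations` values are hypothetical, chosen only for illustration.

```python
def population_variance(xs):
    """sigma^2: sum of squared differences from the mean, divided by N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def sample_variance(xs):
    """S^2: the same sum of squared differences, divided by N - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical values for illustration (mean 4, squared differences 4, 0, 0, 4).
observations = [2.0, 4.0, 4.0, 6.0]

print(population_variance(observations))        # 2.0
print(round(sample_variance(observations), 3))  # 2.667
```

Because the numerators are identical, the two results always differ by a factor of N / (N - 1), so the gap between them shrinks as the number of data points grows.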