Standard Deviation & Variance
Standard deviation and variance are two fundamental measures of how spread out a data distribution is.
Variance
Variance measures how 'spread out' the data is. Variance ($\sigma^2$) is simply the average of the squared differences from the mean:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$

This is the formula for variance, where $\bar{x}$ is the sample mean and $N$ is the number of data points.
Example: What is the variance of the data set (1, 4, 5, 4, 8)?
- First find the mean: (1+4+5+4+8)/5 = 4.4
- Now find the differences from the mean: (-3.4, -0.4, 0.6, -0.4, 3.6)
- Find the squared differences: (11.56, 0.16, 0.36, 0.16, 12.96)
- Find the average of the squared differences:
- $\sigma^2$ = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04
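To make the steps concrete, here is a minimal Python sketch that reproduces this calculation; the data set and the expected intermediate values come from the example above.

```python
# Reproduce the worked example: variance of (1, 4, 5, 4, 8).
data = [1, 4, 5, 4, 8]

# Step 1: find the mean.
mean = sum(data) / len(data)              # 4.4

# Step 2: differences from the mean.
diffs = [x - mean for x in data]          # [-3.4, -0.4, 0.6, -0.4, 3.6]

# Step 3: squared differences.
squared = [d ** 2 for d in diffs]         # [11.56, 0.16, 0.36, 0.16, 12.96]

# Step 4: average of the squared differences = variance.
variance = sum(squared) / len(data)
print(variance)                           # 5.04 (up to floating-point rounding)
```

Note that dividing by N, as this example does, gives the population variance; dividing by N − 1 instead gives the sample variance, which you will also see in practice.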
Standard Deviation
The standard deviation ($\sigma$) is just the square root of the variance. For the example above, $\sigma = \sqrt{5.04} \approx 2.24$.
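As a quick check, Python's built-in `statistics` module computes this directly (reusing the example data):

```python
import statistics

data = [1, 4, 5, 4, 8]

# pstdev = population standard deviation, the square root of the
# population variance computed above.
print(statistics.pstdev(data))  # ~2.245, i.e. sqrt(5.04)
```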
The standard deviation is commonly used to identify outliers. You can describe how extreme a data point is by how many standard deviations ('sigmas') it lies from the mean: in a normal distribution, a value within one standard deviation of the mean is considered fairly typical, while data points that lie more than one standard deviation from the mean can be considered unusual.
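Here is a sketch of how this rule of thumb might look in code; the `flag_outliers` helper and its threshold parameter `k` are illustrative, not from the text.

```python
import statistics

def flag_outliers(data, k=1.0):
    """Return values lying more than k standard deviations from the mean.

    Illustrative helper: k = 1 matches the rule of thumb above, though
    larger thresholds (2 or 3) are common in practice.
    """
    mean = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > k * sigma]

print(flag_outliers([1, 4, 5, 4, 8]))  # [1, 8]
```

Running this on the example data flags 1 and 8, since both lie more than one standard deviation (about 2.24) from the mean of 4.4.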