Artificial Intelligence 🤖
Basics Refresher
Standard Deviation & Variance

Standard deviation and variance are two fundamental quantities that describe a data distribution.

Variance

Variance (\sigma^2) measures how 'spread out' the data is: it is simply the average of the squared differences from the mean.

\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2

This represents the formula for variance, where \bar{x} is the sample mean.

Example: What is the variance of the data set (1, 4, 5, 4, 8)?

  1. First find the mean: (1+4+5+4+8)/5 = 4.4
  2. Now find the differences from the mean: (-3.4, -0.4, 0.6, -0.4, 3.6)
  3. Find the squared differences: (11.56, 0.16, 0.36, 0.16, 12.96)
  4. Find the average of the squared differences: \sigma^2 = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04
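As a sanity check, here is a minimal Python sketch of the same calculation. The dataset and the 'divide by n' steps come straight from the worked example; the variable names are just illustrative.

```python
# Variance of (1, 4, 5, 4, 8), following the steps above.
data = [1, 4, 5, 4, 8]

mean = sum(data) / len(data)                      # (1 + 4 + 5 + 4 + 8) / 5 = 4.4
squared_diffs = [(x - mean) ** 2 for x in data]   # [11.56, 0.16, 0.36, 0.16, 12.96]
variance = sum(squared_diffs) / len(data)         # 25.2 / 5 = 5.04

print(mean, variance)  # 4.4 5.04 (up to floating-point rounding)
```

NumPy's numpy.var(data) uses the same divide-by-n convention by default, so it returns the same 5.04.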

Standard Deviation

Standard deviation (\sigma) is just the square root of the variance. For the example above, \sigma = \sqrt{5.04} \approx 2.24.

Standard deviation is commonly used to identify outliers: data points that lie more than one standard deviation from the mean can be considered unusual.

You can describe how extreme a data point is by saying how many sigmas (standard deviations) away from the mean it is. In a normal distribution, a value within one standard deviation of the mean is considered a fairly typical value.
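Continuing the same example, here is a small sketch of that idea: take the square root of the variance to get \sigma, then express each point's distance from the mean in sigmas and flag anything more than one standard deviation away (the one-sigma cutoff is the rule of thumb from above; the variable names are just this example's choices).

```python
import math

data = [1, 4, 5, 4, 8]
mean = sum(data) / len(data)                                # 4.4
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 5.04
std_dev = math.sqrt(variance)                               # ~2.24

# Distance from the mean, measured in standard deviations ("sigmas").
for x in data:
    sigmas = (x - mean) / std_dev
    label = "unusual" if abs(sigmas) > 1 else "typical"
    print(f"{x}: {sigmas:+.2f} sigma -> {label}")
```

With this dataset, 1 and 8 land beyond one sigma and get flagged as unusual, while the values clustered near the mean come out as typical.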