Common Continuous Distributions

We will now analyse three important probability distributions for continuous random variables, particularly those that arise in engineering and statistics: the Uniform, Exponential, and Normal distributions.

Uniform Distribution

Let's start off with a really simple example: the uniform distribution. A uniform distribution just means there's a flat, constant probability of a value occurring within a given range. We can create a uniform distribution using NumPy's random.uniform function, as sketched below.
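Here is a minimal sketch of this (the sample size, range, and bin count are my own arbitrary choices, not from the text):

import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples, each equally likely to land anywhere in [-10, 10]
values = np.random.uniform(-10.0, 10.0, 100000)

# The histogram comes out roughly flat across the whole range
plt.hist(values, 50)
plt.show()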

Figure: Uniform Distribution (histogram of samples)

There's pretty much an equal chance of any given value or range of values occurring within that data. So, unlike the normal distribution, where we saw a concentration of values near the mean, a uniform distribution has equal probability across any given value within the range that you define.

So what would the probability distribution function of this look like? Well, I'd expect to see basically nothing outside the range of $-10$ or beyond $10$. But between $-10$ and $10$, I would see a flat line, because there's a constant probability of any one of those ranges of values occurring. So in a uniform distribution you would see a flat line on the probability distribution function, because there is a constant probability: every value, every range of values, has an equal chance of appearing as any other value.
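To see that flat line directly, here is a sketch that plots the density with scipy.stats.uniform (note scipy parameterises the uniform by loc, the start of the range, and scale, its width, so loc=-10.0 and scale=20.0 cover $[-10, 10]$; the plotting range is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

x = np.arange(-15, 15, 0.01)

# Flat density of 1/20 on [-10, 10], and zero everywhere else
plt.plot(x, uniform.pdf(x, loc=-10.0, scale=20.0))
plt.show()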

Normal (Gaussian) Distribution

To visualise it in Python:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Evaluate and plot the standard Normal PDF over [-3, 3]
x = np.arange(-3, 3, 0.001)
plt.plot(x, norm.pdf(x))
plt.show()

Which results in:

Figure: the standard Normal PDF plotted in Python over $[-3, 3]$.

We continue our discussion of continuous distributions by considering perhaps the most important probability distribution of them all: the Normal distribution. We will begin with a general definition for the Normal distribution, which we shall see has two parameters, $\mu$ and $\sigma^{2}$, that exactly define $E[X]$ and $\operatorname{Var}[X]$, respectively.

The random variable $X$ follows a Normal distribution with parameters $\mu$ and $\sigma^{2}$ (where $\sigma>0$), denoted by

$$X \sim N(\mu, \sigma^{2})$$

if its density function is given by

$$f_{X}(x \mid \mu, \sigma)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-(x-\mu)^{2} / 2 \sigma^{2}}$$

The parameters $\mu$ and $\sigma^{2}$ are the mean $(E[X])$ and variance $(\operatorname{Var}[X])$ of $X$, respectively.
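As a quick numerical sanity check (a sketch; the values of mu, sigma, and x below are arbitrary), we can evaluate this density formula directly and compare it with scipy.stats.norm.pdf, which parameterises the Normal by loc (the mean) and scale (the standard deviation):

import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.5   # arbitrary example parameters
x = 3.0

# The density evaluated directly from the formula above
manual = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# The same density from scipy
library = norm.pdf(x, loc=mu, scale=sigma)

print(manual, library)  # the two values agree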

At first glance this expression appears rather complex, but we will now introduce an important transformation that will greatly simplify our calculations.

Standardising and the Z-transformation

We start by considering the following important property of the Normal distribution:

If $X \sim N\left(\mu, \sigma^{2}\right)$, then for any constants $a$ and $b$ ($b \neq 0$), the random variable $Y=a+bX$ is also normally distributed.

Using our properties from Chapter 2, the expressions for the expectation and variance of $Y$ are easily computed as:

$$\begin{aligned} E[Y] & =E[a+bX]=a+bE[X]=a+b\mu \\ \operatorname{Var}[Y] & =\operatorname{Var}[a+bX]=b^{2}\operatorname{Var}[X]=b^{2}\sigma^{2} \end{aligned}$$

From this we deduce that $Y \sim N\left(a+b\mu,\, b^{2}\sigma^{2}\right)$ (a formal derivation is omitted to save space, but can be found in almost any probability text by Sheldon Ross).

Recall that the expression $Y=a+bX$ holds for any values of $a$ and $b \neq 0$! What we would like to do is find the values of $a$ and $b$ such that $E[Y]=0$ and $\operatorname{Var}[Y]=1$; we shall refer to such a distribution, $N(0,1)$, as the standard Normal. Solving for $a$ and $b$, we find:

$$\begin{aligned} E[Y] & =a+b\mu=0 \quad\Rightarrow\quad a=-b\mu=-\frac{\mu}{\sigma} \\ \operatorname{Var}[Y] & =b^{2}\sigma^{2}=1 \quad\Rightarrow\quad b=\frac{1}{\sigma} \end{aligned}$$

Plugging $a=-\mu/\sigma$ and $b=1/\sigma$ into our expression $Y=a+bX$, we find that $Y=\frac{X-\mu}{\sigma}$; we refer to this as the Z-transformation. Any Normal random variable $X \sim N(\mu, \sigma^{2})$ can be equivalently expressed as $Z \sim N(0,1)$ using the Z-transformation:

$$\begin{gathered} Z \equiv \frac{X-\mu}{\sigma} \\ Z \sim N(0,1): \quad E[Z]=0, \quad \operatorname{Var}[Z]=1 \end{gathered}$$

This simplifies the Normal PDF $f_{X}(x \mid \mu, \sigma)$ to the standard Normal PDF:

$$f_{Z}(z)=\frac{1}{\sqrt{2 \pi}} e^{-z^{2} / 2}$$

Figure 17: Standard Normal PDF $f_{Z}(z)=\frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}$ with $\operatorname{Var}[Z]=1$ (and also a standard deviation of 1, since the standard deviation is $\sigma=\sqrt{\operatorname{Var}[Z]}$) and $E[Z]=0$. To compute probabilities from this PDF, we will use lookup tables to find the value of the CDF $F_{Z}(z)=\int_{-\infty}^{z} f_{Z}(u)\, du$ for a given value $z$, as illustrated for $z=-2$.

In other words, to standardise a Normal random variable, we subtract its mean and then divide by the square root of its variance (which, recall, we defined as the standard deviation). Notice that the parameters $(\mu, \sigma)$ have been removed in the standard Normal PDF (i.e. $f_{Z}(z)$ is not conditioned on any model parameters)! We shall see that this useful property allows us to map any Normal distribution onto the standard Normal, which we only have to compute once and store in a lookup table.
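Here is a minimal sketch of the Z-transformation in action (mu, sigma, and the sample size are arbitrary choices): standardised Normal samples should have mean approximately 0 and variance approximately 1.

import numpy as np

mu, sigma = 50.0, 10.0   # arbitrary example parameters

# Samples of X ~ N(mu, sigma^2)
x = np.random.normal(mu, sigma, 1000000)

# Z-transformation: subtract the mean, divide by the standard deviation
z = (x - mu) / sigma

print(z.mean(), z.var())  # approximately 0 and 1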

Expectation and Variance:

To show that $E[X]=\mu$, we integrate using the Z-transformation $z=(x-\mu)/\sigma$, so that $dx=\sigma\, dz$. The result is:

$$\begin{aligned} E[X] & =\int_{-\infty}^{\infty} x f_{X}(x)\, dx=\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-(x-\mu)^{2}/2\sigma^{2}}\, dx=\int_{-\infty}^{\infty}(\sigma z+\mu) \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-z^{2}/2} \sigma\, dz \\ & =\frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z e^{-z^{2}/2}\, dz+\mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}\, dz=\mu \end{aligned}$$

The first integral is 0 because $z e^{-z^{2}/2}$ is an odd function, and the final integral evaluates to 1 by definition, because we are integrating the standard Normal PDF $f_{Z}(z)$ over its entire range.
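We can also confirm $E[X]=\mu$ numerically; the sketch below (arbitrary mu and sigma) integrates $x f_{X}(x)$ over the whole real line with scipy.integrate.quad:

import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5   # arbitrary example parameters

def integrand(x):
    # x multiplied by the Normal density f_X(x)
    return x * np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

value, error = quad(integrand, -np.inf, np.inf)
print(value)   # approximately mu = 2.0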

Next, we can evaluate $E\left[X^{2}\right]$ to show that $\operatorname{Var}[X]=\sigma^{2}$, which requires integration by parts:

$$\int u\, dv=uv-\int v\, du$$

We leave it to the reader to prove this.

Computing Probabilities using the Normal CDF

The CDF of the Normal distribution has no explicit closed form. However, as we have just noted above, we can compute probabilities involving Normally distributed random variables by standardising them (i.e. using the Z-transformation). That is:

$$F_{X}(x)=P(X \leq x)=P\left(\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right)=P\left(Z \leq \frac{x-\mu}{\sigma}\right)=F_{Z}\left(\frac{x-\mu}{\sigma}\right)$$

where the standard Normal CDF $F_{Z}(z)=P(Z \leq z)$ is defined as:

$$F_{Z}(z)=\int_{-\infty}^{z} f_{Z}(u)\, du=\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-u^{2}/2}\, du$$

Thus, using a precomputed table of values of the standard Normal CDF $F_{Z}(z)$ (sometimes also denoted $\Phi(z)$ in other textbooks), we can obtain the CDF for the general Normal of interest. Indeed, every probability textbook presents these tables in its appendices (a table is currently posted on BBl).
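In Python, scipy.stats.norm.cdf plays the role of the lookup table; a minimal sketch (mu, sigma, and x are arbitrary):

from scipy.stats import norm

mu, sigma = 7.5, 1.94   # arbitrary example parameters
x = 10.0

# Standardise, then evaluate the standard Normal CDF (the "table lookup")
z = (x - mu) / sigma
print(norm.cdf(z))                       # F_Z((x - mu) / sigma)

# Equivalently, scipy can evaluate the general Normal CDF directly
print(norm.cdf(x, loc=mu, scale=sigma))  # same value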

Lastly, this procedure can be generalised for computing the probability over an interval $[a, b]$ for any $X \sim N\left(\mu, \sigma^{2}\right)$:

$$\begin{aligned} P(a \leq X \leq b) & =P\left(\frac{a-\mu}{\sigma} \leq \frac{X-\mu}{\sigma} \leq \frac{b-\mu}{\sigma}\right) \\ & =P\left(\frac{a-\mu}{\sigma} \leq Z \leq \frac{b-\mu}{\sigma}\right) \\ & =P\left(Z \leq \frac{b-\mu}{\sigma}\right)-P\left(Z \leq \frac{a-\mu}{\sigma}\right) \\ & =F_{Z}\left(\frac{b-\mu}{\sigma}\right)-F_{Z}\left(\frac{a-\mu}{\sigma}\right) \end{aligned}$$

where again the CDF $F_{Z}(z)$ can be obtained as the area under the curve of the PDF $f_{Z}(z)$ from $-\infty$ to $z$, as illustrated in Figure 17 for $z=-2$.
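Putting the interval formula into code, a sketch with arbitrary values of mu, sigma, a, and b:

from scipy.stats import norm

mu, sigma = 100.0, 15.0   # arbitrary example parameters
a, b = 90.0, 120.0

# P(a <= X <= b) = F_Z((b - mu)/sigma) - F_Z((a - mu)/sigma)
p = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(p)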

Approximation of the Binomial Distribution

You might have already noticed by now that, under certain conditions, some of the previous distributions we have considered have a symmetric, bell shape to them (e.g. see Figures 13 and 15 for the binomial and Poisson, respectively). Indeed, the standard Normal $N(0,1)$ was proposed in 1733 by DeMoivre to approximate the binomial distribution for $p=0.5$ (and was later generalised for any $p$ by Laplace in 1812). Recall that for the binomial distribution:

$$\begin{aligned} E[X] & =np \\ \operatorname{Var}[X] & =np(1-p) \end{aligned}$$

Thus, we can standardise $X$ as:

$$Z=\frac{X-np}{\sqrt{np(1-p)}}$$

We can approximate the distribution of a binomial random variable $X$, where $p$ is the probability of "success", for large $n$ as:

$$\lim_{n \rightarrow \infty} P\left[a \leq \frac{X-np}{\sqrt{np(1-p)}} \leq b\right]=\frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^{2}/2}\, dz$$

for any numbers $a$ and $b$. Note that this approximation is even a bit more "flexible" than the Poisson approximation of the binomial distribution, in that it can provide a decent approximation for relatively small $n$ (we will consider an example shortly with $p=0.5$ and $n=15$). It should be noted that we can improve the accuracy of the approximation of the discrete distribution by making the following continuity correction:

$$\lim_{n \rightarrow \infty} P\left[a \leq \frac{X-np}{\sqrt{np(1-p)}} \leq b\right]=\frac{1}{\sqrt{2\pi}} \int_{a-0.5}^{b+0.5} e^{-z^{2}/2}\, dz$$

This will become clear with the consideration of an example.

Example: Normal Approximation to Binomial Distribution of Ebola Virus

Recall our example of modelling the number of survivors of the Ebola virus as $X \sim \operatorname{Binomial}(n=15, p=0.5)$. If 15 people are infected, we had computed the probability of ten or more survivors to be $P(X \geq 10)=0.150879$. Let's consider a Normal approximation to this distribution.

Solution:

First we need to standardise the values of $X$ based on $E[X]=np=7.5$ and $\operatorname{Var}[X]=np(1-p)=3.75$:

$$X^{\prime}=\frac{(X=10)-7.5}{\sqrt{3.75}}=1.29099$$

Now we integrate the standard normal curve as follows:

$$P\left[1.29099 \leq \frac{X-7.5}{\sqrt{3.75}}\right]=\frac{1}{\sqrt{2\pi}} \int_{1.29099}^{\infty} e^{-z^{2}/2}\, dz$$

Figure 18: Normal approximation to the binomial distribution for the number of survivors of the Ebola virus. To evaluate the probability of 10 or more survivors using the Normal approximation, we can integrate the Normal curve for $X \geq 10$ (red line), or, using the continuity correction, for $X \geq 9.5$ (blue line), which turns out to be a much better approximation of the binomial probability.

Numerical evaluation or table lookup of this integral gives $P(X \geq 10)=0.0983529$, which substantially underestimates the binomial probability. However, if we apply the continuity correction:

$$\begin{gathered} X^{\prime}=\frac{(X=9.5)-7.5}{\sqrt{3.75}}=1.0328 \\ P\left[1.0328 \leq \frac{X-7.5}{\sqrt{3.75}}\right]=\frac{1}{\sqrt{2\pi}} \int_{1.0328}^{\infty} e^{-z^{2}/2}\, dz \end{gathered}$$

we get a much more accurate estimate: $P(X \geq 9.5)=0.15085$. The quality of the fit of the Normal approximation to the binomial distribution is shown in Figure 18.
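The sketch below reproduces these numbers with scipy; note that binom.sf(9, n, p) gives the exact $P(X \geq 10)$, since sf(k) returns $P(X > k)$:

import numpy as np
from scipy.stats import binom, norm

n, p = 15, 0.5
mean = n * p                    # E[X] = 7.5
sd = np.sqrt(n * p * (1 - p))   # sqrt(Var[X]) = sqrt(3.75)

exact = binom.sf(9, n, p)                # P(X >= 10) = 0.150879...
approx = norm.sf((10 - mean) / sd)       # 0.0983529..., no correction
corrected = norm.sf((9.5 - mean) / sd)   # 0.15085..., continuity correction

print(exact, approx, corrected)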

Example: Normal Approximation of Binomial: ESP Research

This next example of the Normal approximation to the Binomial distribution follows from a more serious treatment by William Feller (Feller, W., "Statistical Aspects of ESP.", Journal of Parapsychology, 4: 271-298, 1940).

In 1938, to test whether extrasensory perception (ESP) exists, researchers Pratt and Woodruff at Duke University conducted a study using 32 students. For this purpose, the investigator and the student sat at opposite ends of a table. The investigator would shuffle a deck of cards containing 5 'ESP symbols' (see Figure 19), select a card, and concentrate on it. The student would then try to guess the identity of the card (i.e. guess which of the 5 symbols the investigator was concentrating on). For a comical rendition of this experiment, see the opening scene of the film Ghostbusters (1984).

In the Pratt and Woodruff experiment, the 32 students made a total of 60,000 guesses, which were found to be correct 12,489 times (i.e. $x=12{,}489$). Since there were only 5 ESP symbols, we could estimate $p=\frac{1}{5}$, and, assuming this to be a series of independent Bernoulli trials (i.e. a binomial distribution), our expected number of successful guesses would be $E[X]=np=60{,}000 \cdot \frac{1}{5}=12{,}000$. Can we conclude from this study that the additional 489 correct outcomes prove that ESP exists?

Solution:

We can compute the probability of observing 12,489 or more correct guesses using the binomial distribution as follows:

$$P(X \geq 12{,}489)=\sum_{x=12{,}489}^{60{,}000}\binom{60{,}000}{x}\left(\frac{1}{5}\right)^{x}\left(\frac{4}{5}\right)^{60{,}000-x}$$

Clearly the binomial coefficient in this problem cannot be computed easily! Thus, we shall make use of the Normal approximation to the binomial distribution, which should be very accurate in this case given that $n=60{,}000$. Using the continuity correction, we have:

$$\begin{gathered} X^{\prime}=\frac{(X=12{,}488.5)-12{,}000}{\sqrt{12{,}000 \cdot 4/5}}=4.99 \\ P\left[4.99 \leq \frac{X-12{,}000}{\sqrt{12{,}000 \cdot 4/5}}\right]=\frac{1}{\sqrt{2\pi}} \int_{4.99}^{\infty} e^{-z^{2}/2}\, dz=0.0000003=3.0 \times 10^{-7} \end{gathered}$$

Interestingly, the probability of observing this many (or more) correct guesses is so low that it suggests the result could not have been due to chance! However, it should be noted that this particular ESP experiment has been met with a great deal of skepticism in the scientific community.
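A sketch of the same calculation in scipy; as a check on the approximation, modern software can also evaluate the exact binomial tail even for $n=60{,}000$:

import numpy as np
from scipy.stats import binom, norm

n, p = 60000, 1 / 5
mean = n * p                    # 12,000
sd = np.sqrt(n * p * (1 - p))   # sqrt(9,600)

# Normal approximation with continuity correction: P(X >= 12,489)
z = (12488.5 - mean) / sd       # approximately 4.99
print(norm.sf(z))               # approximately 3.0e-07

# Exact binomial tail: P(X > 12,488) = P(X >= 12,489)
print(binom.sf(12488, n, p))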

Figure 19: ESP cards (also known as 'Zener Cards') used in the ESP experiments.

Comments: Even with the continuity correction, the Normal approximation of the binomial can be poor if $n$ is too small. General conditions for applying the approximation are $n>9 \frac{p}{1-p}$ and $n>9 \frac{1-p}{p}$.

Central Limit Theorem

Interestingly, every binomial random variable $X$ can be expressed as the sum of $n$ independent Bernoulli random variables $X_{1}, X_{2}, \ldots, X_{n}$, where $X_{i}=1$ with probability $p$ and $X_{i}=0$ with probability $1-p$ (think for a moment why this is so; we actually used this property to derive the expectation of a binomial random variable given the result for the Bernoulli distribution). That is:

$$X=\sum_{i=1}^{n} X_{i}$$

Therefore the Normal approximation to the binomial distribution can also be written as:

$$\lim_{n \rightarrow \infty} P\left[a \leq \frac{\left(X_{1}+X_{2}+\cdots+X_{n}\right)-np}{\sqrt{np(1-p)}} \leq b\right]=\frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^{2}/2}\, dz$$

This raises the question of whether this limit applies to sums of other types of random variables. Astonishingly, it does! Indeed, significant efforts were made over the next two centuries to extend this result to other distributions. In 1920, this was generalised by G. Polya as the Central Limit Theorem:

Suppose that the random variables $Y_{1}, Y_{2}, \ldots, Y_{n}$ are independent and from the same distribution, with mean $E[Y]=\mu$ and variance $\operatorname{Var}[Y]=\sigma^{2}<\infty$ (i.e. $E[Y]$ and $\operatorname{Var}[Y]$ are finite). We shall refer to this collection of variables as a random sample, but defer further discussion until the Statistics portion of the course. The Central Limit Theorem (CLT) states, for any numbers $a$ and $b$, that:

$$\begin{aligned} \lim_{n \rightarrow \infty} P\left[a \leq \frac{\left(Y_{1}+Y_{2}+\cdots+Y_{n}\right)-E\left[\sum_{i=1}^{n} Y_{i}\right]}{\sqrt{\operatorname{Var}\left[\sum_{i=1}^{n} Y_{i}\right]}} \leq b\right] & =\lim_{n \rightarrow \infty} P\left[a \leq \frac{\left(Y_{1}+Y_{2}+\cdots+Y_{n}\right)-n\mu}{\sqrt{n}\,\sigma} \leq b\right] \\ & =\frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^{2}/2}\, dz \end{aligned}$$

As the sample size increases, the distribution of the sum converges to the Normal, regardless of the original distribution of the $Y_{i}$.
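A simulation sketch of the CLT (the exponential summands, sample size, and number of trials are arbitrary choices; the exponential is deliberately skewed and decidedly non-Normal):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, trials = 50, 100000

# Each row holds n exponential variables with mu = 1 and sigma^2 = 1
samples = rng.exponential(scale=1.0, size=(trials, n))

# Standardised sums: (sum - n*mu) / (sqrt(n)*sigma)
z = (samples.sum(axis=1) - n * 1.0) / (np.sqrt(n) * 1.0)

# The histogram closely follows the standard Normal bell curve
plt.hist(z, bins=100, density=True)
plt.show()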

In the above derivation, we have made use of the simple fact that:

$$\begin{aligned} E\left[\sum_{i=1}^{n} Y_{i}\right] & =\sum_{i=1}^{n} \mu=n\mu \\ \operatorname{Var}\left[\sum_{i=1}^{n} Y_{i}\right] & =\sum_{i=1}^{n} \sigma^{2}=n\sigma^{2} \end{aligned}$$

You will commonly encounter the CLT presented in terms of the average of $Y_{1}, Y_{2}, \ldots, Y_{n}$ (denoted $\bar{Y}$) rather than the sum. Again, we can simply compute the expectation and variance of the average as:

$$\begin{aligned} E[\bar{Y}] & =E\left[\frac{1}{n} \sum_{i=1}^{n} Y_{i}\right]=\frac{1}{n}\left(\sum_{i=1}^{n} \mu\right)=\frac{1}{n} n\mu=\mu \\ \operatorname{Var}[\bar{Y}] & =\operatorname{Var}\left[\frac{1}{n} \sum_{i=1}^{n} Y_{i}\right]=\frac{1}{n^{2}}\left(\sum_{i=1}^{n} \sigma^{2}\right)=\frac{1}{n^{2}} n\sigma^{2}=\frac{\sigma^{2}}{n} \end{aligned}$$

Plugging these expressions for $E[\bar{Y}]$ and $\operatorname{Var}[\bar{Y}]$ into the CLT results in the following form for the average $\bar{Y}$:

$$\lim_{n \rightarrow \infty} P\left[a \leq \frac{\bar{Y}-\mu}{\sigma/\sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^{2}/2}\, dz$$
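The same simulation idea works for the average; here is a sketch with Uniform(0, 1) summands (an arbitrary choice, with $\mu=0.5$ and $\sigma^{2}=1/12$), comparing an empirical interval probability with the Normal prediction:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, trials = 100, 100000

# Sample means of n Uniform(0, 1) variables
ybar = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

# Standardised averages: (ybar - mu) / (sigma / sqrt(n))
z = (ybar - 0.5) / (np.sqrt(1 / 12) / np.sqrt(n))

# Empirical P(-1 <= Z <= 1) versus the Normal prediction (about 0.6827)
print(np.mean((z >= -1) & (z <= 1)))
print(norm.cdf(1) - norm.cdf(-1))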