Common Continuous Distributions
We will now analyse three important probability distributions for continuous random variables that are particularly relevant in engineering and statistics: the Uniform, Exponential, and Normal distributions.
Uniform distribution
Let's start off with a really simple example: the uniform distribution. A uniform distribution just means there is a flat, constant probability of a value occurring within a given range. We can create a uniform distribution by using the NumPy random.uniform function.
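For example, a minimal sketch of this (the sample size of 100,000 is an illustrative choice) might look like:

import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples uniformly at random from the interval [-10, 10)
values = np.random.uniform(-10.0, 10.0, 100000)

# The histogram is roughly flat: each bin captures about the same number of samples
plt.hist(values, bins=50)
plt.show()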
There is pretty much an equal chance of any given value or range of values occurring within that data. So, unlike the normal distribution, where values concentrate near the mean, a uniform distribution has equal probability for any value within the range that you define.
So what would the probability density function of this look like? Well, I'd expect to see basically nothing outside the range of -10 to 10. But when I'm between -10 and 10, I would see a flat line, because there is a constant probability of any one of those ranges of values occurring. So in a uniform distribution you would see a flat line on the probability density function, because every range of values has the same chance of appearing as any other.
Normal (Gaussian) Distribution
To visualise it in Python:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

x = np.arange(-3, 3, 0.001)
plt.plot(x, norm.pdf(x))
plt.show()
Which results in the familiar bell-shaped curve.
We continue our discussion of continuous distributions by considering perhaps the most important probability distribution of them all: the Normal distribution. We will begin with a general definition of the Normal distribution, which we shall see has two parameters, $\mu$ and $\sigma^2$, that exactly define $E[X]$ and $\mathrm{Var}(X)$, respectively.
The random variable $X$ follows a Normal distribution with parameters $\mu$ and $\sigma^2$ (where $\sigma > 0$), denoted by
$$X \sim N(\mu, \sigma^2),$$
if its density function is given by
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$
The parameters $\mu$ and $\sigma^2$ are the mean and variance of $X$, respectively.
At first glance this expression appears rather complex, but we will now introduce an important transformation that will greatly simplify our calculations.
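As a quick sanity check of the density formula, we can implement it directly and compare it against SciPy's built-in norm.pdf; the parameter values below are arbitrary choices for illustration, and this is only a sketch, not part of the formal development:

import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma2):
    # Density of N(mu, sigma2) evaluated at x, written out from the formula above
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

mu, sigma2 = 1.5, 4.0                 # illustrative parameter values
x = np.linspace(-6.0, 9.0, 7)
print(np.allclose(normal_pdf(x, mu, sigma2), norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))))   # True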
Standardising and the Z-transformation
We start by considering the following important property of the Normal distribution:
If $X \sim N(\mu, \sigma^2)$, then for any constants $a$ and $b$, the random variable $Y = a + bX$ is also normally distributed.
Using our properties from Chapter 2, the expressions for the expectation and variance of $Y = a + bX$ are easily computed as:
$$E[Y] = a + b\mu, \qquad \mathrm{Var}(Y) = b^2\sigma^2.$$
From which we deduce that $Y \sim N(a + b\mu,\, b^2\sigma^2)$ (a formal derivation of this is omitted to save space but can be found in almost any Sheldon Ross book on probability).
Recall that this expression holds for any values of $a$ and $b$! What we would like to do is find the values of $a$ and $b$ such that $E[Y] = 0$ and $\mathrm{Var}(Y) = 1$; we shall refer to such a distribution as the standard Normal. Solving for $a$ and $b$ we find:
$$b = \frac{1}{\sigma}, \qquad a = -\frac{\mu}{\sigma}.$$
Plugging $a = -\mu/\sigma$ and $b = 1/\sigma$ into our expression we find that $Y = \frac{X-\mu}{\sigma} \sim N(0, 1)$; we refer to this as the Z-transformation: any Normal random variable $X \sim N(\mu, \sigma^2)$ can be equivalently expressed as $Z \sim N(0,1)$ using the Z-transformation:
$$Z = \frac{X - \mu}{\sigma}.$$
Which simplifies the Normal PDF to the standard Normal PDF:
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
Figure 17: Standard Normal with $\mu = 0$ and $\sigma^2 = 1$ (and also a standard deviation of 1, since the standard deviation is $\sigma = \sqrt{\sigma^2}$ and $\sqrt{1} = 1$). To compute probabilities from this PDF, we will use lookup tables to find the value of the CDF $\Phi(z)$ for a given value $z$, as illustrated in the figure.
In other words, to standardise a Normal random variable, we subtract its mean and then divide by the square root of its variance (which recall we defined as the standard deviation). Notice that the parameters $\mu$ and $\sigma^2$ have been removed in the standard Normal (i.e. $f_Z(z)$ is not conditioned on any model parameters)! We shall see that this useful property allows us to map any Normal distribution onto the standard Normal, which we only have to compute once and store in a lookup table.
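To see this mapping in action, a small sketch (with arbitrary illustrative parameters) confirms that a probability computed from a general Normal agrees with the one obtained from the standard Normal after standardising:

from scipy.stats import norm

mu, sigma = 5.0, 2.0      # illustrative parameters
x = 7.3                   # an arbitrary query point

z = (x - mu) / sigma      # the Z-transformation
# P(X <= x) for X ~ N(mu, sigma^2) equals P(Z <= z) for the standard Normal
print(norm.cdf(x, loc=mu, scale=sigma), norm.cdf(z))    # the two values agree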
Expectation and Variance:
To show that $E[X] = \mu$, we integrate $E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$ using the Z-transformation $z = (x - \mu)/\sigma$, so that $x = \mu + \sigma z$ and $dx = \sigma\,dz$. The result is:
$$E[X] = \int_{-\infty}^{\infty} (\mu + \sigma z)\,\frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = \sigma \int_{-\infty}^{\infty} z\,\frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz + \mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = \mu.$$
The first integral vanishes because its integrand is an odd function, and the final integral evaluates to 1 by definition because we are integrating the standard Normal PDF over its entire range.
Next we can evaluate $E[(X - \mu)^2]$ to show that $\mathrm{Var}(X) = \sigma^2$, which requires integration by parts.
We leave it to the reader to prove this.
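A numerical check of these two results (using arbitrary illustrative parameters and SciPy's quadrature routine) can stand in for the analytical proof:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 2.0, 3.0     # illustrative parameters

# E[X] = integral of x * f_X(x) over the real line
mean, _ = quad(lambda x: x * norm.pdf(x, loc=mu, scale=sigma), -np.inf, np.inf)

# Var(X) = integral of (x - mu)^2 * f_X(x) over the real line
var, _ = quad(lambda x: (x - mu) ** 2 * norm.pdf(x, loc=mu, scale=sigma), -np.inf, np.inf)

print(mean, var)         # approximately 2.0 and 9.0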
Computing Probabilities using the Normal CDF
The CDF of the Normal has no explicit closed form. However, as we have just noted above, we can compute probabilities involving Normally distributed random variables by standardising them (i.e. using the Z-transformation). That is:
$$P(X \le x) = P\!\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = \Phi\!\left(\frac{x - \mu}{\sigma}\right),$$
where the standard Normal CDF $\Phi(z)$ is defined as:
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt.$$
Thus, using a precomputed table of values of the standard Normal CDF $\Phi(z)$ (sometimes denoted differently in other textbooks), we can obtain the CDF for the general Normal of interest. Indeed, every probability textbook presents these tables in its appendices (a table is also posted with the course materials).
Lastly, this procedure can be generalised for computing the probability over an interval $[a, b]$ for any $a < b$:
$$P(a \le X \le b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right),$$
where again the value of $\Phi(z)$ can be obtained from the area under the curve of the standard Normal PDF from $-\infty$ to $z$, as illustrated in Figure 17.
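In practice the table lookup can be replaced by a library call; the sketch below (with illustrative parameters) computes an interval probability both by standardising and by calling the general Normal CDF directly:

from scipy.stats import norm

mu, sigma = 10.0, 2.0     # illustrative parameters
a, b = 9.0, 13.0          # interval of interest

# P(a <= X <= b) via the standard Normal CDF (the "table lookup")
p_std = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)

# The same probability using the general Normal CDF directly
p_gen = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

print(p_std, p_gen)       # identical values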
Approximation of the Binomial Distribution
You might have already noticed by now that under certain conditions some of the previous distributions we have considered have a symmetric, bell shape to them (e.g. see Figures 13 and 15 for the binomial and Poisson, respectively). Indeed, the standard Normal was proposed in 1733 by de Moivre to approximate the binomial distribution for $p = \tfrac{1}{2}$ (and was later generalised for any $p$ by Laplace in 1812). Recall that for the binomial distribution:
$$E[X] = np, \qquad \mathrm{Var}(X) = np(1-p).$$
Thus, we can standardise $X$ as:
$$Z = \frac{X - np}{\sqrt{np(1-p)}}.$$
We can approximate the distribution of a binomial random variable $X \sim \mathrm{Bin}(n, p)$, where $p$ is the probability of "success", for large $n$ as:
$$P\!\left(a \le \frac{X - np}{\sqrt{np(1-p)}} \le b\right) \approx \Phi(b) - \Phi(a)$$
for any numbers $a$ and $b$. Note that this approximation is even a bit more "flexible" than the Poisson approximation of the binomial distribution in that it can provide a decent approximation for relatively small $n$ (we will consider an example shortly for $n = 15$). It should be noted that we can improve the accuracy of the approximation of the discrete distribution by making the following continuity correction:
$$P(a \le X \le b) \approx \Phi\!\left(\frac{b + \tfrac{1}{2} - np}{\sqrt{np(1-p)}}\right) - \Phi\!\left(\frac{a - \tfrac{1}{2} - np}{\sqrt{np(1-p)}}\right).$$
This will become clear with the consideration of an example.
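As a preview, the sketch below (with illustrative values of n and p, not those of the examples that follow) compares an exact binomial tail probability with the Normal approximation, both with and without the continuity correction:

import numpy as np
from scipy.stats import binom, norm

n, p = 40, 0.3                                  # illustrative parameters
k = 15                                          # threshold of interest
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

exact = binom.sf(k - 1, n, p)                   # exact P(X >= k)
approx = norm.sf((k - mu) / sigma)              # Normal approximation, no correction
approx_cc = norm.sf((k - 0.5 - mu) / sigma)     # Normal approximation with continuity correction

print(exact, approx, approx_cc)                 # the corrected value is closer to the exact one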
Example: Normal Approximation to Binomial Distribution of Ebola Virus
Recall our example of modelling the number of survivors of the Ebola Virus as a binomial random variable. If 15 people are infected, we had computed the exact binomial probability of more than ten survivors. Let's consider a Normal approximation to this distribution.
Solution:
First we need to standardise the values for $X$ based on $\mu = np$ and $\sigma = \sqrt{np(1-p)}$:
Now we integrate the standard normal curve as follows:
Figure 18: Normal approximation to the binomial distribution for the number of survivors of the Ebola Virus. To evaluate the probability of 10 or more survivors using the Normal approximation, we can integrate the Normal curve for $x \ge 10$ (red line), or, using the continuity correction, for $x \ge 9.5$ (blue line), which turns out to be a much better approximation of the binomial probability.
Numerical evaluation or table lookup of this integral gives a value that substantially underestimates the exact binomial probability. However, if we apply the continuity correction:
$$P(X \ge 10) \approx P\!\left(Z \ge \frac{9.5 - np}{\sqrt{np(1-p)}}\right),$$
we get a much more accurate estimate. The quality of the fit of the Normal approximation to the binomial distribution is shown in Figure 18.
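One might verify these numbers with a short script; note that the survival probability p below is only a placeholder, since the value used in the original Ebola example is not restated in this section:

import numpy as np
from scipy.stats import binom, norm

n = 15
p = 0.7                                         # hypothetical survival probability, for illustration only
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

exact = binom.sf(9, n, p)                       # exact P(X >= 10)
approx = norm.sf((10 - mu) / sigma)             # Normal approximation, no correction
approx_cc = norm.sf((9.5 - mu) / sigma)         # with continuity correction

print(exact, approx, approx_cc)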
Example: Normal Approximation of Binomial: ESP Research
This next example of the Normal approximation to the Binomial distribution follows from a more serious treatment by William Feller (Feller, W., "Statistical Aspects of ESP.", Journal of Parapsychology, 4: 271-298, 1940).
In 1938, to test whether extrasensory perception (ESP) exists, researchers Pratt and Woodruff at Duke University conducted a study using 32 students. For this purpose, the investigator and the student would sit at opposite ends of a table. The investigator would shuffle a deck of cards containing 5 'ESP symbols' (see Figure 19), select a card, and concentrate on it. The student would then try to guess the identity of the card (i.e. guess which of the 5 symbols the investigator was concentrating on). For a comical rendition of this experiment, see the opening scene of the film Ghostbusters (1984).
In the Pratt and Woodruff experiment, the 32 students made a total of 60,000 guesses, which were found to be correct 12,489 times (i.e. a success rate of $12{,}489/60{,}000 \approx 0.208$). Since there were only 5 ESP symbols, we could estimate $p = 1/5 = 0.2$, and assuming this to be a series of independent Bernoulli trials (i.e. a binomial distribution), our expected number of successful guesses would be $np = 60{,}000 \times 0.2 = 12{,}000$. Can we conclude from this study that the additional 489 correct outcomes prove that ESP exists?
Solution:
We can compute the probability of observing 12,489 or more guesses to be correct using the binomial distribution as follows:
$$P(X \ge 12{,}489) = \sum_{k=12{,}489}^{60{,}000} \binom{60{,}000}{k} (0.2)^k (0.8)^{60{,}000 - k}.$$
Clearly the binomial coefficients in this problem cannot be computed easily! Thus, we shall make use of the Normal approximation to the binomial distribution, which should be very accurate in this case given that $n$ is very large. Using the continuity correction, we have:
$$P(X \ge 12{,}489) \approx P\!\left(Z \ge \frac{12{,}488.5 - 12{,}000}{\sqrt{60{,}000 \times 0.2 \times 0.8}}\right) = P(Z \ge 4.99) \approx 3 \times 10^{-7}.$$
Interestingly, the probability of observing this many guesses (or more) to be correct is so low that it suggests this could not have been due to chance! However, it should be noted that this particular ESP experiment has been met with a great deal of skepticism in the scientific community.
Figure 19: ESP cards (also known as 'Zener Cards') used in the ESP experiments.
Comments: Even with the continuity correction, the Normal approximation of the binomial can be poor if $n$ is too small. General conditions for applying the approximation are that $np$ and $n(1-p)$ are both sufficiently large (a common rule of thumb is $np \ge 10$ and $n(1-p) \ge 10$).
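The ESP calculation itself is easy to reproduce; a minimal sketch (SciPy evaluates the exact binomial tail numerically rather than by summing enormous binomial coefficients):

import numpy as np
from scipy.stats import binom, norm

n, p = 60000, 0.2
mu, sigma = n * p, np.sqrt(n * p * (1 - p))     # 12,000 and roughly 98

exact = binom.sf(12488, n, p)                   # exact P(X >= 12,489)
approx_cc = norm.sf((12488.5 - mu) / sigma)     # Normal approximation with continuity correction

print(exact, approx_cc)                         # both are on the order of 10^-7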
Central Limit Theorem
Interestingly, every binomial random variable $X \sim \mathrm{Bin}(n, p)$ can be expressed as the sum of $n$ independent Bernoulli random variables $X_1, X_2, \ldots, X_n$, where $X_i = 1$ with probability $p$ and $X_i = 0$ with probability $1 - p$ (think for a moment why this is so; we actually used this property to derive the expectation of a binomial random variable given the result for the Bernoulli distribution). That is:
$$X = \sum_{i=1}^{n} X_i.$$
Therefore the Normal approximation to the binomial distribution can also be written as:
$$P\!\left(a \le \frac{\sum_{i=1}^{n} X_i - np}{\sqrt{np(1-p)}} \le b\right) \approx \Phi(b) - \Phi(a).$$
This raises the question of whether this limit applies to sums of other types of random variables. Astonishingly, it does! Indeed, significant efforts were made over the next two centuries to extend this result to other distributions. In 1920, the generalised result was given its name by G. Pólya: the Central Limit Theorem.
Suppose that the random variables $X_1, X_2, \ldots, X_n$ are independent and from the same distribution, with mean $\mu$ and variance $\sigma^2$ (i.e. $\mu$ and $\sigma^2$ are finite). We shall refer to this collection of variables as a random sample, but defer further discussion until the Statistics portion of the course. The Central Limit Theorem (CLT) states for any numbers $a$ and $b$ that:
$$P\!\left(a \le \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \le b\right) \longrightarrow \Phi(b) - \Phi(a) \quad \text{as } n \to \infty.$$
As the sample size $n$ increases, the distribution of the sum converges to the Normal, regardless of the original distribution of the $X_i$.
In the above derivation, we have made use of the simple fact that:
$$E\!\left[\sum_{i=1}^{n} X_i\right] = n\mu, \qquad \mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = n\sigma^2,$$
where the variance result relies on the independence of the $X_i$.
You will commonly encounter the CLT presented in terms of the average of the $X_i$ (denoted as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$) rather than the sum. Again, we can simply compute the expectation and variance of the average as:
$$E[\bar{X}] = \mu, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
Plugging these expressions for $E[\bar{X}]$ and $\mathrm{Var}(\bar{X})$ into the CLT results in the following form for the Central Limit Theorem for the average $\bar{X}$:
$$P\!\left(a \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le b\right) \longrightarrow \Phi(b) - \Phi(a) \quad \text{as } n \to \infty.$$
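Lastly, a brief simulation sketch illustrates the CLT for the average: we draw many samples from a decidedly non-Normal distribution (an Exponential with mean 1, chosen purely for illustration), standardise the sample averages, and compare their histogram to the standard Normal PDF:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50                    # sample size for each average
reps = 100000             # number of simulated averages

# Each row is a random sample of size n from an Exponential(1) distribution (mu = 1, sigma = 1)
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)

# Standardise the averages: (xbar - mu) / (sigma / sqrt(n))
z = (xbar - 1.0) / (1.0 / np.sqrt(n))

# The histogram of standardised averages closely follows the standard Normal PDF
plt.hist(z, bins=80, density=True, alpha=0.5)
grid = np.linspace(-4, 4, 400)
plt.plot(grid, norm.pdf(grid))
plt.show()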