
Common Continuous Distributions

We will now analyse three important probability distributions for continuous random variables, particularly those arising in engineering and statistics: the Uniform, Normal, and Exponential distributions.

Uniform Distribution

Let's start off with a really simple example: the uniform distribution. A uniform distribution just means there's a flat, constant probability of a value occurring anywhere within a given range. We can create a uniform distribution by using the NumPy random.uniform function.
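For instance, a minimal sketch (the range of -10 to 10 and the sample size here are just illustrative choices) might look like this:

import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples uniformly from the illustrative range [-10, 10)
values = np.random.uniform(-10.0, 10.0, 100000)

# The histogram of the samples is roughly flat across the range
plt.hist(values, 50)
plt.show()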

Uniform Distribution

There's pretty much an equal chance of any given value, or range of values, occurring within that data. So, unlike the normal distribution, where we see a concentration of values near the mean, a uniform distribution has equal probability across the whole range that you define.

So what would the probability density function of this look like? Well, I'd expect to see basically nothing outside the range of -10 to 10. But between -10 and 10, I would see a flat line, because there is a constant probability for any value in that range. So for a uniform distribution the probability density function is a flat line: every value, and every equal-width range of values, has the same chance of occurring as any other.
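To see this flat line directly, one option is to plot the PDF with scipy.stats.uniform (again using the illustrative range of -10 to 10):

from scipy.stats import uniform
import numpy as np
import matplotlib.pyplot as plt

# uniform(loc, scale) is supported on [loc, loc + scale]; here that is [-10, 10]
x = np.arange(-15, 15, 0.001)
plt.plot(x, uniform.pdf(x, loc=-10, scale=20))
plt.show()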

Normal (Gaussian) Distribution

To visualise this in Python:

from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# Plot the standard Normal PDF over a fine grid of x values
x = np.arange(-3, 3, 0.001)
plt.plot(x, norm.pdf(x))
plt.show()

Which results in:

Standard Normal PDF plotted in Python

We continue our discussion of continuous distributions by considering perhaps the most important probability distribution of them all: the Normal distribution. We will begin with a general definition for the Normal distribution, which we shall see has two parameters \mu and \sigma^{2} that exactly define E[X] and \operatorname{Var}[X], respectively.

The random variable X follows a Normal distribution with parameters \mu and \sigma^{2} (where \sigma>0), denoted by

X \sim N\left(\mu, \sigma^{2}\right)

if its density function is given by

f_{X}(x \mid \mu, \sigma)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-(x-\mu)^{2} / 2 \sigma^{2}}

The parameters \mu and \sigma^{2} are the mean (E[X]) and variance (\operatorname{Var}[X]) of X, respectively.

At first glance this expression appears rather complex, but we will now introduce an important transformation that will greatly simplify our calculations.

Standardising and the Z-transformation

We start by considering the following important property of the Normal distribution:

If X \sim N\left(\mu, \sigma^{2}\right), then for any constants a and b (b \neq 0), the random variable Y=a+b X is also normally distributed.

Using our properties from Chapter 2, the expressions for the expectation and variance of Y are easily computed as:

\begin{aligned} E[Y] & =E[a+b X]=a+b E[X]=a+b \mu \\ \operatorname{Var}[Y] & =\operatorname{Var}[a+b X]=b^{2} \operatorname{Var}[X]=b^{2} \sigma^{2} \end{aligned}

From which we deduce that Y \sim N\left(a+b \mu, b^{2} \sigma^{2}\right) (a formal derivation of this is omitted to save space but can be found in almost any Sheldon Ross book on probability).

Recall that the expression Y=a+b X holds for any values of a and b \neq 0! What we would like to do is find the values for a and b such that E[Y]=0 and \operatorname{Var}[Y]=1; we shall refer to such a distribution, N(0,1), as the standard Normal. Solving for a and b we find:

\begin{aligned} & E[Y]=a+b \mu=0 \quad \Rightarrow \quad a=-b \mu=-\frac{\mu}{\sigma} \\ & \operatorname{Var}[Y]=b^{2} \sigma^{2}=1 \quad \Rightarrow \quad b=\frac{1}{\sigma} \end{aligned}

Plugging in a=-\mu / \sigma and b=1 / \sigma into our expression Y=a+b X, we find that Y=\frac{X-\mu}{\sigma}; we refer to this as the Z-transformation: any Normal random variable X \sim N\left(\mu, \sigma^{2}\right) can be equivalently expressed as Z \sim N(0,1) using the Z-transformation:

\begin{gathered} Z \equiv \frac{X-\mu}{\sigma} \\ Z \sim N(0,1): \quad E[Z]=0, \quad \operatorname{Var}[Z]=1 \end{gathered}

This simplifies the Normal PDF f_{X}(x \mid \mu, \sigma) to the standard Normal PDF:

f_{Z}(z)=\frac{1}{\sqrt{2 \pi}} e^{-z^{2} / 2}

Figure 17: Standard Normal PDF f_{Z}(z)=\frac{1}{\sqrt{2 \pi}} e^{-z^{2} / 2} with \operatorname{Var}[Z]=1 (and also a standard deviation of 1, since the standard deviation is \sigma=\sqrt{\operatorname{Var}[Z]}) and E[Z]=0. To compute probabilities from this PDF, we will use lookup tables to find the value of the CDF F_{Z}(z)=\int_{-\infty}^{z} f_{Z}(u) d u for a given value z, as illustrated for z=-2.

In other words, to standardise a Normal random variable, we subtract its mean and then divide by the square root of its variance (which, recall, we defined as the standard deviation). Notice that the parameters (\mu, \sigma) have been removed in the standard Normal PDF (i.e. f_{Z}(z) is not conditioned on any model parameters)! We shall see that this useful property allows us to map any Normal distribution onto the standard Normal, which we only have to compute once and store in a lookup table.
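As a quick numerical sanity check of this mapping (the values of \mu, \sigma and x below are arbitrary), the general Normal density at x should match the standard Normal density at z=(x-\mu)/\sigma scaled by 1/\sigma:

from scipy.stats import norm

mu, sigma = 3.0, 2.0          # illustrative parameters, not from the notes
x = 4.2
z = (x - mu) / sigma          # Z-transformation

# Change of variables: f_X(x | mu, sigma) = (1/sigma) * f_Z(z)
print(norm.pdf(x, loc=mu, scale=sigma))   # density of N(mu, sigma^2) at x
print(norm.pdf(z) / sigma)                # same value via the standard Normal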

Expectation and Variance:

To show that E[X]=\mu, we integrate using the Z-transformation z=(x-\mu) / \sigma, so that d x=\sigma d z. The result is:

\begin{aligned} E[X] & =\int_{-\infty}^{\infty} x f_{X}(x) d x=\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-(x-\mu)^{2} / 2 \sigma^{2}} d x=\int_{-\infty}^{\infty}(\sigma z+\mu) \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-z^{2} / 2} \sigma d z \\ & =\frac{\sigma}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} z e^{-z^{2} / 2} d z+\mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi}} e^{-z^{2} / 2} d z=\mu \end{aligned}

The first integral vanishes because z e^{-z^{2} / 2} is an odd function, and the final integral evaluates to 1 by definition because we are integrating the standard Normal PDF f_{Z}(z) over its entire range.

Next we can evaluate E\left[X^{2}\right] to show that \operatorname{Var}[X]=\sigma^{2}, which requires integration by parts:

\int u d v=u v-\int v d u

We leave it to the reader to prove this.
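Rather than carrying out the algebra here, a quick numerical check of this result (with arbitrary illustrative parameters) can be done in scipy:

from scipy.stats import norm
from scipy.integrate import quad
import numpy as np

mu, sigma = 1.5, 2.0   # illustrative parameters

# E[X^2] by numerical integration of x^2 * f_X(x)
ex2, _ = quad(lambda x: x**2 * norm.pdf(x, mu, sigma), -np.inf, np.inf)

print(ex2 - mu**2)     # Var[X] = E[X^2] - (E[X])^2, close to sigma**2 = 4.0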

Computing Probabilities using the Normal CDF

The CDF of the Normal has no explicit closed form. However, as we have just noted above, we can compute probabilities involving Normally distributed random variables by standardising them (i.e. using the Z-transformation). That is:

F_{X}(x)=P(X \leq x)=P\left(\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right)=P\left(Z \leq \frac{x-\mu}{\sigma}\right)=F_{Z}\left(\frac{x-\mu}{\sigma}\right)

Where the standard Normal CDF F_{Z}(z)=P(Z \leq z) is defined as:

F_{Z}(z)=\int_{-\infty}^{z} f_{Z}(u) d u=\int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{-u^{2} / 2} d u

Thus, using a precomputed table of values of the standard Normal CDF F_{Z}(z) (sometimes also denoted as \Phi(z) in other textbooks), we can obtain the CDF for the general Normal of interest. Indeed, every probability textbook presents these tables in its appendices (a table is currently posted on BBl).

Lastly, this procedure can be generalised for computing the probability over an interval [a, b] for any X \sim N\left(\mu, \sigma^{2}\right):

\begin{aligned} P(a \leq X \leq b) & =P\left(\frac{a-\mu}{\sigma} \leq \frac{X-\mu}{\sigma} \leq \frac{b-\mu}{\sigma}\right) \\ & =P\left(\frac{a-\mu}{\sigma} \leq Z \leq \frac{b-\mu}{\sigma}\right) \\ & =P\left(Z \leq \frac{b-\mu}{\sigma}\right)-P\left(Z \leq \frac{a-\mu}{\sigma}\right) \\ & =F_{Z}\left(\frac{b-\mu}{\sigma}\right)-F_{Z}\left(\frac{a-\mu}{\sigma}\right) \end{aligned}

Where again the CDF F_{Z}(z) can be obtained as the area under the curve of the PDF f_{Z}(z) from -\infty to z, as illustrated in Figure 17 for z=-2.
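In place of a printed table, scipy's norm.cdf can play the role of F_{Z}; a small sketch with illustrative values of \mu, \sigma, a and b:

from scipy.stats import norm

mu, sigma = 10.0, 2.0     # illustrative parameters
a, b = 9.0, 13.0

# P(a <= X <= b) = F_Z((b - mu)/sigma) - F_Z((a - mu)/sigma)
p = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(p)                                    # ~0.6247

# Equivalent direct computation without standardising
print(norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))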

Approximation of the Binomial Distribution

You might have already noticed by now that, under certain conditions, some of the previous distributions we have considered have a symmetric, bell shape to them (e.g. see Figures 13 and 15 for the binomial and Poisson, respectively). Indeed, the standard Normal (N(0,1)) was proposed in 1733 by De Moivre to approximate the binomial distribution for p=0.5 (and was later generalised for any p by Laplace in 1812). Recall that for the binomial distribution:

\begin{aligned} E[X] & =n p \\ \operatorname{Var}[X] & =n p(1-p) \end{aligned}

Thus, we can standardise X as:

Z=\frac{X-n p}{\sqrt{n p(1-p)}}

We can approximate the distribution of a binomial random variable X, where p is the probability of "success", for large n as:

\lim _{n \rightarrow \infty} P\left[a \leq \frac{X-n p}{\sqrt{n p(1-p)}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} d z

For any numbers a and b. Note that this approximation is even a bit more "flexible" than the Poisson approximation of the binomial distribution, in that it can provide a decent approximation for relatively small n (we will consider an example shortly for p=0.5 and n=20). It should be noted that we can improve the accuracy of the approximation of the discrete distribution by making the following continuity correction, which widens the interval by half a unit at each end on the original counting scale before standardising:

P[a \leq X \leq b] \approx \frac{1}{\sqrt{2 \pi}} \int_{(a-0.5-n p) / \sqrt{n p(1-p)}}^{(b+0.5-n p) / \sqrt{n p(1-p)}} e^{-z^{2} / 2} d z

This will become clear with the consideration of an example.

Example: Normal Approximation to Binomial Distribution of Ebola Virus

Recall our example of modelling the number of survivors of the Ebola Virus as X \sim \operatorname{Binomial}(n=15, p=0.5). If 15 people are infected, we had computed the probability of ten or more survivors to be P(X \geq 10)=0.150879. Let's consider a Normal approximation to this distribution.

Solution:

First we need to standardise the values for X based on E[X]=n p=7.5 and \operatorname{Var}[X]=n p(1-p)=3.75:

X^{\prime}=\frac{10-7.5}{\sqrt{3.75}}=1.29099

Now we integrate the standard Normal curve as follows:

P\left[1.29099 \leq \frac{X-7.5}{\sqrt{3.75}}\right]=\frac{1}{\sqrt{2 \pi}} \int_{1.29099}^{\infty} e^{-z^{2} / 2} d z

Figure 18: Normal approximation to the binomial distribution for the number of survivors of the Ebola Virus. To evaluate the probability of 10 or more survivors using the Normal approximation, we can integrate the Normal curve for X \geq 10 (red line), or, using the continuity correction, for X \geq 9.5 (blue line), which turns out to be a much better approximation of the binomial probability.

Numerical evaluation or table lookup of this integral gives P(X \geq 10)=0.0983529, which is a substantial underestimate of the binomial probability. However, if we apply the continuity correction:

\begin{gathered} X^{\prime}=\frac{9.5-7.5}{\sqrt{3.75}}=1.0328 \\ P\left[1.0328 \leq \frac{X-7.5}{\sqrt{3.75}}\right]=\frac{1}{\sqrt{2 \pi}} \int_{1.0328}^{\infty} e^{-z^{2} / 2} d z \end{gathered}

we get a much more accurate estimate: P(X \geq 9.5)=0.15085. The quality of the fit of the Normal approximation to the binomial distribution is shown in Figure 18.
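These numbers are straightforward to reproduce with scipy (a small sketch; binom.sf(9, ...) gives the exact tail probability P(X \geq 10)):

from scipy.stats import binom, norm
import numpy as np

n, p = 15, 0.5
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

print(binom.sf(9, n, p))              # exact P(X >= 10)                      ~= 0.1509
print(norm.sf((10 - mu) / sigma))     # Normal approximation, no correction   ~= 0.0984
print(norm.sf((9.5 - mu) / sigma))    # with continuity correction            ~= 0.1508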

Example: Normal Approximation of Binomial: ESP Research

This next example of the Normal approximation to the Binomial distribution follows from a more serious treatment by William Feller (Feller, W., "Statistical Aspects of ESP.", Journal of Parapsychology, 4: 271-298, 1940).

In 1938, to test whether extrasensory perception (ESP) exists, researchers Pratt and Woodruff at Duke University conducted a study using 32 students. For this purpose, the investigator and the student sit at opposite ends of a table. The investigator would shuffle a deck of cards containing 5 'ESP symbols' (see Figure 19), select a card and concentrate on it. The student would then try to guess the identity of the card (i.e. guess which of the 5 symbols the investigator was concentrating on). For a comical rendition of this experiment, see the opening scene of the film Ghostbusters (1984).

In the Pratt and Woodruff experiment, the 32 students made a total of 60,000 guesses, which were found to be correct 12,489 times (i.e. x=12,489). Since there were only 5 ESP symbols, we could estimate p=\frac{1}{5}, and assuming this to be a series of independent Bernoulli trials (i.e. a binomial distribution), our expected number of successful guesses would be E[X]=n p=60,000 \cdot \frac{1}{5}=12,000. Can we conclude from this study that the additional 489 correct guesses prove that ESP exists?

Solution:

We can compute the probability of observing 12,489 or more correct guesses using the binomial distribution as follows:

P(X \geq 12,489)=\sum_{x=12,489}^{60,000}\left(\begin{array}{c} 60,000 \\ x \end{array}\right)\left(\frac{1}{5}\right)^{x}\left(\frac{4}{5}\right)^{60,000-x}

Clearly the binomial coefficient in this problem cannot be computed easily! Thus, we shall make use of the Normal approximation to the binomial distribution, which should be very accurate in this case given that n=60,000. Using the continuity correction, we have:

\begin{gathered} X^{\prime}=\frac{12,488.5-12,000}{\sqrt{12,000 \cdot 4 / 5}}=4.99 \\ P\left[4.99 \leq \frac{X-12,000}{\sqrt{12,000 \cdot 4 / 5}}\right]=\frac{1}{\sqrt{2 \pi}} \int_{4.99}^{\infty} e^{-z^{2} / 2} d z=0.0000003=3.0 \times 10^{-7} \end{gathered}

Interestingly, the probability of observing this many guesses (or more) to be correct is so low that it suggests this could not have been due to chance! However, it should be noted that this particular ESP experiment has been met with a great deal of skepticism in the scientific community.
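As a sketch, the exact binomial tail probability is also feasible numerically with scipy (even though the hand calculation is not), and it agrees with the Normal approximation to within the same order of magnitude:

from scipy.stats import binom, norm
import numpy as np

n, p = 60000, 1 / 5
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

print(binom.sf(12488, n, p))              # exact P(X >= 12489)
print(norm.sf((12488.5 - mu) / sigma))    # Normal approximation with continuity correction, ~3e-7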

Figure 19: ESP cards (also known as 'Zener Cards') used in the ESP experiments.

Comments: Even with the continuity correction, the Normal approximation of the binomial can be poor if n is too small. General conditions for applying the approximation are n>9 \frac{p}{1-p} and n>9 \frac{1-p}{p}.

Central Limit Theorem

Interestingly, every binomial random variable X can be expressed as the sum of n independent Bernoulli random variables X_{1}, X_{2}, \ldots, X_{n}, where X_{i}=1 with probability p and X_{i}=0 with probability 1-p (think for a moment why this is so; we actually used this property to derive the expectation of a binomial random variable given the result for the Bernoulli distribution). That is:

X=\sum_{i=1}^{n} X_{i}

Therefore the Normal approximation to the binomial distribution can also be written as:

\lim _{n \rightarrow \infty} P\left[a \leq \frac{\left(X_{1}+X_{2}+\ldots+X_{n}\right)-n p}{\sqrt{n p(1-p)}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} d z

This raises the question: does this limit apply to the sums of other types of random variables? Astonishingly, it does! Indeed, significant efforts were made over the next two centuries to extend this result to other distributions. In 1920, this was generalised by G. Pólya as the Central Limit Theorem:

Suppose that the random variables Y_{1}, Y_{2}, \ldots, Y_{n} are independent and from the same distribution, with mean E[Y]=\mu and variance \operatorname{Var}[Y]=\sigma^{2}<\infty (i.e. E[Y] and \operatorname{Var}[Y] are finite). We shall refer to this collection of variables as a random sample, but defer further discussion until the Statistics portion of the course. The Central Limit Theorem (CLT) states, for any numbers a and b, that:

\begin{aligned} \lim _{n \rightarrow \infty} P\left[a \leq \frac{\left(Y_{1}+Y_{2}+\ldots+Y_{n}\right)-E\left[\sum_{i=1}^{n} Y_{i}\right]}{\sqrt{\operatorname{Var}\left[\sum_{i=1}^{n} Y_{i}\right]}} \leq b\right] & =\lim _{n \rightarrow \infty} P\left[a \leq \frac{\left(Y_{1}+Y_{2}+\ldots+Y_{n}\right)-n \mu}{\sqrt{n} \sigma} \leq b\right] \\ & =\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} d z \end{aligned}

As the sample size increases, the distribution of the sum converges to the Normal, regardless of the original distribution of the Y_{i}.

In the above derivation, we have made use of the simple fact that:

\begin{aligned} E\left[\sum_{i=1}^{n} Y_{i}\right] & =\sum_{i=1}^{n} \mu=n \mu \\ \operatorname{Var}\left[\sum_{i=1}^{n} Y_{i}\right] & =\sum_{i=1}^{n} \sigma^{2}=n \sigma^{2} \end{aligned}

You will commonly encounter the CLT presented in terms of the average of Y_{1}, Y_{2}, \ldots, Y_{n} (denoted as \bar{Y}) rather than the sum. Again, we can simply compute the expectation and variance of the average as:

\begin{aligned} E[\bar{Y}] & =E\left[\frac{1}{n} \sum_{i=1}^{n} Y_{i}\right]=\frac{1}{n}\left(\sum_{i=1}^{n} \mu\right)=\frac{1}{n} n \mu=\mu \\ \operatorname{Var}[\bar{Y}] & =\operatorname{Var}\left[\frac{1}{n} \sum_{i=1}^{n} Y_{i}\right]=\frac{1}{n^{2}}\left(\sum_{i=1}^{n} \sigma^{2}\right)=\frac{1}{n^{2}} n \sigma^{2}=\frac{\sigma^{2}}{n} \end{aligned}

Plugging in these expressions for E[\bar{Y}] and \operatorname{Var}[\bar{Y}] results in the following form of the Central Limit Theorem for the average \bar{Y}:

\lim _{n \rightarrow \infty} P\left[a \leq \frac{\bar{Y}-\mu}{\sigma / \sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} d z

The above representation of the Central Limit Theorem is extremely important and will be used throughout the Statistics section of the course. It is equivalent to stating that, as the sample size n becomes large, the average of any random sample of random variables \left(Y_{1}, Y_{2}, \ldots, Y_{n}\right) becomes Normally distributed:

\bar{Y} \rightarrow N\left(\mu, \sigma^{2} / n\right)

Comments: Although the Central Limit Theorem has been specified in the limit of large n, it is surprisingly accurate even for small values of n. Generally speaking, if the sample comes from a symmetric PDF then the approximation will converge quickly. Having said this, recall the shape of the exponential PDF, f_{W}(w \mid \hat{\lambda})=\hat{\lambda} e^{-\hat{\lambda} w}, as presented in Figure 16, which is clearly not symmetric. Nonetheless, the CLT still applies to this distribution, albeit for much greater values of n (we will see some examples of this in lecture).
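A short simulation sketch makes this convergence visible (the rate, sample size, and number of repetitions below are illustrative): we average n exponential waiting times many times over and overlay the Normal density that the CLT predicts:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
lam, n, reps = 0.1, 50, 10000      # illustrative rate, sample size and number of repetitions

# Each row is a sample of n exponential waiting times (mean 1/lam, variance 1/lam^2)
means = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)

# Compare the histogram of sample means with the CLT prediction N(mu, sigma^2 / n)
plt.hist(means, bins=50, density=True)
x = np.linspace(means.min(), means.max(), 300)
plt.plot(x, norm.pdf(x, loc=1 / lam, scale=(1 / lam) / np.sqrt(n)))
plt.show()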

Exponential Distribution

An important distribution that has applications in modelling aspects of process safety (discussed in the 'Reliability' section of this course) is the Exponential distribution, which has a close relationship with the discrete Poisson distribution discussed earlier. In Python, we can use scipy.stats.expon to work with an exponential distribution.

from scipy.stats import expon
import matplotlib.pyplot as plt
import numpy as np

# Plot the exponential PDF (with the default rate of 1) over a grid of x values
x = np.arange(0, 10, 0.001)
plt.plot(x, expon.pdf(x))
plt.show()

Which will result in:

Exponential PDF plotted in Python

Let us consider again the Poisson Process for modelling bus arrivals (see Figure 14). While the Poisson distribution is used to compute the probability of a specific number of arrivals during a particular period T, we might also be interested in computing the "waiting time" between each bus, as shown by t_{1} and t_{k} in Figure 14. It is intuitive that the observations of waiting times will take on a range of continuous values, and thus we will need to model them using a continuous random variable and probability density function (PDF). However, we shall see that the resulting PDF utilises the same parameter, \hat{\lambda}, as the Poisson Process!

Suppose that we are interested in events that occur independently over time at a known average rate. Let \hat{\lambda} be the rate per unit time; then the waiting time between occurrences, W, follows an exponential distribution. We denote this by

W \sim \operatorname{Exponential}(\hat{\lambda})

In other words: If the number of arrivals in a time interval is Poisson distributed, then the time between arrivals is exponentially distributed.

Derivation of Exponential PDF from Poisson PMF:

To derive the exponential PDF in relation to the Poisson PMF, let's say we are interested in computing the probability that our waiting time is greater than some time T. We will refer to the continuous random variable for the waiting time as W, so that we do not confuse it with the discrete random variable X in the Poisson PMF. Thus, we seek the probability P(W>T). The waiting time satisfies W>T if and only if x=0 in our Poisson PMF f_{X}(x \mid \hat{\lambda}, T) for a given parameter T (i.e. no buses have arrived in the interval T), which leads to:

P(W>T)=f_{X}(x=0 \mid \hat{\lambda}, T)=\frac{e^{-\hat{\lambda} T}(\hat{\lambda} T)^{0}}{0 !}=e^{-\hat{\lambda} T}

We can also generically state for any random variable W that:

\begin{gathered} P(W \leq T)+P(W>T)=1 \\ F_{W}(T)=P(W \leq T) \end{gathered}

for some CDF F_{W}(w=T). If we combine the previous three equations, we can explicitly solve for F_{W}(w=T) as:

F_{W}(T)=1-P(W>T)=1-e^{-\hat{\lambda} T}

What this tells us is that F_{W}(w)=1-e^{-\hat{\lambda} w}. Recalling the relationship between a PDF and CDF \left(f_{W}(w)=\frac{d}{d w} F_{W}(w)\right), we can now define the PDF of the exponential distribution, where W is our continuous random variable for waiting times:

f_{W}(w \mid \hat{\lambda})= \begin{cases}\hat{\lambda} e^{-\hat{\lambda} w} & \text { if } w>0 \\ 0 & \text { otherwise }\end{cases}

Analogously, we could have worked from a definition of f_{W}(w) to derive the CDF. For a value u>0,

P(W \leq u)=\int_{-\infty}^{u} f_{W}(w \mid \hat{\lambda}) d w=\int_{0}^{u} \hat{\lambda} e^{-\hat{\lambda} w} d w=\left[-e^{-\hat{\lambda} w}\right]_{0}^{u}=1-e^{-\hat{\lambda} u}

so the complete form of the CDF is

F_{W}(w \mid \hat{\lambda})= \begin{cases}1-e^{-\hat{\lambda} w} & \text { if } w>0 \\ 0 & \text { otherwise }\end{cases}

One of the greatest uses of the exponential CDF, as we have just highlighted, is in computing the probability that the waiting time will exceed some value T:

P(W>T)=1-F_{W}(w=T \mid \hat{\lambda})=e^{-\hat{\lambda} T}

Let's highlight some interesting properties of the exponential PDF f_{W}(w \mid \hat{\lambda}) using the example of London buses.

Example: Exponential Distribution of Waiting Times for London Buses

Recall the '49 Bus', which is scheduled to arrive every ten minutes \left(\hat{\lambda}=\frac{1}{10}\right), and let's see how we can compute some queries relating to the expected waiting time W.

(a) What is the probability that you will have to wait for more than five minutes for a bus to arrive?

Solution: We recognise the query P(W>T), which we found in the derivation above to be the Poisson distribution evaluated at f_{X}(x=0 \mid \hat{\lambda}, T) (i.e. no arrivals in the interval T) and also 1-F_{W}(w=T \mid \hat{\lambda}):

\begin{aligned} P(W>T) & =1-F_{W}(w=T \mid \hat{\lambda}) \\ & =e^{-\hat{\lambda} T}=e^{-0.1 \times 5}=0.606531 \end{aligned}
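We can reproduce this with scipy (note that scipy.stats.expon is parameterised by scale = 1/\hat{\lambda}), and also confirm that it matches the Poisson probability of zero arrivals used in the derivation above:

from scipy.stats import expon, poisson

lam, T = 0.1, 5.0

# P(W > T) from the exponential CDF (expon uses scale = 1/lambda)
print(expon.sf(T, scale=1 / lam))   # 0.606531

# Same probability from the Poisson PMF: no arrivals in the interval T
print(poisson.pmf(0, lam * T))      # 0.606531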

(b) What is the probability that you will have to wait an additional ten minutes, given that you have already waited for five minutes and no bus has arrived?

Solution:

Using the definition of our waiting time, we seek to compute the conditional query P(W>15 \mid W>5). As this is a conditional probability, it can be expressed as:

P\left(W>T_{2} \mid W>T_{1}\right)=\frac{P\left(W>T_{2}, W>T_{1}\right)}{P\left(W>T_{1}\right)}=\frac{P\left(W>T_{2}\right)}{P\left(W>T_{1}\right)} \quad \left(\text{ for any } T_{2} \geq T_{1}\right)

Inserting the probabilities in the above expression:

\begin{aligned} P\left(W>T_{2} \mid W>T_{1}\right) & =\frac{P\left(W>T_{2}\right)}{P\left(W>T_{1}\right)} \\ & =\frac{e^{-\hat{\lambda} T_{2}}}{e^{-\hat{\lambda} T_{1}}}=e^{-\hat{\lambda}\left(T_{2}-T_{1}\right)} \end{aligned}

Thus P\left(W>T_{2} \mid W>T_{1}\right)=e^{-\hat{\lambda}\left(T_{2}-T_{1}\right)}, which is equivalent to P\left(W>\left(T_{2}-T_{1}\right)\right):

P\left(W>T_{2} \mid W>T_{1}\right)=P\left(W>\left(T_{2}-T_{1}\right)\right)

In our example, this means that the probability of waiting ten more minutes for the bus, after already having waited five minutes, is the same as the probability of waiting ten minutes in the first place! Figure 16 illustrates how P(W>10) is computed from the PDF f_{W}(w \mid \hat{\lambda}):

P(W>15 \mid W>5)=P(W>10)=0.367879
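A quick numerical check of this equality:

from scipy.stats import expon

scale = 10.0   # 1/lambda for lambda = 0.1

# P(W > 15 | W > 5) = P(W > 15) / P(W > 5), which equals P(W > 10)
print(expon.sf(15, scale=scale) / expon.sf(5, scale=scale))   # 0.367879
print(expon.sf(10, scale=scale))                              # 0.367879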

Such processes, which are independent of the past, are called memoryless processes. Indeed, the exponential distribution is the only continuous distribution that has this memoryless property.

Let's take one last look at the PDF of waiting times between occurrences, f_{W}(w \mid \hat{\lambda}), for the Poisson model of bus arrivals, as shown in Figure 16. It is interesting to note that the exponential distribution implies that the most common separations between Poisson events are the shortest ones. For example, in the bus arrival model, we do not expect the buses to arrive at equally spaced time intervals (as is advertised at the bus stops), but rather in clusters during certain time intervals. This phenomenon bears a strong resemblance to the phrase "things come in threes", where people often refer to good or bad things seemingly happening in clusters of three events.

Expectation and Variance:

Finding the expectation of W requires integration by parts:

\begin{aligned} E[W] & =\int_{-\infty}^{\infty} w f_{W}(w \mid \hat{\lambda}) d w=\int_{0}^{\infty} w \hat{\lambda} e^{-\hat{\lambda} w} d w=-\int_{0}^{\infty} w\left(e^{-\hat{\lambda} w}\right)^{\prime} d w \\ & =-\left[w e^{-\hat{\lambda} w}\right]_{0}^{\infty}+\int_{0}^{\infty}(w)^{\prime} e^{-\hat{\lambda} w} d w=-0+\int_{0}^{\infty} e^{-\hat{\lambda} w} d w \\ & =\left[-\frac{1}{\hat{\lambda}} e^{-\hat{\lambda} w}\right]_{0}^{\infty}=-\frac{1}{\hat{\lambda}}\left(0-e^{0}\right)=\frac{1}{\hat{\lambda}} \end{aligned}

To find E\left[W^{2}\right], and hence the variance of W, we need to integrate by parts twice. The end result is

\operatorname{Var}[W]=\frac{1}{\hat{\lambda}^{2}}
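Both results can be checked against scipy's built-in moments (using the same illustrative rate \hat{\lambda}=0.1 as in the bus example):

from scipy.stats import expon

lam = 0.1                          # illustrative rate

print(expon.mean(scale=1 / lam))   # 10.0  = 1/lambda
print(expon.var(scale=1 / lam))    # 100.0 = 1/lambda^2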

Figure 16: Exponential PDF f_{W}(w \mid \hat{\lambda}) for the times between bus arrivals, W, given \hat{\lambda}=0.1 arrivals/minute. Also shown is the query P(W>10)=1-F_{W}(w=10 \mid \hat{\lambda})=0.367879.