
Common Discrete Distributions

In this section we will review some discrete probability mass functions frequently encountered in science and engineering.

Discrete Uniform

A random variable $X$ follows a discrete uniform distribution if it takes the values $1, 2, \ldots, n$ with equal probability. We denote this by:

$$X \sim DU[1, n]$$

The symbol $\sim$ means 'is distributed as'.

Since the sum of the probabilities must equal 1, the PMF of $X$ is

$$
f_X(x \mid n) = \begin{cases} \frac{1}{n} & \text{if } x = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}
$$

NOTE: Notice the convention where we say that the model parameter $n$ is given: $f_X(x \mid n)$. This will be adopted throughout the chapter to distinguish random variables from model parameters.

Using the formula for the sum of the first $n$ natural numbers, the expectation of $X$ is:

$$
E[X] = \sum_{\forall x} x f_X(x \mid n) = \sum_{x=1}^{n} x \cdot \frac{1}{n} = \frac{1}{n} \sum_{x=1}^{n} x = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}
$$

which is halfway between the minimum and maximum values $X$ can take. Using the formula for the sum of the squares of the first $n$ natural numbers, we have

$$
E[X^2] = \sum_{\forall x} x^2 f_X(x \mid n) = \sum_{x=1}^{n} x^2 \cdot \frac{1}{n} = \frac{1}{n} \sum_{x=1}^{n} x^2 = \frac{1}{n} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}
$$

so the variance of $X$ is:

$$
\operatorname{Var}[X] = E[X^2] - E[X]^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2 - 1}{12}
$$
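As a quick numerical sanity check, here is a minimal sketch using scipy's randint distribution (note that its upper bound is exclusive, so $DU[1, n]$ corresponds to randint(1, n + 1); the choice $n = 10$ is just for illustration):

from scipy.stats import randint

n = 10
X = randint(1, n + 1)  # discrete uniform on {1, 2, ..., n}

print(X.mean())  # (n + 1) / 2 = 5.5
print(X.var())   # (n^2 - 1) / 12 = 8.25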

Bernoulli

A random variable $X$ follows a Bernoulli distribution if it takes the value 1 with probability $p$, and the value 0 with probability $1-p$. We denote this by

$$X \sim \operatorname{Bernoulli}(p)$$

We will generally assign the random variable values as $X=1$ to denote "success" and $X=0$ to denote "failure".

The PMF of $X$ is

$$
f_X(x \mid p) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}
$$

The Bernoulli distribution arises in any experiment where the sample space consists of only two outcomes; an example is flipping a biased coin once ($\Omega = \{\text{heads}, \text{tails}\}$). We call such an experiment a Bernoulli trial, and arbitrarily assign a random variable value to one of the outcomes as a success (e.g. $x=1$) and the other as a failure (e.g. $x=0$).

The PMF can also be conveniently summarised as:

$$f_X(x \mid p) = p^x (1-p)^{1-x}$$

The expectation and variance of $X$ are computed in a straightforward manner as:

$$
\begin{aligned}
E[X] &= \sum_{\forall x} x f_X(x \mid p) = 0 \cdot (1-p) + 1 \cdot p = p \\
E[X^2] &= \sum_{\forall x} x^2 f_X(x \mid p) = 0^2 \cdot (1-p) + 1^2 \cdot p = p \\
\operatorname{Var}[X] &= E[X^2] - E[X]^2 = p - p^2 = p(1-p)
\end{aligned}
$$
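These results can be confirmed with a short sketch using scipy's bernoulli distribution ($p = 0.3$ is an arbitrary illustrative value):

from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)

print(X.mean())  # p = 0.3
print(X.var())   # p * (1 - p) = 0.21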

Binomial

Binomial probabilities apply to situations involving a series of independent and identical trials, where each trial can have only one of two possible outcomes (i.e. Bernoulli trials). If we were flipping a coin, and the probability of it landing heads was $p$ (and therefore the probability of tails $1-p$), then the probability of obtaining the sequence $(H, T, H)$ would be $p \times (1-p) \times p$. In other words, since the coin tosses are independent events, we would multiply the probability of obtaining a head with the probability of the coin flipping tails, and with the probability of the coin flipping heads again.

We can generate a plot of the probability mass function (PMF) for a binomial distribution using the following code:

from scipy.stats import binom
import matplotlib.pyplot as plt
import numpy as np

n = 10
p = 0.5
x = np.arange(0, n + 1)  # the PMF is only defined on the integers 0, 1, ..., n
plt.plot(x, binom.pmf(x, n, p), 'o')
plt.show()

Which results in:

[Figure: PMF of the binomial distribution for n = 10, p = 0.5]

Or, visualised with different values of $p$ and $n$:

[Figure: Probability mass functions for the binomial distribution with different values of $p$ and $n$]

Suppose we were interested in the probability of obtaining two (or any other number of) heads from these throws, in any order. For a particular number of coin throws $n$, the number of sequences containing $k$ heads can be computed using the binomial coefficient, $\binom{n}{k}$ (read "$n$ choose $k$"; the $nCr$ button on your calculator):

$$
\binom{n}{k} = \frac{n!}{k!(n-k)!}, \text{ where } x! = \prod_{j=1}^{x} j, \text{ and } 0! = 1
$$

In particular, for three coin flips ($n=3$), there are three sequences resulting in two heads ($k=2$): $(H, H, T)$, $(H, T, H)$ and $(T, H, H)$.

Since each sequence occurs with a probability of $p^2(1-p)$, and there are three sequences, the total probability of obtaining two heads is therefore $3 \times p^2(1-p)$. Here we have simply counted the total number of sequences containing 2 heads, but we could have equivalently used the binomial coefficient:

$$
\binom{3}{2} = \frac{3!}{2! \, 1!} = \frac{6}{2} = 3
$$
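For larger $n$ these counts quickly become tedious by hand; a quick sketch using Python's built-in math.comb (values chosen to match the examples in this section):

import math

print(math.comb(3, 2))   # 3 sequences with two heads in three flips
print(math.comb(15, 5))  # 3003, as used in the Ebola example below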

This leads us to the definition of the binomial distribution.

Suppose that we have a sequence of $n$ independent Bernoulli trials, each with probability $p$ of "success". The total number of successes, $X$, follows a binomial distribution, which we denote by

$$X \sim \operatorname{Binomial}(n, p)$$

A binomial random variable with parameters $(n, p)$ is equivalent to the sum of $n$ independent Bernoulli random variables with parameter $p$.

The PMF of a binomial random variable $X$ is

$$
f_X(x \mid n, p) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & \text{if } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}
$$

where $\binom{n}{x}$ is the binomial coefficient defined above.

The expectation and variance of a binomial random variable $X$ are:

$$
\begin{aligned}
E[X] &= np \\
\operatorname{Var}[X] &= np(1-p)
\end{aligned}
$$

We will prove these later, when we consider the properties of sums of random variables.
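In the meantime, a minimal numerical check using scipy (parameter values chosen for illustration):

from scipy.stats import binom

n, p = 10, 0.5
mean, var = binom.stats(n, p)  # returns (np, np(1-p))
print(mean, var)  # 5.0 2.5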

An important point to consider is how values for $p$ are estimated. Traditionally, historical data is used for this purpose, but this assumes that the process under investigation is the same as that when the data was collected (i.e. that $p$ is indeed a constant).

Example: Binomial Distribution - Ebola Virus

You may recall the 2013-14 Ebola virus epidemic that resulted in 8,914 cases and 4,447 deaths, according to the World Health Organisation (data collected on 15 October 2014); this corresponds to a 50% mortality rate (although the WHO estimates that the death rate is likely to be closer to 70%). From the given data, if 15 people are known to have contracted this virus, what is the probability that:

(a) Exactly 5 people survive?

Solution:

We can model the fate of each individual that has contracted the virus as a Bernoulli trial with survival probability $p = 0.5$. The number of surviving individuals, $X$, can then be modelled using a binomial distribution with $n = 15$.

$$
P(X=5) = f_X(x=5) = \frac{15!}{5! \, 10!} (0.5)^5 (0.5)^{10} = 3003 \cdot (0.03125) \cdot (0.000976562) = 0.0916443
$$
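We can verify this with scipy:

from scipy.stats import binom

print(binom.pmf(5, 15, 0.5))  # 0.0916442871...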

(b) At least 10 people survive?

Solution:

$$
\begin{aligned}
P(X \geq 10) &= \sum_{x=10}^{15} f_X(x) \\
&= 3003 \cdot (0.0009766) \cdot (0.03125) \\
&\quad + 1365 \cdot (0.0004883) \cdot (0.0625) \\
&\quad + 455 \cdot (0.0002441) \cdot (0.125) \\
&\quad + 105 \cdot (0.0001221) \cdot (0.25) \\
&\quad + 15 \cdot (6.10352 \times 10^{-5}) \cdot (0.5) \\
&\quad + 1 \cdot (3.05176 \times 10^{-5}) \cdot (1) \\
&= 0.150879
\end{aligned}
$$
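Equivalently, using scipy's survival function, noting that $P(X \geq 10) = P(X > 9)$:

from scipy.stats import binom

print(binom.sf(9, 15, 0.5))  # 0.1508789...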

We can also visually inspect the corresponding PMF by plotting it over all $x$, which is shown for mortality rates of 50% (blue) and 70% (red) in Figure 13. From the figure we immediately observe a dramatic decrease in the probability of survival if the mortality rate is estimated to be 70%.

Poisson

The Poisson probability mass function has a very specific application. It looks a lot like a normal distribution, but it is a little bit different.

The idea here is that if you have some information about the average number of things that happen in a given time period, this probability mass function gives you a way to predict the odds of observing some other value on a given future day.

As an example, let's say I have a website, and on average I get 500 visitors per day. I can use the Poisson probability mass function to estimate the probability of seeing some other value on a specific day. For example, with my average of 500 visitors per day, what are the odds of seeing 550 visitors on a given day? That's what a Poisson probability mass function can give you. Take a look at the following code:

from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np

mu = 500
x = np.arange(400, 600)  # the PMF is only defined on integer counts
plt.plot(x, poisson.pmf(x, mu))
plt.show()

Which results in:

[Figure: PMF of the Poisson distribution with mu = 500]

I can use that graph to look up the odds of getting any specific value other than 500. The odds of seeing 550 visitors on a given day, it turns out, come out to about 0.002, or a 0.2% probability. Very interesting.

An obvious inconvenience of the binomial distribution is that for large or even moderate $n$ the calculations of the binomial coefficient can become laborious. In 1837, the Poisson distribution was published as a limiting approximation of the binomial distribution in cases where $n \rightarrow \infty$, $p \rightarrow 0$ and $np$ remains constant.

Figure 13: PMF of the number of survivors of the Ebola virus taking the probability of mortality to be 50% (blue bars; $p=0.5$) and 70% (red bars; $p=0.3$) for $n=15$.

$$f_X(x \mid n, p) = e^{-np}(np)^x / x!$$

Indeed the approximation is quite good for $n=100$ and $p=\frac{1}{100}$.
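A short sketch comparing the two PMFs at these parameter values:

from scipy.stats import binom, poisson

n, p = 100, 1 / 100
for x in range(5):
    print(x, binom.pmf(x, n, p), poisson.pmf(x, n * p))
# e.g. at x = 0: 0.366032 (binomial) vs 0.367879 (Poisson)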

Example: Poisson Approximation of London Bombings During WWII

In his seminal book Introduction to Probability Theory and its Applications, William Feller discusses the statistics of bombings in the south of London during 'The Blitz' of World War II. If you lived in a district consisting of 10 by 10 blocks (i.e. 100 squares), how likely is it that your square would not be hit given that 400 bombs were dropped?

Solution:

Can this be modelled as a series of random Bernoulli trials?

Yes!

We need to identify values for the model parameters from the given data.

We can estimate the probability of a particular bomb hitting your square to be $p=\frac{1}{100}$. The dropping of the 400 bombs can then be treated as a series of $n = 400$ random Bernoulli trials, where the random variable $X$ corresponds to the number of bombs dropped on your square. Thus, we could model this using a binomial distribution, but the calculation of the binomial coefficient becomes prohibitive. Fortunately, we can approximate this with a Poisson distribution using $np = 400 \cdot \frac{1}{100} = 4$.

Given the model parameters we can formulate our query as:

$$
P(X=x) = f_X(x \mid n, p) = \frac{e^{-np}(np)^x}{x!} = \frac{e^{-4} \, 4^x}{x!}
$$

If we compute this for $x=0$, we get $P(X=0) = 0.0183156$.
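Or, using scipy:

from scipy.stats import poisson

print(poisson.pmf(0, 4))  # 0.0183156...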

By the end of the 19th century, this distribution found further applications even when no binomial random variable is present and there are no available values for $n$ or $p$. These correspond to experiments where the random variable $X$ is the number of outcomes occurring during a given time interval or in a specified region, based on a constant rate of occurrence $\lambda$, as described below.

Suppose that we are interested in events that occur independently over time, and at a known average rate. The number of events, $X$, that occur in a fixed time interval follows a Poisson distribution, which we denote by

$$X \sim \operatorname{Poisson}(\lambda)$$

where $\lambda$ is the average number of outcomes over the interval (e.g. expressed in time, distance, area or volume).

The PMF of $X$ is

$$
f_X(x \mid \lambda) = \begin{cases} e^{-\lambda} \lambda^x / x! & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
$$

Interestingly, the expectation and variance of $X$ are

$$
\begin{aligned}
E[X] &= \lambda \\
\operatorname{Var}[X] &= \lambda
\end{aligned}
$$
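A one-line check with scipy ($\lambda = 4$ chosen arbitrarily):

from scipy.stats import poisson

mean, var = poisson.stats(4)  # both equal lambda
print(mean, var)  # 4.0 4.0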

The Poisson Process:

It turns out that the expression of the Poisson distribution, $e^{-\lambda} \lambda^x / x!$, captures an extremely wide variety of phenomena that possess similar underlying features. Indeed, we can explicitly state said features, and denote any process that satisfies these conditions to be a 'Poisson Process'.

To illustrate this, let us consider events occurring over a time interval of length $T$, where the events correspond to the arrival of a bus, as shown in Figure 14. We can dissect the total time interval into $n$ non-overlapping subintervals, each of length $\frac{T}{n}$, where $n$ is large. A Poisson Process corresponds to any experiment that obeys the following assumptions:

  1. The probability that two or more events (e.g. bus arrivals) occur in any given subinterval is negligible.
  2. The number of events occurring in one subinterval is independent of the number that occur in any other subinterval (we shall later refer to this as having 'no memory').
  3. The probability that an event occurs during a given subinterval is constant over the entire interval (e.g. from 0 to $T$).

Figure 14: Illustration of a Poisson Process by considering the arrival of buses over time interval $T$, which is dissected into $n$ non-overlapping subintervals, each of length $\frac{T}{n}$.

Here the $n$ subintervals are analogous to the $n$ independent Bernoulli trials that form the basis of the binomial distribution. In each subinterval, there will either be 0 or 1 event taking place, and the probability of such events is constant. Then $X$ corresponds to the total number of events occurring during time $T$.

We make a slight but notable notation adjustment here, and define $\hat{\lambda}$ to denote the rate at which events occur over the interval $T$ (e.g. 2.5 events per minute). Thus, it is equivalent to say that $\lambda = \hat{\lambda} T$, and re-express the PMF as:

$$
f_X(x \mid \hat{\lambda}, T) = \begin{cases} \frac{e^{-\hat{\lambda} T} (\hat{\lambda} T)^x}{x!} & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
$$

So to recap, in very brief terms: in a Poisson Process, events occur independently and at a constant rate ($\hat{\lambda}$), and the Poisson distribution is used to compute the probability of a specific number of "events" occurring during a particular period $T$.

Example: Poisson Model of Bug Consumption

Entomologists estimate that an average person consumes approximately one pound (0.45359 kg) of "bug parts" per year. Apparently, the foods and liquids we eat and drink contain insect eggs and various body parts. To regulate this, the Food and Drug Administration (FDA) has set legal limits on the amount of insect parts 'allowed' in food; for instance, peanut butter can only contain 30 insect fragments per 100 grams.

Figure 15: PMF for the number of bug fragments in 20 grams of peanut butter assuming an average occurrence of 6 fragments.

Now let's consider a standard package of peanut butter crackers, containing approximately 20 grams of peanut butter. What is the probability that the snack includes at least 5 insect fragments?

Solution:

We need to identify values for the model parameters from the given data.

Using a Poisson model, we can define our interval to be over 20 grams (i.e. $T=20$). Assuming the worst case scenario, we can compute the average number of bug fragments expected as:

$$
\lambda = \hat{\lambda} T = \frac{30 \text{ insect fragments}}{100 \text{ grams}} \cdot 20 \text{ grams} = 6 \text{ fragments}
$$

Given the model parameters, we can express the PMF for the number of bug parts, $X$, present in the snack as:

$$
P(X=x) = f_X(x \mid \hat{\lambda}, T) = \frac{e^{-\hat{\lambda} T}(\hat{\lambda} T)^x}{x!} = \frac{e^{-6} \, 6^x}{x!}
$$

From this PMF we can formulate the query $P(X \geq 5)$:

$$
P(X \geq 5) = 1 - P(X \leq 4) = 1 - \sum_{x=0}^{4} \frac{e^{-6} \, 6^x}{x!} = 1 - 0.285057 = 0.714943
$$
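The same value falls out of scipy's survival function:

from scipy.stats import poisson

print(poisson.sf(4, 6))  # P(X > 4) = P(X >= 5) = 0.714943...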

The full PMF corresponding to the distribution of bug fragments is presented in Figure 15.

Let's highlight some further properties of Poisson Processes using the example of bus arrivals.

Example: Poisson Process for Bus Arrivals

The '49 Bus' arrives every ten minutes (at least according to the chart at the bus stop). Let us model the arrival of buses using a Poisson Process.

(a) What is the probability that you wait for ten minutes and the bus does not arrive?

Solution:

The model parameter corresponding to the average rate of the process can be estimated as: $\hat{\lambda} = \frac{1 \text{ arrival}}{10 \text{ minutes}} = 0.1$ arrivals per minute.

This will be used in the following PMF:

$$
f_X(x = k \mid \hat{\lambda}, T) = \frac{e^{-\hat{\lambda} T} (\hat{\lambda} T)^k}{k!}
$$

We now need to specify $T$ and $x=k$ to compute the query.

Using the definition above, in the problem statement we are therefore asked to compute $f_X(x = 0 \mid \hat{\lambda} = 0.1, T = 10)$:

$$
f_X(x = 0 \mid \hat{\lambda} = 0.1, T = 10) = \frac{e^{-0.1 \times 10}(0.1 \times 10)^0}{0!} = 0.367879
$$

(b) What is the probability that three buses arrive in twenty minutes?

Solution:

Evaluate the Poisson PMF in an analogous manner:

$$
f_X(x = 3 \mid \hat{\lambda} = 0.1, T = 20) = \frac{e^{-0.1 \times 20}(0.1 \times 20)^3}{3!} = 0.180447
$$
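Both parts can be reproduced with scipy by passing $\lambda = \hat{\lambda} T$ as the shape parameter:

from scipy.stats import poisson

lam_hat = 0.1  # arrivals per minute
print(poisson.pmf(0, lam_hat * 10))  # part (a): 0.367879...
print(poisson.pmf(3, lam_hat * 20))  # part (b): 0.180447...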

CLOSING REMARKS: As with the binomial distribution, we make the assumption that the 'rate' ($\hat{\lambda}$) of the Poisson Process is constant, which it might not be. For instance, the average rate of bus arrivals over the period of a day could fluctuate due to rush hour, accidents, etc.