
Common Discrete Distributions

In this section we will review some discrete probability mass functions frequently encountered in science and engineering.

Discrete Uniform

A random variable $X$ follows a discrete uniform distribution if it takes the values $1, 2, \ldots, n$ with equal probability. We denote this by:

$$X \sim DU[1, n]$$

The symbol $\sim$ means 'is distributed as'.

Since the sum of the probabilities must equal 1, the PMF of $X$ is

$$
f_X(x \mid n) = \begin{cases} \frac{1}{n} & \text{if } x = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}
$$

NOTE: Notice the convention where we say that the model parameter $n$ is given: $f_X(x \mid n)$. This will be adopted throughout the chapter to distinguish random variables from model parameters.

Using the formula for the sum of the first $n$ natural numbers, the expectation of $X$ is:

$$
E[X] = \sum_{\forall x} x f_X(x \mid n) = \sum_{x=1}^{n} x \cdot \frac{1}{n} = \frac{1}{n} \sum_{x=1}^{n} x = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}
$$

which is halfway between the minimum and maximum values $X$ can take. Using the formula for the sum of the squares of the first $n$ natural numbers, we have

$$
E[X^2] = \sum_{\forall x} x^2 f_X(x \mid n) = \sum_{x=1}^{n} x^2 \cdot \frac{1}{n} = \frac{1}{n} \sum_{x=1}^{n} x^2 = \frac{1}{n} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}
$$

so the variance of $X$ is:

$$
\operatorname{Var}[X] = E[X^2] - E[X]^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2 - 1}{12}
$$
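As a quick numerical sanity check, here is a minimal sketch using scipy's randint distribution (note that its upper bound is exclusive, so $DU[1, n]$ corresponds to randint(1, n + 1); the choice $n = 10$ is just for illustration):

from scipy.stats import randint

n = 10
X = randint(1, n + 1)  # discrete uniform on {1, 2, ..., n}

print(X.mean())  # (n + 1) / 2 = 5.5
print(X.var())   # (n^2 - 1) / 12 = 8.25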

Bernoulli

A random variable $X$ follows a Bernoulli distribution if it takes the value 1 with probability $p$, and the value 0 with probability $1-p$. We denote this by

$$X \sim \operatorname{Bernoulli}(p)$$

We will generally assign the random variable values as $X=1$ to denote "success" and $X=0$ to denote "failure".

The PMF of $X$ is

$$
f_X(x \mid p) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}
$$

The Bernoulli distribution arises in any experiment where the sample space consists of only two outcomes; an example is flipping a biased coin once ($\Omega = \{\text{heads}, \text{tails}\}$). We call such an experiment a Bernoulli trial, and arbitrarily assign a random variable value to one of the outcomes as a success (e.g. $x=1$) and the other as a failure (e.g. $x=0$).

The PMF can also be conveniently summarised as:

$$f_X(x \mid p) = p^x (1-p)^{1-x}$$

The expectation and variance of $X$ are computed in a straightforward manner as:

$$
\begin{aligned}
E[X] &= \sum_{\forall x} x f_X(x \mid p) = 0 \cdot (1-p) + 1 \cdot p = p \\
E[X^2] &= \sum_{\forall x} x^2 f_X(x \mid p) = 0^2 \cdot (1-p) + 1^2 \cdot p = p \\
\operatorname{Var}[X] &= E[X^2] - E[X]^2 = p - p^2 = p(1-p)
\end{aligned}
$$
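These results can be confirmed with a short sketch using scipy's bernoulli distribution ($p = 0.3$ is an arbitrary illustrative value):

from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)

print(X.mean())  # p = 0.3
print(X.var())   # p * (1 - p) = 0.21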

Binomial

Binomial probabilities apply to situations involving a series of independent and identical trials, where each trial can have only one of two possible outcomes (i.e. Bernoulli trials). If we were flipping a coin, and the probability of it landing heads was $p$ (and therefore the probability of tails $1-p$), then the probability of obtaining the sequence $(H, T, H)$ would be $p \times (1-p) \times p$. In other words, since the coin tosses are independent events, we would multiply the probability of obtaining a head with the probability of the coin flipping tails, and with the probability of the coin flipping heads again.

We can generate a plot of the probability mass function (PMF) for a binomial distribution using the following code:

from scipy.stats import binom
import matplotlib.pyplot as plt
import numpy as np

n = 10
p = 0.5
x = np.arange(0, n + 1)  # the PMF is only defined on the integers 0, 1, ..., n
plt.plot(x, binom.pmf(x, n, p), 'o')
plt.show()

Which results in:

[Figure: PMF of the binomial distribution for n = 10, p = 0.5]

Or, visualised with different values of $p$ and $n$:

[Figure: Probability mass functions for the binomial distribution with different values of $p$ and $n$]

Suppose we were interested in the probability of obtaining two (or any other number of) heads from these throws, in any order. For a particular number of coin throws $n$, the number of sequences containing $k$ heads can be computed using the binomial coefficient, $\binom{n}{k}$ (read "$n$ choose $k$"; the $nCr$ button on your calculator):

$$
\binom{n}{k} = \frac{n!}{k!(n-k)!}, \text{ where } x! = \prod_{j=1}^{x} j, \text{ and } 0! = 1
$$

In particular, for three coin flips ($n=3$), there are three sequences resulting in two heads ($k=2$): $(H, H, T)$, $(H, T, H)$ and $(T, H, H)$.

Since each sequence occurs with a probability of $p^2(1-p)$, and there are three sequences, the total probability of obtaining two heads is therefore $3 \times p^2(1-p)$. Here we have simply counted the total number of sequences containing 2 heads, but we could have equivalently used the binomial coefficient:

$$
\binom{3}{2} = \frac{3!}{2! \, 1!} = \frac{6}{2} = 3
$$
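For larger $n$ these counts quickly become tedious by hand; a quick sketch using Python's built-in math.comb (values chosen to match the examples in this section):

import math

print(math.comb(3, 2))   # 3 sequences with two heads in three flips
print(math.comb(15, 5))  # 3003, as used in the Ebola example below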

This leads us to the definition of the binomial distribution.

Suppose that we have a sequence of $n$ independent Bernoulli trials, each with probability $p$ of "success". The total number of successes, $X$, follows a binomial distribution, which we denote by

$$X \sim \operatorname{Binomial}(n, p)$$

A binomial random variable with parameters $(n, p)$ is equivalent to the sum of $n$ independent Bernoulli random variables with parameter $p$.

The PMF of a binomial random variable $X$ is

$$
f_X(x \mid n, p) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & \text{if } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}
$$

where $\binom{n}{x}$ is the binomial coefficient defined above.

The expectation and variance of a binomial random variable $X$ are:

$$
\begin{aligned}
E[X] &= np \\
\operatorname{Var}[X] &= np(1-p)
\end{aligned}
$$

We will prove these later, when we consider the properties of sums of random variables.
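In the meantime, a minimal numerical check using scipy (parameter values chosen for illustration):

from scipy.stats import binom

n, p = 10, 0.5
mean, var = binom.stats(n, p)  # returns (np, np(1-p))
print(mean, var)  # 5.0 2.5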

An important point to consider is how values for $p$ are estimated. Traditionally, historical data is used for this purpose, but this assumes that the process under investigation is the same as that when the data was collected (i.e. that $p$ is indeed a constant).

Example: Binomial Distribution - Ebola Virus

You may recall the 2013-14 Ebola virus epidemic that resulted in 8,914 cases and 4,447 deaths, according to the World Health Organisation (data collected on 15 October 2014); this corresponds to a 50% mortality rate (although the WHO estimates that the death rate is likely to be closer to 70%). From the given data, if 15 people are known to have contracted this virus, what is the probability that:

(a) Exactly 5 people survive?

Solution:

We can model the fate of each individual that has contracted the virus as a Bernoulli trial with survival probability $p = 0.5$. The number of surviving individuals, $X$, can then be modelled using a binomial distribution with $n = 15$.

$$
P(X=5) = f_X(x=5) = \frac{15!}{5! \, 10!} (0.5)^5 (0.5)^{10} = 3003 \cdot (0.03125) \cdot (0.000976562) = 0.0916443
$$
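We can verify this with scipy:

from scipy.stats import binom

print(binom.pmf(5, 15, 0.5))  # 0.0916442871...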

(b) At least 10 people survive?

Solution:

$$
\begin{aligned}
P(X \geq 10) &= \sum_{x=10}^{15} f_X(x) \\
&= 3003 \cdot (0.0009766) \cdot (0.03125) \\
&\quad + 1365 \cdot (0.0004883) \cdot (0.0625) \\
&\quad + 455 \cdot (0.0002441) \cdot (0.125) \\
&\quad + 105 \cdot (0.0001221) \cdot (0.25) \\
&\quad + 15 \cdot (6.10352 \times 10^{-5}) \cdot (0.5) \\
&\quad + 1 \cdot (3.05176 \times 10^{-5}) \cdot (1) \\
&= 0.150879
\end{aligned}
$$
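Equivalently, using scipy's survival function, noting that $P(X \geq 10) = P(X > 9)$:

from scipy.stats import binom

print(binom.sf(9, 15, 0.5))  # 0.1508789...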

We can also visually inspect the corresponding PMF by plotting it over all $x$, which is shown for mortality rates of 50% (blue) and 70% (red) in Figure 13. From the figure we immediately observe a dramatic decrease in the probability of survival if the mortality rate is estimated to be 70%.

Poisson

The Poisson probability mass function has a very specific application. It looks a lot like a normal distribution, but it is a little bit different.

The idea here is that if you have some information about the average number of things that happen in a given time period, this probability mass function gives you a way to predict the odds of observing some other value on a given future day.

As an example, let's say I have a website, and on average I get 500 visitors per day. I can use the Poisson probability mass function to estimate the probability of seeing some other value on a specific day. For example, with my average of 500 visitors per day, what are the odds of seeing 550 visitors on a given day? That's what a Poisson probability mass function can give you. Take a look at the following code:

from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np

mu = 500
x = np.arange(400, 600)  # the PMF is only defined on integer counts
plt.plot(x, poisson.pmf(x, mu))
plt.show()

Which results in:

[Figure: PMF of the Poisson distribution with mu = 500]

I can use that graph to look up the odds of getting any specific value other than 500. The odds of seeing 550 visitors on a given day, it turns out, come out to about 0.002, or a 0.2% probability. Very interesting.

An obvious inconvenience of the binomial distribution is that for large or even moderate $n$ the calculations of the binomial coefficient can become laborious. In 1837, the Poisson distribution was published as a limiting approximation of the binomial distribution in cases where $n \rightarrow \infty$, $p \rightarrow 0$ and $np$ remains constant.

Figure 13: PMF of the number of survivors of the Ebola virus taking the probability of mortality to be 50% (blue bars; $p=0.5$) and 70% (red bars; $p=0.3$) for $n=15$.

$$f_X(x \mid n, p) = e^{-np}(np)^x / x!$$

Indeed the approximation is quite good for $n=100$ and $p=\frac{1}{100}$.
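A short sketch comparing the two PMFs at these parameter values:

from scipy.stats import binom, poisson

n, p = 100, 1 / 100
for x in range(5):
    print(x, binom.pmf(x, n, p), poisson.pmf(x, n * p))
# e.g. at x = 0: 0.366032 (binomial) vs 0.367879 (Poisson)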

Example: Poisson Approximation of London Bombings During WWII

In his seminal book Introduction to Probability Theory and its Applications, William Feller discusses the statistics of bombings in the south of London during 'The Blitz' of World War II. If you lived in a district consisting of 10 by 10 blocks (i.e. 100 squares), how likely is it that your square would not be hit given that 400 bombs were dropped?

Solution:

Can this be modelled as a series of random Bernoulli trials?

Yes!

We need to identify values for the model parameters from the given data.

We can estimate the probability of a particular bomb hitting your square to be $p=\frac{1}{100}$. The dropping of the 400 bombs can then be treated as a series of $n = 400$ random Bernoulli trials, where the random variable $X$ corresponds to the number of bombs dropped on your square. Thus, we could model this using a binomial distribution, but the calculation of the binomial coefficient becomes prohibitive. Fortunately, we can approximate this with a Poisson distribution using $np = 400 \cdot \frac{1}{100} = 4$.

Given the model parameters we can formulate our query as:

$$
P(X=x) = f_X(x \mid n, p) = \frac{e^{-np}(np)^x}{x!} = \frac{e^{-4} \, 4^x}{x!}
$$

If we compute this for $x=0$, we get $P(X=0) = 0.0183156$.
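Or, using scipy:

from scipy.stats import poisson

print(poisson.pmf(0, 4))  # 0.0183156...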

By the end of the 19th century, this distribution found further applications even when no binomial random variable is present and there are no available values for $n$ or $p$. These correspond to experiments where the random variable $X$ is the number of outcomes occurring during a given time interval or in a specified region, based on a constant rate of occurrence $\lambda$, as described below.

Suppose that we are interested in events that occur independently over time, and at a known average rate. The number of events, $X$, that occur in a fixed time interval follows a Poisson distribution, which we denote by

$$X \sim \operatorname{Poisson}(\lambda)$$

where $\lambda$ is the average number of outcomes over the interval (e.g. expressed in time, distance, area or volume).

The PMF of $X$ is

$$
f_X(x \mid \lambda) = \begin{cases} e^{-\lambda} \lambda^x / x! & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
$$

Interestingly, the expectation and variance of $X$ are

$$
\begin{aligned}
E[X] &= \lambda \\
\operatorname{Var}[X] &= \lambda
\end{aligned}
$$
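A one-line check with scipy ($\lambda = 4$ chosen arbitrarily):

from scipy.stats import poisson

mean, var = poisson.stats(4)  # both equal lambda
print(mean, var)  # 4.0 4.0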

The Poisson Process:

It turns out that the expression of the Poisson distribution, $e^{-\lambda} \lambda^x / x!$, captures an extremely wide variety of phenomena that possess similar underlying features. Indeed, we can explicitly state said features, and denote any process that satisfies these conditions to be a 'Poisson Process'.

To illustrate this, let us consider events occurring over a time interval of length $T$, where the events correspond to the arrival of a bus, as shown in Figure 14. We can dissect the total time interval into $n$ non-overlapping subintervals, each of length $\frac{T}{n}$, where $n$ is large. A Poisson Process corresponds to any experiment that obeys the following assumptions:

  1. The probability that two or more events (e.g. bus arrivals) occur in any given subinterval is negligible.
  2. The number of events occurring in one subinterval is independent of the number that occur in any other subinterval (we shall later refer to this as having 'no memory').
  3. The probability that an event occurs during a given subinterval is constant over the entire interval (e.g. from 0 to $T$).

Figure 14: Illustration of a Poisson Process by considering the arrival of buses over time interval $T$, which is dissected into $n$ non-overlapping subintervals, each of length $\frac{T}{n}$.

Here the $n$ subintervals are analogous to the $n$ independent Bernoulli trials that form the basis of the binomial distribution. In each subinterval, there will either be 0 or 1 event taking place, and the probability of such events is constant. Then $X$ corresponds to the total number of events occurring during time $T$.

We make a slight but notable notation adjustment here, and define $\hat{\lambda}$ to denote the rate at which events occur over the interval $T$ (e.g. 2.5 events per minute). Thus, it is equivalent to say that $\lambda = \hat{\lambda} T$, and re-express the PMF as:

$$
f_X(x \mid \hat{\lambda}, T) = \begin{cases} \frac{e^{-\hat{\lambda} T} (\hat{\lambda} T)^x}{x!} & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
$$

So to recap, in very brief terms: in a Poisson Process, events occur independently and at a constant rate ($\hat{\lambda}$), and the Poisson distribution is used to compute the probability of a specific number of "events" occurring during a particular period $T$.

Example: Poisson Model of Bug Consumption

Entomologists estimate that an average person consumes approximately one pound (0.45359 kg) of "bug parts" per year. Apparently, the foods and liquids we eat and drink contain insect eggs and various body parts. To regulate this, the Food and Drug Administration (FDA) has set legal limits on the amount of insect parts 'allowed' in food; for instance, peanut butter can only contain 30 insect fragments per 100 grams.

Figure 15: PMF for the number of bug fragments in 20 grams of peanut butter assuming an average occurrence of 6 fragments.

Now let's consider a standard package of peanut butter crackers, containing approximately 20 grams of peanut butter. What is the probability that the snack includes at least 5 insect fragments?

Solution:

We need to identify values for the model parameters from the given data.

Using a Poisson model, we can define our interval to be over 20 grams (i.e. $T=20$). Assuming the worst case scenario, we can compute the average number of bug fragments expected as:

$$
\lambda = \hat{\lambda} T = \frac{30 \text{ insect fragments}}{100 \text{ grams}} \cdot 20 \text{ grams} = 6 \text{ fragments}
$$

Given the model parameters, we can express the PMF for the number of bug parts, $X$, present in the snack as:

$$
P(X=x) = f_X(x \mid \hat{\lambda}, T) = \frac{e^{-\hat{\lambda} T}(\hat{\lambda} T)^x}{x!} = \frac{e^{-6} \, 6^x}{x!}
$$

From this PMF we can formulate the query $P(X \geq 5)$:

$$
P(X \geq 5) = 1 - P(X \leq 4) = 1 - \sum_{x=0}^{4} \frac{e^{-6} \, 6^x}{x!} = 1 - 0.285057 = 0.714943
$$
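The same value falls out of scipy's survival function:

from scipy.stats import poisson

print(poisson.sf(4, 6))  # P(X > 4) = P(X >= 5) = 0.714943...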

The full PMF corresponding to the distribution of bug fragments is presented in Figure 15.

Let's highlight some further properties of Poisson Processes using the example of bus arrivals.

Example: Poisson Process for Bus Arrivals

The '49 Bus' arrives every ten minutes (at least according to the chart at the bus stop). Let us model the arrival of buses using a Poisson Process.

(a) What is the probability that you wait for ten minutes and the bus does not arrive?

Solution:

The model parameter corresponding to the average rate of the process can be estimated as: $\hat{\lambda} = \frac{1 \text{ arrival}}{10 \text{ minutes}} = 0.1$ arrivals per minute.

This will be used in the following PMF:

$$
f_X(x = k \mid \hat{\lambda}, T) = \frac{e^{-\hat{\lambda} T} (\hat{\lambda} T)^k}{k!}
$$

We now need to specify $T$ and $x=k$ to compute the query.

Using the definition above, in the problem statement we are therefore asked to compute $f_X(x = 0 \mid \hat{\lambda} = 0.1, T = 10)$:

$$
f_X(x = 0 \mid \hat{\lambda} = 0.1, T = 10) = \frac{e^{-0.1 \times 10}(0.1 \times 10)^0}{0!} = 0.367879
$$

(b) What is the probability that three buses arrive in twenty minutes?

Solution:

Evaluate the Poisson PMF in an analogous manner:

$$
f_X(x = 3 \mid \hat{\lambda} = 0.1, T = 20) = \frac{e^{-0.1 \times 20}(0.1 \times 20)^3}{3!} = 0.180447
$$
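Both parts can be reproduced with scipy by passing $\lambda = \hat{\lambda} T$ as the shape parameter:

from scipy.stats import poisson

lam_hat = 0.1  # arrivals per minute
print(poisson.pmf(0, lam_hat * 10))  # part (a): 0.367879...
print(poisson.pmf(3, lam_hat * 20))  # part (b): 0.180447...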

CLOSING REMARKS: As with the binomial distribution, we make the assumption that the 'rate' ($\hat{\lambda}$) of the Poisson Process is constant, which it might not be. For instance, the average rate of bus arrivals over the period of a day could fluctuate due to rush hour, accidents, etc.