Discrete Random Variables

A random variable is discrete if its range is countable, that is, if we can enumerate the values it can take. Broadly speaking, discrete random variables represent quantities that are counted.

We define the distribution of a random variable to assign a probability to each value the random variable can take. The distribution is given by the probability mass function.

The probability mass function (PMF) of a discrete random variable $X$ is the function:

$$
f_{X}(x)=P(X=x) \quad \text{for } x \in \mathbb{R}
$$

The function $f_{X}$ evaluated at a point $x$ is the probability that the random variable $X$ takes the exact value $x$.

Elementary properties of the PMF can be derived based on the properties of the general probability function we derived in Chapter 1. In particular, a function $f_{X}$ is a probability mass function for a discrete random variable $X$ if and only if:

  1. $0 \leq f_{X}(x) \leq 1$
  2. $\sum_{\forall x} f_{X}(x)=1$

These results follow as the events $X=x_{1}$, $X=x_{2}$, etc. are equivalent to events that partition $\Omega$.

Illustrative Example of a Discrete Random Variable: Two Dice

Recall our example of a random variable that is equal to the sum of rolling two dice (we will use this example throughout the Chapter to link key concepts):

$$
X=\left\{\left(\omega_{1}+\omega_{2}\right): \omega_{1}+\omega_{2}=2,3,4,5,6,7,8,9,10,11,12\right\}
$$

For each of these values $X=x$ we can assign the following probability mass function (PMF):

$$
\begin{aligned}
P(X=2)=f_{X}(2) &= 1/36 \\
P(X=3)=f_{X}(3) &= 2/36 \\
P(X=4)=f_{X}(4) &= 3/36 \\
P(X=5)=f_{X}(5) &= 4/36 \\
P(X=6)=f_{X}(6) &= 5/36 \\
P(X=7)=f_{X}(7) &= 6/36 \\
P(X=8)=f_{X}(8) &= 5/36 \\
P(X=9)=f_{X}(9) &= 4/36 \\
P(X=10)=f_{X}(10) &= 3/36 \\
P(X=11)=f_{X}(11) &= 2/36 \\
P(X=12)=f_{X}(12) &= 1/36
\end{aligned}
$$

Note that this PMF inherently satisfies the properties $0 \leq f_{X}(x) \leq 1$ and $\sum_{\forall x} f_{X}(x)=1$. Often we will graphically represent the PMF $f_{X}(x)$ using what is known as a histogram, as shown in Figure 3.

Figure 3: Histogram representation of the PMF $f_{X}(x)$ for the sum of two dice. The height of the bar at each $x$ denotes the probability of observing this outcome.
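
As a quick sanity check, the two PMF properties above are easy to verify numerically. Below is a minimal Python sketch (the code and variable names are our own, not part of the formal development) that builds the two-dice PMF by enumerating all 36 equally likely outcomes and asserts both properties:

```python
from fractions import Fraction
from itertools import product

# Build f_X by enumerating all 36 equally likely outcomes (w1, w2)
# and counting how often each sum x = w1 + w2 occurs.
pmf = {}
for w1, w2 in product(range(1, 7), repeat=2):
    pmf[w1 + w2] = pmf.get(w1 + w2, Fraction(0)) + Fraction(1, 36)

assert all(0 <= p <= 1 for p in pmf.values())  # property 1
assert sum(pmf.values()) == 1                  # property 2
print(pmf[7])  # 1/6, i.e. 6/36
```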

Moments of PMFs

As Chemical Engineers, we frequently utilise integral transforms (e.g. Fourier transforms, which you will have come to love from the other parts of this course) and similar tools to extract features from complex functions to characterise them. In probability theory, the moments of probability distributions produce particularly insightful attributes. For a discrete random variable $X$, we define the $n$-th moment of a PMF about a value $c$ as:

$$
M_{n}^{c}=\sum_{\forall x}(x-c)^{n} f_{X}(x)
$$
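
This definition translates directly into a short helper function. The sketch below (our own, assuming the PMF is stored as a dictionary mapping each value $x$ to $f_{X}(x)$, as in the earlier snippet) evaluates the $n$-th moment about $c$:

```python
def moment(pmf, n, c=0):
    """n-th moment of a discrete PMF (a dict mapping x to f_X(x)) about the point c."""
    return sum((x - c) ** n * p for x, p in pmf.items())

print(moment(pmf, n=0))  # 1, recovering property 2 (see the 0th moment below)
```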

Let's illustrate this by analysing a few moments for the PMF for the sum of two dice, as given in Figure 3. First, let us consider the $0^{\text{th}}$ moment about zero ($n=0$, $c=0$), given by:

$$
M_{0}^{0}=\sum_{\forall x} f_{X}(x)
$$

Figure 4: Computing the $0^{\text{th}}$ moment of the PMF $f_{X}(x)$ of the sum of two dice (see Figure 3) about $c=0$ (i.e. $M_{0}^{0}$). The yellow bars correspond to the individual values for $f_{X}(x)$ at a given $x$ (left axis; note this is the same as the histogram in Figure 3) and the blue step-function corresponds to $\sum_{u \leq x} f_{X}(u)$ for a given $x$ (right axis); this step-function is known as the cumulative distribution function (CDF), $F_{X}(x)$, of the PMF.

Notice that $M_{0}^{0}$ recovers property 2 of the PMF (i.e. $\sum_{\forall x} f_{X}(x)=1$), as we see on the far right of Figure 4 that the summation of $f_{X}(x)$ over all values of $x$ is 1. The blue step-function observed in Figure 4 is known as the cumulative distribution function: the cumulative distribution function (CDF) of a random variable $X$ is the function:

$$
F_{X}(x)=P(X \leq x) \quad \text{for } x \in \mathbb{R}
$$

The function $F_{X}(x)$ evaluated at a point $x$ is the probability that the random variable $X$ takes a value less than or equal to $x$.

CDFs for discrete random variables have the following properties:

  1. $F_{X}(x) \in [0,1]$
  2. $F_{X}(x)=\sum_{u \leq x} f_{X}(u)$
  3. If $y>x$, then $F_{X}(y)-F_{X}(x)=P(x<X \leq y) \geq 0$
  4. $F_{X}(x) \rightarrow 0$ as $x \rightarrow -\infty$
  5. $F_{X}(x) \rightarrow 1$ as $x \rightarrow \infty$

Property 3 also implies that $F_{X}(x)$ is non-decreasing. The key idea here is that the functions $f_{X}(x)$ or $F_{X}(x)$ can be used to describe the probability distribution of the random variable $X$.
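
Since property 2 says $F_{X}$ is just a running sum of $f_{X}$, the CDF is straightforward to compute. A minimal sketch, again assuming the PMF is a dictionary mapping values to probabilities as built earlier:

```python
def cdf(pmf, x):
    """F_X(x) = P(X <= x): the sum of f_X(u) over all u <= x (property 2)."""
    return sum(p for u, p in pmf.items() if u <= x)
```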

CDF Computations using Sum of Two Dice

Recall the PMF for the sum of two dice, as illustrated in Figure 3 and the table above it. Let's use the properties of the cumulative distribution function (CDF) to answer the following queries:

  1. $P(X \leq 5)$:

Using the definition of the CDF and property 2 we have:

$$
\begin{aligned}
P(X \leq 5) &= F_{X}(5) \\
&= \sum_{u \leq 5} f_{X}(u) \\
&= f_{X}(2)+f_{X}(3)+f_{X}(4)+f_{X}(5) \\
&= \frac{1}{36}+\frac{2}{36}+\frac{3}{36}+\frac{4}{36} \\
&= \frac{10}{36} \approx 0.27778
\end{aligned}
$$

  2. $P(5<X \leq 8)$:

Using property 3 for a CDF:

$$
\begin{aligned}
P(5<X \leq 8) &= F_{X}(8)-F_{X}(5) \\
&= \sum_{u \leq 8} f_{X}(u)-\sum_{u \leq 5} f_{X}(u) \\
&= f_{X}(6)+f_{X}(7)+f_{X}(8) \\
&= \frac{5}{36}+\frac{6}{36}+\frac{5}{36} \\
&= \frac{16}{36} \approx 0.44444
\end{aligned}
$$

Note that this informs us that the outcome for the sum of two randomly rolled dice is in $\{6, 7, 8\}$ almost $50\%$ of the time.

  3. $P(X>8)$:

To answer this, we need to make use of the fact that $P(X \leq 8)+P(X>8)=1$ and rearrange our expression to be:

$$
\begin{aligned}
P(X>8) &= 1-P(X \leq 8)=1-F_{X}(8) \\
&= 1-\sum_{u \leq 8} f_{X}(u) \\
&= 1-\frac{26}{36} \\
&= \frac{10}{36} \approx 0.27778
\end{aligned}
$$

Lastly, it is worth noting that $P(X \leq 5)+P(5<X \leq 8)+P(X>8)$ does indeed equal 1.
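
Reusing the `pmf` dictionary and `cdf` helper sketched earlier, all three queries (and the final sanity check) can be reproduced in a few lines:

```python
p1 = cdf(pmf, 5)                # P(X <= 5)     = 10/36
p2 = cdf(pmf, 8) - cdf(pmf, 5)  # P(5 < X <= 8) = 16/36
p3 = 1 - cdf(pmf, 8)            # P(X > 8)      = 10/36
print(float(p1), float(p2), float(p3))  # 0.2778, 0.4444, 0.2778
assert p1 + p2 + p3 == 1  # the three events partition the range of X
```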

Next let us consider the $1^{\text{st}}$ moment of a PMF centred about zero ($n=1$, $c=0$), given by:

$$
M_{1}^{0}=\sum_{\forall x} x f_{X}(x)
$$

Figure 5 shows the $1^{\text{st}}$ moment for our PMF for the sum of two dice.

Due to the underlying characteristics of a PMF, the first moment gives rise to another important property, which we define as the 'expectation' of a random variable:

The expectation, $E(X)$, of a discrete random variable $X$, also called the expected value or mean of $X$, is defined as:

$$
E(X)=\sum_{\forall x} x f_{X}(x)
$$

The expectation is a weighted average of the values $X$ can take.
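
In code, the expectation is exactly the first moment about zero; with the `moment` helper sketched earlier:

```python
E_X = moment(pmf, n=1)  # first moment about zero (c = 0 by default)
print(E_X)              # 7, the expectation of the sum of two dice
```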

Pause and Reflect 1: Why is the expectation of a PMF the weighted average? Recall that the formula for computing a weighted average is:

$$
\text{Weighted Avg of } x=\frac{\sum_{i} w_{i} \cdot x_{i}}{\sum_{i} w_{i}}
$$

where $w_{i}$ are the weighting factors.

Pause and Reflect 2: What is the expectation of a constant value (i.e. $E[c]$)?

Hint: We can think of this as $f_{X}(c)=1$ for some $x=c$.

Pause and Reflect 3: What happens to the expectation when we multiply our random variable $X$ by some constant $a$ (i.e. $E[aX]$)?

Figure 5: Computing the $1^{\text{st}}$ moment of the PMF $f_{X}(x)$ of the sum of two dice (Figure 3) about $c=0$ (i.e. $M_{1}^{0}$). The yellow bars correspond to the individual values for $x \cdot f_{X}(x)$ at a given $x$ (left axis) and the blue step-function corresponds to $\sum_{u \leq x} u \cdot f_{X}(u)$ for a given $x$ (right axis). The total summation over all $x \cdot f_{X}(x)$ (the value of which is indicated by the final step on the far right) results in what is known as the expectation ($E[X]$), or the weighted average. We can see that $E[X]=7$ for the sum of two dice example considered here.

Continuing on in this fashion, we can compute the $2^{\text{nd}}$ moment of our PMF centred about our expected value of $X$ ($n=2$, $c=E[X]$):

$$
M_{2}^{E[X]}=\sum_{\forall x}(x-E[X])^{2} f_{X}(x)
$$

Applying this to our PMF for the sum of two dice example results in Figure 6.

Figure 6: Taking the $2^{\text{nd}}$ moment of the sum-of-two-dice PMF centred about the mean $c=E[X]=7$ (i.e. $M_{2}^{E[X]}$). Notice that the individual contributions (yellow bars; $(x-E[X])^{2} \cdot f_{X}(x)$) are symmetric about the mean ($E[X]=7$), as the PMF $f_{X}(x)$ is a symmetric function (see Figure 3). The blue step-function again represents $\sum_{u \leq x}(u-E[X])^{2} \cdot f_{X}(u)$ up to a given $x$, and on the far right of this plot ($x=12$) it can be seen that $\sum_{\forall x}(x-E[X])^{2} \cdot f_{X}(x)=5.833$; this value is known as the variance ($\operatorname{Var}[X]$), and it provides a measure of the "spread" of the distribution around the mean.

Let us take a minute to consider the information conveyed in Figure 6. The second moment about $E[X]$ provides us with a measure of how the PMF $f_{X}(x)$ (see the histogram in Figure 3) "spreads" around the expected value $E[X]$; that is, it provides us with a scalar quantity that conveys information regarding the overall spread or variation of the distribution. This second moment of the PMF about the expected value ($M_{2}^{E[X]}$) is known as the variance, $\operatorname{Var}[X]$, of that random variable.

Pause and Reflect: Notice the units of $\operatorname{Var}[X]$ would be in terms of $x^{2}$ (e.g. if $x$ were measured in terms of distance, this would be $\mathrm{m}^{2}$). Thus, for convenience the statistics community has defined the standard deviation as:

$$
\sigma = \text{standard deviation} = \sqrt{\operatorname{Var}[X]}
$$
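
With the `moment` helper from earlier, the variance and standard deviation of the two-dice PMF follow in a couple of lines (a sketch; `math.sqrt` returns a float):

```python
import math

E_X = moment(pmf, n=1)           # E[X] = 7
var_X = moment(pmf, n=2, c=E_X)  # second moment about the mean
sigma = math.sqrt(var_X)         # standard deviation, in the units of x
print(var_X, sigma)              # 35/6 (about 5.833) and about 2.415
```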

Expectation and Variance for a Function of a Discrete Random Variable

We can derive expressions for the expectation and variance of functions of discrete random variables in an analogous fashion by first defining the expectation in general terms for any function $g(X)$ of a discrete random variable $X$:

If $X$ is a discrete random variable and $g(X)$ is some real-valued function of $X$, then

$$
E[g(X)]=\sum_{\forall x} g(x) f_{X}(x)
$$

To evaluate the expectation of a function of a random variable $X$, we apply the function to every value in the range of $X$, then take a weighted average of the results.
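
This general definition also maps directly onto code. The sketch below (our own, reusing the `pmf` dictionary from earlier) evaluates $E[g(X)]$ for an arbitrary function `g`, illustrated with the identity and with $g(x)=x^{2}$:

```python
def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * f_X(x); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(pmf))                    # E[X]   = 7
print(expectation(pmf, lambda x: x ** 2))  # E[X^2] = 329/6
```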

From this definition we can derive some very basic properties. For instance, consider the case where $g(X)=aX+b$:

$$
\begin{aligned}
E[aX+b] &= \sum_{\forall x}(ax+b) f_{X}(x) \\
&= \sum_{\forall x} ax f_{X}(x)+\sum_{\forall x} b f_{X}(x) \\
&= a \sum_{\forall x} x f_{X}(x)+b \sum_{\forall x} f_{X}(x)=aE[X]+b
\end{aligned}
$$
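
A quick numerical check of this linearity result on the two-dice PMF, using the `expectation` helper above (the constants $a=3$ and $b=2$ are arbitrary choices for illustration):

```python
a, b = 3, 2  # arbitrary constants, chosen purely for illustration
lhs = expectation(pmf, lambda x: a * x + b)  # E[aX + b]
rhs = a * expectation(pmf) + b               # aE[X] + b
assert lhs == rhs == 23                      # 3 * 7 + 2
```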

Using the definition above for the expectation of a function, we can reformulate the variance of a discrete random variable $X$ as the expectation of the squared difference between $X$ and its mean: $E\left[(X-E[X])^{2}\right]$.

The variance of a random variable $X$ can also be conveniently represented as

$$
\operatorname{Var}[X]=E\left[(X-E[X])^{2}\right]=E\left[X^{2}\right]-E[X]^{2}
$$

The variance is the average squared distance between a random variable and its mean. It is a measure of dispersion, i.e. spread.
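
The equivalence of these two expressions is easy to confirm numerically for the two-dice PMF, again with the helpers sketched earlier:

```python
mu = expectation(pmf)                                        # E[X] = 7
var_direct = expectation(pmf, lambda x: (x - mu) ** 2)       # E[(X - E[X])^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mu ** 2  # E[X^2] - E[X]^2
assert var_direct == var_shortcut
print(var_direct)  # 35/6, about 5.833, matching Figure 6
```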

Pause and Reflect 1: Where did the latter expression for the variance come from?

Hint: Note that $E[X]$ is a constant (say $\mu$) and expand $(X-\mu)^{2}$ inside the expectation.

Pause and Reflect 2: What happens to the variance if we multiply a random variable by some constant (i.e. $\operatorname{Var}[aX]$)?

Pause and Reflect 3: What happens to the variance if we translate a random variable by some constant (i.e. $\operatorname{Var}[X+a]$)?