Discrete Random Variables

A random variable is discrete if its range is countable, that is, if we can enumerate the values it can take. Broadly speaking, discrete random variables represent quantities that are counted.

We define the distribution of a random variable to assign a probability to each value the random variable can take. The distribution is given by the probability mass function.

The probability mass function (PMF) of a discrete random variable $X$ is the function:

$$
f_{X}(x)=P(X=x) \quad \text{for } x \in \mathbb{R}
$$

The function $f_{X}$ evaluated at a point $x$ is the probability that the random variable $X$ takes the exact value $x$.

Elementary properties of the PMF can be derived based on the properties of the general probability function we derived in Chapter 1. In particular, a function $f_{X}$ is a probability mass function for a discrete random variable $X$ if and only if:

  1. $0 \leq f_{X}(x) \leq 1$
  2. $\sum_{\forall x} f_{X}(x)=1$

These results follow as the events $X=x_{1}$, $X=x_{2}$, etc. are equivalent to events that partition $\Omega$.

Illustrative Example of a Discrete Random Variable: Two Dice

Recall our example of a random variable that is equal to the sum of rolling two dice (we will use this example throughout the Chapter to link key concepts):

$$
X=\left\{\left(\omega_{1}+\omega_{2}\right): \omega_{1}+\omega_{2}=2,3,4,5,6,7,8,9,10,11,12\right\}
$$

For each of these values $X=x$ we can assign the following probability mass function (PMF):

$$
\begin{aligned}
P(X=2)=f_{X}(2) &= 1/36 \\
P(X=3)=f_{X}(3) &= 2/36 \\
P(X=4)=f_{X}(4) &= 3/36 \\
P(X=5)=f_{X}(5) &= 4/36 \\
P(X=6)=f_{X}(6) &= 5/36 \\
P(X=7)=f_{X}(7) &= 6/36 \\
P(X=8)=f_{X}(8) &= 5/36 \\
P(X=9)=f_{X}(9) &= 4/36 \\
P(X=10)=f_{X}(10) &= 3/36 \\
P(X=11)=f_{X}(11) &= 2/36 \\
P(X=12)=f_{X}(12) &= 1/36
\end{aligned}
$$

Note that this PMF inherently satisfies the properties $0 \leq f_{X}(x) \leq 1$ and $\sum_{\forall x} f_{X}(x)=1$. Often we will graphically represent the PMF $f_{X}(x)$ using what is known as a histogram, as shown in Figure 3.

Figure 3: Histogram representation of the PMF $f_{X}(x)$ for the sum of two dice. The height of the bar at each $x$ denotes the probability of observing this outcome.
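
As a quick sanity check, the two PMF properties above are easy to verify numerically. Below is a minimal Python sketch (the code and variable names are our own, not part of the formal development) that builds the two-dice PMF by enumerating all 36 equally likely outcomes and asserts both properties:

```python
from fractions import Fraction
from itertools import product

# Build f_X by enumerating all 36 equally likely outcomes (w1, w2)
# and counting how often each sum x = w1 + w2 occurs.
pmf = {}
for w1, w2 in product(range(1, 7), repeat=2):
    pmf[w1 + w2] = pmf.get(w1 + w2, Fraction(0)) + Fraction(1, 36)

assert all(0 <= p <= 1 for p in pmf.values())  # property 1
assert sum(pmf.values()) == 1                  # property 2
print(pmf[7])  # 1/6, i.e. 6/36
```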

Moments of PMFs

As Chemical Engineers, we frequently utilise integral transforms (e.g. Fourier transforms, which you will have come to love from the other parts of this course) and similar tools to extract features from complex functions to characterise them. In probability theory, the moments of probability distributions produce particularly insightful attributes. For a discrete random variable $X$, we define the $n$-th moment of a PMF about a value $c$ as:

$$
M_{n}^{c}=\sum_{\forall x}(x-c)^{n} f_{X}(x)
$$
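
This definition translates directly into a short helper function. The sketch below (our own, assuming the PMF is stored as a dictionary mapping each value $x$ to $f_{X}(x)$, as in the earlier snippet) evaluates the $n$-th moment about $c$:

```python
def moment(pmf, n, c=0):
    """n-th moment of a discrete PMF (a dict mapping x to f_X(x)) about the point c."""
    return sum((x - c) ** n * p for x, p in pmf.items())

print(moment(pmf, n=0))  # 1, recovering property 2 (see the 0th moment below)
```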

Let's illustrate this by analysing a few moments for the PMF for the sum of two dice, as given in Figure 3. First, let us consider the $0^{\text{th}}$ moment about zero ($n=0$, $c=0$), given by:

$$
M_{0}^{0}=\sum_{\forall x} f_{X}(x)
$$

Figure 4: Computing the $0^{\text{th}}$ moment of the PMF $f_{X}(x)$ of the sum of two dice (see Figure 3) about $c=0$ (i.e. $M_{0}^{0}$). The yellow bars correspond to the individual values for $f_{X}(x)$ at a given $x$ (left axis; note this is the same as the histogram in Figure 3) and the blue step-function corresponds to $\sum_{u \leq x} f_{X}(u)$ for a given $x$ (right axis); this step-function is known as the cumulative distribution function (CDF), $F_{X}(x)$, of the PMF.

Notice that $M_{0}^{0}$ recovers property 2 of the PMF (i.e. $\sum_{\forall x} f_{X}(x)=1$), as we see on the far right of Figure 4 that the summation of $f_{X}(x)$ over all values of $x$ is 1. The blue step-function observed in Figure 4 is known as the cumulative distribution function: the cumulative distribution function (CDF) of a random variable $X$ is the function:

$$
F_{X}(x)=P(X \leq x) \quad \text{for } x \in \mathbb{R}
$$

The function $F_{X}(x)$ evaluated at a point $x$ is the probability that the random variable $X$ takes a value less than or equal to $x$.

CDFs for discrete random variables have the following properties:

  1. $F_{X}(x) \in [0,1]$
  2. $F_{X}(x)=\sum_{u \leq x} f_{X}(u)$
  3. If $y>x$, then $F_{X}(y)-F_{X}(x)=P(x<X \leq y) \geq 0$
  4. $F_{X}(x) \rightarrow 0$ as $x \rightarrow -\infty$
  5. $F_{X}(x) \rightarrow 1$ as $x \rightarrow \infty$

Property 3 also implies that $F_{X}(x)$ is non-decreasing. The key idea here is that the functions $f_{X}(x)$ or $F_{X}(x)$ can be used to describe the probability distribution of the random variable $X$.
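
Since property 2 says $F_{X}$ is just a running sum of $f_{X}$, the CDF is straightforward to compute. A minimal sketch, again assuming the PMF is a dictionary mapping values to probabilities as built earlier:

```python
def cdf(pmf, x):
    """F_X(x) = P(X <= x): the sum of f_X(u) over all u <= x (property 2)."""
    return sum(p for u, p in pmf.items() if u <= x)
```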

CDF Computations using Sum of Two Dice

Recall the PMF for the sum of two dice, as illustrated in Figure 3 and the table above it. Let's use the properties of the cumulative distribution function (CDF) to answer the following queries:

  1. $P(X \leq 5)$:

Using the definition of the CDF and property 2 we have:

$$
\begin{aligned}
P(X \leq 5) &= F_{X}(5) \\
&= \sum_{u \leq 5} f_{X}(u) \\
&= f_{X}(2)+f_{X}(3)+f_{X}(4)+f_{X}(5) \\
&= \frac{1}{36}+\frac{2}{36}+\frac{3}{36}+\frac{4}{36} \\
&= \frac{10}{36} \approx 0.27778
\end{aligned}
$$

  2. $P(5<X \leq 8)$:

Using property 3 for a CDF:

$$
\begin{aligned}
P(5<X \leq 8) &= F_{X}(8)-F_{X}(5) \\
&= \sum_{u \leq 8} f_{X}(u)-\sum_{u \leq 5} f_{X}(u) \\
&= f_{X}(6)+f_{X}(7)+f_{X}(8) \\
&= \frac{5}{36}+\frac{6}{36}+\frac{5}{36} \\
&= \frac{16}{36} \approx 0.44444
\end{aligned}
$$

Note that this informs us that the outcome for the sum of two randomly rolled dice is in $\{6, 7, 8\}$ almost $50\%$ of the time.

  3. $P(X>8)$:

To answer this, we need to make use of the fact that $P(X \leq 8)+P(X>8)=1$ and rearrange our expression to be:

$$
\begin{aligned}
P(X>8) &= 1-P(X \leq 8)=1-F_{X}(8) \\
&= 1-\sum_{u \leq 8} f_{X}(u) \\
&= 1-\frac{26}{36} \\
&= \frac{10}{36} \approx 0.27778
\end{aligned}
$$

Lastly, it is worth noting that $P(X \leq 5)+P(5<X \leq 8)+P(X>8)$ does indeed equal 1.
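
Reusing the `pmf` dictionary and `cdf` helper sketched earlier, all three queries (and the final sanity check) can be reproduced in a few lines:

```python
p1 = cdf(pmf, 5)                # P(X <= 5)     = 10/36
p2 = cdf(pmf, 8) - cdf(pmf, 5)  # P(5 < X <= 8) = 16/36
p3 = 1 - cdf(pmf, 8)            # P(X > 8)      = 10/36
print(float(p1), float(p2), float(p3))  # 0.2778, 0.4444, 0.2778
assert p1 + p2 + p3 == 1  # the three events partition the range of X
```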

Next let us consider the $1^{\text{st}}$ moment of a PMF centred about zero ($n=1$, $c=0$), given by:

$$
M_{1}^{0}=\sum_{\forall x} x f_{X}(x)
$$

Figure 5 shows the $1^{\text{st}}$ moment for our PMF for the sum of two dice.

Due to the underlying characteristics of a PMF, the first moment gives rise to another important property, which we define as the 'expectation' of a random variable:

The expectation, $E(X)$, of a discrete random variable $X$, also called the expected value or mean of $X$, is defined as:

$$
E(X)=\sum_{\forall x} x f_{X}(x)
$$

The expectation is a weighted average of the values $X$ can take.
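
In code, the expectation is exactly the first moment about zero; with the `moment` helper sketched earlier:

```python
E_X = moment(pmf, n=1)  # first moment about zero (c = 0 by default)
print(E_X)              # 7, the expectation of the sum of two dice
```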

Pause and Reflect 1: Why is the expectation of a PMF the weighted average? Recall that the formula for computing a weighted average is:

$$
\text{Weighted Avg of } x=\frac{\sum_{i} w_{i} \cdot x_{i}}{\sum_{i} w_{i}}
$$

where $w_{i}$ are the weighting factors.

Pause and Reflect 2: What is the expectation of a constant value (i.e. $E[c]$)?

Hint: We can think of this as $f_{X}(c)=1$ for some $x=c$.

Pause and Reflect 3: What happens to the expectation when we multiply our random variable $X$ by some constant $a$ (i.e. $E[aX]$)?

Figure 5: Computing the $1^{\text{st}}$ moment of the PMF $f_{X}(x)$ of the sum of two dice (Figure 3) about $c=0$ (i.e. $M_{1}^{0}$). The yellow bars correspond to the individual values for $x \cdot f_{X}(x)$ at a given $x$ (left axis) and the blue step-function corresponds to $\sum_{u \leq x} u \cdot f_{X}(u)$ for a given $x$ (right axis). The total summation over all $x \cdot f_{X}(x)$ (the value of which is indicated by the final step on the far right) results in what is known as the expectation ($E[X]$), or the weighted average. We can see that $E[X]=7$ for the sum of two dice example considered here.

Continuing on in this fashion, we can compute the $2^{\text{nd}}$ moment of our PMF centred about our expected value of $X$ ($n=2$, $c=E[X]$):

$$
M_{2}^{E[X]}=\sum_{\forall x}(x-E[X])^{2} f_{X}(x)
$$

Applying this to our PMF for the sum of two dice example results in Figure 6.

Figure 6: Taking the $2^{\text{nd}}$ moment of the sum-of-two-dice PMF centred about the mean $c=E[X]=7$ (i.e. $M_{2}^{E[X]}$). Notice that the individual contributions (yellow bars; $(x-E[X])^{2} \cdot f_{X}(x)$) are symmetric about the mean ($E[X]=7$), as the PMF $f_{X}(x)$ is a symmetric function (see Figure 3). The blue step-function again represents $\sum_{u \leq x}(u-E[X])^{2} \cdot f_{X}(u)$ up to a given $x$, and on the far right of this plot ($x=12$) it can be seen that $\sum_{\forall x}(x-E[X])^{2} \cdot f_{X}(x)=5.833$; this value is known as the variance ($\operatorname{Var}[X]$), and it provides a measure of the "spread" of the distribution around the mean.

Let us take a minute to consider the information conveyed in Figure 6. The second moment about $E[X]$ provides us with a measure of how the PMF $f_{X}(x)$ (see the histogram in Figure 3) "spreads" around the expected value $E[X]$; that is, it provides us with a scalar quantity that conveys information regarding the overall spread or variation of the distribution. This second moment of the PMF about the expected value ($M_{2}^{E[X]}$) is known as the variance, $\operatorname{Var}[X]$, of that random variable.

Pause and Reflect: Notice the units of $\operatorname{Var}[X]$ would be in terms of $x^{2}$ (e.g. if $x$ were measured in terms of distance, this would be $\mathrm{m}^{2}$). Thus, for convenience the statistics community has defined the standard deviation as:

$$
\sigma = \text{standard deviation} = \sqrt{\operatorname{Var}[X]}
$$
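
With the `moment` helper from earlier, the variance and standard deviation of the two-dice PMF follow in a couple of lines (a sketch; `math.sqrt` returns a float):

```python
import math

E_X = moment(pmf, n=1)           # E[X] = 7
var_X = moment(pmf, n=2, c=E_X)  # second moment about the mean
sigma = math.sqrt(var_X)         # standard deviation, in the units of x
print(var_X, sigma)              # 35/6 (about 5.833) and about 2.415
```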

Expectation and Variance for a Function of a Discrete Random Variable

We can derive expressions for the expectation and variance of functions of discrete random variables in an analogous fashion by first defining the expectation in general terms for any function $g(X)$ of a discrete random variable $X$:

If $X$ is a discrete random variable and $g(X)$ is some real-valued function of $X$, then

$$
E[g(X)]=\sum_{\forall x} g(x) f_{X}(x)
$$

To evaluate the expectation of a function of a random variable $X$, we apply the function to every value in the range of $X$, then take a weighted average of the results.
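
This general definition also maps directly onto code. The sketch below (our own, reusing the `pmf` dictionary from earlier) evaluates $E[g(X)]$ for an arbitrary function `g`, illustrated with the identity and with $g(x)=x^{2}$:

```python
def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * f_X(x); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(pmf))                    # E[X]   = 7
print(expectation(pmf, lambda x: x ** 2))  # E[X^2] = 329/6
```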

From this definition we can derive some very basic properties. For instance, consider the case where $g(X)=aX+b$:

$$
\begin{aligned}
E[aX+b] &= \sum_{\forall x}(ax+b) f_{X}(x) \\
&= \sum_{\forall x} ax f_{X}(x)+\sum_{\forall x} b f_{X}(x) \\
&= a \sum_{\forall x} x f_{X}(x)+b \sum_{\forall x} f_{X}(x)=aE[X]+b
\end{aligned}
$$
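
A quick numerical check of this linearity result on the two-dice PMF, using the `expectation` helper above (the constants $a=3$ and $b=2$ are arbitrary choices for illustration):

```python
a, b = 3, 2  # arbitrary constants, chosen purely for illustration
lhs = expectation(pmf, lambda x: a * x + b)  # E[aX + b]
rhs = a * expectation(pmf) + b               # aE[X] + b
assert lhs == rhs == 23                      # 3 * 7 + 2
```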

Using the definition above for the expectation of a function, we can reformulate the variance of a discrete random variable $X$ as the expectation of the squared difference between $X$ and its mean: $E\left[(X-E[X])^{2}\right]$.

The variance of a random variable $X$ can also be conveniently represented as

$$
\operatorname{Var}[X]=E\left[(X-E[X])^{2}\right]=E\left[X^{2}\right]-E[X]^{2}
$$

The variance is the average squared distance between a random variable and its mean. It is a measure of dispersion, i.e. spread.
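
The equivalence of these two expressions is easy to confirm numerically for the two-dice PMF, again with the helpers sketched earlier:

```python
mu = expectation(pmf)                                        # E[X] = 7
var_direct = expectation(pmf, lambda x: (x - mu) ** 2)       # E[(X - E[X])^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mu ** 2  # E[X^2] - E[X]^2
assert var_direct == var_shortcut
print(var_direct)  # 35/6, about 5.833, matching Figure 6
```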

Pause and Reflect 1: Where did the latter expression for the variance come from?

Hint: Note that $E[X]$ is a constant (say $\mu$) and expand $(X-\mu)^{2}$ inside the expectation.

Pause and Reflect 2: What happens to the variance if we multiply a random variable by some constant (i.e. $\operatorname{Var}[aX]$)?

Pause and Reflect 3: What happens to the variance if we translate a random variable by some constant (i.e. $\operatorname{Var}[X+a]$)?