Definition of a Random Variable

Up until now, we have considered the properties of probabilities defined over events (i.e. a set of one or more outcomes, $\omega$ , that make up the Sample Space, $\Omega$ , of the experiment). For simple problems this framework for computing probabilities is adequate. However, there often arise instances where a numerical attribute of an outcome is particularly important in our analysis. In these cases we would like to associate a single number with a sample outcome; random variables allow us to do just that. Let us consider an example to motivate this problem:

Example: A Simple Random Variable

Say we are concerned with analysing the outcomes of rolling two dice (let's denote these outcomes as $\left(\omega_{1}, \omega_{2}\right)$). Applying the definitions from Chapter 1, we can define the probability space of the two outcomes in the following way:

P\left(\omega_{1}, \omega_{2}\right)=\frac{1}{36} \quad\left(\text { for all } \omega_{1}, \omega_{2} \in\{1, \ldots, 6\}\right)

Thus, the outcomes of the experiment make up a 36-member Sample Space $\Omega$, where all outcomes are equally likely (as is the case in most games of chance). For the sake of argument, suppose we are only interested in the probabilities of sums of the pair of dice. This might arise, for instance, when playing "craps", where the player is most interested in the sum of the dice being 7. Thus, we could define a random variable to be the sum of the two dice:

X(\omega)=\omega_{1}+\omega_{2}, \quad X(\omega) \in\{2,3,4,5,6,7,8,9,10,11,12\}

Note that the definition of this random variable contracts our original 36-member Sample Space to one containing only 11 members, which are not equally likely (e.g. there is only one pair $\left(\omega_{1}, \omega_{2}\right)$ that sums to 2, but six distinct pairs that sum to 7).
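The contraction from 36 outcomes to 11 values can be checked by direct enumeration. The following is a minimal Python sketch, assuming two fair six-sided dice; the names `omega`, `X` and `pmf` are chosen here for illustration and do not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs (w1, w2), assumed equally likely.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: maps an outcome (w1, w2) to the sum of the two dice."""
    w1, w2 = outcome
    return w1 + w2

# Count how many outcomes map to each value x, then divide by |Omega|.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

The resulting dictionary has 11 entries whose probabilities sum to 1, with a single outcome mapping to 2 and six outcomes mapping to 7.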

Figure 1: Geometric representation for the random variable $X(\omega)$ mapping the outcomes $\left(\omega_{1}, \omega_{2}\right)$ of rolling two dice into a smaller Sample Space (yellow).

The example in Figure 1 illustrates how random variables originally came about in probability theory: as a way to create simpler Sample Spaces. A random variable (RV) is a function that maps outcomes $\omega$ into the real numbers $\mathbb{R}$. In a more formal sense, a random variable describes a method for assigning a number $X(\omega)=x \in \mathbb{R}$ to each outcome $\omega \in \Omega$ of an experiment.

Figure 2: Illustration of a Random Variable $X(\omega)$ as a mapping function of outcomes into the real space.

Comment on Notation: We use uppercase letters (i.e. $X, Y$ and $Z$ ) to represent the random variables themselves, and lowercase letters (i.e. $x, y$ and $z$ ) to represent the possible values the random variables can take. This becomes particularly important in terms involving summation, where:

\sum_{\forall x} \text { or } \sum_{x} \text { means the summation over all possible values for } X \text {. }

The range of a random variable is the set of all possible values it can take. As long as the experiment and the random variable are clearly defined, we can simply write $X$ instead of $X(\omega)$ .

Random variables can be discrete or continuous (which will frame our discussion below), and finite or infinite. Let's take a minute to think about how this applies to some examples from Chapter 1, which thus far have all been for discrete random variables.

Timeout: Random Variables from Chapter 1

In Chapter 1 we computed probabilities using outcomes, $\omega$ , of an experiment. So how do random variables affect the examples where our outcomes were from flipping a coin?

Single Coin Toss:

In the example of tossing an unbiased coin, we had two equally likely outcomes that made up our sample space $\Omega_{\text {coin }}=\{\mathrm{H}, \mathrm{T}\}$ . We can arbitrarily assign any two numbers we want to these two outcomes (so long as they are not the same value) to define the random variable $X$ :

\begin{array}{ll} x=0 & \text { if } \omega=\text { Heads (or H) } \\ x=1 & \text { if } \omega=\text { Tails (or T) } \end{array}

Several Coin Tosses:

Following a similar logic using the Independence example in Section 1.7 where we flipped a coin twice, we could define the following random variable $X$ if we wanted to map each individual outcome:

\begin{array}{ll} x=0 & \text { if } \omega=\{\mathrm{HH}\} \\ x=1 & \text { if } \omega=\{\mathrm{HT}\} \\ x=2 & \text { if } \omega=\{\mathrm{TH}\} \\ x=3 & \text { if } \omega=\{\mathrm{TT}\} \end{array}

Alternatively, if we were interested in only the number of heads that were flipped, we could define a different random variable $Y$ :

\begin{array}{ll} y=2 & \text { if } \omega=\{\mathrm{HH}\} \\ y=1 & \text { if } \omega=\{\mathrm{HT}\} \\ y=1 & \text { if } \omega=\{\mathrm{TH}\} \\ y=0 & \text { if } \omega=\{\mathrm{TT}\} \end{array}
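To see how the two mappings differ in practice, the sketch below builds the distribution induced by each random variable. It assumes a fair coin and independent tosses, so each of the four outcomes has probability 1/4; the helper `distribution` is a hypothetical name introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Outcomes of two tosses, assumed fair and independent (probability 1/4 each).
outcomes = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in outcomes}

# X labels every outcome with its own number; Y counts the heads.
X = {"HH": 0, "HT": 1, "TH": 2, "TT": 3}
Y = {w: w.count("H") for w in outcomes}

def distribution(rv):
    """Add up the probabilities of all outcomes that map to the same value."""
    dist = defaultdict(Fraction)
    for w, p in P.items():
        dist[rv[w]] += p
    return dict(dist)

print(distribution(X))  # four values, each with probability 1/4
print(distribution(Y))  # three values; y = 1 collects both HT and TH
```

Because $Y$ sends both HT and TH to the same value, its range has three members rather than four, and those members are no longer equally likely.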

Comment: We can see that in cases where we want to define a unique random variable value for every $\omega$ (as is the case for most simple examples we've considered thus far), the numbering associated with the random variable outcomes is arbitrary. Indeed in these simplest cases where there is a one-to-one correspondence between experiment outcomes $\omega$ and random variable values $x$ , we may still casually refer to random variable outcomes as $x=H$ or $x=T$ , for example.

So one might ask: why are we even bothering with random variables in these instances? We shall see the utility of the numerical convention when we start computing properties of our probability functions (i.e. expectation and variance) in the next sections.

NOTE: Perhaps the most amazing feature of random variables is that all of the probabilistic relationships we defined in Chapter 1 for events $(A, B, C)$ (e.g. Conditional Probability, Chain Rule, Bayes' Theorem, etc.) are directly applicable to random variables $(X, Y, Z)$! The reason for this is beyond the scope of the course, but it provides us with a flexible framework for manipulating probabilistic expressions.

Marginalisation - the Total Law revisited

Before moving on, we should point out one more useful property of random variables. Recall our definition of the Total Law of Probability in Section 1.5, where we could "sum away" events $A_{k}$ if they formed a partition of the Sample Space $\Omega$ .

For random variables, we have an analogous operation that we will refer to as "Marginalisation" (for now do not worry where this name comes from; it will become apparent later in the Chapter):

\begin{aligned} P(X) & =\sum_{y} P(X, y) \\ P(X) & =\sum_{y} \sum_{z} P(X, y, z) \end{aligned}

This is a handy tool that provides a vehicle for linking complex probabilistic expressions (e.g. $P(X, Y, Z)$) to simpler ones (e.g. $P(X)$). To quickly prove the first expression, let us decompose $P(X, y)$ using the Chain Rule:

\begin{aligned} P(X) & =\sum_{y} P(X, y) \\ & =\sum_{y} P(y \mid X) P(X) \\ & =P(X) \sum_{y} P(y \mid X) \\ & =P(X) \end{aligned}

The last step holds because $\sum_{y} P(y \mid X)=1$: a conditional distribution sums to 1 over all possible values of $y$ by definition.
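A small numerical check can make the marginalisation argument concrete. The joint probabilities in this sketch are hypothetical values chosen only for illustration (they are not taken from the text); the code sums over $y$ to recover $P(X)$ and confirms that each conditional distribution $P(y \mid X)$ sums to 1.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y); illustrative values only.
joint = {
    (0, 0): F(1, 10), (0, 1): F(3, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginalisation: P(x) = sum over y of P(x, y).
P_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}

# Chain Rule rearranged: P(y | x) = P(x, y) / P(x).
P_y_given_x = {(y, x): joint[(x, y)] / P_x[x] for x in xs for y in ys}

# For every fixed x, the conditional probabilities over y should sum to 1.
conditional_totals = {x: sum(P_y_given_x[(y, x)] for y in ys) for x in xs}

print(P_x)
print(conditional_totals)
```

Exact fractions are used instead of floating-point numbers so that the equalities hold exactly rather than up to rounding error.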

Random Variables & Probability Distributions

Discrete Random Variables