
Multivariate Distributions

Up to this point we have mostly focused on single (univariate) random variables (i.e. $f_X(x)$). However, in certain experiments it might be appropriate to explore the relationships between several random variables, such as the relationship between blood cholesterol and heart disease. In this section, we will mostly focus on the bivariate case (e.g. $f_{X,Y}(x,y)$), that is, pairs of random variables $(X, Y)$, as the main concepts can easily be extended to the general multivariate case.

Joint Cumulative Distribution Function

We begin by defining the joint cumulative distribution function (CDF), which we recall from our analysis of single (univariate) random variables is the same for discrete and continuous random variables:

The joint cumulative distribution function (joint CDF) of the random variables $X$ and $Y$ is given by the function

F_{X, Y}(x, y)=P(X \leq x, Y \leq y) \quad \text { for } x, y \in \mathbb{R}

Recall that the comma here represents the intersection of the two events (i.e. $\{X \leq x\} \cap \{Y \leq y\}$). The function $F_{X,Y}(x,y)$ evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes a value less than or equal to $x$, and the random variable $Y$ takes a value less than or equal to $y$.

It should be noted that this definition also applies to mixed distributions (i.e. when $X$ and $Y$ are a mixture of discrete and continuous random variables). Joint CDFs have the following properties:

  1. $F_{X,Y}(x,y) \in [0,1]$ for all $(x, y) \in \mathbb{R}^2$
  2. $F_{X,Y}(x,y) \rightarrow 0$ as $x \rightarrow -\infty$ or $y \rightarrow -\infty$
  3. $F_{X,Y}(x,y) \rightarrow F_X(x)$ as $y \rightarrow \infty$ and $F_{X,Y}(x,y) \rightarrow F_Y(y)$ as $x \rightarrow \infty$

Property 3 stems from the fact that if $y \rightarrow \infty$, then the event $Y \leq y$ occurs with probability 1 (and vice versa for the case where $x \rightarrow \infty$).

Joint Discrete Random Variables

Suppose that the random variables $X, Y$ are both discrete. It is straightforward to generalise the univariate definition of the PMF to the bivariate (and multivariate) case:

The joint probability mass function (joint PMF) of the discrete random variables $X$ and $Y$ is given by the function

f_{X, Y}(x, y)=P(X=x, Y=y) \quad \text { for } x, y \in \mathbb{R}

The function $f_{X,Y}(x,y)$ evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes the value $x$, and the random variable $Y$ takes the value $y$.

If we know the joint PMF $f_{X,Y}(x,y)$, we can compute the joint CDF $F_{X,Y}(x,y)$ by summation of the appropriate probabilities:

F_{X, Y}(x, y)=\sum_{u \leq x} \sum_{v \leq y} f_{X, Y}(u, v)

As in the univariate case, the total probability (i.e. summation over all values for $x$ and $y$) must equal 1:

\sum_{\forall x} \sum_{\forall y} f_{X, Y}(x, y)=1

We can display the values of the joint PMF in a two-dimensional table, with the rows representing the values of $X$ and the columns representing the values of $Y$.

Example: Joint PMF for Two Fair Dice

Let's consider our favourite PMF example where we roll two fair dice. We will define $X$ to be the usual sum of the two dice (see the Examples in Section 2.2 and the PMF shown in Figure 3), and $Y$ to be the larger of the two numbers (i.e. $Y=\max(\omega_1, \omega_2)$).

We can compute each of the entries for $f_{X,Y}(x,y)$ by fixing a value for $Y=y$ (the largest number on either die), fixing a value for $X=x$ (the sum of the two dice), and then checking whether the value of the other die, $\omega = x - y$, is consistent with $y$ being the maximum:

\begin{aligned} \text { if } \omega \notin\{1,2,3,4,5,6\} \text { or } \omega>y: & \quad f_{X, Y}(x, y)=0 \\ \text { if } \omega \in\{1,2,3,4,5,6\} \text { and } \omega=y \text { (i.e. } y=\tfrac{x}{2}): & \quad f_{X, Y}(x, y)=\frac{1}{36} \\ \text { if } \omega \in\{1,2,3,4,5,6\} \text { and } \omega<y \text { (i.e. } y \neq \tfrac{x}{2}): & \quad f_{X, Y}(x, y)=\frac{2}{36} \end{aligned}

Let's see how this applies to a few scenarios for $Y=3$:

\begin{aligned} & f_{X, Y}(X=3, Y=3)=P(X=3, Y=3)=0 \\ & f_{X, Y}(X=4, Y=3)=P(X=4, Y=3)=\frac{2}{36} \\ & f_{X, Y}(X=5, Y=3)=P(X=5, Y=3)=\frac{2}{36} \\ & f_{X, Y}(X=6, Y=3)=P(X=6, Y=3)=\frac{1}{36} \end{aligned}

By enumerating over all $X$ and $Y$, we can compute the entire joint PMF $f_{X,Y}(x,y)$ as:

| $f_{X,Y}(x,y)$ | $Y=1$ | $Y=2$ | $Y=3$ | $Y=4$ | $Y=5$ | $Y=6$ |
|---|---|---|---|---|---|---|
| $X=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 |
| $X=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 |
| $X=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 |
| $X=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 |
| $X=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 |
| $X=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 |
| $X=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 |
| $X=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $X=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 |
| $X=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 |
| $X=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 |

From the joint PMF we can directly compute a number of joint queries:

  1. $F_{X,Y}(4,4)=\frac{6}{36}$
  2. $F_{X,Y}(12,6)=1$
  3. $P(X=6, 3<Y \leq 6)=\frac{4}{36}$
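
Since this joint PMF is just a finite table, such queries are easy to check programmatically. Below is a minimal Python sketch (the names `joint_pmf` and `joint_cdf` are our own, introduced purely for illustration) that enumerates the 36 equally likely outcomes and reproduces the three results above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice and accumulate
# the joint PMF of X = sum and Y = max as a dictionary {(x, y): probability}.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

def joint_cdf(x, y):
    """Joint CDF F(x, y) = P(X <= x, Y <= y), obtained by summing the PMF."""
    return sum(p for (u, v), p in joint_pmf.items() if u <= x and v <= y)

print(joint_cdf(4, 4))    # 1/6  (= 6/36)
print(joint_cdf(12, 6))   # 1
print(sum(p for (u, v), p in joint_pmf.items() if u == 6 and 3 < v <= 6))  # 1/9 (= 4/36)
print(sum(joint_pmf.values()))  # 1, the total probability
```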

Joint Continuous Random Variables

Let us now consider the case where the random variables $X$ and $Y$ are both continuous.

The random variables $X$ and $Y$ are jointly continuous if there exists a function $f_{X,Y}(x,y)$, called the joint probability density function (joint PDF) of $X, Y$, with the following property:

P[(X, Y) \in A]=\iint_{A} f_{X, Y}(x, y) \, d x \, d y

for any region $A$. Integrating the PDF over a region $A$ gives the probability that $X, Y$ take values within that region. The order of integration is not important (as we shall see in the example below).

If the region $A$ of interest is rectangular, we have:

P\left(x_{1} \leq X \leq x_{2}, y_{1} \leq Y \leq y_{2}\right)=\int_{y_{1}}^{y_{2}} \int_{x_{1}}^{x_{2}} f_{X, Y}(x, y) \, d x \, d y

The total probability must be equal to 1, so if we integrate the joint PDF everywhere, we obtain:

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d x \, d y=1

The joint CDF and PDF are linked in the usual way, but we need to integrate/differentiate twice:

\begin{aligned} F_{X, Y}(x, y) & =\int_{-\infty}^{y} \int_{-\infty}^{x} f_{X, Y}(u, v) \, d u \, d v \\ f_{X, Y}(x, y) & =\frac{\partial^{2}}{\partial x \partial y} F_{X, Y}(x, y) \end{aligned}

Example: A Simple Joint PDF for Continuous RVs

Consider the PDF for two continuous random variables, $X$ and $Y$, given by:

f_{X, Y}(x, y)= \begin{cases}k x^{2} y & \text { if } x, y \in[0,1] \\ 0 & \text { otherwise }\end{cases}

(a) Find the value of $k$ so that $f_{X,Y}(x,y)$ is a valid joint PDF.

Solution:

The expression above is a joint PDF if and only if $\iint f_{X,Y}(x,y) \, dx \, dy = 1$. So, integrating the PDF over the correct range ($[0,1]$ for both variables), and integrating over $x$ first, we have:

\begin{aligned} 1 & =\int_{0}^{1} \int_{0}^{1} k x^{2} y \, d x \, d y=k \int_{0}^{1} \int_{0}^{1} x^{2} y \, d x \, d y \\ 1 & =k \int_{0}^{1}\left[\frac{x^{3} y}{3}\right]_{x=0}^{x=1} d y \\ 1 & =k \int_{0}^{1} \frac{y}{3} \, d y \\ 1 & =k\left[\frac{y^{2}}{6}\right]_{y=0}^{y=1} \\ 1 & =\frac{k}{6} \end{aligned}

Thus $k=6$. The complete expression for $f_{X,Y}$ then becomes:

f_{X, Y}(x, y)= \begin{cases}6 x^{2} y & \text { if } x, y \in[0,1] \\ 0 & \text { otherwise }\end{cases}

(b) Compute $P(X>Y)$.

Solution:

As we mentioned in the definition above, we need to integrate over the area $A$ defined by $X>Y$, and the order of integration (i.e. integrating over $x$ or $y$ first) should not matter. However, for non-rectangular areas (i.e. where there is a dependence between $x$ and $y$) we must be careful in determining our range of integration.

Trick for Determining Integration Bounds for Non-rectangular Areas:

1.) First construct a table that specifies the bounds on each of the variables. For this particular example we have:

The chain of inequalities for this region is $1 \geq x > y \geq 0$, which gives the following bounds:

|  | $x$ | $y$ |
|---|---|---|
| Upper bound | $1$ | $x$ |
| Lower bound | $y$ | $0$ |

2.) Now we integrate the first variable over the range specified in the table above. The two possibilities for $x$ and $y$ are summarised below:

\iint_{y}^{1} f_{X, Y}(x, y) \, d x \, d y \qquad \qquad \iint_{0}^{x} f_{X, Y}(x, y) \, d y \, d x

3.) Specifying the outer integration range for the second variable is a bit trickier, as we can no longer use the first variable in the range (it will have already been removed from the expression by the first integration). In this case, if the first variable appears in the range of the outer integration, we set it to the value of its corresponding bound (as read from the table), shown below:

\begin{aligned} \int_{0}^{x_{UB}} \int_{y}^{1} f_{X, Y}(x, y) \, d x \, d y & =\int_{0}^{1} \int_{y}^{1} f_{X, Y}(x, y) \, d x \, d y \\ \int_{y_{LB}}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x & =\int_{0}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x \end{aligned}

Let us now compute $P(X>Y)$ by integrating over $y$ first and then $x$:

\begin{aligned} P(X>Y) & =\int_{0}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x=\int_{0}^{1} \int_{0}^{x} 6 x^{2} y \, d y \, d x \\ & =\int_{0}^{1} 6 x^{2}\left[\frac{y^{2}}{2}\right]_{0}^{x} d x=\int_{0}^{1} 6 x^{2} \cdot \frac{x^{2}}{2} \, d x \\ & =\int_{0}^{1} 3 x^{4} \, d x=3\left[\frac{x^{5}}{5}\right]_{0}^{1} \\ & =\frac{3}{5} \end{aligned}

We leave it to the reader to verify that we obtain the same answer when integrating over $x$ first and then $y$.
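
As a quick sanity check, both parts of this example can also be evaluated numerically. The sketch below assumes SciPy is available and uses `scipy.integrate.dblquad` (which expects the integrand with the inner variable listed first); the function name `f` is our own.

```python
from scipy import integrate

# Joint PDF from the example: f(x, y) = 6 x^2 y on the unit square, 0 elsewhere.
def f(x, y):
    return 6 * x**2 * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Total probability over [0, 1] x [0, 1]: should be 1, confirming k = 6.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)

# P(X > Y): outer integral over x in [0, 1], inner integral over y in [0, x].
p_x_gt_y, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, lambda x: x)

print(total)      # ~1.0
print(p_x_gt_y)   # ~0.6  (= 3/5)
```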

Marginal Distributions

We introduced the concept of marginalisation in Section 2.1.1, and we saw that it is essentially the Law of Total Probability applied to random variables. Here we provide a more formal definition of marginalisation for discrete and continuous random variables and use examples to illustrate it.

The marginal probability mass functions (marginal PMFs) of the discrete random variables $X$ and $Y$ are the functions

f_{X}(x)=\sum_{\forall y} f_{X, Y}(x, y) \quad \text { and } \quad f_{Y}(y)=\sum_{\forall x} f_{X, Y}(x, y)

The marginal of $X$ gives us the probability distribution of the random variable $X$ alone, ignoring any information about $Y$, and vice versa.

Example: Marginalisation of the Joint PMF for Two Dice

Consider again our example of a joint PMF $f_{X,Y}(x,y)$, where $X$ was defined to be the sum of the two dice and $Y$ to be the larger of the two numbers. Using the above definition, we can compute $f_X(x)$ by summing over all $y$ (i.e. over all columns) for a given $x$; the resulting values are shown in the far right column below. Likewise, we can compute $f_Y(y)$ by summing over all $x$ (i.e. over all rows) for a given $y$; the resulting values are shown in the bottom row below:

| $f_{X,Y}(x,y)$ | $Y=1$ | $Y=2$ | $Y=3$ | $Y=4$ | $Y=5$ | $Y=6$ | $f_X(x)$ |
|---|---|---|---|---|---|---|---|
| $X=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 | 1/36 |
| $X=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 | 2/36 |
| $X=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 | 3/36 |
| $X=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 | 4/36 |
| $X=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 | 5/36 |
| $X=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 6/36 |
| $X=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 | 5/36 |
| $X=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 | 4/36 |
| $X=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 | 3/36 |
| $X=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $X=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 | 1/36 |
| $f_Y(y)$ | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 |  |

Notice that the resulting $f_X(x)$ and $f_Y(y)$ are indeed PMFs as they satisfy the two requirements (i.e. they are bounded between zero and one, and they sum to one). The univariate PMFs $f_X(x)$ and $f_Y(y)$ are commonly written in the margins of the joint PMF table (as shown above), and are known as the marginals (hence the term "Marginalisation").

Pause and Reflect 1: Recall we defined marginalisation in Section 2.1.1 as $P(X)=\sum_{\forall y} P(X, y)$. We have simply re-expressed this as $f_X(x)=\sum_{\forall y} f_{X,Y}(x,y)$ using PMF functions for discrete random variables; these two expressions are identical.

Pause and Reflect 2: Does $f_X(x)$ look familiar? Check out Figure 3 and the table above it.

Pause and Reflect 3: Note that it is generally not possible to go the other way around; that is, to reconstruct the full joint PMF table $f_{X,Y}(x,y)$ from $f_X(x)$ and $f_Y(y)$ alone (unless the random variables are independent, which will be discussed in the next section).
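
The marginal row and column of the table are easy to reproduce in code. Continuing the dice sketch from earlier (again, the variable names are our own), we simply sum the joint PMF over the variable we want to remove.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of X = sum and Y = max for two fair dice, as before.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

# Marginalise: f_X(x) = sum over y of f_{X,Y}(x, y), and likewise for f_Y(y).
f_X, f_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint_pmf.items():
    f_X[x] += p
    f_Y[y] += p

print(dict(sorted(f_X.items())))  # 1/36, 2/36, ..., 6/36, ..., 2/36, 1/36 for x = 2..12
print(dict(sorted(f_Y.items())))  # 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 for y = 1..6
print(sum(f_X.values()), sum(f_Y.values()))  # both equal 1
```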

In an analogous fashion we can define marginalisation for continuous random variables as follows:

The marginal probability density functions (marginal PDFs) of the continuous random variables $X$ and $Y$ are the functions

f_{X}(x)=\int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d y \quad \text { and } \quad f_{Y}(y)=\int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d x

This is essentially the same as the previous definition; in a continuous setting, getting rid of a variable requires integration rather than summation. Let's consider a few examples to illustrate marginalising joint PDFs.

Example: Marginalisation of Joint PDFs for Continuous RVs

1.) Find $f_X(x)$ for the following joint PDF:

\begin{gathered} f_{X, Y}(x, y)=6(1-x-y) \quad \text { for } 0 \leq x+y \leq 1 \text { and } 0 \leq x \leq 1, \ 0 \leq y \leq 1 \\ f_{X}(x)=\int_{0}^{1-x} 6(1-x-y) \, d y \\ =[6 y]_{0}^{1-x}-[6 x y]_{0}^{1-x}-\left[3 y^{2}\right]_{0}^{1-x} \\ =3-6 x+3 x^{2} \end{gathered}

2.) Find the marginal density functions $f_X(x)$ and $f_Y(y)$ for the following joint PDF:

\begin{gathered} f_{X, Y}(x, y)=\frac{3}{2} x^{3} y^{2} \quad \text { for } 0<y<2, \quad 0<x<1 \\ f_{X}(x)=\frac{3}{2} x^{3} \int_{0}^{2} y^{2} \, d y=\frac{3}{2} x^{3}\left[\frac{y^{3}}{3}\right]_{0}^{2}=4 x^{3} \\ f_{Y}(y)=\frac{3}{2} y^{2} \int_{0}^{1} x^{3} \, d x=\frac{3}{2} y^{2}\left[\frac{x^{4}}{4}\right]_{0}^{1}=\frac{3}{8} y^{2} \end{gathered}
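
Both marginalisations can be checked symbolically. Below is a minimal SymPy sketch (assuming SymPy is installed; the symbol and variable names are our own) for the second joint PDF:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Joint PDF (3/2) x^3 y^2 on 0 < x < 1, 0 < y < 2.
f_xy = sp.Rational(3, 2) * x**3 * y**2

f_X = sp.integrate(f_xy, (y, 0, 2))  # integrate y out to get the marginal of X
f_Y = sp.integrate(f_xy, (x, 0, 1))  # integrate x out to get the marginal of Y

print(f_X)                           # 4*x**3
print(f_Y)                           # 3*y**2/8
print(sp.integrate(f_X, (x, 0, 1)))  # 1 (each marginal is itself a valid PDF)
print(sp.integrate(f_Y, (y, 0, 2)))  # 1
```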

Independence of Two Random Variables

Though we have previously encountered independent random variables, we can now give a formal definition.

For the random variables $X$ and $Y$, if the events $X \leq x$ and $Y \leq y$ are independent for all $x, y \in \mathbb{R}$, then

F_{X, Y}(x, y)=P(X \leq x, Y \leq y)=P(X \leq x) P(Y \leq y)=F_{X}(x) F_{Y}(y)

and we say that $X$ and $Y$ are independent random variables.

If $X, Y$ are independent, it is straightforward to show that $f_{X,Y}(x,y)=f_X(x) f_Y(y)$, which for discrete random variables is exactly what we had in Chapter 1 ($P(A, B)=P(A) P(B)$ if $A$ and $B$ are independent). We will exploit this property to determine if $X$ and $Y$ are independent, where we will be left with functions $f_X(x)$ and $f_Y(y)$ rather than numbers (e.g. $P(A)$ and $P(B)$).

Example: Independence of a Simple Joint PDF

1.) Consider the previous example for $f_{X,Y}(x,y)=6x^2 y$ (for $x, y \in [0,1]$); are $X$ and $Y$ independent?

Solution:

From the definition of independence, $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x) f_Y(y)$; therefore we need to compute $f_X$ and $f_Y$. This can be done using marginalisation:

\begin{aligned} & f_{X}(x)=6 x^{2} \int_{0}^{1} y \, d y=6 x^{2}\left[\frac{y^{2}}{2}\right]_{0}^{1}=3 x^{2} \\ & f_{Y}(y)=6 y \int_{0}^{1} x^{2} \, d x=6 y\left[\frac{x^{3}}{3}\right]_{0}^{1}=2 y \end{aligned}

Combining the terms, we compute $f_X(x) f_Y(y)=3x^2 \cdot 2y = 6x^2 y = f_{X,Y}(x,y)$, so $X$ and $Y$ are indeed independent.

2.) How about the joint PDF defined by $f_{X,Y}(x,y)=\frac{3}{2} x^3 y^2$ in the example above? Do the marginal PDFs show that $X$ and $Y$ are independent?

f_{X}(x) f_{Y}(y)=4 x^{3} \cdot \frac{3}{8} y^{2}=\frac{3}{2} x^{3} y^{2}=f_{X, Y}(x, y)

Yes: the product of the marginals again recovers the joint PDF, so $X$ and $Y$ are independent.
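
For joint PDFs with a rectangular support, this factorisation test can be automated. Below is a small SymPy sketch (the helper `is_independent` is our own construction, not a standard routine) that checks whether the joint PDF equals the product of its marginals.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

def is_independent(f_xy, x_bounds, y_bounds):
    """Check f_{X,Y}(x, y) == f_X(x) * f_Y(y) symbolically.

    Assumes the support is the rectangle x_bounds x y_bounds; on a
    non-rectangular support the variables cannot be independent anyway.
    """
    f_X = sp.integrate(f_xy, (y, y_bounds[0], y_bounds[1]))
    f_Y = sp.integrate(f_xy, (x, x_bounds[0], x_bounds[1]))
    return sp.simplify(f_xy - f_X * f_Y) == 0

print(is_independent(6 * x**2 * y, (0, 1), (0, 1)))                     # True
print(is_independent(sp.Rational(3, 2) * x**3 * y**2, (0, 1), (0, 2)))  # True
```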

Conditional Distributions

Now suppose that we know the value taken by the random variable $X$, and we wish to work out how this affects the distribution of $Y$. In the discrete case, this is a simple conditional probability calculation:

P(Y=y \mid X=x)=\frac{P(X=x, Y=y)}{P(X=x)}

In this expression, the RHS is the ratio of the joint PMF and the marginal PMF of $X$. The LHS is a conditional PMF, and we can use the same definition for the general case.

The conditional probability mass/density function (conditional PMF/PDF) of the random variables $X$ and $Y$ is the function

f_{Y \mid X}(y \mid x)=\frac{f_{X, Y}(x, y)}{f_{X}(x)} \quad \text { where } x, y \in \mathbb{R} \text { and } f_{X}(x)>0

and similarly for $f_{X \mid Y}(\cdot \mid y)$. The function $f_{Y \mid X}(\cdot \mid x)$ gives the distribution of the random variable $Y$ conditional on the event $X=x$.

We have previously mentioned that $f_{X,Y}(x,y)=f_X(x) f_Y(y)$ if $X, Y$ are independent. In this case the conditional PMF/PDF reduces to:

f_{Y \mid X}(y \mid x)=\frac{f_{X, Y}(x, y)}{f_{X}(x)}=\frac{f_{X}(x) f_{Y}(y)}{f_{X}(x)}=f_{Y}(y)

This is not surprising; if the random variables are independent, the value of $X$ does not contain any information about the value of $Y$, so the conditional is equal to the marginal (compare to the similar derivation in Section 1.7).
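
Returning to the dice example, a conditional PMF is just a renormalised row of the joint PMF table. Below is a small sketch (helper names are our own) computing the distribution of the maximum $Y$ given that the sum is $X=6$, together with its conditional expectation (anticipating the next subsection).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of X = sum and Y = max for two fair dice, as in the earlier examples.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

def conditional_Y_given_X(x):
    """Conditional PMF f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)."""
    f_X_x = sum(p for (u, _), p in joint_pmf.items() if u == x)
    return {v: p / f_X_x for (u, v), p in joint_pmf.items() if u == x}

cond = conditional_Y_given_X(6)
print(cond)                                  # {3: 1/5, 4: 2/5, 5: 2/5}
print(sum(cond.values()))                    # 1, as every conditional PMF must
print(sum(v * p for v, p in cond.items()))   # 21/5, i.e. E[Y | X = 6]
```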

Expectation for Multivariate Distributions

Now that we can compute the conditional PMFs and PDFs $f_{X \mid Y}(x \mid y)$ (or $f_{Y \mid X}(y \mid x)$), we can define expectations of conditional distributions:

The conditional expectation of the random variable $X$ given $Y=y$ is defined as:

E[X \mid Y=y]= \begin{cases}\sum_{\forall x} x f_{X \mid Y}(x \mid y) & \text { if } X \text { is discrete } \\ \int_{-\infty}^{\infty} x f_{X \mid Y}(x \mid y) \, d x & \text { if } X \text { is continuous }\end{cases}

Conditional expectation allows us to compute the average value of $X$ if we already know the value of $Y$.

To compute the expectation of a joint function of two random variables, we sum/integrate at every point $(x, y)$ the value of the function multiplied by the joint PMF/PDF:

E[g(X, Y)]= \begin{cases}\sum_{\forall x} \sum_{\forall y} g(x, y) f_{X, Y}(x, y) & \text { if } X, Y \text { are discrete } \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{X, Y}(x, y) \, d x \, d y & \text { if } X, Y \text { are continuous }\end{cases}

As we have shown already, expectation is a linear operator, so for random variables $X, Y$ and constants $a, b, c \in \mathbb{R}$, we have

E[a X+b Y+c]=a E[X]+b E[Y]+c

Example: Expectation of a Joint PDF

Consider an electrical circuit with two resistors $(R_X, R_Y)$ wired in parallel. Let's define continuous random variables $X$ and $Y$ to be the resistance of each resistor, which varies between 10 and 20 ohms according to the following PDF:

f_{X, Y}(x, y)=\frac{1}{3000}(x+y) \quad 10 \leq x \leq 20, \ 10 \leq y \leq 20

The resistance $R$ of a parallel circuit is traditionally defined as:

\frac{1}{R}=\frac{1}{X}+\frac{1}{Y}

Given that the resistances are distributed according to the above joint PDF, what is the expected resistance of the circuit?

Solution:

\begin{aligned} E[R(X, Y)] & =\int_{10}^{20} \int_{10}^{20} R(x, y) f_{X, Y}(x, y) \, d x \, d y \\ & =\frac{1}{3000} \int_{10}^{20} \int_{10}^{20} \frac{x y}{x+y} \cdot(x+y) \, d x \, d y \\ & =\frac{1}{3000}\left[\frac{x^{2}}{2}\right]_{10}^{20}\left[\frac{y^{2}}{2}\right]_{10}^{20}=7.5 \end{aligned}
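
The double integral above is straightforward to confirm numerically. Below is a brief sketch using `scipy.integrate.dblquad` (assuming SciPy is available; the lambdas are our own shorthand):

```python
from scipy import integrate

# Joint PDF of the two resistances, and the parallel-resistance function R(x, y).
f = lambda x, y: (x + y) / 3000.0
R = lambda x, y: (x * y) / (x + y)

# E[R(X, Y)] is the double integral of R(x, y) * f(x, y) over the square [10, 20]^2.
# dblquad expects the integrand as func(y, x), with the inner variable listed first.
expected_R, _ = integrate.dblquad(lambda y, x: R(x, y) * f(x, y), 10, 20, 10, 20)
print(expected_R)  # ~7.5
```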

Covariance

It is often useful to characterise the nature of the dependence between two random variables. Here we will just consider the covariance, as defined below.

The covariance of two random variables $X$ and $Y$ is defined as

\operatorname{Cov}[X, Y]=E[(X-E[X])(Y-E[Y])]=E[X Y]-E[X] E[Y]

Covariance is a measure of linear association between two random variables.

Positive covariance indicates that large values of $X$ tend to be associated with large values of $Y$. The higher the covariance, the stronger the relationship. Conversely, if $\operatorname{Cov}[X, Y]<0$, then large values of $X$ tend to be associated with small values of $Y$, and vice versa. A covariance near zero indicates that there is no simple linear relationship between the two variables.

Consider the case where $X$ and $Y$ are independent random variables:

\begin{aligned} E[X Y] & =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X, Y}(x, y) \, d x \, d y=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X}(x) f_{Y}(y) \, d x \, d y \\ & =\int_{-\infty}^{\infty} x f_{X}(x) \, d x \int_{-\infty}^{\infty} y f_{Y}(y) \, d y=E[X] E[Y] \end{aligned}

which results in $\operatorname{Cov}[X, Y]=0$ (and similarly for the discrete case). However, we must be careful to note that the converse does not hold! That is, if $\operatorname{Cov}[X, Y]=0$ we cannot conclude that $X$ and $Y$ are necessarily independent.

Example: Covariance of the Joint PMF for Two Dice

For one last time, recall the joint PMF $f_{X,Y}(x,y)$, where $X$ was defined to be the sum of two dice and $Y$ to be the larger of the two numbers. We have also computed the marginal functions $f_X(x)$ and $f_Y(y)$ in an earlier example. The covariance between the discrete random variables $X$ and $Y$ is:

\operatorname{Cov}[X, Y]=E[X Y]-E[X] E[Y]

Recall we have already computed $E[X]=7$ (the expectation for the sum of two dice). Thus all we need is $E[XY]$ and $E[Y]$:

\begin{aligned} E[X Y] & =\sum_{\forall x} \sum_{\forall y} x y f_{X, Y}(x, y) \\ & =2 \cdot 1 \cdot \frac{1}{36}+3 \cdot 2 \cdot \frac{2}{36}+4 \cdot 2 \cdot \frac{1}{36}+4 \cdot 3 \cdot \frac{2}{36}+5 \cdot 3 \cdot \frac{2}{36}+5 \cdot 4 \cdot \frac{2}{36}+\ldots \\ & =\frac{1232}{36} \approx 34.2222 \\ E[Y] & =\sum_{\forall y} y f_{Y}(y) \\ & =1 \cdot \frac{1}{36}+2 \cdot \frac{3}{36}+3 \cdot \frac{5}{36}+4 \cdot \frac{7}{36}+5 \cdot \frac{9}{36}+6 \cdot \frac{11}{36}=\frac{161}{36} \approx 4.4722 \\ \operatorname{Cov}[X, Y] & =\frac{1232}{36}-7 \cdot \frac{161}{36}=\frac{105}{36} \approx 2.92 \end{aligned}

This positive result for the covariance makes intuitive sense: as the larger of the two numbers increases (i.e. $Y$ increases), the sum of the two dice ($X$) is also expected to increase.
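
Because everything here is a finite sum, the covariance can also be computed exactly by enumeration. The sketch below (variable names are our own) gives the exact value $\frac{105}{36} = \frac{35}{12} \approx 2.92$.

```python
from fractions import Fraction
from itertools import product

# Exact covariance of X = sum and Y = max for two fair dice, by direct enumeration.
outcomes = [(d1 + d2, max(d1, d2)) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

E_X = sum(x * p for x, _ in outcomes)       # 7
E_Y = sum(y * p for _, y in outcomes)       # 161/36
E_XY = sum(x * y * p for x, y in outcomes)  # 1232/36

cov = E_XY - E_X * E_Y
print(cov, float(cov))  # 35/12, ~2.92
```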

For random variables $X, Y, Z$ and constants $a, b, c, d$, we can use the properties of expectation ($E[aX]=aE[X]$ and $E[a]=a$ for a constant $a$) to derive the following relationships:

  1. $\operatorname{Cov}[X, a]=0$
  2. $\operatorname{Cov}[aX+b, cY+d]=ac\operatorname{Cov}[X, Y]$
  3. $\operatorname{Cov}[X+Y, Z]=\operatorname{Cov}[X, Z]+\operatorname{Cov}[Y, Z]$

We leave it to the reader to prove these properties.

We say that covariance is a bilinear operator, in the sense that it is linear in both its inputs. Notice that $\operatorname{Cov}[X, X]=\operatorname{Var}[X]$, so the second of the properties above explains why $\operatorname{Var}[aX+b]=a^2\operatorname{Var}[X]$. The third property implies that

\begin{aligned} \operatorname{Var}[X+Y] & =\operatorname{Cov}[X+Y, X+Y]=\operatorname{Cov}[X, X]+\operatorname{Cov}[X, Y]+\operatorname{Cov}[Y, X]+\operatorname{Cov}[Y, Y] \\ & =\operatorname{Var}[X]+2 \operatorname{Cov}[X, Y]+\operatorname{Var}[Y] \end{aligned}
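
This identity is easy to verify for the dice example with one more short enumeration (the helper `E` below is just our own shorthand for an exact expectation):

```python
from fractions import Fraction
from itertools import product

# Check Var[X + Y] = Var[X] + 2 Cov[X, Y] + Var[Y] exactly for the dice example.
outcomes = [(d1 + d2, max(d1, d2)) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

def E(values):
    """Exact expectation of a sequence of 36 equally likely values."""
    return sum(v * p for v in values)

E_X, E_Y = E(x for x, _ in outcomes), E(y for _, y in outcomes)
var_X = E(x**2 for x, _ in outcomes) - E_X**2
var_Y = E(y**2 for _, y in outcomes) - E_Y**2
cov_XY = E(x * y for x, y in outcomes) - E_X * E_Y
var_sum = E((x + y)**2 for x, y in outcomes) - E(x + y for x, y in outcomes)**2

print(var_sum == var_X + 2 * cov_XY + var_Y)  # True
```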

A deficiency of covariance is that it depends on the units of measurement. For example, if $X$ and $Y$ are time measurements and we change the units from minutes to seconds, the covariance becomes $\operatorname{Cov}[60X, 60Y]=3600\operatorname{Cov}[X, Y]$.