
Multivariate Distributions

Up to this point we have mostly focused on single (univariate) random variables (i.e. $f_X(x)$). However, in certain experiments it might be appropriate to explore the relationships between several random variables, such as the relationship between blood cholesterol and heart disease. In this section, we will mostly focus on the bivariate case (e.g. $f_{X,Y}(x,y)$), that is, pairs of random variables $(X, Y)$, as the main concepts can easily be extended to the general multivariate case.

Joint Cumulative Distribution Function

We begin by defining the joint cumulative distribution function (CDF), which we recall from our analysis of single (univariate) random variables is the same for discrete and continuous random variables:

The joint cumulative distribution function (joint CDF) of the random variables $X$ and $Y$ is given by the function

F_{X, Y}(x, y)=P(X \leq x, Y \leq y) \quad \text { for } x, y \in \mathbb{R}

Recall that the comma here represents the intersection of the two events (i.e. $\{X \leq x\} \cap \{Y \leq y\}$). The function $F_{X,Y}(x,y)$ evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes a value less than or equal to $x$, and the random variable $Y$ takes a value less than or equal to $y$.

It should be noted that this definition also applies to mixed distributions (i.e. when $X$ and $Y$ are a mixture of discrete and continuous random variables). Joint CDFs have the following properties:

  1. $F_{X,Y}(x,y) \in [0,1]$ for all $(x, y) \in \mathbb{R}^2$
  2. $F_{X,Y}(x,y) \rightarrow 0$ as $x \rightarrow -\infty$ or $y \rightarrow -\infty$
  3. $F_{X,Y}(x,y) \rightarrow F_X(x)$ as $y \rightarrow \infty$ and $F_{X,Y}(x,y) \rightarrow F_Y(y)$ as $x \rightarrow \infty$

Property 3 stems from the fact that if $y \rightarrow \infty$, then the event $Y \leq y$ occurs with probability 1 (and vice versa for the case where $x \rightarrow \infty$).

Joint Discrete Random Variables

Suppose that the random variables $X, Y$ are both discrete. It is straightforward to generalise the univariate definition of the PMF to the bivariate (and multivariate) case:

The joint probability mass function (joint PMF) of the discrete random variables $X$ and $Y$ is given by the function

f_{X, Y}(x, y)=P(X=x, Y=y) \quad \text { for } x, y \in \mathbb{R}

The function $f_{X,Y}(x,y)$ evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes the value $x$, and the random variable $Y$ takes the value $y$.

If we know the joint PMF $f_{X,Y}(x,y)$, we can compute the joint CDF $F_{X,Y}(x,y)$ by summation of the appropriate probabilities:

F_{X, Y}(x, y)=\sum_{u \leq x} \sum_{v \leq y} f_{X, Y}(u, v)

As in the univariate case, the total probability (i.e. summation over all values for $x$ and $y$) must equal 1:

\sum_{\forall x} \sum_{\forall y} f_{X, Y}(x, y)=1

We can display the values of the joint PMF in a two-dimensional table, with the rows representing the values of $X$ and the columns representing the values of $Y$.

Example: Joint PMF for Two Fair Dice

Let's consider our favourite PMF example where we roll two fair dice. We will define $X$ to be the usual sum of the two dice (see the Examples in Section 2.2 and the PMF shown in Figure 3), and $Y$ to be the larger of the two numbers (i.e. $Y=\max(\omega_1, \omega_2)$).

We can compute each of the entries for $f_{X,Y}(x,y)$ by fixing a value for $Y=y$ (the largest number on either die), fixing a value for $X=x$ (the sum of the two dice), and then checking whether the value of the other die, $\omega = x - y$, is consistent with $y$ being the maximum:

\begin{aligned} \text { if } \omega \notin\{1,2,3,4,5,6\} \text { or } \omega>y: & \quad f_{X, Y}(x, y)=0 \\ \text { if } \omega \in\{1,2,3,4,5,6\} \text { and } \omega=y \text { (i.e. } y=\tfrac{x}{2}): & \quad f_{X, Y}(x, y)=\frac{1}{36} \\ \text { if } \omega \in\{1,2,3,4,5,6\} \text { and } \omega<y \text { (i.e. } y \neq \tfrac{x}{2}): & \quad f_{X, Y}(x, y)=\frac{2}{36} \end{aligned}

Let's see how this applies to a few scenarios for $Y=3$:

\begin{aligned} & f_{X, Y}(X=3, Y=3)=P(X=3, Y=3)=0 \\ & f_{X, Y}(X=4, Y=3)=P(X=4, Y=3)=\frac{2}{36} \\ & f_{X, Y}(X=5, Y=3)=P(X=5, Y=3)=\frac{2}{36} \\ & f_{X, Y}(X=6, Y=3)=P(X=6, Y=3)=\frac{1}{36} \end{aligned}

By enumerating over all $X$ and $Y$, we can compute the entire joint PMF $f_{X,Y}(x,y)$ as:

| $f_{X,Y}(x,y)$ | $Y=1$ | $Y=2$ | $Y=3$ | $Y=4$ | $Y=5$ | $Y=6$ |
|---|---|---|---|---|---|---|
| $X=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 |
| $X=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 |
| $X=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 |
| $X=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 |
| $X=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 |
| $X=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 |
| $X=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 |
| $X=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $X=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 |
| $X=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 |
| $X=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 |

From the joint PMF we can directly compute a number of joint queries:

  1. $F_{X,Y}(4,4)=\frac{6}{36}$
  2. $F_{X,Y}(12,6)=1$
  3. $P(X=6, 3<Y \leq 6)=\frac{4}{36}$
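
Since this joint PMF is just a finite table, such queries are easy to check programmatically. Below is a minimal Python sketch (the names `joint_pmf` and `joint_cdf` are our own, introduced purely for illustration) that enumerates the 36 equally likely outcomes and reproduces the three results above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice and accumulate
# the joint PMF of X = sum and Y = max as a dictionary {(x, y): probability}.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

def joint_cdf(x, y):
    """Joint CDF F(x, y) = P(X <= x, Y <= y), obtained by summing the PMF."""
    return sum(p for (u, v), p in joint_pmf.items() if u <= x and v <= y)

print(joint_cdf(4, 4))    # 1/6  (= 6/36)
print(joint_cdf(12, 6))   # 1
print(sum(p for (u, v), p in joint_pmf.items() if u == 6 and 3 < v <= 6))  # 1/9 (= 4/36)
print(sum(joint_pmf.values()))  # 1, the total probability
```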

Joint Continuous Random Variables

Let us now consider the case where the random variables $X$ and $Y$ are both continuous.

The random variables $X$ and $Y$ are jointly continuous if there exists a function $f_{X,Y}(x,y)$, called the joint probability density function (joint PDF) of $X, Y$, with the following property:

P[(X, Y) \in A]=\iint_{A} f_{X, Y}(x, y) \, d x \, d y

for any region $A$. Integrating the PDF over a region $A$ gives the probability that $X, Y$ take values within that region. The order of integration is not important (as we shall see in the example below).

If the region $A$ of interest is rectangular, we have:

P\left(x_{1} \leq X \leq x_{2}, y_{1} \leq Y \leq y_{2}\right)=\int_{y_{1}}^{y_{2}} \int_{x_{1}}^{x_{2}} f_{X, Y}(x, y) \, d x \, d y

The total probability must be equal to 1, so if we integrate the joint PDF everywhere, we obtain:

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d x \, d y=1

The joint CDF and PDF are linked in the usual way, but we need to integrate/differentiate twice:

\begin{aligned} F_{X, Y}(x, y) & =\int_{-\infty}^{y} \int_{-\infty}^{x} f_{X, Y}(u, v) \, d u \, d v \\ f_{X, Y}(x, y) & =\frac{\partial^{2}}{\partial x \partial y} F_{X, Y}(x, y) \end{aligned}

Example: A Simple Joint PDF for Continuous RVs

Consider the PDF for two continuous random variables, $X$ and $Y$, given by:

f_{X, Y}(x, y)= \begin{cases}k x^{2} y & \text { if } x, y \in[0,1] \\ 0 & \text { otherwise }\end{cases}

(a) Find the value of $k$ so that $f_{X,Y}(x,y)$ is a valid joint PDF.

Solution:

The expression above is a joint PDF if and only if $\iint f_{X,Y}(x,y) \, dx \, dy = 1$. So, integrating the PDF over the correct range ($[0,1]$ for both variables), and integrating over $x$ first, we have:

\begin{aligned} 1 & =\int_{0}^{1} \int_{0}^{1} k x^{2} y \, d x \, d y=k \int_{0}^{1} \int_{0}^{1} x^{2} y \, d x \, d y \\ 1 & =k \int_{0}^{1}\left[\frac{x^{3} y}{3}\right]_{x=0}^{x=1} d y \\ 1 & =k \int_{0}^{1} \frac{y}{3} \, d y \\ 1 & =k\left[\frac{y^{2}}{6}\right]_{y=0}^{y=1} \\ 1 & =\frac{k}{6} \end{aligned}

Thus $k=6$. The complete expression for $f_{X,Y}$ then becomes:

f_{X, Y}(x, y)= \begin{cases}6 x^{2} y & \text { if } x, y \in[0,1] \\ 0 & \text { otherwise }\end{cases}

(b) Compute $P(X>Y)$.

Solution:

As we mentioned in the definition above, we need to integrate over the area $A$ defined by $X>Y$, and the order of integration (i.e. integrating over $x$ or $y$ first) should not matter. However, for non-rectangular areas (i.e. where there is a dependence between $x$ and $y$) we must be careful in determining our range of integration.

Trick for Determining Integration Bounds for Non-rectangular Areas:

1.) First construct a table that specifies the bounds on each of the variables. For this particular example we have:

The chain of inequalities for this region is $1 \geq x > y \geq 0$, which gives the following bounds:

|  | $x$ | $y$ |
|---|---|---|
| Upper bound | $1$ | $x$ |
| Lower bound | $y$ | $0$ |

2.) Now we integrate the first variable over the range specified in the table above. The two possibilities for $x$ and $y$ are summarised below:

\iint_{y}^{1} f_{X, Y}(x, y) \, d x \, d y \qquad \qquad \iint_{0}^{x} f_{X, Y}(x, y) \, d y \, d x

3.) Specifying the outer integration range for the second variable is a bit trickier, as we can no longer use the first variable in the range (it will have already been removed from the expression by the first integration). In this case, if the first variable appears in the range of the outer integration, we set it to the value of its corresponding bound (as read from the table), shown below:

\begin{aligned} \int_{0}^{x_{UB}} \int_{y}^{1} f_{X, Y}(x, y) \, d x \, d y & =\int_{0}^{1} \int_{y}^{1} f_{X, Y}(x, y) \, d x \, d y \\ \int_{y_{LB}}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x & =\int_{0}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x \end{aligned}

Let us now compute $P(X>Y)$ by integrating over $y$ first and then $x$:

\begin{aligned} P(X>Y) & =\int_{0}^{1} \int_{0}^{x} f_{X, Y}(x, y) \, d y \, d x=\int_{0}^{1} \int_{0}^{x} 6 x^{2} y \, d y \, d x \\ & =\int_{0}^{1} 6 x^{2}\left[\frac{y^{2}}{2}\right]_{0}^{x} d x=\int_{0}^{1} 6 x^{2} \cdot \frac{x^{2}}{2} \, d x \\ & =\int_{0}^{1} 3 x^{4} \, d x=3\left[\frac{x^{5}}{5}\right]_{0}^{1} \\ & =\frac{3}{5} \end{aligned}

We leave it to the reader to verify that we obtain the same answer when integrating over $x$ first and then $y$.
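
As a quick sanity check, both parts of this example can also be evaluated numerically. The sketch below assumes SciPy is available and uses `scipy.integrate.dblquad` (which expects the integrand with the inner variable listed first); the function name `f` is our own.

```python
from scipy import integrate

# Joint PDF from the example: f(x, y) = 6 x^2 y on the unit square, 0 elsewhere.
def f(x, y):
    return 6 * x**2 * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Total probability over [0, 1] x [0, 1]: should be 1, confirming k = 6.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)

# P(X > Y): outer integral over x in [0, 1], inner integral over y in [0, x].
p_x_gt_y, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, lambda x: x)

print(total)      # ~1.0
print(p_x_gt_y)   # ~0.6  (= 3/5)
```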

Marginal Distributions

We introduced the concept of marginalisation in Section 2.1.1, and we saw that it is essentially the Law of Total Probability applied to random variables. Here we provide a more formal definition of marginalisation for discrete and continuous random variables and use examples to illustrate it.

The marginal probability mass functions (marginal PMFs) of the discrete random variables $X$ and $Y$ are the functions

f_{X}(x)=\sum_{\forall y} f_{X, Y}(x, y) \quad \text { and } \quad f_{Y}(y)=\sum_{\forall x} f_{X, Y}(x, y)

The marginal of $X$ gives us the probability distribution of the random variable $X$ alone, ignoring any information about $Y$, and vice versa.

Example: Marginalisation of the Joint PMF for Two Dice

Consider again our example of a joint PMF $f_{X,Y}(x,y)$, where $X$ was defined to be the sum of the two dice and $Y$ to be the larger of the two numbers. Using the above definition, we can compute $f_X(x)$ by summing over all $y$ (i.e. over all columns) for a given $x$; the resulting values are shown in the far right column below. Likewise, we can compute $f_Y(y)$ by summing over all $x$ (i.e. over all rows) for a given $y$; the resulting values are shown in the bottom row below:

| $f_{X,Y}(x,y)$ | $Y=1$ | $Y=2$ | $Y=3$ | $Y=4$ | $Y=5$ | $Y=6$ | $f_X(x)$ |
|---|---|---|---|---|---|---|---|
| $X=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 | 1/36 |
| $X=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 | 2/36 |
| $X=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 | 3/36 |
| $X=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 | 4/36 |
| $X=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 | 5/36 |
| $X=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 6/36 |
| $X=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 | 5/36 |
| $X=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 | 4/36 |
| $X=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 | 3/36 |
| $X=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $X=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 | 1/36 |
| $f_Y(y)$ | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 |  |

Notice that the resulting $f_X(x)$ and $f_Y(y)$ are indeed PMFs as they satisfy the two requirements (i.e. they are bounded between zero and one, and they sum to one). The univariate PMFs $f_X(x)$ and $f_Y(y)$ are commonly written in the margins of the joint PMF table (as shown above), and are known as the marginals (hence the term "Marginalisation").

Pause and Reflect 1: Recall we defined marginalisation in Section 2.1.1 as $P(X)=\sum_{\forall y} P(X, y)$. We have simply re-expressed this as $f_X(x)=\sum_{\forall y} f_{X,Y}(x,y)$ using PMF functions for discrete random variables; these two expressions are identical.

Pause and Reflect 2: Does $f_X(x)$ look familiar? Check out Figure 3 and the table above it.

Pause and Reflect 3: Note that it is generally not possible to go the other way around; that is, to reconstruct the full joint PMF table $f_{X,Y}(x,y)$ from $f_X(x)$ and $f_Y(y)$ alone (unless the random variables are independent, which will be discussed in the next section).
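
The marginal row and column of the table are easy to reproduce in code. Continuing the dice sketch from earlier (again, the variable names are our own), we simply sum the joint PMF over the variable we want to remove.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of X = sum and Y = max for two fair dice, as before.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

# Marginalise: f_X(x) = sum over y of f_{X,Y}(x, y), and likewise for f_Y(y).
f_X, f_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint_pmf.items():
    f_X[x] += p
    f_Y[y] += p

print(dict(sorted(f_X.items())))  # 1/36, 2/36, ..., 6/36, ..., 2/36, 1/36 for x = 2..12
print(dict(sorted(f_Y.items())))  # 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 for y = 1..6
print(sum(f_X.values()), sum(f_Y.values()))  # both equal 1
```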

In an analogous fashion we can define marginalisation for continuous random variables as follows:

The marginal probability density functions (marginal PDFs) of the continuous random variables $X$ and $Y$ are the functions

f_{X}(x)=\int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d y \quad \text { and } \quad f_{Y}(y)=\int_{-\infty}^{\infty} f_{X, Y}(x, y) \, d x

This is essentially the same as the previous definition; in a continuous setting, getting rid of a variable requires integration rather than summation. Let's consider a few examples to illustrate marginalising joint PDFs.

Example: Marginalisation of Joint PDFs for Continuous RVs

1.) Find $f_X(x)$ for the following joint PDF:

\begin{gathered} f_{X, Y}(x, y)=6(1-x-y) \quad \text { for } 0 \leq x+y \leq 1 \text { and } 0 \leq x \leq 1, \ 0 \leq y \leq 1 \\ f_{X}(x)=\int_{0}^{1-x} 6(1-x-y) \, d y \\ =[6 y]_{0}^{1-x}-[6 x y]_{0}^{1-x}-\left[3 y^{2}\right]_{0}^{1-x} \\ =3-6 x+3 x^{2} \end{gathered}

2.) Find the marginal density functions $f_X(x)$ and $f_Y(y)$ for the following joint PDF:

\begin{gathered} f_{X, Y}(x, y)=\frac{3}{2} x^{3} y^{2} \quad \text { for } 0<y<2, \quad 0<x<1 \\ f_{X}(x)=\frac{3}{2} x^{3} \int_{0}^{2} y^{2} \, d y=\frac{3}{2} x^{3}\left[\frac{y^{3}}{3}\right]_{0}^{2}=4 x^{3} \\ f_{Y}(y)=\frac{3}{2} y^{2} \int_{0}^{1} x^{3} \, d x=\frac{3}{2} y^{2}\left[\frac{x^{4}}{4}\right]_{0}^{1}=\frac{3}{8} y^{2} \end{gathered}
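
Both marginalisations can be checked symbolically. Below is a minimal SymPy sketch (assuming SymPy is installed; the symbol and variable names are our own) for the second joint PDF:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Joint PDF (3/2) x^3 y^2 on 0 < x < 1, 0 < y < 2.
f_xy = sp.Rational(3, 2) * x**3 * y**2

f_X = sp.integrate(f_xy, (y, 0, 2))  # integrate y out to get the marginal of X
f_Y = sp.integrate(f_xy, (x, 0, 1))  # integrate x out to get the marginal of Y

print(f_X)                           # 4*x**3
print(f_Y)                           # 3*y**2/8
print(sp.integrate(f_X, (x, 0, 1)))  # 1 (each marginal is itself a valid PDF)
print(sp.integrate(f_Y, (y, 0, 2)))  # 1
```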

Independence of Two Random Variables

Though we have previously encountered independent random variables, we can now give a formal definition.

For the random variables $X$ and $Y$, if the events $X \leq x$ and $Y \leq y$ are independent for all $x, y \in \mathbb{R}$, then

F_{X, Y}(x, y)=P(X \leq x, Y \leq y)=P(X \leq x) P(Y \leq y)=F_{X}(x) F_{Y}(y)

and we say that $X$ and $Y$ are independent random variables.

If $X, Y$ are independent, it is straightforward to show that $f_{X,Y}(x,y)=f_X(x) f_Y(y)$, which for discrete random variables is exactly what we had in Chapter 1 ($P(A, B)=P(A) P(B)$ if $A$ and $B$ are independent). We will exploit this property to determine if $X$ and $Y$ are independent, where we will be left with functions $f_X(x)$ and $f_Y(y)$ rather than numbers (e.g. $P(A)$ and $P(B)$).

Example: Independence of a Simple Joint PDF

1.) Consider the previous example for $f_{X,Y}(x,y)=6x^2 y$ (for $x, y \in [0,1]$); are $X$ and $Y$ independent?

Solution:

From the definition of independence, $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x) f_Y(y)$; therefore we need to compute $f_X$ and $f_Y$. This can be done using marginalisation:

\begin{aligned} & f_{X}(x)=6 x^{2} \int_{0}^{1} y \, d y=6 x^{2}\left[\frac{y^{2}}{2}\right]_{0}^{1}=3 x^{2} \\ & f_{Y}(y)=6 y \int_{0}^{1} x^{2} \, d x=6 y\left[\frac{x^{3}}{3}\right]_{0}^{1}=2 y \end{aligned}

Combining the terms, we compute $f_X(x) f_Y(y)=3x^2 \cdot 2y = 6x^2 y = f_{X,Y}(x,y)$, so $X$ and $Y$ are indeed independent.

2.) How about the joint PDF defined by $f_{X,Y}(x,y)=\frac{3}{2} x^3 y^2$ in the example above? Do the marginal PDFs show that $X$ and $Y$ are independent?

f_{X}(x) f_{Y}(y)=4 x^{3} \cdot \frac{3}{8} y^{2}=\frac{3}{2} x^{3} y^{2}=f_{X, Y}(x, y)

Yes: the product of the marginals again recovers the joint PDF, so $X$ and $Y$ are independent.
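
For joint PDFs with a rectangular support, this factorisation test can be automated. Below is a small SymPy sketch (the helper `is_independent` is our own construction, not a standard routine) that checks whether the joint PDF equals the product of its marginals.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

def is_independent(f_xy, x_bounds, y_bounds):
    """Check f_{X,Y}(x, y) == f_X(x) * f_Y(y) symbolically.

    Assumes the support is the rectangle x_bounds x y_bounds; on a
    non-rectangular support the variables cannot be independent anyway.
    """
    f_X = sp.integrate(f_xy, (y, y_bounds[0], y_bounds[1]))
    f_Y = sp.integrate(f_xy, (x, x_bounds[0], x_bounds[1]))
    return sp.simplify(f_xy - f_X * f_Y) == 0

print(is_independent(6 * x**2 * y, (0, 1), (0, 1)))                     # True
print(is_independent(sp.Rational(3, 2) * x**3 * y**2, (0, 1), (0, 2)))  # True
```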

Conditional Distributions

Now suppose that we know the value taken by the random variable $X$, and we wish to work out how this affects the distribution of $Y$. In the discrete case, this is a simple conditional probability calculation:

P(Y=y \mid X=x)=\frac{P(X=x, Y=y)}{P(X=x)}

In this expression, the RHS is the ratio of the joint PMF and the marginal PMF of $X$. The LHS is a conditional PMF, and we can use the same definition for the general case.

The conditional probability mass/density function (conditional PMF/PDF) of the random variables $X$ and $Y$ is the function

f_{Y \mid X}(y \mid x)=\frac{f_{X, Y}(x, y)}{f_{X}(x)} \quad \text { where } x, y \in \mathbb{R} \text { and } f_{X}(x)>0

and similarly for $f_{X \mid Y}(\cdot \mid y)$. The function $f_{Y \mid X}(\cdot \mid x)$ gives the distribution of the random variable $Y$ conditional on the event $X=x$.

We have previously mentioned that $f_{X,Y}(x,y)=f_X(x) f_Y(y)$ if $X, Y$ are independent. In this case the conditional PMF/PDF reduces to:

f_{Y \mid X}(y \mid x)=\frac{f_{X, Y}(x, y)}{f_{X}(x)}=\frac{f_{X}(x) f_{Y}(y)}{f_{X}(x)}=f_{Y}(y)

This is not surprising; if the random variables are independent, the value of $X$ does not contain any information about the value of $Y$, so the conditional is equal to the marginal (compare to the similar derivation in Section 1.7).
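
Returning to the dice example, a conditional PMF is just a renormalised row of the joint PMF table. Below is a small sketch (helper names are our own) computing the distribution of the maximum $Y$ given that the sum is $X=6$, together with its conditional expectation (anticipating the next subsection).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of X = sum and Y = max for two fair dice, as in the earlier examples.
joint_pmf = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

def conditional_Y_given_X(x):
    """Conditional PMF f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)."""
    f_X_x = sum(p for (u, _), p in joint_pmf.items() if u == x)
    return {v: p / f_X_x for (u, v), p in joint_pmf.items() if u == x}

cond = conditional_Y_given_X(6)
print(cond)                                  # {3: 1/5, 4: 2/5, 5: 2/5}
print(sum(cond.values()))                    # 1, as every conditional PMF must
print(sum(v * p for v, p in cond.items()))   # 21/5, i.e. E[Y | X = 6]
```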

Expectation for Multivariate Distributions

Now that we can compute the conditional PMFs and PDFs $f_{X \mid Y}(x \mid y)$ (or $f_{Y \mid X}(y \mid x)$), we can define expectations of conditional distributions:

The conditional expectation of the random variable $X$ given $Y=y$ is defined as:

E[X \mid Y=y]= \begin{cases}\sum_{\forall x} x f_{X \mid Y}(x \mid y) & \text { if } X \text { is discrete } \\ \int_{-\infty}^{\infty} x f_{X \mid Y}(x \mid y) \, d x & \text { if } X \text { is continuous }\end{cases}

Conditional expectation allows us to compute the average value of $X$ if we already know the value of $Y$.

To compute the expectation of a joint function of two random variables, we sum/integrate at every point $(x, y)$ the value of the function multiplied by the joint PMF/PDF:

E[g(X, Y)]= \begin{cases}\sum_{\forall x} \sum_{\forall y} g(x, y) f_{X, Y}(x, y) & \text { if } X, Y \text { are discrete } \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{X, Y}(x, y) \, d x \, d y & \text { if } X, Y \text { are continuous }\end{cases}

As we have shown already, expectation is a linear operator, so for random variables $X, Y$ and constants $a, b, c \in \mathbb{R}$, we have

E[a X+b Y+c]=a E[X]+b E[Y]+c

Example: Expectation of a Joint PDF

Consider an electrical circuit with two resistors $(R_X, R_Y)$ wired in parallel. Let's define continuous random variables $X$ and $Y$ to be the resistance of each resistor, which varies between 10 and 20 ohms according to the following PDF:

f_{X, Y}(x, y)=\frac{1}{3000}(x+y) \quad 10 \leq x \leq 20, \ 10 \leq y \leq 20

The resistance $R$ of a parallel circuit is traditionally defined as:

\frac{1}{R}=\frac{1}{X}+\frac{1}{Y}

Given that the resistances are distributed according to the above joint PDF, what is the expected resistance of the circuit?

Solution:

\begin{aligned} E[R(X, Y)] & =\int_{10}^{20} \int_{10}^{20} R(x, y) f_{X, Y}(x, y) \, d x \, d y \\ & =\frac{1}{3000} \int_{10}^{20} \int_{10}^{20} \frac{x y}{x+y} \cdot(x+y) \, d x \, d y \\ & =\frac{1}{3000}\left[\frac{x^{2}}{2}\right]_{10}^{20}\left[\frac{y^{2}}{2}\right]_{10}^{20}=7.5 \end{aligned}
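
The double integral above is straightforward to confirm numerically. Below is a brief sketch using `scipy.integrate.dblquad` (assuming SciPy is available; the lambdas are our own shorthand):

```python
from scipy import integrate

# Joint PDF of the two resistances, and the parallel-resistance function R(x, y).
f = lambda x, y: (x + y) / 3000.0
R = lambda x, y: (x * y) / (x + y)

# E[R(X, Y)] is the double integral of R(x, y) * f(x, y) over the square [10, 20]^2.
# dblquad expects the integrand as func(y, x), with the inner variable listed first.
expected_R, _ = integrate.dblquad(lambda y, x: R(x, y) * f(x, y), 10, 20, 10, 20)
print(expected_R)  # ~7.5
```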

Covariance

It is often useful to characterise the nature of the dependence between two random variables. Here we will just consider the covariance, as defined below.

The covariance of two random variables $X$ and $Y$ is defined as

\operatorname{Cov}[X, Y]=E[(X-E[X])(Y-E[Y])]=E[X Y]-E[X] E[Y]

Covariance is a measure of linear association between two random variables.

Positive covariance indicates that large values of $X$ tend to be associated with large values of $Y$. The higher the covariance, the stronger the relationship. Conversely, if $\operatorname{Cov}[X, Y]<0$, then large values of $X$ tend to be associated with small values of $Y$, and vice versa. A covariance near zero indicates that there is no simple linear relationship between the two variables.

Consider the case where $X$ and $Y$ are independent random variables:

\begin{aligned} E[X Y] & =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X, Y}(x, y) \, d x \, d y=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X}(x) f_{Y}(y) \, d x \, d y \\ & =\int_{-\infty}^{\infty} x f_{X}(x) \, d x \int_{-\infty}^{\infty} y f_{Y}(y) \, d y=E[X] E[Y] \end{aligned}

which results in $\operatorname{Cov}[X, Y]=0$ (and similarly for the discrete case). However, we must be careful to note that the converse does not hold! That is, if $\operatorname{Cov}[X, Y]=0$ we cannot conclude that $X$ and $Y$ are necessarily independent.

Example: Covariance of the Joint PMF for Two Dice

For one last time, recall the joint PMF $f_{X,Y}(x,y)$, where $X$ was defined to be the sum of two dice and $Y$ to be the larger of the two numbers. We have also computed the marginal functions $f_X(x)$ and $f_Y(y)$ in an earlier example. The covariance between the discrete random variables $X$ and $Y$ is:

\operatorname{Cov}[X, Y]=E[X Y]-E[X] E[Y]

Recall we have already computed $E[X]=7$ (the expectation for the sum of two dice). Thus all we need is $E[XY]$ and $E[Y]$:

\begin{aligned} E[X Y] & =\sum_{\forall x} \sum_{\forall y} x y f_{X, Y}(x, y) \\ & =2 \cdot 1 \cdot \frac{1}{36}+3 \cdot 2 \cdot \frac{2}{36}+4 \cdot 2 \cdot \frac{1}{36}+4 \cdot 3 \cdot \frac{2}{36}+5 \cdot 3 \cdot \frac{2}{36}+5 \cdot 4 \cdot \frac{2}{36}+\ldots \\ & =\frac{1232}{36} \approx 34.2222 \\ E[Y] & =\sum_{\forall y} y f_{Y}(y) \\ & =1 \cdot \frac{1}{36}+2 \cdot \frac{3}{36}+3 \cdot \frac{5}{36}+4 \cdot \frac{7}{36}+5 \cdot \frac{9}{36}+6 \cdot \frac{11}{36}=\frac{161}{36} \approx 4.4722 \\ \operatorname{Cov}[X, Y] & =\frac{1232}{36}-7 \cdot \frac{161}{36}=\frac{105}{36} \approx 2.92 \end{aligned}

This positive result for the covariance makes intuitive sense: as the larger of the two numbers increases (i.e. $Y$ increases), the sum of the two dice ($X$) is also expected to increase.
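
Because everything here is a finite sum, the covariance can also be computed exactly by enumeration. The sketch below (variable names are our own) gives the exact value $\frac{105}{36} = \frac{35}{12} \approx 2.92$.

```python
from fractions import Fraction
from itertools import product

# Exact covariance of X = sum and Y = max for two fair dice, by direct enumeration.
outcomes = [(d1 + d2, max(d1, d2)) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

E_X = sum(x * p for x, _ in outcomes)       # 7
E_Y = sum(y * p for _, y in outcomes)       # 161/36
E_XY = sum(x * y * p for x, y in outcomes)  # 1232/36

cov = E_XY - E_X * E_Y
print(cov, float(cov))  # 35/12, ~2.92
```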

For random variables $X, Y, Z$ and constants $a, b, c, d$, we can use the properties of expectation ($E[aX]=aE[X]$ and $E[a]=a$ for a constant $a$) to derive the following relationships:

  1. $\operatorname{Cov}[X, a]=0$
  2. $\operatorname{Cov}[aX+b, cY+d]=ac\operatorname{Cov}[X, Y]$
  3. $\operatorname{Cov}[X+Y, Z]=\operatorname{Cov}[X, Z]+\operatorname{Cov}[Y, Z]$

We leave it to the reader to prove these properties.

We say that covariance is a bilinear operator, in the sense that it is linear in both its inputs. Notice that $\operatorname{Cov}[X, X]=\operatorname{Var}[X]$, so the second of the properties above explains why $\operatorname{Var}[aX+b]=a^2\operatorname{Var}[X]$. The third property implies that

\begin{aligned} \operatorname{Var}[X+Y] & =\operatorname{Cov}[X+Y, X+Y]=\operatorname{Cov}[X, X]+\operatorname{Cov}[X, Y]+\operatorname{Cov}[Y, X]+\operatorname{Cov}[Y, Y] \\ & =\operatorname{Var}[X]+2 \operatorname{Cov}[X, Y]+\operatorname{Var}[Y] \end{aligned}
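
This identity is easy to verify for the dice example with one more short enumeration (the helper `E` below is just our own shorthand for an exact expectation):

```python
from fractions import Fraction
from itertools import product

# Check Var[X + Y] = Var[X] + 2 Cov[X, Y] + Var[Y] exactly for the dice example.
outcomes = [(d1 + d2, max(d1, d2)) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

def E(values):
    """Exact expectation of a sequence of 36 equally likely values."""
    return sum(v * p for v in values)

E_X, E_Y = E(x for x, _ in outcomes), E(y for _, y in outcomes)
var_X = E(x**2 for x, _ in outcomes) - E_X**2
var_Y = E(y**2 for _, y in outcomes) - E_Y**2
cov_XY = E(x * y for x, y in outcomes) - E_X * E_Y
var_sum = E((x + y)**2 for x, y in outcomes) - E(x + y for x, y in outcomes)**2

print(var_sum == var_X + 2 * cov_XY + var_Y)  # True
```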

A deficiency of covariance is that it depends on the units of measurement. For example, if $X$ and $Y$ are time measurements and we change the units from minutes to seconds, the covariance becomes $\operatorname{Cov}[60X, 60Y]=3600\operatorname{Cov}[X, Y]$.