Multivariate Distributions
Up to this point we have mostly focused on single (univariate) random variables (i.e. a single random variable $X$). However, in certain experiments it might be appropriate to explore the relationships between several random variables, such as the relationship between blood cholesterol and heart disease. In this section we will mostly focus on the bivariate case (i.e. pairs of random variables $(X, Y)$), as the main concepts can easily be extended to the general multivariate case.
Joint Cumulative Distribution Function
We begin by defining the joint cumulative distribution function (CDF), which, as we recall from our analysis of single (univariate) random variables, is defined in the same way for discrete and continuous random variables:
The joint cumulative distribution function (joint CDF) of the random variables $X$ and $Y$ is given by the function
$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$
Recall that the comma here represents the intersection of the two events (i.e. $P(X \le x, Y \le y) = P(\{X \le x\} \cap \{Y \le y\})$). The function evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes a value less than or equal to $x$, and the random variable $Y$ takes a value less than or equal to $y$.
It should be noted that this definition also applies to mixed distributions (i.e. when $X$ and $Y$ are a mixture of discrete and continuous random variables). Joint CDFs have the following properties:
- $0 \le F_{X,Y}(x, y) \le 1$ for all $(x, y)$
- $F_{X,Y}(x, y) \to 0$ as $x \to -\infty$ or $y \to -\infty$
- $F_{X,Y}(x, y) \to F_Y(y)$ as $x \to \infty$, and $F_{X,Y}(x, y) \to F_X(x)$ as $y \to \infty$
Property 3 stems from the fact that if $x \to \infty$, then the event $\{X \le x\}$ occurs with probability 1, leaving only the constraint on $Y$ (and vice versa for the case where $y \to \infty$).
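To make the definition concrete, here is a minimal Python sketch (not part of the notes) that estimates a joint CDF empirically; the choice of two independent standard normal variables, and all names in the code, are ours for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of two random variables X and Y (here: independent standard normals).
x_samples = rng.standard_normal(100_000)
y_samples = rng.standard_normal(100_000)

def empirical_joint_cdf(x, y):
    """Estimate F_{X,Y}(x, y) = P(X <= x, Y <= y) as the fraction of samples
    with X <= x and Y <= y simultaneously."""
    return np.mean((x_samples <= x) & (y_samples <= y))

print(empirical_joint_cdf(0.0, 0.0))    # about 0.25 for independent standard normals
print(empirical_joint_cdf(8.0, 8.0))    # about 1: both events are essentially certain
print(empirical_joint_cdf(-8.0, 0.0))   # about 0: F -> 0 as either argument -> -infinity
```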
Joint Discrete Random Variables
Suppose that the random variables $X$ and $Y$ are both discrete. It is straightforward to generalise the univariate definition of the PMF to the bivariate (and multivariate) case:
The joint probability mass function (joint PMF) of the discrete random variables $X$ and $Y$ is given by the function
$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
The function evaluated at a point $(x, y)$ is the probability that the random variable $X$ takes the value $x$, and the random variable $Y$ takes the value $y$.
If we know the joint PMF, we can compute the joint CDF by summation of the appropriate probabilities:
$$F_{X,Y}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{X,Y}(x_i, y_j)$$
As in the univariate case, the total probability (i.e. the summation over all values of $x$ and $y$) must equal 1:
$$\sum_x \sum_y p_{X,Y}(x, y) = 1$$
We can display the values of the joint PMF in a two-dimensional table, with the rows representing the values of $X$ and the columns representing the values of $Y$.
Example: Joint PMF for Two Fair Dice
Let's consider our favourite PMF example where we roll two fair dice. We will define $X$ to be the usual sum of the two dice (see the Examples in Section 2.2 and the PMF shown in Figure 3), and $Y$ to be the larger of the two numbers (i.e. $Y = \max(D_1, D_2)$, where $D_1$ and $D_2$ denote the numbers shown on the two dice).
We can compute each of the entries $p_{X,Y}(x, y)$ by fixing a value $x$ for $X$ (the sum of the two dice), fixing a value $y$ for $Y$ (the larger of the two numbers), and then counting the outcomes $(d_1, d_2)$ for which $d_1 + d_2 = x$ and $\max(d_1, d_2) = y$.
Let's see how this applies to a few scenarios for $p_{X,Y}(x, y)$.
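For instance (these particular entries are our own illustrative choices, writing the outcomes as ordered pairs $(d_1, d_2)$):
$$p_{X,Y}(4, 2) = P(\{(2, 2)\}) = \tfrac{1}{36}, \qquad p_{X,Y}(7, 4) = P(\{(3, 4), (4, 3)\}) = \tfrac{2}{36}, \qquad p_{X,Y}(2, 3) = 0,$$
where the last entry is zero because no outcome has a sum of 2 and a maximum of 3.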
By enumerating over all $x$ and $y$, we can compute the entire joint PMF as:
| $p_{X,Y}(x, y)$ | $y=1$ | $y=2$ | $y=3$ | $y=4$ | $y=5$ | $y=6$ |
|---|---|---|---|---|---|---|
| $x=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 |
| $x=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 |
| $x=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 |
| $x=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 |
| $x=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 |
| $x=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 |
| $x=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 |
| $x=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $x=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 |
| $x=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 |
| $x=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 |
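As a cross-check, here is a minimal Python sketch (ours, not part of the notes) that builds this joint PMF by enumerating the 36 equally likely outcomes:

```python
from fractions import Fraction

# Joint PMF of X = sum and Y = max for two fair dice, built by enumerating
# the 36 equally likely outcomes (d1, d2).
joint_pmf = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        x, y = d1 + d2, max(d1, d2)
        joint_pmf[(x, y)] = joint_pmf.get((x, y), Fraction(0)) + Fraction(1, 36)

print(joint_pmf[(7, 4)])        # 1/18 (i.e. 2/36), matching the table entry
print(sum(joint_pmf.values()))  # 1, as required of any joint PMF
```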
From the joint PMF we can directly compute a number of joint queries:
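(The particular queries below are our own illustrative examples.)
$$P(X = 7, Y = 4) = p_{X,Y}(7, 4) = \tfrac{2}{36}$$
$$P(X \le 4, Y \le 2) = p_{X,Y}(2, 1) + p_{X,Y}(3, 2) + p_{X,Y}(4, 2) = \tfrac{1}{36} + \tfrac{2}{36} + \tfrac{1}{36} = \tfrac{4}{36}$$
$$P(X > 10) = p_{X,Y}(11, 6) + p_{X,Y}(12, 6) = \tfrac{2}{36} + \tfrac{1}{36} = \tfrac{3}{36}$$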
Joint Continuous Random Variables
Let us now consider the case where the random variables $X$ and $Y$ are both continuous.
The random variables $X$ and $Y$ are jointly continuous if there exists a function $f_{X,Y}(x, y)$, called the joint probability density function (joint PDF) of $X$ and $Y$, with the following property:
$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\, dx\, dy$$
for any region $A \subseteq \mathbb{R}^2$. Integrating the PDF over a region gives the probability that $(X, Y)$ takes values within that region. The order of integration is not important and is interchangeable (as we shall see in the example below).
If the region of interest is rectangular, we have:
$$P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x, y)\, dy\, dx$$
The total probability must be equal to 1, so if we integrate the joint PDF everywhere, we obtain:
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$$
The joint CDF and PDF are linked in the usual way, but we need to integrate/differentiate twice:
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du \qquad \text{and} \qquad f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$$
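As a numerical illustration (a minimal sketch using an assumed density, not one from the notes), take $f_{X,Y}(x, y) = x + y$ on the unit square $0 \le x \le 1$, $0 \le y \le 1$; the code below checks that it integrates to 1 and evaluates a rectangular-region probability with scipy.

```python
from scipy.integrate import dblquad

# Assumed joint PDF for illustration: f(x, y) = x + y on the unit square.
f = lambda x, y: x + y

# dblquad integrates func(y, x): y is the inner variable, x the outer one.
total, _ = dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 1)
print(total)  # 1.0 -- the density integrates to one, so it is a valid joint PDF

# A rectangular-region probability: P(X <= 0.5, Y <= 0.5).
prob, _ = dblquad(lambda y, x: f(x, y), 0, 0.5, lambda x: 0, lambda x: 0.5)
print(prob)   # 0.125
```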
Example: A Simple Joint PDF for Continuous RVs
Consider the joint PDF for two continuous random variables, $X$ and $Y$, given by:
(a) Find the value of the unknown constant so that this is a valid joint PDF.
Solution:
The expression above is a joint PDF if and only if it is non-negative and integrates to 1 over its support. So, integrating the PDF over the correct range (taking one of the variables as the inner variable), we have:
This fixes the value of the constant, and the complete expression for the joint PDF then becomes:
(b) Compute the requested probability (which, as we shall see, involves a non-rectangular region).
Solution:
As we mentioned in the definition above, we need to integrate over the area defined by the event of interest, and the order of integration (i.e. whether we integrate over $x$ or $y$ first) should not matter. However, for non-rectangular areas (i.e. when there is a dependence between the bounds on $x$ and $y$) we must be careful in determining our range of integration.
Trick for Determining Integration Bounds for Non-rectangular Areas:
1.) First construct a table that specifies the bounds on each of the variables. For this particular example we have:
|  | $x$ | $y$ |
|---|---|---|
| Upper bound | 1 |  |
| Lower bound | 0 |  |
2.) Now we integrate the first (inner) variable over the range specified in the table above; either variable may be taken as the inner one.
3.) Specifying the outer integration range for the second variable is a bit trickier, as we can no longer use the first variable in the range (it will already have been removed from the expression by the first integration). In this case, if the first variable appears in the range of the outer integration, we set it to the values of its corresponding bounds (as read from the table).
Let us now compute the required probability, integrating over the inner variable first and then over the outer variable:
We leave it to the reader to verify that we get the same answer when the order of integration is reversed.
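The same idea can be checked numerically. Assuming, purely for illustration, the density $f_{X,Y}(x, y) = 8xy$ on $0 \le y \le x \le 1$ (not the PDF from this example), the sketch below computes $P(X \le \tfrac{1}{2})$ with both integration orders: the inner bounds may depend on the outer variable, the outer bounds are the fixed values from the table, and both orders agree.

```python
from scipy.integrate import dblquad

# Assumed joint PDF for illustration: f(x, y) = 8xy on the triangle 0 <= y <= x <= 1.
f = lambda x, y: 8 * x * y

# P(X <= 1/2), integrating over y first: inner bounds 0..x depend on the outer
# variable x, while the outer bounds 0..1/2 are fixed numbers.
p1, _ = dblquad(lambda y, x: f(x, y), 0, 0.5, lambda x: 0, lambda x: x)

# The same probability with the order reversed: x runs from y to 1/2 (inner),
# y runs from 0 to 1/2 (outer).
p2, _ = dblquad(lambda x, y: f(x, y), 0, 0.5, lambda y: y, lambda y: 0.5)

print(p1, p2)  # both 0.0625 -- the order of integration does not change the answer
```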
Marginal Distributions
We introduced the concept of marginalisation in Section 2.1.1, where we saw that it is essentially the Law of Total Probability applied to random variables. Here we provide a more formal definition of marginalisation for discrete and continuous random variables and use examples to illustrate it.
The marginal probability mass functions (marginal PMFs) of the discrete random variables $X$ and $Y$ are the functions
$$p_X(x) = \sum_y p_{X,Y}(x, y) \qquad \text{and} \qquad p_Y(y) = \sum_x p_{X,Y}(x, y)$$
The marginal $p_X(x)$ gives us the probability distribution of the random variable $X$ alone, ignoring any information about $Y$, and vice versa.
Example: Marginalisation of the Joint PMF for Two Dice
Consider again our example of a joint PMF $p_{X,Y}(x, y)$, where $X$ was defined to be the sum of the two dice and $Y$ to be the larger of the two numbers. Using the above definition, we can compute $p_X(x)$ by summing over all $y$ (i.e. across the columns) for a given $x$; the resulting values are shown in the far right column below. Likewise, we can compute $p_Y(y)$ by summing over all $x$ (i.e. down the rows) for a given $y$; the resulting values are shown in the bottom row below:
| $p_{X,Y}(x, y)$ | $y=1$ | $y=2$ | $y=3$ | $y=4$ | $y=5$ | $y=6$ | $p_X(x)$ |
|---|---|---|---|---|---|---|---|
| $x=2$ | 1/36 | 0 | 0 | 0 | 0 | 0 | 1/36 |
| $x=3$ | 0 | 2/36 | 0 | 0 | 0 | 0 | 2/36 |
| $x=4$ | 0 | 1/36 | 2/36 | 0 | 0 | 0 | 3/36 |
| $x=5$ | 0 | 0 | 2/36 | 2/36 | 0 | 0 | 4/36 |
| $x=6$ | 0 | 0 | 1/36 | 2/36 | 2/36 | 0 | 5/36 |
| $x=7$ | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 6/36 |
| $x=8$ | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 | 5/36 |
| $x=9$ | 0 | 0 | 0 | 0 | 2/36 | 2/36 | 4/36 |
| $x=10$ | 0 | 0 | 0 | 0 | 1/36 | 2/36 | 3/36 |
| $x=11$ | 0 | 0 | 0 | 0 | 0 | 2/36 | 2/36 |
| $x=12$ | 0 | 0 | 0 | 0 | 0 | 1/36 | 1/36 |
| $p_Y(y)$ | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 | 1 |
Notice that the resulting $p_X(x)$ and $p_Y(y)$ are indeed PMFs, as they satisfy the two requirements (i.e. they are bounded between zero and one, and they sum to one). The univariate PMFs $p_X(x)$ and $p_Y(y)$ are commonly written in the margins of the joint PMF table (as shown above), and are known as the marginals (hence the term "marginalisation").
Pause and Reflect 1: Recall we defined marginalisation in Section 2.1.1 as $P(X = x) = \sum_y P(X = x, Y = y)$. We have simply re-expressed this as $p_X(x) = \sum_y p_{X,Y}(x, y)$ using PMF notation for discrete random variables; these two expressions are identical.
Pause and Reflect 2: Does $p_X(x)$ look familiar? Check out Figure 3 and the table above it.
Pause and Reflect 3: Note that it is generally not possible to go the other way around; that is, to reconstruct the full joint PMF table from $p_X(x)$ and $p_Y(y)$ alone (unless the random variables are independent, which will be discussed in the next section).
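Here is a minimal Python sketch of this marginalisation (ours, not part of the notes), again enumerating the 36 dice outcomes:

```python
from collections import defaultdict
from fractions import Fraction

# Build the joint PMF of X = sum and Y = max, then marginalise it.
joint_pmf = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint_pmf[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

# Marginalising: sum the joint PMF over the variable we want to remove.
p_X = defaultdict(Fraction)
p_Y = defaultdict(Fraction)
for (x, y), p in joint_pmf.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(p_Y))          # marginal of Y: 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 (as Fractions)
print(sum(p_X.values()))  # 1 -- the marginal is itself a valid PMF
```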
In an analogous fashion we can define marginalisation for continuous random variables as follows:
The marginal probability density functions (marginal PDFs) of the continuous random variables $X$ and $Y$ are the functions
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy \qquad \text{and} \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$$
This is essentially the same as the previous definition; in a continuous setting, getting rid of a variable requires integration rather than summation. Let's consider a few examples to illustrate marginalising joint PDFs.
Example: Marginalisation of Joint PDFs for Continuous RVs
1.) Find the marginal density function for the following joint PDF:
2.) Find the marginal density functions $f_X(x)$ and $f_Y(y)$ for the following joint PDF:
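As an illustrative stand-in (assuming, purely for illustration, the density $f_{X,Y}(x, y) = x + y$ on $0 \le x \le 1$, $0 \le y \le 1$ used earlier, rather than the PDFs from these exercises), marginalising over $y$ gives
$$f_X(x) = \int_0^1 (x + y)\, dy = x + \tfrac{1}{2}, \qquad 0 \le x \le 1,$$
and by symmetry $f_Y(y) = y + \tfrac{1}{2}$ for $0 \le y \le 1$.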
Independence of Two Random Variables
Though we have previously encountered independent random variables, we can now give a formal definition.
For the random variables $X$ and $Y$, if the events $\{X \le x\}$ and $\{Y \le y\}$ are independent for all $(x, y)$, then
$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$$
and we say that $X$ and $Y$ are independent random variables.
If $X$ and $Y$ are independent, it is straightforward to show that $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ (and $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ in the continuous case), which for discrete random variables is exactly what we had in an earlier chapter ($P(A \cap B) = P(A)P(B)$ if $A$ and $B$ are independent). We will exploit this property to determine whether $X$ and $Y$ are independent, except that here we are left with functions such as $f_X(x)$ and $f_Y(y)$ rather than numbers (e.g. $P(A)$ and $P(B)$).
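As a quick illustration using the two-dice example from earlier:
$$p_{X,Y}(2, 1) = \tfrac{1}{36}, \qquad \text{but} \qquad p_X(2)\, p_Y(1) = \tfrac{1}{36} \times \tfrac{1}{36} = \tfrac{1}{1296},$$
so the sum and the maximum of two dice are not independent random variables.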
Example: Independence of a Simple Joint PDF
1.) Consider the joint PDF $f_{X,Y}(x, y)$ from the previous example; are $X$ and $Y$ independent?
Solution:
From the definition of independence, $X$ and $Y$ are independent if and only if $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$; therefore we need to compute $f_X(x)$ and $f_Y(y)$. This can be done using marginalisation:
Combining the terms, we can compute $f_X(x)\, f_Y(y)$ and compare it with the joint PDF $f_{X,Y}(x, y)$:
2.) How about the joint PDF defined in the example above? Do the marginal PDFs show that $X$ and $Y$ are independent?
Conditional Distributions
Now suppose that we know the value taken by the random variable $X$, and we wish to work out how this affects the distribution of $Y$. In the discrete case, this is a simple conditional probability calculation:
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{p_{X,Y}(x, y)}{p_X(x)}$$
In this expression, the RHS is the ratio of the joint PMF and the marginal PMF of $X$. The LHS is a conditional PMF, and we can use the same definition for the general case.
The conditional probability mass/density function (conditional PMF/PDF) of the random variables $X$ and $Y$ is the function
$$p_{Y|X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)} \qquad \text{or} \qquad f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
and similarly for $p_{X|Y}(x \mid y)$ and $f_{X|Y}(x \mid y)$. The function $p_{Y|X}(y \mid x)$ gives the distribution of the random variable $Y$ conditional on the event $\{X = x\}$.
We have previously mentioned that $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$ if $X$ and $Y$ are independent. In this case the conditional PMF/PDF reduces to:
$$p_{Y|X}(y \mid x) = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y) \qquad \text{(and similarly } f_{Y|X}(y \mid x) = f_Y(y)\text{)}$$
This is not surprising; if the random variables are independent, the value of $X$ does not contain any information about the value of $Y$, so the conditional is equal to the marginal (compare this to the similar derivation in Section 1.7).
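As a concrete case, take the two-dice example and condition on the sum being $X = 7$ (with $p_X(7) = \tfrac{6}{36}$):
$$p_{Y|X}(y \mid 7) = \frac{p_{X,Y}(7, y)}{p_X(7)} = \frac{2/36}{6/36} = \tfrac{1}{3} \qquad \text{for } y = 4, 5, 6,$$
and zero for all other values of $y$.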
Expectation for Multivariate Distributions
Now that we can compute the conditional PMFs and PDFs ($p_{Y|X}(y \mid x)$ or $f_{Y|X}(y \mid x)$), we can define expectations of conditional distributions:
The conditional expectation of the random variable $Y$ given $X = x$ is defined as:
$$E[Y \mid X = x] = \sum_y y\, p_{Y|X}(y \mid x) \qquad \text{or} \qquad E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy$$
Conditional expectation allows us to compute the average value of $Y$ if we already know the value of $X$.
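Continuing the two-dice case above:
$$E[Y \mid X = 7] = \sum_y y\, p_{Y|X}(y \mid 7) = 4 \times \tfrac{1}{3} + 5 \times \tfrac{1}{3} + 6 \times \tfrac{1}{3} = 5$$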
To compute the expectation of a joint function of two random variables, we sum/integrate at every point the value of the function multiplied by the joint PMF/PDF:
$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y) \qquad \text{or} \qquad E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy$$
As we have shown already, expectation is a linear operator, so for random variables $X, Y$ and constants $a, b, c$, we have
$$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c$$
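For instance, in the two-dice example with $g(X, Y) = X + Y$, linearity gives
$$E[X + Y] = E[X] + E[Y] = 7 + \tfrac{161}{36} = \tfrac{413}{36} \approx 11.47,$$
where $E[Y] = \sum_y y\, p_Y(y) = \tfrac{1(1) + 2(3) + 3(5) + 4(7) + 5(9) + 6(11)}{36} = \tfrac{161}{36}$.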
Example: Expectation of a Joint PDF
Consider an electrical circuit with two resistors wired in parallel. Let's define continuous random variables $X$ and $Y$ to be the resistances of the two resistors, each of which varies between 10 and 20 ohms according to the following joint PDF:
The resistance of a parallel circuit is traditionally given by:
$$\frac{1}{R} = \frac{1}{X} + \frac{1}{Y}, \qquad \text{i.e.} \qquad R = \frac{XY}{X + Y}$$
Given that the resistances are distributed according to the above joint PDF, what is the expected resistance of the circuit?
Solution:
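As an illustrative numerical sketch only, assume (purely for this sketch, since this is not necessarily the joint PDF used in the example) that the two resistances are independent and uniformly distributed on $[10, 20]$ ohms; then $E\!\left[\frac{XY}{X+Y}\right]$ can be approximated with scipy:

```python
from scipy.integrate import dblquad

# Assumed joint PDF: independent uniforms on [10, 20] ohms, i.e. f(x, y) = 1/100
# on the square [10, 20] x [10, 20]. This is an illustrative assumption only.
f = lambda x, y: 1.0 / 100.0

# Parallel resistance of the circuit as a function of the two resistances.
parallel = lambda x, y: x * y / (x + y)

# E[R] = integral of parallel(x, y) * f(x, y) over the support (dblquad's func is func(y, x)).
expected_R, _ = dblquad(lambda y, x: parallel(x, y) * f(x, y), 10, 20, lambda x: 10, lambda x: 20)
print(expected_R)  # roughly 7.4 ohms under this assumed model
```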
Covariance
It is often useful to characterise the nature of the dependence between two random variables. Here we will just consider the covariance, as defined below.
The covariance of two random variables $X$ and $Y$ is defined as
$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$$
Covariance is a measure of linear association between two random variables.
Positive covariance indicates that large values of $X$ tend to be associated with large values of $Y$. The higher the covariance, the stronger the relationship. Conversely, if $\mathrm{Cov}(X, Y) < 0$, then large values of $X$ tend to be associated with small values of $Y$, and vice versa. A covariance near zero indicates that there is no simple linear relationship between the two variables.
Consider the case where $X$ and $Y$ are independent random variables:
$$E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, y\, f_X(x)\, f_Y(y)\, dx\, dy = \left(\int_{-\infty}^{\infty} x\, f_X(x)\, dx\right) \left(\int_{-\infty}^{\infty} y\, f_Y(y)\, dy\right) = E[X]\,E[Y],$$
which results in $\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y] = 0$ (and similarly for the discrete case). However, we must be careful to note that the converse does not hold! That is, if $\mathrm{Cov}(X, Y) = 0$ we cannot conclude that $X$ and $Y$ are necessarily independent. (For example, if $X$ takes the values $-1, 0, 1$ with equal probability and $Y = X^2$, then $\mathrm{Cov}(X, Y) = E[X^3] - E[X]\,E[X^2] = 0$, even though $Y$ is completely determined by $X$.)
Example: Covariance of the Joint PMF for Two Dice
For one last time, recall the joint PMF $p_{X,Y}(x, y)$, where $X$ was defined to be the sum of two dice and $Y$ to be the larger of the two numbers. We have also computed the marginal functions $p_X(x)$ and $p_Y(y)$ in an earlier example. The covariance between the discrete random variables $X$ and $Y$ is:
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y] = \sum_x \sum_y x\, y\, p_{X,Y}(x, y) - E[X]\,E[Y]$$
Recall we have already computed $E[X] = 7$ (the expectation for the sum of two dice). Thus all we need are $E[Y]$ and $E[XY]$:
$$E[Y] = \sum_y y\, p_Y(y) = \frac{1(1) + 2(3) + 3(5) + 4(7) + 5(9) + 6(11)}{36} = \frac{161}{36} \approx 4.47$$
$$E[XY] = \sum_x \sum_y x\, y\, p_{X,Y}(x, y) = \frac{1232}{36} \approx 34.22$$
$$\mathrm{Cov}(X, Y) = \frac{1232}{36} - 7 \times \frac{161}{36} = \frac{105}{36} \approx 2.92$$
This positive result for the covariance makes intuitive sense, since as the value of the larger of the two numbers increases (i.e. $Y$ increases), the value of the sum of the two dice ($X$) is also expected to increase.
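The calculation can be verified with a short Python sketch (ours, not part of the notes) that enumerates the 36 outcomes directly:

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice and compute
# E[X], E[Y], E[XY] for X = sum and Y = max, then Cov(X, Y) = E[XY] - E[X]E[Y].
E_X = E_Y = E_XY = Fraction(0)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        x, y = d1 + d2, max(d1, d2)
        p = Fraction(1, 36)
        E_X += x * p
        E_Y += y * p
        E_XY += x * y * p

cov = E_XY - E_X * E_Y
print(E_X, E_Y, E_XY, cov)  # 7, 161/36, 308/9, 35/12 (about 2.92)
```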
For random variables $X, Y, Z$ and constants $a, b, c, d$, we can use the properties of expectation ($E[aX + bY] = a\,E[X] + b\,E[Y]$, and $E[c] = c$ for some constant $c$) to derive the following relationships:
$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$$
$$\mathrm{Cov}(aX + b, cY + d) = ac\,\mathrm{Cov}(X, Y)$$
$$\mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)$$
We leave it to the reader to prove these properties.
We say that covariance is a bilinear operator, in the sense that it is linear in both of its inputs. Notice that $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$, so the second of the properties above explains why $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$. The third property implies that
$$\mathrm{Var}(X + Y) = \mathrm{Cov}(X + Y, X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$$
A deficiency of covariance is that it depends on the units of measurement. For example, if $X$ and $Y$ are time measurements and we change the units from minutes to seconds, the covariance becomes $\mathrm{Cov}(60X, 60Y) = 3600\,\mathrm{Cov}(X, Y)$.