Artificial Intelligence 🤖
Conditional Probability and Chain Rule

We can also define the probability of an event conditional upon "evidence" that another event has occurred. A Venn diagram of two overlapping events helps illustrate this concept: conditioning on one event restricts attention to the region inside it.

We can formally express this as follows. Let $A$ and $B$ be events with $P(B) > 0$. The conditional probability $P(A \mid B)$ (the probability of $A$ given $B$) is defined as:

$$P(A \mid B)=\frac{P(A, B)}{P(B)}$$

This represents the probability that $A$ will occur given that $B$ has occurred.

Conditioning on $B$ means that we are restricting the sample space to the outcomes contained in $B$. We know that either $(A, B)$ or $(\bar{A}, B)$ will occur.
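As a concrete sketch (the die example below is an illustrative assumption, not from the original survey), conditioning on $B$ amounts to restricting a finite sample space to the outcomes in $B$ and renormalising:

```python
from fractions import Fraction

# Sample space: a fair six-sided die, all outcomes equally likely.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A, B) / P(B): restrict the sample space to B, then renormalise."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is at least 4"

print(cond_prob(A, B))  # |{4, 6}| / |{4, 5, 6}| = 2/3
```

Only the outcomes $\{4, 6\}$ lie in both events, out of the three outcomes left once we restrict to $B$.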

Rearrangement of this definition, $P(A, B)=P(A \mid B)\, P(B)$, can be further generalised to provide the Chain Rule in probability theory:

$$P(A, B, C)=P(C \mid A, B)\, P(B \mid A)\, P(A)$$

Derivation:

We first condition $P(A, B, C)$ on $A$:

$$P(A, B, C)=P(B, C \mid A)\, P(A)$$

Then we condition $P(B, C \mid A)$ on $B$:

$$P(B, C \mid A)=P(C \mid A, B)\, P(B \mid A)$$

Combining these equations results in the Chain Rule definition.
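The factorisation can also be checked numerically. The sketch below assumes a hypothetical, randomly generated joint distribution over three binary events and verifies that the chain-rule product telescopes back to the joint probability:

```python
import itertools
import random

random.seed(0)

# A hypothetical joint distribution P(a, b, c) over three binary events,
# built from random weights and normalised to sum to 1.
weights = {abc: random.random() for abc in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {abc: w / total for abc, w in weights.items()}

def p(a=None, b=None, c=None):
    """Marginal probability: sum the joint over any unspecified variables."""
    return sum(pr for (x, y, z), pr in joint.items()
               if (a is None or x == a) and (b is None or y == b) and (c is None or z == c))

# Chain rule: P(A, B, C) = P(C | A, B) * P(B | A) * P(A)
lhs = p(a=1, b=1, c=1)
rhs = (p(a=1, b=1, c=1) / p(a=1, b=1)) * (p(a=1, b=1) / p(a=1)) * p(a=1)
print(abs(lhs - rhs) < 1e-12)  # True: each denominator cancels the next numerator
```

The check passes for any joint distribution with nonzero marginals, since the product telescopes algebraically.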

Note that conditioning changes the probability, not the event. In the same way that the function $P(\cdot)$ assigns a probability to any event $A \subseteq \Omega$, the function $P(\cdot \mid B)$ assigns a conditional probability (given $B$) to any event $A \subseteq \Omega$. Thus, all the identities we have encountered so far will still work, for example:

$$\begin{gathered} P(\bar{A} \mid B)=1-P(A \mid B) \\ P(A \cup C \mid B)=P(A \mid B)+P(C \mid B)-P(A, C \mid B) \end{gathered}$$
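Both identities can be verified on a small finite sample space. The events below are illustrative assumptions (a fair die again), chosen only to exercise the formulas:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # fair six-sided die

def prob(e):
    return Fraction(len(e), len(omega))

def cond(e, given):
    """P(E | given) = P(E, given) / P(given)."""
    return prob(e & given) / prob(given)

A, B, C = {2, 4, 6}, {3, 4, 5, 6}, {1, 2, 3}

# Complement rule still holds under conditioning on B:
assert cond(omega - A, B) == 1 - cond(A, B)

# Inclusion-exclusion still holds under conditioning on B:
assert cond(A | C, B) == cond(A, B) + cond(C, B) - cond(A & C, B)
print("both identities hold")
```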

Example: Conditional Probability

Using the survey example from the previous section, what is the probability that a person has seen Game of Thrones, given that the person is a Breaking Bad fan?

Using the same notation as before, we want to find $P(G \mid B)$. This is a straightforward application of the conditional-probability definition:

$$P(G \mid B)=\frac{P(G, B)}{P(B)}$$

Since we know the probabilities $P(G, B)$ and $P(B)$, we can directly compute the answer:

$$P(G \mid B)=\frac{20/100}{55/100}=\frac{20}{55}$$
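The computation can be reproduced directly from the survey counts (20 of 100 respondents watched both shows; 55 of 100 are Breaking Bad fans):

```python
from fractions import Fraction

n = 100                          # total respondents in the survey
both = Fraction(20, n)           # P(G, B): watched both shows
breaking_bad = Fraction(55, n)   # P(B): Breaking Bad fans

p_g_given_b = both / breaking_bad
print(p_g_given_b)  # 4/11, i.e. 20/55
```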

Note that this probability is the same as the probability of selecting a Game of Thrones viewer at random from the list of Breaking Bad viewers. Conditional probabilities can be thought of as reducing the sample space to the outcomes that satisfy the conditioning event only.