Artificial Intelligence 🤖
General Laws of Probability

As we alluded to at the start of this course, the definition of the probability of an event has undergone considerable evolution over the years. The original definitions arose within the context of gambling, or situations where $n$ possible outcomes were equally likely. If some event $A$ were satisfied by $m$ of those outcomes, then $P(A)=\frac{m}{n}$ (this is the classical or a priori definition of probability, which we will encounter in subsequent examples).
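To make the classical definition concrete, here is a minimal Python sketch. The fair six-sided die and the helper name `classical_probability` are illustrative assumptions, not part of the course material, and the calculation is only valid when all outcomes are equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Classical (a priori) probability: favourable outcomes over all outcomes.

    Only meaningful when every outcome in sample_space is equally likely.
    """
    favourable = event & sample_space  # ignore anything outside the sample space
    return Fraction(len(favourable), len(sample_space))


# Hypothetical example: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

print(classical_probability(even, omega))  # 1/2
```

Here $m=3$ outcomes satisfy the event out of $n=6$, so the ratio reduces to $\frac{1}{2}$.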

However, what if the outcomes are not equally likely, or occupy an infinite space? We could then resort to an empirical or a posteriori definition of probability, where we repeat an experiment $n$ times and take the relative frequency with which the event occurs as its probability; this method can also suffer from limitations (can you think of any?).
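The empirical approach can be sketched as a small simulation. This is only an illustration under assumed details: the die-rolling experiment, the function names, and the fixed seed are all hypothetical choices made so that the example is reproducible.

```python
import random


def empirical_probability(event, experiment, n, seed=0):
    """Estimate P(event) as the relative frequency over n repetitions."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = sum(1 for _ in range(n) if event(experiment(rng)))
    return hits / n


def roll_die(rng):
    """One repetition of the experiment: roll a fair six-sided die."""
    return rng.randint(1, 6)


def is_even(outcome):
    """The event of interest: the roll is an even number."""
    return outcome % 2 == 0


# The estimate depends on how many repetitions we can afford.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(is_even, roll_die, n))
```

Comparing the printed estimates for different values of $n$ is a useful starting point for thinking about the limitations mentioned above.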

Due to these issues, there was a strong desire to define probability as a function rather than a ratio (or limit). Suppose that $A \subseteq \Omega$ is an event, and let $P(A)$ be the probability of $A$. Intuitively, $P(A)$ is a number that indicates how likely $A$ is to occur. The axioms of probability are:
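In their standard (Kolmogorov) form, which is assumed to be the one intended here, the three axioms read:

$$\begin{array}{ll} \text{1.} & P(A) \geq 0 \text{ for every event } A \subseteq \Omega \\ \text{2.} & P(\Omega)=1 \\ \text{3.} & \text{If } A_1, A_2, \ldots \text{ are pairwise disjoint, then } P\left(\bigcup_{i} A_i\right)=\sum_{i} P(A_i) \end{array}$$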

From these axioms, we can deduce some basic properties of probability.

$$\begin{array}{ll} \text{1.} & P(\bar{A})=1-P(A) \\ \text{2.} & P(\emptyset)=0 \\ \text{3.} & \text{If } A \subset B \text{ then } P(A) \leq P(B) \\ \text{4.} & P(A \cup B)=P(A)+P(B)-P(A, B) \end{array}$$
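As a brief sketch of how the first two properties follow, assume the axioms in their standard form, with $P(\Omega)=1$ and additivity $P(A \cup B)=P(A)+P(B)$ for disjoint $A$ and $B$. Since $A$ and $\bar{A}$ are disjoint and together make up $\Omega$:

$$\begin{aligned} 1 &= P(\Omega)=P(A \cup \bar{A})=P(A)+P(\bar{A}) \quad\Rightarrow\quad P(\bar{A})=1-P(A) \\ P(\emptyset) &= P(\bar{\Omega})=1-P(\Omega)=0 \end{aligned}$$

The second line applies the first property with $A=\Omega$, whose complement is $\emptyset$.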

The last property, $P(A \cup B)=P(A)+P(B)-P(A, B)$, is known as the "inclusion-exclusion" formula (or "additive rule", depending on what text you are reading), and can be illustrated using the following Venn diagram:

$P(A, B)$ is included in both $P(A)$ and $P(B)$, so it is counted twice when $P(A)$ and $P(B)$ are added together. It must therefore be subtracted once afterwards, which gives the expression $P(A)+P(B)-P(A, B)$.
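The same double-counting argument can be written out by splitting the union into three disjoint regions of the Venn diagram. This is a sketch; the notation $P(A, \bar{B})$ for "$A$ occurs and $B$ does not" is introduced here for illustration and follows the $P(A, B)$ convention used above:

$$\begin{aligned} P(A) &= P(A, \bar{B})+P(A, B) \\ P(B) &= P(\bar{A}, B)+P(A, B) \\ P(A \cup B) &= P(A, \bar{B})+P(A, B)+P(\bar{A}, B) \\ &= P(A)+P(B)-P(A, B) \end{aligned}$$

Adding the first two lines counts $P(A, B)$ twice, whereas the third line counts it once, which accounts for the single subtraction in the last line.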

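From this we also have the "union bound", i.e. $P(A \cup B) \leq P(A)+P(B)$. The inequality holds for any two events, with equality when $P(A, B)=0$ (for example, when $A$ and $B$ are disjoint), and it is useful when $P(A, B)$ is unknown or difficult to compute.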
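A quick numerical check of the union bound, again assuming a fair six-sided die as a hypothetical sample space (the events `even`, `high` and `low` are made up for illustration):

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))


even = {2, 4, 6}
high = {4, 5, 6}
low = {1, 2}

# Overlapping events: the bound is strict because P(A, B) > 0.
print(prob(even | high), "<=", prob(even) + prob(high))  # 2/3 <= 1

# Disjoint events: the bound holds with equality.
print(prob(low | high), "==", prob(low) + prob(high))  # 5/6 == 5/6
```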

Generalisation of the Inclusion-Exclusion Formula:

As the inclusion-exclusion formula is such a useful tool for solving problems, we take a moment here to show how it generalises to three events:

$$\begin{array}{ll} \text{Two events } (A, B): & P(A \cup B)=P(A)+P(B)-P(A, B) \\ \text{Three events } (A, B, C): & P(A \cup B \cup C)=P(A)+P(B)+P(C)-P(A, B)-P(A, C)-P(B, C)+P(A, B, C) \end{array}$$
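The three-event formula can be sanity-checked on a small finite sample space. The sketch below is illustrative only: the sample space (the integers 1 to 30, equally likely) and the three divisibility events are assumptions chosen so that all the overlaps are non-empty.

```python
from fractions import Fraction

# Hypothetical sample space: the integers 1..30, all equally likely.
omega = set(range(1, 31))


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))


a = {x for x in omega if x % 2 == 0}  # divisible by 2
b = {x for x in omega if x % 3 == 0}  # divisible by 3
c = {x for x in omega if x % 5 == 0}  # divisible by 5

# Left-hand side: probability of the union, computed directly.
lhs = prob(a | b | c)

# Right-hand side: singles minus pairs plus the triple overlap.
rhs = (prob(a) + prob(b) + prob(c)
       - prob(a & b) - prob(a & c) - prob(b & c)
       + prob(a & b & c))

print(lhs, rhs)  # 11/15 11/15
```

The pairwise terms remove what was counted twice, and the final term restores the triple overlap, which would otherwise have been added three times and subtracted three times.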

Example: Inclusion-Exclusion

In an online survey of 100 students, 55 students claimed to have seen the Breaking Bad TV series. The survey also revealed that 44 students have watched Game of Thrones, 20 of whom have seen both shows. What is the probability that a randomly-selected student has seen neither of the two shows?

This problem is closely related to the set operations described in the previous section. The sample space for randomly selecting a student from the survey, $\Omega$, consists of 100 outcomes, each corresponding to a different person. Fifty-five of these outcomes belong to the event of selecting a Breaking Bad fan, so the probability of that event is $P(B)=\frac{55}{100}$. Similarly, we expect to select a Game of Thrones fan with probability $P(G)=\frac{44}{100}$, and a randomly-selected student has watched both TV shows with probability $P(B, G)=\frac{20}{100}$. Using the inclusion-exclusion formula, we can find the probability that a randomly-selected student has watched at least one of these shows:

$$P(B \cup G)=P(B)+P(G)-P(B, G)=\frac{55}{100}+\frac{44}{100}-\frac{20}{100}=\frac{79}{100}$$

Since we want to get the probability that the student has seen neither, we need to take the complement of this event:

$$P(\overline{B \cup G})=1-P(B \cup G)=1-\frac{79}{100}=\frac{21}{100}$$
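The arithmetic of this example can be reproduced in a few lines of Python. This is a minimal sketch using the survey counts quoted above; the variable names are illustrative.

```python
from fractions import Fraction

# Survey counts from the example.
total = 100
n_b = 55      # students who have seen Breaking Bad
n_g = 44      # students who have watched Game of Thrones
n_both = 20   # students who have seen both shows

p_b = Fraction(n_b, total)
p_g = Fraction(n_g, total)
p_both = Fraction(n_both, total)

# Inclusion-exclusion, then the complement rule.
p_either = p_b + p_g - p_both
p_neither = 1 - p_either
print(p_either, p_neither)  # 79/100 21/100

# The same answer via the four disjoint groups of students.
only_b = n_b - n_both
only_g = n_g - n_both
neither = total - (only_b + only_g + n_both)
print(only_b, only_g, n_both, neither)  # 35 24 20 21
```

The four counts in the last line are the sizes of the disjoint regions that a Venn diagram of the survey would show, and they add up to the 100 students surveyed.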

The result becomes intuitively evident when we draw a Venn diagram of survey results: