
Bayes' Theorem

Now that you understand conditional probability, you can understand how to apply Bayes' theorem, which is built on conditional probability. It's a very important concept that allows us to reverse the natural way of conditioning probabilities. It is especially important if you're going into the medical field, but it is broadly applicable too, and you'll see why in a minute.

You'll hear about this a lot, but not many people really understand what it means or why it matters. It can show you, quite quantitatively, when people are misleading you with statistics, so let's see how that works.

First, let's talk about Bayes' theorem at a high level. Bayes' theorem is simply this: the probability of B given A is equal to the probability of A given B times the probability of B, divided by the probability of A. Suppose that A and B are events, and P(A), P(B) > 0. Bayes' Theorem states that:

P(B \mid A)=\frac{P(A \mid B)\, P(B)}{P(A)}

The key insight is that the probability of one event given another depends very much on the base probabilities of A and B on their own. People ignore this all the time.

One common example is drug testing. We might ask: what's the probability of being an actual user of a drug given that you tested positive for it? The reason Bayes' theorem is important is that it calls out that this depends very much on both the probability of A and the probability of B. The probability of being a drug user given that you tested positive depends very much on the base overall probability of being a drug user and the overall probability of testing positive. In other words, how much a positive result really tells you depends a lot on the overall probability of being a drug user in the population, not just on the accuracy of the test.

It also means that the probability of B given A is not the same thing as the probability of A given B. That is, the probability of being a drug user given that you tested positive can be very different from the probability of testing positive given that you're a drug user. You can see where this is going: this is a very real problem where diagnostic tests in medicine or drug tests yield a lot of false positives. The probability of the test detecting a user can be very high, but that doesn't necessarily mean that the probability of being a user given that you tested positive is high. Those are two different things, and Bayes' theorem allows you to quantify that difference.

Again, a drug test is a common example of applying Bayes' theorem to prove a point. Even a highly accurate drug test can produce more false positives than true positives. In our example, we'll use a drug test that correctly identifies users of a drug 99% of the time and correctly gives a negative result for 99% of non-users, but only 0.3% of the overall population actually uses the drug in question. So the probability of actually being a user of the drug is very small. What seems like a very high accuracy of 99% isn't actually high enough, right?

We can work out the math as follows:

  • Event A = is a user of the drug
  • Event B = tested positively for the drug

So let event A mean that you're a user of the drug, and event B the event that you tested positively for it using this drug test. From the setup above we know P(A) = 0.003 (the prevalence of drug use), P(B | A) = 0.99 (a user tests positive 99% of the time), and the probability that a non-user still tests positive is 0.01.

We need to work out the probability of testing positive overall, P(B). We can get that from the law of total probability: the probability of testing positive as a user, weighted by the probability of being a user, plus the probability of testing positive as a non-user, weighted by the probability of being a non-user. So P(B) works out to about 1.3% (0.99 × 0.003 + 0.01 × 0.997 ≈ 0.013) in this example. That gives us P(B), the probability of testing positively for the drug overall, without knowing anything else about you.
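
Spelling that out as an equation, with ¬A denoting the event of not being a user:

P(B)=P(B \mid A)\, P(A)+P(B \mid \neg A)\, P(\neg A)=0.99 \times 0.003+0.01 \times 0.997 \approx 0.013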

Let's do the math and calculate the probability of being a user of the drug given that you tested positively.

P(A \mid B)=\frac{P(A)\, P(B \mid A)}{P(B)}=\frac{0.003 \times 0.99}{0.013} \approx 22.8\%

So the probability of being an actual user of this drug given that you tested positive for it is only 22.8%. Even though this drug test is accurate 99% of the time, most of its positive results are false positives.
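
If it helps to see the arithmetic in one place, here is a minimal Python sketch of the same calculation; the variable names are just illustrative, not part of any library:

```python
# Numbers from the example above.
prevalence = 0.003           # P(A): probability that a random person uses the drug
sensitivity = 0.99           # P(B | A): probability that a user tests positive
false_positive_rate = 0.01   # P(B | not A): probability that a non-user tests positive

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_user_given_positive = sensitivity * prevalence / p_positive

print(f"P(B)     = {p_positive:.4f}")             # ~0.0129
print(f"P(A | B) = {p_user_given_positive:.3f}")  # ~0.230 (about 22.8% when P(B) is rounded to 0.013)
```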

Derivation

Let's derive this expression using probability rules that we already know. Say we are interested in computing P(B | A), but we cannot measure it directly and only have information about P(A | B) and related quantities. However, we can say that:

P(B \mid A)=\frac{P(A, B)}{P(A)}

Using the Chain Rule, we can express P(A, B) as:

P(A, B)=P(A \mid B)\, P(B)

which gives us the same expression as Bayes' Theorem above. However, using the Law of Total Probability from above, we can take this one step further by expressing P(A) as a sum over mutually exclusive events B_1, ..., B_n that together cover all possibilities (with our event B among them):

P(A)=\sum_{i=1}^{n} P\left(A \mid B_{i}\right) P\left(B_{i}\right)

This form for P(A) can be thought of as a "normalisation constant". Combining these terms results in:

P(B \mid A)=\frac{P(A \mid B)\, P(B)}{\sum_{i=1}^{n} P\left(A \mid B_{i}\right) P\left(B_{i}\right)}

Notice that we can now express P(B | A) strictly as a function of the conditional probabilities P(A | B_i) and the priors P(B_i), without needing P(A) directly! This is a very useful property that we will exploit in subsequent chapters.
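
As a quick illustration of this normalised form, here is a small Python sketch that computes the posterior P(B_i | A) for a set of mutually exclusive hypotheses B_i, given their priors P(B_i) and likelihoods P(A | B_i); the function name and the re-use of the drug-test numbers are just for illustration:

```python
def posteriors(priors, likelihoods):
    """Return P(B_i | A) for each hypothesis B_i in a partition.

    priors      -- list of P(B_i); the B_i are mutually exclusive and sum to 1
    likelihoods -- list of P(A | B_i), aligned with priors
    """
    # Normalisation constant: P(A) = sum_i P(A | B_i) P(B_i)
    evidence = sum(lik * p for lik, p in zip(likelihoods, priors))
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]

# The drug-test example as a two-hypothesis partition {user, non-user}:
print(posteriors(priors=[0.003, 0.997], likelihoods=[0.99, 0.01]))
# -> roughly [0.23, 0.77]
```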