The P-value

In the previous sections we have established the decision rule and used this to quantify the evidence against $H_0$; that is, we have selected a significance level $\alpha$ and defined a critical region $R$ for rejecting the null hypothesis for a given test statistic before any data are collected. An alternative strategy would be to calculate a P-value:

The P-value associated with an observed test statistic is the probability of getting a value for that test statistic as extreme as, or more extreme than, what was actually observed in the experiment (relative to $H_1$), given that $H_0$ is true.

In other words, it is the smallest significance level $\alpha$ at which the null hypothesis would be rejected:

If P-value $\leq \alpha$, then $H_0$ can be rejected at significance level $\alpha$.
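The rejection rule above can be sketched as a one-line check (the helper name `reject_h0` and the example numbers are illustrative, not from the text):

```python
# Reject H0 at significance level alpha exactly when the P-value
# is at or below alpha.
def reject_h0(p_value: float, alpha: float) -> bool:
    return p_value <= alpha

print(reject_h0(0.03, 0.05))  # True: P-value below the threshold
print(reject_h0(0.50, 0.05))  # False: insufficient evidence against H0
```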

As with the significance level, the smaller the P-value, the stronger the evidence against $H_0$. Unlike the significance level, the P-value reflects both the certainty about $H_0$ and the power of the test: a large P-value can arise either because the null hypothesis is true or because the power of the test is low.

Figure 37: Illustration of the P-value calculation for the coin-flip example where $H_0: \theta = 0.5$, $H_1: \theta \neq 0.5$. Since the alternative hypothesis is two-sided, we are performing a two-tailed test. Thus, the P-value is computed by summing $P(T \leq 8)$ and $P(T \geq 12)$ (labelled in red in the null distribution).

Suppose that we again were testing whether a coin is fair (that is, $H_0: \theta = 0.5$, $H_1: \theta \neq 0.5$). We toss the coin twenty times and observe 8 heads (i.e. the dataset for the MLE of the Bernoulli distribution in the previous chapter). Using the number of heads as a test statistic, we again know that $T \sim \operatorname{Binomial}(20, 0.5)$ under the null hypothesis. From Table 4 and Figure 37, we see that the probability of getting eight or fewer heads, under the null hypothesis, is $P(T \leq 8) \approx 0.25$.
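This lower-tail probability can be checked directly from the binomial probability mass function; a minimal sketch using only the standard library:

```python
from math import comb

# P(T <= 8) for T ~ Binomial(20, 0.5): the probability of observing
# eight or fewer heads in twenty tosses of a fair coin.
n, theta0 = 20, 0.5
p_lower = sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k) for k in range(9))
print(round(p_lower, 4))  # 0.2517, i.e. approximately 0.25
```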

Since we are interested in the two-sided alternative $H_1$, which corresponds to a two-tailed test, we must also include the symmetric upper tail $P(T \geq 12)$ in the rejection region. This means that our P-value, the lowest significance threshold at which we would reject the null hypothesis, is $P = P(T \leq 8) + P(T \geq 12) \approx 0.5$ (summarised in Figure 37). This is quite a large P-value: it is equivalent to saying that if we choose to reject the null hypothesis given this data, we should be willing to accept that, were $H_0$ true, we would be wrong (i.e. commit a Type I error) half of the time!
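The full two-tailed calculation can be sketched in the same way, summing both tails of the null distribution (by symmetry of $\operatorname{Binomial}(20, 0.5)$, the two tails are equal):

```python
from math import comb

n, theta0, t_obs = 20, 0.5, 8  # 8 heads observed in 20 tosses

def binom_pmf(k):
    """P(T = k) under the null distribution Binomial(n, theta0)."""
    return comb(n, k) * theta0**k * (1 - theta0)**(n - k)

p_lower = sum(binom_pmf(k) for k in range(t_obs + 1))         # P(T <= 8)
p_upper = sum(binom_pmf(k) for k in range(n - t_obs, n + 1))  # P(T >= 12)
p_value = p_lower + p_upper

print(round(p_value, 3))  # 0.503, i.e. approximately 0.5
```

A ready-made equivalent is `scipy.stats.binom_test` (or `binomtest` in recent SciPy versions); the hand-rolled version above just makes the two-tail sum explicit.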