The P-value
In the previous sections we established the decision rule and used it to quantify the evidence against $H_0$; that is, we selected a significance level $\alpha$ and defined a critical region for rejecting the null hypothesis for a given test statistic before any data were collected. An alternative strategy is to calculate a $P$-value:
The $P$-value associated with an observed test statistic is the probability of obtaining a value of that test statistic as extreme as, or more extreme than, the one actually observed in the experiment (relative to $H_1$), given that $H_0$ is true.
In other words, it is the smallest significance level at which the null hypothesis would be rejected:
If $P$-value $\leq \alpha$, then $H_0$ can be rejected at significance level $\alpha$.
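The rule above can be sketched in a few lines of Python; the function name and the illustrative $P$-value are hypothetical, chosen only to show the comparison against two common significance levels.

```python
def reject_null(p_value, alpha):
    """Reject H0 exactly when the P-value does not exceed alpha."""
    return p_value <= alpha

p = 0.03  # an illustrative P-value, not from the coin example
print(reject_null(p, alpha=0.05))  # True: reject at the 5% level
print(reject_null(p, alpha=0.01))  # False: cannot reject at the 1% level
```

Note how the same $P$-value leads to different decisions at different significance levels, which is exactly why the $P$-value is the *smallest* level at which rejection occurs.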
As with the significance level, the smaller the $P$-value, the better. Unlike the significance level, the $P$-value captures both the certainty about $H_0$ and the power of the test: a large $P$-value can result either from the null hypothesis being true or from the power of the test being low.
Figure 37: Illustration of the $P$-value calculation for the coin flip example where $H_0\colon \theta = 0.5$. Since this is a simple hypothesis, we are performing a two-tailed test. Thus, the $P$-value is computed by summing over $T \leq 8$ and $T \geq 12$ (i.e. the region labeled in red in the null distribution).
Suppose that we are again testing whether a coin is fair (that is, $H_0\colon \theta = 0.5$). The observation of our test statistic is that we toss the coin twenty times and get 8 heads (i.e. the dataset used for the MLE of the Bernoulli distribution in the previous chapter). Using the number of heads $T$ as a test statistic, we again know that $T \sim \mathrm{Binomial}(20, 0.5)$ under the null hypothesis. From Table 4 and Figure 37, we see that the probability of getting eight or fewer heads under the null hypothesis is $P(T \leq 8) = 0.2517$.
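This one-tailed probability can be checked directly by summing the binomial probability mass function, here using only the Python standard library:

```python
from math import comb

# Null distribution: T ~ Binomial(n=20, theta=0.5).
# P(T <= 8) under H0: sum the binomial PMF from 0 to 8 heads.
n, k_obs = 20, 8
p_le_8 = sum(comb(n, k) for k in range(k_obs + 1)) / 2**n
print(round(p_le_8, 4))  # 0.2517
```

Since $\theta = 0.5$, each sequence of 20 tosses has probability $1/2^{20}$, so the CDF reduces to counting sequences with at most 8 heads.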
Since we are interested in the non-composite $H_0\colon \theta = 0.5$ (with alternative $H_1\colon \theta \neq 0.5$), which corresponds to a two-tailed test, we need to consider the symmetric bound in the rejection region. This means that our $P$-value, the lowest significance threshold at which we would reject the null hypothesis, is $P(T \leq 8) + P(T \geq 12) = 2 \times 0.2517 = 0.5034$ (summarised in Figure 37). This is quite a large $P$-value: it is equivalent to saying that if we choose to reject the null hypothesis given these data, we should be willing to accept that we will be wrong (i.e. commit a Type I error) about half of the time!
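The full two-tailed calculation can be sketched as follows; because $\mathrm{Binomial}(20, 0.5)$ is symmetric about 10, the upper-tail probability $P(T \geq 12)$ equals the lower-tail probability $P(T \leq 8)$, so doubling the one-tailed value gives the $P$-value:

```python
from math import comb

n, k_obs = 20, 8
# One-tailed probability P(T <= 8) under H0: theta = 0.5.
p_le = sum(comb(n, k) for k in range(k_obs + 1)) / 2**n
# By symmetry of Binomial(20, 0.5), P(T >= 12) = P(T <= 8),
# so the two-tailed P-value doubles the one-tailed probability.
p_value = 2 * p_le
print(round(p_value, 4))  # 0.5034
```

With $P$-value $\approx 0.50$, rejection would require tolerating a Type I error rate of about one in two, far above any conventional significance level.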