Critical Region, Significance Level and the Decision Rule
Once we have computed a null distribution (based on the assumed null hypothesis), we can determine the regions or values where the test statistic is very unlikely to reside; note that this is essentially the opposite of our approach for determining confidence intervals for an estimate, described in the previous chapter. For instance, let us examine Figure 35, which plots the null distribution for the twenty-coin-toss example under consideration. It can be argued that we are quite unlikely to observe two or fewer heads and, similarly, eighteen or more heads, given that the null hypothesis is true. Indeed, in Table 4 we see that the probability of each such event, $P(X \le 2) = P(X \ge 18)$, is approximately 0.0002. Therefore the total probability of the test statistic falling in either of these two regions is approximately 0.0004, given that the null hypothesis holds true.
| $k$ | $P(X \le k)$ | $P(X \ge k)$ |
|-----|--------------|--------------|
| 0 | 0.000000953674 | 1.0 |
| 1 | 0.0000200272 | 0.999999046326 |
| 2 | 0.000201225280762 | 0.999979972839 |
| 3 | 0.00128841400146 | 0.999798774719 |
| 4 | 0.00590896606445 | 0.998711585999 |
| 5 | 0.020694732666 | 0.994091033936 |
| 6 | 0.0576591491699 | 0.979305267334 |
| 7 | 0.131587982178 | 0.94234085083 |
| 8 | 0.251722335815 | 0.868412017822 |
| 9 | 0.411901473999 | 0.748277664185 |
| 10 | 0.588098526001 | 0.588098526001 |
| 11 | 0.748277664185 | 0.411901473999 |
| 12 | 0.868412017822 | 0.251722335815 |
| 13 | 0.94234085083 | 0.131587982178 |
| 14 | 0.979305267334 | 0.0576591491699 |
| 15 | 0.994091033936 | 0.020694732666 |
| 16 | 0.998711585999 | 0.00590896606445 |
| 17 | 0.999798774719 | 0.00128841400146 |
| 18 | 0.999979972839 | 0.000201225280762 |
| 19 | 0.999999046326 | 0.0000200272 |
| 20 | 1.0 | 0.000000953674 |
Table 4: The cumulative probabilities for the test statistic of the coin-flip example, given the null hypothesis $H_0: p(\text{heads}) = 0.5$
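The values in Table 4 can be reproduced directly from the Binomial pmf under the null hypothesis. A minimal sketch in Python (standard library only; the helper names are our own, not from the text):

```python
from math import comb

def binom_cdf(k, n=20, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_sf(k, n=20, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 if k <= 0 else 1.0 - binom_cdf(k - 1, n, p)

# Reproduce a few rows of Table 4 under H0: p(heads) = 0.5
for k in (2, 8, 18):
    print(k, binom_cdf(k), binom_sf(k))
```

Note the symmetry of the fair-coin null distribution: $P(X \le 2) = P(X \ge 18)$, which is why the two columns of Table 4 mirror each other.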
Our analysis thus far already raises several interesting practical points. Using the null distribution, we defined a region, $C$, which we will refer to as the critical or rejection region, over which we do not expect our test statistic to reside (i.e. $X \le 2$ or $X \ge 18$). We have also computed the corresponding probability of finding the test statistic in this rejection region assuming the null hypothesis (i.e. 0.0004); we will refer to this probability as our significance level, $\alpha$. Of course, our selection of the critical region (and hence of $\alpha$) is seemingly arbitrary; we could just as convincingly have said that we do not expect to observe 3 or fewer heads or 17 or more heads.
It is more common to work in the reverse direction; that is, to first specify a significance level, $\alpha$, and then determine the corresponding critical region using the null distribution (again quite analogous, but opposite, to our method for determining confidence intervals of an estimate). In other words, provided that we know how to compute the null distribution, we can always define a critical region $C$ such that $P(X \in C \mid H_0) = \alpha$ (or $P(X \in C \mid H_0) \le \alpha$ in the case of discrete distributions), where $\alpha$ is the significance level.
With this in mind, we can now establish the decision rule we will use to make claims about the hypothesis test:
- Define a significance level (e.g. $\alpha = 0.05$) on which we want to base our decisions about the null hypothesis.
- Define the corresponding critical or rejection region, $C$, of test statistic values such that $P(X \in C) \le \alpha$, given the null hypothesis is true.
- Define the rejection rule as follows: we choose to reject the null hypothesis if $X \in C$; otherwise we say that there is insufficient evidence to reject the null hypothesis at the significance level $\alpha$.
In our coin-flip example, for the significance level $\alpha = 0.05$ we can define the rejection region for the null hypothesis as $C = \{x : x \le 5 \text{ or } x \ge 15\}$ (from Table 4, $P(X \in C \mid H_0) \approx 0.041 \le 0.05$). Thus, if we observed 19 heads in 20 coin tosses in a sample, we would reject the null hypothesis and therefore claim the coin is not fair. If, on the other hand, we observed only eight heads in twenty coin tosses (as in the MLE example in the previous chapter), we would only be able to claim that there is insufficient evidence to say that the coin is biased. Note that this is not the same as claiming the coin is fair: the decision rule simply states that the sample provides insufficient evidence to overturn our null hypothesis that the coin is fair.
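The decision rule itself then reduces to a membership check. A minimal sketch, hard-coding the rejection region $\{X \le 5 \text{ or } X \ge 15\}$ that Table 4 yields for $\alpha = 0.05$ (the function name is illustrative):

```python
def reject_null(x, lo=5, hi=15):
    """Two-tailed decision for the 20-toss example at alpha = 0.05.

    The bounds (5, 15) come from the tail probabilities in Table 4:
    P(X <= 5) + P(X >= 15) ~= 0.041 <= 0.05 under H0.
    """
    return x <= lo or x >= hi

print(reject_null(19))  # True: reject H0, claim the coin is biased
print(reject_null(8))   # False: insufficient evidence to reject H0
```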
One- vs. Two-tailed Tests
In the previous example, we proposed to reject the null hypothesis when the observed number of heads in a given sample was improbably low or improbably high. In statistical terms, we used both of the low-probability 'tails' of the Binomial distribution as our rejection region. This was necessary because our initial hypothesis was simple, i.e. of the form $H_0: p(\text{heads}) = 0.5$ and $H_1: p(\text{heads}) \ne 0.5$; the coin is either fair, or not.
Imagine, on the other hand, that we were testing the hypothesis that the coin is biased towards heads specifically. Such a test must be defined using a composite hypothesis: $H_0: p(\text{heads}) \le 0.5$ and $H_1: p(\text{heads}) > 0.5$. Using the same test statistic, improbably low numbers of heads would not give us any confidence to claim that the coin is biased towards heads. We would therefore want to reject the null hypothesis only if the number of heads is improbably high. Thus, we define our rejection region, $C$, as $\{x : x > c\}$ for a suitable threshold $c$.
Whenever only one tail of the distribution is used in the test (as in this case), the test is called a one-tailed test (as opposed to the two-tailed test described previously). An important concept associated with one-tailed tests is that of a critical value. The critical value is a value $c$ for which we can reject the null hypothesis if the test statistic, $T$, is strictly greater (or strictly lower) than $c$. In other words, it is the particular point that separates the critical region from the acceptance region of the null hypothesis. For instance, in the heads-biased coin example the critical value is $c = 14$ at significance level $\alpha = 0.05$: we reject $H_0$ whenever $X > 14$, since $P(X \ge 15) \approx 0.021 \le 0.05$. For two-tailed tests, we often end up with two critical values, one specifying the lower bound and the other the upper bound of the critical region.
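Finding the critical value for this one-tailed test amounts to scanning the upper tail of the null distribution for the smallest threshold whose tail mass stays within $\alpha$. A sketch under the fair-coin null (helper names are our own):

```python
from math import comb

def sf(k, n=20, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def upper_critical_value(alpha, n=20, p=0.5):
    """Smallest c such that rejecting when X > c keeps the
    type-I error P(X > c | H0) = P(X >= c + 1 | H0) <= alpha."""
    c = n
    while c > 0 and sf(c, n, p) <= alpha:
        c -= 1
    return c

print(upper_critical_value(0.05))  # -> 14: reject H0 when X > 14, i.e. X >= 15
```

For $\alpha = 0.05$ this recovers the tail from Table 4: $P(X \ge 15) \approx 0.021 \le 0.05$, while $P(X \ge 14) \approx 0.058 > 0.05$.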
To recap:
Simple Hypothesis: $H_0: p = p_0$ and $H_1: p \ne p_0$
Composite Hypothesis: $H_0: p \le p_0$ and $H_1: p > p_0$ (or, alternatively, $H_0: p \ge p_0$ and $H_1: p < p_0$)
Two-tailed test: Considers both of the improbable (extreme) regions of the null distribution; used with simple hypotheses.
One-tailed test: Only one critical region is defined; used to test composite hypotheses.
Critical value: A value $c$ for which we can reject a hypothesis if $T > c$ (or, alternatively, $T < c$), where $T$ is the test statistic. For simple hypotheses there will often be two critical values.