Critical Region, Significance Level and the Decision Rule


Once we have computed a null distribution (based on the assumed null hypothesis), we can determine the regions or values where the test statistic is very unlikely to reside; note that this is essentially the opposite of our approach for determining confidence intervals for an estimate, described in the previous chapter. For instance, let us examine Figure 35, which is a plot of the null distribution for the twenty-coin-toss example under consideration. It can be argued that we are quite unlikely to observe two or fewer heads and, similarly, eighteen or more heads, given that the null hypothesis is true. Indeed, in Table 4 we see that the probability of each such event, $P(T \leq 2) = P(T \geq 18)$, is approximately 0.0002. Therefore the total probability of the test statistic being in either of these two regions, $P((T \leq 2) \cup (T \geq 18))$, is approximately 0.0004, given that the null hypothesis holds true.

| $k$ | $P(T \leq k)$ | $P(T \geq k)$ |
|----:|--------------:|--------------:|
| 0 | 9.53674316406e-07 | 1.0 |
| 1 | 2.00271606445e-05 | 0.999999046326 |
| 2 | 0.000201225280762 | 0.999979972839 |
| 3 | 0.00128841400146 | 0.999798774719 |
| 4 | 0.00590896606445 | 0.998711585999 |
| 5 | 0.020694732666 | 0.994091033936 |
| 6 | 0.0576591491699 | 0.979305267334 |
| 7 | 0.131587982178 | 0.94234085083 |
| 8 | 0.251722335815 | 0.868412017822 |
| … | … | … |
| 12 | 0.868412017822 | 0.251722335815 |
| 13 | 0.94234085083 | 0.131587982178 |
| 14 | 0.979305267334 | 0.0576591491699 |
| 15 | 0.994091033936 | 0.020694732666 |
| 16 | 0.998711585999 | 0.00590896606445 |
| 17 | 0.999798774719 | 0.00128841400146 |
| 18 | 0.999979972839 | 0.000201225280762 |
| 19 | 0.999999046326 | 2.00271606445e-05 |
| 20 | 1.0 | 9.53674316406e-07 |

Table 4: The cumulative probabilities for the test statistic of the coin-flip example given the null hypothesis ($H_0 \equiv P(\text{heads}) = 0.5$)
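The tail probabilities in Table 4 can be reproduced directly from the Binomial probability mass function. The following is a minimal sketch using only the Python standard library; the function names (`binom_pmf`, `cdf`, `sf`) are our own choices, not from the text:

```python
from math import comb

def binom_pmf(k, n=20, p=0.5):
    """P(T = k) for T ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def cdf(k, n=20, p=0.5):
    """Lower-tail probability P(T <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def sf(k, n=20, p=0.5):
    """Upper-tail probability P(T >= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# By the symmetry of Binomial(20, 0.5), the two tails of Table 4 match:
# cdf(2) and sf(18) both equal 211/2**20, approximately 0.000201225
```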

Our analysis thus far already raises several interesting practical points. Using the null distribution, we defined a region, $R$, which we will refer to as the critical or rejection region, over which we do not expect our test statistic to reside (i.e. $T \leq 2$ or $T \geq 18$). We have also computed the corresponding probability of finding the test statistic in this rejection region assuming the null hypothesis (i.e. 0.0004); we will refer to this probability as our significance level, $\alpha$. Of course, our selection of the critical region $T \leq 2$ or $T \geq 18$ is seemingly arbitrary; we could have just as convincingly said that we do not expect to observe 3 or fewer heads or 17 or more heads.

It is more common to work in the reverse direction; that is, by first specifying a significance level, $\alpha$, and then determining the corresponding critical region using the null distribution (again quite analogous, but opposite, to our method for determining confidence intervals of an estimate). In other words, provided that we know how to compute the null distribution, we can always define a critical region $R$ such that $P(T \in R) = \alpha$ (or $P(T \in R) \leq \alpha$ in the case of discrete distributions), where $\alpha$ is the significance level.
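One way to sketch this reverse direction for a symmetric two-tailed test: grow the rejection region inwards from the two extremes, stopping before the total tail probability would exceed $\alpha$ (hence $P(T \in R) \leq \alpha$ for a discrete distribution). This is an illustrative construction, not an algorithm from the text, and the function name is ours:

```python
from math import comb

def two_tailed_region(alpha, n=20, p=0.5):
    """Largest symmetric region {0..k} U {n-k..n} whose total
    probability under the null Binomial(n, p) stays <= alpha."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    k, total = -1, 0.0
    # extend the region by one value on each side while we stay within alpha
    while k + 1 < n - (k + 1):
        step = pmf[k + 1] + pmf[n - (k + 1)]
        if total + step > alpha:
            break
        total += step
        k += 1
    region = list(range(0, k + 1)) + list(range(n - k, n + 1))
    return region, total
```

For example, asking for $\alpha = 0.001$ with the coin-flip null distribution yields the region $\{0, 1, 2, 18, 19, 20\}$ with an achieved tail probability of about 0.0004, matching the region discussed above.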

With this in mind, we can now establish the decision rule we will use to make claims about the hypothesis test:

  1. Define a significance level $\alpha$ (e.g. 0.01, 0.05) on which we want to base our decision about the null hypothesis.
  2. Define the corresponding critical or rejection region, $R$, of test statistic ($T$) values such that $P(T \in R) = \alpha$, given that the null hypothesis is true.
  3. Define the rejection rule as follows: we reject the null hypothesis if $T \in R$; otherwise we say that there is insufficient evidence to reject the null hypothesis at the significance level $\alpha$.

In our coin-flip example, for the significance level $\alpha = 0.0004$ we can define the rejection region for the null hypothesis as $R \equiv \{0, 1, 2, 18, 19, 20\}$. Thus, if we observed 19 heads in 20 coin tosses in a sample, we would reject the null hypothesis and therefore claim that the coin is not fair. If, on the other hand, we observed only eight heads in twenty coin tosses (as in the MLE example in the previous chapter), we would only be able to claim that there is insufficient evidence to say that the coin is biased. Note that this is not the same as claiming the coin is fair: the decision rule in this case simply states that the sample provides insufficient evidence to overturn our null hypothesis that the coin is fair.
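The decision rule applied to the coin-flip example can be written as a tiny function; the function name and message strings below are our own, for illustration only:

```python
def decide(t, region, alpha):
    """Step 3 of the decision rule: reject H0 iff the observed
    test statistic t falls inside the rejection region."""
    if t in region:
        return "reject H0 at significance level {}".format(alpha)
    return "insufficient evidence to reject H0 at significance level {}".format(alpha)

# rejection region from the coin-flip example
R = {0, 1, 2, 18, 19, 20}

print(decide(19, R, 0.0004))  # 19 heads is in R: reject, claim the coin is biased
print(decide(8, R, 0.0004))   # 8 heads is not in R: cannot reject (not proof of fairness)
```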

One- vs. Two-tailed Tests

In the previous example, we proposed to reject the null hypothesis when the observed number of heads in a given sample was improbably low or improbably high. In statistical terms, we used both of the low-probability 'tails' of the Binomial distribution as our rejection region. This was necessary because our initial hypothesis was simple, i.e. of the form $H_0: \theta = \theta_0$ and $H_1: \theta \neq \theta_0$; the coin is either fair, or it is not.

Imagine, on the other hand, that we were testing the hypothesis that the coin is biased towards heads specifically. Such a test must be defined using a composite hypothesis: $H_0: P(\text{heads}) = 0.5$ and $H_1: P(\text{heads}) > 0.5$. Using the same test statistic, an improbably low number of heads would not give us any confidence to claim that the coin is biased towards heads. We would therefore want to reject the null hypothesis only if the number of heads is improbably high. Thus, we define our rejection region as $R \equiv \{18, 19, 20\}$.

Whenever only one tail of the distribution is used in the test (as in this case), the test is called a one-tailed test (as opposed to the two-tailed test described previously). An important concept associated with one-tailed tests is that of a critical value. The critical value is a value $c$ for which we can reject the null hypothesis if the test statistic, $T$, is strictly greater (or strictly lower) than $c$. In other words, it is the particular point that separates the critical region from the acceptance region of the null hypothesis. For instance, in the heads-biased coin example the critical value is $c = 17$ at significance level $\alpha \approx 0.0002$. For two-tailed tests we often end up with two critical values, one specifying the lower bound and the other the upper bound of the critical region.
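For the one-tailed, heads-biased test, the critical value can be found by scanning for the smallest $c$ whose upper tail probability stays within $\alpha$. This is a sketch under the same Binomial(20, 0.5) null distribution; the function name is ours:

```python
from math import comb

def critical_value(alpha, n=20, p=0.5):
    """Smallest c such that P(T > c) <= alpha under Binomial(n, p);
    we reject H0: P(heads) = 0.5 in favour of H1: P(heads) > 0.5
    whenever T > c."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    for c in range(n + 1):
        if sum(pmf[c + 1:]) <= alpha:
            return c
    return n
```

For example, `critical_value(0.001)` returns 17, consistent with the rejection region $\{18, 19, 20\}$ above (the achieved tail probability $P(T \geq 18) \approx 0.0002$ is below the requested $\alpha$), while the conventional $\alpha = 0.05$ gives $c = 14$.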

To recap:

Simple Hypothesis: $H_0: \theta = \theta_0$ and $H_1: \theta \neq \theta_0$

Composite Hypothesis: $H_0: \theta = \theta_0$ and $H_1: \theta < \theta_0$ (or, alternatively, $H_1: \theta > \theta_0$)

Two-tailed test: Considers both of the improbable (extreme) regions of the null distribution; used with simple hypotheses.

One-tailed test: Only one critical region is defined; used to test composite hypotheses.

Critical value: A value $c$ for which we can reject a hypothesis if $T > c$ (or, alternatively, $T < c$), where $T$ is the test statistic. For simple hypotheses there will often be two critical values.