Type I and Type II Errors, Power of the Test

Whenever we reject a null hypothesis, we are claiming that, beyond a reasonable doubt, the null hypothesis is false. However, even highly improbable events do occur occasionally. In hypothesis testing we can make two types of mistakes, which are summarised in the table below.

|                | Reject $H_0$  | Do not reject $H_0$ |
|----------------|---------------|---------------------|
| $H_0$ is true  | Type I error  | no error            |
| $H_1$ is true  | no error      | Type II error       |

Clearly from the table above, a Type I error occurs when the null hypothesis is rejected even though it is actually true. We have already made attempts to safeguard against Type I errors by defining a rejection region a priori for the test statistic according to the specified significance level $\alpha$ (e.g. $\alpha = 0.05$). Therefore, the probability of such an error is completely established by definition:

$$P(\text{Type I error}) \equiv P(T \in R \mid H_0 \text{ is true}) = \alpha$$

Computing the probability of a Type II error is often tricky, as in order to do this we need to specify the sampling distribution of the test statistic under the alternative hypothesis ($H_1$). We denote this probability, $P(\text{Type II error})$, by the Greek letter $\beta$. An important concept related to the Type II error probability is the power of the test, which we define as $1 - P(\text{Type II error}) = 1 - \beta$. To summarise:

- **Type I error**: the null hypothesis is rejected when it is actually true. $P(\text{Reject } H_0 \mid H_0 \text{ is true}) = \alpha$, where $\alpha$ is the significance level of the test.
- **Type II error**: the null hypothesis is not rejected when the alternative hypothesis is true. $P(\text{Do not reject } H_0 \mid H_1 \text{ is true}) = \beta$.
- **Power of the test**: defined as $1 - P(\text{Type II error}) = 1 - \beta$.

In general, we assume that the Type I error is the more severe of the two.

When designing an experiment, we first fix the significance level of the test $\alpha$ (e.g. $\alpha = 0.01$) and then attempt to maximise the power of the test.

Figure 36: Type I and Type II errors for the coin-flip example consisting of $n = 20$ trials at a significance level $\alpha = 0.05$. The null hypothesis is that the coin is fair ($H_0: P(\text{heads}) = 0.5$), and the alternative hypothesis is that the coin is heads-biased ($H_1: P(\text{heads}) > 0.5$); for illustrative purposes, we have plotted the sampling distribution of the test statistic for a specific alternative hypothesis, $H_1: P(\text{heads}) = 0.75$. The Type II error is given by summing the sampling distribution under the alternative hypothesis over all outcomes of the test statistic for which we do not reject $H_0$ (i.e. $\beta = P(T \leq c \mid H_1)$, where $c = 14$ is the largest outcome outside the rejection region); this results in $\beta = 0.382827$.

The easiest way to illustrate these ideas is again by using an example. Let us revisit the composite hypothesis test that a coin is biased towards heads. Recall that our null hypothesis is $H_0: P(\text{heads}) = 0.5$, and our alternative hypothesis is $H_1: P(\text{heads}) > 0.5$. Of course, there are infinitely many sampling distributions for our alternative hypothesis, since the (unknown) true value of the parameter $p$ could in theory be any value such that $p > 0.5$.

Imagine, though, that we were somehow given the true value of the distribution parameter $p$. As an example, assume that the coin comes up heads for 3 out of every 4 flips (i.e. $p = 0.75$). Figure 36 shows a plot of the null distribution ($p = 0.5$) and the sampling distribution for the alternative hypothesis under consideration ($p = 0.75$). Here we have selected a significance level of $\alpha = 0.05$, which results in a rejection region $R = \{15, 16, 17, 18, 19, 20\}$; thus we reject the null hypothesis if our test statistic $T$ falls in this region. However, we have also plotted the sampling distribution for the alternative hypothesis, and we see that there are several outcomes for which we fail to reject the null hypothesis (i.e. $T \leq 14$) but have committed a Type II error, since the alternative hypothesis is true! Summing the sampling distribution for the alternative hypothesis over all values $T \leq 14$ gives $P(\text{Type II error}) = \beta = 0.383$, and a power of the test equal to $1 - \beta = 0.617$.
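To make these numbers concrete, here is a minimal sketch (assuming Python with SciPy, neither of which appears in the original text) that reproduces the rejection region, $\beta$, and the power for this example:

```python
# Coin-flip example: n = 20 trials, H0: p = 0.5, specific alternative p = 0.75,
# one-sided test at significance level alpha = 0.05.
from scipy.stats import binom

n, p0, p1, alpha = 20, 0.5, 0.75, 0.05

# Rejection region: the smallest c such that P(T >= c | H0) <= alpha.
c = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
print(f"Rejection region: T in {{{c}, ..., {n}}}")         # {15, ..., 20}

# Realised Type I error probability (at most alpha by construction).
print(f"P(Type I error) = {binom.sf(c - 1, n, p0):.6f}")   # ~0.0207

# Type II error: T falls outside the rejection region under p = 0.75.
beta = binom.cdf(c - 1, n, p1)
print(f"beta            = {beta:.6f}")                     # ~0.382827
print(f"power           = {1 - beta:.6f}")                 # ~0.617173
```

Note that, because the test statistic is discrete, the realised Type I error probability ($\approx 0.0207$) sits below the nominal $\alpha = 0.05$: the rejection region is the largest tail whose probability under $H_0$ does not exceed $\alpha$.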

NOTE: There are a number of factors that influence the power of a test, $(1 - \beta)$. We can see from Figure 36 that if we increased $\alpha$, then $(1 - \beta)$ would in turn increase (the rejection region grows, so $\beta$ shrinks); however, we would never do this in practice! An experimenter should always set the significance level $\alpha$ to be in the neighborhood of 0.05 or less to safeguard against Type I errors.

Generally speaking, the power of the experiment can be maximised by sampling a sufficient number of data points $n$, as the standard error (SE) of our sample means was found to vary inversely with the square root of the sample size: $\sigma / \sqrt{n}$. Thus, increasing $n$ decreases the standard error, which in turn increases $(1 - \beta)$. One should, however, avoid collecting additional samples until a test comes out significant; in practice, $\alpha$ and $n$ should be parameters set before conducting the experiment.
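Continuing the coin example, here is a short sketch (again assuming SciPy; the sample sizes shown are illustrative choices, not from the original text) of how the power grows with $n$ while $\alpha$ stays fixed:

```python
# Power of the one-sided coin-flip test as a function of sample size n,
# holding alpha = 0.05, H0: p = 0.5 and the specific alternative p = 0.75.
from scipy.stats import binom

p0, p1, alpha = 0.5, 0.75, 0.05

for n in (10, 20, 50, 100):
    # Smallest rejection threshold c with P(T >= c | H0) <= alpha.
    c = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
    power = binom.sf(c - 1, n, p1)  # P(T >= c | H1) = 1 - beta
    print(f"n = {n:3d}  reject if T >= {c:2d}  power = {power:.3f}")
```

For $n = 20$ this recovers the power of $0.617$ from Figure 36, and the power approaches $1$ as $n$ grows, which is why fixing $\alpha$ and $n$ up front, and then checking that $n$ yields adequate power, is the sound design order.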