Single Sample Inferences about the Population Mean:
The previous sections described a general framework for hypothesis testing. The remaining sections treat the modelling of null distributions for common hypothesis tests in more detail. We begin by discussing methods for making claims about population means.
For instance, we may be interested in finding out whether the mean of a population has changed, given some new experimental design. We are concerned with testing the hypothesis $H_0: \mu = \mu_0$, that is, that the true population parameter $\mu$ is equal to $\mu_0$. The alternative hypothesis could be either simple or composite in this case. We will base our inference on $\bar{X}$, the MLE estimator of $\mu$.
Testing for Normal Population with Known Variance, $\sigma^2$
Let's start with a population that we know is Normally distributed, with some variance $\sigma^2$ that is known to us a priori. We are interested in modelling the mean of this population, $\mu$.
In order to estimate this mean, we collect a sample $X_1, X_2, \ldots, X_n$, each drawn independently from the true population distribution: $X_i \sim \mathcal{N}(\mu, \sigma^2)$. We then compute the sample mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Recalling that the sum of several Normal random variables is also Normally distributed, we know the exact distribution of $\bar{X}$:

$$\bar{X} \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$$
As we discussed in the previous chapter, we will use the Z-transformed distribution of $\bar{X}$ as our test statistic, since we know it follows the standard Normal distribution. We are particularly interested in modelling the null distribution (i.e. under the null hypothesis that the true population parameter is $\mu_0$) of the test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1)$$
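As a quick sanity check, we can simulate this null distribution directly. The sketch below is a minimal illustration; the values of $\mu_0$, $\sigma$ and $n$ are arbitrary choices made for the demonstration, not taken from any example in this section.

```python
import numpy as np

# Hypothetical settings chosen purely for illustration.
mu_0 = 7.0      # mean under the null hypothesis
sigma = 0.5     # known population standard deviation
n = 15          # sample size
n_sims = 100_000

rng = np.random.default_rng(0)

# Draw many samples under H0 and compute the Z statistic for each.
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_sims, n))
z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))

# Under H0, Z should have mean ~0, standard deviation ~1,
# and roughly 5% of |Z| values should exceed 1.96.
print(z.mean(), z.std(), np.mean(np.abs(z) > 1.96))
```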
Recall that $\bar{X}$ will generally change from one sample to another, even when the samples are drawn from the same population. To quantify the uncertainty associated with these estimates, we previously derived confidence intervals with a probability of $1 - \alpha$ of containing the population parameter $\mu$:

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
where $z_{\alpha/2}$ is the critical value of the standard Normal distribution satisfying $\mathbb{P}(Z > z_{\alpha/2}) = \alpha/2$.
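As a small illustration, such an interval can be computed with the standard Normal quantile function; the values of $\bar{x}$, $\sigma$ and $n$ below are made up for the sketch.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values, purely for illustration.
x_bar = 7.2    # observed sample mean
sigma = 0.5    # known population standard deviation
n = 15         # sample size
alpha = 0.05

# z_{alpha/2}: the upper alpha/2 quantile of the standard Normal.
z_crit = norm.ppf(1 - alpha / 2)          # ~1.96
half_width = z_crit * sigma / np.sqrt(n)

print((x_bar - half_width, x_bar + half_width))  # (1 - alpha) confidence interval
```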
In hypothesis testing, we are essentially performing the opposite calculation: we are interested in specifying those values of the test statistic that are unlikely to be observed if the true population parameter is $\mu_0$. In other words, we seek to define the critical region assuming the null hypothesis to be true (given some simple or composite alternative hypothesis).
Consider the case where the alternative hypothesis we are testing is $H_1: \mu > \mu_0$. Here we would reject the null hypothesis if the test statistic is improbably high (i.e. falls in the critical region) given a significance level $\alpha$. Recalling our tools for defining confidence intervals, we can define the critical value $z_\alpha$ to be such that:

$$\mathbb{P}(Z > z_\alpha) = \alpha$$
That is, we will reject the null hypothesis if $Z$ is greater than the critical value $z_\alpha$ for a given significance level $\alpha$:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha$$
Using the standard Normal tables, we see that the critical value for $\alpha = 0.05$ is $z_{0.05} = 1.645$; therefore we would reject the null hypothesis if $Z > 1.645$, as the critical region for the hypothesis $H_1: \mu > \mu_0$ is $(1.645, \infty)$. A symmetrical line of reasoning applies for alternative hypotheses of the form $H_1: \mu < \mu_0$ (i.e. reject if $Z < -z_\alpha$).
However, for simple alternative hypotheses $H_1: \mu \neq \mu_0$, a two-tailed test is required. Assuming $\alpha = 0.05$, the rejection region would include both large positive and large negative values of $Z$, each tail spanning $\alpha/2 = 0.025$ of the probability space. From our standard Normal table, we define our rejection region to be $(-\infty, -1.96) \cup (1.96, \infty)$, since $z_{0.025} = 1.96$. As this particular rejection region is symmetric, we can summarise the rejection rule in terms of the absolute value of the test statistic: "reject $H_0$ if $|Z| > 1.96$".
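These critical values can also be obtained programmatically rather than from printed tables. A minimal sketch, assuming the usual $\alpha = 0.05$, using `scipy.stats.norm`:

```python
from scipy.stats import norm

alpha = 0.05

# One-sided test, H1: mu > mu_0 -> reject if Z > z_alpha.
z_one_sided = norm.ppf(1 - alpha)        # ~1.645

# Two-sided test, H1: mu != mu_0 -> reject if |Z| > z_{alpha/2}.
z_two_sided = norm.ppf(1 - alpha / 2)    # ~1.960

print(z_one_sided, z_two_sided)
```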
Example: Long Jump Distances
In 15 attempts, a long jumper records the following distances (in metres):
Suppose that the distances are normally distributed and the population standard deviation $\sigma$ is known. Are the distances consistent with a mean jump length of 7 metres, given a significance level $\alpha = 0.05$? What is the P-value?
Solution:
We first formally write down the hypotheses that we are interested in testing. In this example, we assume that the jump distances are normally distributed with some unknown mean and known variance, i.e. $X_i \sim \mathcal{N}(\mu, \sigma^2)$, where we know $\sigma^2$. We want to test the hypotheses:

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0$$
where $\mu_0 = 7$ (metres).
Since the variance of the population is known, we will use the standard Z-test to make claims about $\mu$. Our test statistic, $Z$, will therefore follow the standard Normal distribution under the null hypothesis and is defined as:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1)$$
In this particular case we are testing a simple hypothesis, therefore a two-tailed test is required (i.e. each tail has a probability of $\alpha/2 = 0.025$). From the lookup table for the standard Normal distribution, given $\alpha/2 = 0.025$, we find that $\mathbb{P}(Z > 1.96) = 0.025$ and therefore $z_{0.025} = 1.96$. Symmetrically, $-z_{0.025} = -1.96$. The resulting rejection rule for our problem at a significance level of $0.05$ is: "we reject $H_0$ if the test statistic $|Z| > 1.96$". Given this rejection rule, we compute the mean of our sample, $\bar{x}$, and obtain the corresponding value of the test statistic:
Applying the rejection rule above, we reject the null hypothesis and claim that the jumps are not consistent with a mean jump length of 7 metres at a significance level of $0.05$. The following figure illustrates our test graphically, where the blue line corresponds to our computed value of the test statistic:
By definition, the P-value is the lowest significance level at which the null hypothesis will be rejected. In order to find it, we consult the standard Normal tables again to obtain $\mathbb{P}(Z > |z|)$, where $z$ is the computed value of the test statistic. Since a two-tailed test has been used in our case, we need to be aware that, by symmetry, $\mathbb{P}(Z < -|z|) = \mathbb{P}(Z > |z|)$, and our P-value is the sum of these two probabilities: P-value $= 2\,\mathbb{P}(Z > |z|)$.
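The same calculation can be carried out in code. The sketch below is a generic helper; the sample values and $\sigma$ shown are placeholders for illustration only, not the data from this example.

```python
import numpy as np
from scipy.stats import norm

def z_test_two_sided(sample, mu_0, sigma):
    """Two-sided Z-test for a Normal mean with known standard deviation."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    z = (sample.mean() - mu_0) / (sigma / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z))   # P(Z > |z|) + P(Z < -|z|)
    return z, p_value

# Placeholder data, purely for illustration.
jumps = [7.1, 6.8, 7.3, 7.0, 7.2, 6.9, 7.4, 7.1, 7.0, 7.2, 6.8, 7.3, 7.1, 7.0, 7.2]
print(z_test_two_sided(jumps, mu_0=7.0, sigma=0.25))
```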
Testing for Normal Population with Unknown Variance, $\sigma^2$: Student's t-test
The previous section provides a convenient framework for dealing with means of normally distributed data when the variance is known. We noted when constructing confidence intervals that this is unlikely to be the case for real-world problems, and that if we use the sample standard deviation $s$ as an approximation for $\sigma$ in computing the confidence intervals, then we require the Student $t$-distribution to accurately model the distribution of the statistic. Thus, when the population variance is not known, we will represent the null distribution for the test statistic as:

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$$
Again, remember that $t_{n-1}$ is the Student $t$-distribution with $n-1$ degrees of freedom, and that it converges to the standard Normal in the limit $n \to \infty$. We can therefore define the critical values $t_{n-1,\alpha}$ for a given significance level $\alpha$ by:

$$\mathbb{P}(T > t_{n-1,\alpha}) = \alpha$$
For example, to compute a one-sided test (i.e. $H_1: \mu > \mu_0$) for a significance level $\alpha = 0.05$ and a sample size of $n = 15$, we would use the lookup table for the Student $t$-distribution to find $t_{14,0.05}$ such that $\mathbb{P}(T > t_{14,0.05}) = 0.05$:

$$t_{14,0.05} = 1.761$$
Thus, the rejection region for $H_1: \mu > \mu_0$ is $(1.761, \infty)$.
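The corresponding quantiles can also be obtained with `scipy.stats.t` rather than a printed table; a minimal sketch, assuming $\alpha = 0.05$ and $n = 15$ (i.e. 14 degrees of freedom):

```python
from scipy.stats import t

alpha = 0.05
n = 15
df = n - 1   # degrees of freedom

t_one_sided = t.ppf(1 - alpha, df)       # ~1.761, for H1: mu > mu_0
t_two_sided = t.ppf(1 - alpha / 2, df)   # ~2.145, for H1: mu != mu_0

print(t_one_sided, t_two_sided)
```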
Example: Long Jump Distances, when variance is unknown
Suppose that we do not know the true population variance in the long jump distances example described in the previous section. Are the jump distances consistent with the mean jump length of 7 metres, given a significance level of 0.05? What is the P-value?
Solution:
We use the same experimental set-up as in the previous example. Our test statistic is now defined in terms of the sample variance and follows the Student $t$-distribution:

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{14}$$
We can obtain critical values for the two-sided test under the Student $t$-distribution with 14 degrees of freedom using the lookup table procedure described above for $\alpha/2 = 0.025$:

$$\mathbb{P}(T > t_{14,0.025}) = 0.025$$
Therefore $t_{14,0.025} = 2.145$ and similarly $-t_{14,0.025} = -2.145$, allowing us to write down the rejection rule: "reject the null hypothesis if $|T| > 2.145$".
To compute the test statistic, $T$, we first need to compute the sample standard deviation $s$ from the sample:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
The equation above gives us the value of $s$, and hence the value of the test statistic. Since $|t| < 2.145$, we cannot reject the null hypothesis that the population mean is 7 metres. Graphically, this can be illustrated as follows:
We can use the CDF of the Student $t$-distribution to compute the P-value, by noticing that $\mathbb{P}(T > |t|) = \mathbb{P}(T < -|t|)$ and summing the two tail probabilities; the resulting P-value is approximately 0.06.
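In practice, this whole calculation is a one-liner with `scipy.stats.ttest_1samp`, which returns both the t statistic and the two-sided P-value. A minimal sketch; the sample below is a placeholder, not the omitted data from this example.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Placeholder data, purely for illustration.
jumps = np.array([7.1, 6.8, 7.3, 7.0, 7.2, 6.9, 7.4, 7.1,
                  7.0, 7.2, 6.8, 7.3, 7.1, 7.0, 7.2])

result = ttest_1samp(jumps, popmean=7.0)   # two-sided by default
print(result.statistic, result.pvalue)

# Reject H0 at significance level 0.05 if the P-value is below 0.05.
print("reject H0" if result.pvalue < 0.05 else "cannot reject H0")
```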
Testing for Non-Normal Population, Large Sample Size
If we do not know the distribution of the population (or, in general, we do not think the distribution of the population is Normal), but the sample size is large enough, we can assume that the sample mean is asymptotically Normal by the Central Limit Theorem. Essentially, this means that z- and t-tests can still be applied.
Similarly, at a large $n$, the sample variance $s^2$ becomes a good estimator of the true population variance $\sigma^2$ (and, in fact, the $t_{n-1}$ distribution converges to the standard Normal), meaning that we can use the $\mathcal{N}(0,1)$ distribution as our null distribution, plugging in $s$ for $\sigma$.
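A minimal sketch of this large-sample approximation, working from summary statistics; the function name, interface and numbers below are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import norm

def large_sample_z_test(x_bar, s, n, mu_0):
    """Approximate two-sided Z-test using the sample standard deviation
    as a plug-in for sigma; valid for large n by the CLT."""
    z = (x_bar - mu_0) / (s / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Hypothetical summary statistics, purely for illustration.
print(large_sample_z_test(x_bar=102.0, s=15.0, n=200, mu_0=100.0))
```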
Example: Apple Farmer
An apple farmer knows that a particular variety of tree should yield an average of 21.6 apples per tree. She is concerned that her trees of this variety are not performing as well as they should. From 400 trees, the sample mean apple yield this year was 20.3 apples, with a sample standard deviation of 8.4 apples. Should she be worried? What is the obvious flaw in this experiment? Assume the significance level is $\alpha = 0.05$.
Solution:
The yield of each apple tree could be modelled by a Poisson distribution. Each tree would most likely have a different parameter for this distribution, depending on the soil, height and age of the tree, but we will assume that it is the same for all the trees in the population (this is the flaw of the experiment). Whilst each of the random variables is Poisson-distributed, their mean is asymptotically Normal. We are interested in testing the hypotheses $H_0: \mu = 21.6$ (i.e. the apple trees behave as expected) and $H_1: \mu \neq 21.6$ (i.e. there is something wrong with the apple trees), given that we observed $\bar{x} = 20.3$ apples on average. A sample of $n = 400$ apple trees is large enough for the Z-test to be applicable under the CLT. Under the null hypothesis, our test statistic follows the standard Normal:

$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim \mathcal{N}(0, 1)$$
From our previous examples, we know that the critical values for a two-tailed Z-test at significance level $\alpha = 0.05$ are $-1.96$ and $1.96$. Therefore we would reject the null hypothesis if $|Z| > 1.96$. Substituting the numbers in, we get:

$$z = \frac{20.3 - 21.6}{8.4/\sqrt{400}} = \frac{-1.3}{0.42} \approx -3.10$$
Since $|z| = 3.10 > 1.96$, we claim that there is sufficient evidence to reject the null hypothesis at $\alpha = 0.05$.
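As a quick numeric check of the calculation above (using the figures given in the question), the corresponding approximate P-value can also be computed:

```python
import numpy as np
from scipy.stats import norm

x_bar, s, n, mu_0 = 20.3, 8.4, 400, 21.6

z = (x_bar - mu_0) / (s / np.sqrt(n))   # ~ -3.10
p_value = 2 * norm.sf(abs(z))           # two-sided P-value, ~0.002

print(z, p_value)
```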