Hypotheses about Distributions: Goodness-of-Fit Tests
The final application of statistical hypothesis testing that we will consider is whether the sampled data could be modelled by some distribution. For instance, we may be interested to see if the model we came up with is actually representative of a given data set. The goodness-of-fit test considers the following hypotheses:

- $H_0$: the data are a sample from the chosen distribution
- $H_1$: the data are not a sample from the chosen distribution
Essentially we are testing to see if there is any evidence in the sample that the chosen distribution is a bad fit. Note that evidence for rejecting a particular distributional model does not point to why it fails, and thus offers no guidance for finding an alternative (i.e. better-fitting) model.
Let's illustrate this concept with the help of an example. Suppose that for a particular experiment, we suspect that the number of particles suspended in a dusty gas could be modelled using a Poisson distribution. In order to verify this, we momentarily flash a light onto a microscope field and count the particles seen, repeating the procedure a number of times. We record our findings in the following table:
Number of particles seen | 0 | 1 | 2 | 3 | 4 | 5 | $\geq 6$ | Total
---|---|---|---|---|---|---|---|---
Frequency | 34 | 46 | 38 | 19 | 4 | 2 | 0 | 143
Note how we have discretised the infinite range of values that $X$ (the number of particles seen) can take into a finite number of bins. The test statistic for goodness-of-fit tests relies on this discretisation of the probability distribution. Namely, Pearson's chi-squared statistic is defined as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Where:
- $k$ is the number of discrete bins for the distribution (in the example $k = 7$)
- $O_i$ is the number of observations falling into the $i$-th bin (i.e. the frequency row in the table)
- $E_i$ is the expected number of observations falling into the $i$-th bin, given the fitted model
In order to compute the expected counts $E_i$, we first fit the distribution of choice to the data. Since we are using a Poisson distribution in our example, we only have one parameter to fit: the rate $\lambda$. Its maximum likelihood estimate is the sample mean:

$$\hat{\lambda} = \bar{x} = \frac{0 \cdot 34 + 1 \cdot 46 + 2 \cdot 38 + 3 \cdot 19 + 4 \cdot 4 + 5 \cdot 2 + 6 \cdot 0}{143} = \frac{205}{143} \approx 1.4336$$
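The estimate above is easy to reproduce numerically. A minimal sketch in Python (the variable names are ours, not part of the original text):

```python
# Observed data from the table: how often each particle count was seen
counts = [0, 1, 2, 3, 4, 5, 6]       # number of particles seen (last bin: >= 6)
freqs = [34, 46, 38, 19, 4, 2, 0]    # frequency of each count

n = sum(freqs)                       # total number of observations, N = 143
# The MLE of the Poisson rate is simply the sample mean of the counts
lam_hat = sum(c * f for c, f in zip(counts, freqs)) / n

print(n, round(lam_hat, 4))          # 143, 1.4336
```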
We can now compute the probability $p_i$ associated with each bin: $p_i = P(X = i)$ for $i = 0, \dots, 5$, with the final bin collecting the remaining mass $P(X \geq 6)$.
Bin | Probability
---|---
0 | 0.2385
1 | 0.3418
2 | 0.2450
3 | 0.1171
4 | 0.0420
5 | 0.0120
$\geq 6$ | 0.0036
If we were to randomly place $N$ values into the bins according to the probabilities specified in the table, we would expect to obtain $E_i = N p_i$ items in the $i$-th bin, where $N$ is the total number of items placed (i.e. $N = 143$ in this particular example). We will not prove this, but the result follows from the multinomial distribution, a generalisation of the binomial distribution to more than two outcomes.
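The bin probabilities and expected counts under the fitted model can be computed as in the following sketch; it uses only the standard library, and the helper name `pois_pmf` is our own:

```python
from math import exp, factorial

lam = 205 / 143          # MLE of the Poisson rate, ~1.4336
n = 143                  # total number of observations

def pois_pmf(k, lam):
    """Poisson probability P(X = k) = lam^k e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

# Bin probabilities: P(X = i) for i = 0..5, then P(X >= 6) as the remainder
probs = [pois_pmf(i, lam) for i in range(6)]
probs.append(1 - sum(probs))

# Multinomial expected counts E_i = N * p_i; fractional, but summing to N
expected = [n * p for p in probs]
print([round(e, 4) for e in expected])
```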
After computing each of the $E_i$ values, we can augment the table:
Number of particles seen | 0 | 1 | 2 | 3 | 4 | 5 | $\geq 6$ | Total
---|---|---|---|---|---|---|---|---
Observed $O_i$ | 34 | 46 | 38 | 19 | 4 | 2 | 0 | 143
Expected $E_i$ | 34.0993 | 48.8837 | 35.0390 | 16.7436 | 6.0010 | 1.7205 | 0.5131 | 143
$(O_i - E_i)^2$ | 0.0099 | 8.3156 | 8.7675 | 5.0914 | 4.0030 | 0.0781 | 0.2633 | -
Note how the $E_i$ values are fractional, but still sum to the same total $N = 143$. From this table we compute the value of our test statistic:

$$\chi^2 = \sum_{i=1}^{7} \frac{(O_i - E_i)^2}{E_i} \approx 1.95$$
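The statistic follows directly from the observed and expected rows of the table:

```python
# Observed and expected counts from the augmented table
observed = [34, 46, 38, 19, 4, 2, 0]
expected = [34.0993, 48.8837, 35.0390, 16.7436, 6.0010, 1.7205, 0.5131]

# Pearson's chi-squared statistic: sum over bins of (O_i - E_i)^2 / E_i
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2_stat, 2))  # ~1.95
```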
Under the null hypothesis, Pearson's chi-squared test statistic approximately follows a chi-squared distribution with $k - 1 - s$ degrees of freedom, where $k$ is the number of bins and $s$ is the number of parameters estimated when fitting the distribution:

$$\chi^2 \sim \chi^2_{k-1-s}$$

In our example $k = 7$ and $s = 1$ (we estimated $\lambda$), so the statistic has $7 - 1 - 1 = 5$ degrees of freedom.
We complete the test by consulting tables of the chi-squared distribution. Note that this is always a right-tailed test, and therefore we reject the null hypothesis if the test statistic is strictly greater than the critical value for the given significance level.
In the dust-particles example, the critical value of the $\chi^2_5$ distribution at significance level $\alpha = 0.05$ is 11.070. In fact, our value $\chi^2 \approx 1.95$ corresponds to a p-value of 0.8569 under that distribution. Therefore we cannot reject the null hypothesis (and we may continue to suspect that the data can be modelled using a Poisson distribution).
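In practice one would obtain these numbers from `scipy.stats.chi2` (`chi2.sf` for the p-value, `chi2.ppf` for the critical value). As a dependency-free illustration, the survival function of the chi-squared distribution can also be sketched via the series expansion of the lower incomplete gamma function:

```python
from math import exp, gamma

def chi2_sf(x, df, terms=200):
    """P(X > x) for a chi-squared distribution with df degrees of freedom,
    computed via the series expansion of the regularized lower incomplete
    gamma function (adequate for moderate x)."""
    a, t = df / 2.0, x / 2.0
    # gamma_lower(a, t) = t^a e^(-t) * sum_{n>=0} t^n / (a (a+1) ... (a+n))
    total, term = 0.0, 1.0 / a
    for n in range(terms):
        total += term
        term *= t / (a + n + 1)
    return 1.0 - t**a * exp(-t) * total / gamma(a)

# 7 bins, 1 fitted parameter -> 7 - 1 - 1 = 5 degrees of freedom
print(round(chi2_sf(1.95, 5), 2))    # ~0.86: well above 0.05, H0 not rejected
print(round(chi2_sf(11.070, 5), 2))  # ~0.05: 11.070 is the 5% critical value
```

The p-value printed here matches the quoted 0.8569 up to the rounding of the test statistic.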