Interval Estimation


Up until now we have been using Maximum Likelihood Estimation to generate point estimates for our model parameters $\theta$. In particular, we derived expressions for the estimators $\hat{\theta}$ as a function of the random sample $D$. You might have noticed by now that most of our maximum likelihood estimators for the distributions considered thus far take on similar functional forms:

$$\begin{aligned} \text{Bernoulli: } \hat{p} &= \frac{1}{n} \sum_{i=1}^{n} D_{i} \\ \text{Poisson: } \hat{\lambda} &= \frac{1}{n} \sum_{i=1}^{n} D_{i} \\ \text{Normal: } \hat{\mu} &= \frac{1}{n} \sum_{i=1}^{n} D_{i} \end{aligned}$$

That is, the estimators above are equal to the sample mean $\bar{X}=\frac{1}{n} \sum_{i=1}^{n} D_{i}$. This is somewhat intuitive, as we had previously shown the true value of these model parameters to be equal to the mean value of the population distribution $E[X]$ (see Table 1). Thus, the best approximation to $E[X]$ is to compute the mean of a random sample $D=\left(D_{1}, D_{2}, \ldots, D_{n}\right)$ drawn from the underlying distribution for $X$.
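This shared estimator is trivial to compute in practice. A minimal sketch in Python, using a hypothetical Bernoulli sample (coin flips); the variable names are our own:

```python
from statistics import fmean

# Hypothetical Bernoulli sample (coin flips): the MLE of p is the sample mean
bernoulli_sample = [1, 0, 1, 1, 0, 1, 0, 1]

# (1/n) * sum(D_i); here 5 successes out of 8 trials gives 0.625
p_hat = fmean(bernoulli_sample)
```

The same `fmean` call would serve as the MLE $\hat{\lambda}$ for a Poisson sample or $\hat{\mu}$ for a Normal sample.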

However, we discussed earlier that $D$ is itself a random variable that changes with each sampling trial; therefore the value of the estimate will also change for each random sample. Fortunately, for the estimates listed above we already know how the sample mean $\bar{X}$ is distributed over many sampling trials (e.g. see the sampling distribution of sample means in Figure 29), and we can use these sampling distributions to quantify our uncertainty for any estimate we compute.

Sample Mean for Normal Random Variables

We begin by considering the exact solution where our random sample $D$ is drawn from a Normal distribution. Recall from our discussion of the Normal distribution in Chapter 3 that a Z-transformation of a Normal random variable follows a standard Normal distribution:

$$Z=\frac{X-\mu}{\sigma} \sim N(0,1)$$

We had also shown that the sum of Normal random variables is itself a Normal random variable (note that this is exact, not a CLT approximation). Thus, if a maximum likelihood estimator corresponds to the mean $\bar{X}=\frac{1}{n} \sum_{i=1}^{n} D_{i}$ of a sample $\left(D_{1}, D_{2}, \ldots, D_{n}\right)$ from a Normal distribution (as is the case for $\hat{\mu}$), then we can state precisely that the sampling distribution for $\bar{X}(=\hat{\mu})$ is:

$$P\left[a \leq \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

where $\mu$ and $\sigma^{2}$ are the mean and variance of the underlying Normal population distribution, respectively. We would like to use this sampling distribution to understand how good an estimate $\hat{\mu}=\bar{X}$ is of the true population mean $\mu$. For simplicity's sake, let's say that we are given the population variance $\sigma^{2}$. Then we can rearrange the above inequalities to deduce bounds around the unknown parameter $\mu$ based on an estimate $\bar{X}$:

$$P\left[\bar{X}+a \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+b \cdot \frac{\sigma}{\sqrt{n}}\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

Thus, by choosing bounds $a$ and $b$, we can integrate the standard Normal on the right-hand side to compute the probability that our unknown model parameter $\mu$ is contained within the interval $\left[\bar{X}+a \cdot \frac{\sigma}{\sqrt{n}}, \bar{X}+b \cdot \frac{\sigma}{\sqrt{n}}\right]$; we refer to this interval as the confidence interval.

In order to use our sampling distribution to compute the confidence interval, we first need to specify the probability that an interval constructed from an estimate $\bar{X}$ will contain $\mu$; we define this as our confidence level:

Confidence Level: $(1-\alpha)=$ the probability that a confidence interval computed from $\bar{X}$ will contain the true parameter value $\mu$

$$=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

It is immediately apparent from the above expression that once we have selected a value for the confidence level (i.e. by selecting an $\alpha$), the corresponding values for $a$ and $b$ are also fixed, since the standard Normal curve is symmetric about zero. To see this, consider the confidence intervals around $Z=\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}$ for confidence levels of $0.95$ $(\alpha=0.05)$ and $0.99$ $(\alpha=0.01)$, as shown in Figure 30. Due to the symmetry of the standard Normal curve, $b=-a$, and we can see in Figure 30 that $b=1.96$ for $\alpha=0.05$ and $b=2.575$ for $\alpha=0.01$.

Computing Confidence Interval for Standard Normal using Lookup Tables

A convenient way to compute the confidence interval for a given confidence level is to use a standard Normal CDF lookup table (see right side of Figure 30). Once we have specified our confidence level $(1-\alpha)$, we search the standard Normal CDF lookup table for the corresponding $z_{\alpha / 2}(=b=-a)$ such that:

$$P\left(Z \leq z_{\alpha / 2}\right)=1-\alpha / 2$$

So for the following scenarios we have:

| Confidence Level $(1-\alpha)$ | $\alpha$ | $\alpha/2$ | $1-\alpha/2$ | $z_{\alpha/2}(=b=-a)$ such that $P(Z \leq z_{\alpha/2})=1-\alpha/2$ |
| --- | --- | --- | --- | --- |
| 0.95 | 0.05 | 0.025 | 0.975 | 1.96 |
| 0.99 | 0.01 | 0.005 | 0.995 | 2.575 |

Table 2: Using the standard Normal CDF lookup table to compute the confidence interval for a given confidence level $(1-\alpha)$.
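The lookup in Table 2 can also be done numerically: Python's standard-library `statistics.NormalDist` exposes the inverse CDF, so $z_{\alpha/2}$ is just the standard Normal quantile at $1-\alpha/2$. A minimal sketch (the helper name is our own):

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Return z_{alpha/2} such that P(Z <= z_{alpha/2}) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.05), 3))   # 1.96   (95% confidence level)
print(round(z_critical(0.01), 3))   # 2.576  (99% confidence level)
```

Note the small discrepancy with the table: the exact 99% bound is 2.5758, which coarse printed tables round to 2.575.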

We can therefore equivalently express our confidence interval for $\mu$ as:

$$P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}}\right]=1-\alpha$$

NOTE: It is often helpful to think of $z_{\alpha / 2}$ as the number of units of standard error $\left(\frac{\sigma}{\sqrt{n}}\right)$ that the confidence interval spans on either side of the estimate. Thus, the confidence interval is $\bar{X} \pm z_{\alpha / 2} \times$ Standard Error.
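Putting the pieces together, the known-variance interval $\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ can be computed directly. A sketch with a hypothetical sample and an assumed known $\sigma$ (function and data are our own, not from the text):

```python
from math import sqrt
from statistics import NormalDist, fmean

def normal_ci(data, sigma, alpha=0.05):
    """CI for the population mean with known sigma: Xbar +/- z_{a/2} * sigma/sqrt(n)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)          # z_{alpha/2} x standard error
    xbar = fmean(data)
    return xbar - half_width, xbar + half_width

# Hypothetical sample of n = 5 measurements with known population sigma = 2.0
lo, hi = normal_ci([4.8, 5.1, 5.6, 4.9, 5.3], sigma=2.0)
```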

Figure 30: Computing confidence intervals for parameter estimates that correspond to the mean of a sample of Normal random variables. First we specify the confidence level $(1-\alpha)$, which is represented here as the shaded area under the curve centred around our estimate. For a) we have specified the confidence level to be $95\%$, corresponding to $\alpha=0.05$; likewise for c) we have specified the confidence level to be $99\%$, corresponding to $\alpha=0.01$. To find the corresponding confidence interval for a given confidence level, we then use the standard Normal CDF lookup table (right side). For $\alpha=0.05$ in b), we find the bound to be $z_{\alpha / 2}=1.96$; similarly for $\alpha=0.01$ in d), we find the bound to be $z_{\alpha / 2}=2.575$.

Sample Mean for Normal Random Variables, Unknown Population Variance $(\sigma^{2})$

Thus far we have considered the case where the population variance $\sigma^{2}$ was a known quantity. However, this parameter is seldom known, so in practice we cannot evaluate $\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}$ directly. Indeed, we can only ascertain information about $\sigma^{2}$ by looking at the sample $D$ available to us. Recall that we derived an expression for the sample variance $S^{2}$ by correcting for the bias of our MLE estimator $\hat{\sigma}^{2}$:

$$\begin{aligned} \text{Sample variance: } S^{2} &= \frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2} \\ \text{Sample standard deviation: } S &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2}} \end{aligned}$$

Thus we can use $S$ as an approximation for $\sigma$ in computing the confidence intervals. But how does our uncertainty in $\sigma$ manifest itself in the calculation of these confidence intervals? Intuitively, we can appreciate that for large enough sample sizes $S$ will be a fairly accurate approximation of $\sigma$. But how do we deal with small sample sizes (i.e. $n<100$)?
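As a quick check, Python's `statistics` module implements exactly these bias-corrected formulas ($n-1$ denominator). A sketch with hypothetical data:

```python
from math import sqrt
from statistics import fmean, stdev, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
n, xbar = len(data), fmean(data)

# Bias-corrected sample variance computed from the definition:
# S^2 = (1/(n-1)) * sum((D_i - Xbar)^2)
s2_manual = sum((d - xbar) ** 2 for d in data) / (n - 1)

# statistics.variance and statistics.stdev also divide by n-1,
# so they agree with S^2 and S
print(abs(variance(data) - s2_manual) < 1e-12)     # True
print(abs(stdev(data) - sqrt(s2_manual)) < 1e-12)  # True
```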

The credit for recognising that $\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}$ and $\frac{\bar{X}-\mu}{S / \sqrt{n}}$ do not follow the same distribution goes to William Sealy Gosset (who graduated from Oxford in 1899 with First Class degrees in Chemistry and Maths). As an employee at the Guinness brewery, Gosset worked on making the art of brewing more scientific. However, for many of his experiments the sample sizes were only on the order of 4 or 5, and he knew that there was no possible way of knowing the exact value of the true population variance $\sigma^{2}$ in his statistical calculations.

In 1908 Gosset published a new PDF describing the distribution of $\frac{\bar{X}-\mu}{S / \sqrt{n}}$. At that time, Guinness forbade its employees from publishing papers for confidentiality reasons, so Gosset published these findings under the pen name "Student". Thus, the resulting PDF came to be known as the Student $t$-distribution (the '$t$' in this name refers to its use in test statistics, which we will analyse in the next chapter).

Formally speaking, for a random sample $D$ from a Normal population distribution with mean $\mu$ and variance $\sigma^{2}$, the standardised sample mean (with $\sigma$ replaced by $S$) follows a Student $t$-distribution with $n-1$ degrees of freedom $(t_{n-1})$:

$$T=\frac{\bar{X}-\mu}{S / \sqrt{n}} \sim t_{n-1}$$

Note here that we define the transform as $T$ instead of $Z$ to distinguish it from the Z-transformation. The Student $t$-distribution, $t_{n-1}$, has a few interesting and important properties, which are illustrated graphically in Figure 31:

  1. It is bell-shaped and symmetric about zero, like the Normal distribution, but has "heavier tails" that approach zero probability at a slower rate.
  2. Its dispersion varies according to the size of the sample, $n$; the smaller the sample size, the greater the dispersion (to reflect the uncertainty in estimating the true variance).
  3. As $n$ approaches $\infty$, $t_{n-1}$ converges to the standard Normal distribution $N(0,1)$ (approximately $n>100$ is close enough to use $N(0,1)$).

Figure 31: Difference between the Student $t$-distribution, $t_{n-1}$, and the standard Normal distribution, $N(0,1)$. As the sample size increases from $n=6$ to $n=21$, we can see that $t_{n-1}$ becomes closer to the standard Normal distribution.

We can analogously define our confidence interval in terms of the Student $t$-distribution for a Normal sample $D$ with unknown population variance $\sigma^{2}$ and sample size $n$:

$$P\left[\bar{X}-t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}}\right]=1-\alpha$$

where the Normal sample $D$ is used to compute both the sample mean and the sample standard deviation:

$$\begin{aligned} \bar{X} &= \frac{1}{n} \sum_{i=1}^{n} D_{i} \\ S &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2}} \end{aligned}$$

The corresponding confidence interval bound $t_{\alpha / 2, n-1}$ can be found using a lookup table for the upper percentiles of the Student $t$-distribution, given values for $\alpha$ and $n$, in a very similar manner to the standard Normal tables. Indeed, since $t_{n-1}$ is a symmetric distribution, we only need to find the bound for the upper tail for a given confidence level $1-\alpha$: $P\left(T \geq t_{\alpha / 2, n-1}\right)$ (whereas for the standard Normal we used the CDF $P\left(Z \leq z_{\alpha / 2}\right)$).

We illustrate this calculation graphically in Figure 32 for a confidence level of $95\%$ $(\alpha=0.05)$ and sample sizes $n=6$ and $n=21$. Table 3 below also summarises the confidence interval bounds for additional $\alpha$ values and sample sizes (compare to Table 2):

| $\alpha$ | $\alpha/2$ | $n$ | $n-1$ | $t_{\alpha/2, n-1}$ such that $P(T \geq t_{\alpha/2, n-1})=\alpha/2$ |
| --- | --- | --- | --- | --- |
| 0.05 | 0.025 | 6 | 5 | 2.5706 |
| 0.05 | 0.025 | 11 | 10 | 2.2281 |
| 0.05 | 0.025 | 21 | 20 | 2.0860 |
| 0.01 | 0.005 | 6 | 5 | 4.0321 |
| 0.01 | 0.005 | 11 | 10 | 3.1693 |
| 0.01 | 0.005 | 21 | 20 | 2.8453 |

Table 3: Using the Student $t$-distribution lookup table to compute the confidence interval for a given $\alpha$. Note that the 'degrees of freedom' is equal to $n-1$.

Again, it is often helpful to think of $t_{\alpha / 2, n-1}$ as the number of units of standard error $\left(\frac{S}{\sqrt{n}}\right)$ that the confidence interval spans on either side of the estimate. Thus, we can write the confidence interval as $\bar{X} \pm t_{\alpha / 2, n-1} \times$ Standard Error, just as we did for the Normal population with known variance. However, we will generally expect a larger confidence interval (i.e. $t_{\alpha / 2, n-1} \geq z_{\alpha / 2}$) for a given $\alpha$, due to the additional uncertainty associated with estimating $S$ from the sample data.

For instance, we see in Table 3 for $\alpha=0.05$ that as $n$ increases and our estimate of $\sigma$ from the sample data becomes more accurate (i.e. $S \approx \sigma$), $t_{0.025, n-1}$ approaches the standard Normal bound $z_{0.025}=1.96$. Likewise, as $n$ increases for $\alpha=0.01$, $t_{0.005, n-1}$ approaches the standard Normal bound $z_{0.005}=2.575$. Using shorthand notation, we can write the confidence interval for the estimate as:

$$\bar{X} \pm t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}}$$
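This interval is straightforward to compute once $t_{\alpha/2, n-1}$ is known. A sketch with a hypothetical sample of $n=6$ measurements, taking the critical value $t_{0.025,5}=2.5706$ from Table 3 (the standard library has no $t$ quantile function, so we pass the table value in explicitly):

```python
from math import sqrt
from statistics import fmean, stdev

def t_ci(data, t_crit):
    """CI for mu with unknown sigma: Xbar +/- t_{a/2,n-1} * S/sqrt(n)."""
    n = len(data)
    half_width = t_crit * stdev(data) / sqrt(n)   # stdev uses the n-1 denominator
    xbar = fmean(data)
    return xbar - half_width, xbar + half_width

# Hypothetical sample of n = 6 measurements; Table 3 gives t_{0.025,5} = 2.5706
sample = [5.2, 4.9, 5.7, 5.1, 5.4, 5.0]
lo, hi = t_ci(sample, t_crit=2.5706)
```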

Figure 32: Computing confidence intervals for parameter estimates that correspond to the mean of a sample of Normal random variables with unknown population variance. First we specify the confidence level $(1-\alpha)$, which corresponds to the area under the curve centred around our estimate; here the confidence level is $95\%$ $(\alpha=0.05)$, with sample size $n=6$ in a) and $n=21$ in c). To find the corresponding bounds around our estimate, we then use the Student $t$-distribution lookup table. For $n=6$ in b), we find the bound to be $t_{0.025,5}=2.5706$; similarly for $n=21$ in d), we find the bound to be $t_{0.025,20}=2.0860$.

Sample Mean for Non-Normal Distributions

So far we have considered how to compute confidence intervals for sample means of a random sample from a Normal distribution. In the case where the population variance $\sigma^{2}$ was known:

$$\begin{gathered} Z=\frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \\ P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}}\right]=1-\alpha \end{gathered}$$

And in the case where the population variance is unknown (and must be estimated using the sample variance $S^{2}$):

$$\begin{gathered} T=\frac{\bar{X}-\mu}{S / \sqrt{n}} \sim t_{n-1} \\ P\left[\bar{X}-t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}}\right]=1-\alpha \end{gathered}$$

But we noted that for large enough $n$ (e.g. $n>100$) the Student $t$-distribution converges to the standard Normal $N(0,1)$, and as a result $z_{\alpha / 2} \approx t_{\alpha / 2, n-1}$ (i.e. the intervals become the same).

However, we have also seen for large sample sizes $n$ that the sample mean $\bar{X}$ of a non-Normal population distribution follows a sampling distribution that is Normally distributed. For instance, Figure 29 shows that the sampling distribution of the sample means $\bar{W}$, as computed from random samples of the exponential population distribution $f_{W}(w \mid \hat{\lambda}=0.1)$ (and therefore $E[W]=\frac{1}{\hat{\lambda}} \equiv \mu$, $\sqrt{\operatorname{Var}[W]}=\sqrt{\frac{1}{\hat{\lambda}^{2}}} \equiv \sigma$), is itself a Normal distribution given by $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

Indeed, this was the main result of the Central Limit Theorem at the end of Chapter 3: the sample mean $\bar{X}$, computed from random samples of any population distribution with mean $\mu$ and variance $\sigma^{2}$, is Normally distributed for large sample sizes $n$:

$$\lim _{n \rightarrow \infty} P\left[a \leq \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

And since the CLT only applies for large $n$, we can use the sample standard deviation $S$ to approximate $\sigma$ without introducing too much further variability into our confidence intervals:

$$P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{S}{\sqrt{n}}\right] \approx 1-\alpha$$
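This large-sample recipe works for any population distribution. A sketch using a large hypothetical sample drawn from an exponential population with rate $0.1$ (mirroring the Figure 29 example, where the true mean is $1/0.1 = 10$); the function name and data are our own:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def clt_ci(data, alpha=0.05):
    """Approximate CI  Xbar +/- z_{a/2} * S/sqrt(n), justified by the CLT for large n."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(data) / sqrt(n)
    xbar = fmean(data)
    return xbar - half_width, xbar + half_width

# Large sample (n = 500) from an exponential population with rate 0.1
rng = random.Random(0)
data = [rng.expovariate(0.1) for _ in range(500)]
lo, hi = clt_ci(data)
```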

Confidence Intervals for General Sample Statistics

We can generally define the procedure for constructing confidence intervals for any sample statistic as follows:

Let $D_{1}, \ldots, D_{n}$ be a random sample $D$ from a population with an unknown parameter $\theta$. Given a confidence level $(1-\alpha)$, if $l(D)$ and $u(D)$ are computed from sample statistics with the property that:

$$P[l(D)<\theta<u(D)]=1-\alpha$$

then we say that $[l(D), u(D)]$ is a $100(1-\alpha) \%$ confidence interval (CI) for $\theta$. Thus, the confidence interval contains the true value of the parameter $\theta$ with some known probability $(1-\alpha)$.

To illustrate, consider an example of computing the confidence interval for $\mu$ for a random sample $D$ from a Normal distribution with a confidence level of $95\%$ $(\alpha=0.05)$. With the help of Figure 30 and Table 2 we find that:

$$\begin{aligned} \theta &= \mu \\ \text{Sample mean: } \hat{\mu}=\bar{X} &= \frac{1}{n} \sum_{i=1}^{n} D_{i} \\ l(D) &= \bar{X}-1.96 \cdot \frac{\sigma}{\sqrt{n}} \\ u(D) &= \bar{X}+1.96 \cdot \frac{\sigma}{\sqrt{n}} \\ (1-\alpha) &= \frac{1}{\sqrt{2 \pi}} \int_{-1.96}^{1.96} e^{-z^{2} / 2} \, dz = 0.95 \end{aligned}$$

Thus the $95\%$ confidence interval for $\mu$ is given as:

$$\left[\bar{X}-1.96 \cdot \frac{\sigma}{\sqrt{n}}, \ \bar{X}+1.96 \cdot \frac{\sigma}{\sqrt{n}}\right]$$

Again, this means that the true population parameter $\mu$ will be within this interval $95\%$ of the time. In other words, if we were to conduct 100 sampling trials for $D$, then the computed confidence intervals would contain $\mu$ 95 times on average.
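This "95 times out of 100" interpretation can be verified by simulation. A sketch assuming a standard Normal population ($\mu=0$, $\sigma=1$) with known $\sigma$; the parameter choices are our own:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Simulate repeated sampling trials and count how often the known-sigma
# 95% CI  [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)]  contains mu.
rng = random.Random(42)
mu, sigma, n, trials = 0.0, 1.0, 25, 2000
z = NormalDist().inv_cdf(0.975)   # ~1.96
half = z * sigma / sqrt(n)

hits = 0
for _ in range(trials):
    xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    if xbar - half <= mu <= xbar + half:
        hits += 1

coverage = hits / trials   # should land close to 0.95
```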

Example: Sampling Distribution of Sample Variance for a Normal Population

Thus far our sample statistic of interest has been the sample mean $\bar{X}$, since it has been equal to most of our Maximum Likelihood Estimators for the probability models considered in the course:

$$\hat{\theta}=\bar{X}=\frac{1}{n} \sum_{i=1}^{n} D_{i}$$

However, we also saw that our (bias-corrected) estimator of the variance of a Normal population distribution $(\sigma^{2})$ was the sample variance $S^{2}$:

$$\hat{\sigma}^{2}=S^{2}=\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2}$$

This sample variance $S^{2}$ is a statistic we have not analysed in much detail, but remember that for all statistics we can compute their corresponding sampling distributions.

Let us consider the example covered in lecture, where we sampled 10 random UG student heights $(n=10)$ per day for a year (365 days, or 365 sampling trials). We used this data to compute the sampling distribution of the sample means (i.e. the distribution of mean heights computed each day), as shown in Figure 33a. We also showed that this sampling distribution was very accurately approximated by a Normal distribution of $\bar{X}$, using the fact that $E[\bar{X}]=\mu$ and $\operatorname{Var}[\bar{X}]=\sigma^{2} / n$ (whose square root, $\sigma/\sqrt{n}$, is the standard error), where $\mu$ is the population mean and $\sigma^{2}$ is the population variance. Generally we do not know $\mu$ and $\sigma$, but temporarily assuming we had these values allowed us to perform a Z-transformation on $\bar{X}$ (Figure 33b) and then derive the confidence interval for $\mu$ as a function of $\bar{X}$ and some multiple of the standard error (see previous sections for details).

In an analogous fashion, we can also use this data to construct a sampling distribution for the sample variance $S^{2}$, shown in Figure 33c. We can see that the sampling distribution for the sample variance generally skews to the right, so one could guess that a Normal distribution (or any other symmetric distribution) would not be a good model. However, it turns out that if we transform $S^{2}$ by dividing by $\sigma^{2} /(n-1)$, then this sampling distribution is modelled by the chi-squared distribution:

Let $D_{1}, \ldots, D_{n}$ be a random sample $D$ from a Normal population with mean $\mu$ and variance $\sigma^{2}$. The ratio of the sample variance $S^{2}$ to $\sigma^{2} /(n-1)$ follows a chi-squared distribution with $n-1$ degrees of freedom:

$$\frac{(n-1) S^{2}}{\sigma^{2}} \sim \chi_{n-1}^{2}$$

As we did for the Normal and Student $t$-distributions, for a given confidence level $(1-\alpha)$ we can define a confidence interval for $\frac{(n-1) S^{2}}{\sigma^{2}}$ in terms of bounds $\chi_{1-\alpha / 2, n-1}^{2}$ and $\chi_{\alpha / 2, n-1}^{2}$ as follows:

$$P\left[\chi_{1-\alpha / 2, n-1}^{2} \leq \frac{(n-1) S^{2}}{\sigma^{2}} \leq \chi_{\alpha / 2, n-1}^{2}\right]=1-\alpha$$

In this case we are interested in constructing a confidence interval for the unknown population variance $\sigma^{2}$, so we take the reciprocal of each term in the above expression (which reverses the inequalities):

$$P\left[\frac{1}{\chi_{1-\alpha / 2, n-1}^{2}} \geq \frac{\sigma^{2}}{(n-1) S^{2}} \geq \frac{1}{\chi_{\alpha / 2, n-1}^{2}}\right]=1-\alpha$$

Finally, multiplying each term by $(n-1) S^{2}$ gives us our confidence interval for the unknown population variance $\sigma^{2}$:

$$P\left[\frac{(n-1) S^{2}}{\chi_{\alpha / 2, n-1}^{2}} \leq \sigma^{2} \leq \frac{(n-1) S^{2}}{\chi_{1-\alpha / 2, n-1}^{2}}\right]=1-\alpha$$

Figure 33: a) Sampling distribution of sample means $\bar{X}$ of 10 student heights for 365 sampling trials. We see that it is Normally distributed, and well modelled by $N(E[\bar{X}], \sqrt{\operatorname{Var}[\bar{X}]})$. b) Z-transformation of the sampling distribution of sample means to standardise the calculation of confidence intervals for the unknown population mean $\mu$. c) Sampling distribution of the sample variance $S^{2}$ of 10 student heights for 365 sampling trials. We see that it can be modelled by some asymmetric distribution, as it skews to the right. d) The ratio of $S^{2}$ to $\sigma^{2} /(n-1)$ is modelled by a chi-squared distribution with $n-1$ degrees of freedom $(\chi_{n-1}^{2})$, which will be used to compute confidence intervals for the unknown population variance $\sigma^{2}$.

Computing Chi-squared Confidence Bounds

Note that the bounds $\chi_{\alpha / 2, n-1}^{2}$ and $\chi_{1-\alpha / 2, n-1}^{2}$ are not equal and opposite, as the chi-squared distribution is not symmetric (as is evident in Figure 33d). To illustrate, let us compute these bounds for the chi-squared distribution in Figure 33 $(\chi_{9}^{2})$ for a $95\%$ confidence level $(\alpha=0.05)$. Figure 34a shows where these bounds lie for areas of $\alpha / 2$ under the curve in the left and right tails of $\chi_{9}^{2}$.

As with the Student $t$-distribution, we are often given lookup tables in the form of $P(Y>y)=p$ for the chi-squared distribution, where $Y \sim \chi_{n-1}^{2}$. Therefore, to compute the bounds we need to find the following for the lower bound $\chi_{1-\alpha / 2, n-1}^{2}$ (as shown in Figure 34b):

$$\begin{aligned} P\left(Y \geq \chi_{1-\alpha / 2, n-1}^{2}\right) &= 1-\alpha / 2 \\ P\left(Y \geq \chi_{0.975,9}^{2}\right) &= 0.975 \\ P(Y \geq 2.7) &= 0.975 \end{aligned}$$

Thus the lower bound is $\chi_{0.975,9}^{2}=2.7$. Analogously, we can compute the upper bound $\chi_{\alpha / 2, n-1}^{2}$ (shown in Figure 34c):

$$\begin{aligned} P\left(Y \geq \chi_{\alpha / 2, n-1}^{2}\right) &= \alpha / 2 \\ P\left(Y \geq \chi_{0.025,9}^{2}\right) &= 0.025 \\ P(Y \geq 19.023) &= 0.025 \end{aligned}$$

And we find that the corresponding upper bound is $\chi_{0.025,9}^{2}=19.023$.
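With both bounds in hand, the variance interval $\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}}, \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right]$ can be computed directly. A sketch with a hypothetical sample of $n=10$ heights (cm), using the bounds derived above ($\chi_{0.975,9}^{2}=2.7$ and $\chi_{0.025,9}^{2}=19.023$); the helper name and data are our own:

```python
from statistics import variance

def variance_ci(data, chi2_lower, chi2_upper):
    """CI for sigma^2:  [(n-1)S^2/chi2_upper, (n-1)S^2/chi2_lower]."""
    n = len(data)
    s2 = variance(data)   # sample variance (n-1 denominator)
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Hypothetical sample of n = 10 heights (cm); for alpha = 0.05 and 9 degrees
# of freedom, the lookup table gives chi2_{0.975,9} = 2.7, chi2_{0.025,9} = 19.023
heights = [172.0, 168.5, 175.2, 170.1, 166.8, 174.3, 171.9, 169.4, 173.6, 167.2]
lo, hi = variance_ci(heights, chi2_lower=2.7, chi2_upper=19.023)
```

Note that, because the chi-squared bounds are asymmetric, the interval is not centred on $S^2$.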

Figure 34: a) $95\%$ confidence interval for the chi-squared distribution with $n=10$: $\chi_{9}^{2}$. b) How to compute the lower bound, $\chi_{1-\alpha / 2, n-1}^{2}$, using the chi-squared lookup table. c) How to compute the upper bound, $\chi_{\alpha / 2, n-1}^{2}$, using the chi-squared lookup table.