Interval Estimation

Up until now we have been using Maximum Likelihood Estimation to generate point estimates for our model parameters $\theta$. In particular, we derived expressions for the estimators $\hat{\theta}$ as a function of the random sample $D$. You might have noticed by now that most of the maximum likelihood estimators for the distributions considered thus far take on the same functional form:

$$\begin{aligned} \text{Bernoulli: } \hat{p} & =\frac{1}{n} \sum_{i=1}^{n} D_{i} \\ \text{Poisson: } \hat{\lambda} & =\frac{1}{n} \sum_{i=1}^{n} D_{i} \\ \text{Normal: } \hat{\mu} & =\frac{1}{n} \sum_{i=1}^{n} D_{i} \end{aligned}$$

That is, the estimators above are all equal to the sample mean $\bar{X}=\frac{1}{n} \sum_{i=1}^{n} D_{i}$. This is somewhat intuitive, as we had previously shown the true value of each of these model parameters to be equal to the mean of the population distribution, $E[X]$ (see Table 1). Thus, a natural approximation to $E[X]$ is the mean of a random sample $D=\left(D_{1}, D_{2}, \ldots, D_{n}\right)$ drawn from the underlying distribution for $X$.

However, we discussed earlier that $D$ is itself random and will change with each sampling trial; therefore the value of the estimate will also change from one random sample to the next. Fortunately, for the estimators listed above we already know how the sample mean $\bar{X}$ is distributed over many sampling trials (e.g. see the sampling distribution of sample means in Figure 29), and we can use these sampling distributions to quantify the uncertainty in any estimate we compute.
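To make the idea of a sampling distribution concrete, here is a minimal simulation sketch. The population parameters (`mu=10.0`, `sigma=2.0`), the sample size, the number of trials and the helper name `sample_means` are illustrative assumptions rather than values from the text. It draws many random samples from a Normal population and compares the spread of the resulting sample means with the standard error $\sigma/\sqrt{n}$.

```python
import random
import statistics


def sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]


# Assumed illustrative values: population N(10, 2^2), samples of size 25.
means = sample_means(mu=10.0, sigma=2.0, n=25, trials=5000)

print("mean of the sample means:", round(statistics.fmean(means), 3))
print("std of the sample means: ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):         ", 2.0 / 25 ** 0.5)
```

Each entry of `means` is one point estimate; the list as a whole approximates the sampling distribution that the rest of this section uses to build intervals.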

Sample Mean for Normal Random Variables

We begin by considering the exact solution, where our random samples $D$ are drawn from a Normal distribution. Recall from our discussion on the Normal distribution in Chapter 3 that a Z-transformation of a Normal random variable follows a standard Normal distribution:

$$Z=\frac{X-\mu}{\sigma} \sim N(0,1)$$

We had also shown that the sum of Normal random variables is itself a Normal random variable (note that this is exact and not a CLT approximation). Thus, if a maximum likelihood estimator corresponds to the mean $\bar{X}=\frac{1}{n} \sum_{i=1}^{n} D_{i}$ of a sample $\left(D_{1}, D_{2}, \ldots, D_{n}\right)$ from a Normal distribution (as is the case for $\hat{\mu}$), then we can state precisely that the sampling distribution for $\bar{X}$ $(=\hat{\mu})$ satisfies:

$$P\left[a \leq \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

Here $\mu$ and $\sigma^{2}$ are the mean and variance of the underlying Normal population distribution, respectively. We would like to use this sampling distribution to understand how good our estimate $\hat{\mu}=\bar{X}$ is of the true population mean $\mu$. For simplicity's sake, let's say that we are given the population variance $\sigma^{2}$. Multiplying the inequalities through by $\sigma/\sqrt{n}$, subtracting $\bar{X}$, and then multiplying by $-1$ (which reverses the inequalities) gives bounds around the unknown parameter $\mu$ based on an estimate $\bar{X}$:

$$P\left[\bar{X}-b \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}-a \cdot \frac{\sigma}{\sqrt{n}}\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

Thus, by choosing bounds $a$ and $b$, we can integrate the standard Normal on the right-hand side to compute the probability that the interval $\left[\bar{X}-b \cdot \frac{\sigma}{\sqrt{n}}, \bar{X}-a \cdot \frac{\sigma}{\sqrt{n}}\right]$ contains our unknown model parameter $\mu$; we refer to this interval as the confidence interval. For the symmetric choice $a=-b$ used below, this is simply $\left[\bar{X}-b \cdot \frac{\sigma}{\sqrt{n}}, \bar{X}+b \cdot \frac{\sigma}{\sqrt{n}}\right]$.

In order to use our sampling distribution to compute the confidence interval, we first need to specify the probability that the interval we construct from an estimate $\bar{X}$ will contain $\mu$; we define this as our confidence level:

Confidence Level: $(1-\alpha)=$ probability that a confidence interval computed from $\bar{X}$ will contain the true parameter value $\mu$

$$(1-\alpha)=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

Once we have selected a value for the confidence level (i.e. by selecting an $\alpha$) and require the interval to leave an equal probability of $\alpha/2$ in each tail, the corresponding values for $a$ and $b$ are also fixed, since the standard Normal curve for $Z=\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}$ is symmetric about zero. To see this, consider the intervals for $Z$ at confidence levels of $0.95$ ($\alpha=0.05$) and $0.99$ ($\alpha=0.01$), as shown in Figure 30. Due to the symmetry of the standard Normal curve, $b=-a$, and we can see in Figure 30 that $b=1.96$ for $\alpha=0.05$ and $b=2.575$ for $\alpha=0.01$.

Computing Confidence Interval for Standard Normal using Lookup Tables

A convenient way to compute the confidence interval for a given confidence level is to use the standard Normal CDF lookup tables (see right side of Figure 30). Once we have specified our confidence level $(1-\alpha)$, we just search the standard Normal CDF lookup table for the corresponding $z_{\alpha / 2}$ $(=b=-a)$ such that:

$$P\left(Z \leq z_{\alpha / 2}\right)=1-\alpha / 2$$

So for the following scenarios we have:

| Confidence level $(1-\alpha)$ | $\alpha$ | $\alpha/2$ | $1-\alpha/2$ | $z_{\alpha/2}$ $(=b=-a)$ such that $P\left(Z \leq z_{\alpha/2}\right)=1-\alpha/2$ |
| --- | --- | --- | --- | --- |
| 0.95 | 0.05 | 0.025 | 0.975 | 1.96 |
| 0.99 | 0.01 | 0.005 | 0.995 | 2.575 |

Table 2: Using the standard Normal CDF lookup table to compute the confidence interval for a given confidence level $(1-\alpha)$.
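The table lookup can also be done numerically. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method inverts the standard Normal CDF; the helper name `z_critical` is an illustrative choice, not something defined in the text.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Return z_{alpha/2} such that P(Z <= z_{alpha/2}) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: z_(alpha/2) = {z_critical(alpha):.4f}")
```

To more decimal places the 99% bound is about 2.576; the value 2.575 quoted in Table 2 is what one typically reads off a printed lookup table.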

We can therefore equivalently express our confidence interval for $\mu$ as:

$$P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}}\right]=(1-\alpha)$$

NOTE: It is often helpful to think of $z_{\alpha / 2}$ as the number of standard errors $\left(\frac{\sigma}{\sqrt{n}}\right)$ that the confidence interval extends on either side of the estimate. Thus, the confidence interval is $\bar{X} \pm z_{\alpha / 2} \times$ Standard Error.
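As a hedged illustration of the formula $\bar{X} \pm z_{\alpha/2} \times$ Standard Error, the sketch below computes the interval for a small data set. The measurements in `data`, the assumed known `sigma=0.25` and the function name `z_confidence_interval` are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_confidence_interval(data, sigma, alpha=0.05):
    """Confidence interval for mu when the population sigma is known."""
    n = len(data)
    x_bar = fmean(data)                      # point estimate of mu
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)             # z_{alpha/2} * standard error
    return x_bar - margin, x_bar + margin


# Hypothetical measurements; sigma is assumed to be known in advance.
data = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2]
low, high = z_confidence_interval(data, sigma=0.25, alpha=0.05)
print(f"95% CI for mu: [{low:.3f}, {high:.3f}]")
```

With these numbers the interval is roughly $5.05 \pm 0.17$. Halving its width would require four times as many observations, since the standard error shrinks with $\sqrt{n}$.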

Figure 30: Computing confidence intervals for parameter estimates that correspond to the mean of a sample of Normal random variables. First we specify the confidence level $(1-\alpha)$, which is represented here as the shaded area under the curve centred around our estimate. For a) we have specified the confidence level to be $95\%$, corresponding to $\alpha=0.05$; likewise for c) we have specified the confidence level to be $99\%$, corresponding to $\alpha=0.01$. To find the corresponding confidence interval for a given confidence level, we then use the standard Normal CDF lookup table (right side). For $\alpha=0.05$ in b), we find the bound to be $z_{\alpha / 2}=1.96$; similarly for $\alpha=0.01$ in d), we find the bound to be $z_{\alpha / 2}=2.575$.

Sample Mean for Normal Random Variables, Unknown Population Variance ($\sigma^{2}$)

Thus far we have considered the case where the population variance $\sigma^{2}$ was a known quantity. However, this parameter is seldom known, and once $\sigma$ has to be replaced by an estimate it is no longer precisely correct to say that the standardised sample mean follows a standard Normal distribution. Indeed, in practice we can only ascertain information about $\sigma^{2}$ by looking at the sample $D$ available to us. Recall that we derived an expression for the sample variance $S^{2}$ by correcting for the bias of our maximum likelihood estimator $\hat{\sigma}^{2}$:

$$\begin{aligned} \text{Sample variance: } S^{2} & =\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2} \\ \text{Sample standard deviation: } S & =\sqrt{\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2}} \end{aligned}$$

Thus we can use $S$ as an approximation for $\sigma$ in computing the confidence intervals. But how does our uncertainty in $\sigma$ manifest itself in the calculation of these confidence intervals? Intuitively we can appreciate that for large enough sample sizes, $S$ will be a fairly accurate approximation of $\sigma$. But how do we deal with small sample sizes (i.e. $n<100$)?

The credit for recognising that $\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}$ and $\frac{\bar{X}-\mu}{S / \sqrt{n}}$ do not follow the same distribution goes to William Sealy Gosset (who graduated from Oxford in 1899 with First Class degrees in Chemistry and Maths). As an employee at the Guinness brewery, Gosset worked on the task of making the art of brewing more scientific. However, for many of his experiments the sample sizes were only on the order of 4 or 5, and he knew that there was no possible way of knowing the exact value of the true population variance $\sigma^{2}$ in his statistical calculations.

In 1908 Gosset published a new PDF that describes the distribution of $\frac{\bar{X}-\mu}{S / \sqrt{n}}$. At that time, Guinness forbade its employees from publishing papers for confidentiality reasons, so Gosset published these findings under the pen name "Student". Thus, the resulting PDF came to be referred to as the Student $t$-distribution (the '$t$' in this name refers to its use in test statistics, which we will analyse in the next chapter).

Formally speaking, for a random sample $D$ of size $n$ from a Normal population distribution with mean $\mu$ and variance $\sigma^{2}$, the sample mean $\bar{X}$ standardised by the sample standard deviation $S$ follows a Student $t$-distribution with $n-1$ degrees of freedom ($t_{n-1}$):

$$T=\frac{\bar{X}-\mu}{S / \sqrt{n}} \sim t_{n-1}$$

Note here that we define the transform as $T$ instead of $Z$ to distinguish it from the Z-transformation. The Student $t$-distribution, $t_{n-1}$, has a few interesting and important properties, which are illustrated graphically in Figure 31:

  1. It is bell-shaped and symmetric about zero, like the Normal distribution, but has "heavier tails" that approach zero probability at a slower rate.
  2. Its dispersion varies according to the size of the sample, $n$; the smaller the sample size, the greater the dispersion (to reflect the uncertainty in estimating the true variance).
  3. As $n$ approaches $\infty$, $t_{n-1}$ converges to the standard Normal distribution $N(0,1)$ (approximately $n>100$ is close enough to use $N(0,1)$).

Figure 31: Difference between the Student $t$-distribution, $t_{n-1}$, and the standard Normal distribution, $N(0,1)$. As the sample size increases from $n=6$ to $n=21$, we can see that $t_{n-1}$ becomes closer to the standard Normal distribution.

We can analogously define our confidence interval in terms of the Student $t$-distribution for a Normal sample $D$ with unknown population variance $\sigma^{2}$ and sample size $n$:

$$P\left[\bar{X}-t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}}\right]=(1-\alpha)$$

Note that the same Normal sample $D$ is used to compute both the sample mean and the sample standard deviation:

$$\begin{aligned} \bar{X} & =\frac{1}{n} \sum_{i=1}^{n} D_{i} \\ S & =\sqrt{\frac{1}{n-1} \sum_{i=1}^{n}\left(D_{i}-\bar{X}\right)^{2}} \end{aligned}$$

The corresponding confidence interval bound $t_{\alpha / 2, n-1}$ can be found using a lookup table for the upper percentiles of Student $t$-distributions, given values for $\alpha$ and $n$, in a very similar manner to the standard Normal tables. Indeed, since $t_{n-1}$ is a symmetric distribution, we only need to find the bound for the upper tail for a given confidence level $1-\alpha$, i.e. the value $t_{\alpha / 2, n-1}$ such that $P\left(T \geq t_{\alpha / 2, n-1}\right)=\alpha / 2$ (in contrast, for the standard Normal we used the CDF, $P\left(Z \leq z_{\alpha / 2}\right)$).

We illustrate this calculation graphically in Figure 32 for a confidence level of $95\%$ ($\alpha=0.05$) and sample sizes $n=6$ and $n=20$. Table 3 below also summarises the confidence interval bounds for additional $\alpha$ values and sample sizes (compare to Table 2):

| $\alpha$ | $\alpha/2$ | $n$ | $n-1$ | $t_{\alpha/2, n-1}$ such that $P\left(T \geq t_{\alpha/2, n-1}\right)=\alpha/2$ |
| --- | --- | --- | --- | --- |
| 0.05 | 0.025 | 6 | 5 | 2.5706 |
| 0.05 | 0.025 | 11 | 10 | 2.2281 |
| 0.05 | 0.025 | 21 | 20 | 2.0860 |
| 0.01 | 0.005 | 6 | 5 | 4.0321 |
| 0.01 | 0.005 | 11 | 10 | 3.1693 |
| 0.01 | 0.005 | 21 | 20 | 2.8453 |

Table 3: Using the Student $t$-distribution lookup table to compute the confidence interval for a given $\alpha$. Note that the 'degrees of freedom' is equal to $n-1$.
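The entries of Table 3 can be reproduced without a lookup table. The following is a rough numerical sketch, not a production routine: it integrates the standard Student $t$ density with Simpson's rule and then finds the upper-tail bound by bisection. The function names `t_pdf`, `t_cdf` and `t_critical`, the step count and the bisection bracket `[0, 100]` are assumptions made for this illustration.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Find t such that the upper-tail probability P(T >= t) equals alpha/2."""
    lo, hi = 0.0, 100.0  # assumed bracket, wide enough for the cases below
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha / 2:
            lo = mid  # tail still too heavy: the bound lies further out
        else:
            hi = mid
    return (lo + hi) / 2


for alpha in (0.05, 0.01):
    for n in (6, 11, 21):
        print(alpha, n, n - 1, round(t_critical(alpha, n - 1), 4))
```

In practice a statistics library would normally supply this quantile directly; the point of the sketch is only to show that the table values follow from the density and the tail condition $P(T \geq t_{\alpha/2, n-1}) = \alpha/2$.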

Again, it is often helpful to think of $t_{\alpha / 2, n-1}$ as the number of standard errors $\left(\frac{S}{\sqrt{n}}\right)$ that the confidence interval extends on either side of the estimate. Thus, we can write the confidence interval as $\bar{X} \pm t_{\alpha / 2, n-1} \times$ Standard Error, just as we did in the case of the Normal population with known variance. However, we will generally expect a wider confidence interval (i.e. $t_{\alpha / 2, n-1} \geq z_{\alpha / 2}$) for a given $\alpha$, due to the additional uncertainty associated with estimating $\sigma$ by $S$ from the sample data.

For instance, we see in Table 3 for $\alpha=0.05$ that as $n$ increases and our estimate $S$ of $\sigma$ becomes more accurate (i.e. $S \approx \sigma$), $t_{0.025, n-1}$ approaches the standard Normal bound of $z_{0.025}=1.96$. Likewise, as $n$ increases for $\alpha=0.01$, $t_{0.005, n-1}$ approaches the standard Normal bound of $z_{0.005}=2.575$. Using shorthand notation we can write the confidence interval for the estimate as:

$$\bar{X} \pm t_{\alpha / 2, n-1} \frac{S}{\sqrt{n}}$$
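A small worked sketch of this shorthand follows. The six measurements in `data` are invented for the example, and the critical values in `T_TABLE` are simply copied from Table 3, so the hypothetical helper `t_confidence_interval` only works for the $(\alpha, n-1)$ pairs listed there.

```python
from math import sqrt
from statistics import fmean, stdev

# Critical values t_{alpha/2, n-1} copied from Table 3, keyed by (alpha, n - 1).
T_TABLE = {
    (0.05, 5): 2.5706, (0.05, 10): 2.2281, (0.05, 20): 2.0860,
    (0.01, 5): 4.0321, (0.01, 10): 3.1693, (0.01, 20): 2.8453,
}


def t_confidence_interval(data, alpha=0.05):
    """Confidence interval for mu when sigma is unknown and estimated by S."""
    n = len(data)
    t_crit = T_TABLE[(alpha, n - 1)]  # raises KeyError for untabulated cases
    x_bar = fmean(data)
    s = stdev(data)                   # sample standard deviation (n - 1 denominator)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical small sample (n = 6), similar in size to Gosset's experiments.
data = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4]
low, high = t_confidence_interval(data, alpha=0.05)
print(f"95% CI for mu: [{low:.3f}, {high:.3f}]")
```

Replacing `t_crit` with $z_{0.025}=1.96$ for the same data would give a narrower interval, which understates the uncertainty when $n$ is this small.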

Figure 32: Computing confidence intervals for parameter estimates that correspond to the mean of a sample of Normal random variables. First we specify the confidence level $(1-\alpha)$, which corresponds to the area under the curve centred around our estimate. For a) we have specified the area under the curve to be $95\%$, corresponding to $\alpha=0.05$; likewise for c) we have specified the area under the curve to be $99\%$, corresponding to $\alpha=0.01$. To find the corresponding values for the bounds around our estimate for a given confidence level, we then use the standard Normal lookup table. For $\alpha=0.05$ in b), we find the bound to be $z_{\alpha / 2}=1.96$; similarly for $\alpha=0.01$ in d), we find the bound to be $z_{\alpha / 2}=2.575$.

Sample Mean for Non-Normal Distributions

So far we have considered how to compute confidence intervals for the sample mean of a random sample from a Normal distribution. In the case where the population variance $\sigma^{2}$ is known:

$$\begin{gathered} Z=\frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \\ P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{\sigma}{\sqrt{n}}\right]=(1-\alpha) \end{gathered}$$

And in the case where the population variance is unknown (and must be estimated using the sample variance $S^{2}$):

$$\begin{gathered} T=\frac{\bar{X}-\mu}{S / \sqrt{n}} \sim t_{n-1} \\ P\left[\bar{X}-t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+t_{\alpha / 2, n-1} \cdot \frac{S}{\sqrt{n}}\right]=(1-\alpha) \end{gathered}$$

But we noted that for large enough $n$ (e.g. $n>100$) the Student $t$-distribution converges to the standard Normal $N(0,1)$, and as a result $z_{\alpha / 2} \approx t_{\alpha / 2, n-1}$ (i.e. the two intervals become practically the same).

However, we have also seen that for large sample sizes $n$ the sample mean $\bar{X}$ of a non-Normal population distribution follows a sampling distribution that is approximately Normal. For instance, Figure 29 shows that the sampling distribution of the sample mean $\bar{W}$, as computed from random samples of the exponential population distribution $f_{W}(w \mid \hat{\lambda}=0.1)$ (for which $E[W]=\frac{1}{\hat{\lambda}} \equiv \mu$ and $\sqrt{\operatorname{Var}[W]}=\sqrt{\frac{1}{\hat{\lambda}^{2}}} \equiv \sigma$), is itself approximately Normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, written $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

Indeed this was the main result of the Central Limit Theorem at the end of Chapter 3: the sample mean $\bar{X}$, computed from random samples of any population distribution with mean $\mu$ and variance $\sigma^{2}$, is approximately Normally distributed for large sample sizes $n$:

$$\lim _{n \rightarrow \infty} P\left[a \leq \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \leq b\right]=\frac{1}{\sqrt{2 \pi}} \int_{a}^{b} e^{-z^{2} / 2} \, dz$$

And since the CLT only applies to large $n$, we can use the sample standard deviation $S$ to approximate $\sigma$ without introducing too much further variability into our confidence intervals:

$$P\left[\bar{X}-z_{\alpha / 2} \cdot \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X}+z_{\alpha / 2} \cdot \frac{S}{\sqrt{n}}\right] \approx(1-\alpha)$$
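The approximation can be checked by simulation. The sketch below reuses the exponential population with rate $0.1$ mentioned above; the sample size `n=200`, the number of trials, the seed and the helper names `clt_interval` and `coverage` are assumptions chosen for illustration. It counts how often the CLT-based interval contains the true mean $1/\lambda = 10$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def clt_interval(data, alpha=0.05):
    """Large-sample interval: X_bar +/- z_{alpha/2} * S / sqrt(n)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * stdev(data) / sqrt(n)
    x_bar = fmean(data)
    return x_bar - margin, x_bar + margin


def coverage(rate, n, trials, alpha=0.05, seed=1):
    """Fraction of simulated exponential samples whose interval contains 1/rate."""
    rng = random.Random(seed)
    true_mean = 1 / rate
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(rate) for _ in range(n)]
        low, high = clt_interval(sample, alpha)
        hits += low <= true_mean <= high
    return hits / trials


cov = coverage(rate=0.1, n=200, trials=2000)
print(f"empirical coverage of the nominal 95% interval: {cov:.3f}")
```

The empirical coverage should land close to, but not exactly at, 0.95; trying a much smaller `n` shows how the approximation degrades for a skewed population.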

Confidence Intervals for General Sample Statistics

We can generally define the procedure for constructing confidence intervals for any sample statistic as follows:

Let $D_{1}, \ldots, D_{n}$ be a random sample $D$ from a population with an unknown parameter $\theta$. Given a confidence level $(1-\alpha)$, if $l(D)$ and $u(D)$ are statistics computed from the sample with the property that:

$$P[l(D)<\theta<u(D)]=(1-\alpha)$$

then we say that $[l(D), u(D)]$ is a $100(1-\alpha)\%$ confidence interval (CI) for $\theta$. Thus, the confidence interval contains the true value of the parameter $\theta$ with some known probability $(1-\alpha)$.

To illustrate, consider an example of computing the confidence interval for $\mu$ for a random sample $D$ from a Normal distribution with known variance $\sigma^{2}$ and a confidence level of $95\%$ ($\alpha=0.05$). With the help of Figure 30 and Table 2 we find that:

$$\begin{aligned} \theta & =\mu \\ \text{Sample mean: } \hat{\mu}=\bar{X} & =\frac{1}{n} \sum_{i=1}^{n} D_{i} \\ l(D) & =\bar{X}-1.96 \cdot \frac{\sigma}{\sqrt{n}} \\ u(D) & =\bar{X}+1.96 \cdot \frac{\sigma}{\sqrt{n}} \\ (1-\alpha) & =\frac{1}{\sqrt{2 \pi}} \int_{-1.96}^{1.96} e^{-z^{2} / 2} \, dz=0.95 \end{aligned}$$
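The probability statement $P[l(D)<\theta<u(D)]=(1-\alpha)$ refers to repeated sampling, which a simulation can make tangible. In the sketch below the population values (`mu=3.0`, `sigma=1.5`), the sample size `n=10`, the seed and the function names are illustrative assumptions; the bounds follow the $l(D)$ and $u(D)$ of the example above with the multiplier 1.96.

```python
import random
from math import sqrt
from statistics import fmean


def lower_bound(sample, sigma):
    """l(D) = X_bar - 1.96 * sigma / sqrt(n)."""
    return fmean(sample) - 1.96 * sigma / sqrt(len(sample))


def upper_bound(sample, sigma):
    """u(D) = X_bar + 1.96 * sigma / sqrt(n)."""
    return fmean(sample) + 1.96 * sigma / sqrt(len(sample))


def estimate_coverage(mu, sigma, n, trials, seed=2):
    """Fraction of simulated Normal samples with l(D) < mu < u(D)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        hits += lower_bound(sample, sigma) < mu < upper_bound(sample, sigma)
    return hits / trials


cov_normal = estimate_coverage(mu=3.0, sigma=1.5, n=10, trials=4000)
print(f"fraction of intervals containing mu: {cov_normal:.3f}")
```

Because the interval is exact for a Normal population with known $\sigma$, the fraction should be close to 0.95 even for a sample size as small as 10, up to simulation noise.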