MLE of a Normal Distribution: Sample Mean and Sample Variance

Before we begin our section on interval estimation, we will consider the MLE for the Normal parameters $\mu$ and $\sigma$. Recall the PDF for the Normal distribution, as given in Table 1:

$$f_X(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

From this we can derive the likelihood function for a sample dataset $D$:

$$\begin{aligned} L(\theta = \{\mu, \sigma\}) = P(D \mid \mu, \sigma) = \prod_{i=1}^{n} f_X(D_i \mid \mu, \sigma) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(D_i-\mu)^2 / 2\sigma^2} \\ &= \left(2\pi\sigma^2\right)^{-n/2} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(D_i-\mu)^2} \end{aligned}$$

Taking the natural logarithm we get the corresponding log-likelihood:

$$\mathcal{L}(\mu, \sigma) = \ln L(\mu, \sigma) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(D_i-\mu)^2$$
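Before differentiating, it can help to see that this log-likelihood really does peak at the familiar sample statistics. The sketch below is a minimal numerical check, assuming a small synthetic dataset and a brute-force grid rather than an analytical solver: it evaluates $\mathcal{L}(\mu, \sigma)$ over a grid of candidate parameters and reports the maximizer.

```python
import numpy as np

# Hypothetical synthetic dataset, standing in for D in the text.
rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=500)
n = len(D)

def log_likelihood(mu, sigma2):
    """The Gaussian log-likelihood written out above (parameterised by sigma^2)."""
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((D - mu) ** 2) / (2 * sigma2)

# Brute-force search over a grid of (mu, sigma^2) candidates.
mus = np.linspace(3.0, 7.0, 201)
sigma2s = np.linspace(1.0, 9.0, 201)
best_mu, best_sigma2 = max(
    ((m, s) for m in mus for s in sigma2s),
    key=lambda p: log_likelihood(*p),
)
```

Up to the grid resolution, `best_mu` lands at the sample mean and `best_sigma2` at the divide-by-$n$ variance of the data, matching the closed-form solutions below.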

Note that in this case we need to solve for $\mu$ and $\sigma$ simultaneously; this requires setting the partial derivatives of the log-likelihood function with respect to each of these variables equal to zero:

$$\begin{aligned} \frac{\partial \mathcal{L}(\mu, \sigma)}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(D_i-\mu) = 0 \\ \frac{\partial \mathcal{L}(\mu, \sigma)}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2}\left(\frac{1}{\sigma^2}\right)^2 \sum_{i=1}^{n}(D_i-\mu)^2 = 0 \end{aligned}$$

We can easily solve the first equation for $\mu$:

$$\begin{aligned} \sum_{i=1}^{n}(D_i-\mu) &= 0 \\ n\mu &= \sum_{i=1}^{n} D_i \\ \hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} D_i = \bar{X} \end{aligned}$$

Once again (and perhaps not surprisingly), we find that the MLE for $\mu$ is the sample mean. We can then substitute the maximum likelihood estimate $\mu = \bar{X}$ into the equation for $\frac{\partial \mathcal{L}(\mu, \sigma)}{\partial \sigma^2}$:

$$-n\sigma^2 + \sum_{i=1}^{n}(D_i-\bar{X})^2 = 0 \qquad\Longrightarrow\qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(D_i-\bar{X})^2$$
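These closed-form estimates are easy to verify numerically. A minimal sketch (the dataset and parameters are invented for illustration) compares them against NumPy's built-ins; note that `np.var` defaults to `ddof=0`, which is exactly this divide-by-$n$ estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(loc=10.0, scale=3.0, size=1_000)  # hypothetical sample
n = len(D)

mu_hat = D.sum() / n                         # MLE of mu: the sample mean
sigma2_hat = ((D - mu_hat) ** 2).sum() / n   # MLE of sigma^2: divide by n

assert np.isclose(mu_hat, np.mean(D))
assert np.isclose(sigma2_hat, np.var(D))     # np.var uses ddof=0 by default
```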

This is a useful property of the Normal distribution: its two parameters correspond directly to the mean and variance of the population. In other words, the sample mean and sample variance are the MLE estimates of these two parameters for the Normal distribution. Before stating this formally, we shall first correct for the bias in the expression for $\sigma^2$.

Correcting the Bias in the MLE of $\sigma^2$

Recall that when we derived $MSE(\hat{\theta})$ (Section 5.2.1), we defined a term known as the bias:

$$\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

Thus, the bias of an estimator is exactly zero when $E[\hat{\theta}] = \theta$; let's analyse this for our maximum likelihood estimator of $\sigma^2$:

$$\begin{aligned} E[\hat{\theta}] = E\left[\hat{\sigma}^2\right] &= E\left[\frac{1}{n}\sum_{i=1}^{n}(D_i-\bar{X})^2\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^{n}\left(D_i^2 - 2D_i\bar{X} + \bar{X}^2\right)\right] \\ &= E\left[\frac{1}{n}\left(\sum_{i=1}^{n} D_i^2 - n\bar{X}^2\right)\right] \\ &= \frac{1}{n}\left[\sum_{i=1}^{n} E\left[D_i^2\right] - nE\left[\bar{X}^2\right]\right] \end{aligned}$$

where the third line uses $\sum_{i=1}^{n} D_i = n\bar{X}$: the cross term becomes $-2n\bar{X}^2$, which combines with $\sum_i \bar{X}^2 = n\bar{X}^2$ to leave $-n\bar{X}^2$.

At this point we make use of the definition $\operatorname{Var}[X] = E[X^2] - E[X]^2$, rearranged as $E[X^2] = \operatorname{Var}[X] + E[X]^2$, to substitute for $E[D_i^2]$ (noting that each $D_i$ is drawn from the same distribution as $X$, so $E[D_i^2] = \sigma^2 + \mu^2$) and for $E[\bar{X}^2]$ (noting our earlier results for the sample mean: $E[\bar{X}] = \mu$ and $\operatorname{Var}[\bar{X}] = \sigma^2/n$):

$$\begin{aligned} E\left[\hat{\sigma}^2\right] &= \frac{1}{n}\left[\sum_{i=1}^{n}\left(\sigma^2+\mu^2\right) - n\left(\frac{\sigma^2}{n}+\mu^2\right)\right] \\ &= \frac{n-1}{n}\,\sigma^2 \end{aligned}$$
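The factor $\frac{n-1}{n}$ can be observed directly by simulation: averaging the MLE $\hat{\sigma}^2$ over many small samples, the average settles below the true variance. A sketch, with the true parameters and sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_true = 4.0      # true population variance (sigma = 2)
n = 5                  # small n makes the (n-1)/n shrinkage visible
trials = 200_000

samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))
sigma2_mle = samples.var(axis=1, ddof=0)   # divide-by-n MLE for each trial

# The average estimate sits near (n-1)/n * sigma^2 = 0.8 * 4 = 3.2, not 4.
print(sigma2_mle.mean())
```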

Thus we see that $E[\hat{\theta}] \neq \theta$! To correct for this bias in the maximum likelihood estimator, we need to multiply by $\frac{n}{n-1}$:

$$S^2 = \frac{n}{n-1}\cdot\hat{\sigma}^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(D_i-\bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(D_i-\bar{X})^2$$

We refer to $S^2$ as the sample variance. Finally, we define the sample mean (as before) and the sample standard deviation as:

$$\begin{aligned} \text{Sample mean: } \bar{X} &= \frac{1}{n}\sum_{i=1}^{n} D_i \\ \text{Sample standard deviation: } S &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(D_i-\bar{X})^2} \end{aligned}$$
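In NumPy, the divide-by-$(n-1)$ correction is controlled by the `ddof` ("delta degrees of freedom") argument: `ddof=0` gives the MLE, `ddof=1` gives the bias-corrected sample statistics. A short sketch with an invented sample:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(loc=0.0, scale=1.0, size=50)   # hypothetical sample
n = len(D)

x_bar = D.mean()                                  # sample mean
s = np.sqrt(((D - x_bar) ** 2).sum() / (n - 1))   # sample standard deviation

assert np.isclose(s, np.std(D, ddof=1))   # ddof=1 divides by n - 1
```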

This bias-corrected estimator is quite useful for several statistical tests, which we will see in the next chapter.