
Hypothesis Testing

Hypothesis Test Statistics

$H_{0}: \mu=\mu_{0}$, $Z$-Test:

$$T=\frac{\bar{X}-\mu_{0}}{\sigma / \sqrt{n}} \sim N(0,1)$$
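The statistic is simple to compute directly; below is a minimal sketch in Python (the sample values, $\mu_{0}$, and the known $\sigma$ are all hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.9, 5.6, 5.1, 4.8, 5.3, 5.0, 5.4])  # hypothetical sample
mu_0, sigma = 5.0, 0.3                                   # null mean, known population sd

# Z statistic and two-sided p-value under the N(0, 1) null distribution
z = (x.mean() - mu_0) / (sigma / np.sqrt(len(x)))
p_value = 2 * stats.norm.sf(abs(z))
```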

$H_{0}: \mu=\mu_{0}$, Student's $t$-Test:

$$T=\frac{\bar{X}-\mu_{0}}{S / \sqrt{n}} \sim t_{n-1}$$
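SciPy implements this test as `scipy.stats.ttest_1samp`; a sketch on a hypothetical sample:

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.9, 5.6, 5.1, 4.8, 5.3, 5.0, 5.4])  # hypothetical sample

# Computes T = (x̄ - μ₀) / (S / √n) and refers it to t_{n-1} (two-sided by default)
t_stat, p_value = stats.ttest_1samp(x, popmean=5.0)
```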

$H_{0}: \mu_{X}=\mu_{Y}$, Two-Sample $Z$-Test:

$$T=\frac{\bar{X}-\bar{Y}}{\sqrt{\sigma_{X}^{2} / n_{X}+\sigma_{Y}^{2} / n_{Y}}} \sim N(0,1)$$
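SciPy has no canned two-sample $Z$-test, but the statistic is easy to assemble by hand; a sketch with hypothetical samples and assumed-known population variances:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 5.4, 4.9, 5.2])       # hypothetical samples
y = np.array([4.7, 5.0, 4.8, 4.6, 4.9])
var_x, var_y = 0.09, 0.16                 # σ²_X and σ²_Y, assumed known

z = (x.mean() - y.mean()) / np.sqrt(var_x / len(x) + var_y / len(y))
p_value = 2 * stats.norm.sf(abs(z))       # two-sided tail under N(0, 1)
```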

$H_{0}: \mu_{X}=\mu_{Y}$, Two-Sample $t$-Test:

$$\begin{gathered}
T=\frac{\bar{X}-\bar{Y}}{\sqrt{S_{\text{pooled}}^{2} / n_{X}+S_{\text{pooled}}^{2} / n_{Y}}}=\frac{\bar{X}-\bar{Y}}{S_{\text{pooled}} \sqrt{1 / n_{X}+1 / n_{Y}}} \sim t_{n_{X}+n_{Y}-2} \\
S_{\text{pooled}}^{2}=\frac{\sum_{i=1}^{n_{X}}\left(X_{i}-\bar{X}\right)^{2}+\sum_{i=1}^{n_{Y}}\left(Y_{i}-\bar{Y}\right)^{2}}{n_{X}+n_{Y}-2}=\frac{\left(n_{X}-1\right) S_{X}^{2}+\left(n_{Y}-1\right) S_{Y}^{2}}{n_{X}+n_{Y}-2}
\end{gathered}$$
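This pooled-variance form is what `scipy.stats.ttest_ind` computes when `equal_var=True` (with `equal_var=False` it performs Welch's test instead); a sketch on hypothetical data that also evaluates $S_{\text{pooled}}^{2}$ explicitly to mirror the formula:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 5.4, 4.9, 5.2])       # hypothetical samples
y = np.array([4.7, 5.0, 4.8, 4.6, 4.9])

# Pooled-variance (Student's) two-sample t-test
t_stat, p_value = stats.ttest_ind(x, y, equal_var=True)

# The pooled variance itself, from the two sample variances
n_x, n_y = len(x), len(y)
s2_pooled = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / (n_x + n_y - 2)
```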

$H_{0}: \mu_{d}=0$, Paired $t$-Test:

$$T=\frac{\bar{D}-\mu_{d}}{S_{D} / \sqrt{n}} \sim t_{n-1}$$
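`scipy.stats.ttest_rel` implements this; it is equivalent to running a one-sample $t$-test on the differences $D_{i}$. A sketch with hypothetical before/after measurements:

```python
import numpy as np
from scipy import stats

before = np.array([82, 75, 90, 68, 77, 85])  # hypothetical paired measurements
after = np.array([85, 79, 88, 74, 80, 89])

# Same as stats.ttest_1samp(after - before, popmean=0)
t_stat, p_value = stats.ttest_rel(after, before)
```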

$H_{0}: \sigma^{2}=\sigma_{0}^{2}$, $\chi^{2}$ Test for the Variance:

$$T=\frac{(n-1) S^{2}}{\sigma_{0}^{2}} \sim \chi_{n-1}^{2}$$
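SciPy exposes the $\chi^{2}$ distribution but no ready-made variance test, so a sketch by hand (hypothetical data, two-sided alternative):

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.9, 5.6, 5.1, 4.8, 5.3, 5.0, 5.4])  # hypothetical sample
sigma2_0 = 0.04                                          # null variance σ²₀

n = len(x)
T = (n - 1) * x.var(ddof=1) / sigma2_0

# Two-sided p-value: double the smaller tail of χ²_{n-1}
p_value = 2 * min(stats.chi2.cdf(T, df=n - 1), stats.chi2.sf(T, df=n - 1))
```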

Goodness-of-Fit:

$$X^{2}=\sum_{i=1}^{k} \frac{\left(O_{i}-E_{i}\right)^{2}}{E_{i}} \sim \chi_{k-p-1}^{2}$$

where:

$$\begin{aligned}
n &= \text{the total number of observations} \\
k &= \text{the number of discrete bins} \\
p &= \text{the number of distribution parameters estimated from the data} \\
E_{i} &= \text{the expected number of observations falling into the } i\text{-th bin} = n \cdot p_{i} \\
O_{i} &= \text{the observed number of observations falling into the } i\text{-th bin}
\end{aligned}$$
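`scipy.stats.chisquare` implements this statistic, with its `ddof` argument playing the role of $p$; a sketch testing whether a (hypothetical) die is fair over $n = 120$ throws:

```python
import numpy as np
from scipy import stats

observed = np.array([25, 17, 15, 23, 24, 16])  # hypothetical counts per face
expected = np.full(6, 120 / 6)                 # E_i = n · p_i with p_i = 1/6

# No parameters were fitted (p = 0), so df = k - p - 1 = 5
chi2_stat, p_value = stats.chisquare(observed, f_exp=expected, ddof=0)
```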

In the previous chapter we established the framework for estimating parameters $\hat{\theta}$ of a population using sample data $D$, and then characterising the uncertainty around these estimates using confidence intervals (since the estimates will vary with every new sample). In this chapter we will draw heavily upon these tools to solve the inverse problem: deciding whether a sample (or samples) comes from a different population (or populations). For instance, we might want to claim that our experimental results are not consistent with previously reported ones, and that the inconsistency is unlikely to be due to pure chance. We call this branch of statistics hypothesis testing.

Generally, we approach hypothesis testing in a three-step fashion:

  1. First, we establish two contradictory hypotheses about the population: the null hypothesis (written $H_{0}$) and the alternative hypothesis (written $H_{1}$). We are usually interested in reasoning about the alternative hypothesis, while the null hypothesis represents the status quo.
  2. Once we have a null hypothesis, we then obtain the sampling distribution of the statistic of interest (e.g. the sample mean or sample variance) assuming the null hypothesis is true - this is referred to as the null distribution.
  3. Finally, we collect a sample $D$ to compute our test statistic (e.g. the sample mean), and see where this falls on our null distribution. If the test statistic falls in the tail(s) of the null distribution then we reject the null hypothesis (and thus accept the alternative hypothesis); a minimal end-to-end sketch follows this list.
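To make the three steps concrete, here is a sketch of the full procedure (hypothetical data; a two-sided one-sample $t$-test of $H_{0}: \mu = 5.0$ at significance level $\alpha = 0.05$):

```python
import numpy as np
from scipy import stats

# Step 1: H0: μ = 5.0  vs  H1: μ ≠ 5.0
mu_0, alpha = 5.0, 0.05

# Step 2: under H0 (σ unknown), T = (x̄ - μ₀) / (S / √n) follows the
# null distribution t_{n-1}
x = np.array([5.2, 4.9, 5.6, 5.1, 4.8, 5.3, 5.0, 5.4])  # hypothetical sample
n = len(x)
t_stat = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))

# Step 3: reject H0 if the statistic falls in the tails of t_{n-1}
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
if abs(t_stat) > t_crit:
    print(f"T = {t_stat:.3f} exceeds ±{t_crit:.3f}: reject H0")
else:
    print(f"T = {t_stat:.3f} within ±{t_crit:.3f}: fail to reject H0")
```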

This is somewhat analogous to how a jury reaches a verdict in a court trial. In this scenario, the null hypothesis $H_{0}$ corresponds to the defendant being innocent until "proven" guilty, and this null hypothesis is "accepted" unless substantial evidence (or in our case, our sample data) is provided to prove them guilty (i.e. the alternative hypothesis, $H_{1}$) beyond reasonable doubt.