One-way ANOVA: sampling distribution of $F$ and of $t$

Definition of the sampling distribution of the $F$ statistic in one-way ANOVA, and of the $t$ statistic computed in follow-up tests (contrasts/multiple comparisons).


Sampling distribution of $ F$:

As you may know, when we perform a one-way ANOVA, we compute the $ F$ statistic $$ F = \dfrac{\mbox{mean square between}}{\mbox{mean square error}} $$ based on our samples from the $ I$ populations. Now suppose that we drew many more samples. Specifically, suppose that we repeated our study an infinite number of times. In each of these studies, we could compute the $ F$ statistic $ F = \frac{\mbox{mean square between}}{\mbox{mean square error}}$ based on the sampled data. Different studies would be based on different samples, resulting in different $ F$ values. The distribution of all these $ F$ values is the sampling distribution of $ F$. Note that this sampling distribution is purely hypothetical: we never really repeat our study an infinite number of times, but hypothetically, we could.
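To make the hypothetical repetition concrete, here is a minimal simulation sketch in Python (NumPy). It assumes $ I = 3$ normal populations with a common standard deviation and equal group sizes; the function name f_statistic and all numerical values are illustrative choices, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def f_statistic(groups):
    """One-way ANOVA F = (mean square between) / (mean square error)."""
    I = len(groups)                                   # number of groups
    n = np.array([len(g) for g in groups])            # group sizes
    N = n.sum()                                       # total sample size
    means = np.array([g.mean() for g in groups])      # group means
    grand_mean = np.concatenate(groups).mean()
    ss_between = np.sum(n * (means - grand_mean) ** 2)
    ss_error = sum(np.sum((g - m) ** 2) for g, m in zip(groups, means))
    return (ss_between / (I - 1)) / (ss_error / (N - I))

# Illustrative population means and common SD (hypothetical values)
mu, sigma, n_per_group = [10.0, 10.0, 12.0], 3.0, 20

# "Repeat the study" many times and collect the F value from each repetition:
# the collection f_values approximates the sampling distribution of F.
f_values = [f_statistic([rng.normal(m, sigma, n_per_group) for m in mu])
            for _ in range(10_000)]
```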

Sampling distribution of $ F$ if $H_0$ were true:

Suppose that the assumptions of the ANOVA hold, and that the null hypothesis that $\mu_1 = \mu_2 = \ldots = \mu_I$ is true. Then the sampling distribution of $ F$ is the $ F$ distribution with $ I - 1$ and $ N - I$ degrees of freedom. That is, most of the time we would find relatively small $ F$ values, and only sometimes would we find large $ F$ values. If we find a very large $ F$ value in our actual study, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($ F$ value in the rejection region, small $ p$ value).

[Figure: the $ F$ distribution]
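Assuming SciPy is available, here is a small sketch of how a $ p$ value follows from this $ F$ distribution; the observed $ F$ value and the group and sample sizes below are made-up illustrative numbers.

```python
from scipy import stats

# Hypothetical study: I = 3 groups, N = 60 observations, observed F = 4.2
I, N, F_obs = 3, 60, 4.2

# Under H0 (and the ANOVA assumptions), F follows the F(I - 1, N - I) distribution,
# so the p value is the probability of an F value at least this large under H0.
p_value = stats.f.sf(F_obs, dfn=I - 1, dfd=N - I)
print(f"p = {p_value:.4f}")   # a small p value counts as evidence against H0
```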

Sampling distribution of $ t$:

In addition to the ANOVA $ F$ test, we may also want to perform $ t$ tests for contrasts or multiple comparisons:

$ t$ statistic for a contrast:

  • $ t = \dfrac{c}{s_p\sqrt{\sum \dfrac{a^2_i}{n_i}}}$, where $ c = \sum a_i \bar{y}_i$ is the sample value of the contrast

$ t$ statistic for multiple comparisons:

  • $ t = \dfrac{\bar{y}_g - \bar{y}_h}{s_p\sqrt{\dfrac{1}{n_g} + \dfrac{1}{n_h}}}$
based on our samples from the $ I$ populations. Again, suppose that we drew many more samples. Specifically, suppose that we repeated our study an infinite number of times. In each of these studies, we could compute the $ t$ statistic based on the sampled data. Different studies would be based on different samples, resulting in different $ t$ values. The distribution of all these $ t$ values is the sampling distribution of $ t$. Note that this sampling distribution is purely hypothetical: we would never really repeat our study an infinite number of times, but hypothetically, we could.
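As a computational companion, here is a minimal sketch of both $ t$ statistics in Python (NumPy). The helper names pooled_sd, contrast_t, and pairwise_t, and the use of $ s_p = \sqrt{\mbox{mean square error}}$ as the pooled standard deviation, are illustrative assumptions rather than part of the text.

```python
import numpy as np

def pooled_sd(groups):
    """Pooled standard deviation s_p = sqrt(mean square error) across all groups."""
    N = sum(len(g) for g in groups)
    I = len(groups)
    ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return np.sqrt(ss_error / (N - I))

def contrast_t(groups, a):
    """t statistic for a contrast with weights a (the weights should sum to 0)."""
    a = np.asarray(a, dtype=float)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    c = np.sum(a * means)                          # sample contrast c = sum a_i * ybar_i
    se = pooled_sd(groups) * np.sqrt(np.sum(a ** 2 / n))
    return c / se

def pairwise_t(groups, g, h):
    """t statistic comparing the means of groups g and h, using the pooled SD."""
    n_g, n_h = len(groups[g]), len(groups[h])
    diff = groups[g].mean() - groups[h].mean()
    return diff / (pooled_sd(groups) * np.sqrt(1 / n_g + 1 / n_h))
```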

Sampling distribution of $ t$ if $H_0$ were true:

Suppose that the assumptions of the ANOVA hold, and that the null hypothesis tested by the $ t$ test is true (the population contrast $\Psi = 0$, or $\mu_g = \mu_h$). Then the sampling distribution of $ t$ is the $ t$ distribution with $ N - I$ degrees of freedom. That is, most of the time we would find $ t$ values close to 0, and only sometimes would we find $ t$ values further away from 0. If we find a $ t$ value in our actual study that is far away from 0, this would be a rare event if the null hypothesis were true, and it is therefore considered evidence against the null hypothesis ($ t$ value in the rejection region, small $ p$ value).

[Figure: the $ t$ distribution]
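Assuming SciPy and a two-sided alternative, here is a sketch of the corresponding $ p$ value; the observed $ t$ value and the group and sample sizes below are again made-up illustrative numbers.

```python
from scipy import stats

# Hypothetical follow-up test: I = 3 groups, N = 60 observations, observed t = 2.5
I, N, t_obs = 3, 60, 2.5

# Under H0 (and the ANOVA assumptions), t follows the t distribution with N - I
# degrees of freedom; the two-sided p value is P(|t| >= |t_obs|) under H0.
p_value = 2 * stats.t.sf(abs(t_obs), df=N - I)
print(f"p = {p_value:.4f}")   # a t value far from 0 gives a small p value
```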