Theoretical Distribution
Simulation Under H₀
Results
Decision
Hypotheses & Computations
◆ THE BLUEPRINT
The One-Proportion Z-Test

Tests whether a population proportion \(\pi\) equals a hypothesized value \(\pi_0\).

Formulas
$$\hat{p} = \frac{k}{n} \qquad SE = \sqrt{\frac{\pi_0(1-\pi_0)}{n}} \qquad Z = \frac{\hat{p} - \pi_0}{SE}$$
Validity Conditions

The normal approximation requires both \(n\pi_0 \geq 10\) and \(n(1-\pi_0) \geq 10\).

Simulation Approach

Under H\(_0\), each simulated sample draws \(k^* \sim \text{Binomial}(n, \pi_0)\). We compute \(Z^*\) for each sample. The simulated p-value is the fraction of \(|Z^*|\) values as extreme or more extreme than the observed \(|Z|\).
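The simulation above can be sketched in a few lines of NumPy. The counts here (62 successes in 100 trials, testing \(\pi_0 = 0.5\)) are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative inputs (not from the text): k successes in n trials,
# testing H0: pi = pi_0.
n, k, pi0 = 100, 62, 0.5

p_hat = k / n
se = np.sqrt(pi0 * (1 - pi0) / n)       # SE uses pi_0, not p_hat, under H0
z_obs = (p_hat - pi0) / se

# Draw k* ~ Binomial(n, pi_0) under H0 and recompute Z* for each sample.
k_star = rng.binomial(n, pi0, size=10_000)
z_star = (k_star / n - pi0) / se

# Two-sided simulated p-value: fraction of |Z*| at least as extreme as |Z|.
p_sim = np.mean(np.abs(z_star) >= np.abs(z_obs))
```

Note that the standard error is computed from \(\pi_0\), not \(\hat{p}\), because the simulation is done under the null hypothesis.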

Evidence Scale

p > 0.10: Weak evidence against H\(_0\)

0.05 < p ≤ 0.10: Moderate evidence

0.01 < p ≤ 0.05: Strong evidence

p ≤ 0.01: Very strong evidence
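The scale above translates directly into a small helper function (the name `evidence_strength` is ours, not from the text):

```python
def evidence_strength(p: float) -> str:
    """Map a p-value to the evidence scale above."""
    if p > 0.10:
        return "weak evidence against H0"
    if p > 0.05:
        return "moderate evidence"
    if p > 0.01:
        return "strong evidence"
    return "very strong evidence"
```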

◆ THE BLUEPRINT
The One-Sample t-Test

Tests whether a population mean \(\mu\) equals a hypothesized value \(\mu_0\) when the population standard deviation is unknown.

Formulas
$$SE = \frac{s}{\sqrt{n}} \qquad t = \frac{\bar{x} - \mu_0}{SE} \qquad df = n - 1$$
Degrees of Freedom

The t-distribution has \(n - 1\) degrees of freedom. As df increases, the t-distribution converges to the standard normal. With small samples, the heavier tails of the t-distribution account for the extra uncertainty from estimating \(\sigma\) with \(s\).

t vs. Z

Use Z when \(\sigma\) is known (rare). Use t when \(\sigma\) is unknown and estimated by \(s\). For large n, the two are nearly identical.

Validity Conditions

The t-test requires the population to be approximately normal, or \(n\) to be large enough for the CLT to apply. Tintle et al. use \(n \geq 20\) as the guideline. The traditional CLT threshold is \(n \geq 30\).

Tintle, N. et al. Introduction to Statistical Investigations (ISI).

Simulation Approach

Under H\(_0\), we generate \(n\) observations from a normal distribution with mean \(\mu_0\) and standard deviation \(s\) for each simulation. We compute the sample mean, sample SD, and \(t^*\) for each. The simulated p-value is the fraction of \(|t^*|\) values as extreme or more extreme than the observed \(|t|\).
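A minimal sketch of this simulation, using illustrative summary numbers (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative observed summary (not from the text).
n, xbar, s, mu0 = 25, 10.8, 2.0, 10.0

se = s / np.sqrt(n)
t_obs = (xbar - mu0) / se

# Under H0, draw n observations from a normal with mean mu0 and SD s,
# then recompute t* from each simulated sample's own mean and SD.
sims = rng.normal(mu0, s, size=(10_000, n))
t_star = (sims.mean(axis=1) - mu0) / (sims.std(axis=1, ddof=1) / np.sqrt(n))

# Two-sided simulated p-value.
p_sim = np.mean(np.abs(t_star) >= np.abs(t_obs))
```

Each simulated sample re-estimates its own SD, which is exactly what gives the null distribution its t-shaped (heavier-than-normal) tails.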

◆ THE BLUEPRINT
The Two-Proportion Z-Test

Tests whether two population proportions \(\pi_1\) and \(\pi_2\) are equal.

Why Pool Under H₀?

Under H\(_0\): \(\pi_1 = \pi_2\). If they are equal, our best estimate of that common proportion uses all the data from both groups. The pooled proportion \(\hat{p} = \frac{k_1 + k_2}{n_1 + n_2}\) serves as the shared estimate for the standard error calculation.

Formulas
$$\hat{p} = \frac{k_1 + k_2}{n_1 + n_2} \qquad SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE}$$
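The pooled computation is short enough to show directly; the counts below are illustrative, not from the text:

```python
import numpy as np

# Illustrative counts (not from the text): k1/n1 and k2/n2 successes per group.
k1, n1 = 45, 100
k2, n2 = 30, 100

p_hat = (k1 + k2) / (n1 + n2)                      # pooled proportion under H0
se = np.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (k1 / n1 - k2 / n2) / se
```

The numerator uses the two separate sample proportions, while the standard error uses the pooled \(\hat{p}\), since H\(_0\) asserts a single common proportion.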
Four Validity Conditions

All four of the following must be \(\geq 10\): \(n_1\hat{p}\), \(n_1(1-\hat{p})\), \(n_2\hat{p}\), \(n_2(1-\hat{p})\).

Independence

The two groups must be independent. If subjects are matched or paired, use the paired t-test instead.

◆ THE BLUEPRINT
Welch's Two-Sample t-Test

Tests whether two population means \(\mu_1\) and \(\mu_2\) are equal. Welch's version does not assume equal variances.

Formulas
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$
Welch Degrees of Freedom

The Welch approximation for degrees of freedom accounts for unequal variances and unequal sample sizes.

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
Welch vs. Pooled

The pooled t-test assumes \(\sigma_1 = \sigma_2\) and uses a single pooled variance estimate. Welch's test relaxes this assumption. Welch is the safer default and is what R's t.test() uses.

Validity Conditions

Each group needs approximate normality or a large enough sample. Tintle et al. use \(n \geq 20\) per group. The traditional CLT threshold is \(n \geq 30\) per group.

Tintle, N. et al. Introduction to Statistical Investigations (ISI).

Simulation Approach

Under H\(_0\) (both groups share a common mean), we simulate one sample from a normal distribution with mean 0 and standard deviation \(s_1\) and the other with mean 0 and standard deviation \(s_2\). For each pair of simulated samples, we compute the Welch t-statistic \(t^*\). The simulated p-value is the fraction of \(|t^*|\) values as extreme or more extreme than the observed \(|t|\).
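A sketch of the Welch statistic and its simulated null distribution; the group sizes and SDs are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def welch_t(x, y):
    """Welch t-statistic and Welch-Satterthwaite df for two samples."""
    v1, v2 = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (len(x) - 1) + v2**2 / (len(y) - 1))
    return t, df

# Illustrative group summaries (not from the text).
n1, s1 = 20, 3.0
n2, s2 = 25, 5.0

# Under H0 both groups share a mean (0 here, without loss of generality);
# the spreads keep their observed values s1 and s2.
t_star = np.array([
    welch_t(rng.normal(0, s1, n1), rng.normal(0, s2, n2))[0]
    for _ in range(5_000)
])
```

Simulating with mean 0 for both groups is fine because the t-statistic depends only on the difference in means, which is zero under H\(_0\).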

◆ THE BLUEPRINT
The Paired t-Test

Tests whether the mean difference \(\mu_d\) between paired observations equals zero. This reduces to a one-sample t-test on the differences.

Why Pairing Helps

Pairing removes between-subject variability. Instead of comparing two independent groups (each with its own variability), we analyze only the within-subject differences. This often reduces the standard error and increases power.

Formulas
$$SE = \frac{s_d}{\sqrt{n}} \qquad t = \frac{\bar{d}}{SE} \qquad df = n - 1$$

where \(\bar{d}\) is the mean of the differences and \(s_d\) is the standard deviation of the differences.

When to Use Paired vs. Two-Sample

Use the paired test when observations come in natural pairs: before/after measurements on the same subject, matched subjects, or repeated measures. Use the two-sample test when the groups are independent.

Validity Conditions

The differences need approximate normality or a large enough sample. Tintle et al. use \(n \geq 20\) pairs. The traditional CLT threshold is \(n \geq 30\).

Tintle, N. et al. Introduction to Statistical Investigations (ISI).

Simulation Approach

Under H\(_0\) (\(\mu_d = 0\)), we generate \(n\) differences from a normal distribution with mean 0 and standard deviation \(s_d\) for each simulation. We compute \(\bar{d}^*\), \(s_d^*\), and \(t^*\) for each. The simulated p-value is the fraction of \(|t^*|\) values as extreme or more extreme than the observed \(|t|\).
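Because the paired test is just a one-sample t-test on the differences, the simulation mirrors the one-sample version. A sketch, with illustrative summary numbers (not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative summary of the paired differences (not from the text).
n, dbar, s_d = 16, 1.5, 2.4

se = s_d / np.sqrt(n)
t_obs = dbar / se

# Under H0 (mu_d = 0), draw n differences from N(0, s_d) per simulation
# and recompute t* from each simulated sample's mean and SD.
sims = rng.normal(0.0, s_d, size=(10_000, n))
t_star = sims.mean(axis=1) / (sims.std(axis=1, ddof=1) / np.sqrt(n))

# Two-sided simulated p-value.
p_sim = np.mean(np.abs(t_star) >= np.abs(t_obs))
```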