THE FORGE • Confidence Intervals

What is a Critical Value?

The critical value is the number of standard errors from the center of the sampling distribution needed to capture the desired confidence level. It is the multiplier $M$ in the formula:

$$\text{CI} = \text{Estimate} \pm \underbrace{M}_{\text{critical value}} \times SE$$

Why $\alpha/2$?

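A two-sided confidence interval splits the remaining probability equally between the two tails. If the confidence level is 95%, then $\alpha = 0.05$ lies outside the interval: 0.025 in each tail. The upper critical value is therefore the $1 - \alpha/2$ quantile, which is why R's quantile functions qnorm() and qt() are called with $1 - \alpha/2$ (0.975 for a 95% interval) rather than with the confidence level itself.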
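The tail-splitting argument can be checked numerically. The sketch below uses Python's standard library in place of R's qnorm(); the helper name `z_critical` is made up for illustration and is not part of the app.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided z* for a confidence level given as a fraction (e.g. 0.95)."""
    alpha = 1.0 - confidence
    # alpha is split evenly between the tails, so the upper cut-off
    # is the 1 - alpha/2 quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```

Passing the confidence level itself (0.95) to the quantile function would return 1.645, the cut-off for a one-sided 95% bound, not the two-sided 1.960.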

Z vs. t

Use $z^*$ for proportions, where the SE formula does not involve $\sigma$. Use $t^*$ when estimating a mean with an unknown $\sigma$. The t-distribution has heavier tails, so at any finite df $t^*$ is larger than $z^*$ for the same confidence level.

When does t approach z?

As degrees of freedom increase, the t-distribution approaches the standard normal. At 95% confidence, $t^*$ is 2.042 at df = 30 and 1.980 at df = 120, against $z^* = 1.960$. The common z critical values (1.645, 1.960, 2.576 for 90%, 95%, and 99% confidence) are the limits of $t^*$ as df approaches infinity.
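To watch $t^*$ shrink toward $z^*$ without a statistics package, one option is to integrate the t density numerically and invert it by bisection. This is a rough standard-library sketch, not how R's qt() works internally; `t_pdf`, `t_cdf`, and `t_critical` are illustrative names.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the lower half (0.5)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: float) -> float:
    """Two-sided t*: the 1 - alpha/2 quantile, located by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)
for df in (5, 30, 120, 1000):
    print(f"df = {df:>4}: t* = {t_critical(0.95, df):.3f}  (z* = {z_star:.3f})")
```

The printed values should agree with a standard t table (about 2.571, 2.042, 1.980, and 1.962), each one closer to 1.960 than the last.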

One-Proportion Z Interval

Estimates a population proportion $\pi$ using the observed sample proportion $\hat{p}$.

Formula

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Why use $\hat{p}$ in the SE?

Unlike hypothesis testing (which uses $\pi_0$ in the standard error), confidence intervals have no null value. We estimate the SE using the observed $\hat{p}$.

Validity Conditions

The normal approximation requires both $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$.
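Putting the formula and the validity check together, a minimal sketch might look like the following. The function name `one_proportion_ci` and the counts (60 successes in 100 trials) are invented for illustration.

```python
import math
from statistics import NormalDist


def one_proportion_ci(k: int, n: int, confidence: float = 0.95):
    """Interval p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = k / n
    # Validity condition: at least 10 expected successes and 10 failures.
    if min(n * p_hat, n * (1 - p_hat)) < 10:
        raise ValueError("normal approximation condition not met")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p_hat, not a null value
    return p_hat - z_star * se, p_hat + z_star * se


low, high = one_proportion_ci(k=60, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
# 95% CI: (0.504, 0.696)
```

Raising an error when the condition fails is one design choice; a calculator could equally show a warning and still report the interval.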

Interpreting the Interval

We are $C\%$ confident that the true population proportion lies within the interval. This means that if we repeated the sampling process many times, about $C\%$ of the resulting intervals would contain the true parameter.

Coverage Simulation

Each simulation draws $k^* \sim \text{Binomial}(n, \hat{p})$ and computes a new CI from the simulated proportion $k^*/n$. The coverage rate is the fraction of simulated intervals that contain the true value (set to the observed $\hat{p}$).
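The simulation described here can be sketched in a few lines. This is an approximation of the idea, not the app's actual code; `proportion_coverage`, the seed, and the number of simulations are assumptions.

```python
import math
import random
from statistics import NormalDist


def proportion_coverage(k: int, n: int, confidence: float = 0.95,
                        sims: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated z-intervals that contain the 'true' proportion k/n."""
    rng = random.Random(seed)
    p_true = k / n  # the observed p_hat plays the role of the truth
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(sims):
        # One Binomial(n, p_true) draw, built from n Bernoulli trials.
        k_sim = sum(rng.random() < p_true for _ in range(n))
        p_sim = k_sim / n
        se = math.sqrt(p_sim * (1 - p_sim) / n)
        if p_sim - z_star * se <= p_true <= p_sim + z_star * se:
            hits += 1
    return hits / sims


rate = proportion_coverage(k=60, n=100)
print(f"coverage rate: {rate:.3f}")
```

With the validity conditions met, the rate should land near the nominal 0.95. It will not match exactly, both because of simulation noise and because the normal approximation is only approximate for a discrete count.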

Duality with Hypothesis Testing

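A $C\%$ confidence interval contains the values of $\pi_0$ that would not be rejected by a two-sided test at the $\alpha = 1 - C/100$ significance level. For proportions the match is approximate, because the test computes its standard error from $\pi_0$ while the interval uses $\hat{p}$.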

One-Sample t Interval

Estimates a population mean $\mu$ when the population standard deviation is unknown.

Formula

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} \qquad df = n - 1$$

Why t instead of z?

When $\sigma$ is unknown and estimated by $s$, the extra uncertainty is captured by the t-distribution. The t-distribution has heavier tails than the normal, so the interval is wider. As $n$ grows, $t^*$ converges to $z^*$.
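A small worked example shows how much wider the t-interval is. The summary statistics ($n = 25$, $\bar{x} = 50$, $s = 10$) are made up, and $t^* = 2.064$ is the standard table value for df = 24 at 95% confidence, passed in by hand so the sketch needs no statistics package.

```python
import math


def one_sample_t_ci(x_bar: float, s: float, n: int, t_star: float):
    """Interval x_bar +/- t* * s / sqrt(n); t_star must match df = n - 1."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics; t* = 2.064 is the table value for df = 24.
t_low, t_high = one_sample_t_ci(x_bar=50.0, s=10.0, n=25, t_star=2.064)
# The same data with z* = 1.960, for comparison only.
z_low, z_high = one_sample_t_ci(x_bar=50.0, s=10.0, n=25, t_star=1.960)

print(f"t interval: ({t_low:.3f}, {t_high:.3f})")
print(f"z interval: ({z_low:.3f}, {z_high:.3f})")
# t interval: (45.872, 54.128)
# z interval: (46.080, 53.920)
```

The extra width (a margin of 4.128 instead of 3.920) is the price of estimating $\sigma$ with $s$ from only 25 observations.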

Validity Conditions

The t-interval requires approximate normality or a large enough sample. Tintle et al. use $n \geq 20$. The traditional CLT threshold is $n \geq 30$.

Coverage Simulation

Each simulation draws $n$ observations from a normal distribution with mean $\bar{x}$ and standard deviation $s$, computes the sample mean and sample SD, and constructs a new t-interval. The coverage rate is the fraction of intervals that contain the true mean (set to the observed $\bar{x}$).
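One way to sketch this simulation with the standard library is shown below. The function name `t_coverage`, the seed, and the inputs are illustrative assumptions; $t^* = 2.064$ is again the table value for df = 24.

```python
import math
import random
from statistics import mean, stdev


def t_coverage(x_bar: float, s: float, n: int, t_star: float,
               sims: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated t-intervals that contain the 'true' mean x_bar."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        sample = [rng.gauss(x_bar, s) for _ in range(n)]
        m, sd = mean(sample), stdev(sample)  # each sample gets its own mean and SD
        margin = t_star * sd / math.sqrt(n)
        if m - margin <= x_bar <= m + margin:
            hits += 1
    return hits / sims


rate = t_coverage(x_bar=50.0, s=10.0, n=25, t_star=2.064)
print(f"coverage rate: {rate:.3f}")
```

Because the simulated data are exactly normal, the rate should sit close to 0.95 up to simulation noise. Replacing 2.064 with 1.960 would be expected to pull it slightly below the nominal level.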

Duality with Hypothesis Testing

A $C\%$ confidence interval contains all values of $\mu_0$ that would not be rejected by a two-sided t-test at the $\alpha = 1 - C/100$ level.

Two-Proportion Z Interval

Estimates the difference $\pi_1 - \pi_2$ between two population proportions.

Formula

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

No Pooling for CI

Unlike the two-proportion hypothesis test (which pools under H$_0$: $\pi_1 = \pi_2$), the confidence interval uses each group's own $\hat{p}$ in the standard error. There is no null hypothesis to assume equal proportions.

Validity Conditions

All four of the following must be $\geq 10$: $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1-\hat{p}_2)$.
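The unpooled standard error and the four-count check can be sketched together as follows. The name `two_proportion_ci` and the counts (60/100 versus 45/100) are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(k1: int, n1: int, k2: int, n2: int,
                      confidence: float = 0.95):
    """Interval for pi_1 - pi_2 using each group's own p_hat (no pooling)."""
    p1, p2 = k1 / n1, k2 / n2
    # All four success/failure counts must reach 10.
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    if min(counts) < 10:
        raise ValueError("normal approximation condition not met")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


low, high = two_proportion_ci(k1=60, n1=100, k2=45, n2=100)
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
# 95% CI for the difference: (0.013, 0.287)
```

In this made-up example the interval lies entirely above 0, so the data favor a positive difference at the 95% level.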

Contains Zero?

If the interval contains 0, the two proportions do not differ significantly at the corresponding level $\alpha = 1 - C/100$. This corresponds to failing to reject H$_0$: $\pi_1 = \pi_2$ in a two-sided test, although the match is approximate because the test pools the two samples and the interval does not.

Coverage Simulation

Each simulation draws $k_1^* \sim \text{Binomial}(n_1, \hat{p}_1)$ and $k_2^* \sim \text{Binomial}(n_2, \hat{p}_2)$, then constructs a CI for the difference. The coverage rate is the fraction of intervals that contain the true difference (set to the observed $\hat{p}_1 - \hat{p}_2$).

Welch Two-Sample t Interval

Estimates the difference $\mu_1 - \mu_2$ between two population means. Welch's version does not assume equal variances.

Formula

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Welch Degrees of Freedom

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
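The df formula is easier to trust after plugging in numbers. The sketch below evaluates it along with the Welch standard error; `welch_df`, `welch_se`, and the inputs are illustrative.

```python
import math


def welch_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom (usually not an integer)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Equal SDs and equal sample sizes: df equals the pooled value n1 + n2 - 2.
print(f"{welch_df(10, 25, 10, 25):.2f}")
# 48.00

# Very unequal SDs: df falls toward the smaller group's n - 1.
print(f"{welch_df(5, 10, 15, 10):.2f}")
# 10.98
```

The result always lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$, so unequal variances cost degrees of freedom and widen the interval.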

Welch vs. Pooled

The pooled t-interval assumes $\sigma_1 = \sigma_2$. Welch's interval relaxes this assumption and is the safer default. R's t.test() uses Welch by default (var.equal = FALSE).

Validity Conditions

Each group needs approximate normality or a large enough sample. Tintle et al. use $n \geq 20$ per group. The traditional CLT threshold is $n \geq 30$ per group.

Coverage Simulation

Each simulation draws two independent samples from N($\bar{x}_1$, $s_1$) and N($\bar{x}_2$, $s_2$). For each pair, we compute the Welch t-interval with its own df. The coverage rate is the fraction of intervals that contain the true difference.

Contains Zero?

If the interval contains 0, there is no significant difference between the means at the given confidence level.

Paired t Interval

Estimates the mean difference $\mu_d$ between paired observations. This reduces to a one-sample t-interval on the differences.

Formula

$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}} \qquad df = n - 1$$

where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.

Why Pairing Helps

Pairing removes between-subject variability. Instead of comparing two independent groups, we analyze within-subject differences. This often reduces the standard error and produces a narrower interval.
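A toy before/after data set illustrates the effect. The measurements are invented, `paired_t_ci` is an illustrative name, and $t^* = 2.262$ is the table value for df = 9 at 95% confidence.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical before/after measurements on the same 10 subjects.
before = [72, 80, 65, 90, 77, 84, 69, 75, 88, 71]
after = [70, 77, 64, 86, 75, 80, 68, 72, 85, 69]


def paired_t_ci(x, y, t_star):
    """One-sample t-interval on the differences x_i - y_i; also returns the SE."""
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    d_bar = mean(diffs)
    return d_bar - t_star * se, d_bar + t_star * se, se


low, high, se_paired = paired_t_ci(before, after, t_star=2.262)
# SE if the two columns were wrongly treated as independent groups.
se_unpaired = math.sqrt(variance(before) / len(before)
                        + variance(after) / len(after))

print(f"paired 95% CI: ({low:.3f}, {high:.3f})")
# paired 95% CI: (1.727, 3.273)
print(f"SE paired = {se_paired:.3f}, SE unpaired = {se_unpaired:.3f}")
```

Subjects differ a lot from each other but change by similar amounts, so the SE of the differences is far smaller than the SE that ignores the pairing.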

When to Use Paired vs. Two-Sample

Use the paired interval when observations come in natural pairs: before/after measurements on the same subject, matched subjects, or repeated measures. Use the two-sample interval when the groups are independent.

Validity Conditions

The differences need approximate normality or a large enough sample. Tintle et al. use $n \geq 20$ pairs. The traditional CLT threshold is $n \geq 30$.

Contains Zero?

If the interval contains 0, there is no significant mean difference at the given confidence level. This is equivalent to failing to reject H$_0$: $\mu_d = 0$ in a two-sided paired t-test.

Coverage Simulation

Each simulated sample draws $n$ differences from N($\bar{d}$, $s_d$) and constructs a new t-interval. The coverage rate is the fraction of intervals that contain the true mean difference.
