4.2.1 Confidence Intervals for Difference in Means


Proof: Confidence interval for difference in means, unequal variances

Let

$X, Y$ : Random variables
$μ_{1}$ : Mean of $X$
$μ_{2}$ : Mean of $Y$
$Δ = μ_{1} - μ_{2}$
$σ_{1}^{2} = Var (X), σ_{2}^{2} = Var (Y)$ : Variances of $X$ and $Y$, assumed to exist

Let

$X_{1}, \dots, X_{n_{1}}$ : Random sample from $X$
$Y_{1}, \dots, Y_{n_{2}}$ : Random sample from $Y$
$\bar{X} = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} X_{i}$ : Sample mean of $X_{i}$
$\bar{Y} = \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} Y_{i}$ : Sample mean of $Y_{i}$

Assume the random samples $X_{i}$ and $Y_{i}$ are independent.

Then, exactly if $X$ and $Y$ are normal and approximately for large $n_{1}$ and $n_{2}$ by the Central Limit Theorem, $\bar{X} \sim N (μ_{1}, \frac{σ_{1}^{2}}{n_{1}})$ and $\bar{Y} \sim N (μ_{2}, \frac{σ_{2}^{2}}{n_{2}})$.

Let

$\hat{Δ} = \bar{X} - \bar{Y}$

Then $E (\hat{Δ}) = E (\bar{X} - \bar{Y}) = E (\bar{X}) - E (\bar{Y}) = μ_{1} - μ_{2}$, thus $\hat{Δ}$ is an unbiased estimator of $Δ$. Additionally, because the two samples are independent, $Var (\hat{Δ}) = Var (\bar{X} - \bar{Y}) = Var (\bar{X}) + Var (\bar{Y}) = \frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}$.
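The mean and variance of $\hat{Δ}$ derived above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the function name `simulate_delta_hat`, the normal populations, and the parameter values are all assumptions chosen for illustration.

```python
import random
from statistics import mean, variance


def simulate_delta_hat(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000, seed=0):
    """Draw `reps` independent replicates of delta_hat = X-bar - Y-bar.

    Assumption for this sketch: both populations are normal.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
        y_bar = sum(rng.gauss(mu2, sigma2) for _ in range(n2)) / n2
        draws.append(x_bar - y_bar)
    return draws


# Illustrative parameters (hypothetical, not from the notes).
draws = simulate_delta_hat(5.0, 2.0, 10, 3.0, 1.0, 15)
theory_mean = 5.0 - 3.0                    # mu_1 - mu_2
theory_var = 2.0**2 / 10 + 1.0**2 / 15     # sigma_1^2/n_1 + sigma_2^2/n_2
print("simulated mean:", mean(draws), "theory:", theory_mean)
print("simulated variance:", variance(draws), "theory:", theory_var)
```

The simulated values should land close to the theoretical ones, with the gap shrinking as `reps` grows.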

Let

$S_{1}^{2} = \frac{1}{n_{1} - 1} \sum_{i = 1}^{n_{1}} (X_{i} - \bar{X})^{2}$ : Sample variance of $X_{i}$
$S_{2}^{2} = \frac{1}{n_{2} - 1} \sum_{i = 1}^{n_{2}} (Y_{i} - \bar{Y})^{2}$ : Sample variance of $Y_{i}$

Then, approximately for large samples (the stated variances hold under normality), $S_{1}^{2} \sim N (σ_{1}^{2}, \frac{2 σ_{1}^{4}}{n_{1} - 1})$ and $S_{2}^{2} \sim N (σ_{2}^{2}, \frac{2 σ_{2}^{4}}{n_{2} - 1})$.

By independence of the random samples, we may obtain the distribution of $\hat{Δ}$ and then standardize it:

$$
\begin{aligned}
\bar{X} - \bar{Y} = \hat{Δ} &\sim N \left(μ_{1} - μ_{2}, \frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}\right) \\
⟺ \frac{\hat{Δ} - (μ_{1} - μ_{2})}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} &\sim N (0, 1) \\
⟺ \frac{\hat{Δ} - Δ}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} &\sim N (0, 1)
\end{aligned}
$$

Thus, as in Example 4.2.1, a $(1 - α) 100\%$ confidence interval for $Δ$ follows from

$$
\begin{aligned}
1 - α &= P \left(- z_{α /2} < \frac{\hat{Δ} - Δ}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} < z_{α /2}\right) \\
&= P \left(- z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} < \hat{Δ} - Δ < z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}\right) \\
&= P \left(- \hat{Δ} - z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} < - Δ < - \hat{Δ} + z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}\right) \\
&= P \left(\hat{Δ} - z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} < Δ < \hat{Δ} + z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}\right) \\
&= P \left(\hat{Δ} - z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} < μ_{1} - μ_{2} < \hat{Δ} + z_{α /2} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}\right)
\end{aligned}
$$

Estimating the unknown variances by the sample variances leads to the approximate $(1 - α) 100\%$ confidence interval for $Δ = μ_{1} - μ_{2}$ given by

$$
\left((\bar{x} - \bar{y}) - z_{α /2} \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}, \; (\bar{x} - \bar{y}) + z_{α /2} \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}\right)
$$
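To make the large-sample interval concrete, here is a minimal Python sketch using only the standard library. The function name `approx_diff_means_ci` and the data are hypothetical and not from the notes. The samples are kept short for readability, whereas the approximation itself is intended for large $n_{1}$ and $n_{2}$.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def approx_diff_means_ci(x, y, alpha=0.05):
    """Approximate (1 - alpha) CI for mu_1 - mu_2, unequal variances."""
    n1, n2 = len(x), len(y)
    delta_hat = mean(x) - mean(y)
    # statistics.variance divides by n - 1, matching S_1^2 and S_2^2.
    se = sqrt(variance(x) / n1 + variance(y) / n2)
    # Upper alpha/2 point of the standard normal, i.e. z_{alpha/2}.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return delta_hat - z * se, delta_hat + z * se


# Made-up illustrative data.
x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
y = [4.4, 4.8, 4.1, 4.9, 4.6, 4.3, 4.7]
low, high = approx_diff_means_ci(x, y)
print(f"approximate 95% CI for mu_1 - mu_2: ({low:.3f}, {high:.3f})")
```

The interval is symmetric about $\bar{x} - \bar{y}$, and its half-width is $z_{α /2}$ times the estimated standard error.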

Proof: Confidence interval for difference in means, equal variances

Continuing from the previous proof, assume $σ_{1}^{2} = σ_{2}^{2} = σ^{2}$. Then the distributions of $X$ and $Y$ can differ only in location, i.e., they follow a location model.

Assume $X \sim N (μ_{1}, σ^{2})$ and $Y \sim N (μ_{2}, σ^{2})$ .

Substituting $σ_{1}^{2} = σ_{2}^{2} = σ^{2}$ into the standardized statistic from the previous proof:

$$
\begin{aligned}
\frac{\hat{Δ} - Δ}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} &\sim N (0, 1) \\
\frac{(\bar{X} - \bar{Y}) - (μ_{1} - μ_{2})}{\sqrt{σ^{2} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}} &\sim N (0, 1) \\
\frac{(\bar{X} - \bar{Y}) - (μ_{1} - μ_{2})}{σ \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} &\sim N (0, 1)
\end{aligned}
\tag{1}
$$

Let $S_{p}^{2} = \frac{( n _{1} - 1 ) S _{1}^{2} + ( n _{2} - 1 ) S _{2}^{2}}{n _{1} + n _{2} - 2}$

Recall $E (S_{1}^{2}) = σ_{1}^{2}, E (S_{2}^{2}) = σ_{2}^{2}$, and $σ_{1}^{2} = σ_{2}^{2} = σ^{2}$. As a result,

$$
\begin{aligned}
E (S_{p}^{2}) &= E \left[\frac{(n_{1} - 1) S_{1}^{2} + (n_{2} - 1) S_{2}^{2}}{n_{1} + n_{2} - 2}\right] \\
&= \frac{E [(n_{1} - 1) S_{1}^{2} + (n_{2} - 1) S_{2}^{2}]}{n_{1} + n_{2} - 2} \\
&= \frac{E [(n_{1} - 1) S_{1}^{2}] + E [(n_{2} - 1) S_{2}^{2}]}{n_{1} + n_{2} - 2} \\
&= \frac{(n_{1} - 1) E (S_{1}^{2}) + (n_{2} - 1) E (S_{2}^{2})}{n_{1} + n_{2} - 2} \\
&= \frac{(n_{1} - 1) σ_{1}^{2} + (n_{2} - 1) σ_{2}^{2}}{n_{1} + n_{2} - 2} \\
&= \frac{σ^{2} [(n_{1} - 1) + (n_{2} - 1)]}{n_{1} + n_{2} - 2} \\
&= \frac{σ^{2} (n_{1} + n_{2} - 2)}{n_{1} + n_{2} - 2} \\
&= σ^{2}
\end{aligned}
$$

Therefore, $S_{p}^{2}$ is an unbiased estimator of $σ^{2}$ . We call it the pooled estimator of $σ^{2}$ .
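As a small numerical illustration of the pooled estimator, the sketch below computes $S_{p}^{2}$ for two short made-up samples. The function name `pooled_variance` and the data are assumptions for illustration only.

```python
from statistics import variance


def pooled_variance(x, y):
    """Pooled estimator S_p^2 of the common variance sigma^2."""
    n1, n2 = len(x), len(y)
    # Weight each sample variance by its degrees of freedom.
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


x = [2.0, 4.0, 6.0]            # s_1^2 = 4.0, n_1 = 3
y = [1.0, 2.0, 3.0, 4.0, 5.0]  # s_2^2 = 2.5, n_2 = 5
print(pooled_variance(x, y))   # (2 * 4.0 + 4 * 2.5) / 6 = 3.0
```

The result lies between the two sample variances and is pulled toward the one from the larger sample, since the weights are the degrees of freedom $n_{1} - 1$ and $n_{2} - 1$.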

By Normal Distribution Relationships, $(n_{1} - 1) S_{1}^{2} / σ^{2} \sim χ^{2} (n_{1} - 1)$ and $(n_{2} - 1) S_{2}^{2} / σ^{2} \sim χ^{2} (n_{2} - 1)$ . Also, $S_{1}^{2}$ and $S_{2}^{2}$ are independent. Therefore, by Corollary 3.3.1,

$$
\begin{aligned}
\frac{(n_{1} - 1) S_{1}^{2}}{σ^{2}} + \frac{(n_{2} - 1) S_{2}^{2}}{σ^{2}} &\sim χ^{2} (n_{1} - 1 + n_{2} - 1) \\
\frac{(n - 2) S_{p}^{2}}{σ^{2}} &\sim χ^{2} (n - 2)
\end{aligned}
$$

where $n = n_{1} + n_{2}$.

Finally, because $S_{1}^{2}$ and $S_{2}^{2}$ are independent of $\bar{X}$ and $\bar{Y}$ respectively, and the random samples are independent of each other, it follows that $S_{p}^{2}$ is independent of expression $(1)$.

Thus, by 3.6.1 The t-distribution, we may construct a random variable with a t-distribution:

$$
T = \frac{[(\bar{X} - \bar{Y}) - (μ_{1} - μ_{2})] / \left(σ \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}\right)}{\sqrt{(n - 2) S_{p}^{2} / [(n - 2) σ^{2}]}} = \frac{(\bar{X} - \bar{Y}) - (μ_{1} - μ_{2})}{S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} \sim t_{n - 2}
$$

Let $t_{(α /2, n - 2)}$ denote the upper $α /2$ critical value of the $t_{n - 2}$ distribution. The confidence interval may then be found:

$$
\begin{aligned}
1 - α &= P \left(- t_{(α /2, n - 2)} < \frac{(\bar{X} - \bar{Y}) - (μ_{1} - μ_{2})}{S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} < t_{(α /2, n - 2)}\right) \\
&= P \left(- t_{(α /2, n - 2)} S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}} < (\bar{X} - \bar{Y}) - (μ_{1} - μ_{2}) < t_{(α /2, n - 2)} S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}\right) \\
&= P \left((\bar{X} - \bar{Y}) - t_{(α /2, n - 2)} S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}} < μ_{1} - μ_{2} < (\bar{X} - \bar{Y}) + t_{(α /2, n - 2)} S_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}\right)
\end{aligned}
$$

From the last result, we can see that the following interval is an exact $(1 - α) 100\%$ confidence interval for $Δ = μ_{1} - μ_{2}$:

$$
\left((\bar{x} - \bar{y}) - t_{(α /2, n - 2)} s_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}, \; (\bar{x} - \bar{y}) + t_{(α /2, n - 2)} s_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}\right)
$$
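The pooled interval can be sketched in Python with the standard library alone, provided the critical value is supplied by the caller. The function name `pooled_t_interval` and the data are hypothetical. The value 2.306 is the tabulated $t_{(0.025, 8)}$. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` should give the same quantity, though that dependency is not used here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_interval(x, y, t_crit):
    """CI for mu_1 - mu_2 assuming normal populations with equal variances.

    t_crit is the upper alpha/2 critical value t_{(alpha/2, n - 2)} with
    n = len(x) + len(y); the caller supplies it, e.g. from a t table.
    """
    n1, n2 = len(x), len(y)
    # Pooled variance S_p^2, weighting by degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    delta_hat = mean(x) - mean(y)
    return delta_hat - half_width, delta_hat + half_width


# Made-up data: n = 10, so n - 2 = 8 degrees of freedom.
x = [10.0, 12.0, 14.0, 16.0, 18.0]
y = [9.0, 10.0, 11.0, 12.0, 13.0]
T_CRIT = 2.306  # t_{(0.025, 8)} from a standard t table
low, high = pooled_t_interval(x, y, T_CRIT)
print(f"95% CI for mu_1 - mu_2: ({low:.3f}, {high:.3f})")
```

Passing the critical value in explicitly keeps the sketch dependency-free and makes the role of the $n - 2$ degrees of freedom visible at the call site.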

FAZuH's Notes
