Proof: Confidence interval for difference in means, unequal variances
Let
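$X_1, \dots, X_{n_1}$: random sample from $X$, where $E(X) = \mu_1$ and $\operatorname{Var}(X) = \sigma_1^2$
$Y_1, \dots, Y_{n_2}$: random sample from $Y$, where $E(Y) = \mu_2$ and $\operatorname{Var}(Y) = \sigma_2^2$
$\bar X = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i$: sample mean of the $X_i$
$\bar Y = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_i$: sample mean of the $Y_i$
Assume the two random samples are independent of each other.
Then $\bar X \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$ and $\bar Y \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$, exactly when $X$ and $Y$ are normal and approximately for large $n_1, n_2$ by the Central Limit Theorem.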
Let $\Delta = \mu_1 - \mu_2$ and $\hat\Delta = \bar X - \bar Y$.
Then $E(\hat\Delta) = E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = \mu_1 - \mu_2 = \Delta$, so $\hat\Delta$ is an unbiased estimator of $\Delta$.
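Additionally, since the samples are independent, $\operatorname{Var}(\hat\Delta) = \operatorname{Var}(\bar X - \bar Y) = \operatorname{Var}(\bar X) + \operatorname{Var}(\bar Y) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.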
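A quick Monte Carlo check makes the two moment calculations above concrete. The Python sketch below assumes, purely for illustration, normal populations with hypothetical parameters $\mu_1 = 5$, $\sigma_1 = 2$, $n_1 = 10$ and $\mu_2 = 3$, $\sigma_2 = 1$, $n_2 = 20$; the helper name `simulate_delta_hat` is also hypothetical. The simulated mean and variance of $\hat\Delta$ should land near $\mu_1 - \mu_2 = 2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2 = 0.4 + 0.05 = 0.45$.

```python
import random
import statistics


def simulate_delta_hat(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Return `reps` independent draws of Delta-hat = Xbar - Ybar.

    Assumption (illustration only): X ~ N(mu1, sigma1^2), Y ~ N(mu2, sigma2^2).
    The two samples in each replication are generated independently.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu1, sigma1) for _ in range(n1)])
        ybar = statistics.fmean([rng.gauss(mu2, sigma2) for _ in range(n2)])
        draws.append(xbar - ybar)
    return draws


# Hypothetical parameters chosen only for this check.
mu1, sigma1, n1 = 5.0, 2.0, 10
mu2, sigma2, n2 = 3.0, 1.0, 20
draws = simulate_delta_hat(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000)

print("mean:", statistics.fmean(draws), "theory:", mu1 - mu2)
print("var: ", statistics.variance(draws),
      "theory:", sigma1**2 / n1 + sigma2**2 / n2)
```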
Let
$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (X_i - \bar X)^2$: sample variance of the $X_i$
$S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2} (Y_i - \bar Y)^2$: sample variance of the $Y_i$
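Then $E(S_1^2) = \sigma_1^2$ and $E(S_2^2) = \sigma_2^2$, and both are consistent estimators of their variances. If the populations are normal, $\operatorname{Var}(S_1^2) = \frac{2\sigma_1^4}{n_1 - 1}$ and $\operatorname{Var}(S_2^2) = \frac{2\sigma_2^4}{n_2 - 1}$, so for large samples $S_1^2$ is approximately $N\left(\sigma_1^2, \frac{2\sigma_1^4}{n_1 - 1}\right)$ and $S_2^2$ is approximately $N\left(\sigma_2^2, \frac{2\sigma_2^4}{n_2 - 1}\right)$.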
Because the samples are independent, $\hat\Delta$ is (approximately) normal with the mean and variance found above, so we may standardize it:
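$$\hat\Delta = \bar X - \bar Y \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \iff \frac{\hat\Delta - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1) \iff \frac{\hat\Delta - \Delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$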
Thus, as in Example 4.2.1, with $z_{\alpha/2}$ the upper $\alpha/2$ critical point of $N(0, 1)$, a $(1 - \alpha)100\%$ confidence interval for $\Delta$ follows from
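$$\begin{aligned}
1 - \alpha &= P\left(-z_{\alpha/2} < \frac{\hat\Delta - \Delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} < z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \hat\Delta - \Delta < z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \\
&= P\left(-\hat\Delta - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < -\Delta < -\hat\Delta + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \\
&= P\left(\hat\Delta - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \Delta < \hat\Delta + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \\
&= P\left(\hat\Delta - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \hat\Delta + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)
\end{aligned}$$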
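To see what the $1 - \alpha$ in this derivation means in practice, the sketch below repeatedly draws pairs of normal samples with known $\sigma_1, \sigma_2$, builds $\hat\Delta \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ each time, and counts how often the interval covers the true $\Delta$. The parameter values and function names are hypothetical; $z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf` in the Python standard library.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval_known_var(xs, ys, sigma1, sigma2, alpha=0.05):
    """(1 - alpha) interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical point
    delta_hat = fmean(xs) - fmean(ys)
    se = math.sqrt(sigma1**2 / len(xs) + sigma2**2 / len(ys))
    return delta_hat - z * se, delta_hat + z * se


def coverage(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000, alpha=0.05, seed=1):
    """Fraction of simulated intervals that contain the true Delta = mu1 - mu2."""
    rng = random.Random(seed)
    delta = mu1 - mu2
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        ys = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        lo, hi = z_interval_known_var(xs, ys, sigma1, sigma2, alpha)
        if lo < delta < hi:
            hits += 1
    return hits / reps


# Hypothetical setting; the result should be close to 1 - alpha = 0.95.
print(coverage(5.0, 2.0, 10, 3.0, 1.0, 20))
```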
Replacing the unknown variances $\sigma_1^2, \sigma_2^2$ by the sample variances $s_1^2, s_2^2$ leads to the approximate $(1 - \alpha)100\%$ confidence interval for $\Delta = \mu_1 - \mu_2$ given by
$$\left((\bar x - \bar y) - z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ (\bar x - \bar y) + z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right)$$
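A minimal Python sketch of this approximate interval, assuming the two samples are given as lists of numbers. The function name and the data are hypothetical; `statistics.variance` uses the $n - 1$ divisor, so it returns $s_1^2$ and $s_2^2$ as defined above.

```python
import math
from statistics import NormalDist, fmean, variance


def approx_diff_means_ci(xs, ys, alpha=0.05):
    """Large-sample (1 - alpha) CI for mu1 - mu2, unequal variances allowed.

    The unknown sigma_1^2, sigma_2^2 are replaced by the sample variances,
    so the coverage is only approximately 1 - alpha.
    """
    n1, n2 = len(xs), len(ys)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical point
    diff = fmean(xs) - fmean(ys)
    se = math.sqrt(variance(xs) / n1 + variance(ys) / n2)
    return diff - z * se, diff + z * se


# Hypothetical data, used only to show the call.
x = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.2]
y = [4.2, 3.9, 4.8, 4.4, 4.0, 4.6]
print(approx_diff_means_ci(x, y))
```

Because estimates replace $\sigma_1^2, \sigma_2^2$ while the normal critical point is kept, the interval is most trustworthy when both $n_1$ and $n_2$ are large.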
Proof: Confidence interval for difference in means, equal variances
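Continuing from the previous proof, assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and assume further that $X \sim N(\mu_1, \sigma^2)$ and $Y \sim N(\mu_2, \sigma^2)$. The two distributions can then differ only in location, i.e., this is a location model.
Substituting $\sigma_1^2 = \sigma_2^2 = \sigma^2$ into the standardized statistic of the previous proof gives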
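$$\frac{\hat\Delta - \Delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1) \tag{1}$$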
Let
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Recall that $E(S_1^2) = \sigma_1^2$ and $E(S_2^2) = \sigma_2^2$, and here $\sigma_1^2 = \sigma_2^2 = \sigma^2$. As a result,
$$\begin{aligned}
E(S_p^2) &= E\left[\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right] \\
&= \frac{E\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right]}{n_1 + n_2 - 2} \\
&= \frac{E\left[(n_1 - 1)S_1^2\right] + E\left[(n_2 - 1)S_2^2\right]}{n_1 + n_2 - 2} \\
&= \frac{(n_1 - 1)E(S_1^2) + (n_2 - 1)E(S_2^2)}{n_1 + n_2 - 2} \\
&= \frac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2} \\
&= \frac{\sigma^2\left[(n_1 - 1) + (n_2 - 1)\right]}{n_1 + n_2 - 2} \\
&= \frac{\sigma^2(n_1 + n_2 - 2)}{n_1 + n_2 - 2} = \sigma^2
\end{aligned}$$
Therefore, $S_p^2$ is an unbiased estimator of $\sigma^2$. We call it the pooled estimator of $\sigma^2$.
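The pooled estimator is a one-line computation. The sketch below (the function name is hypothetical) can be checked by hand on a tiny example: for $x = (1, 3)$ we have $s_1^2 = 2$, for $y = (2, 4, 6)$ we have $s_2^2 = 4$, so $s_p^2 = (1 \cdot 2 + 2 \cdot 4)/(2 + 3 - 2) = 10/3$.

```python
from statistics import variance


def pooled_variance(xs, ys):
    """Pooled estimator S_p^2 = [(n1 - 1) S1^2 + (n2 - 1) S2^2] / (n1 + n2 - 2).

    statistics.variance uses the n - 1 divisor, i.e. it returns S1^2 and S2^2.
    """
    n1, n2 = len(xs), len(ys)
    return ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)


print(pooled_variance([1, 3], [2, 4, 6]))  # 10/3, about 3.3333
```

The weights $n_j - 1$ let the larger sample contribute more to $s_p^2$; when $n_1 = n_2$ the pooled estimate is simply the average of $s_1^2$ and $s_2^2$.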
By Normal Distribution Relationships, $(n_1 - 1)S_1^2/\sigma^2 \sim \chi^2(n_1 - 1)$ and $(n_2 - 1)S_2^2/\sigma^2 \sim \chi^2(n_2 - 1)$. Also, $S_1^2$ and $S_2^2$ are independent, because they are computed from independent samples. Therefore, by Corollary 3.3.1,
$$\frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2} = \frac{(n - 2)S_p^2}{\sigma^2} \sim \chi^2\big((n_1 - 1) + (n_2 - 1)\big) = \chi^2(n - 2)$$
where $n = n_1 + n_2$.
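The chi-square claim can be checked by simulation. In the sketch below, the parameters $\sigma = 2$, $n_1 = 5$, $n_2 = 8$ and the two different means are hypothetical (the means differ to stress that only the common spread matters). Each draw is the within-sample sum of squares divided by $\sigma^2$, which equals $(n - 2)S_p^2/\sigma^2$. A $\chi^2(11)$ variable has mean 11 and variance 22, so the simulated moments should be close to those values.

```python
import random
from statistics import fmean, variance


def pooled_chi2_draw(rng, mu1, mu2, sigma, n1, n2):
    """One draw of W = (n - 2) S_p^2 / sigma^2 with n = n1 + n2.

    Assumes X ~ N(mu1, sigma^2) and Y ~ N(mu2, sigma^2) with a common sigma.
    """
    xs = [rng.gauss(mu1, sigma) for _ in range(n1)]
    ys = [rng.gauss(mu2, sigma) for _ in range(n2)]
    xbar, ybar = fmean(xs), fmean(ys)
    # (n1 - 1) S1^2 + (n2 - 1) S2^2 is the within-sample sum of squares.
    ss = sum((v - xbar) ** 2 for v in xs) + sum((v - ybar) ** 2 for v in ys)
    return ss / sigma**2


rng = random.Random(3)
n1, n2 = 5, 8  # hypothetical sample sizes, so n - 2 = 11
draws = [pooled_chi2_draw(rng, 0.0, 10.0, 2.0, n1, n2) for _ in range(20000)]
print(fmean(draws), variance(draws))  # roughly 11 and 22 for chi^2(11)
```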
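Finally, $S_1^2$ is independent of $\bar X$ and $S_2^2$ is independent of $\bar Y$ (for a normal sample, the sample mean and sample variance are independent), and the two random samples are independent of each other. Hence $S_p^2$ is independent of $\bar X - \bar Y$, and therefore of expression (1).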
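Thus, by 3.6.1 The t-distribution, dividing the standard normal variable (1) by the square root of the independent $\chi^2(n - 2)$ variable over its degrees of freedom gives a random variable with a t-distribution: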
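$$T = \frac{\left[(\bar X - \bar Y) - (\mu_1 - \mu_2)\right] \Big/ \left(\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)}{\sqrt{\frac{(n - 2)S_p^2}{\sigma^2} \Big/ (n - 2)}} = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n - 2)$$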
The confidence interval may then be found:
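$$\begin{aligned}
1 - \alpha &= P\left(-t_{\alpha/2,\,n-2} < \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < t_{\alpha/2,\,n-2}\right) \\
&= P\left(-t_{\alpha/2,\,n-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < (\bar X - \bar Y) - (\mu_1 - \mu_2) < t_{\alpha/2,\,n-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) \\
&= P\left((\bar X - \bar Y) - t_{\alpha/2,\,n-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar X - \bar Y) + t_{\alpha/2,\,n-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)
\end{aligned}$$
where $t_{\alpha/2,\,n-2}$ is the upper $\alpha/2$ critical point of the $t(n - 2)$ distribution.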
Substituting the observed values $\bar x$, $\bar y$, $s_p$ into the last result gives the following exact $(1 - \alpha)100\%$ confidence interval for $\Delta = \mu_1 - \mu_2$:
$$\left((\bar x - \bar y) - t_{\alpha/2,\,n-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar x - \bar y) + t_{\alpha/2,\,n-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$$
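A sketch of the exact interval in Python, under the same equal-variance normal assumptions. The Python standard library has no Student t quantile function, so this hypothetical helper takes the critical point $t_{\alpha/2,\,n-2}$ as an argument; it can be read from a t-table or, if SciPy is available, computed with `scipy.stats.t.ppf(1 - alpha/2, n - 2)`. The data are made up for illustration.

```python
import math
from statistics import fmean, variance


def pooled_t_ci(xs, ys, t_crit):
    """Exact (1 - alpha) CI for mu1 - mu2 assuming equal normal variances.

    t_crit must be the upper alpha/2 critical point t_{alpha/2, n-2},
    where n = len(xs) + len(ys).
    """
    n1, n2 = len(xs), len(ys)
    # Pooled variance s_p^2 with weights n1 - 1 and n2 - 1.
    sp2 = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = fmean(xs) - fmean(ys)
    return diff - half_width, diff + half_width


# Hypothetical data with n1 = n2 = 6, so n - 2 = 10 degrees of freedom.
x = [10.2, 9.8, 11.1, 10.5, 10.9, 9.7]
y = [9.1, 8.7, 9.9, 9.4, 8.8, 9.5]
print(pooled_t_ci(x, y, t_crit=2.228))  # t_{0.025, 10} is about 2.228
```

With $t_{0.025,\,10} \approx 2.228$ the call above gives a 95% interval. Compared with the large-sample interval of the previous proof, the t critical point is larger than $z_{\alpha/2}$, which widens the interval to account for estimating $\sigma^2$ from the data.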