Proof: Confidence interval for difference in proportions
Let
$X \sim \operatorname{Binomial}(n_1, p_1)$: binomial distribution with parameters $n_1$ and $p_1$
$Y \sim \operatorname{Binomial}(n_2, p_2)$: binomial distribution with parameters $n_2$ and $p_2$
$\Delta = p_1 - p_2$: difference in proportions
Assume $X$ and $Y$ are independent.
Let
$\hat{p}_1 = \dfrac{X}{n_1}$: sample proportion for $X$
$\hat{p}_2 = \dfrac{Y}{n_2}$: sample proportion for $Y$
$\hat{\Delta} = \hat{p}_1 - \hat{p}_2$: sample difference in proportions
Since $E(X) = n_1 p_1$ and $E(Y) = n_2 p_2$, linearity of expectation gives $E(\hat{p}_1) = p_1$ and $E(\hat{p}_2) = p_2$, so
$$E(\hat{\Delta}) = E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2 = \Delta$$
Thus $\hat{\Delta}$ is an unbiased estimator of $\Delta$.
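As a sanity check on the unbiasedness argument, the expectation of each sample proportion can be computed exactly by summing over the binomial support. The sketch below uses Python's `fractions.Fraction` so the comparison is exact rather than floating point; the parameter values and the helper names `binomial_pmf` and `expected_sample_proportion` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = k) for X ~ Binomial(n, p), using rational arithmetic."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_sample_proportion(n: int, p: Fraction) -> Fraction:
    """Exact E(X / n), obtained by summing (k / n) * P(X = k) over k = 0..n."""
    return sum(Fraction(k, n) * binomial_pmf(k, n, p) for k in range(n + 1))


# Hypothetical parameter values chosen only for this check.
n1, p1 = 12, Fraction(3, 10)
n2, p2 = 8, Fraction(1, 2)

expected_delta_hat = expected_sample_proportion(n1, p1) - expected_sample_proportion(n2, p2)
print(expected_delta_hat)  # -1/5, i.e. exactly p1 - p2
```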
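Additionally, since $\operatorname{Var}(X) = n_1 p_1(1 - p_1)$, $\operatorname{Var}(Y) = n_2 p_2(1 - p_2)$, and $\operatorname{Var}(X/n) = \operatorname{Var}(X)/n^2$,
$$\operatorname{Var}(\hat{p}_1) = \frac{p_1(1 - p_1)}{n_1} \quad\text{and}\quad \operatorname{Var}(\hat{p}_2) = \frac{p_2(1 - p_2)}{n_2}$$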
By independence,
$$\operatorname{Var}(\hat{\Delta}) = \operatorname{Var}(\hat{p}_1 - \hat{p}_2) = \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$
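The variance formula can be verified exactly for small $n_1$ and $n_2$: under independence the joint pmf of $(X, Y)$ is the product of the two binomial pmfs, so $\operatorname{Var}(\hat{\Delta})$ can be computed by enumerating every pair of outcomes. This is a minimal sketch with hypothetical parameter values and a hypothetical helper `variance_of_difference`, using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def variance_of_difference(n1: int, p1: Fraction, n2: int, p2: Fraction) -> Fraction:
    """Exact Var(X/n1 - Y/n2) computed from the joint pmf of independent X and Y."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            # Independence: the joint pmf factors into the product of the marginals.
            weight = binomial_pmf(x, n1, p1) * binomial_pmf(y, n2, p2)
            d = Fraction(x, n1) - Fraction(y, n2)
            mean += d * weight
            second_moment += d * d * weight
    # Var(D) = E(D^2) - E(D)^2
    return second_moment - mean**2


# Hypothetical parameter values chosen only for this check.
n1, p1 = 12, Fraction(3, 10)
n2, p2 = 8, Fraction(1, 2)
formula = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
print(variance_of_difference(n1, p1, n2, p2) == formula)  # True
```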
For sufficiently large $n_1$ and $n_2$, by the Central Limit Theorem, $\hat{p}_1$ and $\hat{p}_2$ are approximately normally distributed:
$$\hat{p}_1 \sim N\!\left(p_1, \frac{p_1(1 - p_1)}{n_1}\right) \quad\text{and}\quad \hat{p}_2 \sim N\!\left(p_2, \frac{p_2(1 - p_2)}{n_2}\right)$$
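Since $\hat{p}_1$ and $\hat{p}_2$ are independent, their difference is also approximately normal, with the mean and variance computed above:
$$\hat{\Delta} = \hat{p}_1 - \hat{p}_2 \sim N\!\left(p_1 - p_2, \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$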
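The normal approximation can be illustrated with a small Monte Carlo simulation: draw many values of $\hat{\Delta}$ and check what share falls within $1.96$ standard deviations of $\Delta$, which for a normal distribution would be about 95%. This is a sketch under assumed values ($n_1 = n_2 = 200$, $p_1 = 0.3$, $p_2 = 0.5$) using only Python's standard `random` module; `simulate_delta_hat` is a hypothetical helper, and each binomial draw is built as a sum of Bernoulli trials.

```python
import random
from math import sqrt


def simulate_delta_hat(n1, p1, n2, p2, reps, seed=0):
    """Draw `reps` values of p1_hat - p2_hat; each binomial count is a sum of Bernoulli trials."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = sum(rng.random() < p1 for _ in range(n1))
        y = sum(rng.random() < p2 for _ in range(n2))
        draws.append(x / n1 - y / n2)
    return draws


# Assumed parameter values for the illustration.
n1, p1, n2, p2 = 200, 0.3, 200, 0.5
delta = p1 - p2
sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # true SD of delta_hat

draws = simulate_delta_hat(n1, p1, n2, p2, reps=2000)
# Share of draws within 1.96 SDs of delta; a normal distribution predicts about 0.95.
share = sum(abs(d - delta) <= 1.96 * sd for d in draws) / len(draws)
print(round(share, 3))
```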
By standardization:
$$\frac{\hat{\Delta} - \Delta}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1)$$
Since $p_1$ and $p_2$ are unknown, we replace them in the standard error with their estimates $\hat{p}_1$ and $\hat{p}_2$:
$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} \approx N(0, 1)$$
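Let $z_{\alpha/2}$ denote the upper $\alpha/2$ critical value of the standard normal distribution, so that $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$ for $Z \sim N(0, 1)$. Applying this to the approximately standard normal statistic above and solving the inequalities for $\Delta$: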
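$$
\begin{aligned}
1 - \alpha &\approx P\left(-z_{\alpha/2} < \frac{\hat{\Delta} - \Delta}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} < z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} < \hat{\Delta} - \Delta < z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right) \\
&= P\left(\hat{\Delta} - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} < \Delta < \hat{\Delta} + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right) \\
&= P\left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right)
\end{aligned}
$$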
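Because both the normal approximation and the plug-in standard error are approximate, the probability in this chain is close to, but not exactly, $1 - \alpha$. A Monte Carlo sketch can estimate the actual coverage for given parameters. The values ($n_1 = n_2 = 200$, $p_1 = 0.3$, $p_2 = 0.5$, $\alpha = 0.05$) and the helper names `wald_interval` and `estimate_coverage` are assumptions for illustration; `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n1, y, n2, alpha=0.05):
    """Plug-in interval for p1 - p2 from counts x out of n1 and y out of n2."""
    p1_hat, p2_hat = x / n1, y / n2
    se_hat = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, the upper alpha/2 critical value
    diff = p1_hat - p2_hat
    return diff - z * se_hat, diff + z * se_hat


def estimate_coverage(n1, p1, n2, p2, reps=2000, alpha=0.05, seed=1):
    """Fraction of simulated intervals that contain the true p1 - p2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p1 for _ in range(n1))
        y = sum(rng.random() < p2 for _ in range(n2))
        lo, hi = wald_interval(x, n1, y, n2, alpha)
        hits += lo < p1 - p2 < hi
    return hits / reps


coverage = estimate_coverage(200, 0.3, 200, 0.5)
print(round(coverage, 3))  # should land near 0.95
```

With moderate sample sizes and proportions away from 0 and 1 the estimate should be close to the nominal level; for small samples or proportions near the boundaries, this kind of interval is known to undercover.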
From the last result, the following interval is an approximate $(1 - \alpha)100\%$ confidence interval for $\Delta = p_1 - p_2$:
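$$\left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right)$$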
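The final interval translates directly into a short function. This is a minimal sketch, assuming the data arrive as success counts `x` out of `n1` and `y` out of `n2`; the function name `diff_in_proportions_ci` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(x, n1, y, n2, alpha=0.05):
    """Approximate (1 - alpha)100% CI for p1 - p2 from x successes in n1 and y in n2."""
    p1_hat = x / n1
    p2_hat = y / n2
    # Plug-in (estimated) standard error of p1_hat - p2_hat.
    se_hat = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z * se_hat, diff + z * se_hat


# Hypothetical data: 45 successes out of 100 in group 1, 30 out of 100 in group 2.
lower, upper = diff_in_proportions_ci(45, 100, 30, 100)
print(f"({lower:.3f}, {upper:.3f})")  # (0.017, 0.283)
```

If either sample proportion is exactly 0 or 1, that group contributes zero to the plug-in standard error, so the approximation is unreliable in that case.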