Stratified random sampling is a sampling method that splits the population into strata that are internally homogeneous but differ from one another. A sample is then drawn from each stratum.
About stratified random sampling
Simple random sampling and systematic random sampling are best suited to homogeneous populations.
In practice, homogeneous populations are rarely encountered.
Example:
- Suppose we want to estimate the income of private employees across fifty cities.
- Cities can be grouped into large, medium, and small cities.
- The number of private employees living in large cities is far greater than in small or medium cities.
- In other words, private employees are not spread evenly across the groups, so a single simple random sample could over- or under-represent some city sizes.
- A sample of private employees is selected from each group, and the estimate is obtained by combining these samples.
The population is divided into several subpopulations called strata.
Stratified random sampling is the process of grouping the population into strata, selecting a sample from each stratum, and combining these samples to estimate population parameters.
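A minimal Python sketch of this process, assuming each population unit carries its stratum label. The function names `split_into_strata` and `stratified_sample` and the income figures are hypothetical, chosen only to mirror the city-size example; each stratum receives its own simple random sample without replacement via `random.Random.sample`.

```python
import random
from collections import defaultdict


def split_into_strata(units, key):
    """Group population units into strata; `key` returns a unit's stratum label."""
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    return dict(strata)


def stratified_sample(strata, sample_sizes, seed=None):
    """Draw a simple random sample without replacement of size n_h from every stratum."""
    rng = random.Random(seed)
    return {h: rng.sample(units, sample_sizes[h]) for h, units in strata.items()}


# Hypothetical population of (city size, monthly income) pairs.
population = [
    ("large", 5200), ("large", 4800), ("large", 6100), ("large", 5500),
    ("medium", 3900), ("medium", 4200), ("medium", 3600),
    ("small", 2500), ("small", 2900),
]
strata = split_into_strata(population, key=lambda unit: unit[0])
sample = stratified_sample(strata, {"large": 2, "medium": 2, "small": 1}, seed=1)
for h, units in sample.items():
    print(h, units)
```

Sampling each stratum separately guarantees that every city size is represented, which a single simple random sample of five units would not.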
Advantages
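- Improves precision when the strata are internally homogeneous.
- In addition to information about the population, information about each stratum is also obtained.
- Data collection can be easier to organize, since each stratum (e.g., each group of cities) can be surveyed separately.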
Parameter estimation
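A population of size $N$ is divided into $L$ strata with sizes $N_1, N_2, \dots, N_L$, with sample sizes $n_1, n_2, \dots, n_L$, where $N = \sum_{h=1}^{L} N_h$ and $n = \sum_{h=1}^{L} n_h$. $y_{hi}$ denotes the value of the $i$-th unit in the $h$-th stratum, and $W_h = N_h/N$ is the weight of the $h$-th stratum.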
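Stratum total
$$Y_h = \sum_{i=1}^{N_h} y_{hi} = N_h \bar{Y}_h, \qquad \hat{Y}_h = N_h \bar{y}_h$$
Population total
$$Y = \sum_{h=1}^{L} Y_h, \qquad \hat{Y} = \sum_{h=1}^{L} N_h \bar{y}_h$$
Population mean
$$\bar{Y} = \frac{Y}{N}, \qquad \bar{y}_{st} = \frac{\hat{Y}}{N} = \frac{1}{N}\sum_{h=1}^{L} N_h \bar{y}_h$$
Where:
- $Y$: Population total
- $Y_h$: Total value of sampling units in the $h$-th stratum
- $\bar{Y}_h$: Mean of the $h$-th stratum, estimated by the sample mean $\bar{y}_h$
- $\bar{Y}$: Population mean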
These estimators are unbiased because a simple random sample is drawn independently within each stratum, so each $\bar{y}_h$ is an unbiased estimator of $\bar{Y}_h$.
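The estimators above translate directly into Python. This is a sketch: `stratified_estimates` is a hypothetical helper, and the stratum sizes and sampled incomes are made-up numbers for illustration.

```python
from statistics import mean


def stratified_estimates(N_h, samples):
    """Return (stratum total estimates, population total estimate, population mean estimate).

    N_h:     dict mapping stratum -> stratum size N_h
    samples: dict mapping stratum -> sampled y-values from that stratum
    """
    N = sum(N_h.values())
    # Stratum total: Y_h_hat = N_h * ybar_h
    stratum_totals = {h: N_h[h] * mean(samples[h]) for h in N_h}
    # Population total: Y_hat = sum of the stratum totals
    total = sum(stratum_totals.values())
    # Population mean: ybar_st = Y_hat / N
    return stratum_totals, total, total / N


N_h = {"large": 400, "medium": 250, "small": 100}
samples = {"large": [5000, 6000], "medium": [4000, 4200], "small": [2500, 2900]}
totals, Y_hat, y_bar_st = stratified_estimates(N_h, samples)
print(f"{Y_hat:.0f} {y_bar_st:.1f}")  # prints "3495000 4660.0"
```

Each sample mean is weighted by its stratum size, so the small "small" stratum does not pull the estimate as strongly as an unweighted average of all sampled values would.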
Variance analysis
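Population variance
$$S^2 = \frac{1}{N-1}\sum_{h=1}^{L}\sum_{i=1}^{N_h}\left(y_{hi} - \bar{Y}\right)^2$$
It can be decomposed into a within-stratum component ($S_w^2$) and a between-stratum component ($S_b^2$):
$$S^2 = S_w^2 + S_b^2, \qquad S_w^2 = \frac{1}{N-1}\sum_{h=1}^{L}(N_h - 1)S_h^2, \qquad S_b^2 = \frac{1}{N-1}\sum_{h=1}^{L} N_h\left(\bar{Y}_h - \bar{Y}\right)^2$$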
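To see the decomposition numerically, the sketch below fully enumerates a small hypothetical population, computes $S^2$, $S_w^2$ and $S_b^2$ with the $N-1$ divisor, and checks that they add up. `variance_decomposition` is a hypothetical helper; `statistics.variance` uses the $n-1$ divisor, matching $S^2$ and $S_h^2$ here.

```python
from statistics import mean, variance


def variance_decomposition(strata):
    """Return (S2, S2_within, S2_between) for a fully enumerated stratified population.

    strata: dict mapping stratum -> all y-values in that stratum (each with N_h >= 2).
    """
    values = [y for ys in strata.values() for y in ys]
    N, Y_bar = len(values), mean(values)
    S2 = variance(values)  # total variance, divisor N - 1
    # Within-stratum component: pooled (N_h - 1) * S_h^2, divided by N - 1
    S2_w = sum((len(ys) - 1) * variance(ys) for ys in strata.values()) / (N - 1)
    # Between-stratum component: N_h * (Ybar_h - Ybar)^2, divided by N - 1
    S2_b = sum(len(ys) * (mean(ys) - Y_bar) ** 2 for ys in strata.values()) / (N - 1)
    return S2, S2_w, S2_b


# Hypothetical incomes, grouped by city size.
strata = {
    "large": [5200, 4800, 6100, 5500],
    "medium": [3900, 4200, 3600],
    "small": [2500, 2900],
}
S2, S2_w, S2_b = variance_decomposition(strata)
print(f"{S2:.0f} {S2_w:.0f} {S2_b:.0f}")  # prints "1450000 145000 1305000"
```

In this toy population most of the variation lies between the strata, which is the situation in which stratification pays off.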
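Estimator variance
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}, \qquad V(\hat{Y}) = N^2\, V(\bar{y}_{st})$$
Where:
- $W_h = N_h/N$ is the weight of the $h$-th stratum.
- $S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(y_{hi} - \bar{Y}_h\right)^2$ is the variance of the $h$-th stratum. Replacing it with the sample variance $s_h^2$ gives an estimate of $V(\bar{y}_{st})$.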
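A direct translation of $V(\bar{y}_{st})$ into Python, as a sketch. Passing sample variances $s_h^2$ instead of $S_h^2$ gives the estimated variance. The helper name `var_stratified_mean` and all numbers are hypothetical.

```python
def var_stratified_mean(N_h, n_h, S2_h):
    """V(ybar_st) = sum_h W_h^2 * (1 - n_h/N_h) * S_h^2 / n_h, with W_h = N_h / N.

    Pass sample variances s_h^2 as S2_h to get the estimated variance.
    """
    N = sum(N_h.values())
    return sum(
        (N_h[h] / N) ** 2 * (1 - n_h[h] / N_h[h]) * S2_h[h] / n_h[h]
        for h in N_h
    )


N_h = {"large": 400, "medium": 250, "small": 100}
n_h = {"large": 40, "medium": 25, "small": 10}
S2_h = {"large": 300_000, "medium": 90_000, "small": 80_000}
v = var_stratified_mean(N_h, n_h, S2_h)
print(f"{v:.1f}")                                # prints 2408.0
print(f"{v * sum(N_h.values()) ** 2:.0f}")       # V(Y_hat) = N^2 * V(ybar_st), prints 1354500000
```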
Precision
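Homogeneous strata reduce the within-stratum variance ($S_w^2$).
As a result, stratified sampling is usually more precise than simple random sampling of the same size: with proportional allocation, $V(\bar{y}_{st})$ depends only on the within-stratum variances $S_h^2$, so the between-stratum component $S_b^2$ no longer contributes to the error.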
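The precision gain can be checked on a small, fully enumerated hypothetical population. The sketch compares the exact variance of the sample mean under simple random sampling, $(1-f)S^2/n$ with $f = n/N$, against stratified sampling with proportional allocation, where the variance formula reduces to $\frac{1-f}{n}\sum_h W_h S_h^2$. The helper `compare_srs_vs_proportional` and the data are illustrative assumptions.

```python
from statistics import variance


def compare_srs_vs_proportional(strata, n):
    """Exact variances of the sample mean under SRS and proportional stratified sampling.

    strata: dict mapping stratum -> every y-value in that stratum (N_h >= 2).
    Assumes n * N_h / N is a whole number, so proportional allocation is exact.
    """
    values = [y for ys in strata.values() for y in ys]
    N = len(values)
    f = n / N  # sampling fraction, the same in every stratum under proportional allocation
    v_srs = (1 - f) * variance(values) / n
    v_prop = (1 - f) / n * sum(len(ys) / N * variance(ys) for ys in strata.values())
    return v_srs, v_prop


# Hypothetical incomes; N = 12 and n = 8 give n_h = 4, 2, 2.
strata = {
    "large": [5200, 4800, 6100, 5500, 5900, 5000],
    "medium": [3900, 4200, 3600],
    "small": [2500, 2900, 2700],
}
v_srs, v_prop = compare_srs_vs_proportional(strata, n=8)
print(f"SRS: {v_srs:.0f}  stratified (proportional): {v_prop:.0f}")
```

Because the city-size groups differ strongly in mean income, removing the between-stratum component shrinks the variance considerably here; with strata whose means are nearly equal, the two variances would be close.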
Determining sample size
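Suppose we want:
- $d$: Precision, i.e., the maximum acceptable error $|\bar{y}_{st} - \bar{Y}|$
- $1 - \alpha$: Reliability (confidence) level

so that $P\left(|\bar{y}_{st} - \bar{Y}| \le d\right) = 1 - \alpha$. Assuming $\bar{y}_{st}$ is approximately normally distributed, the variance of the estimator must equal
$$D = V(\bar{y}_{st}) = \left(\frac{d}{z_{\alpha/2}}\right)^2$$
Solving for $n$ in the variance formula for each allocation method (with $W_h = N_h/N$) yields:
- Equal allocation: $n = \dfrac{L\sum_{h=1}^{L} W_h^2 S_h^2}{D + \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2}$
- Proportional allocation: $n = \dfrac{\sum_{h=1}^{L} W_h S_h^2}{D + \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2}$
- Optimum allocation: $n = \dfrac{\left(\sum_{h=1}^{L} W_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{L} W_h S_h / \sqrt{c_h}\right)}{D + \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2}$
- Neyman allocation: $n = \dfrac{\left(\sum_{h=1}^{L} W_h S_h\right)^2}{D + \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2}$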
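These sample-size formulas can be evaluated as in the sketch below, assuming the normal approximation, known (or pilot-estimated) stratum standard deviations $S_h$, and rounding up to a whole unit. `required_n` and the example numbers are hypothetical; `statistics.NormalDist` (Python 3.8+) supplies $z_{\alpha/2}$, and $c_h$ are per-unit costs used only by optimum allocation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n(N_h, S_h, d, confidence, method="proportional", c_h=None):
    """Total sample size n giving precision d at the given confidence level.

    N_h: stratum sizes, S_h: stratum standard deviations, c_h: per-unit costs
    (only used by "optimum"). All dicts are keyed by stratum.
    """
    N = sum(N_h.values())
    W = {h: N_h[h] / N for h in N_h}
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    D = (d / z) ** 2                                    # required V(ybar_st)
    fpc = sum(W[h] * S_h[h] ** 2 for h in N_h) / N      # (1/N) * sum W_h S_h^2
    if method == "equal":
        num = len(N_h) * sum(W[h] ** 2 * S_h[h] ** 2 for h in N_h)
    elif method == "proportional":
        num = sum(W[h] * S_h[h] ** 2 for h in N_h)
    elif method == "neyman":
        num = sum(W[h] * S_h[h] for h in N_h) ** 2
    elif method == "optimum":
        num = (sum(W[h] * S_h[h] * sqrt(c_h[h]) for h in N_h)
               * sum(W[h] * S_h[h] / sqrt(c_h[h]) for h in N_h))
    else:
        raise ValueError(f"unknown allocation method: {method}")
    return ceil(num / (D + fpc))


N_h = {"large": 400, "medium": 250, "small": 100}
S_h = {"large": 550, "medium": 300, "small": 200}
for method in ("equal", "proportional", "neyman"):
    print(method, required_n(N_h, S_h, d=100, confidence=0.95, method=method))
print("optimum", required_n(N_h, S_h, d=100, confidence=0.95, method="optimum",
                            c_h={"large": 4, "medium": 1, "small": 1}))
```

All four formulas share the same denominator; they differ only in the numerator, which reflects how efficiently each allocation uses the sample.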
Sample allocation methods
- Equal allocation: $n_h = \frac{n}{L}$
  - The sample size is the same for each stratum.
  - Does not consider cost functions.
- Proportional allocation: $n_h = n\frac{N_h}{N}$
  - The sample size is proportional to the stratum size.
  - Most commonly used.
  - Does not consider cost functions.
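- Optimum allocation: suppose the total cost is $C = c_0 + \sum_{h=1}^{L} c_h n_h$, where
  - $c_0$: Fixed cost
  - $c_h$: Cost per sampling unit in the $h$-th stratum

  Then $n_h = n\frac{N_h S_h/\sqrt{c_h}}{\sum_{k=1}^{L} N_k S_k/\sqrt{c_k}}$, which minimizes $V(\bar{y}_{st})$ for a given total cost.
  - Considers costs.
  - Costs vary across strata.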
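- Neyman allocation: $n_h = n\frac{N_h S_h}{\sum_{k=1}^{L} N_k S_k}$
  - Special case of optimum allocation in which the cost per unit $c_h$ is the same in every stratum, so costs cancel out of the formula.
  - Minimizes $V(\bar{y}_{st})$ for a fixed total sample size $n$.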
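Once $n$ is fixed, the four allocation rules differ only in the weight each stratum receives. A minimal sketch, with `allocate` as a hypothetical helper that returns unrounded $n_h$ values (in practice they are rounded to whole units while keeping $\sum_h n_h = n$); $S_h$ are stratum standard deviations and $c_h$ per-unit costs, all illustrative numbers.

```python
from math import sqrt


def allocate(n, N_h, method="proportional", S_h=None, c_h=None):
    """Split a total sample size n across strata; returns unrounded n_h values."""
    if method == "equal":
        weight = {h: 1.0 for h in N_h}                             # n_h = n / L
    elif method == "proportional":
        weight = {h: N_h[h] for h in N_h}                          # n_h proportional to N_h
    elif method == "optimum":
        weight = {h: N_h[h] * S_h[h] / sqrt(c_h[h]) for h in N_h}  # proportional to N_h S_h / sqrt(c_h)
    elif method == "neyman":
        weight = {h: N_h[h] * S_h[h] for h in N_h}                 # proportional to N_h S_h
    else:
        raise ValueError(f"unknown allocation method: {method}")
    total = sum(weight.values())
    return {h: n * weight[h] / total for h in N_h}


N_h = {"large": 400, "medium": 250, "small": 100}
S_h = {"large": 550, "medium": 300, "small": 200}
print(allocate(75, N_h, "proportional"))     # {'large': 40.0, 'medium': 25.0, 'small': 10.0}
print(allocate(63, N_h, "neyman", S_h=S_h))  # {'large': 44.0, 'medium': 15.0, 'small': 4.0}
print(allocate(63, N_h, "optimum", S_h=S_h, c_h={"large": 4, "medium": 1, "small": 1}))
```

Compared with Neyman allocation, making units in the large stratum four times as expensive shifts part of the sample toward the cheaper strata, which is the trade-off optimum allocation is designed to capture.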