
Warning
- Over the whole procedure, a large number of single-parameter t-tests are conducted. This means we have very probably included some unimportant independent variables in the model (Type I errors) and eliminated some important ones (Type II errors).
- Do not be deceived by impressive-looking t-statistics: stepwise regression is, by its nature, an algorithm biased toward large t-statistics.
Bidirectional elimination
- Identify the response and the set of potentially important predictors to use.
- Fit all possible one-predictor models and test the slope parameter of each.
- Pick the predictor that yields the largest t-statistic (in absolute value) out of all the tests in this step.
- Fit each two-predictor model containing the chosen predictor together with one of the remaining predictors, and pick the second predictor that yields the largest t-statistic.
- Re-test the first predictor to see whether it is still significant in the presence of the second.
- If the first predictor is no longer significant, remove it, then search for the predictor with the largest t-statistic in the presence of the term that was kept.
- Continue adding (and re-testing) predictors until no further predictor yields a significant t-statistic at the chosen significance level. A hand-rolled sketch of this loop in R follows the list.
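The loop above can be sketched by hand with add1() and drop1(), whose partial F-tests for single terms are the squares of the corresponding t-tests. This is only a rough illustration on mtcars; the thresholds alpha_enter and alpha_remove are arbitrary choices, not part of the procedure's definition, and the built-in step() shown under Software implementation is the usual tool in practice.
data(mtcars)
alpha_enter  <- 0.15                         # entry threshold (illustrative)
alpha_remove <- 0.15                         # removal threshold (illustrative)
current    <- lm(mpg ~ 1, data = mtcars)     # start from the intercept-only model
candidates <- setdiff(names(mtcars), "mpg")  # predictors not yet in the model
for (iter in 1:25) {
  if (length(candidates) == 0) break
  # Forward step: test every remaining candidate, keep the most significant one
  adds <- add1(current, scope = reformulate(candidates), test = "F")[-1, ]
  if (min(adds$`Pr(>F)`, na.rm = TRUE) > alpha_enter) break
  best <- rownames(adds)[which.min(adds$`Pr(>F)`)]
  current    <- update(current, as.formula(paste(". ~ . +", best)))
  candidates <- setdiff(candidates, best)
  # Backward step: re-test terms already in the model and drop one that is
  # no longer significant in the presence of the newly added term
  drops <- drop1(current, test = "F")[-1, ]
  if (max(drops$`Pr(>F)`, na.rm = TRUE) > alpha_remove) {
    worst <- rownames(drops)[which.max(drops$`Pr(>F)`)]
    current    <- update(current, as.formula(paste(". ~ . -", worst)))
    candidates <- c(candidates, worst)
  }
}
summary(current)                             # model retained by this sketch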
Forward selection
Similar to bidirectional elimination, except that this algorithm does not re-test previously added predictors: once a predictor has entered the model, it stays there.
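A single forward step can be inspected with add1(), which reports a partial F-test (the square of the corresponding t-test) for each predictor not yet in the model; forward selection adds the one with the smallest p-value. The starting model and scope below are purely illustrative.
base <- lm(mpg ~ wt, data = mtcars)
# F-test for each candidate in the presence of wt; the smallest p-value is added next
add1(base, scope = ~ wt + hp + qsec + am, test = "F")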
Backward elimination
Instead of starting by fitting all possible one-predictor models as bidirectional elimination does, this algorithm starts with all predictors included in the initial model.
It then repeatedly tests for the least significant predictor and removes it, the complete opposite of forward selection.
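Correspondingly, a single backward step can be inspected with drop1(), which tests each term currently in the model in the presence of all the others; backward elimination removes the term with the largest p-value, provided it exceeds the chosen level. Again, the model below is only an illustration.
full <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)
# Each row tests one term given all the others; the largest p-value is the removal candidate
drop1(full, test = "F")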
Software implementation
data(mtcars)
model_intercept <- lm(mpg ~ 1, data = mtcars)
model_full <- lm(mpg ~ ., data = mtcars)
# Bidirectional elimination
both <- step(model_intercept, direction = 'both', scope = formula(model_full))
# Forward selection
forward <- step(model_intercept, direction = 'forward', scope = formula(model_full))
# Backward elimination
backward <- step(model_full, direction = 'backward')
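Note that step() compares models by AIC rather than by individual t-tests, so its selection path can differ from the t-test-based description above. The chosen model can then be inspected in the usual way, for example:
summary(both)        # coefficient table of the model chosen by bidirectional elimination
formula(backward)    # formula retained after backward elimination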