About best model selection

Selecting appropriate predictor variables in regression modeling, especially for prediction, is crucial (Darlington, 2017).

Choosing the right predictors helps the regression model explain the relationship between the response and the predictor variables. Variable selection methods help identify which predictors contribute significantly to the model. Among models that fit adequately, the preferred one uses the simplest set of predictors.

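The selection methods covered here are backward selection, forward selection, and stepwise selection. Before applying these methods, it is necessary to understand the partial F-test and the sequential F-test. The partial F-test, which assesses the contribution of a predictor given that the other predictors are already in the model, is used in backward selection. The sequential F-test, which assesses the contribution of each predictor as it is added to the model in order, is used in forward selection.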
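As a rough illustration of the partial F-test, the sketch below compares a full model with a reduced model that omits one predictor. It assumes the textbook form of the statistic, F = ((SSE_reduced - SSE_full) / q) / MSE_full, where q is the number of predictors being tested; the source does not give a formula, so this form, the helper names `sse` and `partial_f`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X.
    X is assumed to already contain the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """Partial F statistic for the predictors in X_full that are absent from X_reduced."""
    n, p_full = X_full.shape
    q = p_full - X_reduced.shape[1]          # number of predictors being tested
    sse_full = sse(X_full, y)
    sse_reduced = sse(X_reduced, y)
    mse_full = sse_full / (n - p_full)       # error variance estimate from the full model
    return ((sse_reduced - sse_full) / q) / mse_full

# Simulated data (hypothetical): y depends on x1 only, x2 is pure noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2])
f_x1 = partial_f(X_full, np.column_stack([ones, x2]), y)  # test x1 given x2
f_x2 = partial_f(X_full, np.column_stack([ones, x1]), y)  # test x2 given x1
print(f"partial F for x1: {f_x1:.2f}")
print(f"partial F for x2: {f_x2:.2f}")
```

Each statistic would normally be compared with a critical value of the F distribution with q and n - p degrees of freedom. In backward selection, the predictor with the smallest partial F is the usual candidate for removal.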

Best model criteria

Criteria | Formula | Optimum?
 | | Minimum
 | | Maximum
 | | Minimum

Mean squared error (MSE)

Mean squared error (MSE) measures the average squared error between the observed values and the values fitted by the model. If a model fits the data with no error, then $\mathrm{MSE} = 0$. As the error grows, MSE grows.
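The formula itself is missing from the text above. A common definition for a regression model, given here as an assumption rather than as the author's formula, divides the residual sum of squares (SSE) by its degrees of freedom, with $n$ observations and $p$ estimated parameters (including the intercept):

```latex
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - p}
             = \frac{1}{n - p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Some texts divide by $n$ instead of $n - p$; the version with $n - p$ is the one usually used when comparing regression models with different numbers of predictors.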

Coefficient of determination

The coefficient of determination ($R^2$) is a statistic that measures the proportion of the total variation around the mean of the response variable that is explained by the regression model.

$R^2$ ranges from 0 to 1, inclusive.
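The definition above corresponds to the standard decomposition of the total sum of squares. As a sketch, with SST the total sum of squares around the mean, SSR the regression sum of squares, and SSE the residual sum of squares (symbols assumed here, since the source formula is missing):

```latex
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}
    = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}},
\qquad
\mathrm{SST} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\quad
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

For a least-squares fit that includes an intercept, $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$, which is why the value stays between 0 and 1.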

Adjusted coefficient of determination

The adjusted coefficient of determination ($R^2_{\mathrm{adj}}$) is a statistic similar to the coefficient of determination, but adjusted for the degrees of freedom of the residual sum of squares and the total sum of squares.

Unlike the coefficient of determination, $R^2_{\mathrm{adj}}$ does not always increase when predictor variables are added to the model.
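The difference between the two statistics can be seen by adding predictors that are unrelated to the response. The sketch below assumes the usual formulas, R2 = 1 - SSE/SST and adjusted R2 = 1 - (SSE/(n - p)) / (SST/(n - 1)), with p counting the intercept; the function name `r2_and_adjusted` and the simulated data are hypothetical.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for the least-squares fit of y on X.
    X is assumed to already contain the intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))     # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - sse / sst
    adj = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    return r2, adj

# Simulated data (hypothetical): only x1 is related to y.
rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
noise_cols = rng.normal(size=(n, 5))             # predictors unrelated to y
y = 3.0 + 1.5 * x1 + rng.normal(size=n)

# Start from the model with x1, then add one noise predictor at a time.
X = np.column_stack([np.ones(n), x1])
results = [r2_and_adjusted(X, y)]
for j in range(noise_cols.shape[1]):
    X = np.column_stack([X, noise_cols[:, j]])
    results.append(r2_and_adjusted(X, y))

for k, (r2, adj) in enumerate(results):
    print(f"{k} noise predictors: R2={r2:.4f}  adjusted R2={adj:.4f}")
```

R2 can only stay the same or rise as columns are added, while the adjusted value rises only when the reduction in SSE outweighs the lost degree of freedom.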

Mallows' Cp statistic
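The source gives no formula for this criterion. One widely used form, stated here as an assumption, compares the residual sum of squares of a candidate model with $p$ parameters (including the intercept) against the error variance estimated from the full model with all candidate predictors:

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p,
\qquad
\hat{\sigma}^2 = \mathrm{MSE}_{\text{full}}
```

Under this form, candidate models with a small $C_p$ that is close to $p$ are generally preferred.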

Prediction Sum of Squares (PRESS)

PRESS is a form of cross-validation used in regression analysis. It measures how well the model fits observations that were not used in estimating it.
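A minimal sketch of this idea, assuming the leave-one-out form of PRESS: each observation is predicted from a model fitted without it, and the squared prediction errors are summed. For linear least squares the same value can be obtained from a single fit as the sum of (e_i / (1 - h_ii))^2, where h_ii are the diagonal entries of the hat matrix. The function names and the simulated data below are hypothetical.

```python
import numpy as np

def press_loo(X, y):
    """PRESS by refitting the model n times, leaving one observation out each time."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                       # mask that drops observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)      # squared error on the held-out point
    return total

def press_hat(X, y):
    """PRESS from a single fit, using the leverages h_ii (X must have full column rank)."""
    H = X @ np.linalg.pinv(X)                          # hat matrix
    resid = y - H @ y                                  # ordinary residuals
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))

# Simulated data (hypothetical) with an intercept and two predictors.
rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

press_a = press_loo(X, y)
press_b = press_hat(X, y)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse_fit = float(np.sum((y - X @ beta) ** 2))
print(f"PRESS (refitting): {press_a:.4f}")
print(f"PRESS (leverages): {press_b:.4f}")
print(f"SSE of the full fit: {sse_fit:.4f}")
```

PRESS is never smaller than the ordinary SSE, because each held-out error is at least as large in magnitude as the corresponding residual. When comparing candidate models, the one with the smaller PRESS is usually preferred.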