Fitting a linear model

In the real world, it is often infeasible to measure every observation in a population. Instead, we can approximate the model fit by drawing a sample from the population.

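The model fit uses the following notation to distinguish the population model from the sample model:

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad \text{(population)}$$

$$\hat{y} = b_0 + b_1 x \qquad \text{(sample)}$$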
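To make the population/sample distinction concrete, here is a minimal Python sketch that simulates a population from a known linear model and then draws a sample from it. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$), the population size, the range of $x$, and the function names `make_population` and `draw_sample` are all assumptions chosen for illustration; with real data the population parameters and errors are unknown.

```python
import random

# Hypothetical "true" population parameters; in practice these are unknown.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
N = 10_000  # population size (assumed)


def make_population(seed: int = 0) -> list[tuple[float, float]]:
    """Generate N (x, y) pairs from y = BETA0 + BETA1 * x + epsilon."""
    rng = random.Random(seed)
    population = []
    for _ in range(N):
        x = rng.uniform(0.0, 10.0)
        epsilon = rng.gauss(0.0, SIGMA)  # error term: never observed in real data
        population.append((x, BETA0 + BETA1 * x + epsilon))
    return population


def draw_sample(
    population: list[tuple[float, float]], n: int, seed: int = 1
) -> list[tuple[float, float]]:
    """Draw a simple random sample of size n without replacement."""
    return random.Random(seed).sample(population, n)


population = make_population()
sample = draw_sample(population, n=50)
print(len(population), len(sample))  # 10000 50
```

Only `sample` would be available in a real study; the coefficients $b_0, b_1$ are then computed from it as estimates of $\beta_0, \beta_1$.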

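Coefficient formulas

The least-squares estimates of the sample coefficients are

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

or, in matrix form,

$$\mathbf{b} = (X^\top X)^{-1} X^\top \mathbf{y}$$

where the design matrix $X$ has a column of ones (for the intercept) followed by the column of $x$ values.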
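As a sketch, the summation formulas for $b_0, b_1$ and the matrix formula $\mathbf{b} = (X^\top X)^{-1} X^\top \mathbf{y}$ can be checked against each other in plain Python. The dataset is hypothetical, and `fit_matrix` specializes the matrix formula to a single predictor by inverting the $2 \times 2$ matrix $X^\top X$ directly; with several predictors you would normally use a linear-algebra library instead.

```python
def fit_scalar(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares (b0, b1) from the summation formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fit_matrix(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """(b0, b1) from b = (X^T X)^{-1} X^T y, where X has rows [1, x_i]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sum_x], [sum_x, sum_xx]] and X^T y = [sum_y, sum_xy].
    det = n * sum_xx - sum_x * sum_x
    # Inverse of a 2x2 matrix: (1/det) * [[sum_xx, -sum_x], [-sum_x, n]].
    b0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print([round(v, 6) for v in fit_scalar(xs, ys)])  # [2.2, 0.6]
print([round(v, 6) for v in fit_matrix(xs, ys)])  # [2.2, 0.6]
```

Both functions give $b_0 = 2.2$ and $b_1 = 0.6$ for this data, which illustrates that the matrix form is the same estimator written compactly.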

An estimator for error variance

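The usual estimate of the error variance $\sigma^2$ is the variance of the residuals, $s^2$, since each residual $e_i = y_i - \hat{y}_i$ is the sample estimate of the corresponding error $\varepsilon_i$.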
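A small worked example, using hypothetical data, computes the residuals $e_i = y_i - \hat{y}_i$ and $s^2$. For a model with one predictor the residual sum of squares is divided by $n - 2$, one degree of freedom being lost for each estimated coefficient.

```python
# Hypothetical data; the fitted line is y_hat = 2.2 + 0.6 x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

# Least-squares coefficients (summation formulas).
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i estimate the unobservable errors.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e * e for e in residuals)
s2 = sse / (n - 2)  # two estimated coefficients, so n - 2 degrees of freedom

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sse, 2), round(s2, 2))        # 2.4 0.8
```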

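Comparison of regression lines fitted to population data and to sample data

Regression equation | Parameters (estimates) | Data | Notes
$y = \beta_0 + \beta_1 x + \varepsilon$ | $\beta_0, \beta_1$ | Population | Rarely done, since population data are seldom available
$\hat{y} = b_0 + b_1 x$ | $b_0, b_1$ | Sample | $b_0, b_1$ are estimators of $\beta_0, \beta_1$
Error: $\varepsilon = y - (\beta_0 + \beta_1 x)$ | | Population | Not known
Residual: $e = y - \hat{y}$ | | Sample | Estimator of $\varepsilon$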
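In a simulation the true errors are known, which makes it possible to compare them with the residuals that the table lists as their estimators. This sketch assumes hypothetical parameters ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) and a sample of 30 points; `fit` is an illustrative helper, not a library function.

```python
import random


def fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Simulate a sample where, unusually, the true errors are known (assumed values).
BETA0, BETA1, SIGMA, n = 2.0, 0.5, 1.0, 30
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
errors = [rng.gauss(0.0, SIGMA) for _ in range(n)]
ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]

b0, b1 = fit(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

sse_errors = sum(e * e for e in errors)        # requires the unknown beta0, beta1
sse_residuals = sum(r * r for r in residuals)  # computable from the sample alone
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {sum(residuals):.1e}")
print(f"SSE(errors) = {sse_errors:.3f}, SSE(residuals) = {sse_residuals:.3f}")
```

The residuals sum to zero (up to rounding), while the errors generally do not. Because the least-squares line minimizes the sum of squared deviations over all possible lines, the SSE of the residuals can never exceed that of the true errors, which is one reason the sample estimate of $\sigma^2$ divides by fewer than $n$.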

Formulas

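Theoretically, the error variance is the mean squared error over the whole population:

$$\sigma^2 = \frac{SSE}{N} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

Empirically, it is estimated from the residuals of a sample:

$$s^2 = \frac{SSE}{n - k - 1} = \frac{SST - SSR}{n - k - 1}, \qquad SSE = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Here $N$ is the population size, $n$ is the sample size, and SSE is the sum of squared errors (of the residuals, in the sample case).

$k$ is the number of predictor variables ($k = 1$ for the simple model, so the denominator is $n - 2$), SSR $= \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ is the regression sum of squares, and SST $= \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total sum of squares.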
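The following sketch contrasts the theoretical and empirical variance formulas on simulated data: $\sigma^2$ is computed over a hypothetical population of 20,000 points using the true parameters, and $s^2$ from a random sample of 100 points using residuals. All parameter values and the helper `fit` are assumptions for illustration. The code also checks the identity $SST = SSR + SSE$, which holds for least-squares fits that include an intercept.

```python
import math
import random


def fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical population (parameters are assumptions for illustration).
BETA0, BETA1, SIGMA, N = 2.0, 0.5, 1.0, 20_000
rng = random.Random(7)
pop_x = [rng.uniform(0.0, 10.0) for _ in range(N)]
pop_y = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in pop_x]

# Theoretical: SSE over the whole population with the true parameters, divided by N.
sigma2 = sum((y - BETA0 - BETA1 * x) ** 2 for x, y in zip(pop_x, pop_y)) / N

# Empirical: fit a sample of size n and divide the residual SSE by n - k - 1.
n, k = 100, 1
idx = rng.sample(range(N), n)
xs = [pop_x[i] for i in idx]
ys = [pop_y[i] for i in idx]
b0, b1 = fit(xs, ys)
y_hat = [b0 + b1 * x for x in xs]
y_bar = sum(ys) / n

sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((y - y_bar) ** 2 for y in ys)
s2 = sse / (n - k - 1)

# SST = SSR + SSE, so SSE can equally be computed as SST - SSR.
assert math.isclose(sst, ssr + sse, rel_tol=1e-9)

print(f"sigma^2 (population) = {sigma2:.3f}")
print(f"s^2 (sample)         = {s2:.3f}")
```

Both numbers should land near the assumed $\sigma^2 = 1$; $s^2$ varies more because it is based on only $n = 100$ observations.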

Interpretation of $s$

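Although each predicted value $\hat{y}$ generally differs from the actual value $y$, we can expect about 95% of the actual values of $y$ to lie within $\pm 2s$ of their predicted values, provided the errors are approximately normally distributed. A good model should therefore give a small $s$.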
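The $\pm 2s$ rule can be checked empirically. This sketch assumes normally distributed errors with hypothetical parameters ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) and 1,000 observations; with non-normal errors the fraction can differ noticeably from 95%.

```python
import math
import random


def fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical model with normal errors (assumed parameters).
BETA0, BETA1, SIGMA, n = 2.0, 0.5, 1.0, 1000
rng = random.Random(3)
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]

b0, b1 = fit(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Fraction of observations whose actual y lies within +/- 2s of y_hat.
within = sum(abs(r) <= 2 * s for r in residuals) / n
print(f"s = {s:.3f}, fraction within 2s = {within:.3f}")
```

The printed fraction should be close to 0.95, and $s$ close to the assumed $\sigma = 1$.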