Simple linear regression is a technique for modeling the linear relationship between two quantitative variables: an explanatory variable x and a response variable y.

Fitting a Regression Line

The data for this example come from the built-in mtcars dataset. The lm() function fits a regression line for the specified response and explanatory variables. Supply the data frame through the data = argument and refer to the columns by name; otherwise functions used later, such as predict.lm(), cannot match new data to the model's variables.

mtcars.lm <- lm(mpg ~ wt, data = mtcars)  # syntax is y ~ x (response ~ explanatory)
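As a check on what lm() computes, the least-squares slope and intercept can be calculated directly from their standard formulas, b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2) and b0 = ybar - b1 * xbar. This is a minimal sketch; the names x, y, b0, and b1 are illustrative.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

x <- mtcars$wt   # explanatory variable: weight (1000 lbs)
y <- mtcars$mpg  # response variable: miles per gallon

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
coef(mtcars.lm)  # should agree with the hand calculation
```

Both calls should give an intercept of about 37.29 and a slope of about -5.34, the values in the Estimate column of the summary output.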

Output

Once you have fit a regression line, you can use it to obtain the slope, intercept, residuals, fitted values, and many other quantities. The summary.lm() function displays the main results for the fitted model. Calling summary(mtcars.lm) gives the same output, because summary() dispatches to summary.lm() for objects created by lm().

summary.lm(mtcars.lm)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
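Individual pieces of this output can also be extracted programmatically instead of read off the printout. The sketch below uses standard accessor functions and components of the summary object; the name fit.summary is illustrative.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# summary() dispatches to summary.lm() for an "lm" object
fit.summary <- summary(mtcars.lm)

coef(mtcars.lm)            # intercept and slope
head(fitted(mtcars.lm))    # first few fitted values (y-hat)
head(resid(mtcars.lm))     # first few residuals (y minus y-hat)
fit.summary$r.squared      # Multiple R-squared
fit.summary$adj.r.squared  # Adjusted R-squared
fit.summary$sigma          # residual standard error
```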

Intercept and Slope

The estimates, standard errors, t statistics, and p-values for the intercept and slope are stored in the coefficients component of the summary:

summary.lm(mtcars.lm)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
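The coefficients component is a numeric matrix, so single entries can be pulled out by row and column name. The sketch below also calls confint(), which returns 95% confidence intervals for the intercept and slope by default, and reproduces the slope's interval by hand as estimate plus or minus a t critical value (n - 2 = 30 degrees of freedom) times the standard error. The names coefs, slope, slope.se, and t.star are illustrative.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(mtcars.lm)$coefficients

slope    <- coefs["wt", "Estimate"]
slope.se <- coefs["wt", "Std. Error"]

# Built-in 95% confidence intervals for both coefficients
confint(mtcars.lm, level = 0.95)

# Same interval for the slope by hand: estimate +/- t* x SE, df = n - 2
t.star <- qt(0.975, df = mtcars.lm$df.residual)
slope + c(-1, 1) * t.star * slope.se
```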

Residuals

A named vector of all 32 residuals can be obtained by running the following.

summary.lm(mtcars.lm)$residuals
##          1          2          3          4          5          6          7          8          9 
## -2.2826106 -0.9197704 -2.0859521  1.2973499 -0.2001440 -0.6932545 -3.9053627  4.1637381  2.3499593 
##         10         11         12         13         14         15         16         17         18 
##  0.2998560 -1.1001440  0.8668731 -0.0502472 -1.8830236  1.1733496  2.1032876  5.9810744  6.8727113 
##         19         20         21         22         23         24         25         26         27 
##  1.7461954  6.4219792 -2.6110037 -2.9725862 -3.7268663 -3.4623553  2.4643670  0.3564263  0.1520430 
##         28         29         30         31         32 
##  1.2010593 -4.5431513 -2.7809399 -3.2053627 -1.0274952
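Each residual is the observed response minus the fitted value from the line. The sketch below checks that definition against the residuals R stores, using resid(), which returns the same vector as summary.lm(mtcars.lm)$residuals. The names res and by.hand are illustrative.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# resid() (or residuals()) returns the model's residual vector
res <- resid(mtcars.lm)

# Residual = observed mpg - fitted mpg
by.hand <- mtcars$mpg - fitted(mtcars.lm)

all.equal(res, by.hand)
sum(res)  # least-squares residuals sum to zero, up to rounding error
```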

To display specific residuals, index the same vector with square brackets. This example shows the residual for case number 23.

summary.lm(mtcars.lm)$residuals[23]
##        23 
## -3.726866
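Square-bracket indexing also accepts a vector of positions, and which.max() can locate the case the line fits worst. With the built-in mtcars data frame, R labels each residual with the car's name (for example, "Mazda RX4") rather than a case number, but positional indexing such as [23] works either way. This is a short sketch; res and biggest are illustrative names.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)
res <- resid(mtcars.lm)

res[23]             # a single case, by position
res[c(17, 18, 20)]  # several cases at once

# Position of the largest residual in absolute value
biggest <- which.max(abs(res))
biggest
res[biggest]
```

Here the largest residual is case 18, about 6.87 mpg above the line, which matches the Max of 6.8727 in the summary output.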

Prediction

Use the predict.lm() function to predict the response from the regression line fit above. The example below predicts mpg for a car with wt = 3.2, that is, 3,200 lbs, since wt is measured in thousands of pounds.

predict.lm(mtcars.lm, newdata = data.frame(wt = 3.2))
##        1 
## 20.18282
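The prediction is simply the fitted line evaluated at wt = 3.2, so it can be reproduced from the coefficients. The sketch below does that, then uses predict()'s interval argument to get 95% prediction intervals for several weights at once. The weights 2.5 and 4.0 and the names b and new.cars are illustrative choices.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)
b <- coef(mtcars.lm)

# Intercept + slope * 3.2 reproduces predict.lm() at wt = 3.2
b[["(Intercept)"]] + b[["wt"]] * 3.2

# Several weights at once, with 95% prediction intervals for individual cars
new.cars <- data.frame(wt = c(2.5, 3.2, 4.0))
predict(mtcars.lm, newdata = new.cars, interval = "prediction")
```

Passing interval = "confidence" instead gives an interval for the mean mpg of all cars with that weight, which is narrower than the prediction interval for a single car.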

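Remember that the model must be fit with the data = argument and bare column names, as below. If it is instead fit as lm(mtcars$mpg ~ mtcars$wt), predict.lm() cannot match wt in the new data frame; it returns the 32 original fitted values with a warning rather than a single prediction.

mtcars.lm <- lm(mpg ~ wt, data = mtcars)  # syntax is y ~ x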
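The difference is easy to see by fitting the model both ways and predicting at wt = 3.2. In this sketch, good.lm and bad.lm are illustrative names; the second fit deliberately omits data = to show the failure.

```r
# Fit the recommended way, with data = and bare column names
good.lm <- lm(mpg ~ wt, data = mtcars)
good <- predict(good.lm, newdata = data.frame(wt = 3.2))
good  # a single prediction, about 20.18

# Fit WITHOUT data =, referring to the columns through mtcars$
bad.lm <- lm(mtcars$mpg ~ mtcars$wt)
# predict() re-evaluates the expression mtcars$wt, which ignores the new
# data frame, so it returns all 32 fitted values and issues a warning
bad <- predict(bad.lm, newdata = data.frame(wt = 3.2))
length(bad)
```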

Scatter Plot with Regression Line

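A scatterplot with the fitted regression line drawn on top is a quick visual check of how well a straight line describes the data. Points that curve away from the line or fan out around it suggest the linear model may not be appropriate.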
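One way to draw such a plot in base R is sketched below; the axis labels and colors are illustrative choices. abline() accepts the fitted lm object and reads its intercept and slope. A residuals-versus-fitted plot is added as a second check: if the linear model fits well, the points should scatter in a patternless band around zero.

```r
# Refit the model so this snippet runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of the response against the explanatory variable
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. wt with least-squares line")

# Overlay the fitted line using the model's intercept and slope
abline(mtcars.lm, col = "red", lwd = 2)

# Residuals vs. fitted values, with a dashed reference line at 0
plot(fitted(mtcars.lm), resid(mtcars.lm),
     xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lty = 2)
```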

Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College