Linear regression is a technique used to investigate the relationship between two quantitative variables.
The data for this example comes from the mtcars dataset. The lm() function fits a regression line to the specified explanatory and response variables. Supply the data frame through the data = option of lm(), rather than referring to columns as mtcars$mpg and mtcars$wt, so that functions used later, such as predict.lm(), can match new data to the variable names in the formula.
mtcars.lm <- lm(mpg ~ wt, data = mtcars)  # syntax is y ~ x
Once you have fit a regression line, you can use it to get the slope, intercept, residuals, fitted values, and many other quantities. The summary.lm() function displays many of the values associated with the fitted model; calling the generic summary() on an lm object dispatches to the same function.
summary.lm(mtcars.lm)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
The summary statistics for the slope and intercept of the regression line can be found by typing in the following.
summary.lm(mtcars.lm)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
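The slope and intercept alone can also be pulled out with coef(), and individual numbers from the printed summary are stored as components of the summary object. The following is a minimal sketch, assuming R's built-in mtcars data; the variable names fit_summary, intercept, and slope are chosen only for illustration.

```r
# Refit the model so this example runs on its own
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named vector of the estimated coefficients
intercept <- coef(mtcars.lm)["(Intercept)"]
slope <- coef(mtcars.lm)["wt"]

# Components of the summary object hold other values from the printout
fit_summary <- summary(mtcars.lm)
fit_summary$r.squared  # Multiple R-squared
fit_summary$sigma      # Residual standard error

# One row of the coefficient matrix: estimate, std. error, t value, p-value
fit_summary$coefficients["wt", ]
```

Working from the stored components avoids copying rounded numbers from the printed output.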
A list of all the residuals can be obtained by running the following.
summary.lm(mtcars.lm)$residuals
##          1          2          3          4          5          6          7          8          9
## -2.2826106 -0.9197704 -2.0859521  1.2973499 -0.2001440 -0.6932545 -3.9053627  4.1637381  2.3499593
##         10         11         12         13         14         15         16         17         18
##  0.2998560 -1.1001440  0.8668731 -0.0502472 -1.8830236  1.1733496  2.1032876  5.9810744  6.8727113
##         19         20         21         22         23         24         25         26         27
##  1.7461954  6.4219792 -2.6110037 -2.9725862 -3.7268663 -3.4623553  2.4643670  0.3564263  0.1520430
##         28         29         30         31         32
##  1.2010593 -4.5431513 -2.7809399 -3.2053627 -1.0274952
You can also display specific residuals by indexing the result of the same command. This example shows the residual for case number 23.
summary.lm(mtcars.lm)$residuals[23]
## 23
## -3.726866
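The introduction also mentions fitted values. As a sketch under the same assumptions (R's built-in mtcars data, illustrative variable names), the generic functions residuals() and fitted() return these vectors directly, and each residual equals the observed mpg minus the fitted mpg. With the built-in copy of mtcars the elements are labelled by car model rather than by case number, so the example selects by position.

```r
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

res <- residuals(mtcars.lm)   # same values as summary.lm(mtcars.lm)$residuals
fit <- fitted(mtcars.lm)      # predicted mpg for each car in the data

# A residual is the observed value minus the fitted value
manual_res <- mtcars$mpg - fit
all.equal(unname(res), unname(manual_res))

# Residual and fitted value for case 23, selected by position
res[23]
fit[23]
```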
Use the predict.lm() function to predict a value using the regression line that was fit above. The example below predicts the mpg for wt = 3.2, that is, a car weighing 3,200 lb, since wt is recorded in thousands of pounds.
predict.lm(mtcars.lm, data.frame(wt = 3.2))
## 1
## 20.18282
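The prediction is the fitted line evaluated at the new weight. A minimal sketch, assuming the built-in mtcars data and illustrative variable names, that reproduces the value by hand and then predicts for several weights at once:

```r
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# Evaluate intercept + slope * wt by hand
b <- coef(mtcars.lm)
by_hand <- unname(b["(Intercept)"] + b["wt"] * 3.2)

# predict() dispatches to predict.lm() for an lm object
by_predict <- unname(predict(mtcars.lm, newdata = data.frame(wt = 3.2)))
all.equal(by_hand, by_predict)

# newdata may hold several rows; the column name must match the formula
new_weights <- data.frame(wt = c(2.5, 3.2, 4.0))
several <- predict(mtcars.lm, newdata = new_weights)
```

Because the estimated slope is negative, the predicted mpg in several falls as wt increases.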
Remember that the linear model must be defined using the data = option, as in the call below, or predict.lm() will not be able to match the new data to the model's variables.
mtcars.lm <- lm(mpg ~ wt, data = mtcars)  # syntax is y ~ x
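To see why this matters, the sketch below fits the same line without data =, using mtcars$ references in the formula (the name no_data_fit is hypothetical). The model's variables are then literally mtcars$mpg and mtcars$wt, so the wt column in the new data frame is not matched. R returns the fitted values for the original 32 cars, with a warning, instead of a single prediction.

```r
# Model specified without data =; the term is "mtcars$wt", not "wt"
no_data_fit <- lm(mtcars$mpg ~ mtcars$wt)
bad <- suppressWarnings(predict(no_data_fit, data.frame(wt = 3.2)))
length(bad)   # one value per row of mtcars, not a single prediction

# Model specified with data =; newdata is matched by column name
mtcars.lm <- lm(mpg ~ wt, data = mtcars)
good <- predict(mtcars.lm, data.frame(wt = 3.2))
length(good)
```

The warning is suppressed here only to keep the output short; in practice it is the signal that the new data was ignored.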
Scatterplots with linear regression lines are generally used to evaluate how well the model fits the data.
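A minimal sketch of such a plot using base R graphics, assuming the built-in mtcars data; the axis labels and title are illustrative. abline() accepts the fitted lm object and draws its regression line over the scatterplot.

```r
mtcars.lm <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of the response against the explanatory variable
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg versus wt with fitted regression line")

# Overlay the fitted regression line
abline(mtcars.lm)
```

Points scattered evenly around the line, with no curve or funnel shape, suggest that a straight line is a reasonable description of the relationship.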
Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College