The dplyr package contains many functions for modifying data. The first ten rows of the iris dataset are displayed below.
library(dplyr) # Loads the dplyr package
iris %>% head(10) # Displays the first 10 rows of the iris dataset
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
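The mutate() function can be used to create new variables or to modify existing ones. The code below converts the Sepal.Length measurements from centimeters to inches by multiplying by 0.39, a rounded version of the conversion factor (1 cm ≈ 0.3937 in). Because the result is assigned back to Sepal.Length, the original centimeter values are overwritten.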
iris <- iris %>% mutate(Sepal.Length = Sepal.Length * 0.39)
iris %>% head(10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1.989 3.5 1.4 0.2 setosa
## 2 1.911 3.0 1.4 0.2 setosa
## 3 1.833 3.2 1.3 0.2 setosa
## 4 1.794 3.1 1.5 0.2 setosa
## 5 1.950 3.6 1.4 0.2 setosa
## 6 2.106 3.9 1.7 0.4 setosa
## 7 1.794 3.4 1.4 0.3 setosa
## 8 1.950 3.4 1.5 0.2 setosa
## 9 1.716 2.9 1.4 0.2 setosa
## 10 1.911 3.1 1.5 0.1 setosa
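Overwriting Sepal.Length means the centimeter values are lost. A minimal sketch of the alternative, assuming you want to keep both versions, is to give mutate() a new column name; the names iris_in and Sepal.Length.in are our own illustrative choices. The sketch starts from datasets::iris so that it uses the unmodified data, and it divides by 2.54 (the exact number of centimeters per inch) instead of multiplying by the rounded 0.39.

```r
library(dplyr)

# Start from the unmodified iris data so Sepal.Length is still in centimeters
iris_in <- datasets::iris %>%
  mutate(
    # Hypothetical new column: 1 inch = 2.54 cm exactly
    Sepal.Length.in = Sepal.Length / 2.54
  )

# The original column is kept alongside the converted one
iris_in %>% select(Sepal.Length, Sepal.Length.in) %>% head(3)
```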
The iris dataset contains five variables: four measurements and the species. Two of the measurements, Petal.Length and Petal.Width, are taken from the petals. Suppose we want to calculate the ratio of petal length to petal width and add it to the dataset. The code below calculates the ratio and stores it in a new column, Petal.Ratio.
iris <- iris %>% mutate(Petal.Ratio = Petal.Length / Petal.Width)
iris %>% head(10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Ratio
## 1 1.989 3.5 1.4 0.2 setosa 7.000000
## 2 1.911 3.0 1.4 0.2 setosa 7.000000
## 3 1.833 3.2 1.3 0.2 setosa 6.500000
## 4 1.794 3.1 1.5 0.2 setosa 7.500000
## 5 1.950 3.6 1.4 0.2 setosa 7.000000
## 6 2.106 3.9 1.7 0.4 setosa 4.250000
## 7 1.794 3.4 1.4 0.3 setosa 4.666667
## 8 1.950 3.4 1.5 0.2 setosa 7.500000
## 9 1.716 2.9 1.4 0.2 setosa 7.000000
## 10 1.911 3.1 1.5 0.1 setosa 15.000000
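mutate() can also create several columns in a single call, and a column created earlier in the call can be used by the ones that follow. The sketch below illustrates this under our own assumptions: Sepal.Ratio and Ratio.Diff are hypothetical column names chosen only for illustration, and the code starts from datasets::iris rather than the modified iris above.

```r
library(dplyr)

ratios <- datasets::iris %>%
  mutate(
    Petal.Ratio = Petal.Length / Petal.Width,  # same ratio as above
    Sepal.Ratio = Sepal.Length / Sepal.Width,  # hypothetical second ratio
    # Petal.Ratio and Sepal.Ratio already exist at this point in the call
    Ratio.Diff  = Petal.Ratio - Sepal.Ratio
  )

ratios %>% select(Species, Petal.Ratio, Sepal.Ratio, Ratio.Diff) %>% head(3)
```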
Sometimes a variable is coded using numbers but is actually a categorical variable. The am variable in the mtcars dataset is coded as 0 or 1 depending on whether the car has an automatic (0) or manual (1) transmission. Variables with numeric values are read in as quantitative data by default. A tibble shows each variable's type just below its name; the am variable has <dbl> below it, indicating that it is stored as a double (numeric) variable.
mtcars <- mtcars %>% as_tibble() # Converts the data to a tibble (the row names, i.e. car models, are dropped)
mtcars %>% head(10) # Displays the first 10 observations
## # A tibble: 10 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
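The <dbl> label can also be checked directly. The sketch below works on a fresh copy named mt (our own name) so it does not depend on earlier code. It shows that am is stored as a double that only takes the values 0 and 1, and that R will still average it as if it were a quantity; for a 0/1 variable, that mean is simply the proportion of manual cars.

```r
library(dplyr)

mt <- datasets::mtcars %>% as_tibble()

typeof(mt$am)         # "double", matching <dbl> in the tibble header
sort(unique(mt$am))   # only two distinct values: 0 and 1
mean(mt$am)           # a number is returned because am is treated as quantitative
```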
Use the mutate() function together with factor() to change the variable type from a double (quantitative) variable to a factor (categorical) variable. Notice that the variable type for am now reads <fct>.
mtcars <- mtcars %>% mutate(am = factor(am)) # Converts am to a factor
mtcars %>% head(10) # Displays the first 10 observations
## # A tibble: 10 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
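To confirm the conversion, one can inspect the factor's levels and count how many cars fall into each category. A short sketch, again using a fresh copy named mt (our own name): because factor() is called without a levels argument, the levels are the sorted distinct values, "0" and "1".

```r
library(dplyr)

mt <- datasets::mtcars %>%
  as_tibble() %>%
  mutate(am = factor(am))

is.factor(mt$am)   # TRUE
levels(mt$am)      # "0" "1"
mt %>% count(am)   # number of cars in each transmission category
```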
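The factor() function also accepts levels and labels arguments, so the same approach can be used to set the order of a variable's levels or to replace numeric codes with descriptive labels.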
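For example, the 0/1 codes in am can be replaced with descriptive labels. The sketch below assumes the coding described earlier (0 = automatic, 1 = manual) and uses a fresh copy named mt (our own name). The levels argument lists the codes in the order the categories should appear, and labels gives the matching names.

```r
library(dplyr)

mt <- datasets::mtcars %>%
  as_tibble() %>%
  # Map code 0 to "automatic" and code 1 to "manual"
  mutate(am = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))

levels(mt$am)      # "automatic" "manual"
mt %>% count(am)   # counts now appear under the descriptive labels
```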
Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College