The dplyr package contains many functions used to modify data. The first ten rows of the iris dataset are displayed below.

library(dplyr) # Loads the dplyr package

iris %>% head(10) # Displays the first 10 rows of the iris dataset
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa

Modifying Variables

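The mutate() function can be used to create new variables or to modify existing variables. The code below converts the Sepal.Length measurements from centimeters to inches by multiplying by 0.39, a rounded conversion factor (1 cm is about 0.3937 in). Because the result is assigned back to iris, the original centimeter values are overwritten.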

iris <- iris %>% mutate(Sepal.Length = Sepal.Length * 0.39)
iris %>% head(10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1         1.989         3.5          1.4         0.2  setosa
## 2         1.911         3.0          1.4         0.2  setosa
## 3         1.833         3.2          1.3         0.2  setosa
## 4         1.794         3.1          1.5         0.2  setosa
## 5         1.950         3.6          1.4         0.2  setosa
## 6         2.106         3.9          1.7         0.4  setosa
## 7         1.794         3.4          1.4         0.3  setosa
## 8         1.950         3.4          1.5         0.2  setosa
## 9         1.716         2.9          1.4         0.2  setosa
## 10        1.911         3.1          1.5         0.1  setosa
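
Overwriting a column discards the original measurements. As a minimal sketch, assuming we would rather keep both units, the converted values can be written to a new column instead. The names iris_in and Sepal.Length.In are hypothetical, and datasets::iris is used so the example starts from the unmodified data. Dividing by 2.54 is the exact centimeter-to-inch conversion, so the values differ slightly from the 0.39 results above.

```r
library(dplyr)

# datasets::iris is the untouched copy that ships with R
iris_in <- datasets::iris %>%
  mutate(Sepal.Length.In = Sepal.Length / 2.54)  # hypothetical new column

# Both the centimeter and inch columns are now available
iris_in %>%
  select(Sepal.Length, Sepal.Length.In) %>%
  head(3)
```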

Creating Variables

The iris dataset contains five variables. Two of the variables are measurements taken from iris petals. Suppose we want to calculate the ratio of petal length to petal width and add it to our dataset. The code below calculates the ratio and stores it in the iris dataset. Because Petal.Ratio is not the name of an existing column, mutate() adds it as a new column.

iris <- iris %>% mutate(Petal.Ratio = Petal.Length / Petal.Width)
iris %>% head(10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Ratio
## 1         1.989         3.5          1.4         0.2  setosa    7.000000
## 2         1.911         3.0          1.4         0.2  setosa    7.000000
## 3         1.833         3.2          1.3         0.2  setosa    6.500000
## 4         1.794         3.1          1.5         0.2  setosa    7.500000
## 5         1.950         3.6          1.4         0.2  setosa    7.000000
## 6         2.106         3.9          1.7         0.4  setosa    4.250000
## 7         1.794         3.4          1.4         0.3  setosa    4.666667
## 8         1.950         3.4          1.5         0.2  setosa    7.500000
## 9         1.716         2.9          1.4         0.2  setosa    7.000000
## 10        1.911         3.1          1.5         0.1  setosa   15.000000
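
A single mutate() call can also create several variables at once, and later expressions may refer to columns created earlier in the same call. The sketch below illustrates this under the assumption that we also want a sepal ratio and a flag for long petals. The names iris_ratios, Sepal.Ratio, and Long.Petal, and the cutoff of 5, are hypothetical choices made for illustration.

```r
library(dplyr)

iris_ratios <- datasets::iris %>%
  mutate(
    Petal.Ratio = Petal.Length / Petal.Width,
    Sepal.Ratio = Sepal.Length / Sepal.Width,
    # Later expressions can use columns created earlier in the same call
    Long.Petal = Petal.Ratio > 5  # hypothetical cutoff
  )

iris_ratios %>%
  select(Species, Petal.Ratio, Sepal.Ratio, Long.Petal) %>%
  head(3)
```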

Modifying Variable Types

Sometimes a variable is coded using numbers but is actually a categorical variable. The am variable in the mtcars dataset is coded as 0 or 1 depending on whether the car has an automatic (0) or manual (1) transmission. Variables with numeric values are read in as quantitative data by default. A tibble indicates each variable's type just below the variable name. The am variable has <dbl> below it, indicating that it is a double, or numeric, variable.

mtcars <- mtcars %>% as_tibble() # Converts the data to a tibble

mtcars %>% head(10) # Displays the first 10 observations
## # A tibble: 10 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
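
The variable type can also be checked directly, without printing a tibble. The sketch below uses base R functions on the unmodified datasets::mtcars; the object name mt_raw is hypothetical.

```r
# Start from the untouched copy of mtcars that ships with R
mt_raw <- datasets::mtcars

class(mt_raw$am)         # "numeric"
typeof(mt_raw$am)        # "double"
sort(unique(mt_raw$am))  # 0 1: only two distinct values, so am is really categorical
```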

Use the mutate() function with factor() to change the variable type from a double (quantitative) to a factor (categorical). Notice that the variable type for am now reads <fct>.

mtcars <- mtcars %>% mutate(am = factor(am)) # Converts am to a factor

mtcars %>% head(10) # Displays the first 10 observations
## # A tibble: 10 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs am     gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0 1         4     4
##  2  21       6  160    110  3.9   2.88  17.0     0 1         4     4
##  3  22.8     4  108     93  3.85  2.32  18.6     1 1         4     1
##  4  21.4     6  258    110  3.08  3.22  19.4     1 0         3     1
##  5  18.7     8  360    175  3.15  3.44  17.0     0 0         3     2
##  6  18.1     6  225    105  2.76  3.46  20.2     1 0         3     1
##  7  14.3     8  360    245  3.21  3.57  15.8     0 0         3     4
##  8  24.4     4  147.    62  3.69  3.19  20       1 0         4     2
##  9  22.8     4  141.    95  3.92  3.15  22.9     1 0         4     2
## 10  19.2     6  168.   123  3.92  3.44  18.3     1 0         4     4

This method can also be used to add or change a factor's levels and labels, using the levels and labels arguments of factor().
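
As a minimal sketch of setting labels, the code below recodes am so that 0 and 1 display as descriptive text. The mapping of 0 to automatic and 1 to manual follows the mtcars documentation; the object name mt_labeled and the exact label strings are hypothetical choices.

```r
library(dplyr)

mt_labeled <- datasets::mtcars %>%
  as_tibble() %>%
  mutate(am = factor(am,
                     levels = c(0, 1),                     # values in the data
                     labels = c("automatic", "manual")))   # text shown for each level

levels(mt_labeled$am)     # "automatic" "manual"
mt_labeled %>% count(am)  # number of cars with each transmission type
```

Listing the levels explicitly fixes their order, which controls how the categories appear in later tables and plots.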

Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College