The dplyr package contains many functions used to manipulate data.
The filter function in the dplyr package constructs a subset of the original data set based on specific values of a variable or variables. In other words, the filter function selects rows from a data set.
The code below generates a data set that contains only cars with 3 gears and 8 cylinders.
library(dplyr) # loads the dplyr package
mtcars_cyl8 <- mtcars %>% filter(gear == 3, cyl == 8) # keeps only rows with 3 gears and 8 cylinders
mtcars_cyl8 # displays the subset
## # A tibble: 12 x 11
##      mpg cyl    disp    hp  drat    wt  qsec    vs am        gear   carb
##    <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>     <fct> <dbl>
##  1  18.7 8      360    175  3.15  3.44  17.0     0 Automatic 3         2
##  2  14.3 8      360    245  3.21  3.57  15.8     0 Automatic 3         4
##  3  16.4 8      276.   180  3.07  4.07  17.4     0 Automatic 3         3
##  4  17.3 8      276.   180  3.07  3.73  17.6     0 Automatic 3         3
##  5  15.2 8      276.   180  3.07  3.78  18       0 Automatic 3         3
##  6  10.4 8      472    205  2.93  5.25  18.0     0 Automatic 3         4
##  7  10.4 8      460    215  3     5.42  17.8     0 Automatic 3         4
##  8  14.7 8      440    230  3.23  5.34  17.4     0 Automatic 3         4
##  9  15.5 8      318    150  2.76  3.52  16.9     0 Automatic 3         2
## 10  15.2 8      304    150  3.15  3.44  17.3     0 Automatic 3         2
## 11  13.3 8      350    245  3.73  3.84  15.4     0 Automatic 3         4
## 12  19.2 8      400    175  3.08  3.84  17.0     0 Automatic 3         2
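As a short illustration of how filter combines conditions, the sketch below may help. It assumes the unmodified mtcars data frame that ships with R, where gear and cyl are numeric, which may differ from the version printed above. The object names both_comma, both_amp, and either are made up for this example.

```r
library(dplyr)

# Assumption: the built-in mtcars data frame, where gear and cyl are numeric.
# Conditions separated by commas must all hold (logical AND).
both_comma <- mtcars %>% filter(gear == 3, cyl == 8)

# The same subset written with an explicit & operator.
both_amp <- mtcars %>% filter(gear == 3 & cyl == 8)

# Use | to keep rows that satisfy at least one of the conditions (logical OR).
either <- mtcars %>% filter(gear == 3 | cyl == 8)

nrow(both_comma) # number of cars with 3 gears and 8 cylinders
nrow(either)     # number of cars with 3 gears, 8 cylinders, or both
```

Separating conditions with commas is usually the more readable choice when every condition must hold, while | has to be written out whenever any one condition is enough.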
The select function in the dplyr package creates a dataset that contains only the specified variables. In other words, the select function selects entire columns to include in the new dataset.
The code below creates a dataset with only the cyl, wt, and gear variables.
library(dplyr)
mtcars_small <- mtcars %>% select(cyl, wt, gear) # Creates the subset
mtcars_small # displays the subset
## # A tibble: 32 x 3
##    cyl      wt gear
##    <fct> <dbl> <fct>
##  1 6      2.62 4
##  2 6      2.88 4
##  3 4      2.32 4
##  4 6      3.22 3
##  5 8      3.44 3
##  6 6      3.46 3
##  7 8      3.57 3
##  8 4      3.19 4
##  9 4      3.15 4
## 10 6      3.44 4
## # … with 22 more rows
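Because filter works on rows and select works on columns, the two are often chained together. The sketch below is one possible illustration, again assuming the unmodified mtcars data frame that ships with R. The object names cyl8_small and no_carb are hypothetical.

```r
library(dplyr)

# Assumption: the built-in mtcars data frame.
# filter() picks the rows first, then select() picks columns from the result.
cyl8_small <- mtcars %>%
  filter(cyl == 8) %>%
  select(mpg, wt, gear)

# A minus sign in select() drops a column instead of keeping it.
no_carb <- mtcars %>% select(-carb)

dim(cyl8_small) # number of rows and columns kept
names(no_carb)  # every original column except carb
```

Dropping a column with a minus sign can be more convenient than listing every column to keep when only one or two variables are unwanted.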
The drop_na function in the tidyr package removes observations from a dataset that have missing values in a specified variable or variables. This operation could also be done with filter.
The example dataset below contains variables a and b. Each variable has missing values denoted with NA.
library(tidyr)
library(tibble)
data_tbl <- tibble( a = c("A", "A", "B", NA, NA), b = c(NA, "C", "D", NA, "D") )
data_tbl
## # A tibble: 5 x 2
##   a     b
##   <chr> <chr>
## 1 A     <NA>
## 2 A     C
## 3 B     D
## 4 <NA>  <NA>
## 5 <NA>  D
# removes all rows in the dataset where variable a has missing values.
data_tbl %>% drop_na(a)
## # A tibble: 3 x 2
##   a     b
##   <chr> <chr>
## 1 A     <NA>
## 2 A     C
## 3 B     D
# removes all rows in the dataset where variable b has missing values.
data_tbl %>% drop_na(b)
## # A tibble: 3 x 2
##   a     b
##   <chr> <chr>
## 1 A     C
## 2 B     D
## 3 <NA>  D
# removes all rows in the dataset where either variable a or b has missing values.
data_tbl %>% drop_na(a, b)
## # A tibble: 2 x 2
##   a     b
##   <chr> <chr>
## 1 A     C
## 2 B     D
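The text notes that the same result can be obtained with filter. One way this might look is sketched below, using is.na from base R to test for missing values. The data are the same as in the example above. The object names by_drop_na, by_filter, complete_filter, and complete_drop_na are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as above.
data_tbl <- tibble(
  a = c("A", "A", "B", NA, NA),
  b = c(NA, "C", "D", NA, "D")
)

# drop_na(a) and filter(!is.na(a)) both keep the rows where a is not missing.
by_drop_na <- data_tbl %>% drop_na(a)
by_filter  <- data_tbl %>% filter(!is.na(a))

# For several variables, give filter() one !is.na() condition per variable.
complete_filter <- data_tbl %>% filter(!is.na(a), !is.na(b))

# Calling drop_na() with no variables checks every column.
complete_drop_na <- data_tbl %>% drop_na()
```

The drop_na version is shorter when many variables are involved, while the filter version makes it easy to mix missing-value checks with other row conditions.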
Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College