There are many ways to read in and store data in R. Here are some of the more popular ways to do so.

Variables

Suppose you measured the height of 5 individuals. The heights were 60.2, 61.3, 59.0, 68.9, and 70.8 inches. You will most likely want to use these measurements for more than one calculation, so it makes sense to save them. The code below saves all of the height measurements to a variable called height. Read from left to right, the code says that height gets (<-) the values combined by c( ).

# creates the variable
height <- c(60.2, 61.3, 59.0, 68.9, 70.8)

# displays the variable
height
## [1] 60.2 61.3 59.0 68.9 70.8
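
Because the measurements are saved in height, they can be reused in several calculations without retyping them. The lines below are only a small sketch of that idea; the name height.cm and the particular calculations are chosen for illustration and are not part of the original example.

```r
# the same measurements as above
height <- c(60.2, 61.3, 59.0, 68.9, 70.8)

# average height in inches
mean(height)

# shortest and tallest measurements
range(height)

# convert every measurement to centimeters (1 inch = 2.54 cm)
height.cm <- height * 2.54
height.cm
```

Arithmetic such as height * 2.54 is applied to each value in the variable, so one line converts all five measurements.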

Suppose you also recorded weight and eye color. A variable name must be a single string of characters with no spaces. The variable for eye color has a period in place of the space. It is good practice to give your variables descriptive names.

weight <- c(144, 1118, 160, 112, 180)
eye.color <- c("blue", "blue", "brown", "hazel", "brown")
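
Numbers and text are stored as different types of variables. As a quick check, and only as a sketch, class() reports the type of a variable and length() reports how many values it holds; both are base R functions.

```r
weight <- c(144, 1118, 160, 112, 180)
eye.color <- c("blue", "blue", "brown", "hazel", "brown")

# numbers are stored as a numeric variable
class(weight)

# text in quotation marks is stored as a character variable
class(eye.color)

# each variable should hold one value per individual
length(weight)
length(eye.color)
```

Checking that the lengths match is a useful habit before combining variables into a single data set.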

Data Frames

Data frames are the structures that R uses to store multiple variables. Data frames are created using a syntax similar to variables. Each variable in a data frame is displayed as its own column.

# creates the data frame
my.data <- data.frame(height, weight, eye.color)

# displays the data frame
my.data
##   height weight eye.color
## 1   60.2    144      blue
## 2   61.3   1118      blue
## 3   59.0    160     brown
## 4   68.9    112     hazel
## 5   70.8    180     brown
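
Once the variables are stored together, individual columns can be pulled back out of the data frame. The following is a minimal sketch using base R functions; which columns and rows to inspect is an arbitrary choice made for illustration.

```r
height <- c(60.2, 61.3, 59.0, 68.9, 70.8)
weight <- c(144, 1118, 160, 112, 180)
eye.color <- c("blue", "blue", "brown", "hazel", "brown")
my.data <- data.frame(height, weight, eye.color)

# number of rows (individuals) and columns (variables)
nrow(my.data)
ncol(my.data)

# pull out a single variable with $
my.data$height

# use a column in a calculation
mean(my.data$height)

# display only the first three rows
head(my.data, 3)
```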

The functions below will provide you with various details and information about a data frame.

  • str(my.data) will list all of the variables and variable types in the my.data data set.

  • summary(my.data) will calculate summary statistics for all of the variables in the my.data data set.

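  • ?my.data will produce a help page, if one exists, for the my.data data set. Help pages exist for data sets that come with R or with a package, not for data sets you create yourself. They contain the names of each variable and a brief description of what each variable represents.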
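
The functions listed above can be tried on the example data as follows. This is a sketch only; the data frame is rebuilt here so the lines run on their own, and mtcars is a data set that ships with R, used purely as an example of a data set that has a help page.

```r
my.data <- data.frame(
  height = c(60.2, 61.3, 59.0, 68.9, 70.8),
  weight = c(144, 1118, 160, 112, 180),
  eye.color = c("blue", "blue", "brown", "hazel", "brown")
)

# lists each variable with its type and first few values
str(my.data)

# summary statistics for every variable
summary(my.data)

# opens a help page when run in an interactive R session
# ?mtcars
```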

Tibbles

A tibble is a fancier version of a data frame. Tibbles are not loaded into R by default. They are part of the tibble package, which belongs to a collection of packages referred to as the Tidyverse. The syntax for creating a tibble is very similar to that of a data frame.

# Load the tibble package
library(tibble)

# Create a tibble
my.data.tibble <- tibble(height, weight, eye.color)

# Display the tibble
my.data.tibble
## # A tibble: 5 x 3
##   height weight eye.color
##    <dbl>  <dbl> <chr>    
## 1   60.2    144 blue     
## 2   61.3   1118 blue     
## 3   59      160 brown    
## 4   68.9    112 hazel    
## 5   70.8    180 brown

You can see that the tibble displays the type of each variable beneath the column names, whereas the data frame did not. The str(), summary(), and ? functions also work with tibbles.
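
An existing data frame can also be converted to a tibble and back again. The sketch below assumes the tibble package is installed and uses its as_tibble() and is_tibble() functions; the name back.to.df is made up for illustration.

```r
library(tibble)

my.data <- data.frame(
  height = c(60.2, 61.3, 59.0, 68.9, 70.8),
  weight = c(144, 1118, 160, 112, 180),
  eye.color = c("blue", "blue", "brown", "hazel", "brown")
)

# convert the data frame to a tibble
my.data.tibble <- as_tibble(my.data)

# check which objects are tibbles
is_tibble(my.data.tibble)
is_tibble(my.data)

# convert the tibble back to a plain data frame
back.to.df <- as.data.frame(my.data.tibble)
class(back.to.df)
```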

Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College