Many of the summary functions in R do not handle missing data by default. If any of the functions below return NA, the variable contains missing values. Add the argument na.rm = TRUE to the function call to remove the missing values before the calculation, or use the favstats() function in the mosaic package, which removes missing values automatically.
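As a minimal illustration, the sketch below uses a small made-up vector x (hypothetical data, not from mtcars) with one missing value to show the difference na.rm = TRUE makes.

```r
# Hypothetical data: four values, one of which is missing (NA)
x <- c(2, 4, NA, 6)

mean(x)                # returns NA because one value is missing
mean(x, na.rm = TRUE)  # drops the NA first, so this is mean(c(2, 4, 6)) = 4
sum(is.na(x))          # counts the missing values: 1
```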
library(mosaic) # loads the favstats() function
The variables used in these examples come from the built-in mtcars dataset; most examples use wt, the weight of each car in thousands of pounds.
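Before computing statistics, it can help to look at the data. A quick sketch using base R: nrow() gives the number of cars (rows), and head() shows the first few rows of the two columns used in this handout.

```r
# mtcars ships with base R (datasets package), so no extra package is needed
nrow(mtcars)                      # number of cars (rows): 32
head(mtcars[, c("mpg", "wt")])    # first 6 rows of mpg and wt (weight in 1000 lbs)
```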
The mean is also known as the average. Here are two different ways to calculate the mean.
mean(mtcars$wt)
favstats(mtcars$wt)$mean
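The mean is the sum of the values divided by the number of values. As a check, the sketch below computes it directly and compares the result with mean().

```r
w <- mtcars$wt
manual_mean <- sum(w) / length(w)   # sum of the weights divided by the number of cars
manual_mean                         # 3.21725
all.equal(manual_mean, mean(w))     # TRUE
```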
The median is the middle value of the sorted data, so half of the observations fall at or below it and half fall at or above it. Here are two different ways to calculate the median.
median(mtcars$wt)
favstats(mtcars$wt)$median
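mtcars has 32 cars, an even number, so there is no single middle value; the median is then the average of the two middle values of the sorted data. A small sketch that checks this by hand (it assumes an even number of observations, as here):

```r
w <- sort(mtcars$wt)   # weights from lightest to heaviest
n <- length(w)         # 32 cars, an even number

# With an even n, the median is the average of the two middle values,
# here w[16] and w[17]
(w[n / 2] + w[n / 2 + 1]) / 2   # 3.325
median(mtcars$wt)               # 3.325
```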
The sort() function sorts the data from least to greatest.
sort(mtcars$wt)
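sort() also accepts decreasing = TRUE to sort from greatest to least. To sort entire rows of the data frame (so each car's name stays with its weight), order() returns the row positions in sorted order. A short sketch:

```r
# Weights from heaviest to lightest
sort(mtcars$wt, decreasing = TRUE)

# Rows of mtcars ordered by weight; the row names (car names) travel with the rows
by_weight <- mtcars[order(mtcars$wt), c("wt", "mpg")]
head(by_weight, 3)   # the three lightest cars
```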
The standard deviation is a measure of spread: roughly, how far the observations typically fall from the mean. Here are two different ways to calculate the standard deviation.
sd(mtcars$wt)
favstats(mtcars$wt)$sd
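sd() computes the sample standard deviation: the square root of the sum of squared deviations from the mean, divided by n - 1 rather than n. The sketch below reproduces it from that formula as a check.

```r
w <- mtcars$wt
n <- length(w)
deviations <- w - mean(w)                        # distance of each weight from the mean
manual_sd <- sqrt(sum(deviations^2) / (n - 1))   # sample SD divides by n - 1
all.equal(manual_sd, sd(w))                      # TRUE
```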
The variance is also a measure of spread; it is the square of the standard deviation.
var(mtcars$wt)
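Because the variance is the square of the standard deviation, squaring sd() should agree with var(). A quick check:

```r
w <- mtcars$wt
all.equal(var(w), sd(w)^2)   # TRUE: variance = SD squared
sqrt(var(w))                 # the same value as sd(w)
```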
Here are two different ways to calculate percentiles.
quantile(mtcars$wt, c(0.25, 0.50)) # 25th and 50th percentiles
favstats(mtcars$wt)$Q1 # 1st quartile (25th percentile)
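quantile() accepts any vector of probabilities, so the 75th percentile (third quartile) can be requested the same way, and base R's IQR() returns the distance between the first and third quartiles. A sketch:

```r
w <- mtcars$wt
q <- quantile(w, c(0.25, 0.75))   # 1st and 3rd quartiles, named "25%" and "75%"
q
IQR(w)                            # interquartile range: Q3 - Q1
all.equal(IQR(w), unname(q[2] - q[1]))   # TRUE
```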
The minimum is the smallest value in the dataset. Here are two different ways to calculate the minimum value of a variable.
min(mtcars$wt)
favstats(mtcars$wt)$min
The maximum is the largest value in the dataset. Here are two different ways to calculate the maximum value of a variable.
max(mtcars$wt)
favstats(mtcars$wt)$max
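Base R's range() returns the minimum and maximum together, and the difference between them (the range as a single number) is another simple measure of spread. A sketch:

```r
w <- mtcars$wt
r <- range(w)   # c(min(w), max(w))
r               # 1.513 5.424
diff(r)         # max minus min: 3.911
```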
The which() function looks up the row (index, or case) number of specific values. For example, suppose you wish to know which car in the mtcars dataset had a weight of 1.835 (thousand pounds).
which(mtcars$wt == 1.835)
## [1] 20
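which() returns a row number. Because mtcars stores the car names as row names, indexing rownames() with that number shows which car it is (idx is just an illustrative variable name):

```r
idx <- which(mtcars$wt == 1.835)   # row number 20
rownames(mtcars)[idx]              # "Toyota Corolla"
```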
You could also look up which car(s) had the minimum weight by typing the following.
which(mtcars$wt == min(mtcars$wt))
## [1] 28
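Base R also provides which.min() and which.max(), which return the position of the first minimum or maximum. Unlike which(mtcars$wt == min(mtcars$wt)), they report only the first match if several cars tie. A sketch:

```r
which.min(mtcars$wt)                     # 28, the lightest car
which.max(mtcars$wt)                     # 16, the heaviest car
rownames(mtcars)[which.max(mtcars$wt)]   # "Lincoln Continental"
```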
You could also look up which car(s) weigh 5 (thousand pounds) or more.
which(mtcars$wt >= 5)
## [1] 15 16 17
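The row numbers returned by which() can also be used to pull out the matching rows. A sketch that shows the weight and mpg of the cars weighing 5 (thousand pounds) or more (heavy_rows is an illustrative name):

```r
heavy_rows <- which(mtcars$wt >= 5)   # rows 15, 16, 17
mtcars[heavy_rows, c("wt", "mpg")]    # Cadillac Fleetwood, Lincoln Continental, Chrysler Imperial
```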
The dplyr package makes producing summary tables easy.
library(dplyr)
mtcars %>%
  summarize(n = n(),
            Min = min(mpg),
            Q1 = quantile(mpg, 0.25),
            Avg_MPG = mean(mpg),
            Q3 = quantile(mpg, 0.75),
            Max = max(mpg)
  )
## # A tibble: 1 x 6
##       n   Min    Q1 Avg_MPG    Q3   Max
##   <int> <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1    32  10.4  15.4    20.1  22.8  33.9
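Adding group_by() before summarize() produces one row of the summary per group. A sketch, grouping by the number of cylinders (cyl); the .groups = "drop" argument (available in dplyr 1.0.0 and later) returns an ungrouped result.

```r
library(dplyr)

mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%          # one group per cylinder count (4, 6, 8)
  summarize(n = n(),
            Min = min(mpg),
            Avg_MPG = mean(mpg),
            Max = max(mpg),
            .groups = "drop")
mpg_by_cyl
```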
Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College