Calculating Statistics

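Many R functions return NA by default when the data contain missing values. If any of the functions below returns NA, the variable most likely contains missing data. Add the argument na.rm = TRUE to the function call to drop the missing values before computing, or use the favstats() function in the mosaic package as an alternative, which excludes missing values by default and reports how many were missing.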
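To see what na.rm does, here is a minimal sketch using a small made-up vector. The vector x is hypothetical and is not part of mtcars, which has no missing values.

```r
# Hypothetical vector with one missing value (not taken from mtcars)
x <- c(2.5, NA, 4.0, 3.5)

mean(x)                 # returns NA because one value is missing
mean(x, na.rm = TRUE)   # drops the NA, then averages the remaining three values
sum(is.na(x))           # counts how many values are missing
```

The same na.rm = TRUE argument works with median(), sd(), var(), min(), max(), and quantile().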

library(mosaic) # loads the favstats() function

The variables used in these examples come from the mtcars dataset, which is built into R.

Base R and mosaic

Mean

The mean, also known as the average, is the sum of the values divided by the number of values. Here are two different ways to calculate the mean.

mean(mtcars$wt)
favstats(mtcars$wt)$mean
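As a quick check on what mean() computes, the sketch below adds up the weights and divides by how many there are. The names x and manual_mean are only for illustration.

```r
x <- mtcars$wt

# Sum of the values divided by the number of values
manual_mean <- sum(x) / length(x)

# TRUE when the hand calculation agrees with mean()
all.equal(manual_mean, mean(x))
```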

Median

The median is the middle value of the sorted data: half of the observations fall at or below it and half fall at or above it. Here are two different ways to calculate the median.

median(mtcars$wt)
favstats(mtcars$wt)$median
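The median can also be found by hand from the sorted values. This is a small sketch of that idea; the names x, n, and manual_median are only for illustration.

```r
x <- sort(mtcars$wt)   # order the values first
n <- length(x)

# Odd n: take the middle value. Even n: average the two middle values.
manual_median <- if (n %% 2 == 1) {
  x[(n + 1) / 2]
} else {
  mean(x[c(n / 2, n / 2 + 1)])
}

# TRUE when the hand calculation agrees with median()
all.equal(manual_median, median(mtcars$wt))
```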

Sort

The sort() function returns the values ordered from least to greatest.

sort(mtcars$wt)
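Two common variations, shown here as a sketch: sorting from greatest to least, and using order() to rearrange whole rows of the data frame rather than a single column. The name heaviest_first is only for illustration.

```r
# Greatest to least
sort(mtcars$wt, decreasing = TRUE)

# order() returns row positions, so it can reorder the entire data frame
heaviest_first <- mtcars[order(mtcars$wt, decreasing = TRUE), ]

# Weight and mpg for the three heaviest cars
head(heaviest_first[, c("wt", "mpg")], 3)
```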

Sample Standard Deviation

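The standard deviation is a measure of spread: roughly, how far the observations typically fall from the mean, in the same units as the data. Both functions below compute the sample standard deviation, which divides by n - 1. Here are two different ways to calculate the standard deviation.

sd(mtcars$wt)
favstats(mtcars$wt)$sd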
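The calculation behind sd() can be written out step by step. This is a sketch of the sample formula with the n - 1 denominator; the names x, n, and manual_sd are only for illustration.

```r
x <- mtcars$wt
n <- length(x)

# Square root of the summed squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

# TRUE when the hand calculation agrees with sd()
all.equal(manual_sd, sd(x))
```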

Sample Variance

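The variance is a measure of spread equal to the square of the standard deviation, so it is expressed in squared units. The var() function computes the sample variance, which divides by n - 1.

var(mtcars$wt)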
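The short sketch below checks how var() relates to sd(), and shows one way to rescale the sample variance if a divide-by-n (population-style) version is wanted. The name pop_var is only for illustration; base R has no separate function for it.

```r
x <- mtcars$wt
n <- length(x)

# The sample variance is the squared sample standard deviation
all.equal(var(x), sd(x)^2)

# Rescale from the n - 1 denominator to an n denominator
pop_var <- var(x) * (n - 1) / n
pop_var
```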

Percentiles

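Here are two different ways to calculate percentiles. The quantile() function accepts any set of probabilities, while favstats() reports only the quartiles (Q1, median, and Q3).

quantile(mtcars$wt, c(.25, .50)) # 25th and 50th percentiles
favstats(mtcars$wt)$Q1 # 1st quartile (25th percentile)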
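A few more things quantile() can do, sketched below: other percentiles, a different interpolation rule through the type argument, and the interquartile range. The name iqr_manual is only for illustration.

```r
x <- mtcars$wt

# 10th and 90th percentiles
quantile(x, probs = c(0.10, 0.90))

# type selects the interpolation rule (the default is type = 7);
# other types can give slightly different answers on small datasets
quantile(x, probs = 0.25, type = 6)

# Interquartile range: 75th percentile minus 25th percentile
iqr_manual <- unname(quantile(x, 0.75) - quantile(x, 0.25))
all.equal(iqr_manual, IQR(x))
```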

Minimum

The smallest value in the dataset. Here are two different ways to calculate the minimum value of a variable.

min(mtcars$wt)
favstats(mtcars$wt)$min

Maximum

The largest value in the dataset. Here are two different ways to calculate the maximum value of a variable.

max(mtcars$wt)
favstats(mtcars$wt)$max
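When both extremes are needed, base R's range() returns them together. A brief sketch, with r as an illustrative name:

```r
# range() returns a vector of length two: c(minimum, maximum)
r <- range(mtcars$wt)
r

# Width of the data: maximum minus minimum
diff(r)
```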

Which

The which() function looks up the row / index / case number(s) of the values that satisfy a condition. For example, suppose you wish to know which car in the mtcars dataset has a weight of 1.835 (thousand pounds).

which(mtcars$wt == 1.835)

## [1] 20

You could also look up which car(s) had the minimum weight by typing the following.

which(mtcars$wt == min(mtcars$wt))

## [1] 28

You could look up which car(s) weigh at least 5 (thousand pounds).

which(mtcars$wt >= 5)

## [1] 15 16 17
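Row numbers are often more useful when turned into car names. The sketch below uses the row names of mtcars for that, and also shows which.min() as a shortcut and a tolerance-based comparison that is safer than == for decimal values. The names heavy_cars and lightest, and the tolerance 1e-8, are illustrative choices.

```r
# Names of the cars weighing at least 5 (thousand pounds)
heavy_cars <- rownames(mtcars)[which(mtcars$wt >= 5)]
heavy_cars

# which.min() returns the index of the (first) smallest value
which.min(mtcars$wt)
lightest <- rownames(mtcars)[which.min(mtcars$wt)]
lightest

# Compare decimals within a small tolerance instead of testing exact equality
which(abs(mtcars$wt - 1.835) < 1e-8)
```

There is a matching which.max() function for the largest value.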

Tidyverse

Multiple Statistics

The dplyr package makes producing summary tables easy.

library(dplyr)
mtcars %>%
  summarize(n = n(),
            Min = min(mpg),
            Q1 = quantile(mpg, .25),
            Avg_MPG = mean(mpg),
            Q3 = quantile(mpg, .75),
            Max = max(mpg))

## # A tibble: 1 x 6
##       n   Min    Q1 Avg_MPG    Q3   Max
##   <int> <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1    32  10.4  15.4    20.1  22.8  33.9
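The same approach extends to summaries by group. As a sketch, the code below adds group_by() to get one row per number of cylinders; grouping by cyl and the column names chosen here are illustrative, not part of the original example.

```r
library(dplyr)

# One summary row for each value of cyl (4, 6, and 8 cylinders)
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(),
            Avg_MPG = mean(mpg),
            SD_MPG = sd(mpg))

by_cyl
```

If mpg contained missing values, each statistic would need na.rm = TRUE, for example mean(mpg, na.rm = TRUE).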

Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College