Sampling

Random Digit Table

Just like the random digit table in your stats book, your computer can produce a sequence of seemingly random outcomes. To make a randomization reproducible, set a starting point (called a seed) so that the computer knows where to start looking for random numbers. In the table analogy, the following command sets the starting line to 142.

set.seed(142); # Sets the starting point in the random digit table

By doing this, others will now be able to reproduce the randomization scheme that you specified in your experimental design. Compare the output from the following two blocks of code.

set.seed(142); # Sets the starting point in the random digit table
sample(c(1,2,3,4,5,6) ); # Generates the random sample

## [1] 5 4 1 3 2 6

set.seed(142); # Sets the starting point in the random digit table
sample(c(1,2,3,4,5,6) ); # Generates the random sample

## [1] 5 4 1 3 2 6

The output is the same because the starting point in the random digit table was the same each time a random sample was generated.

If an additional sample is generated without resetting the seed, the output will generally be different because the starting point in the random digit table was not reset.

sample(c(1,2,3,4,5,6) ); # Generates the random sample

## [1] 3 2 5 4 1 6

Every time set.seed(142) is run, the computer looks at the beginning of line 142 for the random value. If set.seed(142) is not run, the sample is generated using the position in the random digits table where the previous sample left off.
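One way to check this behavior yourself is to store each sample and compare them. The following is a minimal sketch; the names first.draw, second.draw, and rerun.draw are illustrative and are not part of the example above.

```r
set.seed(142)                 # Sets the starting point
first.draw  <- sample(1:6)    # First sample after setting the seed
second.draw <- sample(1:6)    # Continues from where the first sample left off

set.seed(142)                 # Resets to the same starting point
rerun.draw <- sample(1:6)     # Reproduces the first sample

identical(first.draw, rerun.draw)   # TRUE: same seed, same first sample
```

Note that the same seed can give different numbers in different versions of R. R 3.6.0 changed the default method used by sample(), so the output printed on this page may not match what you see on your own computer.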

Sampling

Suppose you want to generate 25 randomly selected items from a list of 100.

my.list <- c(1:100); # Creates a list of the integers 1 through 100
my.sample <- sample(my.list, size=25, replace=FALSE); # Draws 25 unique integers from the list
my.sample; # Displays the sampled integers

##  [1]  5 21 90 30 63  6 56 99  8 11  7 96 52  1 15 39 81 62 67 78 34 83 47 86 98
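The replace argument controls whether an item can be drawn more than once. The sketch below compares the two settings; the names no.repeats and may.repeat are illustrative, and no output is shown because no seed is set.

```r
my.list <- 1:100   # The integers 1 through 100

no.repeats <- sample(my.list, size = 25, replace = FALSE)  # Each item can be drawn at most once
may.repeat <- sample(my.list, size = 25, replace = TRUE)   # An item can be drawn more than once

anyDuplicated(no.repeats)    # 0 means no value was drawn twice
length(unique(may.repeat))   # Can be less than 25 when values repeat
```

For choosing subjects for a study you would normally keep replace=FALSE so that nobody is selected twice.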

Suppose you want to randomly select two names from a list of five.

my.names <- c("Steve", "Bruce", "Tony", "Peter", "Flint"); # creates a list of names
my.sample <- sample(my.names, size=2, replace=FALSE); # Draws 2 unique names from the list
my.sample; # Displays the sampled names

## [1] "Steve" "Peter"

You can generate another sample by re-running the last two lines of code.

my.sample <- sample(my.names, size=2, replace=FALSE); # Draws 2 unique names from the list
my.sample; # Displays the sampled names

## [1] "Bruce" "Tony"
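If you need several samples at once, re-running the same lines by hand gets tedious. One possible shortcut, shown here only as a sketch, is the base R function replicate(), which repeats an expression a given number of times. The name many.samples is illustrative.

```r
my.names <- c("Steve", "Bruce", "Tony", "Peter", "Flint")  # The list of names

# Draws 3 separate samples of 2 unique names each
many.samples <- replicate(3, sample(my.names, size = 2, replace = FALSE))
many.samples   # A matrix with 2 rows; each column is one sample
```

A name cannot repeat within a column, but the same name can appear in more than one column because each sample is drawn independently.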

Example

Load the file example.csv. See Loading Data for help.

Load the data.

example.data <- read.csv(file="http://homepages.gac.edu/~anienow2/MCS_142/R/example.csv", header=TRUE, sep=",")

Display the dataset.

example.data

##     Name Height Eye.Color   Hand Gender
## 1 Shelly     55      Blue  Right      F
## 2  Alvin     60     Green   Left      M
## 3  Simon     67      Blue  Right      M
## 4   Theo     72     Green  Right      M
## 5  Nicki     54     Brown  Right      F

Set the random seed.

set.seed(1979)

Generate a list of row numbers to sample from. In this case, there are 5 subjects in the dataset, so the list will be 1, 2, 3, 4, 5. Since the number of subjects is often unknown, you can generalize this by using the length() function. Running the code below draws a sample of size 2 from the numbers 1 through 5.

my.list <- sample( c(1:length(example.data$Name)), size=2, replace=FALSE); 
my.list;

## [1] 5 2
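An alternative way to count the subjects, sketched below, is nrow(), which counts the rows of the dataset without referring to a particular column. The data frame is typed in by hand to mirror the dataset displayed above so that the block runs without downloading the file; n.subjects is an illustrative name. No output is shown because it depends on the seed and on your version of R.

```r
# Hand-typed copy of the dataset displayed above
example.data <- data.frame(
  Name      = c("Shelly", "Alvin", "Simon", "Theo", "Nicki"),
  Height    = c(55, 60, 67, 72, 54),
  Eye.Color = c("Blue", "Green", "Blue", "Green", "Brown"),
  Hand      = c("Right", "Left", "Right", "Right", "Right"),
  Gender    = c("F", "M", "M", "M", "F")
)

n.subjects <- nrow(example.data)   # Number of rows (subjects) in the dataset
my.list <- sample(seq_len(n.subjects), size = 2, replace = FALSE)  # Draws 2 unique row numbers
my.list
```

Here seq_len(n.subjects) produces the same list 1, 2, 3, 4, 5 as c(1:length(example.data$Name)).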

You can see the data for the individuals that were sampled by typing the following.

example.data[my.list, ];

##    Name Height Eye.Color  Hand Gender
## 5 Nicki     54     Brown Right      F
## 2 Alvin     60     Green  Left      M

You can make and display a dataset of just the individuals that are in the sample by running the following:

new.data <- example.data[my.list, ]; 
new.data;

##    Name Height Eye.Color  Hand Gender
## 5 Nicki     54     Brown Right      F
## 2 Alvin     60     Green  Left      M
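If you also want the individuals who were not sampled, for example to form a second group, negative indexing is one option. This is a sketch that goes slightly beyond the example above: the data frame is typed in by hand to mirror the displayed dataset, and other.data is an illustrative name.

```r
# Hand-typed copy of the dataset displayed above
example.data <- data.frame(
  Name      = c("Shelly", "Alvin", "Simon", "Theo", "Nicki"),
  Height    = c(55, 60, 67, 72, 54),
  Eye.Color = c("Blue", "Green", "Blue", "Green", "Brown"),
  Hand      = c("Right", "Left", "Right", "Right", "Right"),
  Gender    = c("F", "M", "M", "M", "F")
)

set.seed(1979)
my.list <- sample(seq_len(nrow(example.data)), size = 2, replace = FALSE)  # Sampled row numbers

new.data   <- example.data[my.list, ]    # Rows that are in the sample
other.data <- example.data[-my.list, ]   # Rows that are not in the sample
other.data
```

The minus sign tells R to drop the listed rows, so new.data and other.data together contain every individual exactly once.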

Mathematics, Computer Science, and Statistics Department, Gustavus Adolphus College