Resampling doesn’t need to be hard

code
intermediate
r
resampling
Author

Daniel Kick

Published

November 4, 2020

As an example to show how accessible resampling can be, here’s bit of code that resamples an anova and computes an empirical p value.

temp is a dataframe containing the data Condition is a column with exactly that temp_col is the name of a dependent variable. It’s a string to make this easy to reuse. if you haven’t used map before it’s basically a for loop that returns a list. When the output get’s passed into unlist it becomes an array.

temp_shuffle <- temp
resample_array <- map(1:1000, function(i){
     temp_shuffle$Condition <- sample(temp_shuffle$Condition, replace = F)
     fm <- lm(as.formula(paste0(temp_col, " ~ Condition")), data = temp_shuffle)      
     return(car::Anova(fm)[1,3])
}) %>% unlist()
ep <- mean(resample_array >= car::Anova(fm)[1,3])

The down side is that it takes orders of magnitude more time to run because you’re running the same code hundreds or thousands of times. This is only a problem if you need crazy high precision or have a really complex/hard to fit model. For reference using the code above took about ~2 seconds/dv for 1000 iterations on my machine.

A handy pattern is to use map to summarize data and then bind it.

map_res <- map(names(M)[names(M) != "Sample"], function(i){
  res <- shapiro.test(M[[i]])

  return(
    list(
    mrna = i,
    p = res$p.value
    )
  )
})

shapiro_res <- do.call(rbind, map_res)