Resampling doesn’t need to be hard

code

intermediate

resampling

Author

Daniel Kick

Published

November 4, 2020

As an example of how accessible resampling can be, here’s a bit of code that resamples an ANOVA and computes an empirical p value.

temp is a dataframe containing the data. Condition is a column with exactly that: the condition label for each row. temp_col is the name of a dependent variable; it’s a string to make this easy to reuse. If you haven’t used map before, it’s basically a for loop that returns a list. When the output gets passed into unlist it becomes a plain vector.
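If map is new to you, here is a minimal sketch of the comparison, using a toy calculation (squaring 1 to 3) that is not part of the analysis. The variable names are made up for illustration.

```r
library(purrr)  # map() comes from purrr

# A for loop that collects squares into a list
squares_loop <- vector("list", 3)
for (i in 1:3) {
  squares_loop[[i]] <- i^2
}

# The same thing with map(): one list element per input
squares_map <- map(1:3, function(i) i^2)

# unlist() flattens the list into a plain numeric vector
squares_vec <- unlist(squares_map)

# Check that the loop and map() built the same list
same_result <- identical(squares_loop, squares_map)
```

The map version skips the bookkeeping of pre-allocating and indexing the list, which is most of why it reads more cleanly inside a resampling loop.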

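library(purrr) # provides map() and %>%
# F statistic from the observed (unshuffled) data
fm_obs <- lm(as.formula(paste0(temp_col, " ~ Condition")), data = temp)
f_obs <- car::Anova(fm_obs)[1, 3]

temp_shuffle <- temp
resample_array <- map(1:1000, function(i){
     # shuffle the labels to break any link between Condition and the dv
     temp_shuffle$Condition <- sample(temp_shuffle$Condition, replace = FALSE)
     fm <- lm(as.formula(paste0(temp_col, " ~ Condition")), data = temp_shuffle)
     return(car::Anova(fm)[1,3])
}) %>% unlist()
# empirical p value: share of shuffled F statistics at least as large as the observed one
ep <- mean(resample_array >= f_obs)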

The downside is that it takes orders of magnitude more time to run because you’re running the same code hundreds or thousands of times. This is only a problem if you need crazy high precision or have a really complex/hard-to-fit model. For reference, the code above took about 2 seconds per dependent variable for 1000 iterations on my machine.
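To get a feel for the cost on your own machine, here is a self-contained sketch that times the same idea on simulated data. Everything in it is an assumption made for illustration: the data frame sim, the column dv1, and the helper get_f are made up, and base R’s anova stands in for car::Anova so it runs without extra packages (for a model with a single predictor the two give the same F statistic).

```r
set.seed(1)  # make the simulated data and the shuffles repeatable

# Simulated stand-in for temp: three conditions, one dependent variable
sim <- data.frame(
  Condition = factor(rep(c("a", "b", "c"), each = 10)),
  dv1 = rnorm(30)
)
sim_col <- "dv1"
sim_formula <- as.formula(paste0(sim_col, " ~ Condition"))

# Hypothetical helper: F statistic for Condition from a fitted lm
get_f <- function(d) anova(lm(sim_formula, data = d))[1, "F value"]

# Observed F statistic from the unshuffled data
f_obs <- get_f(sim)

n_iter <- 1000
elapsed <- system.time({
  # Null distribution: refit the model after shuffling the condition labels
  f_null <- vapply(seq_len(n_iter), function(i) {
    shuffled <- sim
    shuffled$Condition <- sample(shuffled$Condition, replace = FALSE)
    get_f(shuffled)
  }, numeric(1))
})["elapsed"]

# Empirical p value: share of shuffled F statistics >= the observed one
ep <- mean(f_null >= f_obs)
```

With n_iter shuffles the empirical p value can only move in steps of 1/n_iter, so the iteration count sets the precision and elapsed shows what that precision costs.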

A handy pattern is to use map to summarize data and then bind it.

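# map() is from purrr; loop over every column of M except the Sample IDs
map_res <- map(names(M)[names(M) != "Sample"], function(i){
  # Shapiro-Wilk normality test on one column
  res <- shapiro.test(M[[i]])

  return(
    list(
    mrna = i,
    p = res$p.value
    )
  )
})

# stack the per-column results, one row per mRNA
shapiro_res <- do.call(rbind, map_res)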
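One variation on this pattern, sketched on made-up data: if each iteration returns a one-row data.frame instead of a list, rbind gives back an ordinary data frame rather than a matrix whose cells are lists. M_demo and its geneA and geneB columns are hypothetical stand-ins for M, and lapply stands in for map so the sketch runs in base R.

```r
set.seed(2)  # repeatable fake data

# Hypothetical stand-in for M: a Sample column plus two measurement columns
M_demo <- data.frame(
  Sample = paste0("s", 1:20),
  geneA = rnorm(20),
  geneB = rexp(20)
)

# One Shapiro-Wilk test per measurement column, returned as a one-row data frame
demo_res <- lapply(setdiff(names(M_demo), "Sample"), function(i) {
  res <- shapiro.test(M_demo[[i]])
  data.frame(mrna = i, p = res$p.value)
})

# Bind the rows into a single data frame with columns mrna and p
demo_tbl <- do.call(rbind, demo_res)
```

Because demo_tbl is a regular data frame, its columns can be sorted or filtered directly, for example demo_tbl[demo_tbl$p < 0.05, ].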