Resampling doesn’t need to be hard
To show how accessible resampling can be, here’s a bit of code that resamples an ANOVA and computes an empirical p value.
temp is a data frame containing the data, Condition is the column holding each row’s condition, and temp_col is the name of a dependent variable. It’s a string to make this easy to reuse. If you haven’t used map before, it’s basically a for loop that returns a list. When the output gets passed into unlist it becomes a vector.
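library(purrr)  # map() and %>%

# Fit on the real data for the observed F statistic (this step is assumed; the original snippet reused fm here)
fm_obs <- lm(as.formula(paste0(temp_col, " ~ Condition")), data = temp)
temp_shuffle <- temp
resample_array <- map(1:1000, function(i){
  # Shuffle the condition labels, refit, and keep the F statistic
  temp_shuffle$Condition <- sample(temp_shuffle$Condition, replace = F)
  fm <- lm(as.formula(paste0(temp_col, " ~ Condition")), data = temp_shuffle)
  return(car::Anova(fm)[1,3])
}) %>% unlist()
# Empirical p value: share of shuffled F statistics at least as large as the observed one
ep <- mean(resample_array >= car::Anova(fm_obs)[1,3])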
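Because the dependent variable is passed as a string, the whole procedure wraps neatly into a function. Here is a minimal sketch of that idea, not the code used for the analysis above. The helper name perm_anova_p and the simulated data are made up for illustration. It also swaps in base R's anova() and vapply() for car::Anova() and map() so that it runs without extra packages; with a single predictor the F statistic should be the same either way.

```r
# Hypothetical helper: permutation test for a one-way ANOVA, base R only.
perm_anova_p <- function(data, dv, n_iter = 1000) {
  form <- as.formula(paste0(dv, " ~ Condition"))
  # F statistic for Condition from a model fitted to data frame d
  f_stat <- function(d) anova(lm(form, data = d))["Condition", "F value"]
  observed <- f_stat(data)
  shuffled <- data
  null_f <- vapply(seq_len(n_iter), function(i) {
    # Shuffling the labels breaks any real link between Condition and the DV
    shuffled$Condition <- sample(data$Condition)
    f_stat(shuffled)
  }, numeric(1))
  # Proportion of shuffled F values at least as large as the observed one
  mean(null_f >= observed)
}

# Simulated data (an assumption for illustration): group "b" is shifted upward
set.seed(1)
sim <- data.frame(
  Condition = factor(rep(c("a", "b", "c"), each = 20)),
  score = rnorm(60) + rep(c(0, 1.5, 0), each = 20)
)
ep_sim <- perm_anova_p(sim, "score", n_iter = 500)
```

To run it over several dependent variables, call it once per column name, for example with sapply over a character vector of names.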
The downside is that it takes orders of magnitude more time to run, because you’re running the same code hundreds or thousands of times. This is only a problem if you need very high precision or have a really complex or hard-to-fit model. For reference, the code above took about 2 seconds per dependent variable for 1000 iterations on my machine.
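To put a rough number on the precision side of that tradeoff: an empirical p value is a proportion estimated from the shuffles, so its sampling error can be approximated with the usual binomial standard error. The sketch below assumes a true p value of 0.05 purely for illustration, and mc_se is a made-up helper name.

```r
# Approximate standard error of an empirical p value p estimated from n_iter shuffles
mc_se <- function(p, n_iter) sqrt(p * (1 - p) / n_iter)

n_iter <- c(100, 1000, 10000)
se <- mc_se(0.05, n_iter)  # 0.05 is an illustrative value, not a result
data.frame(n_iter = n_iter, se = round(se, 4))
```

The error shrinks with the square root of the iteration count, so ten times the run time buys only about a threefold gain in precision.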
A handy pattern is to use map to summarize data and then bind it.
# One Shapiro-Wilk test per column of M, skipping the Sample column
map_res <- map(names(M)[names(M) != "Sample"], function(i){
  res <- shapiro.test(M[[i]])
  return(
    list(
      mrna = i,
      p = res$p.value
    )
  )
})
shapiro_res <- do.call(rbind, map_res)
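One thing to watch with this pattern: calling do.call(rbind, ...) on a list of lists gives a matrix whose cells are list elements, not a regular data frame. If you want typed columns, one option is to return a one-row data frame from each iteration instead. This is a sketch using base R's lapply and a simulated stand-in for M; the column names geneA and geneB are invented for the example.

```r
# Simulated stand-in for M (an assumption): one ID column plus numeric columns
set.seed(42)
M_sim <- data.frame(
  Sample = paste0("s", 1:30),
  geneA = rnorm(30),
  geneB = rexp(30)
)

res_list <- lapply(setdiff(names(M_sim), "Sample"), function(i) {
  res <- shapiro.test(M_sim[[i]])
  # A one-row data frame per variable keeps column types intact when binding
  data.frame(mrna = i, p = res$p.value)
})

# rbind on data frames returns a data frame with a numeric p column
shapiro_df <- do.call(rbind, res_list)
```

From there the result can be sorted or filtered like any other data frame, for example shapiro_df[shapiro_df$p < 0.05, ].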