Kumle, L., Vo, M. L-H., & Draschkow, D.
latest update: May 2021
This notebook aims at accompanying Scenario 1 in Kumle, Vo & Draschkow (2021).
Additionally to giving a brief general introduction to simulation-based power analyses, this notebook focuses on using power analysis as a tool for a priori sample size estimation. To do so, the packages mixedpower (Kumle, Vo & Draschkow, 2018) and simr (Green & Macleod, 2016) are used and all simulations are based on (generalized) mixed models fitted with lme4 (Bates, Maechler, Bolker & Walker, 2015). All simulations shown in this notebook are applicable for linear mixed models as well as generalized linear mixed models.
See Scenario 2 for power analyses varying factors other than participants and Scenario 3 for starting a simulation-based power estimation from scratch.
Being able to estimate power is important as it is closely linked to the reliability and replicability of empirical findings. Classical solutions to power analysis work with analytical formulas. However, (generalized) linear mixed models ((G)LMMs) are often too complex to be solved analytically and therefore require a different approach. A flexible and more intuitive alternative to analytic power solutions are simulation-based power analyses (Brysbaert & Stevens, 2018; Thomas & Juanes, 1996). In simple terms, one basic question behind power analyses is: “Suppose there really is an effect of a certain size and I run my experiment one hundred times - how many times will I get a statistically significant result?” (Coppock, 2013). As it is possible to simulate the outcome of an experiment, power can be calculated based on the proportion of significant simulations to all simulations (Johnson et al., 2015; Thomas & Juanes, 1996).
The shared principle of all simulation-based power analyses solutions can therefore be broken down into the following steps:
1) simulation of new data sets,
2) analysis of each data set and test for statistical significance, and
3) calculation of the proportion of significant to all simulations.
However, accuracy of the power estimate strongly depends on the accuracy of our simulation - informing the parameters of the simulation therefore is a critical step. A more detailed discussion can be found in Kumle, Vo & Draschkow (2021).
All simulations in this notebook will be based on an available and preceding well-powered design. This is possibly the most desired solution since we can utilize a (G)LMM fitted on real and independent empirical data to inform the simulation parameters.
Data used for this tutorial originates from Yan, Zhou, Shu, Yusupu, Miao, Kruegel & Kliegl (2014). All analyses and data were obtained from http://read.psych.uni-potsdam.de where the authors made them available.
Yan et all. (2014) tested 48 subjects, each of whom read 120 sentences investigating the effect of different factors on the first landing position (FLP, i.e. the position in a sentence your eyes first land on).
Suppose the goal is to conduct a study replicating and further investigating the effect of morphological complexity (i.e. number of suffixes) and word length – we would need to conduct a power analysis in order to inform the sample size of the follow-up study.
To focus on the actual power analysis procedure, we already preprocessed the data set according to the author’s analysis and the data can be downloaded here. Moreover, several variables have been renamed for more clarity.
# get data
load("Yan_et_al.RData") # data set is called "YanData"
First, we need a corresponding model fitted with lme4.
library(lme4)
# LMM including word length and word complexity as fixed effects and random intercepts for subject and sentence
FLPmodel <- lmer(flp ~ word_length * complexity + (1|subject) + (1|sentence),
data = YanData)
summary(FLPmodel, corr = F) # let's have a look!
## Linear mixed model fit by REML ['lmerMod']
## Formula: flp ~ word_length * complexity + (1 | subject) + (1 | sentence)
## Data: YanData
##
## REML criterion at convergence: 54454.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0880 -0.6547 0.0432 0.6640 5.6162
##
## Random effects:
## Groups Name Variance Std.Dev.
## sentence (Intercept) 0.05543 0.2354
## subject (Intercept) 0.35487 0.5957
## Residual 3.20984 1.7916
## Number of obs: 13523, groups: sentence, 120; subject, 48
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.66181 0.09082 40.320
## word_length 1.51191 0.04605 32.832
## complexity -0.07534 0.02531 -2.977
## word_length:complexity 0.11581 0.03293 3.517
The FLPmodel includes word length ( ß = 1.511) and morphological complexity (ß = - 0.075) as well as their interaction (ß = 0.116) as fixed effects (see Table 1). Moreover, we included random intercepts for the random effects of subjects and sentence (i.e. stimuli) making this model a typical example with crossed random effects as described by Baayen et al. (2008).
While it is possible to implement power analyses from scratch, using preprogrammed packaged saves us a lot of time and effort. To start with, we will make use of the mixedpower package (Kumle, Vo & Draschkow, 2018). An overview of all functions included in mixedpower can be found in its documentation and more theoretical guidance on the specific parameters can be found in Kumle, Vo & Draschkow (2021).
# install mixedpower
if (!require("devtools")) {
install.packages("devtools", dependencies = TRUE)}
devtools::install_github("DejanDraschkow/mixedpower") # mixedpower is hosted on GitHub
# load library
library(mixedpower)
Since we want to explore power for different sample sizes, which corresponds to one of our random variables (i.e. subject), we will make use of the mixedpower() function.
As can be seen in the documentation, using mixedpower() requires to specify various parameters. To get a quick overview of those parameters, ?mixedpower can be used as well. While those parameters can be directly specified as arguments in the function, we chose to make it more explicit and specify them beforehand by assigning them to variables.
First, we will provide some general information about the model we have chosen to simulate power for.
# ------------------------------------------ #
# INFORMATION ABOUT MODEL USED FOR SIMULATION
model <- FLPmodel # which model do we want to simulate power for?
data <- YanData # data used to fit the model
fixed_effects <- c("word_length", "complexity") # all fixed effects specified in FLPmodel
simvar <- "subject" # which random effect do we want to vary in the simulation?
Next, we will set the parameters determining the details of our simulation.
# ------------------------------------------ #
# SIMULATION PARAMETERS
steps <- c(20, 30, 40, 50, 60) # which sample sizes do we want to look at?
critical_value <- 2 # which t/z value do we want to use to test for significance?
n_sim <- 1000 # how many single simulations should be used to estimate power?
Finally, we will combine this information in the mixedpower()-function and run the power analysis. This might take a while (depending on the complexity and size of the model as well as on the number of cores on the machine used for simulation) … time to take a short break! Running on a machine with 6 cores, the following simulation will take approximately 10-15 minutes …
# ------------------------------------------ #
# RUN SIMULATION
power_FLP <- mixedpower(model = FLPmodel, data = YanData,
fixed_effects = c("word_length", "complexity"),
simvar = "subject", steps = c(20,30,40,50,60),
critical_value = 2, n_sim = 1000)
# let's have a first look at the results:
power_FLP
Comparing the parameters we set above with the documentation or help page of mixedpower reveals that we skipped two parameters - namely SESOI = F and databased = T. Both parameters determine the effect sizes used for simulation and the default is to include a databased simulation (i.e. using the beta coefficients found in the specified model) and no SESOI simulation (i.e. specifiying a Smalles effect of interest which is used to simulate power). However, we can provide a vector with new beta coefficients and hand it to mixedpower() using the SESOI argument. Theoretical guidance on this issue can be found in the accompanying tutorial paper (Kumle, Vo & Draschkow (2021)).
So let’s include a SESOI simulation as well:
# ------------------------------------------ #
# INCLUDE SESOI SIMULATION
SESOI <- c(3.66, 0.75, -0.065, 0.09) # specify SESOI
power_SESOI <- mixedpower(model = FLPmodel, data = YanData,
fixed_effects = c("word_length", "complexity"),
simvar = "subject", steps = c(20,30,40,50,60),
critical_value = 2, n_sim = 1000,
SESOI = SESOI, databased = T)
Note that we set databased = T again. Since we already estimated databased power above, we could save some time and not run it again this time (this will result it slightly different estimates due to the nature of our simulations). However, we will run it again to later plot both estimations at once.
# let's have a first look at the results:
power_SESOI
Being able to interpret and combine the results of power simulations is important for a power analysis to be a useful assistant in experimental planning. Before we try to interpret the results of our power analysis, we will plot it using the plotting function multiplotPower() included in the mixedpower package.
# ------------------------------------------ #
# PLOT RESULTS
multiplotPower(power_SESOI)
Finally, we need to transfer the plots and results into a decision regarding the sample size of our planned study. General guidance can be found in the accompanying tutorial paper. In this notebook, we will adapt a strategy resulting in > 80% power for all effects included in the model. Using the databased estimate, we would agree to test 50 participants. However, using the SESOI estimate, we would test more than 60 participants as we do not reach a power of >80% for all effects with a sample size of 60. Moreover, additional SESOI-simulations with larger sample sizes would be necessary to reach a sample size that fulfils our power criterion.
There are other resources to perform simulation-based power analyses, one of them being simr (Green & Macleod, 2016). The tutorial mainly focusses on mixedpower, as this package has advantages in computational speed and allows for more factors to be varied simultaneously. However, simr provides some exceptionally easy to implement functions that are worth looking at.
Similar to the mixedpower()-function, simr provides a function called powerCurve() which allows to explore power for different sample sizes. However, all functions included in simr simulate power for one effect at a time - we therefore need to specify which effect we want to test. To get a first overview, we would like to explore power for 20, 30, 40 and 60 participants. As this includes a sample size larger than in the data set used for simulation, we need to extend the data set before we hand it to the simulation.
library(simr)
# -------------------------------------------- #
# EXTEND DATA SET
FLPmodel_extended <- extend(FLPmodel, along="subject", n = 60) #extend data set to include 60 participants
All functions included in simr simulate power for one effect at a time - we therefore need to specify which effect we want to test. Moreove, since computational speed in simr is limited for more complex models and data, we will only include 100 simulations.
library(simr)
# -------------------------------------------- #
# POWER ANALYSIS WITH SIMR
powerC <- powerCurve(fit = FLPmodel_extended, test = fixed("complexity"), along = "subject",
breaks = c(20, 30, 40, 60))
# let's have a look at the results
print(powerC)
## Power for predictor 'complexity', (95% confidence interval),
## by number of levels in subject:
## 20: 41.00% (31.26, 51.29) - 5662 rows
## 30: 58.00% (47.71, 67.80) - 8523 rows
## 40: 72.00% (62.13, 80.52) - 11093 rows
## 60: 84.00% (75.32, 90.57) - 16934 rows
##
## Time elapsed: 73 h 56 m 18 s
We see that results are similar to the results achieved with mixedpower although they differ slightly. This is to be expected, as the simulations in simr are based on only 100 repetitions and therefore are more prone to variation.
Simr also comes with an option to change the effect sizes in accordance with a SESOI approach. All that has to be done is to modify the effect sizes in the model before handing it to the powerCurve()-function.
# -------------------------------------------- #
# SESOI WITH SIMR
fixef(FLPmodel)["complexity"] <- -0.065
After this step, we can just repeat the power analysis shown above.
Continue with Notebook 2 for power analyses varying factors other than sample size and Notebook 3 for starting a simulation-based power estimation from scratch.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). Fitting Linear Mixed-Effects Models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01
Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1), 1???20. https://doi.org/10.5334/joc.10
Coppock, A. (2013). 10 Things to Know About Statistical Power. Retrieved September 20, 2018, from http://egap.org/methods-guides/10-things-you-need-know-about-statistical-power
Green, P., & Macleod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493-498. https://doi.org/10.1111/2041-210X.12504
Johnson, P. C. D., Barry, S. J. E., Ferguson, H. M., & Müller, P. (2015). Power analysis for generalized linear mixed models in ecology and evolution. Methods in Ecology and Evolution, 6(2), 133–142. https://doi.org/10.1111/2041-210X.12306
Kumle, L., Võ, M.LH. & Draschkow, D. Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behav Res (2021). https://doi.org/10.3758/s13428-021-01546-0
Kumle, L., Vo, M. L-H., & Draschkow, D. (in preparation). Estimating power in linear and generalized linear mixed models: an open introduction and tutorial in R.
Thomas, L., & Juanes, F. (1996). The importance of statistical power analysis: An example from Animal Behaviour. Animal Behaviour, 52(4), 856-859. https://doi.org/10.1006/anbe.1996.0232
Yan, M., Zhou, W., Shu, H., Yusupu, R., Miao, D., Kruegel, A., & Kliegl, R. (2014). Eye movements guided by morphological structure: Evidence from the Uighur language. Cognition, 132(2), 181???215. https://doi.org/10.1016/j.cognition.2014.03.008