Scenario 2: Simulating different units (random variables)

Kumle, L., Vo, M. L-H., & Draschkow, D.

latest update: March 2020

This notebook accompanies Scenario 2 in Kumle, Vo & Draschkow (2021).

For a general introduction to simulation-based power analyses as well as using simulations to explore power for different sample sizes, see Scenario 1.

In this notebook, we will cover scenarios in which we wish to simulate power for random variables other than subject. To do so, the packages mixedpower (Kumle, Vo & Draschkow, 2018) and simr (Green & Macleod, 2016) are used and all simulations are based on (generalized) mixed models fitted with lme4 (Bates, Maechler, Bolker & Walker, 2015). All simulations shown in this notebook are applicable for linear mixed models as well as generalized linear mixed models.

See Scenario 3 for starting a simulation-based power estimation from scratch.

Import Data

As in Scenario 1, we will base our simulation on data published by Yan, Zhou, Shu, Yusupu, Miao, Kruegel & Kliegl (2014) retrieved from http://read.psych.uni-potsdam.de. A preprocessed data set can be downloaded here.

# get data
load("Yan_et_al.RData") # data set is called "YanData"

Yan et all. (2014) tested 48 subjects, each of whom read 120 sentences investigating the effect of different factors on the first landing position while reading (FLP, i.e. the position in a sentence your eyes first land on) - a more detailed introduction to the data set can be found in Scenario 1 where we varied the number of subjects. The levels of the second random parameter (i.e. stimuli/items), however, have been kept constant until now.

Again, we are interested in the effect of morphological complexity (i.e. number of suffixes) and word length and therefore will include them as fixed effects in our model fitted with lme4:

library(lme4)
#  LMM including word length and word complexity as fixed effects and random intercepts for subject and sentence
FLPmodel <- lmer(flp ~ word_length * complexity + (1|subject) + (1|sentence),
                 data = YanData)

summary(FLPmodel, corr = F) # let's have a look!

## Linear mixed model fit by REML ['lmerMod']
## Formula: flp ~ word_length * complexity + (1 | subject) + (1 | sentence)
##    Data: YanData
## 
## REML criterion at convergence: 54454.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0880 -0.6547  0.0432  0.6640  5.6162 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  sentence (Intercept) 0.05543  0.2354  
##  subject  (Intercept) 0.35487  0.5957  
##  Residual             3.20984  1.7916  
## Number of obs: 13523, groups:  sentence, 120; subject, 48
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)             3.66181    0.09082  40.320
## word_length             1.51191    0.04605  32.832
## complexity             -0.07534    0.02531  -2.977
## word_length:complexity  0.11581    0.03293   3.517

The corresponding FLPmodel includes word length ( ß = 1.511) and morphological complexity (ß = - 0.075) as well as their interaction (ß = 0.116) as fixed effects (see Table 1). Moreover, we included random intercepts for the random effects of subjects and sentence (i.e. stimuli) making this model a typical example with crossed random effects as described by Baayen et al. (2008).

Varying the number of stimuli

This time, we are not interested in changing the number of subjects but the number of sentences around the original number of 120.

mixedpower

Using the mixedpower package, all that has to be changed compared to Scenario 1 is the specification of simvar and its steps. For clarity, we will again specify all parameters before running the actual power analysis.

# install mixedpower
if (!require("devtools")) {
    install.packages("devtools", dependencies = TRUE)}
devtools::install_github("DejanDraschkow/mixedpower") # mixedpower is hosted on GitHub

# load library
library(mixedpower)

First, we will provide some general information about the model we have chosen to simulate power for. Note how we specified “sentence” as the random variable we want to vary.

# ------------------------------------------ #
# INFORMATION ABOUT MODEL USED FOR SIMULATION

model <- FLPmodel # which model do we want to simulate power for?
data <- YanData # data used to fit the model
fixed_effects <- c("word_length", "complexity") # all fixed effects specified in FLPmodel
simvar <- "sentence" # which random variable do we want to vary in the simulation?

Next, we will set the parameters determining the details of our simulation. As the original data set contains 120 sentences, we will explore power for a range of sentences around that number.
Similar to Scenario 1, we will also include the SESOI (i.e. smallest effect of interest) specification. An introduction to the SESOI can be found in Notebook 1 and the tutorial paper Kumle, Vo & Draschkow (2021).

# ------------------------------------------ #
# SIMULATION PARAMETERS
steps <- c(100, 120, 140, 160, 180) # which sample sizes do we want to look at?
critical_value <- 2 # which t/z value do we want to use to test for significance?
n_sim <- 1000 # how many single simulations should be used to estimate power?

# ------------------------------------------ #
# INCLUDE SESOI SIMULATION
SESOI <- c(3.66, 0.75, -0.065, 0.09) # specify SESOI

Finally, we will combine this information in the mixedpower()-function and run the power analysis. This might take a while (depending on the complexity and size of the model as well as on the number of cores on the machine used for simulation) … time to take a short break! Running on a machine with 6 cores, the following simulation will take approximately 10-15 minutes …

# ------------------------------------------ #
# RUN SIMULATION
power_sentences <- mixedpower(model = FLPmodel, data = YanData,
                              fixed_effects = c("word_length", "complexity"),
                              simvar = "subject", steps = c(100, 120, 140, 160, 180),
                              critical_value = 2, n_sim = 1000,
                              SESOI = SESOI, databased = T)

Let’s have a look at the results and plot them:

# let's have a first look at the results:
power_sentences

# ------------------------------------------ #
# PLOT RESULTS
multiplotPower(power_sentences)

simr

The same analyses can be done with the simr package (Green & Macleod, 2016) - with the difference that we have to test one effect at a time. Similar to the analysis with mixedpower, we just specify “sentence” as the variable we want to simulate along and extend the model along this random effect first.

library(simr)
library(simr)
# -------------------------------------------- #
# EXTEND DATA SET
FLPmodel_extended <- extend(FLPmodel, along="sentence", n = 180) #extend data set to include 180 sentences

# -------------------------------------------- #
# POWER ANALYSIS WITH SIMR
powerC_sentences <- powerCurve(fit = FLPmodel_extended, test = fixed("complexity"), along = "sentence",
                               breaks = c(100, 120, 140, 160, 180))

A more detailed tutorial for simr can be found here and the inclusion of the simr package in this notebook simply aims at introducing the package to give a more complete overview of existing resources.

Varying the levels multiple random variables simultaneously

So far, we only varied the levels of one random variable while we kept the other one constant. However, being able to vary both simultaneously would be desirable as it allows to estimate power for different combinations of levels.

To do so, we will make use of the R2power()-function in mixedpower which allows to do exactly that (check ?R2power or the documentation of mixedpower for more details). The principle behind the simulations in R2power stays the same as in the other approaches in mixedpower introduced earlier. First, new data sets are simulated before they are used to refit the model entered into the simulation. Second, all fixed effects (and interactions) in the refitted models are checked for significance before power is estimated as the proportion of significant to all simulation.
However, the simulation of new data sets now follows a two-step simulation process where the levels of one random variable are simulated before the levels of the second random variable are adapted. To do so, we need to specify the desired levels and names of both random variables.

Imagine wanting to explore power for a range of sentences - but not for 48 subjects like in the data published by Yan et al. (2014) but for 30 subjects. Consequentially, we would keep sentences as our “simvar” and add subject as the second varying random variable. All other parameters will stay the same as in the simulation run with mixedpower() above.

# ------------------------------------------ #
# SPECIFY R2 PARAMETERS
R2var <- "subject" # specify "subject" as the second random effect we want to vary
R2level <- 30 # which level should "subject" have?

# ------------------------------------------ #
# RUN SIMULATION
R2power <- R2power(model = FLPmodel, data = YanData,
                   fixed_effects = c("word_length", "complexity"),
                   simvar = "sentence", steps = c(100,120,140,160,180),
                   R2var = "subject", R2level = 30,  critical_value = 2,
                   n_sim = 1000, SESOI = SESOI, databased = T)

# let's have a look at the results
R2power

# ------------------------------------------ #
# PLOT RESULTS
multiplotPower(R2power)

Finally, we need to transfer the plots and results into a decision regarding the sample size of our planned study. General guidance can be found in the accompanying tutorial paper. In this notebook, we will adapt a strategy resulting in > 80% power for all effects included in the model. To reach this threshold, we could either test 48 subjects each of whom reads 160 sentences. In a scenario where we only have resources for 30 subjects even 180 sentences would not be enough to power our study with > 80% power for all effects if we base our decision on the SESOI effect (as can be seen in the results of the R2power simulation). Moreover, we could run more simulations for even more combinations and base our decision on them. Repeating this process with different combinations (e.g. different R2levels) enables an extensive overview of power and the factors that influence it, which can be especially useful in cases in which one of the random effects cannot be increased, i.e. a ceiling on the amount of participants or stimuli.

Continue with Notebook 3 for starting a simulation-based power estimation from scratch.

References

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). Fitting Linear Mixed-Effects Models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01

Green, P., & Macleod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493-498. https://doi.org/10.1111/2041-210X.12504

Kumle, L., Vo, M. L-H., & Draschkow, D. (2018). Mixedpower: a library for estimating simulation-based power for mixed models in R. https://doi.org/10.5281/zenodo.1341047

Kumle, L., Võ, M.LH. & Draschkow, D. Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behav Res (2021). https://doi.org/10.3758/s13428-021-01546-0

Yan, M., Zhou, W., Shu, H., Yusupu, R., Miao, D., Kruegel, A., & Kliegl, R. (2014). Eye movements guided by morphological structure: Evidence from the Uighur language. Cognition, 132(2), 181???215. https://doi.org/10.1016/j.cognition.2014.03.008