Nested random effects: A GLMM example.

Kumle, L., Vo, M. L-H., & Draschkow, D.

latest update: May 2021

This Notebook serves as an additional resource for Kumle, Vo & Draschkow (2021). While the main tutorial focusses on power analyses in (generalized) linear mixed models ((G)LMMs) with crossed random effects, this notebook briefly demonstrates the use of both the simr package (Green & Macleod, 2016) as well as the mixedpower package (Kumle, Vo & Draschkow, 2018) for designs with nested random effects.

For a more detailed introduction to both packages and their utilities, we recommend starting with the Scenarios introduced in the main tutorial paper and the corresponding Notebooks (Notebook 1, Notebook 2, Notebook 3). Additionally, more detailed information regarding the simr package can be found in Green & MacLeod (2016) and its documentation.

Example study: Patients nested in doctors and hospitals

Let us consider the second scenario introduced in this “Mixed effects logistic regression” tutorial:

A large HMO wants to know what patient and physician factors are most related to whether a patient’s lung cancer goes into remission after treatment as part of a larger study of treatment outcomes and quality of life in patients with lunger cancer.

In the current Notebook, we will extend their analysis by conducting a power analysis based on the three level logistic model with a random intercept for doctors and a random intercept for hospitals seen below (cf. model m3b). A detailed description of the data as well as all predictors can be found in the original tutorial.

library(lme4)

# load data:
hdp <- read.csv("https://stats.idre.ucla.edu/stat/data/hdp.csv")

# preprocess data according to tutorial:
hdp <- within(hdp, {
  Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
  DID <- factor(DID)
  HID <- factor(HID)
  CancerStage <- factor(CancerStage)
})


# fit model:
model_final <- glmer(remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +
               Experience + (1 + LengthofStay | DID) + (1 | HID), data = hdp, family = binomial,
              nAGQ = 1)

summary(model_final, corr = F)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +  
##     Experience + (1 + LengthofStay | DID) + (1 | HID)
##    Data: hdp
## 
##      AIC      BIC   logLik deviance df.resid 
##   7147.7   7246.5  -3559.9   7119.7     8511 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4984 -0.4214 -0.1777  0.3625  8.3443 
## 
## Random effects:
##  Groups Name         Variance Std.Dev. Corr 
##  DID    (Intercept)  0.2501   0.5001        
##         LengthofStay 0.1402   0.3745   -0.13
##  HID    (Intercept)  0.5430   0.7369        
## Number of obs: 8525, groups:  DID, 407; HID, 35
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.540798   0.545366  -0.992 0.321381    
## Age            -0.015059   0.005951  -2.530 0.011394 *  
## LengthofStay   -0.191576   0.044966  -4.260 2.04e-05 ***
## FamilyHxyes    -1.343378   0.095436 -14.076  < 2e-16 ***
## IL6            -0.058482   0.011506  -5.083 3.72e-07 ***
## CRP            -0.020847   0.010177  -2.048 0.040518 *  
## CancerStageII  -0.294999   0.076397  -3.861 0.000113 ***
## CancerStageIII -0.873228   0.101639  -8.591  < 2e-16 ***
## CancerStageIV  -2.301406   0.170831 -13.472  < 2e-16 ***
## Experience      0.104044   0.023886   4.356 1.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## convergence code: 0
## Model failed to converge with max|grad| = 1.28386 (tol = 0.002, component 1)

Simple power analysis

First, we will utilize the simr to perform a quick and simple power analysis. Using the powerSim()-function included in simr, we can simulate power for a specified fixed effect and use the design parameters found in the data to inform the model. Precisely, we will simulate power for the predictor “Age” for all 8,525 patients who are nested in 407 doctors, who again are nested in 35 hospitals.

Note: Due to the complexity of the model used to inform the power simulation, all simulations included in this notebook are very time-consuming. To increase speed we limited the simulation repetitions to 100. However, we strongly recommend increasing this number whenever actual sample size planning is being conducted.

library(simr)
# power analysis with simr for effect "Age"
power <- powerSim(model_final, test = fixed("Age"), nsim = 100)

## Power for predictor 'Age', (95% confidence interval):
##       68.00% (57.92, 76.98)
## 
## Test: z-test
##       Effect size for Age is -0.015
## 
## Based on 100 simulations, (99 warnings, 0 errors)
## alpha = 0.05, nrow = 8525
## 
## Time elapsed: 1 h 53 m 30 s
## 
## nb: result might be an observed power calculation

The output of the power simulation tells us that with 8,525 patients who are nested in 407 doctors, who again are nested in 35 hospitals, the design would have a power of 64% for the predictor “Age”.

What if we are also interested in power for other predictors? Using the same command in simr, we can simulate power for different fixed effects. All we have to do is specifying the effect of interest. Let’s assume we want to estimate power for the second predictor “Length of stay”:

# power analysis with simr for effect "Length of Stay"
power <- powerSim(model_final, test = fixed("LengthofStay"), nsim = 100)

## Power for predictor 'LengthofStay', (95% confidence interval):
##       98.00% (92.96, 99.76)
## 
## Test: z-test
##       Effect size for LengthofStay is -0.19
## 
## Based on 100 simulations, (99 warnings, 0 errors)
## alpha = 0.05, nrow = 8525
## 
## Time elapsed: 1 h 49 m 25 s
## 
## nb: result might be an observed power calculation

Clearly, estimated power for different fixed effects differs greatly (which is to be expected since the magnitude of the fixed effects included in our model differ) and a design with 8,525 patients who are nested in 407 doctors, who again are nested in 35 hospitals would have a power of 98% to detect the effect of “Length of Stay”.

Changing the number of levels in random variables

So far, we estimated power for single fixed effects and used the sample sizes (8,525 patients, 407 doctors, 35 hospitals) found in the data set to inform the power simulation. However, a method for A) changing sample sizes and B) estimating power for multiple (or all) fixed effects would be helpful.

Given the our design has different sample sizes for different levels (patients, doctors, hospitals), being able to explore the effect of changes on each level would be extremely informative. Here, simr offers different easy to implement options. First, we will make use of the powerCurve()-function which allows to estimate power over a range of sample sizes.

Since our future study is aiming to detect the fixed effect “Length of Stay”, we can explore power for smaller sample sizes. First, we will vary the number of doctors. What if instead of 407 doctors, patients would be nested in 100 or 200 doctors?
To answer this question, we will simulate “along” the random variable “DID” (i.e. “Doctor ID”) and specify 100 and 200 as sample sizes.

# power analysis with simr for different numbers of doctors
power <- powerCurve(model_final, test = fixed("LengthofStay"), along = "DID", breaks = c(100, 200), nsim = 100)

## Power for predictor 'LengthofStay', (95% confidence interval),
## by largest value of DID:
##     100: 59.00% (48.71, 68.74) - 2133 rows
##     200: 79.00% (69.71, 86.51) - 4288 rows
## 
## Time elapsed: 1 h 19 m 28 s

However, so far we mainly looked at the effects of “Length of Stay” and “Age”. Being able to estimate power for multiple fixed effects would save us the time of having to repeat the above analysis for different fixed effects. Here, we will utilize mixedpower and the included R2power()-function. To do so, a few design parameters need to be specified to inform the simulation (see Notebook 1 for more details):

Besides the model and data used to inform the simulation, we need to specify the variable names of all fixed effects, which random variable we want to simulate along (i.e. simvar) and which sample sizes we want to explore (i.e. steps). Additionally, we need to specify a critical value (i.e. t or z value) used to determine statistical significance during the simulation process and how many repetitions the simulation should entail (i.e. n_sim).

In the simulation below we will explore power for all fixed effects in a design with 100, 200 and 407 doctors (who each sees 21 patients on average), who are nested in different hospitals. To keep the number of hospitals constant at 35, we will explicitly specify the number of the second random variable (i.e. R2var = hospital ID (HID)) to 35 using the R2level- argument.

Note: The mixedpower package is tailored towards designs with crossed random factors and might provide less accurate and time efficient results in use cases with nested random designs.

library(mixedpower)

fixed_effects <- c("Age", "LengthofStay",  "FamilyHx" , "IL6", "CRP",  "CancerStage" ,"Experience")

power <- R2power(model = model_final, data = hdp,
                    fixed_effect, simvar = "DID",
                    steps = c(100, 200,407),
                    critical_value = 2, n_sim = 100,
                    R2var = "HID", R2level = 35)

##         100  200  407      mode         effect
## 1 0.2959184 0.36 0.79 databased            Age
## 2 0.5000000 0.93 0.99 databased   LengthofStay
## 3 1.0000000 1.00 1.00 databased    FamilyHxyes
## 4 0.7551020 0.94 1.00 databased            IL6
## 5 0.1428571 0.30 0.51 databased            CRP
## 6 0.5612245 0.71 0.97 databased  CancerStageII
## 7 0.9897959 1.00 1.00 databased CancerStageIII
## 8 1.0000000 1.00 1.00 databased  CancerStageIV
## 9 0.5204082 0.97 1.00 databased     Experience

Power for different effect sizes

So far, our power analyses relied on the exact effect size found in the data used to inform the simulation. However, adopting effect sizes from published data involves the risk of performing the analyses on inflated effect sizes, which in turn would result in an underpowered design. Consequentially, being able to modify the expected effect size is vital.

Both the simr and mixedpower package provide methods to specify the expected effect size by changing the beta coefficient of the fixed effect in the model used to inform the simulation.

Using simr and the command seen below, we can change the beta coefficient in advance of the power simulation. We chose to repeat the simple power analysis from above for the effect “Length of stay” expecting an effect 50 % smaller than the effect found in the data.

# power analysis with simr for specified effect size
fixef(model_final)["LengthofStay"] <- -0.095

power <- powerSim(model_final, test = fixed("LengthofStay"), nsim = 100)

## Power for predictor 'LengthofStay', (95% confidence interval):
##       73.00% (63.20, 81.39)
## 
## Test: z-test
##       Effect size for LengthofStay is -0.095
## 
## Based on 100 simulations, (100 warnings, 0 errors)
## alpha = 0.05, nrow = 8525
## 
## Time elapsed: 1 h 57 m 11 s

A similar approach can be found in the mixedpower package, which allows to specify a smallest effect of interest through changing the beta coefficients of the fixed effects in the model used to inform the simulation. Since we will estimate power for all fixed effects at the same time, we also need to provide adjusted beta coefficients for all of them. Here, we chose to decrease each effect by 20% and keep the intercept constant. Then, we can hand the adjusted effect sizes to the R2power function (SESOI parameter). Additionally, we will set databased = T, since we already ran a simulation with the beta coefficients found in the data used to fit the model.

# decrease all effects by 20% but keep intercept constant
SESOI <- c(-0.54, model_final@beta[2:10]*0.80)

power <- R2power(model = model_final, data = hdp,
                    fixed_effect, simvar = "DID",
                    steps = c(100, 200,407),
                    critical_value = 2, n_sim = 100,
                    R2var = "HID", R2level = 35,
                    SESOI = SESOI, databased = F) # include SESOI

##         100  200  407  mode         effect
## 1 0.1958763 0.28 0.59 SESOI            Age
## 2 0.4432990 0.71 0.96 SESOI   LengthofStay
## 3 1.0000000 1.00 1.00 SESOI    FamilyHxyes
## 4 0.5979381 0.83 1.00 SESOI            IL6
## 5 0.1649485 0.26 0.40 SESOI            CRP
## 6 0.3195876 0.60 0.85 SESOI  CancerStageII
## 7 0.9587629 1.00 1.00 SESOI CancerStageIII
## 8 1.0000000 1.00 1.00 SESOI  CancerStageIV
## 9 0.4639175 0.71 0.99 SESOI     Experience

Using the plotting function (multiplotPower()) included in mixedpower, we can then visualize the results. Here, we plot both the databased and SESOI analysis with R2power().

References

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). Fitting Linear Mixed-Effects Models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01

Green, P., & Macleod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493-498. https://doi.org/10.1111/2041-210X.12504

Kumle, L., Vo, M. L-H., & Draschkow, D. (2018). Mixedpower: a library for estimating simulation-based power for mixed models in R. https://doi.org/10.5281/zenodo.1341047

Kumle, L., Võ, M.LH. & Draschkow, D. Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behav Res (2021). https://doi.org/10.3758/s13428-021-01546-0