Simr Power Analysis For GLMMs
APPLICATION
simr: an R package for power analysis of generalised linear mixed
models by simulation
Peter Green1*, Catriona J. MacLeod1
Summary
1. The R package simr allows users to calculate power for generalised linear mixed models
from the lme4 package. The power calculations are based on Monte Carlo simulations.
2. It includes tools for (a) running a power analysis given a model and design; and (b)
calculating power curves to assess trade-offs between power and sample size.
3. This paper presents a tutorial using a simple example of count data with mixed effects
(with structure representative of environmental monitoring data) to guide the user along a
gentle learning curve, adding only a few commands or options at a time.
Key-words: experimental design, sample size, glmm, random effects, Monte Carlo,
type II error.
install.packages("devtools")
library(devtools)
install_github("pitakakariki/simr@mee_v3")
library(simr)
Introduction
The power of a hypothesis test is defined as the probability that the test will reject the null
hypothesis, assuming that the null hypothesis is false. Put another way, if an effect is real,
what is the probability that an analysis will judge that the effect is statistically significant?
If a study is underpowered, resources might be wasted and real effects might be missed
(Legg & Nagy 2006; Field et al. 2007). On the other hand, a large study might be
overpowered and so be more expensive than is necessary (Johnson et al. 2015). Therefore it
is good practice to perform a power analysis before collecting data, to ensure that the sample
has the appropriate size to answer whatever research question is being considered.
Generalised linear mixed models (GLMMs) are important in ecology, allowing the
analysis of counts and proportions as well as continuous data (Bolker et al. 2009), and
controlling for spatial pseudoreplication (Raudenbush & Liu 2000; Rhodes & Jonzén 2001).
Monte Carlo simulation is a flexible and accurate method appropriate for realistic
ecological study designs (Bolker 2008; Johnson et al. 2015). There are some cases where we could
use analytical formulae to calculate the power, but these will usually be an approximation or
require a special form for the design (Arnold et al. 2011). Simulation is a single method
applicable across a wide range of models and methods. Even when formulae are available for
a particular model and design, locating and applying the appropriate formula might be
[...] setting up a simulation experiment could be too complicated (see e.g. Bolker 2008, Chapter
5). Even for someone experienced in R, the time taken setting up the analysis might be better
[...]
A range of R packages (see Figure 1) is currently available for power analysis
(Martin 2012; Donohue & Edland 2013; Reich et al. 2012; Galecki & Burzykowski 2013).
However, none of them handles both non-normal response variables and a wide range of
fixed and random effect specifications (Johnson et al. 2015). simr is a power analysis
package for R, designed to interoperate with the lme4 package for GLMMs (Bates et al.
2014).
simr is designed to work with any linear mixed model or generalised linear mixed model
that can be fit with either lmer or glmer from lme4. This allows for a wide range of
models with different fixed and random effect specifications. Linear models and generalised
linear models using lm and glm in base R are also supported, to allow for models with no
random effects.
There are a range of tests for GLMMs, which can be fast but approximate, or slow but
accurate (Bolker et al. 2009; Verbeke & Molenberghs 2009). This package has interfaces to a
large number of tests for single or multiple fixed or random effects (see Appendix 1), both
from lme4 and from external packages (Halekoh & Højsgaard 2014; Scheipl et al. 2008).
A power analysis in simr starts with a model fitted in lme4. This will typically be based
on an analysis of data from a pilot study, but more advanced users can create artificial pilot
data from scratch (see Appendix 2). This design allows for a gentle learning curve for any
[...]
In simr, power is calculated by repeating the following three steps: (1) simulate new
values for the response variable using the model provided; (2) refit the model to the simulated
response; (3) apply a statistical test to the simulated fit. In this setup the tested effect is
known to exist, and so every positive test is a true positive and every negative test is a Type II
error. The power of the test can be calculated from the number of successes and failures at
[...]
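The three steps above can be sketched in base R. This is a minimal illustration, not simr's
own implementation: it uses glm without random effects so that no extra packages are needed.
The intercept (1.5), the number of simulations (200) and the layout of 10 x values in 3 groups
are assumptions chosen for the example; the slope of -0.05 is the effect size used later in
the tutorial.

```r
# Monte Carlo power for a Poisson slope, base R only (hypothetical settings).
set.seed(1)
x <- rep(1:10, times = 3)        # 10 values of x in each of 3 groups
beta0 <- 1.5                     # assumed intercept (log scale)
beta1 <- -0.05                   # slope we want to be able to detect
nsim  <- 200                     # number of simulated data sets
alpha <- 0.05                    # significance level of the test

p_values <- replicate(nsim, {
  y   <- rpois(length(x), lambda = exp(beta0 + beta1 * x))  # (1) simulate response
  fit <- glm(y ~ x, family = poisson)                       # (2) refit the model
  summary(fit)$coefficients["x", "Pr(>|z|)"]                # (3) test the effect
})

# Estimated power: proportion of simulations in which the test rejects.
power_hat <- mean(p_values < alpha)
power_hat
```

Because the simulated effect is non-zero by construction, each rejection counts as a success
and each non-rejection as a Type II error.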
Tutorial
This tutorial illustrates some of the functions available within the simr package. Our goal
is to provide a gentle learning curve by guiding the user through increasingly complex
[...]
The tutorial uses the example dataset which is included in the package. The dataset is
[...] abundance) measured at ten levels of the continuous fixed effect variable x (e.g. study year)
for three groups g (e.g. study site). There is also a continuous response variable y, which is
[...]
FITTING A MODEL
We start by fitting a very simple Poisson mixed effects model in lme4 to the example
dataset. In this case we have a random intercept model, where each group (g) has its own
[...]
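A minimal sketch of such a fit is given here. It assumes that the example dataset is the
simdata data frame shipped with simr and that its count response is the column z; the name of
the count column is not given in the text above, so treat it as an assumption.

```r
library(lme4)
library(simr)

# Poisson GLMM with a fixed slope for x and a random intercept for each group g.
# Assumption: simdata is the package's example data and z is its count response.
model1 <- glmer(z ~ x + (1 | g), family = "poisson", data = simdata)

# Fixed-effect estimates on the log scale.
fixef(model1)
```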
library(simr)
summary(model1)
## <snip>
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)      ...    0.27173     ...      ...
## x           -0.11481    0.03955     ...   0.0037 **
This tutorial focusses on inference about the trend in x. In this case, the estimated effect
size for x is -0.11, which is significant at the 0.01 level using the default z-test.
Note that we have deliberately used a very simple model to make this tutorial easy to
follow. A proper analysis would, for example, have a larger number of groups, and would
consider problems such as overdispersion. Although a simple model is used for this tutorial,
[...]

Suppose that we wanted to replicate this study. If the effect is real, would we have enough
[...]

Before starting a power analysis, it is important to consider what sort of effect size you are
interested in. Power generally increases with effect size, with larger effects being easier to
detect. Retrospective observed power calculations, where the target effect size comes from
[...]
For this example we will consider the power to detect a slope of -0.05. The fixed effects
within the fitted glmer model can be accessed with the lme4 function fixef. The simr
function fixef<- can then be used to change the size of the fixed effect. The size of the
fixed effect for the variable x can be changed from -0.11 to -0.05 as follows:
fixef(model1)["x"]
##          x
## -0.1148147
fixef(model1)["x"] <- -0.05
In this tutorial we only change the fixed slope for the variable x. However, we could also
change the random effect parameters or the residual variance (for models where that is
appropriate). See the online help entry ?modify for more details.

Once the model and effect size have been specified, a power analysis is very easy in
simr. Since these calculations are based on Monte Carlo simulations, your results may be
slightly different. If you want to get the same results as the tutorial, you can use
set.seed(123).
powerSim(model1)
## [...]
## Test: z-test
## [...]
## Time elapsed: 0 h 3 m 6 s
The power to reject the null hypothesis of zero trend in x is about 33%, given this
particular setup. This would almost always be considered insufficient; traditionally 80%
power is considered adequate (although this arbitrary threshold is not always appropriate
[...]
1 Of the 1000 simulations, 5 produced warnings. In this case the random number generator has
thrown up a handful of simulations where the data were a poor fit for the model. Since this
only occurred in a very small proportion of cases it is not a cause for concern. To read these
warnings, we can use lastResult()$warnings (see Appendix 1).
2 These timings come from a laptop computer with an Intel Core i5-2520M CPU @ 2.50GHz and 4GB
RAM.
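For exploratory work a much shorter run is often enough. The sketch below repeats the model
setup so that it runs on its own, assumes that z is the count column of simdata, and uses an
arbitrary nsim of 50; with so few simulations the estimate is noisy and will not match the
figure quoted above.

```r
library(lme4)
library(simr)

# Setup repeated so that the block runs on its own (z is an assumed column name).
model1 <- glmer(z ~ x + (1 | g), family = "poisson", data = simdata)
fixef(model1)["x"] <- -0.05

# Short run: 50 simulations, fixed seed, progress bar switched off.
ps1 <- powerSim(model1, nsim = 50, seed = 123, progress = FALSE)
print(ps1)

# Any warnings collected during the simulations.
lastResult()$warnings
```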
In practice, the z-test might not be suitable for such a small example (Bolker et al. 2009).
A parametric bootstrap test (e.g. Halekoh & Højsgaard 2014) might be preferred for the final
analysis. However, the faster z-test is more suitable for learning to use the package and for
initial exploratory work during a power analysis. For examples of different test specifications,
[...]
In the first example, estimated power was low. A small pilot study often will not have
enough power to detect a small effect, but a larger study might. In simr, the extend
[...]

The pilot study had observations at 10 values of x, representing for example study years 1
through 10. In this step we will calculate the effect of increasing this to 20 years.
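A sketch of the step that creates model2, consistent with the description of the along and n
arguments in this section. The setup lines are repeated so that the block runs on its own, and
z is assumed to be the count column of simdata.

```r
library(lme4)
library(simr)

# Setup repeated from earlier in the tutorial (z is an assumed column name).
model1 <- glmer(z ~ x + (1 | g), family = "poisson", data = simdata)
fixef(model1)["x"] <- -0.05

# Extend the design along x from 10 to 20 distinct values.
model2 <- extend(model1, along = "x", n = 20)

# The extended design: 20 values of x at each of the 3 sites.
nrow(getData(model2))
```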
powerSim(model2)
## [...]
## Test: z-test
## [...]
## Time elapsed: 0 h 3 m 37 s
The along argument specifies which variable is being extended, and n specifies how
many levels to replace it with. The extended model2 will now have x values from 1 to 20, in
[...] effect of size -0.05. In fact, the study might be overpowered with that sample size.
When data collection is costly, the user might want to collect only as much data as are
needed to achieve a certain level of statistical power. The powerCurve function in simr
[...]

In the previous example, we found very high power when observations were taken at 20
values of the variable x. Could we reduce that number while keeping our power above the
[...]
print(pc2)
## [...]
## by largest value of x:
##  3:  5.70% ( 4.35,  7.32) - 9 rows
##  5:  7.40% ( 5.85,  9.20) - 15 rows
## [...]
## Time elapsed: 0 h 20 m 24 s
plot(pc2)
Note that we have saved this result to the variable pc2 to match the numbering in
model2. Since model1 did not have sufficient power, we did not run it through
powerCurve. The plotted output is shown in Figure 2. We can see that the power to detect a
trend in x increases with sample size. The results here were based on fitting the model to
10 different automatically chosen subsets. The smallest subset uses just the first 3 years (i.e. 9
observations), and the largest uses all 20 hypothetical study years (i.e. 60 rows of data).
This analysis suggests that the study would have to run for 16 years to have 80% power to
[...]
It might not be feasible to increase the number of values of x observed. For example, if x
were study year, we might be unwilling to wait longer for our results. In this case, increasing
the number of study sites or the number of measurements at each site might be a better
option. The following two analyses return to our original model1, which had 10 study years.

We can add extra levels for g the same way we added extra values for x. For example, if
the variable g represents our study sites, we could increase the number of sites from 3 to 15.
plot(pc3)
The main change from the previous example is that we have passed the variable g to the
along argument. The output for this analysis is shown in Figure 3. To reach 80% power, we
[...]
We can replace the along argument to extend and powerCurve with the within
argument to increase the sample size within groups. Each group has only one observation at
each level of x and g. We can extend this to 5 observations per site per year as follows:
print(pc4)
## [...]
## Time elapsed: 0 h 11 m 35 s
plot(pc4)
Note the breaks argument to powerCurve. This overrides the default behaviour, and
gives us one through five observations per combination of x and g. Figure 4 shows that we
[...]
Other features
The powerSim function assumes a number of default settings to make it simple to use,
but it can be modified to meet specific needs. For example, the number of simulations
(nsim), the random number seed for reproducible results (seed), or the nominal significance
level (alpha) can be altered. Increasing nsim from its default setting of 1000 increases the
precision of the power estimate. More details can be found in the online help at ?powerSim.
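The effect of nsim on precision can be illustrated with a binomial confidence interval in base
R. The sketch below assumes an estimated power of 33%, as in the tutorial, and compares exact
(Clopper-Pearson) 95% intervals for 1000 and 10000 simulations. The success counts are
illustrative rather than simr output, and the interval method simr uses by default may differ.

```r
# Same estimated power (33%) at two simulation sizes; counts are illustrative.
ci_small <- binom.test(x = 330,  n = 1000)$conf.int
ci_large <- binom.test(x = 3300, n = 10000)$conf.int

# Width of each 95% interval: more simulations give a narrower interval.
width_small <- diff(ci_small)
width_large <- diff(ci_large)
round(c(nsim_1000 = width_small, nsim_10000 = width_large), 3)
```

A tenfold increase in nsim narrows the interval by roughly a factor of three, at about ten
times the running time.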
Many of the model parameters can be set using the functions described in the online help
at ?modify. We modified a fixed effect parameter in the tutorial, but simr also has
functions for setting random effect variance parameters and residual variances where
applicable.
By default, simr tests the first fixed effect in a model. However, a wide range of tests can
be specified using the test argument, including tests for multiple fixed effects and single or
multiple random effects. Further examples are provided in the "Test examples" vignette
(Appendix 1), and details of the test functions available in simr are in the online help
at ?tests.
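As one illustration of the test argument, the sketch below requests a likelihood ratio test
for x in place of the default z-test. The choice of test, the small nsim and the count column z
are assumptions made for the example; ?tests lists the specifications that are available.

```r
library(lme4)
library(simr)

# Setup repeated from the tutorial (z is an assumed column name).
model1 <- glmer(z ~ x + (1 | g), family = "poisson", data = simdata)
fixef(model1)["x"] <- -0.05

# Likelihood ratio test for the fixed effect x, with nsim, alpha and seed
# set explicitly rather than left at their defaults.
ps_lr <- powerSim(model1, test = fixed("x", "lr"),
                  nsim = 20, alpha = 0.05, seed = 1, progress = FALSE)
print(ps_lr)
```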
Further work
Version 1.0 of simr is designed for any linear or generalised linear mixed model fitted
using lmer or glmer in the lme4 package, and for any linear or generalised linear model
using lm or glm, and is focussed on calculating power for hypothesis tests. In future versions
we plan to:
[...]
Acknowledgements
This research was funded by New Zealand's Ministry of Business, Innovation and
Employment [...]
References
Arnold, B.F., Hogan, D.R., Colford, J.M. & Hubbard, A.E. (2011) Simulation methods to
estimate design power: an overview for applied research. BMC Medical Research [...]
Bates, D., Maechler, M., Bolker, B. & Walker, S. (2014) lme4: Linear mixed-effects models
[...]
Bolker, B.M. (2008) Ecological models and data in R. Princeton University Press, Princeton
and Oxford.
Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H.H. &
White, J-S.S. (2009) Generalized linear mixed models: a practical guide for ecology [...]
Donohue, M.C. & Edland, S.D. (2013) longpower: Power and sample size calculators for
longitudinal data. R package version 1.0-11.
Field, S.A., O'Connor, P.J., Tyre, A.J. & Possingham, H.P. (2007) Making monitoring
meaningful. Austral Ecology, 32, 485–491.
Galecki, A. & Burzykowski, T. (2013) Linear mixed-effects models using R: A step-by-step
approach. Springer, New York.
Halekoh, U. & Højsgaard, S. (2014) A Kenward-Roger Approximation and Parametric
Bootstrap Methods for Tests in Linear Mixed Models – The R package pbkrtest. [...]
Hoenig, J.M. & Heisey, D.M. (2001) The Abuse of Power: The Pervasive Fallacy of Power
Calculations for Data Analysis. The American Statistician, 55, 19–24.
Johnson, P.C.D., Barry, S.J.E., Ferguson, H.M. & Müller, P. (2015) Power analysis for
generalized linear mixed models in ecology and evolution. Methods in Ecology and
Evolution, 6, 133–142.
Legg, C.J. & Nagy, L. (2006) Why most conservation monitoring is, but need not be, a waste
of time. Journal of Environmental Management, 78, 194–199.
Martin, J.G.A., Nussey, D.H., Wilson, A.J. & Réale, D. (2011) Measuring individual [...]
Martin, J. (2012) pamm: Power analysis for random effects in mixed models. R package
version 0.7.
R Core Team (2015) R: A language and environment for statistical computing. R
Development Core Team, Vienna, Austria.
Raudenbush, S.W. & Liu, X. (2000) Statistical power and optimal design for multisite
randomised trials. Psychological Methods, 5, 199–213.
Reich, N.G., Myers, J.A., Obeng, D., Milstone, A.M. & Perl, T.M. (2012) Empirical power
[...]
[...] populations: how should sampling effort be allocated between space and time? [...]
Scheipl, F., Greven, S. & Kuechenhoff, H. (2008) Size and power of tests for a zero
random effect variance or polynomial regression in additive and linear mixed models. [...]
Sims, M., Elston, D.A., Harris, M.P. & Wanless, S. (2007) Incorporating variance uncertainty
[...]
Verbeke, G. & Molenberghs, G. (2009) Linear mixed models for longitudinal data. Springer,
New York.
Supporting information
Additional Supporting Information may be found in the online version of this article.

Appendix 1 The "Test examples" vignette, illustrating some of the range of models and
tests available in simr.
Appendix 2 The "Power analysis from scratch" vignette, explaining how to start a power
analysis without relying on pilot data.
Appendix 3 Additional details about the simulation process and power calculations.
Figure 1: Assessing the capabilities of R packages for power analysis of mixed effects
models: pamm (Martin 2012), longpower (Donohue & Edland 2013), clusterPower
(Reich et al. 2012), nlmeU (Galecki & Burzykowski 2013) and simr (this paper).

Figure 2: Power (± 95% CI) to detect a fixed effect with size -0.05, calculated over a
range of sample sizes using the powerCurve function. The number of distinct values [...]

Figure 3: Power (± 95% CI) to detect a fixed effect with size -0.05, calculated over a
range of sample sizes using the powerCurve function. The number of levels for the [...]

Figure 4: Power (± 95% CI) to detect a fixed effect with size -0.05, calculated over a
range of sample sizes using the powerCurve function. The number of observations at [...]