Introduction To Sample Size Calculation Using G*Power: Principles of Frequentist Statistics
Version 1.1 (11/11/2019)
Installing G*Power
Types of tests
T-tests
Correlation
Analysis of Variance (ANOVA)
References
There are two important concepts in frequentist statistics: alpha and beta. Alpha is the probability of concluding
there is an effect when there is not one (type I error). This is normally set at .05 (5%) and it is
the threshold we look at for a significant effect. Setting alpha to .05 means we are willing to
make a type I error 5% of the time in the long run. Beta is the probability of concluding there is
not an effect when there really is one (type II error). This is normally set at .2 (20%), which
means we are willing to make a type II error 20% of the time in the long run. These values are
commonly used in psychology, but you could change them. If you do change them, however, it
should ideally be to decrease, rather than increase, the number of errors you are willing to accept.
Power is the ability to detect an effect if there is one there to be found. Or in other words “if an
effect is a certain size, how likely are we to find it?” (Baguley, 2004, p. 73). Power relates to beta,
as power is 1 - beta. Therefore, if we set beta to .2, we can expect to detect a particular effect
size 80% of the time if we repeated the procedure over and over. Carefully designing an
experiment in advance allows you to control the type I and type II error rate you would expect in
the long run. However, studies have shown that these two concepts are not given much thought
when designing experiments (Sedlmeier & Gigerenzer, 1989; Bakker, Hartgerink, Wicherts, & van der Maas, 2016).
The implications of low power are a waste of resources and a lack of progress (Button et al., 2013). A study that is not
sensitive to detect the effect of interest will just produce a non-significant finding more often than
not. However, this does not mean there is no effect, just that your test was not sensitive enough
to detect it. One analogy (paraphrased from this lecture by Richard Morey) to help understand
this is trying to tell two pictures apart that are very blurry. You can try and squint, but you just
cannot make out the details to compare them with any certainty. This is like trying to find a
significant difference between groups in an underpowered study. There might be a difference,
but you just do not have the sensitivity to differentiate the groups. In order to design an
experiment to be informative, it should be sufficiently powered to detect effects which you think
are practically interesting (Morey & Lakens, 2016). This is sometimes called the smallest effect
size of interest. Your test should be sensitive enough to avoid missing any values that you
would find practically or theoretically interesting. Fortunately, there is a way to calculate how
many participants are required to provide a sufficient level of power, known as power analysis.
In the simplest case, there is a direct relationship between statistical power, the effect you are
interested in, alpha, and the sample size. This means that if you know three of these values,
you can calculate the fourth. For more complicated types of analyses, you need some additional
parameters, but we will tackle this as we come to it. The most common types of power analysis
are a priori and sensitivity.
An a priori power analysis tells you how many participants are required to detect a given effect
size. A sensitivity power analysis tells you what effect sizes your sample size is sensitive to
detect. Both of these types of power analysis can be important for designing a study and
interpreting the results. If you need to calculate how many participants are required to detect a
given effect, you can perform an a priori power analysis. If you know how many participants you
have (for example you may have a limited population or did not conduct an a priori power
analysis), you can perform a sensitivity power analysis to calculate which effect sizes your study
is sensitive enough to detect.
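If you like to sanity-check these calculations outside of G*Power, the same "know three values, solve for the fourth" logic is available in code. Below is a minimal sketch using Python's statsmodels library; the effect size (d = 0.5) and sample size (50 per group) are arbitrary illustrative values, not recommendations.

```python
# Minimal sketch of a priori vs. sensitivity power analysis for an
# independent samples t-test using statsmodels. solve_power() solves
# for whichever parameter is left unspecified.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: given effect size, alpha, and power, solve for the sample size.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   ratio=1.0, alternative='two-sided')
print(f"a priori: {n_per_group:.1f} participants per group")

# Sensitivity: given the sample size, alpha, and power, solve for the
# smallest effect size the design can reliably detect.
detectable_d = analysis.solve_power(nobs1=50, alpha=0.05, power=0.8,
                                    ratio=1.0, alternative='two-sided')
print(f"sensitivity: smallest detectable d = {detectable_d:.2f}")
```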
Another type of power analysis you might come across is post-hoc. This provides you with the
observed power given the sample size, effect size, and alpha. You can actually get SPSS to
provide this in the output. However, this type of power analysis is not recommended as it fails to
consider the long run aspect of these statistics. There is no probability attached to individual
studies. There is either an effect observed (significant p value), or there is not an effect
observed (non-significant p value). I highly recommend ignoring this type of power analysis and
focusing on a priori or sensitivity power analyses.
For this guide, we are going to look at how you can use G*Power (Faul, Erdfelder, Buchner, &
Lang, 2009) to estimate the sample size you need to detect the effect you are interested in, and
the considerations you need to make when designing an experiment.
Installing G*Power
G*Power is a free piece of software developed at Universität Düsseldorf in Germany.
Unfortunately, it is no longer in development, with the last update being in July 2017. Therefore,
the aim of this guide is to help you navigate G*Power, as it is not the most user-friendly
programme. You can download G*Power on this page. Under the heading “download”, click on
the appropriate version for whether you have a Windows or Mac computer. Follow the
installation instructions and open it up when it has finished installing.
Types of tests
T-tests
To start off, we will look at the simplest example: t-tests. We will see how you can calculate
power for an independent samples and a paired samples t-test.
We are going to begin by seeing how you can calculate power a priori for an independent
samples t-test. First, we will explore what each section of this window does.
● Test family - To select the family of tests, such as t tests, F tests (ANOVA), or χ². We need
the default t tests for this example, so keep it as it is.
● Statistical test - To select the specific type of test. Within each family, there are several
different types of test. For the t-test, you can have two groups, matched pairs, and
several others. For this example, we need two groups.
The most difficult part in calculating the required sample size is deciding on an effect size. The
end of this guide is dedicated to helping you think about or calculate the effect size needed to
power your own studies. When you are less certain of the effects you are anticipating, you can
use general guidelines. For example, Cohen’s (1988) guidelines (e.g. small: Cohen’s d = 0.2,
medium: Cohen’s d = 0.5, large: Cohen’s d = 0.8) are still very popular. Other studies have tried
estimating the kind of effects that can be expected from particular fields. For this example, we
will use Richard, Bond, and Stokes-Zoota (2003), who conducted a gargantuan meta-analysis of
25,000 studies from different areas of social psychology. They wanted to quantitatively describe
the last century of research and found that across all studies, the average standardised effect
size was d = 0.43. We can use this as a rough guide to how many participants we would need to
detect an effect of this size.
We can plug these numbers into G*Power and select the following parameters: tail(s) = two,
effect size d = 0.43, α err prob = .05, Power (1 - β err prob) = 0.8, and Allocation ratio N2 / N1 =
1. You should get the following window:
This tells us that to detect the average effect size in social psychology, we would need two
groups of 86 participants (N = 172) to achieve 80% power in a two-tailed test. This is a much
bigger sample size than you would normally find for the average t-test reported in a journal
article. This would be great if you had lots of resources, but as a psychology student, you may
not have the time to collect this amount of data. For modules that require you to conduct a small
research project, follow the sample size guidelines in the module, but think about what sample
size you would need if you were to conduct the study full scale and incorporate it into your
discussion.
Now that we have explored how many participants we would need to detect the average effect
size in social psychology, we can tinker with the parameters to see how the number of
participants changes. This is why it is so important to perform a power analysis before you start
collecting data, as you can explore how changing the parameters impacts the number of
participants you need. This allows you to be pragmatic and save resources where possible.
● Tail(s) - if you change the number of tails to one, this decreases the number of
participants in each group from 86 to 68. This saves a total of 36 participants. If your
experiment takes 30 minutes, that is saving you 18 hours’ worth of work while still
providing your experiment with sufficient power. However, using one-tailed tests can be
a contentious area. See Ruxton & Neuhäuser (2010) for an overview of when you can
justify using one-tailed tests.
● α err prob - setting alpha to .05 says that, in the long run, we want to limit the number of type I
errors we make to 5%. Some suggest this is too high, and we should use a more
stringent error rate. If you change α err prob to .01, we would need 128 participants in
each group, 84 more participants than our first estimate (42 more hours of data
collection).
● Power (1 - β err prob) - this is where we specify the amount of type II errors we are
willing to make in the long run. This also has a conventional level of .80. There are also
calls for studies to be designed with a lower type II error rate by increasing power to .90.
This has a similar effect to lowering alpha. If we raise Power (1 - β err prob) to .90, we
would need 115 participants in each group, 58 more than our first estimate (29 more
hours of data collection). All of these variations can be cross-checked in code, as shown
in the sketch after this list.
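Here is the promised sketch, again assuming Python with statsmodels is available. statsmodels returns fractional sample sizes, while G*Power rounds each group up to a whole number, so expect agreement to within a participant.

```python
# Cross-check of the independent samples t-test numbers above.
from statsmodels.stats.power import TTestIndPower
from math import ceil

analysis = TTestIndPower()
settings = [
    ("baseline: two-tailed, alpha .05, power .80", 0.05, 0.80, 'two-sided'),
    ("one-tailed",                                 0.05, 0.80, 'larger'),
    ("alpha = .01",                                0.01, 0.80, 'two-sided'),
    ("power = .90",                                0.05, 0.90, 'two-sided'),
]
for label, alpha, power, alternative in settings:
    n = analysis.solve_power(effect_size=0.43, alpha=alpha, power=power,
                             ratio=1.0, alternative=alternative)
    print(f"{label}: {ceil(n)} per group")
# Expect roughly 86, 68, 128, and 115 per group respectively.
```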
“In order to detect an effect size of Cohen’s d = 0.43 with 80% power (alpha = .05,
two-tailed), G*Power suggests we would need 86 participants per group (N = 172) in an
independent samples t-test”.
This provides the reader with all the information they would need in order to reproduce the
power analysis, and ensure you have calculated it accurately.
If you already know your sample size, a sensitivity power analysis tells you which effect sizes
would be too small for you to reliably detect. If you change the type of power analysis to sensitivity,
you will get the following screen with slightly different input parameters:
All of these parameters should look familiar apart from Sample size group 1 and 2; effect
size d is now under Output Parameters. Imagine we had finished collecting data and we knew
we had 40 participants in each group. If we enter 40 for both group 1 and 2, and enter the
standard details for alpha (.05), power (.80), and tails (two), we get the following output:
This tells us that the study is sensitive to detect effect sizes of d = 0.63 with 80% power. This
helps you interpret the results sensibly if your result was not significant. If you did not plan with
power in mind, you can see what effect sizes your study is sensitive to detect. We would not
have enough power to reliably detect effects smaller than d = 0.63 with this number of
participants. It is important to highlight here that power exists along a curve. We have 80%
power to detect effects of d = 0.63, but we have 90% power to detect effects of approximately d
= 0.73 or 50% power to detect effects of around d = 0.45. This can be seen in the following
figure which you can create in G*Power using the X-Y plot for a range of values button:
This could also be done for an a priori power analysis, where you see the power curve for the
number of participants rather than effect sizes. This is why it is so important you select your
smallest effect size of interest when planning a study, as it will have greater power to detect
larger effects, but power decreases if the effects are smaller than anticipated.
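Assuming statsmodels again, the sensitivity analysis and a few points on that power curve look like this (expect values close to, but not exactly matching, G*Power's):

```python
# Sensitivity analysis for 40 participants per group, plus a few points
# on the power curve discussed above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

d_80 = analysis.solve_power(nobs1=40, alpha=0.05, power=0.8,
                            ratio=1.0, alternative='two-sided')
print(f"smallest d detectable with 80% power: {d_80:.2f}")  # ~0.63

# Power exists along a curve: the same design has different power
# for different effect sizes.
for d in (0.45, 0.63, 0.73):
    p = analysis.power(effect_size=d, nobs1=40, alpha=0.05,
                       ratio=1.0, alternative='two-sided')
    print(f"d = {d}: power = {p:.2f}")  # roughly .50, .80, .90
```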
“An independent samples t-test with 40 participants per group (N = 80) would be
sensitive to effects of Cohen’s d = 0.63 with 80% power (alpha = .05, two-tailed). This
means the study would not be able to reliably detect effects smaller than Cohen’s d =
0.63”.
This provides the reader with all the information they would need in order to reproduce the
sensitivity power analysis, and ensure you have calculated it accurately.
Paired samples t-test
Now this is even simpler than when we wanted to conduct a power analysis for an independent
samples t-test. We only have four parameters, as we do not need to specify the allocation ratio.
As it is a paired samples t-test, every participant must contribute a value for each condition. If
we repeat the parameters from before and expect an effect size of d = 0.43 (here it is called dz
for the within-subjects version of Cohen’s d), your window should look like this:
This suggests we would need 45 participants to achieve 80% power using a two-tailed test. This
is 127 participants fewer than our first estimate (saving approximately 64 hours of data
collection). This is a very important lesson. Using a within-subjects design will always save you
participants for the simple reason that instead of every participant contributing one value, they
are contributing two values. Therefore, it approximately halves the sample size you need to
detect the same effect size (I recommend Daniël Lakens’ blog post to learn more). When you
are designing a study, think about whether you could convert the design to within-subjects to
make it more efficient.
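In code, the paired case simply swaps in the one-sample/paired routine; a sketch with statsmodels, where dz is entered as the effect size:

```python
# Paired samples t-test: TTestPower covers one-sample and paired designs,
# with dz (the standardised mean of the difference scores) as the effect size.
from statsmodels.stats.power import TTestPower
from math import ceil

n = TTestPower().solve_power(effect_size=0.43, alpha=0.05, power=0.8,
                             alternative='two-sided')
print(ceil(n))  # approximately 45 participants in total
```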
“In order to detect an effect size of Cohen’s d = 0.43 with 80% power (alpha = .05,
two-tailed), G*Power suggests we would need 45 participants in a paired samples t-test”.
This provides the reader with all the information they would need in order to reproduce the
power analysis, and ensure you have calculated it accurately.
Running a sensitivity analysis instead, this shows that the design would be sensitive to detect an effect size of d = 0.53 with 30
participants. Remember power exists along a curve, as we would have more power for larger
effects, and lower power for smaller effects. Plot the curve using X-Y plot if you are interested.
Correlation
The next simplest type of statistical test is the ability to detect a correlation between two
variables.
Some of the input parameters are the same as we have seen previously, but we have two new
options:
● Correlation ρ H1 - This refers to the correlation you are interested in detecting. In the
case of correlation, this is your smallest effect size of interest.
● Correlation ρ H0 - This refers to the null hypothesis. In most statistical software, this is
assumed to be 0 as you want to test if the correlation is significantly different from 0, i.e.
no correlation. However, you could change this to any value you want to compare your
alternative correlation coefficient to.
For the first example, we will turn back to the meta-analysis by Richard, Bond, & Stokes-Zoota
(2003). The effect size can be converted between Cohen’s d and r (the correlation coefficient). If
you want to convert between different effect sizes, I recommend section 13 of this online
calculator. Therefore, the average effect size in social psychology is equivalent to r = .21. If we
wanted to detect a correlation equivalent to or larger than .21, we could enter the following
parameters: tails (two), Correlation ρ H1 (.21), α err prob (.05), Power (0.8), and Correlation ρ
H0 (0). This should produce the following window:
This suggests that we would need 175 participants to detect a correlation of .21 with 80%
power. This may seem like a lot of participants, but this is what is necessary to detect a
correlation this small. Similar to the t-test, we can play around with the parameters to see how it
changes how many participants are required (a code cross-check follows the list below).
● Tail(s) - for a two-tailed correlation, we are interested in whether the correlation is
equivalent to or larger than ±.21. However, we may have good reason to expect that the
correlation is going to be positive, and it would be a better idea to use a one-tailed test.
Now we would only need 138 participants to detect a correlation of .21, which would be
37 participants fewer saving 19 hours of data collection.
● Power (1 - β err prob) - Perhaps we do not want to miss out on detecting the correlation
20% of the time in the long run, and wanted to conduct the test with greater sensitivity.
We would need 59 more participants (30 more hours of data collection) for a total of 234
participants to detect the correlation with 90% power (two-sided).
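statsmodels has no direct routine for correlation power, but the standard Fisher z approximation reproduces these numbers to within a participant or two of G*Power's exact method; a sketch:

```python
# Approximate a priori sample size for detecting a correlation r, using
# the Fisher z transformation. G*Power uses an exact method, so results
# can differ by a participant or two (this approximation errs slightly high).
from math import atanh, ceil
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.8, tails=2):
    z_alpha = norm.ppf(1 - alpha / tails)
    z_beta = norm.ppf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.21))           # ~176 (G*Power's exact method: 175)
print(n_for_correlation(0.21, tails=1))  # ~140 (G*Power: 138)
```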
“In order to detect a Pearson’s correlation coefficient of r = .21 with 80% power (alpha =
.05, two-tailed), G*Power suggests we would need 175 participants”.
Correlations (sensitivity)
Like t-tests, if we know how many participants we have access to, we can see what effects our
design is sensitive enough to detect. In many neuroimaging studies, researchers will look at the
correlation between a demographic characteristic (e.g. age or number of cigarettes smoked per
day) and the amount of activation in a region of the brain. Neuroimaging studies are typically
very small as they are expensive to run, so you often find sample sizes of only 20 participants. If
we specify tails (two), alpha (.05), power (.80), and sample size (20), you should get the
following window:
This shows that with 20 participants, we would only have 80% power to detect correlations of r =
.58 in the long run. We would only have enough power to detect a large correlation by Cohen’s
guidelines. Note there is a new option here called Effect direction. This does not change the
size of the effect, but converts it to a positive or negative correlation depending on whether you
expect it to be bigger or smaller than 0.
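The same Fisher z approximation can be inverted to mirror this sensitivity analysis; under the same assumptions:

```python
# Which correlation can a fixed sample size reliably detect? Inverting
# the Fisher z approximation; expect a value slightly above G*Power's.
from math import sqrt, tanh
from scipy.stats import norm

def detectable_r(n, alpha=0.05, power=0.8, tails=2):
    z_alpha = norm.ppf(1 - alpha / tails)
    z_beta = norm.ppf(power)
    return tanh((z_alpha + z_beta) / sqrt(n - 3))

print(round(detectable_r(20), 2))  # ~0.59 (G*Power: .58)
```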
Analysis of Variance (ANOVA)
Most of the input parameters are the same as what we have dealt with for t-tests and
correlation. However, we have a different effect size (Cohen’s f) to think about, and we need to
specify the number of groups we are interested in sampling, which will normally be three or
more.
ANOVA is an omnibus test that compares the means across three or more groups. This means
Cohen’s d would not be informative as it describes the standardised mean difference between
two groups. In order to describe the average effect across many groups, there is Cohen’s f.
Cohen (1988) provided guidelines for this effect size too, with values of .10 (small), .25
(medium), and .40 (large). However, this effect size is not normally reported in journal articles or
produced by statistics software. In its place, we normally see partial eta-squared (ηp²), which
describes the proportion of variance explained by the independent variable when the other
variables are partialed out. In other words, it isolates the effect of that particular independent
variable. When there is only one IV, ηp² will provide the same result as eta-squared (η²).
Fortunately, G*Power can convert from ηp² to Cohen’s f in order to calculate the sample size.
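That conversion is a one-line formula, f = √(ηp² / (1 − ηp²)), and statsmodels can then solve the between-subjects ANOVA sample size directly. A sketch, for comparison with the G*Power steps below; expect small differences, since G*Power rounds up to equal group sizes:

```python
# Convert partial eta-squared to Cohen's f, then solve for the total
# sample size of a one-way between-subjects ANOVA.
from math import ceil, sqrt
from statsmodels.stats.power import FTestAnovaPower

def eta2p_to_f(eta2p):
    return sqrt(eta2p / (1 - eta2p))

f = eta2p_to_f(0.04)
print(f"Cohen's f = {f:.3f}")  # ~0.204

n_total = FTestAnovaPower().solve_power(effect_size=f, alpha=0.05,
                                        power=0.8, k_groups=3)
print(ceil(n_total))  # compare with the G*Power total in the next step
```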
With many effect sizes, you can convert one to the other. For example, you can convert
between r and Cohen’s d and, useful to us here, between Cohen’s d and ηp². In
order to convert between the different effect sizes, there is section 13 of this handy online calculator. A
typical effect size in psychology is ηp² = .04, which equates to Cohen’s d = 0.40. In order to use
ηp² in G*Power, we need to convert it to Cohen’s f. Next to Effect size f, there is a button called
Determine which will open a new tab next to the main window. From select procedure, specify
Effect size from variance, and then click Direct. Here is where you specify the ηp² you are
powering the experiment for. Enter .04, and you should have a screen that looks like this:
If you click Calculate and transfer to main window, it will input the Cohen’s f value for you in the
main G*Power window. Finally, input alpha (.05), power (.80), and groups (3), and you should
get the following output:
This shows us that we would need 237 participants split across three groups in order to power
the effect at 80%. G*Power assumes you are going to recruit equal sample sizes which would
require 79 participants in each group. We can play around with some of the parameters to see
how it changes how many participants are required.
● Alpha - If we wanted to make fewer type I errors in the long-run, we could select a more
stringent alpha level of .01. We would now need 339 participants (113 per group) to
detect the effect with 80% power. This means 102 participants more, which would take
51 more hours of data collection.
● Power (1 - β err prob) - Perhaps we do not want to miss out on detecting the effect 20%
of the time in the long run, and wanted to conduct the test with greater sensitivity. We
would need 72 more participants (36 more hours of data collection) for a total of 309
participants to detect the effect with 90% power.
“In order to detect an effect of ηp² = .04 with 80% power in a one-way between-subjects
ANOVA (three groups, alpha = .05), G*Power suggests we would need 79 participants in
each group (N = 237)”.
Switching to a sensitivity analysis with 72 participants split across four groups, this shows us
that we have 80% power to detect effect sizes of Cohen’s f = 0.40. This equates to a large effect,
and we can convert it to ηp² using the online calculator. This is equivalent to an effect of
ηp² = .14. As a reminder, power exists along a curve. Cohen’s f = 0.40 is the smallest
effect size we can detect reliably at 80% power. However, we would have greater power to
detect larger effects, and lower power to detect smaller effects. It is all about what effect sizes
you do not want to miss out on. The power curve for 72 participants and four groups looks like
this:
We can also calculate power for a one-way within-subjects (repeated measures) ANOVA. The first three input parameters should be familiar by now. The number of groups should be 1
as we have a fully within-subjects design. The number of measurements are the number of
conditions we have in our within-subjects IV. To keep it simple, we will work with three
conditions, so enter 3 as the number of measurements. Now we have a couple of unfamiliar
parameters.
The correlation among repeated measures is something we will not need to worry about for
most applications, but it’s important to understand why it is here in the first place. In a
within-subjects design, one of the things that affect power is how correlated the measurements
are. As the measurements come from the same people on similar conditions, they are usually
correlated. If there were no correlation between the conditions, the sample size calculation would
be very similar to a between-subjects design. As the correlation increases towards 1, the
sample size you would require to detect a given effect will get smaller. The option is here as
G*Power assumes the effect size (Cohen’s f) and the correlation between conditions are
separate. However, if you are using ηp² from SPSS, the correlation is already factored into the
effect size, as it is based on the sums of squares. This means G*Power would provide a totally
misleading value for the required sample size. In order to tell G*Power the correlation is already
factored into the effect size, click on Options at the bottom of the window and choose which
effect size specification you want. For our purposes, we need “as in SPSS”. Select that option
and click OK, and you will notice that the correlation among repeated measures parameter has
disappeared. This is because we no longer need it when we use ηp² from SPSS.
The second unfamiliar input parameter is the nonsphericity correction. If you have used a
within-subjects ANOVA in SPSS, you may be familiar with the assumption of sphericity. If
sphericity is violated, it can lead to a larger number of type I errors. Therefore, a nonsphericity
correction (e.g. Greenhouse-Geisser) is applied to decrease the degrees of freedom which
reduces power in order to control type I error rates. This means if we suspect the measures may
violate the sphericity assumption, we would need to factor this into the power analysis in order
to adequately power the experiment. To begin, we will leave the correction at 1 for no
correction, but later we will play around with lower values in order to explore the effect of a
nonsphericity correction on power.
For the first power analysis, we will use the same typical effect size in psychology as in the
between-subjects example. Click Determine, and enter .04 for partial η² (make sure effect size is
set to SPSS in options). Click calculate and transfer to main window to convert it to Cohen’s f.
We will keep alpha (.05) and power (.80) at their conventional levels. Click calculate to get the
following window:
This shows that we would need 119 participants to complete three conditions for 80% power. If
we compare this to the sample size required for the same effect size in a between-subjects
design, we would need 118 fewer participants than the 237 participants before. This would save
59 hours’ worth of data collection. This should act as a periodic reminder that within-subjects
designs are more powerful than between-subjects designs.
Now it is time to play around with the parameters to see how it affects power.
● Number of measurements - One of the interesting things you will find is that if we recalculate
this for four conditions instead of three, we actually need fewer participants. We would need 90
participants to detect this effect across four conditions for 80% power. This is because
each participant is contributing more measurements, so the total number of observations
increases.
● Nonsphericity correction - Going back to three conditions, we can see the effect of a
more stringent nonsphericity correction by decreasing the parameter. If we have three
conditions, this can range from 0.5 to 1, with 0.5 being the most stringent correction (for
a different number of conditions, the smallest lower bound can be calculated as 1 / (m − 1),
where m is the number of conditions; so for four conditions, it would be 1 / 3 = 0.33, but
we would use 0.34 as it must be bigger than the lower bound). If we selected 0.5, we
would need 192 participants to detect the effect size across three conditions. This is 73
more participants (37 more hours of data collection) than if we were to assume we do
not need to correct for nonsphericity. You might be wondering how you select the value
for nonsphericity correction. Hobson and Bishop (2016) have a supplementary section of
their article dedicated to their power analysis. This is a helpful source for seeing how a
power analysis is reported in a real study, and they choose the most stringent
nonsphericity correction. This means they are less likely to commit a type II error as they
may be overestimating the power they need, but this may not be feasible if you have fewer
resources. A good strategy is exploring different values and thinking about the maximum
number of participants you can recruit with the time and resources you have available (the
sketch after this list shows how the lower bound is calculated).
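As promised, the lower-bound calculation for the nonsphericity correction is a one-liner:

```python
# Lower bound of the nonsphericity correction (epsilon) for a
# within-subjects factor with m conditions: 1 / (m - 1).
def epsilon_lower_bound(m):
    return 1 / (m - 1)

print(epsilon_lower_bound(3))  # 0.5
print(epsilon_lower_bound(4))  # 0.333...; enter 0.34 in G*Power, as the
                               # value must be bigger than the bound
```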
“In order to detect an effect of partial eta-squared = .04 with 80% power in a one-way
within-subjects ANOVA (three conditions, alpha = .05, nonsphericity correction = 1),
G*Power suggests we would need 119 participants”.
The final thing to cover is to explore how sensitive a within-subjects design would be once we
know the sample size we are dealing with. Change type of power analysis to sensitivity. If we
did not conduct an a priori power analysis, but ended up with 61 participants and three
conditions, we would want to know what effect sizes we can reliably detect. If we retain the
same settings, and include 61 as the total sample size, we get the following output once we
click calculate:
This shows us that we would have 80% power to detect effect sizes of Cohen’s f = .29. This
corresponds to ηp² = .08, a medium effect size.
Alternatively, another way to efficiently work out the sample size is to check on your results as
you are collecting data. You might not have a fully informed idea of the effect size you are
expecting, or you may want to stop the study half way through if you already have convincing
evidence. However, this must be done extremely carefully. If you keep collecting data and
testing to see whether your results are significant, this drastically increases the type I error rate
(Simmons, Nelson, & Simonsohn, 2011). If you check enough times, your study will eventually
produce a significant p value by chance even if the null hypothesis was really true. In order to
check your results before collecting more data, you need to perform a process called sequential
analysis (Lakens, 2014). This means that you can check the results intermittently, but for each
time you check the results you must perform a type I error correction. This works like a
Bonferroni correction for pairwise comparisons. For one method of sequential analysis, if you
check the data twice, your alpha would be .029 instead of .05 in order to control the increase in
type I error rate. This means for both the first and second look at the data, you would use an
alpha value of .029 instead of .05. See Lakens (2014) for an overview of this process.
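To see why unplanned interim checks are so dangerous, consider this small simulation under the null hypothesis. The group sizes, number of looks, and number of simulations are arbitrary choices for illustration, and the .029 threshold is the two-look correction mentioned above.

```python
# Simulating the type I error inflation caused by checking the data twice.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_sims, n_interim, n_final = 10_000, 50, 100

def false_positive_rate(alpha):
    hits = 0
    for _ in range(n_sims):
        # Both groups come from the same population: the null is true.
        a = rng.normal(size=n_final)
        b = rng.normal(size=n_final)
        p1 = ttest_ind(a[:n_interim], b[:n_interim]).pvalue  # first look
        p2 = ttest_ind(a, b).pvalue                          # second look
        hits += (p1 < alpha) or (p2 < alpha)
    return hits / n_sims

print(false_positive_rate(0.05))   # ~.08: well above the nominal 5%
print(false_positive_rate(0.029))  # ~.05: the corrected alpha restores it
```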
For the first example, we will recalculate the effect size from Diemand-Yauman et al. (2011) as
they report a t-test with Cohen’s d, so we can see how well the recalculated effect size fits in
with what they have reported. On page three of their article, there is the following sentence: “An
independent samples t-test revealed that this trend was statistically significant (t(220) = 3.38, p
< .001, Cohen’s d = 0.45)”. From the methods, we know there are 222 participants, and the t
statistic equals 3.38. We can use this information in the online app to calculate the effect size.
Select the independent samples t-test tab, and enter 222 into total N and 3.38 into t Value. If
you click calculate, you should get the following output:
This shows that the recalculated effect size is nice and consistent. The value is 0.45 in both the
article and our recalculation. This window shows the range of information that you can enter to
recalculate the effect size. The minimum information you need is the total N and t Value, or you
will get an error message.
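The underlying arithmetic of that recalculation is simple enough to sketch; here assuming equal group sizes, which the reported degrees of freedom make plausible:

```python
# Recalculate Cohen's d for an independent samples t-test from the t
# statistic and the group sizes: d = t * sqrt(1/n1 + 1/n2), which for
# equal groups closely matches the familiar d = 2t / sqrt(df).
from math import sqrt

def d_from_t(t, n1, n2):
    return t * sqrt(1 / n1 + 1 / n2)

# Diemand-Yauman et al. (2011): t(220) = 3.38, N = 222, assumed 111 per group.
print(round(d_from_t(3.38, 111, 111), 2))  # ~0.45, matching the article
```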
If you needed to recalculate the effect size for a paired samples t-test, then the options look very
similar. You only have one sample size to think about, as each participant will complete both
conditions. Therefore, we will move on to recalculating an effect size for ANOVA. Please note,
this calculator only works for between-subjects ANOVA, including main effects and interactions.
If you need the effect size for a one-way within-subjects ANOVA or a factorial mixed ANOVA,
then you would need the full SPSS output, which is not likely to be included in an article. If it is
available, you can use this spreadsheet by Daniël Lakens to calculate the effect size, but it’s
quite a lengthy process.
For the next example, we will use a one-way ANOVA reported in James et al. (2015). On page
1210, there is the following sentence: “there was a significant difference between groups in
overall intrusion frequency in daily life, F(3, 68) = 3.80, p = .01, ηp² = .14”. If we click on the F
tests tab of the online calculator, we can enter 3.80 for the F statistic, 3 for treatment degrees of
freedom, and 68 for residual degrees of freedom. You should get the following output:
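The output should recover the reported ηp² = .14. The recalculation itself is one line of arithmetic, sketched here:

```python
# Recalculate (partial) eta-squared from an F test:
# eta2 = (F * df1) / (F * df1 + df2)
def eta2_from_F(F, df1, df2):
    return (F * df1) / (F * df1 + df2)

# James et al. (2015): F(3, 68) = 3.80, reported eta2p = .14.
print(round(eta2_from_F(3.80, 3, 68), 2))  # 0.14, matching the article
```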
Beyond sample size, another way to increase power is to reduce the variability of your
measurements. As the standard deviation decreases, it becomes easier to detect the same
difference between the two groups, indicated by the decreasing red shaded area. For an
overview of how measurement error can impact psychological research, see Schmidt & Hunter
(1996).
If you are using a cognitive task, another way to decrease the variability is to increase the
number of trials the participant completes (see Baker, 2019). The idea behind this is
experiments may have high within-subject variance, or the variance of the condition is high for
each participant. One way to decrease this is to increase the number of observations per
condition, as it increases the precision of the estimate in each participant. Therefore, if you are
limited in the number of participants you can collect, an alternative would be to make each
participant complete a larger number of trials.
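As a toy illustration of that logic (the trial-to-trial variability below is an assumed number, not a value from Baker, 2019):

```python
# The standard error of each participant's condition mean shrinks with
# the square root of the number of trials, so more trials per condition
# mean more precise per-participant estimates.
from math import sqrt

within_subject_sd = 100.0  # assumed trial-to-trial variability (e.g. ms)
for n_trials in (10, 40, 160):
    se = within_subject_sd / sqrt(n_trials)
    print(f"{n_trials:>3} trials: SE of participant mean = {se:.1f}")
# Quadrupling the number of trials halves the measurement error.
```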
References
Baguley, T. (2004). Understanding statistical power in the context of applied research. Applied Ergonomics, 35(2), 73–80.
Baker, D. H., Vilidaite, G., Lygo, F. A., Smith, A. K., Flack, T. R., Gouws, A. D., & Andrews, T. J. (2019, July 3). Power contours: Optimising sample size and precision in experimental psychology and human neuroscience [Preprint].
Bakker, M., Hartgerink, C. H. J., Wicherts, J. M., & van der Maas, H. L. J. (2016). Researchers’ intuitions about power in psychological research. Psychological Science, 27(8), 1069–1077. https://doi.org/10.1177/0956797616647519
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153. https://doi.org/10.1037/h0045186
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Diemand-Yauman, C., Oppenheimer, D. M., & Vaughan, E. B. (2011). Fortune favors the bold (and the italicized): Effects of disfluency on educational outcomes. Cognition, 118(1), 111–115.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160.
Hobson, H. M., & Bishop, D. V. (2016). Mu suppression – a good measure of the human mirror neuron system? Cortex, 82, 290–310.
James, E. L., Bonsall, M. B., Hoppitt, L., Tunbridge, E. M., Geddes, J. R., Milton, A. L., & Holmes, E. A. (2015). Computer game play reduces intrusive memories of experimental trauma via reconsolidation-update mechanisms. Psychological Science, 26(8), 1201–1215.
Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710.
Lakens, D., & Caldwell, A. R. (2019, May 28). Simulation-based power analysis for factorial ANOVA designs [Preprint].
Morey, R. D., & Lakens, D. (2016). Why most of psychology is statistically unfalsifiable. Retrieved from https://github.com/richarddmorey/psychology_resolution/blob/master/paper/response.pdf
Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36(1), 97–131.
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331–363. https://doi.org/10.1037/1089-2680.7.4.331
Ruxton, G. D., & Neuhäuser, M. (2010). When should we use one-tailed hypothesis testing? Methods in Ecology and Evolution, 1(2), 114–117. https://doi.org/10.1111/j.2041-210X.2010.00014.x
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1(2), 199–223. https://doi.org/10.1037/1082-989X.1.2.199
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.