A Practical Guide For Studying Human Behavior in The Lab
*Equal contributions
Contact: [email protected]
1 Brain Circuits & Behavior lab, IDIBAPS, Barcelona, Spain
2 Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, Ecole Normale Supérieure - PSL Research University, 75005 Paris, France
3 Department of Experimental Psychology, University of Oxford, Oxford, UK
4 Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain, and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
5 Centre de Recerca Matemàtica, Bellaterra, Barcelona, Spain
Abstract
In the last few decades, the field of neuroscience has witnessed major technological
advances that have allowed researchers to measure and control neural activity with great
detail. Yet, behavioral experiments in humans remain an essential approach to investigate
the mysteries of the mind. Their relatively modest technological and economic requisites
make behavioral research an attractive and accessible experimental avenue for
neuroscientists with very diverse backgrounds. However, like any experimental enterprise, it
has its own inherent challenges that may pose practical hurdles, especially to less
experienced behavioral researchers. Here, we aim to provide a practical guide for a steady walk through the workflow of a typical behavioral experiment with human subjects. This primer covers the design of an experimental protocol, research ethics and subject care, as well as best practices for data collection, analysis, and sharing. The goal is to provide clear guidance for both beginners and experienced researchers from diverse backgrounds when planning behavioral experiments.
Introduction
We are witnessing a technological revolution in the field of neuroscience, with increasingly
large-scale neurophysiological recordings in behaving animals (Gao & Ganguli, 2015)
combined with the high-dimensional monitoring of behavior (Musall, Urai, Sussillo, &
Churchland, 2019; Pereira, Shaevitz, & Murthy, 2020) and causal interventions (Jazayeri &
Afraz, 2017) at its forefront. Yet, behavioral experiments remain an essential tool to
investigate the mysteries underlying the human mind (Niv, 2020; Read, 2015) — especially
when combined with computational modelling (Ma & Peters, 2020; Wilson & Collins, 2019) —
and constitute, compared to other approaches in neuroscience, an affordable and accessible
approach. Ultimately, measuring behavior is the most effective way to gauge the ecological
relevance of cognitive processes (Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel,
2017; Niv, 2020).
Here, rather than focussing on the theory of empirical measurement, we aim to provide a practical guide to overcoming the obstacles on the way to a successful
experiment. While there are many excellent textbooks focused on the theory underlying
behavioral experiments (Forstmann & Wagenmakers, 2015; Gescheider, 2013; Kingdom &
Prins, 2016; Lee & Wagenmakers, 2013), the practical know-how, which is key to
successfully implementing these empirical techniques, is mostly informally passed down from
one researcher to another. This primer attempts to capture these practicalities, and is based
on a collaborative effort to compare our individual practices. We believe that the advice
provided here is applicable to any experiment where human subjects respond, through
stereotyped behavior, to the controlled presentation of stimuli of any modality in order to
study perception, high-level cognitive functions, such as memory, reasoning and language,
motor control and beyond. These recommendations are mostly addressed to beginners and
neuroscientists who are new to behavioral experiments, but can also help experienced
researchers reflect on their daily practices. We hope that this primer nudges researchers
from a wide range of backgrounds to run human behavioral experiments.
We assume here that the reader already has a working hypothesis about a valid research
question. Developing an interesting hypothesis is the most creative part of any experimental
enterprise. How do you know you have a valid research question? Try to explain your
question and why it is important to a colleague. If you have trouble verbalizing it, go back to
the drawing board; the fact that nobody has done it before is not a valid reason in itself. Once you have
identified a scientific question and operationalized your hypothesis, the rules proposed below
are intended to lead you towards the behavioral dataset needed to test your hypothesis. We
present these rules as a sequence of steps, though some steps can be taken in parallel,
whilst others are better taken iteratively in a loop, as shown in Figure 1. To have maximal
control of the experimental process, we encourage the reader to get the full picture and
consider all the steps before starting to implement it.
Figure 1. Proposed workflow for a behavioral experiment. See main text for details of each
rule.
Rule 1. Do it
There are many reasons to choose human behavioral experiments over other experimental
techniques in neuroscience. Most importantly, analysis of human behavior is a powerful and
arguably essential means to studying the mind (Krakauer et al., 2017; Niv, 2020). In practice,
studying behavior is also one of the most affordable experimental approaches. This,
however, has not always been the case. Avant-garde psychophysical experiments dating
back to the late 40s (Koenderink, 1999), or even to the 19th century (Wontorra & Wontorra,
2011), involved expensive custom-built technology, sometimes difficult to fit in an office room
(Koenderink, 1999). Nowadays, a typical human behavioral experiment requires relatively
inexpensive equipment, a few hundred euros to compensate voluntary subjects, and a
hypothesis about how the brain processes information. Indeed, behavioral experiments on
healthy human adults are usually substantially faster and cheaper than other neuroscience
experiments, such as human neuroimaging or experiments with other animals. In addition,
ethical approval is easier to obtain (see Rule 4), since behavioral experiments are the least
invasive approach to study the computations performed by the brain, and subjects participate
voluntarily.
For cognitive studies of higher-level processes, you may have more freedom over the choice
of stimuli. For example, in a reinforcement learning task where subjects track the value
associated with certain stimuli, the choice of stimuli is completely arbitrary. However, make
sure the stimuli are effectively neutral (e.g. in a reinforcement learning framework, subjects might a priori associate one color with a higher value than another) and matched at the perceptual
level (e.g. do not use a bright large red stimulus for one condition, and a dim grey small
stimulus for another).
More trials from fewer subjects vs. fewer trials from more subjects
This is a common compromise, and there is no silver bullet for it, as it depends largely on the
origin of the effect you are after. As a rule of thumb, if you are interested in studying different
strategies or other individual characteristics (e.g. (A Tversky & Kahneman, 1974)), then you
should sample the population extensively and collect data from as many subjects as possible
(Waskom et al., 2019). On the other hand, if the process of interest occurs consistently
across individuals, as is often assumed for sensory or motor systems, then capturing
population heterogeneity might be less relevant (Read, 2015). In these cases, it can be
beneficial to use a small sample of subjects whose behavior is thoroughly assessed with
many trials (Smith & Little, 2018; Waskom et al., 2019).
Eye tracker
If eye movements can be a confound, you can control for them either by design, by eliminating the incentive to move the eyes, or through a fixation cross that minimizes eye movements
(Thaler, Schütz, Goodale, & Gegenfurtner, 2013). If you need to control eye gaze, for
example to interrupt a trial if the subject does not fixate at the right spot, use an eye tracker.
There are several affordable options, including those that you can build from scratch (Hosp et
al., 2020; Mantiuk, Kowalik, Nowosielski, & Bazyluk, 2012) that work reasonably well if
ensuring fixation is all you need (Funke et al., 2016). If your lab has an EEG setup, EOG
signals can provide a rough measure of eye movements (e.g. (Quax, Dijkstra, van Staveren,
Bosch, & van Gerven, 2019)). Recent powerful deep learning tools (Yiu et al., 2019) can also
be used to track eye movements with a camera, but some only work offline (Bellet, Bellet,
Nienborg, Hafed, & Berens, 2019; Mathis et al., 2018).
Other devices
There are plenty of “brain and body sensors” that can provide informative measures to complement behavioral outputs. Check OpenBCI for open-source, low-cost products (Frey,
2016). If you need precise control over the timing of different auditory and visual events,
consider using validation measures with external devices (toolkits, such as the Black Box
Toolkit (Plant, Hammond, & Turner, 2004) can be helpful). Before buying any expensive
equipment, check if someone in your community already has the tool you need, and
importantly, if it is compatible with the rest of your toolkit, such as response devices, available
ports, eye trackers, but also software and your operating system.
Reality checks
Make sure that typical findings are confirmed (e.g. higher accuracy and faster reaction times for easier trials, preference for higher rewards, etc.) and that most responses occur within the allowed time window. Reality checks can reveal potential bugs in your code, such as incorrectly saved data or an incorrect assignment of stimuli to task conditions (Table 2), as well as unexpected strategies employed by your subjects. Subjects might rely on superstitious behavior or on alternative strategies that defeat the purpose of the experiment altogether (e.g. people may close their eyes in an auditory task while you try to measure the impact of visual distractors). In general, subjects will tend to find the path of least resistance towards the promised reward (money, course credit, etc.).
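As an illustration, here is a minimal sanity-check sketch in Python, assuming a tidy trial table with hypothetical columns ("difficulty", "correct", "rt") and an arbitrary 3 s response deadline; adapt names and thresholds to your own design.

```python
import pandas as pd

# Hypothetical tidy trial table with one row per trial and columns
# 'subject', 'difficulty', 'correct' (0/1) and 'rt' (seconds).
trials = pd.read_csv("all_trials.csv")

# Easier trials should show higher accuracy and faster responses.
summary = (trials
           .groupby("difficulty")
           .agg(accuracy=("correct", "mean"),
                median_rt=("rt", "median"),
                n_trials=("correct", "size")))
print(summary)

# Proportion of responses slower than the (hypothetical) 3 s deadline.
print(f"Responses beyond the deadline: {(trials['rt'] > 3.0).mean():.1%}")
```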
Inclusion criteria
Reality checks can form the basis of your exclusion criteria, e.g. applying cutoff thresholds on the proportion of correct trials, response latencies, lapse rates, etc. Make sure your exclusion criteria are orthogonal to your main question, i.e. that they do not produce any systematic bias on your variable of interest. You can decide on the exclusion criteria after you collect a cohort of subjects, but always make decisions about which participants (or trials) to exclude before testing the main hypotheses in that cohort. Proceed with special caution when defining exclusion criteria for online experiments or studies with clinical populations, where performance is likely more heterogeneous and potentially worse, to avoid excluding more data in these groups than would be expected after piloting in the lab. All excluded subjects should be reported in the manuscript, and their data shared together with that of the other subjects.
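A minimal sketch of applying pre-specified exclusion criteria, assuming hypothetical cutoffs and column names (fix your own thresholds before testing the main hypotheses):

```python
import pandas as pd

# Hypothetical per-trial table; 'responded' is 1 if a response was given in time.
trials = pd.read_csv("all_trials.csv")
per_subject = (trials
               .groupby("subject")
               .agg(accuracy=("correct", "mean"),
                    median_rt=("rt", "median"),
                    lapse_rate=("responded", lambda r: 1 - r.mean())))

# Hypothetical cutoffs, decided before testing the main hypotheses.
keep = ((per_subject["accuracy"] >= 0.60) &
        (per_subject["median_rt"] <= 2.0) &
        (per_subject["lapse_rate"] <= 0.10))

excluded = per_subject.index[~keep].tolist()
print("Excluded subjects (report them, and share their data too):", excluded)
```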
Preregistration
The replication crisis has motivated the practice of preregistering experiments before the actual data collection (Kupferschmidt, 2018; Lakens, 2019). In practice, this consists of a short document that answers standardized questions about the experimental design and planned statistical analyses. The optimal time for preregistration is once you have finished tweaking your experiment through piloting and power analysis (Rules 2-5). Preregistration may look like an extra hassle
before data collection, but it will actually often save you time: writing down explicitly all your
hypotheses, predictions and analyses is itself a good reality check, and might reveal some
inconsistencies that lead you back to amending your paradigm. More importantly, it helps to
protect from the conscious or unconscious temptation of changing the analyses or hypothesis
as you go. The text you generate at this point can be reused for the introduction and methods
sections of your manuscript. Alternatively, you can opt for a registered report, where you submit a preliminary version of your final manuscript, without the results, for peer review (D. Stephen Lindsay, 2016). If your report survives peer review, it is accepted in principle, which means that whatever the outcome, the manuscript will be published (provided that the study was rigorously conducted). High-impact journals such as eLife, Nature Human Behaviour, and Nature Communications already accept this format.
Several databases manage and store preregistrations, such as the popular Open Science
Framework (OSF.io) or AsPredicted.org, which offers more concrete guidelines. Importantly,
these platforms keep your registration private, so there is no added risk of being scooped.
Preregistering your analyses does not mean you cannot do exploratory analyses, just that
these analyses will be explicitly marked as such. This transparency strengthens your
arguments when reviewers read your manuscript and protects you from committing scientific
misconduct involuntarily. If you do not want to register your experiment publicly, consider at
least writing a private document in which you detail your decisions before embarking on data
collection or analyses. For most people, this might be sufficient to counteract the temptation to engage in questionable research practices.
Replication
You can also opt to replicate the important results of your study in a new cohort of subjects
(ideally two). In essence, this means that analyses run on the first cohort are exploratory,
while the same analyses run on subsequent cohorts are considered confirmatory. If you plan
to run new experiments to test predictions that emerged from your findings, include the
replication of these findings in the new experiments. For most behavioral experiments, the
cost of running new cohorts with the same paradigm is small in comparison to the great
benefit of consolidating your results. In general, unless we have a very focussed hypothesis or limited resources, we prefer replication over preregistration. First, it allows for less constrained analyses of the original cohort data, because you do not tie your hands until the replication. Second, by definition, replication is the ultimate remedy to the replication crisis.
Finally, you can use both approaches together and pre-register before replicating your
results.
In summary, pre-registrations (Lakens, 2019) and replications (R A Klein et al., 2018; Richard
A. Klein et al., 2014) will help to improve the standards of science, partially by protecting
against involuntary malpractices, and will greatly strengthen your results in the eyes of your
reviewers and readers. Beware however that preregistration and replication cannot replace a
solid theoretical embedding of your hypothesis (Guest & Martin, 2021; Szollosi et al., 2020).
Providing feedback
In general, we recommend giving performance feedback after each trial or block, but this
aspect can depend on the specific design. For example, whereas feedback is mandatory in
reinforcement learning paradigms, feedback can be counterproductive in other cases. First,
feedback influences the next few trials due to win-stay-lose-switch strategies (Abrahamyan,
Silva, Dakin, Carandini, & Gardner, 2016; Urai, Braun, & Donner, 2017) or other types of
superstitious behavior (Ono, 1987). Make sure this nuisance has very limited impact on your
variables of interest, unless you are precisely interested in these effects. Second, participants
can use feedback as a learning signal (Massaro, 1969) which will lead to an increase in
performance throughout the session, especially for strategy-based paradigms or paradigms
that include confidence reports (Schustek, Hyafil, & Moreno-Bote, 2019).
Save your data in a tidy table (Wickham, 2014) and store it in a software-independent format
(e.g. a .csv instead of a .mat file), which makes it easy to analyze and share (Rule 10). Don’t be afraid of redundant variables (e.g. response identity and response accuracy); redundancy makes it easier to detect and correct possible mistakes. If some modality produces continuous
output, such as pupil size or cursor position, save it in a separate file rather than creating
kafkaesque data structures. If you use an eye-tracker or neuroimaging device, make sure
you save synchronized timestamps in both data streams for later data alignment (see Table
2). If you end up changing your design after starting data collection, even for small changes,
save those version names in a lab notebook. If the lab does not use a lab notebook, start
using one (Schnell, 2015). Mark all incidents there, even those that may seem unimportant at
the moment. Back up your code and data regularly, making sure you comply with the ethics
of data handling (see also Rules 4 and 10). Finally, don’t stop data collection after the
experiment is done. At the end of the experiment, debrief your participant. Ask questions
such as “Did you see so-and-so?” or “Tell us about the strategy you used to solve part II” to
make sure the subjects understood the task (see Rule 5). It is also useful to include an
informal questionnaire at the beginning of the experiment, e.g. demographics (should you
have approval from the ethics committee).
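A minimal sketch of tidy, software-independent data saving with the standard library (file and column names are hypothetical); note the redundant response/accuracy columns and the absolute time stamps kept for later alignment with continuous recordings:

```python
import csv

# Hypothetical file and column names.
fieldnames = ["subject", "session", "trial", "condition", "stimulus",
              "response", "correct", "rt", "fixation_onset", "stimulus_onset"]

with open("sub01_sess01_2024-05-01_behavior.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    # Inside the trial loop: write one row per trial and flush immediately,
    # so a crash never loses more than the current trial.
    writer.writerow({"subject": "sub01", "session": 1, "trial": 1,
                     "condition": "easy", "stimulus": "left",
                     "response": "left", "correct": 1, "rt": 0.532,
                     "fixation_onset": 11.804, "stimulus_onset": 12.345})
    f.flush()
```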
Model fitting
There are packages or toolboxes that implement model fitting for most regression analyses
(Seabold & Perktold, 2010) and standard models of behavior, such as the Drift Diffusion
Model (Shinn, Lam, & Murray, 2020; Wiecki, Sofer, & Frank, 2013) or Reinforcement
Learning Models (e.g. (Daunizeau, Adam, & Rigoux, 2014)). For models that are not
contained in statistical packages, you can implement custom model fitting in three steps: 1)
formalize your model as a series of computational, parameterized operations that transform
your stimuli and other factors into behavioral reports (e.g. choice and/or response times).
Remember that you are describing a probabilistic model, so at least one operation must be
noisy; 2) write down the likelihood function, i.e. the probability of observing a sequence of
responses under your model, as a function of the model parameters; and 3) use an optimization procedure (e.g. fmincon in MATLAB or scipy.optimize.minimize in Python) to find the parameters that maximize the likelihood of your model for each participant individually, the so-called maximum-likelihood (ML) parameters. Equivalently, this can be viewed as finding the parameters that minimize a loss function, namely the model's error in predicting subject behavior.
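To make these three steps concrete, here is a minimal sketch under the assumption of a simple logistic choice model with bias and sensitivity as free parameters; the data file and column names are hypothetical, and this is only one of many ways to set it up.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

trials = pd.read_csv("sub01_behavior.csv")
stimulus = trials["stimulus"].to_numpy()   # signed evidence per trial
choice = trials["choice"].to_numpy()       # 1 = rightward, 0 = leftward

# Steps 1-2: the model states p(right) = logistic(sensitivity * stimulus + bias);
# the negative log-likelihood sums the log-probability of each observed choice.
def negative_log_likelihood(params, stimulus, choice):
    bias, sensitivity = params
    p_right = 1.0 / (1.0 + np.exp(-(sensitivity * stimulus + bias)))
    p_right = np.clip(p_right, 1e-9, 1 - 1e-9)   # avoid log(0)
    return -np.sum(choice * np.log(p_right) + (1 - choice) * np.log(1 - p_right))

# Step 3: maximize the likelihood by minimizing its negative.
fit = minimize(negative_log_likelihood, x0=[0.0, 1.0],
               args=(stimulus, choice), method="Nelder-Mead")
print("ML estimates (bias, sensitivity):", fit.x)
```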
Make sure your fitting procedure captures what you expect by validating it on synthetic data,
where you know the true parameter values (Heathcote, Brown, & Wagenmakers, 2015;
Palminteri et al., 2017; Wilson & Collins, 2019). Compute uncertainty (e.g. confidence
intervals) about the model parameters using bootstrap methods (parametric bootstrapping if
you are fitting a sequential model of behavior, classical bootstrapping otherwise). Finally, you
probably want to know whether your effect is consistent across subjects, or whether the
effect differs between different populations, in which case you should compute confidence
intervals across subjects. Sometimes, subjects’ behavior differs qualitatively and cannot be
captured by the same individual model. In these cases, Bayesian model selection allows you
to accommodate the possible heterogeneity of your cohort (Rigoux, Stephan, Friston, &
Daunizeau, 2014).
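A minimal parameter-recovery and bootstrap sketch for the hypothetical logistic choice model used above: simulate synthetic data from known parameters, refit, and resample trials with replacement (classical bootstrap) to obtain confidence intervals.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, stimulus, choice):
    bias, sensitivity = params
    p = np.clip(1.0 / (1.0 + np.exp(-(sensitivity * stimulus + bias))), 1e-9, 1 - 1e-9)
    return -np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p))

def fit(stimulus, choice):
    return minimize(negative_log_likelihood, x0=[0.0, 1.0],
                    args=(stimulus, choice), method="Nelder-Mead").x

# Synthetic data generated from known ("true") parameters.
rng = np.random.default_rng(0)
true_bias, true_sensitivity = 0.2, 3.0
stimulus = rng.uniform(-1, 1, size=500)
choice = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_sensitivity * stimulus + true_bias))))

print("True:", (true_bias, true_sensitivity), "Recovered:", fit(stimulus, choice))

# Classical bootstrap: resample trials with replacement and refit.
boot = []
for _ in range(200):
    idx = rng.integers(0, len(stimulus), size=len(stimulus))
    boot.append(fit(stimulus[idx], choice[idx]))
print("Bootstrap 95% CI (bias, sensitivity):\n", np.percentile(boot, [2.5, 97.5], axis=0))
```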
● Each analysis should answer a question: keep the thread of your story in mind
and ask one question at a time.
● Think of several analyses that could falsify your current interpretation, and
only rest assured after finding a coherent picture in the cumulative evidence.
● Start by visualizing the results in different conditions using the simplest
methods (e.g. means with standard errors).
● Getting a feeling for a method means understanding its assumptions, and how your data might violate them. Data violate assumptions in many situations, but not always in a way that is relevant to your findings, so know your assumptions and don’t be a slave to the stats.
● Non-parametric methods (e.g. bootstrap, permutation tests and cross-validation), the Swiss Army knife of statistics, are often a useful approach because they make minimal assumptions about the distribution of the data (see the sketch after this list).
● Make sure that you test for interactions when appropriate (Nieuwenhuis,
Forstmann, & Wagenmakers, 2011).
● If your evidence coherently points to a null finding, use Bayesian statistics to
see if you can formally accept it (Keysers, Gazzola, & Wagenmakers, 2020).
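As referenced in the list above, here is a minimal permutation-test sketch, assuming one accuracy value per subject in each of two hypothetical conditions (paired design); the numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-subject accuracies in two conditions (paired design).
acc_a = np.array([0.71, 0.68, 0.80, 0.75, 0.66, 0.73, 0.77, 0.69])
acc_b = np.array([0.64, 0.66, 0.72, 0.70, 0.61, 0.69, 0.71, 0.65])

observed = np.mean(acc_a - acc_b)
null = np.empty(10000)
for i in range(null.size):
    flip = rng.choice([1, -1], size=acc_a.size)   # randomly swap condition labels
    null[i] = np.mean(flip * (acc_a - acc_b))

p_value = np.mean(np.abs(null) >= np.abs(observed))  # two-sided
print(f"Observed difference = {observed:.3f}, permutation p = {p_value:.4f}")
```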
Model validation
After fitting your model to each participant, you should validate it by using the fitted parameter
values to simulate responses, and compare them to behavioral patterns of the participant
(Heathcote et al., 2015; Wilson & Collins, 2019). This check ensures that the model not only performs well quantitatively, but also captures the qualitative effects in your data (Palminteri et al., 2017).
Model comparison
In addition to your main hypothesis, always define one or several “null models” that
implement alternative hypotheses and compare them using model comparison techniques
(Heathcote et al., 2015; Wilson & Collins, 2019). In general, use cross-validation for model
selection, unless you have a small dataset (<100 trials (Varoquaux, 2018)), in which case it is
better to use parametric methods such as AIC/BIC; or even better, fully Bayesian methods
(Daunizeau et al., 2014). For nested models — when the complex model includes the simpler
one — you can use the likelihood-ratio test to perform significance testing.
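A minimal sketch of a nested-model comparison under the same hypothetical logistic choice model: a full model (bias plus sensitivity) against a null model without bias, compared with AIC/BIC and a likelihood-ratio test (cross-validation would follow the same logic, refitting on training trials and scoring the likelihood of held-out trials).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def nll(params, stimulus, choice, with_bias=True):
    bias = params[0] if with_bias else 0.0
    sensitivity = params[-1]
    p = np.clip(1.0 / (1.0 + np.exp(-(sensitivity * stimulus + bias))), 1e-9, 1 - 1e-9)
    return -np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p))

# Synthetic data for illustration (replace with a participant's data).
rng = np.random.default_rng(2)
stimulus = rng.uniform(-1, 1, size=500)
choice = rng.binomial(1, 1.0 / (1.0 + np.exp(-(3.0 * stimulus + 0.2))))

full = minimize(nll, x0=[0.0, 1.0], args=(stimulus, choice, True), method="Nelder-Mead")
null = minimize(nll, x0=[1.0], args=(stimulus, choice, False), method="Nelder-Mead")

n = len(choice)
for name, result, k in [("full", full, 2), ("null", null, 1)]:
    print(f"{name}: AIC = {2 * k + 2 * result.fun:.1f}, "
          f"BIC = {k * np.log(n) + 2 * result.fun:.1f}")

# Likelihood-ratio test (valid here because the models are nested).
lr_stat = 2 * (null.fun - full.fun)
print("LR test p =", chi2.sf(lr_stat, df=1))
```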
Model prediction
Successfully predicting behavior in novel experimental data is the Holy Grail of the
epistemological process. Here, one should make predictions about the cognitive process in a
wider set of behavioral measures or conditions. For example, you might fit your model on
reaction times and use those fits to make predictions about a secondary variable (Rule 8),
such as choices or eye movements, or generate predictions from the model in another set of
experimental conditions.
Concluding remarks
Our goal here was to provide practical advice, rather than illuminating the theoretical
foundations for designing and running behavioral experiments with humans. Our
recommendations, or rules, span the whole process involved in designing and setting up an
experiment, recruiting and caring for the subjects, and recording, analyzing and sharing data.
Through the collaborative effort of collecting our personal experiences and writing them down
in this manuscript we have learned a lot. In fact, many of these rules were learned after
painfully realizing that doing the exact opposite was a mistake. We thus wrote the ‘practical
guide’ we wished we had read when we embarked on the adventure of our first behavioral
experiment. Some rules are therefore rather subjective, and might not resonate with every
reader, but we remain hopeful that most of them are helpful to overcome the practical hurdles
inherent to performing behavioral experiments with humans.
Database | Type of data | URL

Generic data
Dryad (o) | Data from different fields of biology, including behavior | datadryad.org
Google Dataset Search (o) | All types of data, including behavior | datasetsearch.research.google.com
Nature Scientific Data (o,p) | All types of data, including behavior | nature.com/sdata
OSF (o) | All types of data, including behavior and neuroimaging. Preregistration service | osf.io

Human data
CamCan (c) | Cognitive and neuroimaging data of subjects across the adult lifespan | cam-can.org
Human Brain Project (p) | Mostly human and mouse recordings, including behavior | kg.ebrains.eu/search
Oasis (c) | Neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer’s disease | oasis-brains.org
The Healthy Brain Network (c) | Psychiatric, behavioral, cognitive, and lifestyle phenotypes, as well as multimodal brain imaging of children and adolescents (5-21) | fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network

Animal data
2) Problem: Your code breaks when a certain key is hit, or when secondary external hardware (e.g. eye tracker) unexpectedly stops sending signals.
Solution: Check which keys are assigned in your paradigm, and which lead to the interruption of the program. Check in advance what happens if external hardware problems emerge. Perform a crash test of your code to make sure it is resilient to wrong keys being hit, or keys being hit at the wrong time.

3) Problem: You made an “improvement” just before the experimental session. Your code now breaks unexpectedly or doesn’t run at all during data collection.
Solution: Never use untested code.

4) Problem: Some software sends a notification, such as software updates, in the middle of a session. The experiment is interrupted, and the subject might not even notify you.
Solution: Switch off all software you don’t need, disable automatic updates. Disable the internet connection.

5) Problem: The randomization of stimuli or conditions is wrong, or identical for all subjects.
Solution: Make sure to use a different random seed (whenever you want your data to be independent) and save it along with the other variables. Inspect the distribution of conditions in the data generated by your randomization.

6) Problem: Your subject is not doing what they should and you don’t notice.
Solution: Have a control screen or a remote connection to mirror the subject’s display (e.g. with Chrome Remote Desktop), but make sure it will not introduce delays. There, also print ongoing performance measures.

7) Problem: You overwrite data/code from earlier sessions or subjects. This data is now lost.
Solution: Add a line of code that checks that the filename where you want to store the data does not already exist. Back up the output directory regularly through git. Alternatively or additionally, save your data directly on Dropbox, Google Drive or another automatic system.

8) Problem: You save participant data with the wrong identifier and later cannot assign it correctly.
Solution: Use multiple identifiers to name a file: subject and session ID + date and time + computer ID, for example.

9) Problem: You decided at some point to adjust “constant” experimental parameters during data collection. Now, which participants saw what?
Solution: Define all experimental parameters at the beginning of your code, preferably in a flexible format such as a Python dictionary, and save them in a separate log file for each session or include them in your table repeatedly for each trial.

10) Problem: After data collection, you start having doubts about the timing of events, and the temporal alignment with continuous data, possibly stored on another device (fMRI, eye tracking).
Solution: Save “time stamps” in your data table for each event of a trial (fixation onset, stimulus onset, etc.). Make sure your first event is temporally aligned to the onset of continuous data.
Table 2. Top 10 most common coding and data handling errors committed by the authors
when doing psychophysics, and how to avoid them. These are loosely sorted by type of error
(crashes, incorrect runs, saving data issues), not by frequency.
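A minimal sketch combining several of the safeguards from Table 2 (logging the random seed and parameters, multi-identifier file names, refusing to overwrite existing data); the file name and parameter values are hypothetical.

```python
import json
import os
import time

import numpy as np

# (5) Use and log a per-session random seed so trial sequences can be reconstructed.
seed = int(time.time())
rng = np.random.default_rng(seed)

# (9) Keep all "constant" parameters in one place and log them for every session.
params = {"seed": seed, "n_trials": 400, "stim_duration_s": 0.2,
          "response_window_s": 3.0, "code_version": "v1.2"}

# (8) Build the data file name from multiple identifiers.
filename = "sub01_sess02_2024-05-01_1530_labPC3.csv"

# (7) Never silently overwrite existing data.
if os.path.exists(filename):
    raise FileExistsError(f"{filename} already exists; check subject/session IDs.")

# Write the session log next to the data file.
with open(filename.replace(".csv", "_params.json"), "w") as f:
    json.dump(params, f, indent=2)
```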
Acknowledgements
The authors thank Daniel Linares for useful comments on the manuscript. Authors are
supported by the Spanish Ministry of Economy and Competitiveness (RYC-2017-23231 to
A.H.), the “la Caixa” Banking Foundation (Ref: LCF/BQ/IN17/11620008, H.S.), and the
European Union’s Horizon 2020 Marie Skłodowska-Curie grant (Ref: 713673, H.S.). JB was
supported by the Fyssen foundation and by the Bial Foundation (Ref: 356/18). S.S-F. is
funded by Ministerio de Ciencia e Innovación (Ref: PID2019-108531GB-I00 AEI/FEDER),
AGAUR Generalitat de Catalunya (Ref: 2017 SGR 1545), and the FEDER/ERFD Operative
Programme for Catalunya 2014-2020.
Bibliography
Abrahamyan, A., Silva, L. L., Dakin, S. C., Carandini, M., & Gardner, J. L. (2016). Adaptable
https://doi.org/10.1073/pnas.1518786113
Barnes, N. (2010). Publish your computer code: it is good enough. Nature, 467(7317), 753.
https://doi.org/10.1038/467753a
Bausell, R. B., & Li, Y.-F. (2002). Power analysis for experimental research: A practical guide
for the biological, medical and social sciences. Cambridge: Cambridge University Press.
https://doi.org/10.1017/CBO9780511541933
Bellet, M. E., Bellet, J., Nienborg, H., Hafed, Z. M., & Berens, P. (2019). Human-level
Borgo, M., Soranzo, A., & Grassi, M. (2012). Psychtoolbox: sound, keyboard and mouse. In
MATLAB for Psychologists (pp. 249–273). New York, NY: Springer New York.
https://doi.org/10.1007/978-1-4614-2197-9_10
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., &
Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of
neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
https://doi.org/10.1038/nrn3475
Cheadle, S., Wyart, V., Tsetsos, K., Myers, N., de Gardelle, V., Herce Castañón, S., &
Summerfield, C. (2014). Adaptive gain control during human perceptual choice. Neuron,
Crawford, J. L., Yee, D. M., Hallenbeck, H. W., Naumann, A., Shapiro, K., Thompson, R. J.,
& Braver, T. S. (2020). Dissociable effects of monetary, liquid, and social incentives on
https://doi.org/10.3389/fpsyg.2020.02212
Mechanical Turk as a tool for experimental behavioral research. Plos One, 8(3), e57410.
https://doi.org/10.1371/journal.pone.0057410
Daunizeau, J., Adam, V., & Rigoux, L. (2014). VBA: a probabilistic treatment of nonlinear
models for neurobiological and behavioural data. PLoS Computational Biology, 10(1),
e1003441. https://doi.org/10.1371/journal.pcbi.1003441
Diaz, G. (2020, April 27). Highly cited publications on vision in which authors were also
http://visionscience.com/pipermail/visionlist_visionscience.com/2020/004205.html
Difallah, D., Filatova, E., & Ipeirotis, P. (2018). Demographics and dynamics of mechanical
Search and Data Mining - WSDM ’18 (pp. 135–143). New York, New York, USA: ACM
Press. https://doi.org/10.1145/3159652.3159661
Dykstra, O. (1966). The orthogonalization of undesigned experiments. Technometrics : A
Journal of Statistics for the Physical, Chemical, and Engineering Sciences, 8(2), 279.
https://doi.org/10.2307/1266361
Fetsch, C. R. (2016). The importance of task design and behavioral control for understanding
the neural basis of cognitive functions. Current Opinion in Neurobiology, 37, 16–22.
https://doi.org/10.1016/j.conb.2015.12.002
Fiedler, K., & Schwarz, N. (2016). Questionable Research Practices Revisited. Social
https://doi.org/10.1177/1948550615612150
Forstmann, B. U., & Wagenmakers, E.-J. (Eds.). (2015). An Introduction to Model-Based
https://doi.org/10.1007/978-1-4939-2236-9
https://doi.org/10.5220/0005954501050114
Funke, G., Greenlee, E., Carter, M., Dukes, A., Brown, R., & Menke, L. (2016). Which eye
tracker is right for your research? performance evaluation of several cost variant eye
trackers. Proceedings of the Human Factors and Ergonomics Society Annual Meeting,
Gao, P., & Ganguli, S. (2015). On simplicity and complexity in the brave new world of
https://doi.org/10.1016/j.conb.2015.04.003
Garin, O. (2014). Ceiling Effect. In A. C. Michalos (Ed.), Encyclopedia of Quality of Life and
https://doi.org/10.4324/9780203774458
Gillan, C. M., & Rutledge, R. B. (2021). Smartphones and the neuroscience of mental health.
Gleeson, P., Davison, A. P., Silver, R. A., & Ascoli, G. A. (2017). A commitment to open
https://doi.org/10.1016/j.neuron.2017.10.013
Guest, O., & Martin, A. E. (2021). How computational modeling can force theory building in
https://doi.org/10.1177/1745691620970585
Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (2015). An introduction to good practices
to Model-Based Cognitive Neuroscience (pp. 25–48). New York, NY: Springer New York.
https://doi.org/10.1007/978-1-4939-2236-9_2
Hosp, B., Eivazi, S., Maurer, M., Fuhl, W., Geisler, D., & Kasneci, E. (2020). RemoteEye: An
Methods. https://doi.org/10.3758/s13428-019-01305-2
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine,
Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable
research practices with incentives for truth telling. Psychological Science, 23(5),
524–532. https://doi.org/10.1177/0956797611430953
Kaggle. (2019). State of Data Science and Machine Learning 2019. Retrieved May 16, 2020,
from https://www.kaggle.com/kaggle-survey-2019
Kerr, N. L. (1998). HARKing: hypothesizing after the results are known. Personality and
https://doi.org/10.1207/s15327957pspr0203_4
Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis
788–799. https://doi.org/10.1038/s41593-020-0660-4
https://doi.org/10.1016/C2012-0-01278-1
Klein, R A, Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., … Nosek, B.
https://doi.org/10.1177/2515245918810225
Klein, Richard A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., …
142–152. https://doi.org/10.1027/1864-9335/a000178
Knoblauch, K., & Maloney, L. T. (2012). Modeling psychophysical data in R. New York, NY:
https://doi.org/10.1068/p2806ed
Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., & Poeppel, D. (2017).
https://doi.org/10.1016/j.neuron.2016.12.041
Kupferschmidt, K. (2018). More and more scientists are preregistering their studies. Should
you? Science. https://doi.org/10.1126/science.aav4786
Kvarven, A., Strømland, E., & Johannesson, M. (2020). Comparing meta-analyses and
423–434. https://doi.org/10.1038/s41562-019-0787-z
analysis. https://doi.org/10.31234/osf.io/jbh4w
Lange, K., Kühn, S., & Filevich, E. (2015). “just another tool for online studies” (JATOS): an
easy solution for setup and management of web servers supporting online studies. Plos
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course.
Linares, D., Marin-Campos, R., Dalmau, J., & Compte, A. (2018). Validation of motion
perception of briefly displayed images using a tablet. Scientific Reports, 8(1), 16056.
https://doi.org/10.1038/s41598-018-34466-9
Lindeløv, J. K. (2019, June 28). Common statistical tests are linear models. Retrieved August
Mantiuk, R., Kowalik, M., Nowosielski, A., & Bazyluk, B. (2012). Do-It-Yourself Eye Tracker:
Low-Cost Pupil-Based Eye Tracker for Computer Graphics Applications. Lecture Notes
Marin-Campos, R., Dalmau, J., Compte, A., & Linares, D. (2020). StimuliApp: psychophysical
Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., & Bethge, M.
(2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep
Ma, W. J., & Peters, B. (2020). A neural network walks into a lab: towards using deep nets as
Musall, S., Urai, A. E., Sussillo, D., & Churchland, A. K. (2019). Harnessing behavioral
Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of
1105–1107. https://doi.org/10.1038/nn.2886
Niv, Y. (2020). The primacy of behavioral research for understanding the brain.
https://doi.org/10.31234/osf.io/y8mxe
https://doi.org/10.1126/science.aac4716
Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in
https://doi.org/10.1016/j.tics.2017.03.011
Pashler, H., & Mozer, M. C. (2013). When does fading enhance perceptual category
learning? Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(4),
1162–1173. https://doi.org/10.1037/a0031679
Pereira, T. D., Shaevitz, J. W., & Murthy, M. (2020). Quantifying behavior to understand the
https://doi.org/10.1038/s41593-020-00734-z
Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F. da V., …
Vizcaíno, J. A. (2016). Ten simple rules for taking advantage of git and github. PLoS
Pisupati, S., Chartarifsky-Lynn, L., Khanal, A., & Churchland, A. K. (2019). Lapses in
Plant, R. R., Hammond, N., & Turner, G. (2004). Self-validating presentation and response
timing in cognitive paradigms: how and why? Behavior Research Methods, Instruments,
& Computers: A Journal of the Psychonomic Society, Inc, 36(2), 291–303.
https://doi.org/10.3758/bf03195575
Prins, N. (2013). The psi-marginal adaptive method: How to give nuisance parameters the
attention they deserve (no more, no less). Journal of Vision, 13(7), 3.
https://doi.org/10.1167/13.7.3
Quax, S. C., Dijkstra, N., van Staveren, M. J., Bosch, S. E., & van Gerven, M. A. J. (2019).
Eye movements explain decodability during perception and cued attention in MEG.
Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection
https://doi.org/10.1016/j.neuroimage.2013.08.065
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Rule, A., Birmingham, A., Zuniga, C., Altintas, I., Huang, S.-C., Knight, R., … Rose, P. W.
(2019). Ten simple rules for writing and sharing computational analyses in Jupyter
https://doi.org/10.1371/journal.pcbi.1007007
Sauter, M., Draschkow, D., & Mack, W. (2020). Building, hosting and recruiting: A brief
introduction to running behavioral experiments online. Brain Sciences, 10(4).
https://doi.org/10.3390/brainsci10040251
Schnell, S. (2015). Ten simple rules for a computational biologist’s laboratory notebook.
https://doi.org/10.1371/journal.pcbi.1004385
Schustek, P., Hyafil, A., & Moreno-Bote, R. (2019). Human confidence judgments reflect
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with
Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96). SciPy.
https://doi.org/10.25080/Majora-92bf1922-011
Semuels, A. (2018, January 23). The Online Hell of Amazon’s Mechanical Turk. Retrieved
https://www.theatlantic.com/business/archive/2018/01/amazon-mechanical-turk/551192/
Shinn, M., Lam, N. H., & Murray, J. D. (2020). A flexible framework for simulating and fitting
https://doi.org/10.1177/0956797611417632
Smith, P. L., & Little, D. R. (2018). Small is beautiful: In defense of the small-N design.
https://doi.org/10.3758/s13423-018-1451-8
Stewart, N., Chandler, J., & Paolacci, G. (2017). Crowdsourcing samples in cognitive
https://doi.org/10.1016/j.tics.2017.06.007
Strasburger, H. (1994, July). Strasburger’s psychophysics software overview. Retrieved May
Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific Misconduct and the Myth of
https://doi.org/10.1177/1745691612460687
Szollosi, A., Kellen, D., Navarro, D. J., Shiffrin, R., van Rooij, I., Van Zandt, T., & Donkin, C.
https://doi.org/10.1016/j.tics.2019.11.009
Thaler, L., Schütz, A. C., Goodale, M. A., & Gegenfurtner, K. R. (2013). What is the best
fixation target? The effect of target shape on stability of fixational eye movements. Vision
Thompson, W. H., Wright, J., Bissett, P. G., & Poldrack, R. A. (2019). Dataset Decay: the
https://doi.org/10.1101/801696
Tversky, Amos, & Kahneman, D. (1989). Rational choice and the framing of decisions. In B.
Karpak & S. Zionts (Eds.), Multiple criteria decision making and risk analysis using
https://doi.org/10.1007/978-3-642-74919-3_4
Tversky, A, & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases.
’t Hart, B. M., Achakulvisut, T., Blohm, G., Kording, K., Peters, M. A. K., Akrami, A., … Hyafil,
neuroscience. https://doi.org/10.31219/osf.io/9fp4v
Urai, A. E., Braun, A., & Donner, T. H. (2017). Pupil-linked arousal is driven by decision
uncertainty and alters serial choice bias. Nature Communications, 8, 14637.
https://doi.org/10.1038/ncomms14637
Varoquaux, G. (2018). Cross-validation failure: Small sample sizes lead to large error bars.
Waskom, M. L., Okazawa, G., & Kiani, R. (2019). Designing and interpreting psychophysical
https://doi.org/10.1016/j.neuron.2019.09.016
Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and
https://doi.org/10.3758/BF03194544
Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based
https://doi.org/10.3758/BF03194545
https://doi.org/10.1002/9781119170174.epcn507
https://doi.org/10.18637/jss.v059.i10
Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the
https://doi.org/10.3389/fninf.2013.00014
Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of
Yates, J. L., Park, I. M., Katz, L. N., Pillow, J. W., & Huk, A. C. (2017). Functional dissection
of signal and noise in MT and LIP during decision-making. Nature Neuroscience, 20(9),
1285–1292. https://doi.org/10.1038/nn.4611
Yiu, Y.-H., Aboulatta, M., Raiser, T., Ophey, L., Flanagin, V. L., Zu Eulenburg, P., & Ahmadi,
https://doi.org/10.1016/j.jneumeth.2019.05.016
Yoon, J., Blunden, H., Kristal, A. S., & Whillans, A. V. (2019). Framing Feedback Giving as
Advice Giving Yields More Critical and Actionable Input. Harvard Business School.