(David H. Barlow, Michel Hersen) Single Case Experimental Designs
(PG PS-56)
Pergamon Titles of Related Interest
Barlow/Hayes/Nelson THE SCIENTIST PRACTITIONER: Research
and Accountability in Clinical and Educational Settings
Bellack/Hersen RESEARCH METHODS IN CLINICAL PSYCHOLOGY
Hersen/Bellack BEHAVIORAL ASSESSMENT: A Practical
Handbook, Second Edition
Ollendick/Hersen CHILD BEHAVIORAL ASSESSMENT: Principles
and Procedures
Related Journals*
BEHAVIORAL ASSESSMENT
PERSONALITY AND INDIVIDUAL DIFFERENCES
David H. Barlow
SUNY at Albany
Michel Hersen
University of Pittsburgh School of Medicine
PERGAMON PRESS
NEW YORK • OXFORD • BEIJING • FRANKFURT
SÃO PAULO • SYDNEY • TOKYO • TORONTO
U.S.A. Pergamon Press Inc., Maxwell House, Fairview Park,
Elmsford, New York 10523, U.S.A.
U.K. Pergamon Press plc, Headington Hill Hall,
Oxford OX3 0BW, England
PEOPLE'S REPUBLIC Pergamon Press, Room 4037, Qianmen Hotel, Beijing,
OF CHINA People's Republic of China
FEDERAL REPUBLIC Pergamon Press GmbH, Hammerweg 6,
OF GERMANY D-6242 Kronberg, Federal Republic of Germany
BRAZIL Pergamon Editora Ltda, Rua Eça de Queiros, 346,
CEP 04011, Paraiso, São Paulo, Brazil
AUSTRALIA Pergamon Press Australia Pty Ltd., P.O. Box 544,
Potts Point, N.S.W. 2011, Australia
JAPAN Pergamon Press, 5th Floor, Matsuoka Central Building,
1-7-1 Nishishinjuku, Shinjuku-ku, Tokyo 160, Japan
CANADA Pergamon Press Canada Ltd., Suite No 271,
253 College Street, Toronto, Ontario, Canada M5T 1R5
Barlow, David H.
Single case experimental designs, 2nd ed.
(Pergamon general psychology series)
Author's names in reverse order in 1st ed., 1976.
Includes bibliographies and indexes.
1. Psychology-Research. 2. Experimental design.
I. Hersen, Michel. II. Title. III. Series. [DNLM:
1. Behavior. 2. Psychology, Experimental. 3. Research
design. BF 76.5 H572s]
BF76.5.B384 1984 150'.724 84-6292
ISBN 0-08-030136-3
ISBN 0-08-030135-5 (soft)
Printing: 4 5 6 7 8 9 Year: 1 2 3 4 5 6 7 8 9 0
Contents
References
At the time, this seemed a reasonable statement to make, but we think that
few of us involved in applied research anticipated the explosive growth of
interest in single-case designs and how many methodological and strategical
innovations would subsequently appear. As a result of developments in the 8
years since the first edition, this book can be more accurately described as new
than as revised. Fully 5 of the 10 chapters are new or have been completely
rewritten. The remaining five chapters have been substantially revised and
updated to reflect new guidelines and the current wisdom on experimental
strategies involving single-case designs.
Developments in the field have not been restricted to new or modified
experimental designs. New thinking has emerged on the analyses of data from
these designs, particularly with regard to use of statistical procedures. We
were most fortunate in having Alan Kazdin take into account these developments
in the revision of his chapter on statistical analyses for single-case
experimental designs. Furthermore, the area of techniques of measurement
and assessment relevant to single-case designs has changed greatly in the years
since the first edition. Don Hartmann, the Editor of Behavioral Assessment
and one of the leading figures in assessment and single-case designs, has
strengthened the book considerably with his lucid chapter. Nevertheless, the
primary purpose of the book was, and remains, the provision of a sourcebook
of single-case designs, with guidelines for their use in applied settings.
To Sallie Morgan, who is very tired of typing the letters A-B-C over and
over again for the past 10 years, we can say that we couldn’t have done it
without you, or without Mary Newell and Susan Capozzoli. Also, Susan
David H. Barlow
Albany, New York
Michel Hersen
Pittsburgh, Pennsylvania
Epigram
1.1. INTRODUCTION
*In this book applied research refers to experimentation in the area of human
behavior change relevant to the disciplines of clinical psychology, psychiatry, social
work, and education.
In the meantime, applied research was off to a shaky start in the offices of
early psychiatrists with a technique known as the case study method. The
separate development of applied research is traced from those early beginnings
through the grand collaborative group comparison studies proposed in
the 1950s. The subsequent disenchantment with this approach in applied
research forced a search for alternatives. The rise and fall of the major
alternatives—process research and naturalistic studies—is outlined near the
end of the chapter. This disenchantment also set the stage for a renewal of
interest in the scientific study of the individual. The multiple origins of single-case
experimental designs in the laboratories of experimental psychology and
the offices of clinicians complete the chapter. Descriptions of single-case
designs and guidelines for their use as they are evolving in applied research
comprise the remainder of this book.
well known that summaries are not required. What is often overlooked,
however, is that Pavlov’s basic findings were gleaned from single organisms
and strengthened by replication on other organisms. In terms of scientific
yield, the study of the individual organism reached an early peak with Pavlov,
and Skinner would later cite this approach as an important link and a strong
bond between himself and Pavlov (Skinner, 1966a).
other words, inference is made from the sample to the population. This work
and the subsequent developments in the field of sampling theory made it
possible to talk in terms of psychological principles with broad generality and
applicability, a primary goal in any science. This type of estimation, however,
was based on appropriate statistics, averages, and intersubject variability
in the sample, which further reinforced the group comparison approach in
basic research.
As the science of psychology grew out of its infancy, its methodology was
largely determined by the lure of broad generality of findings made possible
through the brilliant work of Fisher and his followers. Because of the emphasis
on averages and intersubject variability required by this design in order to
make general statements, the intensive study of the single organism, so
popular in the early history of psychology, fell out of favor. By the 1950s,
when investigators began to consider the possibility of doing serious research
in applied settings, the group comparison approach was so entrenched that
anyone studying single organisms was considered something of an oddity by
no less an authority than Underwood (1957). The Zeitgeist in psychological
research was group comparison and statistical estimation. While an occasional
paper was published during the 1950s defending the study of the single
case (S. J. Beck, 1953; Rosenzweig, 1951), or at least pointing out its place in
psychological research (duMas, 1955), very little basic research was carried
out on single-cases. A notable exception was the work of B. F. Skinner and his
students and colleagues, who were busy developing an approach known as
the experimental analysis of behavior, or operant conditioning. This work,
however, did not have a large impact on methodology in other areas of
psychology during the 1950s, and applied research was just beginning.
Against this background, it is not surprising that applied researchers in the
1950s employed the group comparison approach, despite the fact that the
origins of the study of clinically relevant phenomena were quite different
from the origin of more basic research described above.
time, firmly link successful treatment with the necessity of discovering the
etiology of the behavior disorder. One wonders if the early development of
clinical techniques, including psychoanalysis, would have been different if
careful observers like Breuer had been cognizant of the experimental implications
of their clinical work. Of course, this small leap from uncontrolled case
study to scientific investigation of the single case did not occur because of a
lack of awareness of basic scientific principles in early clinicians. The result
was an accumulation of successful individual case studies, with clinicians
from varying schools claiming that their techniques were indispensable to
success. In many cases their claims were grossly exaggerated. Writing on
psychoanalysis in 1909, Brill noted that “The results obtained by the treatment are
unquestionably very gratifying. They surpass those obtained by simpler
methods in two chief respects; namely, in permanence and in the prophylactic
value they have for the future” (Brill, 1909). Much later, in 1933, Kessel and
Hyman observed, “this patient was saved from an inferno and we are
convinced that this could have been achieved by no other method” (Kessel &
Hyman, 1933). From an early behavioral standpoint, Max (1935) noted that
electrical aversion therapy produced “95 percent relief” from the compulsion
of homosexuality.
These kinds of statements did little to endear the case study method to
serious applied researchers when they began to appear in the 1940s and 1950s.
In fact, the case study method, if anything, deteriorated somewhat over the
years in terms of the amount and nature of publicly observable data available
in these reports. Frank (1961) noted the difficulty in even collecting data from
a therapeutic hour in the 1930s due to lack of necessary equipment, reluctance
to take detailed notes, and concern about confidentiality. The advent of
the phonograph record at this time made it possible at least to collect raw data
from those clinicians who would cooperate, but this method did not lead to
any fruitful new ideas on research. With the advent of serious applied
research in the 1950s, investigators tended to reject reports from uncontrolled
case studies due to an inability to evaluate the effects of treatment. Given the
extraordinary claims by clinicians after successful case studies, this attitude is
understandable. However, from the viewpoint of single-case experimental
designs, this rejection of the careful observation of behavior change in a case
report had the effect of throwing out the baby with the bathwater.
defined than in most case reports, and techniques tended to be fixed and
“school” oriented. Because all procedures achieved some success, practitioners
within these schools concentrated on the positive results, explained
away the failures, and decided that the overall results confirmed that their
procedures, as applied, were responsible for the success. Due to the strong
and overriding theories central to each school, the successes obtained were
attributed to theoretical constructs underlying the procedure. This precluded
a careful analysis of elements in the procedure or the therapeutic intervention
that may have been responsible for certain changes in a given case, and it had
the effect of reinforcing the application of global, ill-defined treatments,
from whatever theoretical orientation, to global definitions of behavior
disorders, such as neurosis. This, in turn, led to statements such as
“psychotherapy works with neurotics.” Although applied researchers later
rejected these efforts as unscientific, one carryover from this approach was
the notion of the average response to treatment; that is, if a global treatment
is successful on the average with a group of “neurotics,” then this treatment
will probably be successful with any individual neurotic who requests treatment.
Intuitively, of course, descriptions of results from 50 cases provide a more
convincing demonstration of the effectiveness of a given technique than
separate descriptions of 50 individual cases. A modification of this approach
utilizing updated strategies and procedures and with the focus on individual
responses has been termed clinical replication. This strategy can make a
substantial contribution to the applied research process (see chapter 10). The
major difficulty with this approach, however, particularly as it was practiced
in early years, is that the category in which these clients are classified almost
always becomes unmanageably heterogeneous. The neurotics described in
Eysenck’s (1952) paper may have less in common than any group of people
one would choose randomly. When cases are described individually, however,
a clinician stands a better chance of gleaning some important information,
since specific problems and specific procedures are usually described in more
detail. When one lumps cases together in broadly defined categories, individ
ual case descriptions are lost and the ensuing report of percentage success
becomes meaningless. This unavoidable heterogeneity in any group of patients
is an important consideration that will be discussed in more detail in
this chapter and in chapter 2.
The clearer definition of variables and the call for experimental questions
that were precise enough to be answered were major advances in applied
research. The extensive review of psychotherapy research by Bergin and
Strupp (1972), however, demonstrated that even under these more favorable
conditions, the application of the group comparison design to applied problems
posed many difficulties. These difficulties, or objections, which tend to
limit the usefulness of a group comparison approach in applied research, can
be classified under five headings: (1) ethical objections, (2) practical problems
in collecting large numbers of patients, (3) averaging of results over the
group, (4) generality of findings, and (5) intersubject variability.
Ethical objections
An oft-cited issue, usually voiced by clinicians, is the ethical problem
inherent in withholding treatment from a no-treatment control group. This
notion, of course, is based on the assumption that the therapeutic intervention,
in fact, works, in which case there would be little need to test it at all.
Despite the seeming illogic of this ethical objection, in practice many clinicians
and other professional personnel react with distaste to withholding
some treatment, however inadequate, from a group of clients who are undergoing
significant human suffering. This attitude is reinforced by scattered
examples of experiments where control groups did endure substantial harm
during the course of the research, particularly in some pharmacological
experiments.
Practical problems
On a more practical level, the collection of large numbers of clients
homogeneous for a particular behavior disorder is often a very difficult task.
In basic research in experimental psychology most subjects are animals (or
college sophomores), where matching of relevant behaviors or background
variables such as personality characteristics is feasible. When dealing with
severe behavior disorders, however, obtaining sufficient clients suitably
matched to constitute the required groups in the study is often impossible. As
Isaac Marks, who is well known for his applied research with large groups,
noted:
from these a tiny number are suitable for inclusion in the homogeneous sample
one wishes to study. Selection of the sample can be so time consuming that it
severely limits research possibilities. Consider the clinician who wishes to assemble
a series of obsessive-compulsive patients to be assigned at random into one of
two treatment conditions. He will need at least 20 such cases for a start, but
obsessive-compulsive neuroses (not personality) make up only 0.5-3 percent of
the psychiatric outpatients in Britain and the USA. This means the clinician will
need a starting population of about 2000 cases to sift from before he can find his
sample, and even then this assumes that all his colleagues are referring every
suitable patient to him. In practice, at a large center such as the Maudsley
Hospital, it would take up to two years to accumulate a series of
obsessive-compulsives for study (Bergin & Strupp, 1972, p. 130).
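Marks's recruitment arithmetic can be made explicit. The sketch below is purely illustrative: it assumes that every referral is screened, that the diagnosis occurs at a fixed base rate, and that no other exclusion criteria apply (any further exclusions would only enlarge the pool). The function name `screening_pool` is hypothetical, and base rates are expressed per thousand outpatients so that the integer arithmetic stays exact.

```python
def screening_pool(cases_needed: int, per_thousand: int) -> int:
    """Smallest number of referrals whose expected yield reaches `cases_needed`.

    Hypothetical helper: assumes each referral has the diagnosis with a fixed
    probability of per_thousand / 1000 and that no other exclusions apply.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (cases_needed * 1000 + per_thousand - 1) // per_thousand


# Marks's example: 20 obsessive-compulsive patients, base rate 0.5-3 percent.
for rate in (5, 10, 30):  # 0.5%, 1%, and 3% expressed per thousand outpatients
    print(f"{rate / 10:.1f}% base rate: screen about {screening_pool(20, rate)} referrals")
```

At the two ends of the quoted range, 20 cases would require roughly 4000 referrals (0.5 percent) or 667 referrals (3 percent); Marks's figure of about 2000 corresponds to a base rate near 1 percent.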
Averaging of results
A third difficulty noted by many applied researchers is the obscuring of
individual clinical outcome in group averages. This issue was cogently raised
by Sidman (1960) and Chassan (1967, 1979) and repeatedly finds its way into
Generality of findings
Averaging and the complexity of individual patients also bring up some
related problems. Because results from group studies do not reflect changes in
individual patients, these findings are not readily translatable or generalizable
to the practicing clinician since, as Chassan (1967) pointed out, the clinician
cannot determine which particular patient characteristics are correlated with
improvement. In ignorance of the responses of individual patients to treat
ment, the clinician does not know to what extent a given patient is similar to
patients who improved or perhaps deteriorated within the context of an
overall group improvement. Furthermore, as groups become more homogeneous,
which most researchers agree is a necessary condition to answer
specific questions about effects of therapy, one loses the ability to make
inferential statements to the population of patients with a particular disorder
because the individual complexities in the population will not have been
adequately sampled. Thus it becomes difficult to generalize findings at all
beyond the specific group of patients in the experiment. These issues of
averaging and generality of findings will be discussed in greater detail in
chapter 2.
Intersubject variability
A final issue bothersome to clinicians and applied researchers is variability.
Between-subject group comparison designs consider only variability between
subjects as a method of dealing with the enormous differences among individuals
in a group. Progress is usually assessed only once (in a posttest). This
large intersubject variability is often responsible for the “weak” effect obtained
in these studies, where some clients show considerable improvement
and others deteriorate, and the average improvement is statistically significant
but clinically weak. Ignored in these studies is within-subject variability or the
clinical course of a specific patient during treatment, which is of great
practical interest to clinicians. This issue will also be discussed more fully in
chapter 2.
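The way a group average can conceal individual outcomes may be illustrated with a small numerical sketch. The scores below are invented for illustration only and come from no study; higher scores stand for more severe symptoms, and each client is measured once before and once after treatment, as in the between-subject designs described in this section.

```python
from statistics import mean, stdev

# Hypothetical symptom scores (higher = worse) for eight clients,
# measured once before and once after treatment.
pre = [30, 28, 25, 27, 22, 24, 26, 29]
post = [18, 18, 16, 20, 26, 29, 25, 35]

# Positive change = improvement (the score went down).
changes = [before - after for before, after in zip(pre, post)]

improved = [c for c in changes if c > 0]
deteriorated = [c for c in changes if c < 0]

print(f"mean change: {mean(changes):.1f}")    # the group-level summary
print(f"SD of change: {stdev(changes):.1f}")  # spread between clients
print(f"improved: {len(improved)}, deteriorated: {len(deteriorated)}")
```

In this invented example the group mean shows an improvement of 3 points, yet three of the eight clients got worse, and one client counted as improved changed by only a single point. Because each client is measured only at pre- and posttest, nothing in these numbers describes any client's course during treatment.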
Naturalistic studies
The advantage of the naturalistic study for most clinicians was that it did
little to disrupt the typical activities engaged in by clinicians in day-to-day
practice. Unlike in the experimental group comparison design, clinicians
were not restricted by precise definitions of an independent variable (treatment,
time limitation, or random assignment of patients to groups). Kiesler
(1971) noted that naturalistic studies involve “ . . . live, unaltered, minimally
controlled, unmanipulated ‘natural’ psychotherapy sequences—so-called experiments
of nature” (p. 54). Naturally, this approach had great appeal to
clinicians, for it dealt directly with their activities and, in doing so, promised
to consider the complexities inherent in treatment. Typically, measures of
multiple therapist and patient behaviors are taken, so that all relevant variables
(based on a given clinician’s conceptualization of which variables are
relevant) may be examined for interrelationships with every other variable.
Perhaps the best known example of this type of study is the project at the
Menninger Foundation (Kernberg, 1973). Begun in 1954, this was truly a
mammoth undertaking involving 38 investigators, 10 consultants, three different
project leaders, and 18 years of planning and data collection. Forty-two
patients were studied in this project. This group was broadly defined,
although overtly psychotic patients were excluded. Assignment of patient to
therapist and to differing modes of psychoanalytic treatment was not random
but based on clinical judgments of which therapist or mode of treatment was
most suitable for the patient. In other words, the procedures were those
normally in effect in a clinical setting. In addition, other treatments, such as
pharmacological or organic interventions, were administered to certain patients
as needed. Against this background, the investigators measured multiple
patient characteristics (such as various components of ego strength) and
correlated these variables, measured periodically throughout treatment by
referring to detailed records of treatment sessions, with multiple therapeutic
activities and modes of treatment. As one would expect, the results are
enormously complex and contain many seemingly contradictory findings. At
least one observer (Malan, 1973) noted that the most important finding is that
purely supportive treatment is ineffective with borderline psychotics, but
working through of the transference relationship under hospitalization with
this group is effective. Notwithstanding the global definition of treatment and
the broad diagnostic categories (borderline psychotic) also present in early
group comparison studies, this report was generally hailed as an extremely
important breakthrough in psychotherapy research. Methodologists, however,
were not so sure. While admitting the benefits of a clearer definition of
psychoanalytic terms emanating from the project, May (1973) wondered
about the power and significance of the conclusions. Most of this criticism
concerns the purported strength of the naturalistic study—that is, the lack of
Process research
The second alternative to between-group comparison research was the
process approach so often referred to in the APA conferences on psychotherapy
research (e.g., Strupp & Luborsky, 1962). Hoch and Zubin’s
(1964) popular phrase “flight into process” was an accurate description of the
reaction of many clinical investigators to the practical and methodological
difficulties of the large group studies. Typically, process research has concerned
itself with what goes on during therapy between an individual patient
and therapist instead of the final outcome of any therapeutic effort. In the
late 1950s and early 1960s, a large number of studies appeared on such topics
as the relation of therapist behavior to certain patient behaviors in a given
interview situation (e.g., Rogers, Gendlin, Kiesler, & Truax, 1967). As such,
process research held much appeal for clinicians and scientists alike. Clinicians
were pleased by the focus on the individual and the resulting ability to
study actual clinical processes. In some studies repeated measures during
therapy gave clinicians an idea of the patient’s course during treatment.
Scientists were intrigued by the potential of defining variables more precisely
within one interview without concerning themselves with the complexities
involved before or after the point of study. The increased interest in process
research, however, led to an unfortunate distinction between process and
outcome studies (see Kiesler, 1966). This distinction was well stated by Luborsky
(1959), who noted that process research was concerned with how
changes took place in a given interchange between patient and therapist,
whereas outcome research was concerned with what change took place as a
result of treatment. As Paul (1969) and Kiesler (1966) pointed out, the
dichotomization of process and outcome led to an unnecessary polarity in the
manner in which measures of behavior change were taken. Process research
collected data on patient changes at one or more points during the course of
therapy, usually without regard for outcome, while outcome research was
concerned only with pre-post measures outside of the therapeutic situation.
Kiesler noted that this was unnecessary because measures of change within
treatment can be continued throughout treatment until an “outcome” point is
reached. He also quoted Chassan (1962) on the desirability of determining
what transpired between the beginning and end of therapy in addition to
The state of affairs of clinical practice and research in the 1960s satisfied
few people. Clinical procedures were largely judged as unproven (Bergin &
Strupp, 1972; Eysenck, 1965), and the prevailing naturalistic research was
unacceptable to most scientists concerned with precise definition of variables
and cause-effect relationships. On the other hand, the elegantly designed and
scientifically rigorous group comparison design was seen as impractical and
incapable of dealing with the complexities and idiosyncrasies of individuals
by most clinicians. Somewhere in between was process research, which dealt
mostly with individuals but was correlational rather than experimental. In
addition, the method was viewed as incapable of evaluating the clinical
effects of treatment because the focus was on changes within treatment rather
than on outcome.
These developments were a major contribution to the well-known and oft-cited
scientist-practitioner split (e.g., Joint Commission on Mental Illness and
Health, 1961). The notion of an applied science of behavior change growing
out of the optimism of the 1950s did not meet expectations, and many
clinician-scientists stated flatly that applied research had no effect on their
clinical practice. Prominent among them was Matarazzo, who noted, “Even
after 15 years, few of my research findings affect my practice. Psychological
science per se doesn’t guide me one bit. I still read avidly but this is of little
direct practical help. My clinical experience is the only thing that has helped
sign, it is unlikely that the same results would have obtained with a household
pet in its natural environment. Yet these are precisely the conditions under
which most applied researchers must work.
The plea of applied researchers for appropriate methodology grounded in
the scientific method to investigate complex problems in individuals is never
more evident than in the writings of Gordon Allport. Allport argued most
eloquently that the science of psychology should attend to the uniqueness of
the individual (e.g., Allport, 1961, 1962). In terms commonly used in the
1950s, Allport became the champion of the idiographic (individual) approach,
which he considered superior to the nomothetic (general or group) approach.
Why should we not start with individual behavior as a source of hunches (as we
have in the past) and then seek our generalization (also as we have in the past) but
finally come back to the individual not for the mechanical application of laws (as
we do now) but for a fuller and more accurate assessment than we are now able
to give? I suspect that the reason our present assessments are now so often feeble
and sometimes even ridiculous, is because we do not take this final step. We stop
with our wobbly laws of generality and seldom confront them with the concrete
person. (Allport, 1962, p. 407)
Due to the lack of a practical, applied methodology with which to study the
individual, however, most of Allport’s own research was nomothetic. The
increase in the intensive study of the individual in applied research led to a
search for appropriate methodology, and several individuals or groups began
developing ideas during the 1950s and 1960s.
the guilt control phase and improved during the rational discussion phase.
These fluctuations around the regression line were statistically significant.
This effect, of course, is weak and of dubious importance because overall
improvement in paranoid scores was not functionally related to treatment.
Furthermore, several guidelines for a true experimental analysis of the treatment
were violated. Examples of experimental error include the absence of
baseline measurement to determine the pretreatment course of the paranoid
beliefs and the simultaneous withdrawal of one treatment and introduction of
a second treatment (see chapter 3). The importance of the case and other
early work from M. B. Shapiro, however, is not the knowledge gained from
any one experiment, but the beginnings of the development of a scientifically
based methodology for evaluating effects of treatment within a single-case.
To the extent that Shapiro’s correlational studies were similar to process
research, he broke the semantic barrier which held that process criteria were
unrelated to outcome. He demonstrated clearly that repeated measures within
an individual could be extended to a logical end point and that this end point
was the outcome of treatment. His more important contribution from our
point of view, however, was the demonstration that independent variables in
applied research could be defined and systematically manipulated within a
single-case, thereby fulfilling the requirements of a “true” experimental approach
to the evaluation of therapeutic technique (Underwood, 1957). In
addition, his demonstration of the applicability of the study of the individual
case to the discovery of issues relevant to psychopathology was extremely
important. This approach is only now enjoying more systematic application
by some of our creative clinical scientists (e.g., Turkat & Maisto, in press).
Quasi-experimental designs
given intervention. Thus one can observe changes from a baseline as a result
of a given intervention. While the inclusion of a baseline is a distinct methodological
improvement, this design is basically correlational in nature and is
unable to isolate effects of therapeutic mechanisms or establish cause-effect
relationships. Basically, this design is the A-B design described in chapter 5.
The equivalent time series design, however, involves experimental manipulation
of independent variables through alteration of treatments, as in the M.
B. Shapiro and Ravenette study (1959), or introduction and withdrawal of
one treatment in an A-B-A fashion. Approaching the study of the individual
from a different perspective than Shapiro, Campbell and Stanley arrived at
similar conclusions on the possibility of manipulation of independent variables
and establishment of cause-effect relationships in the study of a single
case.
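The contrast between the correlational A-B design and an A-B-A sequence can be sketched by comparing phase means for a single client. The observations are hypothetical, and the check is deliberately simple: it asks only whether the level of behavior drops when treatment is introduced and moves back up when treatment is withdrawn. It is not a substitute for the statistical analyses of single-case data discussed elsewhere in this book, and the helper `phase_means` is an illustrative name, not an established procedure.

```python
from statistics import mean

# Hypothetical daily frequency of a target behavior for one client.
# Each entry is (phase label, observation); "A" phases are baseline
# (no treatment) and "B" is treatment, in an A-B-A sequence.
series = (
    [("A1", x) for x in (8, 9, 7, 8, 8)]
    + [("B", x) for x in (5, 4, 3, 3, 4)]
    + [("A2", x) for x in (7, 8, 7, 8)]
)


def phase_means(observations):
    """Return the mean observation for each phase, in order of first appearance."""
    grouped = {}
    for phase, value in observations:
        grouped.setdefault(phase, []).append(value)
    return {phase: mean(values) for phase, values in grouped.items()}


means = phase_means(series)
print(means)

# An A-B comparison alone is correlational: the drop during B could reflect
# anything else that changed at the same time. The withdrawal phase (A2) adds
# a test: behavior should fall under B and rise again when B is removed.
reversal_observed = means["B"] < means["A1"] and means["A2"] > means["B"]
print("pattern consistent with a treatment effect:", reversal_observed)
```

Stopping after the B phase would leave only the first comparison, which is exactly the limitation of the A-B design noted above; the return toward baseline in A2 is what makes the treatment a more plausible cause of the change.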
What was perhaps the more important contribution of these methodologists,
however, was the description of various limitations of these designs in
their ability to rule out alternative plausible hypotheses (internal validity) or
the extent to which one can generalize conclusions obtained from the designs
(external validity) (see chapter 2).
It remained for Chassan (1967, 1979) to pull together many of the method
ological advances in single-case research to that point in a book that made
clear distinctions between the advantages and disadvantages of what he
termed extensive (group) design and intensive (single-case) design. Drawing
on long experience in applied research, Chassan outlined the desirability and
applicability of single-case designs evolving out of applied research in the
1950s and early 1960s. While most of his own experience in single-case design
concerned the evaluation of pharmacologic agents for behavior disorders,
Chassan also illustrated the uses of single-case designs in psychotherapy
research, particularly psychoanalysis. As a statistician rather than a practicing
clinician, he emphasized the various statistical procedures capable of
establishing relationships between therapeutic intervention and dependent
variables within the single-case. He concentrated on the correlation type of
design using trend analysis but made occasional use of a prototype of the
A-B-A design (e.g., Bellak & Chassan, 1964), which, in this case, extended the
work of M. B. Shapiro to evaluation of drug effects but, in retrospect,
contained some of the same methodological faults. Nevertheless, the sophisticated
theorizing in the book on thorny issues in single-case research, such as
generality of findings from a single-case, provided the most comprehensive
treatment of these issues up to that time. Many of Chassan’s ideas on this subject
will appear repeatedly in later sections of this book.
The Single-case in Basic and Applied Research 29
Kazdin, 1978, and Krasner, 1971a, for a history of behavior therapy). The
relevance of the experimental analysis of behavior to applied research is the
development of sophisticated methodology enabling intensive study of
individual subjects. In rejecting a between-subject approach as the only useful
scientific methodology, Skinner (1938, 1953) reflected the thoughts of the
early physiologists such as Claude Bernard and emphasized repeated objective
measurement in a single subject over a long period of time under highly
controlled conditions. As Skinner noted (1966b), “. . . instead of studying a
thousand rats for one hour each, or a hundred rats for ten hours each, the
investigator is likely to study one rat for a thousand hours” (p. 21), a
procedure that clearly recognizes the individuality of an organism. Thus,
Skinner and his colleagues in the animal laboratories developed and refined
the single-case methodology that became the foundation of a new applied
science. This work culminated in the definitive methodological treatise by
Sidman (1960), Tactics of Scientific Research, which outlined the assumptions
and conditions of a true experimental analysis of behavior. Examples of
fine-grain analyses of behavior and the use of withdrawal, reversal, and
multielement experimental designs in the experimental laboratories began to
appear in more applied journals in the 1960s, as researchers adapted these
strategies to the investigation of applied problems.
It is unlikely, however, that this approach would have had a significant
impact on applied clinical research without the growing popularity of behavior
therapy. The fact that M. B. Shapiro and Chassan were employing
rudimentary prototypes of withdrawal designs (independent of influences
from the laboratories of operant conditioning) without marked effect on
applied research would seem to support this contention. In fact, even earlier,
F. C. Thorne (1947) described clearly the principle of single-case research,
including A-B-A withdrawal designs, and recommended that clinical research
proceed in this manner, without apparent effect (Barlow et al., 1983). The
growth of the behavior therapy approach to applied problems, however,
provided a vehicle for the introduction of the methodology on a scale that
attracted attention from investigators in applied areas. Behavior therapy, as
the application of the principles of general-experimental and social
psychology to the clinic, also emphasized direct measurement of clinically
relevant target behaviors and experimental evaluation of independent variables
or “treatments.” Since many of these “principles of learning” utilized in
behavior therapy originally emanated from operant conditioning, it was a small
step for behavior therapists to also borrow the operant methodology to
validate the effectiveness of these same principles in applied settings. The
initial success of this approach (e.g., Ullmann & Krasner, 1965) led to similar
evaluations of additional behavior therapy techniques that did not derive
directly from the operant laboratories (e.g., Agras et al., 1971; Barlow,
Leitenberg, & Agras, 1969). During this period, methodology originally
intended for the animal laboratory was adapted more fully to the investigation
of applied problems, and “applied behavior analysis” became an important
supplementary and, in some cases, alternative methodological approach
to between-subjects experimental designs.
The early pleas to return to the individual as the cornerstone of an applied
science of behavior have been heeded. The last several years have witnessed
the crumbling of barriers that precluded publication of single-case research in
any leading journal devoted to the study of behavioral problems. Since the
first edition of this book, a proliferation of important books has appeared
devoted, for example, to strategies for evaluating data from single-case
designs (Kratochwill, 1978b), to the application of these methods in social
work (Jayaratne & Levy, 1979), or to the philosophy underlying this approach
to applied research (J. M. Johnston & Pennypacker, 1980). Other excellent
books have appeared concentrating specifically on descriptions of design
alternatives (Kazdin, 1982b), and major handbooks on research are not
complete without a description of this approach (e.g., Kendall & Butcher,
1982).
More importantly, the field has not stood still. From their more recent
origins in evaluating the application of operant principles to behavior
disorders, single-case designs are now fully incorporated into the armamentarium
of applied researchers generally interested in behavior change beyond the
subject matter of the core mental health professions or education. Professions
such as rehabilitation medicine are turning increasingly to this approach
as appropriate to the subject matter at hand (e.g., Schindele, 1981), and the
field is progressing. New design alternatives have appeared only recently, and
strategies involved in more traditional approaches have been clarified and
refined. We believe that the recent methodological developments and the
demonstrated effectiveness of this methodology provide a base for the
establishment of a true science of human behavior with a focus on the paramount
importance of the individual. A description of this methodology is the
purpose of this book.
CHAPTER 2
2.1. INTRODUCTION
Two issues basic to any science are variability and generality of findings.
These issues are handled somewhat differently from one area of science to
another, depending on the subject matter. The first section of this chapter
concerns variability.
In applied research, where individual behavior is the primary concern, it is
our contention that the search for sources of variability in individuals must
occur if we are to develop a truly effective clinical science of human behavior
change. After a brief discussion of basic assumptions concerning sources of
variability in behavior, specific techniques and procedures for dealing with
behavioral variability in individuals are outlined. Chief among these are
repeated measurement procedures that allow careful monitoring of day-to-
day variability in individual behavior, and rapidly changing, improvised
experimental designs that facilitate an immediate search for sources of
variability in an individual. Several examples of the use of this procedure to
track down sources of intersubject or intrasubject variability are presented.
The second section of this chapter deals with generality of findings.
Historically, this has been a thorny issue in applied research. The seeming
limitations in establishing wide generality from results in a single case are
obvious, yet establishment of generality from results in large groups has also
proved elusive. After a discussion of important types of generality of findings,
the shortcomings of attempting to generalize from group results in applied
research are discussed. Traditionally, the major problems have been an
inability to draw a truly random sample from human behavior disorders and the
difficulty of generalizing from groups to an individual. Applied researchers
attempted to solve the problem by making groups as homogeneous as possible
General Issues in A Single-case Approach 33
2.2. VARIABILITY
The notion that behavior is a function of a multiplicity of factors finds
wide agreement among scientists and professional investigators. Most scien
tists also agree that as one moves up the phylogenetic scale, the sources of
variability in behavior become greater. In response to this, many scientists
choose to work with lower life forms in the hope that laws of behavior will
emerge more readily and be generalizable to the infinitely more complex area
of human behavior. Applied researchers do not have this luxury. The task of
the investigator in the area of human behavior disorders is to discover
functional relations among treatments and specific behavior disorders over
and above the welter of environmental and biological variables impinging on
the patient at any given time. Given these complexities, it is small wonder that
most treatments, when tested, produce small effects or, in Bergin and Strupp’s
terms, weak results (Bergin & Strupp, 1972).
egy. Rather than assuming that variability is intrinsic to the organism, one
should make every effort to discover sources of behavioral variability among
organisms such that laws of behavior could be studied with the precision and
specificity found in physics. This precision, of course, would require close
attention to the behavior of the individual organism. If one rat behaves
differently from three other rats in an experimental condition, the proper
tactic is to find out why. If the experimenter succeeds, the factors that produce
that variability can be eliminated and a “cleaner” test of the effects of the
original independent variable can be made. Sidman recognized that behavioral
variability may never be entirely eliminated, but that isolation of as
many sources of variability as possible would enable an investigator to
estimate how much variability actually is intrinsic.
Although one may question this strategy in basic research, as Sidman has, the
amount of control an experimenter has over the behavioral history and
current environmental variables impinging on the laboratory animal makes
this strategy at least feasible. In applied research, when control over
behavioral histories or even current environmental events is limited or nonexistent,
there is far less probability of discovering a treatment that is effective over
and above these uncontrolled variables. This, of course, was the major cause
of the inability of early group comparison studies to demonstrate that the
treatment under consideration was effective. As noted in chapter 1, some
clients were improving while others were worsening, despite the presence of
the treatment. Presumably, this variability was not intrinsic but due to current
life circumstances of the clients.
Repeated measures
The basis of this search for sources of variability is repeated measurement
of the dependent variable or problem behavior. If this tactic has a familiar
ring to practitioners, it is no accident, for this is precisely the strategy every
practitioner uses daily. It is no secret to clinicians or other behavior change
agents in applied settings that, between an initial observation and some end
point, behavioral improvement is accompanied by marked variability in the
behavior. A major activity of clinicians is observing this variability and
making appropriate changes in treatment strategies or environmental
circumstances, where possible, to eliminate these fluctuations from a generally
improving trend. Because measures in the clinic seldom go beyond gross
observation, and treatment consists of a combination of factors, it is difficult
for clinicians to pinpoint potential sources of variability. They do speculate,
however, and with increased clinical experience, effective clinicians may guess
rightly more often than wrongly. In some cases, weekly observation may go on for years.
As Chassan (1967) pointed out:
These terms are “ad hoc” definitions which move the focus of inquiry away from
repetitive patterns with observable frequencies to fixed momentary states. But
this notion of the momentary present is specious and deceptive; it is neither fixed
nor momentary nor immediately present, but an inferred condition (p. 39).
A prior design in which variables are distributed, for example, in a Latin square,
may be a severe handicap. When effects on behavior can be immediately
observed, it is more efficient to explore relevant variables by manipulating them in
an improvised and rapidly changing design. Similar practices have been
responsible for the greater part of modern science (Honig, 1966, p. 21).
More recently, this feature of single-case designs has been termed
response-guided experimentation (Edgington, 1983, 1984).
[Figure: circumference change to males averaged over each phase, plotted in blocks of two sessions and in individual sessions.]
that 30 seconds of viewing the female slide alone was followed by 30 seconds
of viewing both the male and female slides simultaneously (side by side),
followed by 30 seconds of the male slide alone. This adjustment (labeled
simultaneous presentation) produced increases in heterosexual arousal in the
separate measurement sessions, which reversed during a return to the original
classical conditioning procedure and increased once again during the second
phase, in which the slides were presented simultaneously. The experiment
suggested that classical conditioning was also effective with this client but
only after a sensitive temporal adjustment was made.
Merely observing the “outcome” of the 2 subjects at a fixed end point would
have produced the type of intersubject variability so common in outcome
studies of therapeutic techniques. That is, one subject would have improved
with the initial classical conditioning procedure, whereas the other would have
remained unchanged. If this pattern continued over additional subjects, the
result would be the typical weak effect (Bergin & Strupp, 1972) with large
intersubject variability. Highlighting the variability through repeated
measurement in the individual and improvising a new experimental design as
soon as a variation in response was noted (in this
42 Single-case Experimental Designs
case, no response) allowed an immediate search for the cause of this
unresponsiveness. It should also be noted that this research tactic resulted in
immediate clinical benefit to the patient, providing a practical illustration of
the merging of scientist and practitioner roles in the applied researcher.
asthmatic attacks most often followed meetings with the patient’s mother,
particularly if these meetings occurred in the home of the mother. After this
relationship was demonstrated, the patient experienced a change in her life
circumstances that resulted in her moving some distance away from her mother.
During the ensuing 20 months, only nine attacks were recorded despite the
fact that these attacks had occurred daily for a period of 2 years prior to
intervention. What is more remarkable is that eight of the attacks followed
her now infrequent visits to her mother.
Once again, the procedure of repeated measurement highlighted individual
fluctuation, allowing a search for correlated events that bore potential causal
relationships to the behavior disorder. It should be noted that no
experimental analysis was undertaken in this case to isolate the mother as the
cause of asthmatic attacks. However, the dramatic reduction of high-frequency
attacks after decreased contact with the mother provided reasonably strong
evidence, in an A-B fashion, of the contributory effects of visits to the mother.
What is more convincing, however, is the recurrence of the attacks at widely
spaced intervals after visits to the mother during the 20-month follow-up. This
series of naturally occurring events approximates a contrived A-B-A-B . . .
design and effectively isolates the mother’s role in the patient’s asthmatic
attacks (see chapter 5).
2.4. BEHAVIOR TRENDS AND INTRASUBJECT AVERAGING
[Figure: calories consumed per day]
FIGURE 2-5. Caloric intake presented on a daily basis during reinforcement and
reinforcement and feedback phases for the patient whose data is presented in
Figure 2-4. (Replotted from Figure 3, p. 283, from: Agras, W. S., Barlow, D. H.,
Chapin, H. N., Abel, G. G., and Leitenberg, H. (1974). Behavior modification of
anorexia nervosa. Archives of General Psychiatry, 30, 279-286. Copyright 1974
by American Medical Association. Reproduced by permission.)
ity assumed a pattern of roughly one day of high caloric intake followed by a
day of low intake, the average of 2 days presents a stable pattern. When
feedback was added during the next 12-day phase, the day-to-day variability
remained, but the range was displaced upward, now spanning 2,150 to 3,800
calories per day. Once again, this pattern of variability was approximately one day of
high caloric intake followed by a low value. In fact, this pattern obtained
throughout the experiment.
In this experiment, feedback was clearly a potent therapeutic procedure
over and above the variability, whether one examines the data day-by-day or
in blocks of 2 days. The averaged data, however, present a clear picture of the
effect of the variable over time. Since the major purpose of the experiment
was to demonstrate the effects of various therapeutic variables with
anorexics, we chose to present the data in this way. It was not our intention,
however, to ignore the daily variability. The fairly regular pattern of change
suggests several environmental or metabolic factors that may account for
these changes. If one were interested in more basic research on eating patterns
in anorexics, one would have to explore possible sources of this variability in
a finer analysis than we chose to undertake here.
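The averaging described above can be sketched in a few lines. This is a minimal illustration, assuming non-overlapping blocks of 2 days and using hypothetical values that alternate one high day with one low day (they are not the data of Agras et al., 1974); the function name `block_means` is our own. When intake alternates in this way, each 2-day mean falls near the middle of the swing, so the block series is much flatter than the daily series.

```python
def block_means(values, block_size=2):
    """Average consecutive, non-overlapping blocks of `block_size` observations.

    A trailing partial block is dropped so that every plotted point is a mean
    of the same number of days.
    """
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


# Hypothetical daily caloric intake with a high-day/low-day alternation
# (illustrative numbers only, not data from the study described above).
daily = [3600, 2200, 3500, 2300, 3700, 2100, 3400, 2400]

blocks = block_means(daily)
print(blocks)                      # → [2900.0, 2900.0, 2900.0, 2900.0]
print(max(daily) - min(daily))     # daily range → 1600
print(max(blocks) - min(blocks))   # range of 2-day means → 0.0
```

As the surrounding text cautions, the daily series should still be reported, because averaging hides the alternation that may itself deserve study.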
It is possible, of course, that feedback might not have produced the clear
and clinically relevant increase noted in these data. If feedback resulted in a
small increase in caloric intake that was clearly visible only when data were
averaged, one would have to resort to statistical tests to determine if the
increase could be attributed to the therapeutic variable over and above the
day-to-day variability (see chapter 9). Once again, however, one may question
the clinical relevance of the therapeutic procedure if the improvement in
behavior is so small that the investigator must use statistics to determine if
change actually occurred. If this situation obtained, the preferred strategy
might be to improvise on the experimental design and augment the therapeutic
procedure such that more relevant and substantial changes were produced. The
issue of clinical versus statistical significance, which was discussed in some
detail above, is a recurring one in single-case research. In the last
analysis, however, this is always reduced to judgments by therapists,
educators, etc. on the magnitude of change that is relevant to the setting. In most
cases, these magnitudes are greater than changes that are merely statistically
significant.
The above example notwithstanding, the conservative and preferred approach
of data presentation in single-case research is to present all of the data
so that other investigators may examine the intrasubject variability firsthand
and draw their own conclusions on the relevance of this variability to the
problem.
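Statistical tests are taken up in chapter 9. As one hedged illustration of asking whether an increase in the treatment phase exceeds day-to-day variability, the sketch below assumes an Edgington-style randomization test for an A-B design; this choice of test is ours, not necessarily the procedure recommended later in the book. It presumes that, before data collection, the day on which treatment would begin was drawn at random from a set of admissible days. The series, the admissible days, and the function names are hypothetical. The p-value is the proportion of admissible start days whose B-minus-A mean difference is at least as large as the one actually observed; it is meaningful only if the start day really was randomly assigned, and with 11 admissible days the smallest attainable value is 1/11.

```python
from statistics import mean


def phase_difference(series, start):
    """Mean of the treatment (B) phase minus mean of the baseline (A) phase,
    where `start` is the index of the first B observation."""
    return mean(series[start:]) - mean(series[:start])


def start_point_randomization_test(series, actual_start, admissible_starts):
    """One-sided randomization test for an A-B design whose intervention
    point was (by assumption) chosen at random from `admissible_starts`."""
    if actual_start not in admissible_starts:
        raise ValueError("actual_start must be one of the admissible start points")
    observed = phase_difference(series, actual_start)
    # Recompute the statistic as if treatment had begun at each admissible point.
    reference = [phase_difference(series, s) for s in admissible_starts]
    at_least_as_large = sum(1 for d in reference if d >= observed)
    return observed, at_least_as_large / len(reference)


# Hypothetical data: 10 baseline days alternating 22/18, then 10 treatment
# days alternating 32/28 (arbitrary units, for illustration only).
series = [22, 18] * 5 + [32, 28] * 5
admissible = range(5, 16)  # treatment could have started at index 5 through 15

observed, p = start_point_randomization_test(series, 10, admissible)
print(observed, round(p, 3))  # → 10 0.091
```

Because the reference set is built from the admissible start points rather than from shuffled observations, the order of the daily measurements is preserved, which is why this kind of test is often preferred when repeated measures are serially dependent.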
Large intrasubject variability is a common feature during repeated
measurements of target behaviors in a single case, particularly at the beginning
of an experiment, when the subject may be accommodating to intrusive
measures. How much variability the researcher is willing to tolerate before
introducing an independent variable (therapeutic procedure) is largely a
question of judgment on the part of the investigator. Similar procedural
problems arise when introduction of the independent variable itself results in
increased variability. Here the experimenter must consider alteration in length
of phases to determine if variability will decrease over time (as it often does),
clarifying the effects of the independent variable. These procedural questions
will be discussed in some detail in chapter 3.
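The text leaves the tolerable amount of baseline variability to the investigator's judgment. One way to make such a judgment explicit, offered only as a hypothetical sketch, is a stability rule applied to the most recent observations before the independent variable is introduced. The window of 5 points and the plus-or-minus 20% band below are arbitrary assumptions, not criteria proposed in this book, and `baseline_is_stable` is a name of our own.

```python
def baseline_is_stable(values, window=5, tolerance=0.20):
    """Hypothetical stability rule: True when each of the last `window`
    observations lies within +/- `tolerance` (a proportion) of their mean."""
    if len(values) < window:
        return False  # too few repeated measurements to judge
    recent = values[-window:]
    center = sum(recent) / window
    return all(abs(v - center) <= tolerance * abs(center) for v in recent)


# Hypothetical daily counts of a target behavior: erratic at first
# (e.g., while the subject accommodates to the measures), then steadier.
early = [12, 3, 15, 6, 14]
later = early + [9, 10, 11, 10, 10]

print(baseline_is_stable(early))  # → False
print(baseline_is_stable(later))  # → True
```

Any such rule only formalizes a judgment; the same logic could be applied after the independent variable is introduced, to decide whether a phase should be lengthened until variability settles.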
The search for sources of variability within individuals and the use of
improvised and fast-changing experimental designs appear to be contrary to
one of the most cherished goals of any science: the establishment of
generality of findings. Studying the idiosyncrasies of one subject would seem, on the
surface, to confirm Underwood’s (1957) observation that intensive study of
individuals will lead to discovery of laws that are applicable only to that
individual. In fact, the identification of sources of variability in this manner
leads to increases in generality of findings.
If one assumes that behavior is lawful, then identifying sources of
variability in one subject should give us important leads to sources of variability in
other similar subjects undergoing the same treatments. As Sidman (1960)
pointed out,
And again,
It is unrealistic to expect that a given variable will have the same effects upon all
subjects under all conditions. As we identify and control a greater number of the
conditions that determine the effects of a given experimental operation, in effect
we decrease the variability that may be expected as a consequence of the
operation. It then becomes possible to produce the same results in a greater number of
subjects. Such generality could never be achieved if we simply accepted
intersubject variability and gave equal status to all deviant subjects in an investigation
(p. 190).
Types of generality
Generalization means many things. In applied research, generalization
usually refers to the process in which behavioral or attitudinal changes in the
treatment setting “generalize” to other aspects of the client’s life. In
educational research this can mean generalization of behavioral changes from the
classroom to the home. Generalization of this type can be determined by
observing behavioral changes outside of the treatment setting.
There are at least three additional types of generality in behavior change
research, however, that are more relevant to the present discussion. The first
is generality of findings across subjects or clients; that is, if a treatment effects
certain behavior changes in one subject, will the same treatment also work in
other subjects with similar characteristics? As we shall see below, this is a
large question because subjects can be “similar” in many different ways. For
instance, subjects may be similar in that they have the same diagnostic labels
or behavioral disorders (e.g., schizophrenia or phobia). In addition, subjects
may be of similar age (e.g., between 14 and 16) or come from similar
socioeconomic backgrounds.
Generality across behavior change agents is a second type. For instance,
will a therapeutic technique that is effective when applied by one behavior
change agent also be effective when applied to the same problem by different
agents? A common example is the classroom. If a young, attractive, female
teacher successfully uses reinforcement principles to control disruptive
behavior in her classroom, will an older female teacher who is more stern also be
able to apply successfully the same principles to similar problems in her class?
Will an experienced therapist be able to treat a middle-aged claustrophobic
more effectively than a naive therapist who uses exactly the same procedure?
A third type of generality concerns the variety of settings in which clients
are found. The question here is whether a given treatment or intervention,
applied by the same or a similar therapist to similar clients, will work as well
in one setting as in another. For example, would reinforcement principles that work in the
classroom also work in a summer camp setting, or would desensitization of
an agoraphobic in an urban office building be more difficult than in a rural
setting?
These questions are very important to clinicians who are concerned with
which treatments are most effective with a given client in a given setting.
Typically, clinicians have looked to the applied researcher to answer these
questions.
ies. In applied research, however, one must study what is available, and this
may result in a heavy weighting on certain client characteristics and
inadequate sampling of other characteristics. Results of a treatment applied to this
sample cannot be generalized to the population. For example, techniques to
control disruptive behavior in the classroom will be less than generalizable if
they are tested in a class where students are from predominantly middle-class
suburbs and inner-city students are underrepresented.
Even in the great snake phobic epidemic of the 1960s, where the behavior
in question was circumscribed and clearly defined, the clients to whom
various treatments were applied were almost uniformly female college
sophomores whose fear was neither too great (otherwise they could not finish the
experiment on time) nor too little (otherwise they would finish it too quickly).
Most investigators admitted that the purpose of these experiments was not to
generalize treatment results to clinical populations, but to test theoretical assumptions and
generate hypotheses. The fact remains, however, that these results cannot even
be generalized beyond female college sophomores to the population of snake
fearers, where age, sex, and amount of fear would all be relevant.
It should be noted that all examples above refer to generality of findings
across clients with similar behavior and background characteristics. Most
studies at least consider the importance of generality of findings along this
dimension, although few have been successful. What is perhaps more
important is the failure of most studies to consider the generality problem in the
other two dimensions—namely, setting generality and behavior change agent
(therapist) generality. Several investigators (e.g., Kazdin, 1973b, 1980b;
McNamara & MacDonough, 1972) have suggested that this information may
be more important than client generality. For example, Paul (1969) noted
after a survey of group studies that the results of systematic desensitization
seemed to be a function of the qualifications of the therapist rather than
differences among clients. Furthermore, in regard to setting generality,
Brunswik (1956) suggested that, “In fact, proper sampling of situations and
problems may be in the end more important than proper sampling of subjects
considering the fact that individuals are probably on the whole much more
alike than are situations among one another” (p. 39). Because of these
problems, many sophisticated investigators specializing in research
methodology have accepted the impracticability of random sampling in this context
and have sought other methods for establishing generality (e.g., Kraemer,
1981).
The inability to make statistical inferences, even about populations of
clients, from most clinical research studies does not mean that no statements
about generality can be made. As Edgington (1966) pointed out, one can at
least make statements about generality of findings to similar clients based on
logical, nonstatistical considerations. Edgington referred to this as logical
generalization, and this issue, along with generality to
while the other group becomes the no-treatment control. This arrangement,
which has characterized much clinical and educational research, suffers for
two reasons: (1) to the extent that the “available” clients are not a random
sample, one cannot generalize to the population; and (2) to the extent that the
group is heterogeneous on any of a number of characteristics, one cannot
make statements about the individual. The only statement that can be made
concerns the average response of a group with that particular makeup which,
unfortunately, is unlikely to be duplicated again. As Bergin (1966) noted, it
was even difficult to say anything important about individuals within the
group based on the average response because his analysis demonstrated that
some were improving and some deteriorating (see Strupp & Hadley, 1979).
The result, as Chassan (1967, 1979) eloquently pointed out, was that the
behavior change agent could not know whether a treatment, or an aspect of
treatment, that was statistically better than no treatment might actually make a
particular patient worse.
generalization and which features can be ignored (e.g., hair color) will depend
on the judgment of the clinician and the state of knowledge at the time. But if
one can generalize in logical fashion from a patient whose results or
characteristics are well specified as part of a homogeneous group, then one can also
logically generalize from a single individual whose response and biographical
characteristics are specified. In fact, the rationale has enabled applied
researchers to generalize the results of single-case experiments for years (Dukes,
1965; Shontz, 1965). To increase the base for generalization from a
single-case experiment, one simply repeats the same experiment several times on
similar patients, thereby providing the clinician with results from a number of
patients.
findings across all important domains in applied research (within the limits
discussed above), one major problem remains: Applied researchers seldom do
this kind of study. As noted in chapter 1, section 1.5, the major reasons for
this are practical. The enormous investment of money and time necessary to
collect large numbers of homogeneous patients has severely inhibited this
type of endeavor. And often, even in several different settings, the necessary
number of patients to complete a study is just not available unless one is
willing to wait years. Added to this are procedural difficulties in recruiting
and paying therapists, ensuring adequate experimental controls such as
double-blind procedures within a large setting, and overcoming resistance to
assigning a large number of patients to placebo or control conditions, as well
as coping with the laborious task of recording and analyzing large amounts of
data (Barlow & Hersen, 1973; Bergin & Strupp, 1972).
In addition, the arguments raised in the last section on inflexibility of the
group design are also applicable here. If one patient does not improve or
reacts in an unusual way to the therapeutic procedure, administration of the
procedure must continue for the specified number of sessions. The
unsuccessful or aberrant results are then, of course, averaged into the group results
from that experimental cell, thus precluding an immediate analysis of the
intersubject variability, which will lead to increased generality.
Systematic and clinical replication procedures involve exploring the effects
of different settings, therapists, or clients on a procedure previously
demonstrated as successful in a direct replication series. In other words, to borrow
the example from the factorial design, a single-case design may demonstrate
that a treatment for severe depression works on an inpatient unit. Several
direct replications then establish generality among homogeneous patients.
The next task is to replicate the procedure once again, in different settings
with different therapists or with patients with different background
characteristics. Thus the goals of systematic and clinical replication in terms of
generality of findings are similar to those of the factorial study.
At first glance, it does not appear as if replication techniques within
single-case methodology would prove any more practical in answering questions
concerning generality of findings across therapists, settings, and types of
behavior disorder. While direct replication can begin to provide answers to
questions on generality of findings across similar clients, the large questions
of setting and therapist generality would also seem to require significant
collaboration among diverse investigators, long-range planning, and a large
investment of money and time, the very factors that Bergin and Strupp (1972)
noted would preclude these important replication efforts. The surprising fact
concerning this particular method of replication, however, is that these issues
are not interfering with the establishment of generality of findings, since
systematic and clinical replication is in progress in a number of
areas of applied research. In view of the fact that systematic and clinical
62 Single-case Experimental Designs
It was observed in chapter 1 that applied researchers during the 1950s and
1960s often considered single-case versus between-group comparison research
as an either-or proposition. Most investigators in this period chose one
methodology or the other and eschewed the alternative. Much of this polemic
characterized the idiographic-nomothetic dichotomy in the 1950s (Allport,
1961). This type of argument, of course, prevented many investigators from
asking the obvious question: Under what conditions is one type of design more
appropriate than another? As single-case designs have become more sophisticated,
the number of questions answered by this strategy has increased. But
there are many instances in which single-case designs either cannot answer the
relevant applied research question or are less applicable. The purpose of this
book, of course, is to make a case for the relevance of single-case experimental
designs and to cover those issues, areas, and examples where a single-case
approach is appropriate and important. We would be remiss, however, if we
ignored those areas where alternative experimental designs offer a better
answer.
Actuarial questions
There are several related questions or issues that require experimental
strategies involving groups. Baer (1971) referred to one as actuarial, although
he might have said political. The fact is, after a treatment has been found
effective, society wants to know the magnitude of its effects. This information
is often best conveyed in terms of the percentage of people who improved
compared to an untreated group. If one can say that a treatment works in 75
out of 100 cases where only 15 out of 100 would improve without treatment,
this is the kind of information that is readily understood by society. In a
systematic replication series, the results would be stated differently. Here the
investigator would say that under certain conditions the treatment works,
while under other conditions it does not work and other therapeutic variables
must be added. While this statement might be adequate for the practicing
clinician or educator, it conveys little information on the magnitude of effect.
Because society supports research and, ultimately, benefits from it, this
General Issues in A Single-case Approach 63
actuarial approach is not trivial. As Baer (1971) pointed out, this problem
“. . . is similar to that of any insurance company, we merely need to know
how often a behavioral analysis changes the relevant behavior of society
toward the behavior, just as the insurance company needs to know how often
age predicts death rates” (p. 366). It should be noted, however, that a study
such as this cannot answer why a treatment works; it is simply capable of
communicating the size of the effect. But if the treatment package is the result
of a series of single-case designs, then one should already know why it works,
and demonstration of the magnitude of effect is all that is needed.
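The actuarial summary Baer describes is simple arithmetic. As a minimal sketch (the class and function names are hypothetical, and the counts are the illustrative 75-of-100 and 15-of-100 figures used above, not real data), the magnitude of effect can be expressed as the percentage improved in each group and the difference between them in percentage points:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupOutcome:
    """Counts for one group in a hypothetical group comparison study."""
    improved: int
    total: int

    def percent_improved(self) -> float:
        # Multiply before dividing so whole-number percentages stay exact.
        return 100 * self.improved / self.total


def actuarial_summary(treated: GroupOutcome, untreated: GroupOutcome) -> dict:
    """Express the magnitude of effect in the form the actuarial question asks for."""
    treated_pct = treated.percent_improved()
    untreated_pct = untreated.percent_improved()
    return {
        "treated_pct": treated_pct,
        "untreated_pct": untreated_pct,
        # Improvement over the untreated group, in percentage points.
        "difference_pct_points": treated_pct - untreated_pct,
    }


summary = actuarial_summary(GroupOutcome(75, 100), GroupOutcome(15, 100))
print(summary)
# {'treated_pct': 75.0, 'untreated_pct': 15.0, 'difference_pct_points': 60.0}
```

Consistent with the caution in the text, nothing in this output explains why the treatment works; it only communicates the size of the effect.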
Several cautions should be noted when proceeding in this manner. First, the
cost and practical limitation of running a large-group study do not allow
unlimited replication of this effort, if it can be done at all. Thus one should
have a well-developed treatment package that has been thoroughly tested in
single-case experimental designs and replications before embarking on this
effort. Preferably, the investigator should be well into a systematic replicaton
series in order to have some idea of the client, setting, or therapeutic variables
that predict success. Groups can then be constructed in a homogeneous
fashion. Premature application of the group comparison design, where a
treatment or the conditions under which it is effective have not been ade
quately worked out, can only produce the characteristic weak effect with
large intersubject variability that is so prevalent in group comparison studies
to date (Bergin & Strupp, 1972). Of course, well-developed clinical replication
series, where a comprehensive treatment package is replicated across many
individuals with a given problem, can also specify size or effect and the
percentage of clinical success. But the information from the comparison
group would be missing.
General Procedures in
Single-case Research
3.1. INTRODUCTION
Baseline stability
When selecting a baseline, its stability and range of variability must be
carefully examined. McNamara and MacDonough (1972) have raised an issue
that is continually faced by all of those involved in applied clinical research.
They specifically posed the following question: “How long is long enough for
a baseline?” (p. 364). Unfortunately, there is no simple response or formula
that can be applied to this question, but a number of suggestions have been
made. Baer, Wolf, and Risley (1968) recommended that baseline measurement
be continued over time “until its stability is clear” (p. 94). McNamara
and MacDonough concurred with Wolf and Risley’s (1971) recommendation
that repeated measurement be applied until a stable pattern emerges. However,
there are some practical and ethical limitations to extending initial
measurement beyond certain limits. The first involves a problem of logistics.
under these circumstances would then meet the kind of criticism that has been
leveled at the applied clinical researcher who uses single-case methodology.
For example, Bandura (1969) argued that there is no difficulty in interpreting
performance changes when differences between phases are large (e.g., the
absence of overlapping distributions) and when such differences can be
replicated across subjects (see chapter 10). However, he underscored the
difficulties in reaching valid conclusions when there is “considerable variability
during baseline conditions” (p. 243).
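Bandura's criterion of non-overlapping distributions can be checked mechanically once phase data are tabulated. The sketch below is illustrative only: it assumes lower scores are better (as with tic counts), uses made-up numbers, and the function names are hypothetical. The tally of treatment points lying beyond the baseline range is a simple descriptive count of our own choosing, not a procedure described in this chapter.

```python
from typing import Sequence


def ranges_overlap(phase_a: Sequence[float], phase_b: Sequence[float]) -> bool:
    """True if the min-max ranges of the two phases share any values."""
    return min(phase_a) <= max(phase_b) and min(phase_b) <= max(phase_a)


def points_beyond_baseline(baseline: Sequence[float], treatment: Sequence[float],
                           lower_is_better: bool = True) -> int:
    """Count treatment points lying outside the baseline range in the therapeutic direction."""
    if lower_is_better:
        return sum(1 for x in treatment if x < min(baseline))
    return sum(1 for x in treatment if x > max(baseline))


treatment = [9, 7, 6, 5, 5, 4]             # hypothetical tic counts after intervention
stable_baseline = [12, 14, 13, 15, 13, 14]
variable_baseline = [5, 15, 8, 18, 6, 16]

print(ranges_overlap(stable_baseline, treatment))            # False
print(points_beyond_baseline(stable_baseline, treatment))    # 6
print(ranges_overlap(variable_baseline, treatment))          # True
print(points_beyond_baseline(variable_baseline, treatment))  # 1
```

With the variable baseline the two phases overlap even though the treatment data are identical, which is the interpretive difficulty Bandura underscored.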
Examples of baselines
With the exception of a brief discussion in Hersen (1982) and in Barlow
and Hersen’s (1973) paper, which was primarily directed toward a psychiatric
readership, the different varieties of baselines commonly encountered in
applied clinical research have neither been examined nor presented in logical
sequence in the experimental literature. Thus the primary function of this
section is to familiarize the interested applied researcher with
examples of baseline patterns. For the sake of convenience, hypothetical
examples, based on actual patterns reported in the literature, will be illustrated
and described. Methods for dealing with each pattern will be outlined,
and an attempt will be made to formulate some specific rules (à la cookbook style).
The issue concerning the ultimate length of the baseline measurement
phase was previously discussed in some detail. However, it should be pointed
out here that “A minimum of three separate observation points, plotted on
the graph, during this baseline phase are required to establish a trend in the
data” (Barlow & Hersen, 1973, p. 320). Thus three successively increasing or
decreasing points would constitute establishment of either an upward or
downward trend in the data. Obviously, in two sets of data in which the same
trend is exhibited, differences in the slope of the line will indicate the extent or
power of the trend. By contrast, a pattern in which only minor variation is
seen would indicate the recording of a stable baseline pattern. An example of
such a stable baseline pattern is depicted in Figure 3-1. Mean numbers of facial
tics averaged over three daily 15-minute videotaped sessions are presented for
a 6-day period. Visual inspection of these data reveal no apparent upward or
downward trend. Indeed, data points are essentially parallel to the abscissa,
while variability remains at a minimum. This kind of baseline pattern, which
shows a constant rate of behavior, represents the most desirable trend, as it
permits an unequivocal point of departure for analyzing the subsequent efficacy of a
treatment intervention. Thus the beneficial or detrimental effects of the
following intervention should be clear. In addition, should there be an absence
of effects following introduction of a treatment, it will also be apparent.
Absence of such effects, then, would graphically appear as a
continuation of the steady trend first established during the baseline measurement
phase.
FIGURE 3-1. The stable baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
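The three-point rule and the idea of slope as the power of a trend can be made concrete in a few lines. This is a sketch under stated assumptions: `three_point_trend` applies the rule only to the three most recent observations, and `ols_slope` uses an ordinary least-squares fit, which is one common way to quantify slope but is our choice rather than a method prescribed here; the text itself relies on visual inspection. The data are hypothetical daily means of facial tics.

```python
from typing import Sequence


def three_point_trend(values: Sequence[float]) -> str:
    """Classify the three most recent observations with the three-successive-points rule."""
    if len(values) < 3:
        raise ValueError("at least three observation points are required")
    a, b, c = values[-3:]
    if a < b < c:
        return "upward"
    if a > b > c:
        return "downward"
    return "no trend"


def ols_slope(values: Sequence[float]) -> float:
    """Least-squares slope of the data against day number (change per day)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observation points are required")
    x_bar = (n - 1) / 2          # mean of day indices 0..n-1
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


stable = [10, 11, 10, 10, 11, 10]        # hypothetical mean tics per day
deteriorating = [4, 6, 8, 10, 12, 14]

print(three_point_trend(stable), three_point_trend(deteriorating))  # no trend upward
print(ols_slope(deteriorating))                                      # 2.0
```

Two series with the same direction of trend can then be compared by slope: a larger absolute value corresponds to what the text calls a more powerful trend.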
A second type of baseline trend frequently encountered in applied
clinical research is one in which the subject’s condition under study appears to be
worsening (known as the deteriorating baseline; Barlow & Hersen, 1973).
Once again, using our hypothetical data on facial tics, an example of this kind
of baseline trend is presented in Figure 3-2. Examination of this figure shows
a steadily increasing linear function, with the number of tics observed increasing
over days. The deteriorating baseline is an acceptable pattern inasmuch
as the subsequent application of a successful treatment intervention
should lead to a reversed trend in the data (i.e., a decreasing linear function
over days). However, should the treatment be ineffective, no change in the
slope of the curve would be noted. If, on the other hand, the treatment
application leads to further deterioration (i.e., if the treatment is actually
detrimental to the patient; see Bergin, 1966), it would be most difficult to
assess its effects using the deteriorating baseline. In other words, a differential
analysis as to whether a trend in the data was simply a continuation of the
baseline pattern or whether application of a detrimental treatment specifically
led to its continuation could not be made. Only if there appeared to be a
pronounced change in the slope of the curve following introduction of a
detrimental treatment could some kind of valid conclusion be reached on the
basis of visual inspection. Even then, the withdrawal and reintroduction of
the treatment would be required to establish its controlling effects. But from
both clinical and ethical considerations, this procedure would be clearly
unwarranted.
FIGURE 3-2. The increasing baseline (target behavior deteriorating). Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
A baseline pattern that provides difficulty for the applied clinical researcher
is one that reflects steady improvement in the subject’s condition during the
course of initial observation. An example of this kind of pattern appears in
Figure 3-3. Inspection of this figure shows a linear decrease in tic frequency
over a 6-day period. The major problem posed by this pattern, from a
research standpoint, is that application of a treatment strategy while improvement
is already taking place will not allow for an adequate assessment of the
intervention. Secondly, should improvement be maintained following initiation
of the treatment intervention, the experimenter would be unable to
attribute such continued improvement to the treatment unless a marked
change in the slope of the curve were to occur. Moreover, removal of the
treatment and its subsequent reinstatement would be required to show any
controlling effects.
An alternative (and possibly a more desirable) strategy involves the continuation
of baseline measurement with the expectation that a plateau will be
reached. At that point, a steady pattern will emerge and the effects of
treatment can then be easily evaluated. It is also possible that improvement
seen during baseline assessment is merely a function of some extrinsic variable
(Sidman, 1960) of which the experimenter is currently unaware. Following
Sidman’s recommendations, it then behooves the methodical
experimenter, assuming that time limitations and clinical and ethical considerations
permit, to evaluate empirically, through experimental analysis, the
possible source (e.g., “placebo” effects) of covariation. The results of this
kind of analysis could indeed lead to some interesting hunches, which then
might be subjected to further verification through the experimental analysis
method (see chapter 2, section 2.3).
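Both the deteriorating and the improving baseline turn on the same question: does the treatment-phase slope simply continue the baseline slope, reverse it, or change it markedly? The following is a rough sketch with hypothetical names and data. In particular, `marked_ratio` is an arbitrary threshold introduced for illustration, and a verbal label of this kind does not replace the withdrawal and reinstatement of treatment that the text says would still be required to show controlling effects.

```python
from typing import Sequence


def ols_slope(values: Sequence[float]) -> float:
    """Least-squares slope against session number within one phase."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observation points are required")
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def compare_phase_slopes(baseline: Sequence[float], treatment: Sequence[float],
                         marked_ratio: float = 2.0) -> str:
    """Rough verbal label for how the treatment-phase trend relates to baseline.

    marked_ratio is a hypothetical threshold: a same-direction slope must be at
    least this many times steeper than baseline to count as a 'marked' change.
    """
    b, t = ols_slope(baseline), ols_slope(treatment)
    if b * t < 0:
        return "reversal of baseline trend"
    if t != 0 and abs(t) >= marked_ratio * abs(b):
        return "marked change in slope"
    return "continuation of baseline trend"


deteriorating = [4, 6, 8, 10, 12, 14]    # hypothetical tics per day, worsening
improving = [20, 19, 18, 17, 16, 15]     # hypothetical tics per day, improving

print(compare_phase_slopes(deteriorating, [13, 11, 9, 7, 5, 3]))  # reversal of baseline trend
print(compare_phase_slopes(improving, [14, 13, 12, 11, 10, 9]))   # continuation of baseline trend
print(compare_phase_slopes(improving, [18, 15, 12, 9, 6, 3]))     # marked change in slope
```

The middle case mirrors the improving-baseline problem: continued improvement at the baseline rate cannot be attributed to the treatment.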
The extremely variable baseline presents yet another problem for the
FIGURE 3-3. The decreasing baseline (target behavior improving). Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-4. The variable baseline. Hypothetical data for mean number of facial tics averaged over three 15-minute videotaped sessions.
FIGURE 3-5. The variable-stable baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-7. The decreasing-increasing baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-8. The unstable baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-minute videotaped sessions.
reinforcement alone) following the second baseline also does not permit firm
conclusions, either with respect to the effects of social reinforcement alone or
in contrast to the combined treatment of token and social reinforcement. The
experimenter is not in a position to examine the interactive effects of the BC
and C phases, as they are not adjacent to one another.
If our experimenter were interested in accurately evaluating the interactive
effects of token and social reinforcement, the following extended design
would be considered appropriate: A-B-A-B-BC-B-BC. When this experimental
strategy is used, the interactive effects of social and token reinforcement
can be examined systematically by comparing differences in trends between
the adjacent B (token reinforcement) and BC (token and social reinforcement)
phases. The subsequent return to B and reintroduction of the combined
BC would allow for analysis of the additive and controlling effects of
social reinforcement, assuming expected trends in the data occur.
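The logic of the extended design is that adjacent phases differ in exactly one variable. A small checker makes this explicit. It is a sketch that assumes a hypothetical labeling convention in which A is baseline and each other letter names one treatment variable (here B for token and C for social reinforcement); `transitions` and `phase_variables` are illustrative names.

```python
def phase_variables(label: str) -> frozenset:
    """Hypothetical convention: 'A' is baseline (no treatment variables);
    every other letter in a phase label names one treatment variable."""
    return frozenset(ch for ch in label if ch != "A")


def transitions(design: str) -> list:
    """List each adjacent phase change and whether exactly one variable changed."""
    phases = design.split("-")
    result = []
    for prev, curr in zip(phases, phases[1:]):
        # Symmetric difference: variables added or removed at this transition.
        changed = phase_variables(prev) ^ phase_variables(curr)
        result.append((prev, curr, "".join(sorted(changed)), len(changed) == 1))
    return result


for prev, curr, changed, ok in transitions("A-B-A-B-BC-B-BC"):
    status = "one variable" if ok else "more than one variable"
    print(f"{prev} -> {curr}: changes {changed} ({status})")
```

A hypothetical sequence such as A-B-A-BC would be flagged at its last transition, where two variables change at once.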
A published example of the correct manipulation of variables across
phases appears in Figure 3-9. In this study, Leitenberg et al. (1968) examined
the separate and combined effects of feedback and praise on the mean
FIGURE 3-10. Each point represents one session and indicates the number of intervals in which the subject was out of his seat (top) or talking without permission (bottom). A total of 90 such intervals was possible within a 15-minute session. Asterisks over points indicate sessions that resulted in time being spent in the booth. (Figure 1, p. 237, from: Ramp, E., Ulrich, R., & Dulaney, S. (1971). Delayed timeout as a procedure for reducing disruptive classroom behavior: A case study. Journal of Applied Behavior Analysis, 4, 235-239. Copyright 1971 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
[Figure 3-11 plot: two panels showing proportion of intervals (0 to 1.00), the lower panel labeled Bite (unpunished); phases Base, TO, Base, Watch, TO.]
FIGURE 3-11. Proportion of total intervals in which Bang (punished) and Bite (unpunished) responses were recorded for S1 in 47 free-play periods. (Figure 1, p. 88, from: Pendergrass, V. E. (1972). Timeout from positive reinforcement following persistent, high-rate behavior in retardates. Journal of Applied Behavior Analysis, 5, 85-91. Copyright 1972 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
[Figure 3-12 panels: Looking; Smiling]
FIGURE 3-12. Mean number of looks and smiles for three couples in 10-second intervals plotted in blocks of 2 minutes for the Videotape Feedback Plus Focused Instructions Design. (Figure 3, p. 556, from: Eisler, R. M., Hersen, M., & Agras, W. S. (1973). Effects of videotape and instructional feedback on nonverbal marital interaction: An analog study. Behavior Therapy, 4, 551-558. Copyright 1973 by Association for the Advancement of Behavior Therapy. Reproduced by permission.)
FIGURE 3-13. Total score on card sort per experimental day and total frequency of pedophilic sexual urges in blocks of 4 days surrounding each experimental day. (Lower scores indicate less sexual arousal.) (Figure 1, p. 599, from: Barlow, D. H., Leitenberg, H., & Agras, W. S. (1969). Experimental control of sexual deviation through manipulation of the noxious scene in covert sensitization. Journal of Abnormal Psychology, 74, 596-601. Copyright 1969 by the American Psychological Association. Reproduced by permission.)
CB-AB) the only variable manipulated across phases was the time contingency.
There are several other important issues related to the investigation of drug
effects in single-case experimental designs that merit careful analysis. They
include the double-blind evaluation of results, long-term carryover effects of
phenothiazines, and length of phases. These will be discussed in some detail
in section 3.6 of this chapter and in chapter 7.
FIGURE 3-14. Daily percentages of time spent in social interaction with adults and with children during approximately 2 hours of each morning session. (Figure 2, p. 515, from: Allen, K. E., Hart, B. M., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social reinforcement on isolate behavior of a nursery school child. Child Development, 35, 511-518. Copyright 1964. Reproduced by permission of The Society for Research in Child Development, Inc.)
Withdrawal of treatment
The specific point at which the experimenter removes the treatment variable
(second A phase in the A-B-A design) in the withdrawal design is
multidetermined. Among the factors to be considered are time limitations
imposed by the treatment setting, staff cooperation when working in institutions
(J. M. Johnston, 1972), and ethical considerations when removal of
treatment can possibly lead to some harm to the subject (e.g., head banging
in a retardate) or others in the environment (e.g., physical assaults toward
FIGURE 3-15. Increasing treatment phase followed by decreasing baseline. Hypothetical data for frequency of social responses in a schizophrenic patient per 2-hour period of observation.
FIGURE 3-16. High-level treatment phase followed by low-level baseline. Hypothetical data for frequency of social responses in a schizophrenic patient per 2-hour period of observation.
in Figure 3-18. Inspection of the figure reveals that after a stable pattern is
obtained in baseline, introduction of contingent reinforcement leads to an
immediate and dramatic improvement, which is then followed by a marked
decreasing linear function. This trend is in evidence despite the fact that the
last data point in contingent reinforcement is clearly above the highest point
achieved in baseline. Removal of treatment and a return to baseline conditions
on Day 13 similarly result in a decreasing trend in the data. Therefore,
no conclusions as to the controlling effects of contingent reinforcement are
possible, as it is not clear whether the decreasing trend in the second baseline
is a function of the treatment’s withdrawal or mere continuation of the trend
begun during treatment. Even if withdrawal of treatment were to lead to the
stable low-level pattern seen in the first baseline period, the same problems in
interpretation would be posed.
When the aforementioned trend appears during the course of experimental
treatment, it is recommended that the phase be continued until a more
consistent pattern emerges. However, if this strategy is pursued, the equivalent
length of adjacent phases is altered (see section 3.6). A second strategy,
although admittedly somewhat weak, is to reintroduce treatment in Phase 4
(thus, we have an A-B-A-B design), with the expectation that a reversed trend
in the data will reflect improvement. There would then be limited evidence for
the treatment’s controlling effects.
A similar problem ensues when treatment is withdrawn in the example that
appears in Figure 3-19. In spite of an initial upward trend in the data when
contingent reinforcement is first introduced (B), the decreasing trend in the
latter half of the phase, which is then followed by a similar decline during the
second baseline (A), prevents an analysis of the treatment’s controlling effects.
Therefore, the same recommendations made in the case of Figure 3-18
apply here.
[Figure 3-19: Baseline, Reinf., and Baseline phases plotted over Days 1-17; caption incomplete in source: “. . . period of observation.”]
Although there has been some intermittent discussion in the literature with
regard to the length of phases when carrying out single-case experimental
research (Barlow & Hersen, 1973; Bijou et al., 1969; Chassan, 1967; J. M.
Johnston, 1972; Kazdin, 1982b), a complete examination of the problems
faced and the decisions to be made by the researcher has yet to appear.
Therefore, in this section the major issues involved will be considered, including
individual and relative length of phases, carryover effects, and cyclic
variations. In addition, these considerations will be examined as they apply to
the study of drugs on behavior.
He notes further:
FIGURE 3-20. Extension of the treatment phase in an attempt to show its effects. Hypothetical data in which the effects of time-out on daily frequency of hitting other children (based on a 2-hour free-play situation) in a 3-year-old male child are examined.
Training (RCT) in a “secondary enuretic” child (see Figure 3-21). Two target
behaviors, number of enuretic episodes and mean frequency of daily urination,
were selected for study in an A-B-A-B experimental design. During
baseline, the child recorded the natural frequency of target behaviors and
received counseling from the experimenter on general issues relating to home
and school. Following baseline, the first week of RCT involved teaching the
child to postpone urination for a 10-minute period after experiencing each
urge. Delay of urination was increased to 20 and 30 minutes in the next 2
weeks. During Weeks 7-9 RCT was withdrawn, but it was reinstated in Weeks
10-14.
Examination of Figure 3-21 indicates that each of the first three phases
consisted of 3 weeks, with data reflecting the controlling effects of RCT on
both target behaviors. Reinstatement of RCT in the final phase led to renewed
control, and the treatment was extended to 5 weeks to ensure maintenance
of gains.
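The phase structure of the Miller (1973) study, as described above, can be written down as a small data structure, which makes phase lengths easy to check. In this sketch `Week`, `miller_schedule`, and `phase_lengths` are hypothetical names; the retention delay used in the reinstated phase (Weeks 10-14) is not stated in the text, so it is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Week:
    number: int
    phase: str                    # "A" = baseline/withdrawal, "B" = RCT
    delay_minutes: Optional[int]  # required postponement; None = not applicable or not reported


def miller_schedule() -> list:
    """Week-by-week A-B-A-B schedule as described in the text."""
    weeks = [Week(n, "A", None) for n in range(1, 4)]
    # First RCT phase: delay raised from 10 to 20 to 30 minutes over 3 weeks.
    weeks += [Week(n, "B", d) for n, d in zip(range(4, 7), (10, 20, 30))]
    weeks += [Week(n, "A", None) for n in range(7, 10)]
    # Reinstated RCT: the delay used is not reported, so it is left unspecified.
    weeks += [Week(n, "B", None) for n in range(10, 15)]
    return weeks


def phase_lengths(weeks: list) -> list:
    """Collapse consecutive weeks with the same phase label into (phase, length) pairs."""
    runs = []
    for w in weeks:
        if runs and runs[-1][0] == w.phase:
            runs[-1][1] += 1
        else:
            runs.append([w.phase, 1])
    return [tuple(run) for run in runs]


print(phase_lengths(miller_schedule()))  # [('A', 3), ('B', 3), ('A', 3), ('B', 5)]
```

Laying the schedule out this way makes the unequal final phase (5 weeks versus 3) visible at a glance, which bears on the later discussion of relative phase length.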
It might be noted that phase and data patterns do not often follow the ideal
sequence depicted in the Miller (1973) study. As a consequence, experimenters
frequently are required to make accommodations for ethical, proce-
FIGURE 3-21. Number of enuretic episodes per week and mean number of daily urinations per week for Subject 1. (Figure 1, p. 291, from: Miller, P. M. (1973). An experimental analysis of retention control training in the treatment of nocturnal enuresis in two institutionalized adolescents. Behavior Therapy, 4, 288-294. Copyright 1973 by Association for the Advancement of Behavior Therapy. Reproduced by permission.)
Carryover effects
A parametric issue that is very much related to the comparative lengths of
adjacent baseline and treatment phases is one of overlapping (carryover)
effects. Carryover effects in behavioral (as distinct from drug) studies usually
appear in the second baseline phase of the A-B-A-B type design and are
characterized by the experimenter’s inability to retrieve original levels of
baseline responding. Not only is the original baseline rate not recoverable in
some cases (e.g., Ault, Peterson, & Bijou, 1968; Hawkins et al., 1966), but on
occasion (e.g., Zeilberger, Sampen, & Sloane, 1968) the behavior under study
undergoes more rapid modification the second time the treatment variable is
introduced.
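One way to summarize how far a second baseline falls short of the first is to express the A2 level as a fraction of the A1-to-B1 change. This recovery ratio is our own descriptive index, offered as a sketch with hypothetical data and function names; it is not a statistic proposed in the text, and it presumes phase means are a reasonable summary, which they are not when strong trends are present within phases.

```python
from typing import Sequence


def phase_mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def baseline_recovery(a1: Sequence[float], b1: Sequence[float], a2: Sequence[float]) -> float:
    """Fraction of the A1-to-B1 change that is reversed in A2.

    1.0 means the original baseline level was fully recovered; values near 0
    mean the treatment-phase level persisted (a possible carryover effect).
    """
    shift = phase_mean(a1) - phase_mean(b1)
    if shift == 0:
        raise ValueError("no A1-to-B1 change; recovery is undefined")
    return (phase_mean(a2) - phase_mean(b1)) / shift


a1 = [19, 21, 20]   # hypothetical first baseline
b1 = [9, 8, 7]      # hypothetical treatment phase
print(baseline_recovery(a1, b1, [10, 12, 11]))  # 0.25: most of the change carried over
print(baseline_recovery(a1, b1, [20, 19, 21]))  # 1.0: baseline level recovered
```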
Presence of carryover effects has been attributed to a variety of factors
including changes in instructions across experimental conditions (Kazdin,
1973b), the establishment of new conditioned reinforcers (Bijou et al., 1969),
the maintenance of new behavior through naturally occurring environmental
contingencies (Krasner, 1971b), and the differences in stimulus conditions
across phases (Kazdin & Bootzin, 1972). Carryover effects in behavioral
research are an obvious clinical advantage, but pose a problem experimentally,
as the controlling effects of procedures are then obfuscated.
Proponents of the group comparison approach (e.g., Bandura, 1969)
contend that the presence of carryover effects in single-case research is one of
its major shortcomings as an experimental strategy. Both in terms of drug
evaluation (Chassan, 1967) and with respect to behavioral research (Bijou et
al., 1969), short periods of experimentation (application of the treatment
variable) were recommended to counteract these difficulties. Examining the
problem from the operant framework, Bijou et al. argued that “In studies
involving stimuli with reinforcing properties, relatively short experimental
periods are advocated, since long ones might allow enough time for the
establishment of new conditioned reinforcers” (p. 202). Carryover effects are
also an important consideration in alternating treatment designs but are more
easily handled through counterbalancing procedures (see chapter 8).
A major difficulty in carrying out meaningful evaluations of drugs on
behavior using single-case methodology involves their carryover effects from
one phase to the next. This is most problematic when withdrawing active drug
treatment (B phase) and returning to the placebo (A1 phase) condition in the
A-A1-B-A1-B design. With respect to such effects, Chassan (1967) pointed out
that “This, for instance, is thought likely to be the case in the use of
monoaminoxidase inhibitors for the treatment of depression” (p. 204). Similarly,
when using phenothiazine derivatives, the experimenter must exercise
caution, inasmuch as residuals of the drugs have been found to remain in body
tissues for extended periods of time (as long as 6 months in some cases)
following their discontinuance (Ban, 1969).
However, it is possible to examine the short-term effects of phenothiazines
on designated target behaviors (Liberman et al., 1973), but it behooves the
experimenter to demonstrate, via blood and urine laboratory studies, that
the observed changes truly reflect the controlling effects of the drug. That is to say,
correlations (statistical and graphic data patterns) between behavioral
changes and drug levels in body tissues should be demonstrated across
experimental phases.
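The correlational check suggested above can be sketched with a Pearson product-moment coefficient computed over session-by-session pairs. The drug-level values stand in for hypothetical laboratory assay results in arbitrary units, the behavior counts are invented, and `pearson_r` is an illustrative helper; Pearson's r is one reasonable choice, not a statistic specified in the text.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical sessions spanning placebo, active drug, and placebo phases.
drug_level = [0, 0, 0, 4, 7, 9, 9, 5, 2, 0]          # arbitrary assay units
target_behavior = [14, 15, 13, 11, 8, 6, 5, 8, 11, 13]

print(round(pearson_r(drug_level, target_behavior), 2))
```

A strong negative coefficient here would be consistent with drug-linked change but, as the surrounding discussion makes clear, it supports rather than replaces the phase-by-phase demonstration of control.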
Despite the carryover difficulties encountered with the major tranquilizers
and antidepressants, the possibility of conducting extended studies in long-term
facilities should be explored, assuming that high ethical and experimental
standards prevail. In addition, study of the short-term efficacy of the
minor tranquilizers and amphetamines on selected target behaviors is quite
feasible.
Cyclic variations
A most neglected issue in experimental single-case research is that of cyclic
variations (see chapter 2, sections 2.2 and 2.3, for a more general discussion
of variability). Although the importance of cyclic variations was given attention
by Sidman (1960) with respect to basic animal research, and by J. M.
Johnston and Pennypacker (1981) in a more applied context, the virtual absence
of serious consideration of this issue in the applied literature is striking.
This issue is of paramount concern when using adult female subjects as their
own controls in short-term (one month or less) investigations. Although
Chassan (1967) gave some consideration to the effects of the menstrual cycle
on behavior, he argued that “. . . a 4-week period (with random
phasing) would tend to distribute menstrual weeks evenly between treatments”
(p. 204). However, he did recognize that “The identification of such
weeks in studies involving such patients would provide an added refinement
for the statistical analysis of the data” (p. 204).
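Chassan's claim is about what random phasing does on average. A short simulation shows why his suggested refinement matters in brief studies. It assumes, for illustration only, that each 4-week cycle is randomly split into two A weeks and two B weeks and that the menstrual week is the fourth week of every cycle; the function name is hypothetical.

```python
import random


def menstrual_week_allocation(n_cycles: int, menstrual_week: int = 3,
                              seed: int = 0) -> dict:
    """Tally which treatment the (assumed) menstrual week falls under when each
    4-week cycle is randomly split into two A weeks and two B weeks.

    menstrual_week is a 0-based index within the cycle (3 = fourth week).
    """
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    for _ in range(n_cycles):
        phasing = ["A", "A", "B", "B"]
        rng.shuffle(phasing)                 # random phasing within this cycle
        counts[phasing[menstrual_week]] += 1
    return counts


print(menstrual_week_allocation(1))       # one cycle: the week falls under a single treatment
print(menstrual_week_allocation(10_000))  # many cycles: close to an even split
```

Over many cycles the allocation approaches an even split, but in an investigation of one month or less the menstrual week necessarily falls under one condition only, which is one reason identifying such weeks adds information.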
Whether one is examining drug effects or behavioral interventions, the
implications of cyclic variation for single-case methodology are enormous.
Indeed, the psychiatric literature is replete with examples of the deleterious
effects (leading to increased incidence of psychopathology) of the premenstrual
and menstrual phases of the cycle on a wide variety of target
the multiple baseline strategy is ideally suited for studying such variables, in
that withdrawals of treatment are not required to show the controlling effects
of particular techniques (Baer et al., 1968; Barlow & Hersen, 1973; Hersen,
1982; Kazdin, 1982b). A complete discussion of issues related to the varieties
of multiple baseline designs currently being employed by applied researchers
appears in chapter 7.
In this section, however, the limited use and evaluation of therapeutic
instructions in withdrawal designs will be examined and illustrated. Let us
consider the problems involved in “withdrawing” therapeutic instructions. In
contrast to a typical reinforcement procedure, which can be introduced,
removed, and reintroduced at will, an instructional set, after it has been
given, technically cannot be withdrawn. Certainly, it can be stopped (e.g.,
Eisler, Hersen, & Agras, 1973) or changed (Agras et al., 1969; Barlow,
Agras, Leitenberg, Callahan, & Moore, 1972), but it is not possible to remove
it in the same sense as one does in the case of reinforcement. Therefore,
when examining the interactive effects of instructions
and other therapeutic variables (e.g., social reinforcement), instructions are
typically held constant across treatment phases while the therapeutic
variable is introduced, withdrawn, and reintroduced in sequence (Hersen,
Gullick, Matherne, & Harbert, 1972).
Exceptions
Some exceptions to this practice have appeared periodically in the
psychological literature. In two separate studies, the short-term effects of
instructions (Eisler, Hersen, & Agras, 1973) and the therapeutic value of
instructional sets (Barlow et al., 1972) were examined in withdrawal designs.
In one of a series of analogue studies, Eisler, Hersen, and Agras investigated
the effects of focused instructions (“We would like you to pay attention as to
how much you are looking at each other”) on two nonverbal behaviors
(looking and smiling) during the course of 24 minutes of free interaction in
three married couples. An A-B-A-B design was used, with A consisting of 6
minutes of videotaped interaction between a husband and wife in a small
television studio. The B phase also involved 6 minutes of videotaped
interaction, but focused instructions on looking were administered three times
at 2-minute intervals by the experimenter, speaking over a two-way intercom
system from the adjoining control room. During the second A phase, instructions
were discontinued, and in the second B phase they were reinstated, thus
completing 24 minutes of taped interaction.
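The session structure just described can be laid out as a timeline of rating intervals. The following Python sketch is a hypothetical illustration, not the authors' procedure: the constants come from the text and the Figure 3-22 caption (6-minute phases, 10-second rating intervals, 2-minute summary blocks), but the assumption that each of the three instructions falls at the start of a 2-minute block, and all function names, are invented for illustration.

```python
# Hypothetical sketch of the 24-minute A-B-A-B session (not the original study's code).
PHASES = ["A1", "B1", "A2", "B2"]  # A = no instructions, B = focused instructions on looking
PHASE_S = 6 * 60                   # each phase lasts 6 minutes
INTERVAL_S = 10                    # looks and smiles rated in 10-second intervals
BLOCK_S = 2 * 60                   # data summarized in 2-minute blocks


def build_timeline():
    """Return one record per 10-second rating interval across all four phases."""
    timeline = []
    for p, phase in enumerate(PHASES):
        for offset in range(0, PHASE_S, INTERVAL_S):
            start_s = p * PHASE_S + offset
            timeline.append({
                "phase": phase,
                "start_s": start_s,
                "block": start_s // BLOCK_S,  # blocks numbered 0..11
                # Assumption: the three instructions in a B phase fall at the start
                # of each 2-minute block (0, 2, and 4 minutes into the phase).
                "instruction": phase.startswith("B") and offset % BLOCK_S == 0,
            })
    return timeline


def block_means(timeline, counts):
    """Average per-interval counts (e.g., number of looks) within each 2-minute block."""
    if len(counts) != len(timeline):
        raise ValueError("need exactly one count per rating interval")
    totals, sizes = {}, {}
    for record, count in zip(timeline, counts):
        b = record["block"]
        totals[b] = totals.get(b, 0) + count
        sizes[b] = sizes.get(b, 0) + 1
    return [totals[b] / sizes[b] for b in sorted(totals)]


timeline = build_timeline()
print(len(timeline), "rating intervals;",
      sum(r["instruction"] for r in timeline), "instructions delivered")
```

Plotting the twelve block means in phase order gives a display of the same general kind as Figure 3-22, with three blocks per phase.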
Retrospective ratings of looking and smiling for husbands and wives (mean
data for the three couples were used, as trends were similar in all cases)
appear in Figure 3-22. In baseline, looking occurred at a moderate frequency for
both spouses. In the next phase, focused instructions resulted in a
[Figure 3-22: two panels, LOOKING and SMILING]
FIGURE 3-22. Mean number of looks and smiles for three couples in 10-second intervals plotted
in blocks of 2 minutes for the Focused Instructions Alone Design. (Figure 4, p. 556, from: Eisler,
R. M., Hersen, M., & Agras, W. S. (1973). Effects of videotape and instructional feedback on
nonverbal marital interactions: An analog study. Behavior Therapy, 4, 551-558. Copyright 1973
by Association for the Advancement of Behavior Therapy. Reproduced by permission.)
104 Single-case Experimental Designs
In summary, data from this study show that covert sensitization treatment
was the effective procedure and that therapeutic expectancy was definitely not
the primary ingredient leading to success. On the contrary, a positive set
paired with a placebo-relaxation condition in baseline did not yield
improvement in the target behavior.
Although the design in this study permits conclusions about the efficacy of
positive and negative sets, the problem could have been assessed more directly
with the following design: (1) baseline placebo, (2) acquisition with positive
instructions, (3) acquisition with negative instructions, and (4) acquisition
with positive instructions. Labeled alphabetically, this yields an A-BC-BD-BC
design. If negative instructions were to exert a negative effect in the BD
phase, a reversed trend would appear in the data. If, on the other hand,
negative instructions had no effect or only a negligible effect, a continued
downward linear trend would appear across phases BC, BD, and the return to BC.
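The two predicted patterns can be made concrete by summarizing the trend within each phase. The Python sketch below uses an ordinary least-squares slope as one simple trend summary; that choice, the decision rules, the function names, and the data are hypothetical and are not part of the original study, which relied on inspection of plotted data. Lower scores are assumed to indicate improvement, matching the "downward" trend described in the text.

```python
# Hypothetical sketch: within-phase trends for a proposed A-BC-BD-BC design.

def ols_slope(ys):
    """Least-squares slope of ys regressed on observation index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to estimate a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def interpret(phases):
    """phases: [(label, observations), ...] in the order A, BC, BD, BC.

    Lower scores are assumed to mean improvement, so acquisition should slope downward.
    """
    _, bc1, bd, bc2 = (ols_slope(ys) for _, ys in phases)
    if bc1 < 0 and bd > 0:
        return "reversal in BD: negative instructions appear to exert an effect"
    if bc1 < 0 and bd < 0 and bc2 < 0:
        return "continued downward trend: negative instructions appear negligible"
    return "pattern matches neither prediction"


# Invented example data: one score per session, lower = better.
no_effect = [("A", [8, 8, 9, 8]), ("BC", [7, 6, 5, 4]),
             ("BD", [4, 3, 2, 2]), ("BC", [2, 1, 1, 0])]
print(interpret(no_effect))
```

With real single-case data, a slope computed from a handful of points is fragile, so a summary like this would supplement rather than replace visual inspection.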
Assessment Strategies
by Donald P. Hartmann
4.1. INTRODUCTION
Thanks to Lynne Zarbatany for her critical reading of an earlier draft of this
chapter and to Andrea Stavros for her typing and editorial assistance.
The stem of the assessment funnel represents the baseline, treatment, and
follow-up phases of an intervention study. Measurement during these phases
requires a narrower focus on the target behavior for the purposes of refining,
and in some cases, extensively modifying, the intervention and subsequently
evaluating its impact.3 Assessment during these phases typically employs
direct observation of the target behavior(s) in either contrived or natural
Assessment Strategies 111
Definition: Peer interaction refers to a social relationship between agemates
such that they mutually influence each other (Chaplin, 1975).
Elaboration: Peer interaction is scored when the child is (a) within three feet
of a peer and either (b) engaged in conversation or physical
activity with the peer or (c) jointly using a toy or other play
object.
Note. From Gelfand, D. M. & Hartmann, D. P. Child behavior: Analysis and therapy (2nd ed.).
Elmsford, NY: Pergamon Press. Copyright 1984. Reproduced by permission.
sampled (P. H. Bornstein et al., 1980), these issues either have not captured
the interest of behavior change researchers or the cost of conducting
observations in multiple settings has exceeded available resources.
While most investigators would prefer to observe behavior as it naturally
occurs (e.g., Kazdin, 1982b), a number of factors may require that
observations be conducted elsewhere. The reasons for employing contrived or
analogue settings include convenience to observers and clients; the need for
standardization or measurement sensitivity; or the fact that the target
behavior naturally occurs at a low rate, so that observations in natural
settings would involve excessive unproductive observation time. All of these
factors may have influenced R. T. Jones, Kazdin, and Haney’s (1981b) choice of
a contrived setting to assess the effectiveness of a program to improve
children’s skill in escaping from home emergency fires.
The correspondence between behavior observed in contrived observational
settings and in naturalistic settings varies as a function of (1) similarities in
their physical characteristics, (2) the persons present, and (3) the control
exerted by the observation process (Nay, 1979). Even if assessments are
conducted in naturalistic settings, the observations may produce variations in
the cues that are normally present in these settings. For example, setting cues
may change when structure is imposed on observation settings. Structuring
may range from presumably minor restrictions in the movement and activities
of family members during home observations to the use of highly contrived
situations, as in some assessments of fears and social skills. Haynes (1978),
McFall (1977), and Nay (1977, 1979) provided examples of representative
studies that employed various levels and types of structuring in observation
settings; they also discussed the potential advantages and limitations of
structuring relative to cost, measurement sensitivity, and generalizability.
Cues in observation settings may also be affected by the type of observers
used and their relationship to the persons observed. Observers can vary in
their level of participation with the observed. At one extreme are
nonparticipant (independent) observers whose only role is to gather data. At the
other extreme are self-observations conducted by the subject or client.
Intermediate levels of participant-observation are represented by significant
others, such as parents, peers, siblings, teachers, aides, and nurses, who are
normally present in the setting where observations take place (e.g., Bickman,
1976). The major advantages of participant-observers are that they may be
present at times that might otherwise be inconvenient for independent
observers and that their presence may be less obtrusive. On the other hand, they may
be less dependable, more subject to biases, and more difficult to train and
evaluate than are independent observers (Nay, 1979).
When observation settings vary from natural life settings either because of
the presence of possibly obtrusive external observers or the imposition of
structure, the ecological validity of the observations is open to question (e.g.,
Barker & Wright, 1955; Rogers-Warren & Warren, 1977). Methods of limiting
these threats to ecological validity are discussed in the section on observer
effects.
Though selection of observation settings is an important issue, investigators
must also determine how best to sample behaviors within these settings.
Sampling of behavior is influenced by how observations are scheduled.
Behavior cannot be continuously observed and recorded except by
participant-observers when the targets are low-frequency events (see, for
example, the Clinical Frequency Recording System employed by Paul & Lentz,
1977), or when self-observation procedures are employed (see Nelson, 1977).
Otherwise, the times in which observations are conducted must be sampled,
and decisions must be made about the number of observation sessions to be
scheduled and the basis for scheduling. More samples are required when
behavior rates are low, variable, and changing (either increasing or
decreasing); when events controlling the target behaviors vary substantially;
and when observers are asked to employ complex coding procedures (Haynes,
1978).
Once a choice has been made about how frequently to schedule sessions, a
session duration must be chosen. In general, briefer sessions are necessary to
limit observer fatigue when a complex coding system is used, when coded
behaviors occur at high rates, and when more than one subject must be
observed simultaneously. Ultimately, however, session duration, as well as the
number of observation sessions, should be chosen to minimize costs and to
maximize the representativeness, sensitivity, and reliability of data and the
output of information per unit of time. For an extended discussion of these
issues as they apply to scheduling, see Arrington (1943). If observations are to
be conducted on more than one subject, decisions must be made concerning
the length of time and the order in which each subject will be observed.
Sequential methods, in which subjects are observed for brief periods in a
previously randomized, rotating order, are superior to fewer but longer
observations or to haphazard sampling (e.g., Thomson, Holmberg, & Baer,
1974).
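A randomized, rotating observation order of this kind can be generated before data collection begins. The Python sketch below is one hypothetical way to do it: it shuffles the subjects once (with an optional seed so the schedule can be documented and reproduced) and then cycles through that fixed order in equal brief periods. The function name, parameters, and subject names are illustrative; some investigators may prefer to re-randomize the order on each pass.

```python
import random
from collections import Counter


def rotating_schedule(subjects, period_s, session_s, seed=None):
    """Return (start_second, subject) pairs for one observation session.

    The order is randomized once, before observation, and then rotated so that each
    subject is observed for brief periods spread across the whole session.
    """
    if period_s <= 0 or session_s < period_s:
        raise ValueError("period must be positive and fit within the session")
    rng = random.Random(seed)
    order = list(subjects)
    rng.shuffle(order)
    n_slots = session_s // period_s
    return [(slot * period_s, order[slot % len(order)]) for slot in range(n_slots)]


# Hypothetical example: four children, 30-second looks, 10-minute session.
schedule = rotating_schedule(["Ann", "Ben", "Cal", "Dee"], period_s=30, session_s=600, seed=7)
print(schedule[:5])
print(Counter(subject for _, subject in schedule))
```

Because the order is fixed in advance, each subject receives the same number of brief observation periods spread across the session rather than one long block at a single time.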
category responses such as pitches hit, or instructions complied with; and (3)
when individuals are themselves the measurement units, such as the number
of individuals who litter, overeat, commit murder, or are in their seats at the
end of recess (Kazdin, 1982b). Behaviors such as crying, for which individual
incidents vary in temporal or in other important respects or which may be
difficult to classify into discrete events, are better evaluated using another
response dimension such as duration.
When response occurrences are easily discriminated and occur at moderate
to low rates, frequencies can be tallied conveniently by moving an object,
such as a paper clip, from one pocket to another; by placing a check mark on
a sheet of paper; or by depressing the knob on a wrist counter. When
responses occur at very low rates, even a busy participant-observer can record a wide
range of behavior for a large number of individuals (e.g., Wood, Callahan,
Alevizos, & Teigen, 1979). More complex observational settings require the
use of a complicated recording apparatus or of multiple observers; sampling
of behaviors, of individuals, or of both; or repeated passes through either
video or audio recordings of the target behaviors (e.g., Holm, 1978; Simpson,
1979).
Response duration, or one of its derivatives such as percentage of time
spent in an activity, is assessed when a temporal characteristic of a response is
targeted, such as the length of time required to perform the response, the
response latency, or the interresponse time (Cone & Foster, 1982). While
duration is less commonly observed than is frequency (e.g., M. B. Kelly,
1977), duration has been measured for a variety of target responses, including
the length of time that a claustrophobic patient sat in a small room
(Leitenberg et al., 1968) and latency to comply with classroom instructions
(Fjellstedt & Sulzer-Azaroff, 1973).
Duration measures require the availability of a suitable timing device and a
target response with clearly discernible onsets and offsets. In single-variable
studies, the general availability and convenience of digital wristwatches with
real time and stopwatch functions may enable even a participant observer to
serve as the primary source of data. In the case of multiple target behaviors, a
more complex timing device, such as a multiple-channel event recorder (e.g., a
Datamyte), is required.
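Once onsets and offsets have been timed, the frequency and temporal measures mentioned above follow directly. The Python sketch below assumes, hypothetically, that each response is logged as an (onset, offset) pair in seconds from the start of the session, that responses do not overlap, and that latency is measured from a cue (such as a classroom instruction) delivered at a known time. Interresponse time is computed onset to onset here; some investigators measure from the offset of one response to the onset of the next.

```python
from typing import List, Optional, Tuple

Episode = Tuple[float, float]  # (onset_s, offset_s), seconds from session start


def summarize(episodes: List[Episode], session_s: float, cue_s: Optional[float] = None) -> dict:
    """Frequency, rate, total duration, percentage of time, latency, and interresponse times.

    Assumes episodes do not overlap; overlapping episodes would be double counted.
    """
    eps = sorted(episodes)
    for onset, offset in eps:
        if not 0 <= onset <= offset <= session_s:
            raise ValueError(f"invalid episode {(onset, offset)}")
    total = sum(offset - onset for onset, offset in eps)
    # Latency: time from the cue to the first response onset at or after the cue.
    later = [onset for onset, _ in eps if cue_s is not None and onset >= cue_s]
    return {
        "frequency": len(eps),
        "rate_per_min": len(eps) / (session_s / 60),
        "total_duration_s": total,
        "percent_time": 100 * total / session_s,
        "latency_s": later[0] - cue_s if later else None,
        # Onset-to-onset interresponse times between successive responses.
        "irt_s": [b[0] - a[0] for a, b in zip(eps, eps[1:])],
    }


print(summarize([(5, 15), (40, 50), (100, 130)], session_s=300, cue_s=0))
```

The same log thus yields frequency, duration, latency, and interresponse time, which is one practical argument for recording onsets and offsets whenever the timing equipment allows it.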
Response quality is typically assessed when target behaviors vary either in
(1) intensity or amplitude, such as noise level and penile erection; (2)
accuracy, such as descriptions of place and time used to test general orientation;
or (3) acceptability, such as the appropriateness of assertion and the
intelligibility of speech (Cone & Foster, 1982). These qualitative dimensions may be
evaluated on continuous or discrete scales, and the discrete scales can
themselves be dichotomous or multi-categorical. For example, assessment of the
amount of food spilled by a child could be made by weighing the child and
the food on his or her plate before and after each meal (quantitative,
the fragmentary picture it gives of the stream of behavior; (2) the difficulty of
identifying sources of disagreements between observers, unless the
observations are locked into real time; (3) the unreliability of observations when
response onset or offset is difficult to discriminate; and (4) the tendency of
observers to nod off when coded events occur infrequently (Nay, 1979; Reid,
1978; Sulzer-Azaroff & Mayer, 197). Despite these disadvantages, event
recording is a commonly used method in behavior change research (M. B. Kelly,
1977).
Duration recording is used when one of the previously discussed temporal
aspects of responding is targeted. According to M.B. Kelly (1977), duration
recording is the least used of the common recording techniques, perhaps in
part because of the belief that frequency is a more basic response characteristic
(e.g., Bijou et al., 1969), and perhaps in part because of the apparent ease
of estimating duration by either of the two methods described next.
Scan sampling, also referred to as instantaneous time sampling, momentary
time sampling, and discontinuous probe time sampling, is particularly
useful with behaviors for which duration (percentage of time occurrence) is a
more meaningful dimension than is frequency. With scan sampling, the
observer periodically scans the subject or client and notes whether or not the
behavior is occurring at the instant of the observation. The brief observation
periods that give this technique its name can be signaled by the beep of a
digital watch, an oven timer, or an audiotape played through an earplug, on
either a fixed or random schedule. Impressive applications of scan sampling
with chronic mental patients were described by Paul and his associates (Paul
& Lentz, 1977; Power, 1979).
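The logic of scan sampling can be sketched in a few lines. In this hypothetical Python example, a behavior stream is represented as (onset, offset) pairs in seconds, and the estimate is simply the percentage of scan instants at which the behavior is in progress, on either a fixed or a random schedule; the function names, the behavior stream, and the 10-second spacing are invented for illustration.

```python
import random


def occurring(episodes, t):
    """True if the behavior is in progress at instant t (onset inclusive, offset exclusive)."""
    return any(onset <= t < offset for onset, offset in episodes)


def scan_sample(episodes, session_s, interval_s, random_schedule=False, seed=None):
    """Percentage of scans at which the behavior was occurring.

    Fixed schedule: one scan at the start of every interval_s seconds.
    Random schedule: the same number of scans at uniformly random instants.
    """
    n_scans = int(session_s // interval_s)
    if n_scans == 0:
        raise ValueError("session too short for even one scan")
    if random_schedule:
        rng = random.Random(seed)
        times = sorted(rng.uniform(0, session_s) for _ in range(n_scans))
    else:
        times = [k * interval_s for k in range(n_scans)]
    hits = sum(occurring(episodes, t) for t in times)
    return 100 * hits / n_scans


stream = [(0, 150)]  # behavior in progress for the first half of a 5-minute session
print(scan_sample(stream, 300, 10))  # 50.0
print(scan_sample(stream, 300, 10, random_schedule=True, seed=3))
```

Here the fixed-schedule estimate equals the true percentage of time (50%); with real behavior streams the estimate is only approximate and improves as scans become more frequent.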
The final procedure, interval recording, is also referred to as time sampling,
one-zero recording, and the Hansen system. It is at the same time one of the
most popular recording methods (M. B. Kelly, 1977) and one of the most
troublesome (e.g., Altman, 1974; Kraemer, 1979). With this technique, an
observation session is divided into brief observe-record intervals, and each
interval is scored if the target behavior occurs either throughout the interval
or, more commonly, during any part of the interval (Powell, Martindale,
& Kulp, 1975). The observation and recording intervals can be signaled
efficiently and unobtrusively by means of an earpiece speaker used in
conjunction with a portable cassette audio recorder. The observers listen to an
audiotape on which the number of each observation and recording interval is
recorded, separated by the actual length of these intervals. If data sheets
are numbered in the same way, observers are substantially less likely to get
lost than when other common signaling devices are used.
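The two scoring rules just described can be compared against a behavior stream whose true duration is known. In the hypothetical Python sketch below (function names and data invented for illustration), each interval is scored 1 or 0 under either the whole-interval or the partial-interval rule. In this constructed example, the partial-interval rule overstates the percentage of time the behavior occurred and the whole-interval rule understates it.

```python
def interval_scores(episodes, session_s, interval_s, rule="partial"):
    """Score each observe-record interval as 1 (behavior scored) or 0.

    rule="partial": scored if the behavior occurs during any part of the interval.
    rule="whole":   scored only if the behavior occurs throughout the interval.
    Episodes are (onset, offset) pairs in seconds and are assumed not to overlap.
    """
    if rule not in ("partial", "whole"):
        raise ValueError("rule must be 'partial' or 'whole'")
    scores = []
    for k in range(int(session_s // interval_s)):
        start, end = k * interval_s, (k + 1) * interval_s
        # Seconds of this interval during which the behavior was in progress.
        overlap = sum(max(0, min(off, end) - max(on, start)) for on, off in episodes)
        hit = overlap > 0 if rule == "partial" else overlap >= interval_s
        scores.append(int(hit))
    return scores


def percent_scored(scores):
    """Percentage of intervals scored as containing the behavior."""
    return 100 * sum(scores) / len(scores)


stream = [(5, 8), (25, 28), (45, 48)]  # 9 s of behavior in a 60-s session (15%)
print(percent_scored(interval_scores(stream, 60, 10, "partial")))  # 50.0
print(percent_scored(interval_scores(stream, 60, 10, "whole")))    # 0.0
```

Interval counts can misstate frequency as well: a single episode lasting from 0 to 30 seconds would be scored in three partial intervals, as though three separate responses had occurred.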
While interval recording procedures have been recommended for their
ability to measure both response frequency and response duration, recent
research indicates that this method may provide seriously distorted estimates
of both of these response characteristics (see Hartmann & Wood, 1982). As a
Observer effects
Observer effects represent a conglomerate of systematic or directional
errors in behavior observations that may result from using human observers.
The most widely recognized and potentially hazardous of these effects include
reactivity, bias, drift, and cheating (e.g., Johnson & Bolstad, 1973; Kent &
Foster, 1977; Wildman & Erickson, 1977).
Reactivity refers to the fact that subjects may respond atypically as a result
of being aware that their behavior is being observed (Weick, 1968). The
factors that contribute to reactivity (e.g., Arrington, 1939; Kazdin, 1982a)
[Table: columns METHOD and ADVANTAGES AND DISADVANTAGES; table body not recoverable]
Note. Adapted from Gelfand, D. M. & Hartmann, D. P. (1984). Child behavior: Analysis and
therapy (2nd ed.). Elmsford, NY: Pergamon Press. Copyright 1984. Reproduced by permission.
has gradually evolved over time (Arrington, 1939, 1943) or when response
definitions or measurement procedures are informally altered to suit novel
changes in the topography of some target behavior (Doke, 1976). Drift can
also result from observer satiation or boredom (Weick, 1968). Observer drift
can cause inflated estimates of interobserver reliability when these estimates
are based on data obtained (1) during training sessions, (2) from overt
reliability assessment no matter when scheduled, or (3) from a long-standing,
familiar team of observers during the course of a lengthy investigation (see
Hartmann & Wood, 1982).
Drift can be limited or its effects reduced by providing continuing training
throughout a project, by training and recalibrating all observers at the same
time, and by inserting random and covert reliability probes throughout the
course of the investigation. Alternatively, investigators can take steps to
evaluate the presence of observer drift by having observers periodically rate
prescored videotapes (sometimes referred to as criterion videotapes), by
conducting reliability assessment across rotating members of observation
teams, and by using independent reliability assessors (see reviews by Cone &
Foster, 1982; Hartmann & Wood, 1982; Haynes, 1978).
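Checks against prescored (criterion) videotapes lend themselves to simple bookkeeping. The Python sketch below is a hypothetical illustration: it scores each periodic check as the percentage of intervals on which an observer's codes match the criterion codes, and flags a check when accuracy falls below a floor or drops too far below the observer's first check. The 80% floor, the 10-point drop, the function names, and the data are arbitrary placeholders, not standards from the text.

```python
def accuracy(observer_codes, criterion_codes):
    """Percentage of intervals on which the observer matches the prescored criterion."""
    if len(observer_codes) != len(criterion_codes) or not criterion_codes:
        raise ValueError("codes must be non-empty and of equal length")
    matches = sum(o == c for o, c in zip(observer_codes, criterion_codes))
    return 100 * matches / len(criterion_codes)


def drift_report(checks, criterion_codes, floor=80.0, max_drop=10.0):
    """checks: one observer's codings of the same criterion tape, in chronological order.

    Returns (accuracy, flagged) per check; floor and max_drop are hypothetical thresholds.
    """
    if not checks:
        return []
    accs = [accuracy(c, criterion_codes) for c in checks]
    first = accs[0]
    return [(a, a < floor or first - a > max_drop) for a in accs]


criterion = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
checks = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],  # first check: perfect match
    [0, 0, 1, 1, 0, 0, 1, 0, 1, 0],  # later check: one disagreement
    [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],  # latest check: three disagreements
]
for acc, flagged in drift_report(checks, criterion):
    print(f"{acc:.0f}%", "retrain" if flagged else "ok")
```

Comparing each observer against a fixed criterion, rather than against a teammate, is what allows drift shared by a whole team to show up.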
Observer cheating has been reported only rarely (e.g., Azrin, Holz, Ulrich,
& Goldiamond, 1961). More commonly, observers have been known to
calculate inflated reliability coefficients, though these calculation mistakes are
not necessarily the result of intentional fabrication (e.g., Rusch, Walker, &
Greenwood, 1975). Precautions against observer cheating include random,
unannounced reliability spot checks; collection of data forms immediately
after an observation session ends; restriction of data analysis and reliability
calculations to individuals who did not collect the data; provision of pens
rather than pencils to raters (obvious corrections might then be evaluated as
an indirect measure of cheating); and reminders to observers about the
canons of science and the dire consequences of cheating (Hartmann & Wood,
1982). See the section on staging reliability assessments (p. 124) for further
suggestions regarding limiting observer drift and observer cheating.
throughout this training phase, and all scoring decisions and clarifications
should be posted in an observer log or noted in the observation manual that
each observer carries.
Practice in the observation setting follows. Practice observations can serve
the dual purpose of desensitizing observers to fears about the setting (e.g.,
an inpatient psychiatric unit) and allowing subjects or clients to habituate to the
observation procedures. Training considerations outlined in the previous step
are also relevant here. Particular attention should be given to observer
motivation. Reid (1982) suggests that observer motivation and morale may be
strengthened by providing observers with (1) varied forms of scientific
stimulation such as directed readings on topics related to the project, and (2)
incentives for obtaining reliable and accurate data.
During the course of the investigation, periodic retraining and recalibration
sessions should be conducted with all observers: recalibration could include
spot tests on the observation manual, coding of prescored videotapes, and
covert reliability assessments. If data quality declines, extra retraining
sessions should be held. At the end of the investigation, observers should be
interviewed to ascertain any biases or other potential confounds that may
have influenced their observations. Observers should be informed about the
nature and results of the investigation and should receive acknowledgment in
technical reports or publications.
Reliability
Observational instruments require periodic assessments to ensure that they
promote correct decisions regarding treatment effectiveness. Such evaluations
are particularly critical for relatively untried observational instruments, for
those that attempt to obtain scores on multiple-response dimensions, and for
those that are applied in uncontrolled, naturalistic settings by nonprofessional
personnel. Traditionally, these evaluations have fallen under the domain of
one of the various theories of reliability (or more recently of generalizability)
and its associated methods (Cronbach et al., 1972; Nunnally, 1978).
Any reliability analysis requires a series of decisions. These decisions
involve selecting the dimensions of observation that require formal
assessment; deciding on the conditions under which reliability data will be
gathered; choosing a unit of analysis; selecting a summary reliability statistic;
interpreting the values of reliability statistics; modifying, if necessary, the
data collection plan; and reporting reliability information.
The first step in assessing data quality is to decide the dimensions (or facets)
of the data that are important to the research question. Potentially relevant
dimensions can include observers, coding categories, occasions, and settings
(e.g., Cone, 1977). With the exception of interobserver reliability,6 these
dimensions have not engaged the systematic attention of researchers using
SUMMARY TABLE
Raw agreement: $a + d = .85$
Occurrence agreement: $a/(a + b + c) = .80$
Nonoccurrence agreement: $d/(b + c + d) = .63$
Kappa: $\kappa = \dfrac{(a + d) - p_1 p_2 - q_1 q_2}{1 - p_1 p_2 - q_1 q_2} = .66$
ments (e.g., Cone & Foster, 1982; Hawkins & Dotson, 1975), whereas other
procedures provide formal correction for chance agreements. The most
popular of these corrected statistics is Cohen’s kappa (J. Cohen, 1960). Kappa
has been discussed and illustrated by Hartmann (1977) and Hollenbeck
(1978), and a useful technical bibliography on kappa appears in Hubert
(1977). Kappa may be used for summarizing observer agreement as well as
accuracy (Light, 1971), for determining consistency among many raters
(A. J. Conger, 1980), and for evaluating scaled (partial) consistency among
observers (J. Cohen, 1968).
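The agreement indices in the summary table above can be computed directly from the four cells of an observer-by-observer agreement table. The Python sketch below assumes the usual layout (a = both observers score an occurrence, b and c = only one observer does, d = both score a nonoccurrence), with p1 = a + b and p2 = a + c as the proportions of intervals in which each observer scored an occurrence and q = 1 - p. The cell counts are hypothetical, chosen so that the results approximate the summary values; they are not the book's data.

```python
def agreement_indices(a, b, c, d):
    """Agreement statistics for two observers scoring the same intervals.

    a: both scored occurrence              b: only observer 1 scored occurrence
    c: only observer 2 scored occurrence   d: both scored nonoccurrence
    Cells may be counts or proportions; they are converted to proportions.
    """
    total = a + b + c + d
    a, b, c, d = (x / total for x in (a, b, c, d))
    raw = a + d
    p1, p2 = a + b, a + c          # each observer's proportion of occurrence scores
    q1, q2 = 1 - p1, 1 - p2
    chance = p1 * p2 + q1 * q2     # agreement expected from the observers' base rates alone
    return {
        "raw": raw,
        "occurrence": a / (a + b + c) if a + b + c else float("nan"),
        "nonoccurrence": d / (b + c + d) if b + c + d else float("nan"),
        "chance": chance,
        "kappa": (raw - chance) / (1 - chance) if chance < 1 else float("nan"),
    }


# Hypothetical table for 200 intervals.
for name, value in agreement_indices(a=120, b=15, c=15, d=50).items():
    print(f"{name:>13}: {value:.3f}")
```

For these invented counts, raw agreement is .85 but about .56 of that would be expected from the observers' base rates alone, which is why kappa (about .66) is noticeably lower.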
Table 4-5 includes quantitative data from a subject (scores from six sessions
for two observers) and analyses of these data. The percentage agreement for
these data, sometimes called marginal agreement (Frick & Semmel, 1978), is
the ratio of the smaller value (frequency or duration) to the larger value
obtained by two observers, multiplied by 100. This form of percentage
agreement also has been criticized for potentially inflating reliability estimates
(Hartmann, 1977). Berk (1979) advocated use of generalizability coefficients,
as these statistics provide more information and permit more options than do
either percentage agreement or simple correlation coefficients (also see
Hartmann, 1977; Mitchell, 1979; and Shrout & Fleiss, 1979). Despite these
advantages, some researchers argue that generalizability and related correlational
approaches should be avoided because their mathematical properties may
                  OBSERVERS
SESSION          1         2       PERCENTAGE AGREEMENT
   1            11         9              82%
   2             8         6              75%
   3             9         7              78%
   4            10         9              90%
   5            12        11              92%
   6             8         8             100%
ANALYSIS OF VARIANCE SUMMARY [values not recoverable]
GENERALIZABILITY OR INTRACLASS COEFFICIENTS (ICC) [values not recoverable]
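The percentage agreement column and two common intraclass coefficients can be computed from the session totals in Table 4-5. The Python sketch below uses the smaller-over-larger rule described in the text and, for the coefficients, two single-observer forms described by Shrout and Fleiss (1979): ICC(3,1), which reflects consistency, and ICC(2,1), which also penalizes systematic differences between observers. Which form the original table reports is not recoverable here, so the coefficients this sketch produces should not be read as the book's values; the function names are illustrative.

```python
# Session totals for Observers 1 and 2, as listed in Table 4-5.
SESSIONS = [(11, 9), (8, 6), (9, 7), (10, 9), (12, 11), (8, 8)]


def marginal_agreement(x, y):
    """Smaller total divided by larger total, times 100 (100 if both are zero)."""
    larger = max(x, y)
    return 100.0 if larger == 0 else 100.0 * min(x, y) / larger


def two_way_iccs(rows):
    """Single-observer ICCs from a sessions-by-observers two-way ANOVA (Shrout & Fleiss, 1979)."""
    n, k = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between sessions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    absolute = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    return {"ICC(3,1)": consistency, "ICC(2,1)": absolute}


print([round(marginal_agreement(x, y)) for x, y in SESSIONS])  # [82, 75, 78, 90, 92, 100]
print({name: round(v, 2) for name, v in two_way_iccs(SESSIONS).items()})
```

In this sketch the consistency coefficient (about .88) exceeds the absolute-agreement coefficient (about .68) because Observer 1 recorded more events than Observer 2 in five of the six sessions, a systematic difference that a consistency index ignores.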
example, sessions), and the experimental design. Thus, data quality must be
evaluated in the context of these factors (Hartmann & Gardner, 1979). If
consideration of these factors indicates that the data are of adequate quality,
further modification of the observational system is not required. However,
if one or more forms of reliability prove unacceptable, revision of the
research plan is in order.
If the quality of data is judged unsatisfactory, a number of options are
available to the investigator. For example, if consistency across observers is
inadequate, the investigator can train observers more extensively, improve
observation and recording conditions, clarify definitions, use more than one
observer to gather data and analyze the average of the observers’ scores, or
employ some combination of the options just described (Hartmann, 1982).
If observer performance is adequate but the target behavior varies
substantially across occasions, the researcher may modify the observational
setting by removing distracting stimuli or by adding a brief habituation
period to each observational session (e.g., Sidman, 1960), increase the length
of each observation period until a session duration is found that
provides consistent data, or increase the number of sessions and then average
scores over the number of sessions required to achieve stable performance.
The option that is selected will depend upon the purpose of the study and
on practical considerations, such as the investigator’s ability to identify and
control undesirable sources of variability and the feasibility of increasing the
number or length of observation sessions (Hartmann & Gardner, 1981).
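How much averaging across observers or sessions can be expected to help is often estimated with the Spearman-Brown formula, a standard psychometric result that this passage does not itself discuss. Assuming the observers (or sessions) being averaged are parallel measures, the reliability of an average of m of them is $r_m = m r / [1 + (m - 1) r]$, where r is the reliability of a single observer or session. The Python sketch below applies that formula; the function names and example values are hypothetical.

```python
def spearman_brown(r, m):
    """Predicted reliability of the average of m parallel observers or sessions."""
    if not 0 <= r <= 1 or m < 1:
        raise ValueError("need 0 <= r <= 1 and m >= 1")
    return m * r / (1 + (m - 1) * r)


def units_needed(r, target):
    """Smallest number of observers or sessions whose average reaches the target reliability."""
    if not (0 < r <= 1 and 0 < target < 1):
        raise ValueError("need 0 < r <= 1 and 0 < target < 1")
    m = 1
    while spearman_brown(r, m) < target:
        m += 1
    return m


print(round(spearman_brown(0.6, 2), 3), round(spearman_brown(0.6, 3), 3))  # 0.75 0.818
print(units_needed(0.6, 0.8))  # 3
```

The prediction holds only if the units are roughly interchangeable; averaging observers who differ systematically, as a generalizability analysis might reveal, can yield less improvement than the formula suggests.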
Recommendations for reporting reliability information have ranged from
the suggestion that investigators embellish their primary data displays with
disagreement ranges and chance agreement levels (Birkimer & Brown, 1979)
to advocacy of what appear to be cumbersome tests of statistical significance
(Yelton, Wildman, & Erickson, 1977). The recommendations that follow
were proposed by Hartmann and Wood (1982): (1) Reliability estimates
should be reported on interobserver accuracy, consistency, or both, as well as
on session reliability; (2) in the case of interobserver consistency or accuracy
assessed with agreement statistics, either a chance-corrected index or the
chance level of agreements for the index used should be reported; (3)
reliability should be reported for covert reliability assessments scheduled periodically
throughout the course of the study, for different subjects (if relevant), and
across experimental conditions; and (4) reliability should be reported for each
variable that is the focus of substantive analysis.
Validity
Validity, or the extent to which a score measures what it is intended to
measure, has not received much attention in observation research (e.g.,
Johnson & Bolstad, 1973; O’Leary, 1979). In fact, observations have been
Target behaviors may be identified for which direct observations are
impractical, impossible, or unethical (e.g., Cone & Foster, 1982). In such cases,
one or more alternative assessment techniques are required. These techniques
may include products of behavior, self-report measures, or physiological
procedures. Measurement of behavioral products, such as number of emptied
liquor containers, may be particularly useful when the target behavior is
relatively inaccessible to direct observation because of its infrequency,
subtlety, or private nature; when either the behavior or its observation causes
embarrassment to the client; or when observation by others would otherwise
disrupt or seriously distort the form, incidence, or duration of the response.
Self-report measures also may be useful in such circumstances, though they
are prey to a number of distorting influences. At other times, physiological
measures may be required, because either the response is ordinarily
inaccessible to unaided human observers or observers cannot provide measures of
sufficient precision. It is to these classes of measures that we briefly turn next.
Behavioral products
Many target behaviors have relatively enduring effects on the environment.
Measuring these behavioral effects or products allows the investigator to
make inferences about the target behaviors associated with the products. This
indirect approach to assessment has several advantages including
convenience, nonreactivity, and economy. Because the products remain accessible for
some length of time, they can be accurately and precisely measured at a time,
Self-report measures
In the tripartite classification of responses (motor, cognitive, and
physiological), self-report measures are associated with the assessment of the cognitive
domain (thoughts, beliefs, preferences, and other subjective dimensions)
because of the inaccessibility of this domain to more direct assessment
approaches. However, self-report techniques also can be used to measure
motor and physiological responses that potentially could be assessed
objectively (e.g., Barrios, Hartmann, & Shigetomi, 1981). The latter use of self-reports
is common when cost is a critical concern or when the client is not
part of an “observable social system” (Haynes, 1978).
Like other assessment devices, self-report measures can be used to generate
investigation. However, even if the inventory does not meet this criterion,
it may be suitable as an aid to selecting subjects, target behaviors, or
treatments (e.g., Hawkins, 1979).
2. Does the questionnaire provide the required degree of specific information
regarding the target behavior? Many traditional self-report techniques
were based on trait assumptions of temporal, situational, and behavioral
(item) homogeneity or consistency that have proven to be incorrect (e.g.,
Mischel, 1968). Although the increased response and situational specificity
of behavioral self-report measures improves their correspondence with
objective measures (e.g., Lick, Sushinsky, & Malow, 1977), the term
behavior in an instrument’s title does not guarantee the requisite degree of
specificity.
3. Is the inventory sensitive enough to detect changes in performance as a
result of treatment? Although most questionnaires evaluated for
sensitivity have passed this validity hurdle, not all have done so successfully (e.g.,
Wolfe & Fodor, 1977).
4. Does the questionnaire guard against the biases common to the self-report
genre? Self-report measures are susceptible to a variety of test-related and
subject-related distortions. As regards test-related biases, the wording of
items may be so ambiguous that idiosyncratic interpretations by
respondents are common (e.g., Cronbach, 1970). Furthermore, items may
request information that is beyond subjects’ discrimination, storage, or
recall capabilities, or they may be arranged so as to affect scores (response
bias). Scores may also be affected by clients’ attempts at impression
management. Clients may, for example, endorse socially valued responses
(social desirability), agree with strongly worded alternatives
(acquiescence), endorse responses that they expect to be positively regarded by the
investigator (demand effects), or engage in outright faking or lying. Biases
due to impression management are particularly troublesome in the
assessment of subjective experiences, as independent verification of the accuracy
of responding may be difficult or impossible. Unfortunately, few
questionnaires include scales designed to detect biased responding or guard against
its occurrence (Evans, 1983).
5. Finally, does the inventory meet expected reliability and validity
requirements and possess appropriate norms for the population of interest in the
present investigation? Self-report questionnaires may be adequate for one
group, but not for another, so an instrument’s technical information must
be examined with care.
Self-monitoring, the second popular type of self-report among behavioral
clinicians, is similar to direct observation, but with one major exception: The
client is the observer. Data from self-monitoring have been used for target
behavior and treatment selection, as well as for treatment evaluation. However,
in the latter case, objective assessments typically play a more important
role, except when the target is itself a subjective response.
Self-monitoring has proven particularly useful for assessing rare and sensitive
behaviors and responses that are accessible only to the client, such as pain
due to migraine headaches (Feuerstein & Adams, 1977) and obsessive ruminations
(Emmelkamp & Kwee, 1977). Other responses assessed via self-monitoring
include appetitive urges, hallucinations, hurt and depressed feelings,
sexual behaviors, and waking time (for insomniacs). An array of behaviors
more susceptible to direct observation also has been monitored by the client,
including weight gain or loss, caloric intake, nail biting, exercise, academic
behaviors, alcohol consumption, and whining. Haynes (1978), Haynes and
Wilson (1979), Nay (1979), and Nelson (1977) surveyed the target
behaviors and recording procedures used in self-monitoring.
Self-monitoring procedures share a number of method-related problems.
Foremost among these is reactivity (Haynes & Wilson, 1979; Nelson, 1977).
Reactivity effects vary as a function of the social desirability of the behavior
recorded, with the frequency of positively valued responses likely to increase
and negatively valued acts likely to decrease during the course of
self-monitoring. The obtrusiveness, the timing, and the frequency of
self-monitoring also may influence the level of subject reactivity. Indeed, because of
these reactive effects, self-monitoring has been included in a number of
treatment packages as an intervention technique (e.g., Nay, 1979).
A second, and perhaps more serious, problem is the variable accuracy of
self-monitoring (e.g., Haynes & Wilson, 1979; Nelson, 1977). Inaccurate
self-monitoring can be improved by many of the same stratagems used to improve
the accuracy of direct observation: arrange recording procedures that are
convenient, habitual, and generally nonaversive; provide prior training in
self-monitoring; and encourage and dispense contingencies for accuracy.
Self-monitoring accuracy also can be enhanced by means of various
social-influence procedures such as a public commitment to self-monitor (P. H.
Bornstein, Hamilton, Carmody, Rychtarik, & Veraldi, 1977). Although
accuracy can be increased through these manipulations, there are
numerous factors adversely affecting the validity of self-monitoring; hence
this approach should be used with caution when it is the only method
available for monitoring the progress or outcome of treatment (Haynes,
1978).
Psychophysiological measures
Psychophysiological measures involve the surface recording of physiological
events, most of which are controlled by the autonomic nervous system
(Haynes, 1978). The assessment of psychophysiological responses has become
increasingly important to behavioral clinicians as a result of the (perhaps
usually skin conductance or its reciprocal, skin resistance. EDRs have been
viewed as a measure of activation or autonomic arousal; thus, they often are
used to monitor changes in response to fear stimuli as a result of behavioral
interventions (e.g., Barlow, Leitenberg, Agras, & Wincze, 1969). However,
electrodermal responding also must be used cautiously as a measure of arousal,
as scores vary depending on the EDR response component
measured (conductance, fluctuations, latency, and wave form), the time
sampling parameters utilized, and the specific measurement site and procedures
used (e.g., Edelberg, 1972; Venables & Christie, 1973).
Sophisticated uses of physiological measures have been made primarily by
laboratory investigators rather than practicing clinicians, due to the expense
of the equipment, the inconvenience associated with its use, and the need for
extensive knowledge of physiology and electronics (Nietzel & Bernstein,
1981).⁹ Equipment for measuring psychophysiological responses includes (1)
a sensing device, such as electrodes or some form of transducer for detecting
relevant input; (2) a central processor that may include amplifiers for
strengthening the incoming signal and filters for removing “noise”; and (3) an
output for displaying the electronic signals, such as a pen tracing or a
digitized printout. Because malfunctioning of these components may result in
missing data (a particularly serious problem in individual-subject investigations),
special precautions should be followed in conducting physiological
assessments. For example, laboratory assistants should be thoroughly familiar
with the equipment, including its maintenance and calibration, and would
be well advised to practice with nonclinical subjects before actually monitoring
physiological responding during experimental interventions (Hersen &
Barlow, 1976).
In conducting any physiological measurement, investigators should be
aware of the range of variables that may invalidate their records (e.g., Haynes
& Wilson, 1979; Ray & Raczynski, 1981). Aspects of the physical environment,
including temperature, lighting, humidity, ambient noise, and unshielded
electrical sources, may affect the client’s or subject’s responding.
Control of these variables is necessary, and subjects should be habituated or
adapted to the laboratory setting before recording occurs. Similarly, recording
techniques, such as the preparation of the recording site, nature of the
conductive medium, and type, location, and attachment of electrodes or
transducers, also can affect the resulting physiological record. Investigators
should consult standard references in this area (e.g., Greenfield & Sternbach,
1972; Stern, Ray, & Davis, 1980; Venables & Martin, 1967) in order to avoid
problems due to unstandardized recording procedures. Procedural variables
also can interact with measurement procedures to determine the nature of
clients’ responses. Thus aspects of the procedure such as the presence and
characteristics of the examiner should be held constant throughout an
investigation.
Not surprisingly, the characteristics of the response assessed will determine
the nature of the resulting record. For example, some responses display
substantial habituation or adaptation effects; that is, the same stimulus
evokes lowered levels of responding following repeated stimulation, both
within and across sessions (cf. Barlow, Leitenberg, & Agras, 1969; Montague
& Coles, 1966). Responsivity to stimulation also will vary inversely with the
prestimulus level of that response. According to this “law of initial values,” a
change in heart rate from 120 to 125 is different from, and probably greater
than, a change from 70 to 75, because the same increase is harder to produce
from an already elevated starting level. Thus some form of data transformation may
be necessary to equate response changes at various ranges of the response
dimension (e.g., Ray & Raczynski, 1981). Individuals also may show response
specificity, or a particular pattern of responding across related stimuli (e.g.,
Lacey, 1959). Because individuals vary in the response system that is most
reactive, investigators should assess their clients’ reactivity before selecting a
measure that will be sensitive to the changes resulting from treatment. Some
physiological systems also may be responsive to circadian rhythms, and to
diurnal as well as longer cyclic effects (Haynes & Wilson, 1979); again,
familiarity with standard technique references is critical to the judicious
selection of measurement procedures.
NOTES
1. The by-products, or traces (e.g., Webb, Campbell, Schwartz, Sechrest, & Grove,
1981), of behaviors such as pounds gained and cigarettes smoked also are
considered grist for the assessment mill.
2. The inconsistency in target behavior selection is due in part to variations in
individual assessors’ notions of what is socially important (Baer et al., 1968), their
personal values regarding the relative desirability of alternative behaviors, their
conceptions of deviancy, and their familiarity with the immediate and long-term
consequences of various forms of problem behavior. The operation of these factors
can be seen in the recent controversies centering on modifying feminine sex-role
behaviors among boys and annoying, but only mildly disruptive, classroom
behaviors (e.g., Winett & Winkler, 1972; Winkler, 1977).
3. Not infrequently, additional behaviors will be monitored during one or more of the
aforementioned phases. For example, measurements may be regularly or
periodically obtained on the independent, or treatment, variable to ensure that it is
manipulated in the intended manner. L. Peterson, Homer, and Wonderlich (1982)
argued that the infrequent use of independent variable checks seriously threatens
the reliability and validity of applied behavior studies. Along with J. M. Johnston
and Pennypacker (1980), they suggested a variety of methods of assessing the
integrity of independent variable manipulations. Similar recommendations are
given in related treatment literatures (e.g., Hartmann, Roper, & Gelfand, 1977;
Paul & Lentz, 1977).
At other times the investigator may choose to measure environmental events
such as the opportunities to perform the target response (Hawkins, 1982). For
example, when the target is “instruction following,” assessing the client’s
performance may require measurement of the occurrence of each instruction or request.
7. Self-report measures have proliferated at such a rapid rate that at least one
well-known behavioral assessor suggested that journal editors limit these devices by not
considering for publication those studies employing new instruments that are not
demonstrably superior to existing ones (see comments by blue-ribbon panelists in
Hartmann, 1983).
5.1. INTRODUCTION
niques (cf. Ashem, 1963; Barlow, 1980; Barlow et al., 1983; Lazarus, 1963;
Ullmann & Krasner, 1965; Wolpe, 1958, 1976).
Although there can be no doubt that the case history method yields
interesting (albeit uncontrolled) data, that it is a rich source for clinical
speculation, and that ingenious technical developments derive from its
application, the multitude of uncontrolled factors present in each study does not
permit sound cause-and-effect conclusions. Even when the case study method
is applied at its best (e.g., Lazarus, 1973), the absence of experimental control
and the lack of precise measures for target behaviors under evaluation remain
limiting factors. Of course, proponents of the case study method (e.g.,
Lazarus & Davison, 1971) are well aware of its inherent limitations as an
evaluative tool, but they show how it can be used to advantage to generate
hypotheses that later may be subjected to more rigorous experimental
scrutiny. Among its advantages, the case study method can be used to (1)
foster clinical innovation, (2) cast doubt on theoretical assumptions, (3) permit
study of rare phenomena (e.g., Gilles de la Tourette’s syndrome), (4) develop
new technical skills, (5) buttress theoretical views, (6) refine
techniques, and (7) provide clinical data to be used as a departure point for
subsequent controlled investigations.
With respect to the last point, Lazarus and Davison (1971) referred to the
use of “objectified single case studies.” Included are the A-B-A experimental
designs that allow for an analysis of the controlling effects of variables, thus
permitting scientifically valid conclusions. However, in the more typical case
study approach, the therapist provides a subjective description of treatment
interventions and resulting behavioral changes. Most frequently, several
techniques are administered simultaneously, precluding an analysis of the
relative merits of each procedure. Moreover, evidence for improvement is
usually based on the therapist’s “global” clinical impressions. Not only is
there the strong possibility of bias in these evaluations, but controls for the
treatment’s placebo value are unavailable. Finally, the effects of time
(maturational factors) are confounded with application of the treatment(s), and
the specific contribution of each of these factors cannot be distinguished.
More recently, Kazdin (1981) has pointed out how “. . . the scientific yield
from case reports might be improved in clinical practice where methodological
alternatives are unavailable” (p. 183). In ascending order of rigor, three
types are described: (1) cases with preassessment and postassessment, (2)
cases with repeated assessment and marked changes, and (3) multiple cases
with continuous assessment and stability information (e.g., no change in a
patient’s condition over extended periods of time despite prior therapeutic
efforts). However, notwithstanding improvements inherent in the aforementioned
case approaches, threats to internal validity are still present to one
degree or another.
A very modest improvement over the uncontrolled case study method
elsewhere (Browning & Stover, 1971) has been labeled the “B Design.” In this
“design,” baseline measurement is omitted, but the investigator monitors one
of a number of target measures throughout the course of treatment. One
might also categorize this procedure as the simplest of the time series analyses
(see G. V. Glass, Willson, & Gottman, 1973). Although this strategy
obviously yields a more objective appraisal of the patient’s progress, the
confounds that typify the case study method apply equally here. In that sense the
B Design is essentially an uncontrolled case study with objective measures
taken repeatedly. This, of course, is the same as Kazdin’s (1981) description of
cases with repeated assessment and marked changes.
The weakness in this design is that the data in the experimental condition is
compared with a forecast from the prior baseline data. The accuracy of an
assessment of the role of the experimental procedure in producing the change
rests upon the accuracy of that forecast. A strong statement of causality
therefore requires that the forecast be supported. This support is accomplished by
elaborating the A-B design. (p. 5)
Day 13. Renewed improvement was then noted between Days 15-18, and
treatment was continued through Day 24. Thus the B phase was twice as long
as baseline, but it was extended for very obvious clinical considerations.
The 12-week follow-up period reveals a zero level of gagging, with the
exception of Week 9, when three gagging episodes were recorded. Follow-up
data were corroborated by the patient’s wife, thus precluding the possibility
that treatment affected only the patient’s verbal report rather than producing
a diminution of actual symptomatology.
Although treatment appeared to be the effective ingredient of change in
this study, particularly in light of the longevity of the patient’s disorder, it is
conceivable that some unidentified variable coincided with the application of
reinforcement procedures and actually accounted for observed changes.
However, the A-B design does not permit a definitive answer to this question.
It might also be noted that the specific use of this design (baseline, treatment,
and follow-up) could readily have been carried out in an outpatient facility
(clinic or private-practice setting) with a minimum of difficulty and with no
deleterious effects to the patient.
Lawson (1983) also used an A-B design with a single target behavior
(alcohol consumption) and obtained a follow-up assessment. His case
involved a divorced 35-year-old male with a history of problem drinking
beginning at age 16. He periodically would experience blackouts as a function
of his drinking. But despite the chronicity of his problem, with the exception
of a few AA meetings, the subject had not obtained any form of treatment
for his alcoholism. Baseline data (based on the subject’s self-report) indicated
that he consumed an average of 65 drinks per week (see Figure 5-2). This was
confirmed by his girlfriend.
Treatment (B phase) began in the third week, and, on the basis of the
behavioral analyses performed, three goals were identified: (1) to decrease
alcohol consumption, (2) to improve social relationships, and (3) to diminish
frequency of anxiety and depression episodes. Thus the comprehensive
therapy program involved goal setting with regard to number of drinks
consumed, rate-reduction strategies, stimulus-control strategies, development
of new social relationships and recreational activities, assertion training, and
self-management of depression.
Examination of data in Figure 5-2 indicates that there were substantial
improvements in rate of drinking during the course of therapy (to about 10
drinks per week) that appeared to be maintained at the 3-month follow-up
(also confirmed by the girlfriend). Indeed, an informal communication
received by the therapist 1½ years subsequent to treatment further confirmed
that the subject still was drinking in a socially acceptable manner.
Treatment did appear to be responsible for change in Lawson’s (1983)
alcoholic, particularly given the 19-year history of excessive drinking. This,
then, from a design standpoint, fits in nicely with Kazdin’s notion of repeated
FIGURE 5-2. Weekly self-monitored alcohol consumption during baseline, treatment, and at 3-month follow-up. (Figure 6-1, p. 165, from: Lawson, D. M. Alcoholism. In M. Hersen (Ed.). (1983). Outpatient behavior therapy: A clinical guide. New York: Grune & Stratton. Copyright 1983 by M. Hersen. Reproduced by permission.)
reptitiously on the average of one per hour between the hours of 8:00 A.M.
and 10:00 P.M. during non-work-related activities.
The results of this study appear in Figure 5-3. Inspection of these data
indicates that number of points earned in baseline increased slightly but then
stabilized. Baseline ratings of depression show stability, with evidence of
greater daytime activity. Beck scores ranged from 19 to 28. Institution of the token
economy on Day 5 resulted in a marked linear increase in points earned, a
substantial increase in day and evening behavioral ratings of depression, and
a linear decrease in self-reported Beck Inventory scores.
Thus it appears that the token economy effected improvement in this patient’s
depression as based on both objective and subjective indexes. However, as
was previously pointed out, this design does not permit a direct analysis of
the controlling effects of the therapeutic variable introduced (token
economy), as does our example of an A-B-A design seen in Figure 5-7
(Hersen, Eisler, Alford, & Agras, 1973). Nonetheless, the use of an A-B
design in this case proved to be useful for two reasons. First, from a clinical
standpoint, it was possible to obtain some objective estimate of the
treatment’s success during the patient’s abbreviated hospital stay. Second, the
results of this study prompted further investigation of the effects of token
economic procedures in three additional reactively depressed subjects
(Hersen, Eisler, Alford, & Agras, 1973). In that investigation more sophisticated
experimental strategies confirmed the controlling effects of the token economy in
neurotic depression.
portraying commendatory scenes and the right half refusal scenes. In general,
improvements during training suggest that the treatment was effective for
both categories (commendatory and refusal) and that there was transfer of
gains from trained to generalization scenes. Moreover, gains appeared to
remain in follow-up, with the exception of smiles (commendatory). However,
a closer examination does reveal a number of problems with these data. First,
for the commendatory scenes there are only one- or two-point baselines.
Therefore, baseline trends could not be fully established. Also,
for two of the behaviors (smiles, appropriate verbal content), improvements
in training appear to be merely the continuation of baseline trends. Second,
the same seemed to be the case with regard to refusal scenes for the following
components: eye contact, extraneous movements, appropriate verbal
content, and overall social skill. Thus, although the subject was obviously
clinically improved, these data do not clearly reflect experimental
confirmation of such improvement, given the limited confidence one can ever have
with the A-B strategy.
FIGURE 5-5. Mean penile circumference change to audiotapes and slides during baseline, covert sensitization, and follow-up. (Figure 1, p. 83, from: Harbert, T. L., Barlow, D. H., Hersen, M., & Austin, J. B. (1974). Measurement and modification of incestuous behavior: A case study. Psychological Reports, 34, 79-86. Copyright 1974 by Psychological Reports. Reproduced by permission.)
FIGURE 5-6. Card sort scores on probe days during baseline, covert sensitization, and follow-up. (Figure 2, p. 84, from: Harbert, T. L., Barlow, D. H., Hersen, M., & Austin, J. B. (1974). Measurement and modification of incestuous behavior: A case study. Psychological Reports, 34, 79-86. Copyright 1974 by Psychological Reports. Reproduced by permission.)
However, despite this limitation, the A-B-A design is a useful research tool
when time factors (e.g., premature discharge of a patient) or clinical aspects
of a case (e.g., necessity of changing the level of medication in addition to
reintroducing a treatment variable after the second A phase) interfere with
the correct application of the more comprehensive A-B-A-B strategy.
A second problem with the A-B-A strategy concerns the issues of
multiple-treatment interference, particularly sequential confounding (Bandura, 1969;
Cook & Campbell, 1979). The problem of sequential confounding in an A-B-A
design and its variants also somewhat limits generalization to the clinic. As
Bandura (1969) and Kazdin (1973b) have noted, the effectiveness of a
therapeutic variable in the final phase of an A-B-A design can only be interpreted
in the context of the previous phases. Change occurring in this last phase may
not be comparable to changes that would have occurred if the treatment had
been introduced initially. For instance, in an A-B-BC-B design, where A is
baseline and B and C are two therapeutic variables, the effects of the BC
phase may be more or less powerful than if they had been introduced initially.
This point has been demonstrated in studies by O’Leary and his associates
(O’Leary & Becker, 1967; O’Leary, Becker, Evans, & Saudargas, 1969), who
noted that the simultaneous introduction of two variables produced greater
change than the sequential introduction of the same two variables.
FIGURE 5-7. Number of points earned and mean behavioral ratings for Subject 1. (Figure 1, p. 394, from: Hersen, M., Eisler, R. M., Alford, G. S., & Agras, W. S. (1973). Effects of token economy on neurotic depression: An experimental analysis. Behavior Therapy, 4, 392-397. Copyright 1973 by Association for the Advancement of Behavior Therapy. Reproduced by permission.)
However, as the slope of the curve was not steep, and in light of the
primary focus on behavioral ratings (depression), we proceeded with our
change in conditions on Day 5. Had there been unlimited time, baseline
conditions would have been maintained until number of points earned daily
stabilized to a greater extent.
We might note parenthetically at this point that all of the ideal conditions
(procedural rules) outlined in our discussion in chapter 3 are rarely
approximated when conducting single-case experimental research. Our experience
shows that procedural variations from the ideal are required, as data simply
do not conform to theoretical expectation. Moreover, experimental finesse is
sometimes sacrificed to time and clinical considerations.
Continued examination of Figure 5-7 indicates that instigation of token
economic procedures on Day 5 resulted in a marked linear increase in both
points earned and behavioral ratings. The abrupt change in slope of the
curves, particularly in points earned, strongly suggests the influence of the
token economy variable, despite the slightly upward trend initially seen in
baseline. Removal of token economy on Day 9 led to an initially large drop in
FIGURE 5-8. Percentage of attending behavior in successive time samples during the individual conditioning program. (Figure 2, p. 247, from: Walker, H. M., & Buckley, N. K. (1968). The use of positive reinforcement in conditioning attending behavior. Journal of Applied Behavior Analysis, 1, 245-250. Copyright 1968 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
of the curve in extinction (A) and the relatively equal lengths of the B and A
phases further dispel doubts that the reader might have as to the confound of
time.
Second, with respect to the decreasing-increasing baseline obtained in the
first A phase, although it might be preferable to extend measurement until
full stability is achieved (see section 3.3, chapter 3), the range of variability is
very constricted here, thus limiting the importance of the trends.
FIGURE 5-10. Experiment 1: Frequency of confederate initiations of play organizers, shares, and assists and subject’s positive responses to these approach behaviors. (Figure 1, p. 335, from: Hendrickson, J. M., Strain, P. S., Tremblay, A., & Shores, R. E. (1982). Interactions of behaviorally handicapped children: Functional effects of peer social initiations. Behavior Modification, 6, 323-353. Copyright 1982 by Sage Publications. Reproduced by permission.)
derate neither initiated any of the three targeted behaviors nor responded to
any initiations of the three withdrawn children. However, during the first
intervention phase, when the confederate was prompted, instructed, and
reinforced for playing, there was a marked increase in the three categories of
behavior. This was noted both in terms of initiations and responses. When
intervention was removed in the second baseline, frequency of such initiating
and responding returned to the original baseline level. Finally, in the second
intervention phase, high levels of initiating and responding were easily
reinstated. Throughout this study, mean interobserver agreement for the targeted
behaviors was 89% for all subjects.
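The study reports agreement as a single percentage without stating the formula. One common computation, assumed here, is interval-by-interval agreement: the number of intervals on which two observers' records match, divided by the total number of intervals, times 100 (equivalently, agreements / (agreements + disagreements) × 100). A mean across sessions is then the average of the per-session values. The Python sketch below uses hypothetical occurrence (1) and nonoccurrence (0) records, and the function names are illustrative only.

```python
# Minimal sketch (assumption): interval-by-interval interobserver agreement.
from statistics import mean


def percent_agreement(observer_a, observer_b):
    """Interval-by-interval agreement between two observers, in percent.

    Each argument is a sequence of interval scores (e.g., 1 = behavior
    occurred, 0 = did not occur) recorded for the same session.
    """
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("records must be non-empty and cover the same intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    # Every interval is either an agreement or a disagreement,
    # so the denominator is simply the number of intervals.
    return 100.0 * agreements / len(observer_a)


def mean_agreement(sessions):
    """Average per-session agreement over (observer_a, observer_b) pairs."""
    return mean(percent_agreement(a, b) for a, b in sessions)
```

Interval-by-interval agreement can be inflated when the behavior is very rare or very frequent, because observers then agree on many empty (or full) intervals by default; occurrence-only agreement is often reported alongside it for that reason.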
With respect to design considerations, we have here a very clear
demonstration of the efficacy of the intervention on two occasions. As was the case in
our prior example (R. V. Hall et al., 1971), baselines (especially the second)
were shorter than treatment phases. However, in light of the zero level of
One initial advantage is that such assessment would permit the possibility of
determining response generalization. If certain response frequencies are in-
This study . . . points out the desirability of measuring several child behaviors,
although a modification procedure might focus on only one. In this way the
preschool teacher can assess the efficacy of her program based upon changes in
other behaviors as well as the behavior of immediate concern. (p. 77)
Thus changes from one phase to the next are accomplished with the
experimenter’s full knowledge of prior results. Moreover, specific techniques are
then applied with the expectation that they will be efficacious. Although these
factors are of benefit to the experimental clinician, they present certain
difficulties from a purely experimental standpoint. Indeed, critics of the
single-case approach have concerned themselves with the possibilities of bias
in evaluation and in actual application and withdrawal of specified
techniques. One method of preventing such “bias” is to determine lengths of
baseline and experimental phases on an a priori basis, while keeping the
experimenter uninformed as to trends in the data during their collection. A
problem with this approach, however, is that decisions regarding choice of
baselines and those concerned with appropriate timing of institution and
removal of therapeutic variables are left to chance.
The above-discussed strategy was carried out in an A-B-A-B design in
which target measures were rated from videotape recordings for all phases on
a postexperimental basis. Hersen, Miller, and Eisler (1973) examined the
effects of varying conversational topics (nonalcohol and alcohol-related) on
duration of looking and duration of speech in four chronic alcoholics and
their wives in ad libitum interactions videotaped in a television studio.
Following 3 minutes of “warm-up” interaction, each couple was instructed to
converse for 6 minutes (A phase) about any subject unrelated to the
husband’s drinking problem. Instructions were repeated at 2-minute intervals
over a two-way intercom from an adjoining room to ensure maintenance of
the topic of conversation. In the next 6 minutes (B phase) the couple was
instructed to converse only about the husband’s drinking problem
(instructions were repeated at 2-minute intervals). The last 12 minutes of interaction
consisted of identical replications of the A and B phases.
Mean data for the four couples are presented in Figure 5-13. Speech
duration data show no trends across experimental phases for either husbands
or wives. Similarly, duration of looking for husbands across phases does not
vary greatly. However, duration of looking for wives was significantly greater
during alcohol- than nonalcohol-related segments of interaction. In the first
nonalcohol phase, looking duration ranged from 26 to 43 seconds, with an
upward trend in evidence. In the first alcohol phase (B), duration of looking
ranged from 57 to 70 seconds, with a continuation of the upward linear trend.
Reintroduction of the nonalcohol phase (A) resulted in a decrease of looking
(38 to 45 seconds). In the final alcohol segment (B), looking once again
increased, ranging from 62 to 70 seconds.
An analysis of these data does not allow for conclusions with respect to the
initial A and B phases inasmuch as the upward trend in A continued into B.
However, the decreasing trend in the second A phase succeeded by the
increasing trend in the second B phase suggests that topic of conversation had
a controlling influence on the wives’ rates of looking. We might note here that
FIGURE 5-14. Total number of hours of on-ward performance by a group of 44 patients, Exp. III. (Figure 4, p. 373, redrawn from: Ayllon, T., & Azrin, N. H. (1965). The measurement and reinforcement of behavior of psychotics. Journal of the Experimental Analysis of Behavior, 8, 357-383. Copyright 1965 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
[Figure panels: x-axis TIME (3-MINUTE BLOCKS)]
The A-B-C-B design, a variant of the A-B-A-B design, has been used to
evaluate the effects of reinforcement procedures. Whereas in the A-B-A-B
strategy, baseline and treatment (e.g., contingent reinforcement) are alternated
in sequence, in the A-B-C-B strategy only the first two phases of
experimentation consist of baseline and contingent reinforcement. In the
third phase (C), instead of returning to baseline observation, reinforcement is
administered in proportions equal to the preceding B phase but on a totally
noncontingent basis. This phase controls for the added attention (“attention-
placebo”) that a subject receives for being in a treatment condition and is
analogous to the A1 phase (placebo) used in drug evaluations (see chapter 6).
In the final phase, contingent reinforcement procedures are reinstated. Thus
the last three phases of study are identical to those used by Ayllon and Azrin
(1965) in the example described in section 5.5 (however, there the study is
labeled B-A-B).
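The C phase delivers the same amount of reinforcement as the preceding B phase, but independently of behavior. One way to build such a yoked schedule is sketched below under our own assumptions: reinforcer counts are carried over session by session from B, and delivery times are drawn uniformly at random within a session of hypothetical length. The function name and the example counts are illustrative, not taken from any study.

```python
import random

def noncontingent_schedule(b_phase_counts, session_minutes, seed=None):
    """Build a noncontingent (C-phase) schedule that matches the number of
    reinforcers earned in each B-phase session, but assigns delivery times
    at random so that they do not depend on the subject's behavior."""
    rng = random.Random(seed)
    schedule = []
    for count in b_phase_counts:
        # Delivery times in minutes from session start, sorted for staff use.
        times = sorted(rng.uniform(0, session_minutes) for _ in range(count))
        schedule.append(times)
    return schedule

# Hypothetical reinforcer counts from five B-phase sessions.
plan = noncontingent_schedule([4, 6, 5, 7, 6], session_minutes=30, seed=1)
for session, times in enumerate(plan, start=1):
    print(session, [round(t, 1) for t in times])
```

Matching counts session by session is only one reading of "in proportions equal to the preceding B phase". The essential property is that delivery times are generated without reference to what the subject does.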
In the A-B-C-B design the A and C phases are not comparable, inasmuch
as experimental procedures differ. Therefore, the main experimental analysis
is derived from the B-C-B portion of study. However, baseline observations
are of some value, as the effects of B over A are suggested (here we have the
limitations of the A-B analysis). We will illustrate the use of the A-B-C-B
design with one example concerned with the control of drinking in a chronic
alcoholic.
FIGURE 5-16. Biweekly blood-alcohol concentrations for each phase. (Figure 1, p. 262, from: Miller, P. M., Hersen, M., Eisler, R. M., & Watts, J. G. (1974). Contingent reinforcement of lowered blood/alcohol levels in an outpatient chronic alcoholic. Behaviour Research and Therapy, 12, 261-263. Copyright 1974 by Pergamon. Reproduced by permission.)
FIGURE 5-17. Percentage of trainees engaged during the activity hour for 19 days and follow-up days. (Figure 1, p. 236, from: Porterfield, J., Blunden, R., & Blewitt, E. (1980). Improving environments for profoundly handicapped adults: Using prompts and social attention to maintain high group engagement. Behavior Modification, 4, 225-241. Copyright 1980 by Sage Publications. Reproduced by permission.)
Basic A-B-A Withdrawal Designs 173
A-B-A-C-A-C'-A design
Wincze et al. (1972) conducted a series of 10 experimental single-case
designs in which the effects of feedback and token reinforcement were
examined on the verbal behavior of delusional psychiatric patients. In one of
these studies an A-B-A-C-A-C'-A design was used, with B and C representing
feedback and token reinforcement phases, respectively. During all phases of
study, a delusional patient was questioned daily (15 questions selected randomly
from a pool of 105) by his therapist to elicit delusional material. Percentage
of responses containing delusional verbalizations was recorded. In
addition, percentage of delusional talk on the ward (token economy unit) was
monitored by nursing staff on a randomly distributed basis 20 times per day.
During baseline (A), the patient received “free” tokens as no contingencies
were placed with respect to delusional verbalizations. During feedback (B),
the patient continued to receive tokens noncontingently, but corrective statements
in response to delusional verbalizations were offered by the therapist in
individual sessions. The third phase (A) consisted of a return to baseline
procedures. In Phase 4 (C) a stringent token economy system embracing all
aspects of the patient’s ward life was instituted. Tokens could be earned by the
patient for “talking correctly” (nondelusionally) both in individual sessions
and on the ward. Tokens were exchangeable for meals, luxuries, and privileges.
Phase 5 (A) once again involved a return to baseline. In the sixth phase
(C') token bonuses were awarded on a predetermined percentage basis for
talking correctly (e.g., speaking delusionally less than 10% of the time during
designated periods). This condition was incorporated to counteract the tendency
of the patient to earn tokens merely for increasing frequency of
nondelusional talk while still maintaining a high frequency of delusional
verbalizations. In the last phase of experimentation (A), baseline conditions
were reinstated for the fourth time.
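Two quantities drive this study: the percentage of the 15 daily questions answered delusionally and, in phase C', whether delusional talk during a designated period stayed under the 10% criterion. A minimal sketch follows, assuming each response or ward observation has already been coded as delusional (True) or not (False). The function names are ours, not the authors'.

```python
def percent_delusional(coded_responses):
    """Percentage of coded responses that contained delusional talk."""
    if not coded_responses:
        raise ValueError("no responses coded")
    return 100.0 * sum(coded_responses) / len(coded_responses)

def earns_bonus(period_observations, criterion_percent=10.0):
    """C'-phase rule as described in the text: a token bonus is earned only
    when delusional talk falls below the criterion percentage for the period."""
    return percent_delusional(period_observations) < criterion_percent

# Hypothetical session: 3 of 15 answers coded as delusional.
session = [True] * 3 + [False] * 12
print(percent_delusional(session))   # 20.0
print(earns_bonus(session))          # False
```

Because the bonus depends on a proportion rather than on a count of correct talk, the patient cannot earn it simply by talking more while delusional talk remains frequent, which is the problem the C' phase was introduced to counteract.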
Results of this experimental analysis for one subject appear in Figure 6-2.
Percentage of delusional talk in individual sessions and on the ward did not
differ substantially during the first three phases, thus suggesting the ineffectiveness
of the feedback variable. Institution of token economy in Phase 4,
FIGURE 6-2. Percentage of delusional talk of Subject 4 during therapist sessions and on ward for each experimental day. (Figure 4, p. 256, from: Wincze, J. P., Leitenberg, H., & Agras, W. S. [1972]. The effects of token reinforcement and feedback on the delusional verbal behavior of chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262. Copyright 1972 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
Our example from the third category of extensions of the A-B-A design is
drawn from the child classroom literature. Hopkins et al. (1971) systematically
assessed the effects of access to a playroom on the rate and quality of
writing in rural elementary schoolchildren. The target measures were
especially relevant because these children came from homes where learning
was not a high priority (parents were migrant or seasonal farm workers).
Throughout all phases of study, first- and second-grade students were given
daily standard written assignments during class periods (class periods were 50
minutes long during the first four phases).
In baseline (A), after each child had completed the assignment, handed it
to the teacher, and waited for it to be scored, he or she was expected to return
to his or her seat and remain there quietly until all others in class had turned
in their papers. In the next phase (B) each child was permitted access to an
adjoining playroom, containing attractive toys, after his or her paper was
scored. The child was allowed to remain there until the 50-minute period was
terminated, unless he or she became too noisy; then he or she was required to
return to his or her seat. The next two phases (A and B) were identical to the
first two. In the last three phases each child was permitted access to the
playroom after his or her paper had been scored, but the length of class
periods was gradually decreased (45, 40, 35 minutes). A procedural exception
to the aforementioned was made in the last phase on Days 47-54 inasmuch as
the teacher noted that a concomitant of increased speed was decreased quality
(number of errors) in writing. Therefore, during the last 8 days a quality
criterion was imposed before the child gained access to the playroom. In some
cases the child was required to recopy a portion of writing.
Data for first-grade children are plotted in Figure 6-3. Examination of the
bottom half of the figure shows that access to the playroom (50-minute
period) increased the rate of letter writing over baseline levels. This was
confirmed on two occasions in the A-B-A-B portion of study. When total time
of classroom periods systematically decreased, a corresponding increase in
rate of writing resulted. However, data for the last three phases are correlative,
as an experimental analysis was not performed. For example, a sequential
comparison of 50-, 45-, and 50-minute periods was not made. Therefore,
the controlling effects of time differences were not fully documented.
Examination of the top part of the graph shows considerable fluctuation
with respect to mean number of errors per letter. However, this did not appear
to represent a systematic increase when class periods were shortened. To the
contrary, there was a general decrease in error rate from the first to the last
phase of study. Nonetheless, the effects of practice cannot be discounted
when total length of the investigation is considered.
FIGURE 6-3. The mean number of letters printed per minute by first-grade children are shown on the lower coordinates, and the mean proportion of letters scored as errors are on the upper coordinates. Each data point represents the mean averaged over all children for that day. The horizontal dashed lines are the means of the daily means averaged over all days within the experimental conditions noted by the legends at the top of the figure. (Figure 1, p. 81, from: Hopkins, B. L., Schutte, R. C., & Garton, K. L. [1971]. The effects of access to a playroom on the rate and quality of printing and writing of first- and second-grade students. Journal of Applied Behavior Analysis, 4, 77-87. Copyright 1971 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
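The summary statistic in the Figure 6-3 caption is a two-stage average. First, rates are averaged over children within each day, and then those daily means are averaged within a condition. The small sketch below uses hypothetical rates to show how this differs from pooling all child-days when attendance varies from day to day.

```python
from statistics import mean

def mean_of_daily_means(days):
    """days: list of per-day lists of letters-per-minute, one value per child.
    Each day contributes equally, regardless of how many children attended."""
    return mean(mean(day) for day in days)

def pooled_mean(days):
    """Alternative for comparison: every child-day contributes equally."""
    return mean(rate for day in days for rate in day)

# Hypothetical condition with three days and unequal attendance.
condition = [[2.0, 4.0], [3.0, 3.0, 3.0, 3.0], [6.0]]
print(mean_of_daily_means(condition))  # 4.0
print(pooled_mean(condition))          # about 3.43
```

The day with a single child weighs as much as the day with four children in the two-stage mean, which is appropriate when, as here, each data point is meant to represent a day.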
was timed. In the second phase (B) a DRO procedure was implemented. This
consisted of giving Bob small portions of cookies or bits of peanut butter
contingent on no rumination. In the B phase reinforcement was provided if
no rumination occurred for 15 seconds or more (IRT> 15"). In the next phase
There is no doubt that this approach can be of value in the study of both the
major forms of psychopathology and those of more exotic origin (Hersen &
Breuning, in press). The single-case experimental strategy is especially well
suited to the latter, as control group analysis in the rarer disorders is obviously
not feasible.
Specific issues
It should be pointed out that all procedural issues discussed in chapter 3
pertain equally to drug evaluation. In addition, there are a number of
considerations specific to this area of research: (1) nomenclature, (2) carryover
effects, and (3) single- and double-blind assessments.
With respect to nomenclature, A is designated as the baseline phase, A1 as
the placebo phase, B as the phase evaluating the first active drug, and C as the
phase evaluating the second active drug. The A1 phase is an intermediary
phase between A (baseline) and B (active drug condition) in this schema. This
phase controls for the subject’s expectancy of improvement associated with
mere ingestion of the drug rather than for the drug's pharmacological
effects.
Some of the above-mentioned considerations have already been examined
in section 3.4 of chapter 3 in relation to changing one variable at a time across
experimental phases. With regard to this one-variable rule, it becomes apparent,
then, that A-B, A-B-A, B-A-B, and A-B-A-B designs in drug research
(Table column headings: No., Design, Type, Blind possible)
example, K. V. Davis et al. (1969) used the following sequence of drug and
no-drug conditions in studying rate of stereotypic and nonstereotypic behav
ior in severe retardates: (1) methylphenidate, (2) thioridazine, (3) placebo,
and (4) no drug. Despite the fact that thioridazine significantly (at the
statistical level) decreased the rate of stereotypic responses, failure to reintroduce
the drug in a final phase weakens the conclusions to some extent from an
experimental analysis standpoint.
A careful survey of the experimental analysis of behavior literature reveals
relatively little discussion with regard to procedural and design issues in the
assessment of drugs. Therefore, in light of the unique problems faced by the
drug researcher and in consideration of the relative newness of this area, we
will outline the basic quasi-experimental and experimental analysis design
strategies for evaluating singular application of drugs. Specific advantages
and disadvantages of each design option will be considered. Where possible,
we will illustrate with actual examples selected from the research literature.
However, most of these strategies have not yet been implemented.
A number of possible single-case strategies suitable for drug evaluation are
presented in Table 6-1. The first three strategies fall into the A-B category and
are really quasi-experimental designs, in that the controlling effects of the
treatment variable (placebo or active drug) cannot be determined. Indeed, it
was noted in section 5.2 of chapter 5 that changes observed in B might
possibly result from the action of a correlated but uncontrolled variable (e.g.,
time, maturational changes, expectancy of improvement). These quasi-experimental
designs can best be applied in settings (e.g., consulting room
practice) where limited time and facilities preclude more formal experimentation.
In the first design the effects of placebo over baseline conditions are
suggested; in the second the effects of active drug over baseline conditions are
suggested; in the third the effects of an active drug over placebo are suggested.
Examination of Strategies 4-6 indicates that they are basically A-B-A
designs in which the controlling effects of the treatment variable can be
ascertained. In Design 4 the controlling effects of a placebo manipulation
over no treatment can be assessed experimentally. This design has great
potential in the study of disorders such as conversion reactions and histrionic
personalities, where attentional factors are presumed to play a major role.
Also, the use of this type of design in evaluating the therapeutic contribution
of placebos in a variety of psychosomatic disorders could be of considerable
importance to clinicians. In Design 5, the controlling effects of an active drug
are determined over baseline conditions. However, as previously noted, two
variables (the expectancy associated with ingestion and the drug's pharmacological
action) are being manipulated at one time across phases. Design 6
corrects for this deficiency, as the active drug condition (B) is preceded and
followed by placebo (A1) conditions. In this design the one-variable rule
across phases is carefully observed.
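The one-variable rule can be checked by writing each phase as the set of components in effect. The sketch below rests on our own reading of the six strategies described above: A is baseline, A1 is placebo (ingestion without active drug), and B is ingestion plus active drug. The phase sequences are inferred from the prose rather than copied from Table 6-1, and the component labels are hypothetical.

```python
# Components in effect during each phase (our labels, not the book's).
PHASE_COMPONENTS = {
    "A": frozenset(),                              # baseline: nothing ingested
    "A1": frozenset({"ingestion"}),                # placebo
    "B": frozenset({"ingestion", "active drug"}),  # active drug
}

# Strategies 1-6 as we read them from the text.
STRATEGIES = {
    1: ["A", "A1"], 2: ["A", "B"], 3: ["A1", "B"],
    4: ["A", "A1", "A"], 5: ["A", "B", "A"], 6: ["A1", "B", "A1"],
}

def changes_per_transition(sequence):
    """Number of components added or removed at each phase change."""
    return [len(PHASE_COMPONENTS[a] ^ PHASE_COMPONENTS[b])
            for a, b in zip(sequence, sequence[1:])]

def follows_one_variable_rule(sequence):
    """True when every phase change alters exactly one component."""
    return all(n == 1 for n in changes_per_transition(sequence))

for number, sequence in STRATEGIES.items():
    kind = "withdrawal" if len(sequence) > 2 else "A-B only"
    print(number, "-".join(sequence), changes_per_transition(sequence), kind)
```

Strategy 5 reports two changes at each transition, matching the deficiency noted for Design 5, while Strategy 6 changes only the active drug.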
Extensions of the A-B-A Design 187
FIGURE 6-6. Behavior of an adolescent as indicated by tokens earned or fined in response to chlorpromazine, which was added to token economy. (Figure 15-3, p. 556, from: Agras, W. S. [1976]. Behavior modification in the general hospital psychiatric unit. In H. Leitenberg [Ed.], Handbook of behavior modification. Englewood Cliffs, NJ: Prentice-Hall. Copyright 1976 by H. Leitenberg. Reproduced by permission.)
In that series (Leitenberg et al., 1968) the first subject was a severe knife
phobic. The target behavior selected for study was the amount of time (in
seconds) that the patient was able to remain in the presence of the phobic
object. The design can be represented as B-BC-B-A-B-BC-B, where B repre
sents feedback, C represents praise, and A is baseline. Each session consisted
of 10 trials. Feedback consisted of informing the patient after each trial as to
the amount of time spent looking at the knife. Praise consisted of verbal
reinforcement whenever the patient exceeded a progressively increasing time
criterion. The results of the study are reproduced in Figure 6-9. During
feedback, a marked upward linear trend in time spent looking at the knife
was noted. The addition of praise did not appear to add to the therapeutic
effect. Similarly, the removal of praise in the next phase did not subtract from
the progress. At this point, it appeared that feedback was responsible for the
therapeutic gains. Withdrawal and reinstatement of feedback in the next two
FIGURE 6-10. (Figure 1, from: Leitenberg, H. [1973]. Interaction designs. Paper read at the American Psychological Association, Montreal, August. Reproduced by permission.)
exceeded a certain criterion, the patient could leave her room, watch television,
play table games with the nurses, and so on. Feedback consisted of
providing precise information on weight, caloric intake, and number of
mouthfuls eaten. Specifically, the patient plotted on a graph the information
that was provided by hospital staff.
In one experiment the effect of reinforcement was examined against a
background of feedback. The design can be represented as B-BC-BC'-BC,
where B is feedback, C is reinforcement, and C' is noncontingent reinforcement.
During the first feedback phase (labeled baseline on the graph), slight
gains in caloric intake and weight were noted (see Figure 6-11). When
reinforcement was added to feedback, caloric intake and weight increased
sharply. Noncontingent reinforcement produced a drop in caloric intake and
a slowing of weight gain, while reintroduction of reinforcement once again
produced sharp gains in both measures. These data contain hints of an
interaction, in that caloric intake and weight rose slightly during the first
feedback phase, a finding that replicated two earlier experiments. The addition
of reinforcement, however, produced increases over and above those for
feedback alone. The drop and subsequent rise of caloric intake and rate of
weight gain during the next two phases demonstrated that reinforcement is a
controlling variable when combined with feedback.
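Interaction designs are written as strings of phase codes in which each letter is a component and a prime marks a modified version (here C' is noncontingent reinforcement). The small parser below is our own sketch, not established notation software. It makes explicit what is added or removed at each phase change.

```python
import re

def parse_phase(code):
    """Split a phase code such as "BC'" into its components {"B", "C'"}."""
    return frozenset(re.findall(r"[A-Z]'?", code))

def transitions(design):
    """For a design string such as "B-BC-BC'-BC", list (added, removed)
    components at each phase change, each as a sorted list."""
    phases = [parse_phase(code) for code in design.split("-")]
    steps = []
    for before, after in zip(phases, phases[1:]):
        steps.append((sorted(after - before), sorted(before - after)))
    return steps

for added, removed in transitions("B-BC-BC'-BC"):
    print("added:", added, "removed:", removed)
```

Replacing C with C' shows up as one removal plus one addition, yet it manipulates a single variable (the contingency), so the output still has to be interpreted rather than merely counted.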
These data only hint at the role of feedback in this study, in that some
improvement occurred during the initial phase when feedback alone was in
effect. Similarly, we cannot know from this experiment the independent
effects of reinforcement because this aspect was not analyzed separately. To
accomplish this, two experiments were conducted where feedback was introduced
against a background of reinforcement. Only one experiment will be
presented, although both sets of data are very similar. The design can be
represented as A-B-BC-B-BC, where A is baseline, B is reinforcement, and C
is feedback (see Figure 6-12). It should be noted that the patient continued to
be presented with 6,000 calories throughout the experiment, a point to which
we will return later. During baseline, in which no reinforcement or feedback
was present, caloric intake actually declined. The introduction of reinforcement
did not result in any increases; in fact, a slight decline continued.
Adding feedback to reinforcement, however, produced increases in weight
and caloric intake. Withdrawal of feedback stopped this increase, which
began once again when feedback was reintroduced in the last phase.
With this experiment (and its replications) it becomes possible to draw
conclusions about the nature of what is in this case a complex interaction.
When both variables were presented alone, as in the initial phases in the
respective experiments, reinforcement produced no increases, but feedback
produced some increase. When presented in combination, reinforcement
added to the feedback effect and, against a background of feedback, became
the controlling variable, in that caloric intake decreased when contingent
reinforcement was removed. Feedback, however, also exerted a controlling
effect when it was removed and reintroduced against a background of reinforcement.
Thus, it seems that feedback can maximize the effectiveness of
reinforcement to the point where it is a controlling variable. Feedback alone,
however, is capable of producing therapeutic results, which is not the case
with reinforcement. Feedback, thus, is the more important of the two variables,
although both contribute to treatment outcome.
It was noted earlier that the contribution of a third variable—size of
meals—was also examined within the context of this interaction. In keeping
with the guidelines of analyzing each variable separately and in combination
with other variables, phases were examined when the large amount of 6,000
calories was presented without the presence of either feedback or reinforcement.
The baseline phase of Figure 6-12 represents one such instance. In this
phase caloric intake declined steadily. Examination of other baseline phases in
the replications of this experiment revealed similar results. To complete the
interaction analysis, size of meal was varied against a background of both
feedback and reinforcement. The design can be represented as ABC-ABC'-ABC,
where A is feedback, B is reinforcement, C is 6,000 calories per day,
and C' is 3,000 calories per day.
Under this condition, size of meal did have an effect, in that more was
eaten when 6,000 calories were served than when 3,000 calories were presented
(see Figure 6-13). In terms of treatment, however, even large meals
were incapable of producing weight gain in those phases where it was the only
therapeutic variable. Thus this variable is not as strong as feedback. The
authors concluded this series by summarizing the effects of the three variables
alone and in combination across five patients:
Thus large meals and reinforcement were combined in four experimental phases
and weight was lost in each phase. On the other hand, large meals and feedback
were combined in eight phases and weight was gained in all but one. Finally, all
three variables (large meals, feedback, and reinforcement) were combined in 12
phases and weight was gained in each phase. These findings suggest that informa-
this point in time, in contrast with the experiments described above. One
example is the evaluation of cognitive strategies (M. E. Bernard et al., 1983)
and the other is concerned with the possible combined effects of drugs and
behavior therapy (Rapport, Sonis, Fialkov, Matson, & Kazdin, 1983). M. E.
Bernard et al. (1983) evaluated the effects of rational-emotive therapy (RET)
and self-instructional training (SIT) in an A-B-A-B-BC-B-BC-A design with
follow-up. The subject was a 17-year-old, overweight female who suffered
from trichotillomania (i.e., chronic hair pulling), especially while studying at
home. Throughout the study the subject self-monitored time studying and
number of hairs pulled out (deposited in an envelope). The dependent
variable was the ratio of hairs pulled out per minute of study time.
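The dependent variable is a rate, and it is undefined on days with no study time; the Figure 6-14 caption marks such days as missing. A minimal sketch follows, assuming daily self-monitoring records of (hairs pulled, study minutes), with None standing in for the asterisk. The data are hypothetical.

```python
def hairs_per_minute(records):
    """records: list of (hairs_pulled, study_minutes) pairs, one per day.
    Returns one rate per day, or None when the subject did not study,
    since the ratio is undefined without study time."""
    rates = []
    for hairs, minutes in records:
        rates.append(hairs / minutes if minutes > 0 else None)
    return rates

def phase_mean(rates):
    """Mean rate over the days that have data; missing days are skipped."""
    observed = [r for r in rates if r is not None]
    return sum(observed) / len(observed) if observed else None

# Hypothetical baseline days.
baseline = hairs_per_minute([(30, 60), (0, 0), (45, 90), (20, 40)])
print(baseline)              # [0.5, None, 0.5, 0.5]
print(phase_mean(baseline))  # 0.5
```

Coding a no-study day as zero hairs per minute would spuriously look like improvement, which is why the sketch keeps such days missing.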
In baseline (A) the subject simply self-monitored. During the B phase, RET
was instituted, followed by a return to baseline (A) and reintroduction of
RET (B). In the next phase (BC), SIT, consisting of problem-solving dialogues,
was added to RET. Then, SIT was removed (B) and subsequently
reintroduced (BC). In the last phase (A) all treatment was removed, and then
follow-up was conducted.
Results of this study appear in Figure 6-14. The first four phases comprise
an A-B-A-B analysis and do appear to confirm the controlling effects of RET
in reducing hair pulling. However, at this point the subject, albeit improved,
still was engaging in the behavior a significant proportion of the time.
FIGURE 6-14. The number of hairs pulled out per minute of study time over baseline, treatment, and follow-up phases. Missing data (*) reflect times when the subject did not study. (Figure 1, p. 277, from: Bernard, M. E., Kratochwill, T. R., & Keefauver, L. W. [1983]. The effects of rational-emotive therapy and self-instructional training on chronic hair pulling. Cognitive Therapy and Research, 7, 273-280. Copyright 1983 Plenum Publishing Corporation. Reproduced by permission.)
when sodium valproate was withdrawn (BC). However, when the patient was
totally withdrawn in Phase 3 (B), aggression rose to a mean of 10 a day.
Institution of DRO in Phase 4 (BD) led to a dramatic decrease (to 0). Aggression
rose to 4-8 when DRO was withdrawn (B) on days 63 and 64 and gradually decreased
to zero again when DRO was reintroduced (BD) on days 65-91.
Although DRO procedures were withdrawn for only 2 days, so brief a withdrawal
is justified given the aggressive nature of the behavior being observed.
Indeed, it is quite clear that although the drug carbamazepine had a minor
role in controlling aggression, the addition of DRO was the major controlling
force. Moreover, effectiveness of DRO allowed the subject to be discharged
to her family, with DRO procedures subsequently implemented at school in
order to ensure generalization of treatment gains.
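DRO (differential reinforcement of other behavior) delivers a reinforcer only after a set interval passes without the target response; in the rumination example above, reinforcement required at least 15 seconds without rumination (IRT > 15"). The sketch below computes delivery times for a resetting DRO timer. It assumes response times are known in seconds and that each response and each delivery restarts the interval; this is one common arrangement, not necessarily the one used in the studies described here.

```python
def dro_deliveries(response_times, session_length, interval):
    """Times at which reinforcers would be delivered under a resetting DRO
    schedule: a reinforcer is due whenever more than `interval` seconds pass
    with no response; each response and each delivery restarts the timer."""
    deliveries = []
    timer_start = 0.0
    # The session end acts as a final checkpoint, like a response that never comes.
    for t in sorted(response_times) + [session_length]:
        # Deliver for every full response-free interval before time t.
        while timer_start + interval < t:
            timer_start += interval
            deliveries.append(timer_start)
        timer_start = t  # a response resets the timer
    return deliveries

# Hypothetical session: responses at 10 s and 40 s in a 60-s session.
print(dro_deliveries([10, 40], session_length=60, interval=15))  # [25, 55]
```

The strict comparison means a response arriving exactly when the interval expires cancels that reinforcer, mirroring the "IRT > 15"" notation.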
Once again, replication on additional subjects and a subsequent reordering
of the experimental strategy so that DRO was analyzed separately and then
combined with the drug would be necessary for a more complete study of
interactions. Finally, the nature of this experimental strategy deserves some
comment, particularly when compared to other strategies attempting to
answer the same questions. First, in any experiment there are more things
interacting with treatment outcome than the two or more treatments or
variables under question. Foremost among these are client variables. This, of
course, is the reason for direct replication (see chapter 10). If the experimental
operations are replicated (in this example the interaction), despite the
different experiences clients bring with them to the experiment, then one has
increasing confidence in the generality of the interactional finding across
subjects.
Second, as pointed out in chapter 5 and discussed more fully in chapter 8,
the latter phases of these experiments are subject to multiple-treatment
interference. In other words, the effect of a treatment or interaction in the latter
phases may depend to some extent on experience in the earlier phases. But if
the interaction effect is consistent across subjects, both early and late in the
experiment, and across different “orders” of introduction of the interaction,
as in the first two examples described in this section (Agras et al., 1974;
Leitenberg et al., 1968), then one has greatly increased confidence in both the
fact and the generality of the effect. As with A-B-A withdrawal designs,
however, the data most easily generalized from the experiment to applied
situations come from the early phases, before multiple treatments build up. This is
because the early phase most closely resembles the applied situation, where
the treatment would also be introduced and continued without a prior
background of several treatments.
The other popular method of studying interactions is the between-group
factorial design. In this case, of course, one group would receive both
Treatments A and B, while two other groups would receive just A or just B.
(If the factorial were complete, another group would receive no treatment.)
Here treatments are not delivered sequentially, but the more usual problems
of intersubject variability, inflexibility in altering the design, infrequent
measurement, determination of results by statistical inference, and difficulties
generalizing to the individual obtain, as discussed in chapter 2. Each approach
to studying interactions obviously has its advantages and disadvantages.
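For comparison, the between-group factorial estimates an interaction from group means rather than from phase changes within one subject. The sketch below computes the standard 2 x 2 contrasts from four hypothetical group means. "A" and "B" here are the two treatments named in the text (not phase labels), and the interaction is scaled as a simple difference of differences, which is one common convention.

```python
def factorial_effects(both, a_only, b_only, neither):
    """Standard 2 x 2 contrasts computed from the four group means."""
    main_a = ((both + a_only) - (b_only + neither)) / 2
    main_b = ((both + b_only) - (a_only + neither)) / 2
    # Interaction: does the effect of A depend on whether B is present?
    interaction = (both - b_only) - (a_only - neither)
    return {"main_A": main_a, "main_B": main_b, "interaction": interaction}

# Hypothetical group means on some outcome measure.
print(factorial_effects(both=9.0, a_only=4.0, b_only=3.0, neither=2.0))
```

Without the no-treatment group the interaction contrast cannot be computed, which is one reason the complete factorial matters for this question.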
The changing-criterion design, although it has not to date
enjoyed widespread application, is a very useful strategy for assessing
shaping programs that accelerate or decelerate behaviors (e.g., increase
interactions in chronic schizophrenics; decrease motor behavior in overactive
children). As a specific design strategy, it incorporates A-B design features on
a repeated basis. After initial baseline measurement, treatment is carried out
until a preset criterion is met, and stability at that level is achieved. Then, a
more stringent criterion is set, with treatment applied until this new level is
met. If baseline is A and the first criterion is B, when the new criterion is set
the former B serves as the new baseline (A1) with B1 as the second criterion.
This continues in graduated fashion until the final target (or criterion) is
achieved at a stable level. As noted by Hartmann and Hall (1976), “Thus,
each phase of the design provides a baseline for the following phase. When
the rate of the target behavior changes with each stepwise change in the
criterion, therapeutic change is replicated and experimental control is
demonstrated” (p. 527).
This design, by its very nature, presupposes “ . . . a close correspondence
between the criterion and behavior over the course of the intervention phase”
(Kazdin, 1982b, p. 160). When such close correspondence fails to materialize,
with stability not apparent in each successive phase, unambiguous interpretations
of the data are not possible. One solution, of course, is to partially
withdraw treatment by returning to a lower criterion, followed by a return to
the more stringent one (as in a B-A-B withdrawal design). This adds experimental
confidence to the treatment by clearly documenting its controlling
effects. Or, on a more extended basis, one can reverse the procedure and
experimentally demonstrate successive increases in a targeted behavior following
initial demonstration of successive decreases. This is referred to as
bidirectionality. Finally, Kazdin (1982b) pointed out that some experimenters
have dealt with the problem of excessive variability by showing that the mean
performance over adjacent subphases reflects the stepwise progression.
None of the aforementioned solutions to variability in the subphases is
ideal. Indeed, it behooves researchers using this design to demonstrate close
correspondence between the changing criterion and actually observed behavior.
Undoubtedly, as this design is employed more frequently, more elegant
solutions to this problem will be found.
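Kazdin's point about correspondence suggests a simple check on subphase data: does each subphase mean sit close to its criterion, and do the means move in the same direction as the criteria from one subphase to the next? A sketch follows. The tolerance is an arbitrary assumption of ours, not a published standard, and the data are hypothetical.

```python
from statistics import mean

def criterion_correspondence(subphases, tolerance):
    """subphases: list of (criterion, observations) pairs in phase order.
    Returns the subphase means, whether each mean lies within `tolerance`
    of its criterion, and whether every step in the means has the same
    direction as the corresponding step in the criteria."""
    means = [mean(obs) for _, obs in subphases]
    criteria = [c for c, _ in subphases]
    close = [abs(m - c) <= tolerance for m, c in zip(means, criteria)]
    same_direction = all(
        (means[i + 1] - means[i]) * (criteria[i + 1] - criteria[i]) > 0
        for i in range(len(means) - 1)
    )
    return means, close, same_direction

# Hypothetical decelerating program with three subphases.
data = [(20, [21, 20, 19]), (16, [17, 16, 15]), (12, [12, 13, 11])]
print(criterion_correspondence(data, tolerance=1.0))
```

Comparing means is the fallback the text attributes to some experimenters; it can mask large day-to-day variability, so the underlying daily data should still be inspected.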
Hartmann and Hall (1976) presented an excellent illustration of the changing-criterion
design in which a smoking-deceleration program was evaluated.
Baseline level of smoking is depicted in panel A of Figure 6-16. In the next
phase (B treatment), the criterion rate was set at 95% of the baseline rate (i.e.,
46 cigarettes a day). An increasing response cost of $1 was established for
smoking an additional cigarette (i.e., Number 47), $2 for Number 48, and
so on. An escalating bonus of $0.10 a cigarette was established if the
subject smoked fewer than the criterion number set. Subsequently, in phases
C-G, the criterion for each succeeding phase was established at 94% of the
previous one.
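The contingencies in this example reduce to a few lines of arithmetic. The sketch below rests on assumptions of ours that the text does not settle: each later criterion is 94% of the previous unrounded criterion, rounded to the nearest whole cigarette, and the "escalating" bonus grows by $0.10 for each additional cigarette under criterion, mirroring the $1, $2, ... response cost. Money is kept in cents to avoid floating-point error.

```python
def criteria(first_criterion, later_phases, ratio=0.94):
    """Daily cigarette criteria: phase B, then each later phase at `ratio`
    of the previous (unrounded) criterion, rounded to whole cigarettes."""
    values = [float(first_criterion)]
    for _ in range(later_phases):
        values.append(values[-1] * ratio)
    return [round(v) for v in values]

def daily_consequence_cents(smoked, criterion):
    """Net daily consequence in cents: negative for response cost,
    positive for bonus. The k-th cigarette over criterion costs k dollars;
    the k-th cigarette under criterion earns k * 10 cents (assumed)."""
    if smoked > criterion:
        over = smoked - criterion
        return -100 * (over * (over + 1) // 2)
    under = criterion - smoked
    return 10 * (under * (under + 1) // 2)

print(criteria(46, later_phases=5))     # criteria for phases B through G
print(daily_consequence_cents(48, 46))  # -300: $1 + $2
print(daily_consequence_cents(44, 46))  # 30: $0.10 + $0.20
```

Because the cost grows with each cigarette over criterion, the total penalty rises quadratically, which makes small overshoots cheap relative to large ones.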
Careful examination of Figure 6-16 clearly indicates the success of treatment
in reducing cigarette smoking by 2% or more from each preceding
phase. Further, from the experimental analysis perspective, there were six
replications of the contingencies applied. In each instance, experimental
control was documented, with the treatment phase serving as baseline with
respect to the decreasing criterion for the next phase, and so on.
Related to the changing criterion design is a strategy that Hayes (1981) has
referred to as the periodic-treatments design. At this writing, the design has
been used very infrequently and has only a quasi-experimental basis.
FIGURE 6-16. Data from a smoking-reduction program used to illustrate the stepwise criterion change design. The solid horizontal lines indicate the criterion for each treatment phase. (Figure 2, p. 529, from: Hartmann, D. P., & Hall, R. V. [1976]. The changing criterion design. Journal of Applied Behavior Analysis, 9, 527-532. Copyright 1976 by Soc. for the Experimental Analysis of Behavior. Reproduced by permission.)
These data do not show what about the treatment produced the change (any
more than an A-B-A design would). It may be therapist concern or the fact that
the client attended a session of any kind. These possibilities would then need to
be eliminated. For example, one could manipulate both the periodicity and
nature of treatment. If the periodicity of behavior change was shown only when
a particular type of treatment was in place, this would provide evidence for a
more specific effect. (p. 203)
7.1. INTRODUCTION
The rationale for the multiple baseline design first appeared in the applied
behavioral literature in 1968 (Baer et al.), although a within-subject multiple
baseline strategy had been used previously by Marks and Gelder (1967) in
their assessment of electrical aversion therapy for a sexual deviate. Baer et al.
(1968) point out that:
If general effects on multiple behaviors were observed after treatment had been
applied to only one, there would be no way to clearly interpret the results. Such
results may reflect a specific therapeutic effect and subsequent response
generalization, or they may simply reflect non-specific therapeutic effects having little to
do with the specific treatment procedure under investigation. (p. 95)
While changes in target behaviors are the raison d'être for undertaking treatment
or training programs, concomitant changes may take place as well. If so, they
should be assessed. It is one thing to assess and evaluate changes in a target
with a peer, he cried or reported the incident to his teacher. Three target
behaviors were selected for modification as a result of role-played
performance in baseline: ratio of eye contact to speech duration, number of words,
and number of requests. In addition, independent evaluations of overall
assertiveness, based on role-played performance, were obtained. As can be
seen in Figure 7-1, baseline responding for targeted behaviors was low and
stable. Following baseline evaluation, Tom received 3 weeks of social skills
training consisting of three 15- to 30-minute sessions per week. These were
applied sequentially and cumulatively over the 3-week period. Throughout
training, six role-played scenes were used to evaluate the effects of treatment.
In addition, three scenes (on which the subject received no training) were used
to assess generalization from trained to untrained scenes.
The results for training scenes appear in Figure 7-1. Examination of the
graph indicates that institution of social skills training for ratio of eye contact
to speech duration resulted in marked changes in that behavior, but rates for
number of words and number of requests remained constant. When social
skills training was applied to number of words itself, the rate for number of
requests remained the same. Finally, when social skills training was directly
applied to number of requests, marked changes were noted. Thus it is clear
that social skills training was effective in increasing the rate of the three target
behaviors, but only when treatment was applied directly to each.
Independence of the three behaviors and absence of generalization effects from one
behavior to the next facilitate interpretation of these data. On the other hand,
had nontreated behaviors covaried following application of social skills
training, unequivocal conclusions as to the controlling effects of the training could
not have been reached without resorting to Kazdin and Kopel's (1975) solution
of withdrawing and reinstating the treatment.
The reader should also note in Figure 7-1 that, despite the fact that overall
assertiveness was not treated directly, independent ratings evinced gradual
improvement over the 3-week period, with treatment gains for all behaviors
maintained in follow-up.
Examination of data for the untreated generalization scenes indicates that
similar results were obtained, confirming that transfer of training occurred
from treated to untreated items. Indeed, the patterns of data for Figures 7-1
and 7-2 are remarkably alike.
Liberman and Smith (1972) also used a multiple baseline design across
behaviors in studying the effects of systematic desensitization in a
28-year-old multiphobic female who was attending a day treatment center. Four
specific phobias were identified (being alone, menstruation, chewing hard
foods, dental work), and baseline assessment of the patient’s self-report of
each was taken for 4 weeks. Subsequently, in vivo and standard systematic
desensitization (consisting of relaxation training and hierarchical presentation
of items in imagination) were administered in sequence to the four areas of
FIGURE 7-1. Probe sessions during baseline, social skills treatment, and follow-up for training
scenes for Tom. A multiple baseline analysis of ratio of eye contact while speaking to speech
duration, number of words, number of requests, and overall assertiveness. (Figure 3, p. 190,
from: Bornstein, M. R., Bellack, A. S., & Hersen, M. [1977]. Social-skills training for unassertive
children: A multiple-baseline analysis. Journal of Applied Behavior Analysis, 10, 183-195.
Copyright 1977 by Society for Experimental Analysis of Behavior. Reproduced by permission.)
FIGURE 7-2. Probe sessions during baseline, social skills treatment, and follow-up for
generalization scenes for Tom. A multiple baseline analysis of ratio of eye contact while speaking to
speech duration, number of words, number of requests, and overall assertiveness. (Figure 4, p.
191, from: Bornstein, M. R., Bellack, A. S., & Hersen, M. [1977]. Social-skills training for
unassertive children: A multiple-baseline analysis. Journal of Applied Behavior Analysis, 10,
183-195. Copyright 1977 by Society for the Experimental Analysis of Behavior. Reproduced by
permission.)
FIGURE 7-3. Multiple baseline evaluation of desensitization in a single case with four phobias.
(Figure 1, p. 600, from: Liberman, R. P., & Smith, V. [1972]. A multiple baseline study of
systematic desensitization in a patient with multiple phobias. Behavior Therapy, 3, 597-603.
Copyright 1972 by Association for the Advancement of Behavior Therapy. Reproduced by
permission.)
ment was maintained throughout all phases of study, the possibility that
expectancy of improvement and actual treatment effects were confounded
cannot be discounted, especially in light of the primary reliance on self-report
data. However, casually conducted behavioral observations corroborated the
self-report data.
Despite the above-mentioned limitations, Liberman and Smith’s (1972)
investigation is of interest from a number of standpoints. First, as most
multiple baseline studies emanate from the operant framework, this study
lends credence to the notion that nonoperant procedures (e.g., systematic
desensitization) can be assessed in this paradigm. Second, as the particular
dependent measure (ratings of subjective fear on the Target Complaint Scale)
is based on the patient’s self-report, it would appear that this type of
single-case research might easily be carried out in inpatient facilities and even in
consulting room practice (see chapter 3, section 3.2). Finally, the treatment
was fully implemented by a mental health paraprofessional who had only one
year’s training in psychiatry.
In our next example of a multiple baseline design across behaviors, a
physiological measure (erectile strength as assessed with a penile gauge) was
used to determine the efficacy of covert sensitization in the treatment of a
21-year-old married male admitted for inpatient treatment of exhibitionism and
obscene phone calling (Alford, Webster, & Sanders, 1980). His history of
exhibitionism began at age 16, and obscene phone calling had taken place over the
previous year. During baseline assessment:
Audiotapes of both deviant and nondeviant sexual scenes were used to elicit
arousal during physiological monitoring sessions. Deviant stimulus material
included three tapes depicting various obscene phone calls . . . and three tapes of
exhibitionism. . . . Two nondeviant tapes . . . that depicted normal heterosexual
behavior were also used. . . . They consisted of verbal descriptions designed to
closely parallel the patient’s own sexual behavior and fantasy. (p. 17)
These included one taped description of intercourse with his wife and another
with different sexual partners.
Covert sensitization sessions were conducted twice daily in the hospital at
various locations. This treatment consisted of imaginally pairing the deviant
sexual approach (i.e., obscene phone calls, exhibitionism) with aversive
stimuli such as suffocation, nausea, and arrest. Each session involved 20 pairings
of the deviant scenarios with aversive imagery. Following baseline
assessment, covert sensitization was first applied to obscene phone calling and then
to exhibitionism. In addition to therapist-conducted treatment sessions, the
patient was instructed to use covert imagery on his own initiative whenever he
experienced deviant sexual urges.
Data for this multiple baseline analysis are presented in Figure 7-4. During
baseline evaluation, penile tumescence in response to tapes of obscene phone
calling and exhibitionism was quite high. Similarly, tumescence was above
75% in response to nondeviant tapes of sexual activity with females other
than his wife, but only slightly higher than 25% in response to lovemaking
with his wife.
Institution of covert sensitization for obscene phone calling resulted in
marked diminution in penile responsivity to taped descriptions of that
behavior, eventually resulting in only a negligible response. However, such
treatment also appeared to effect changes in penile response to one of the
exhibitionism tapes (Ex. 1), even though that behavior had not yet been
specifically targeted. (We have here an instance where the baselines are not
independent of one another.) When treatment subsequently was
directed to exhibitionism itself, there was marked diminution in penile
response to tapes Ex. 2 and Ex. 3 in addition to continued decreases to tape Ex.
1. During the course of treatment, penile responsivity to nondeviant
heterosexual interactions remained high, increasing considerably with respect to
lovemaking with the wife.
The reader might note that “the patient was preloaded with 36 oz of beer 90
to 60 minutes prior to Assessments 10 and 11” (Alford et al., 1980, p. 19).
This was carried out inasmuch as he had claimed that alcohol had disinhibited
deviant sexuality. However, experimental data did not seem to confirm this.
One-, 2-, and 10-month follow-up assessments indicated that all gains were
maintained, with the exception of decreased penile responsivity to taped
descriptions of intercourse with the wife. In addition, 10-month collateral
information from the patient’s wife, parents, and attorney, as well as police,
court, and telephone company records, revealed no incidents of sexual
deviance.
Our illustration reveals a clinically successful intervention evaluated
FIGURE 7-5. Concurrent group rates of Stealing, Fingers, Utensils, and Pigging behaviors, and
the sum of Stealing, Fingers, and Pigging (Total Disgusting Behaviors) through the baseline and
experimental phases of the study. (Figure 1, p. 80, from: Barton, E. S., Guess, D., Garcia, E., &
Baer, D. M. [1970]. Improvement of retardates' mealtime behaviors by time-out procedures using
multiple baseline techniques. Journal of Applied Behavior Analysis, 3, 77-84. Copyright 1970 by
Society for Experimental Analysis of Behavior, Inc. Reproduced by permission.)
rate for that behavior. Finally, application of time-out for pigging proved
successful in reducing its rate.
Independence of the target behaviors was observed, with the exception of
messy utensils, which increased in rate when the time-out contingency was
applied to fingers. Although group data for the 16 subjects were presented, it
would have been desirable if the authors had presented data for individual
subjects. Unfortunately, the time-sampling procedure used by Barton et al.
(1970) precluded obtaining such information. However, this factor should not
overshadow the clinical and social significance of this study, in that (1)
mealtime behaviors improved significantly; (2) a result of improved mealtime
behaviors was a concomitant improvement in staff morale, facilitating more
favorable interactions with the subjects; and (3) staff in other cottages were
sufficiently impressed with the results of this study to begin to implement
similar mealtime programs for their own retarded residents.
A more recent example of a multiple baseline design across behaviors
(carried out in group format) was presented by Bates (1980). This study is of
particular interest inasmuch as he contrasted the effects of interpersonal skills
training (i.e., social skills training) for an experimental group with a control
condition that received no treatment. Subjects were moderately and mildly
retarded adults (8 in the treatment group, 8 in the control group). Since
treatment was carried out sequentially and cumulatively across four
behaviors (introductions and small talk, asking for help, differing with others,
handling criticism) following initial assessment, a multiple baseline analysis
was possible in addition to a controlled group evaluation.
A 16-item role-play test was the dependent measure, with subjects receiving
interpersonal skills training for eight of these scenarios. The remaining eight,
for which subjects received no training, served as a measure of transfer of
training. (But this was only accomplished on a pre-post basis.) Skills training
was conducted thrice weekly and consisted of modeling, behavior rehearsal,
coaching, feedback, incentives, and homework assignments. After each set of
three training sessions an assessment was performed.
Results of this analysis appear in Figure 7-6. As the reader will note,
improvements in each of the four targeted behaviors occurred in time-lagged
fashion only when treatment was specifically applied to each. Thus there was
no evidence of correlated baselines. Data indicate that interpersonal skills
training was effective in bringing about behavioral change. Further, results of
the group comparison indicated that there were statistically significant
differences in favor of the experimental condition.
Although these data are impressive, we would like to identify a few
problems. First, baseline assessment for introductions and small talk should
have been extended to three points, despite the apparent stability. Second, a
three-point assessment in the treatment phase for handling criticism is
warranted considering that there is the beginning of a downward trend in the
data. If this trend were to continue, unequivocal statements about the
treatment’s controlling effects over that behavior could not be made. Third,
presentation of data for individual subjects in a table would have been useful
from the single-subject perspective.
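Why a third data point matters for judging trend can be illustrated with an ordinary least-squares slope over a phase. This is a sketch under stated assumptions: the three-point minimum simply mirrors the rule of thumb in the critique above (it is not a statistical requirement), the slope is a stand-in for the visual inspection the text relies on, and the ratings are hypothetical, not Bates's (1980) data.

```python
def ols_slope(values: list[float]) -> float:
    """Least-squares slope of phase data against session number (0, 1, 2, ...)."""
    n = len(values)
    if n < 3:
        # Mirrors the three-point rule of thumb in the critique above.
        raise ValueError("need at least three data points to judge a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical treatment-phase ratings for one behavior.
print(ols_slope([8, 7, 6]))  # -1.0 (a downward trend)
print(ols_slope([5, 5, 5]))  # 0.0 (flat)
```

With only two points any line passes through both exactly, so an apparent slope cannot be distinguished from session-to-session variability; a third point at least shows whether the direction persists.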
This can be a very useful design, but in co-opting behavior analytic
(See also section 10.2 for a discussion of issues arising from this strategy
relevant to replication.)
Although the multiple baseline design is frequently used in clinical research
when withdrawal of treatment is considered to be detrimental to the patient,
on occasion withdrawal procedures have been instituted following the
sequential administration of treatment to target behaviors, particularly when
reinforcement techniques are being evaluated (e.g., Russo & Koegel, 1977). If
treatment is reintroduced after a withdrawal, a powerful demonstration of its
controlling effects can be documented. This type of multiple baseline strategy
was used by Russo and Koegel (1977) in their evaluation of behavioral
techniques to integrate an autistic child into a normal public school
classroom. The subject was a 5-year-old girl who previously had been diagnosed as
autistic. She evinced limited verbal behavior, failed to respond to the
initiatives of others, and, when she did verbalize, her comments reflected pronoun
FIGURE 7-7. Social behavior, self-stimulation, and verbal response to command in the normal
kindergarten classroom during baseline, treatment by the therapist, and treatment by the trained
kindergarten teacher. All three behaviors were measured simultaneously. (Figure 1, p. 585, from:
Russo, D. C., & Koegel, R. L. [1977]. A method for integrating an autistic child into a normal
public school classroom. Journal of Applied Behavior Analysis, 10, 579-590. Copyright 1977 by
Society for Experimental Analysis of Behavior. Reproduced by permission.)
and appropriate responses, and they were systematically removed for each
occurrence of self-stimulatory behavior. At the end of each training session
the child had the opportunity to trade remaining tokens for a menu of backup
reinforcers. Three pretraining sessions were carried out to establish the
reinforcing value of tokens.
Initial treatment by the therapist for social behaviors resulted in a marked
increase in responsivity for that 3-week period. There were no substantial
changes in self-stimulatory behavior. However, there was some concurrent
increase in rate of appropriate responses, which then decreased somewhat. In
Weeks 7-9 the reinforcement contingency for social behaviors was
withdrawn, resulting in a marked decrease. However, when reinstated in Weeks
10-15, there once again was a substantial improvement in social responding,
thus confirming the controlling effects of reinforcement in A-B-A-B fashion.
Concurrent with retreatment of social behavior in Weeks 10-15 was
application of the contingency for self-stimulation. This led to marked diminution in
such behaviors, with no concurrent changes in the third baseline (appropriate
responses). In Weeks 13-16, when treatment was directed specifically to
appropriate responses, a marked improvement was observed.
In Weeks 14 and 15 the therapist began training the teacher to apply
treatment. From Week 16 through Week 25 the teacher carried out treatment
under the supervision of the initial therapist. Over the course of this time
period the reinforcement schedule was gradually thinned. Data for Weeks
16-25 indicate that initial improvement was either maintained or enhanced.
In summary, this study illustrates the use of the multiple baseline design
across behaviors in a single subject, demonstrating general independence of
target behaviors. Sequential application of a reinforcement contingency to
individual behaviors showed the controlling effects of the contingency.
Additional experimental manipulations (withdrawal and reintroduction of the
contingency) for the first baseline (social behavior) further confirmed the
controlling effects of the treatment. Finally, data indicate that treatment
procedures were effectively taught to the teacher, who was able to maintain
the child’s improved performance in the last phase of the study.
In our final example of a multiple baseline design across behaviors, the
effects of booster treatment subsequent to deterioration during follow-up
(after initial success of social skills training) were documented (Van Hasselt,
Hersen, Kazdin, Simon, & Mastantuono, 1983). The subject was a blind
female child attending a special school for the blind. Baseline assessment of
social skills through role playing revealed deficiencies in posture and gaze, a
hostile tone of voice, inability to make requests for new behavior, and a
general lack of social skills (see Figure 7-8).
The sequential and cumulative application of social skills training resulted
in marked improvements in role-played performance, thus documenting the
controlling effects of the treatment. However, data for the 4-week
posttreatment follow-up revealed a decrement for gaze and requests for new behavior.
FIGURE 7-8. Probe sessions during baseline, social skills treatment, follow-up, and booster
assessments for training scenes for S1. A multiple baseline analysis of posture, gaze, hostile tone,
requests for new behavior, and overall social skill. (Figure 1, p. 201, from: Van Hasselt, V. B.,
Hersen, M., Kazdin, A. E., Simon, J., & Mastantuono, A. K. [1983]. Social skills training for
blind adolescents. Journal of Visual Impairment and Blindness, 75, 199-203. Copyright 1983.
Reproduced by permission.)
Examination of Figure 7-8 shows that retreatment in booster sessions for
those behaviors resulted in a renewed improvement, extending through the
8- and 10-week follow-up assessments. Thus our multiple baseline analysis
permitted a clear assessment of which behaviors were maintained after
treatment in addition to those requiring booster treatment.
FIGURE 7-10. Results of the multiple baseline analysis with subsequent repeated reversals of the
influence of a response-delay requirement on the correct responding of autistic children. (Figure 1,
p. 235, from: Dyer, K., Christian, W. P., & Luce, S. C. [1982]. The role of response delay in
improving the discrimination performance of autistic children. Journal of Applied Behavior
Analysis, 15, 231-240. Copyright 1982 by Society for Experimental Analysis of Behavior.
Reproduced by permission.)
tution overcorrection when the pants were found to be wet at home. (No
treatment was administered at school as this served as a measure of
generalization.) Restitutional overcorrection “ . . . required the child to (a) obtain a
towel, (b) clean up all traces of the accident, (c) go to the bedroom and put on
clean pants, and (d) dispose of the wet pants in the diaper pail” (Barmann et
al., 1981, p. 341). This was followed by 10 repetitions of positive practice
overcorrection in which the child practiced the correct sequence of toileting
behavior.
Results of this multiple baseline analysis clearly documented the
controlling effects of the treatment, but only when it was directly applied to each
child. Indeed, treatment reduced enuretic accidents to near zero levels for
each subject and was maintained in a lengthy follow-up evaluation period.
Moreover, the effects of treatment generalized from the home to the school
setting.
As in the multiple baseline design across behaviors, baseline and treatment phases
for each subject in this study can be conceptualized as separate A-B designs,
with the length of baselines increased for each succeeding subject used in the
multiple baseline analysis. The controlling effects of the contingency are
inferred from the rate changes in the treated subject, while rates remain
unchanged in untreated subjects. When rate changes are sequentially
observed in at least 3 subjects, but only after the treatment variable has been
directly applied to each, the experimenter gains confidence in the efficacy of
the procedure (i.e., overcorrection). Thus we have a direct replication of the
basic A-B design in 3 matched subjects exposed to the same environment
under “time-lagged” contingency conditions.
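The inferential logic just described (a rate change in each subject at its own, staggered onset, with no change in still-untreated subjects) can be sketched as a comparison of phase means. This is a minimal illustration under assumptions: mean differences stand in for visual inspection of the graphs, and the function names, onset points, and accident counts are hypothetical, not data from Barmann et al. (1981).

```python
from statistics import mean
from typing import Optional


def level_shift(series: list[float], t: int, end: Optional[int] = None) -> float:
    """Mean level from index t (up to `end`) minus mean level before index t."""
    return mean(series[t:end]) - mean(series[:t])


def staggered_shifts(series_by_subject: list[list[float]], starts: list[int]) -> list[dict]:
    """For each subject: the shift at its own treatment onset, plus the shifts in its
    still-untreated baseline at the earlier onsets for other subjects."""
    results = []
    for series, own_start in zip(series_by_subject, starts):
        untreated = [level_shift(series, s, own_start) for s in starts if s < own_start]
        results.append({"treated": level_shift(series, own_start), "untreated": untreated})
    return results


# Hypothetical daily accident counts; treatment begins at index 3, 5, and 7.
data = [
    [4, 4, 4, 1, 1, 1, 1, 1, 1],
    [4, 4, 4, 4, 4, 1, 1, 1, 1],
    [4, 4, 4, 4, 4, 4, 4, 1, 1],
]
for row in staggered_shifts(data, [3, 5, 7]):
    # Each subject drops by 3 at its own onset and shows no shift at others' onsets.
    print(row)
```

A nonzero value in an `untreated` list would correspond to the correlated-baseline problem noted earlier in the chapter, where an untreated baseline changes before treatment reaches it.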
Dyer, Christian, and Luce (1982) used an interesting variation of a multiple
baseline strategy across subjects in their assessment of response delay to
improve the discrimination performance of three autistic children (two
13-year-old girls and one 14-year-old boy). Discrimination tasks for the three
children were as follows: Child 1, pointing to a male or female figure; Child
2, describing the function of two objects (e.g., a towel and a fork); Child 3,
discriminating between right and left. Responses to these tasks were obtained
during no-delay and delay conditions, with all experimental sessions
conducted in each child’s classroom. Treatment (delay) was introduced,
withdrawn, and reintroduced, following an initial no-delay condition for each
child. This, of course, was conducted sequentially under time-lagged
conditions for the three children. Delay consisted of having the child withhold his
or her response for 3 to 5 seconds.
Inspection of Figure 7-10 shows that improved performance only occurred
when the contingency (i.e., delay) was directly applied to each child, thus
documenting the controlling effects of treatment. Data clearly indicate that
the three baselines were independent of one another. Moreover, additional
confirmation of the controlling effects of delay was noted when introduction
The present follow-up study has several implications for future research. First,
conclusions about the effectiveness of particular procedures need to be tempered
unless accompanied by evidence showing maintenance of behavior. The
implication of many demonstrations is that an important applied problem has been
solved by application of behavioral (or other) procedures. However, durability of
behavior change is not an ancillary measure of treatment effects. (p. 721)
Our illustration shows how the multiple baseline strategy allows for (1) an
initial demonstration of the controlling effects of a treatment, (2) an
assessment at follow-up, (3) a second demonstration of the controlling effects of
the treatment, and (4) a second follow-up assessment showing differential
responding among subjects.
A three-group application of the multiple baseline strategy across subjects
(groups of children with insulin-dependent diabetes) was provided by Epstein
et al. (1981). The effects of a behavioral treatment program to increase the
percentage of negative urine tests were examined in 19 families of such
diabetic children. Treatment was directed to decrease intake of simple sugars
and saturated fats, decrease stress, increase exercise, and adjust insulin
intake. Parents were taught to use praise and token economic techniques to
reinforce improvements in the child’s self-regulating behavior. When
treatment began, 10 of the children (ages 8 to 12) were self-administering their
insulin; the remaining 9 were receiving shots from their parents.
FIGURE 7-12. Percentage of 0% urine concentration tests weekly for children in each group. The
mean and standard error of the mean for all the observations in each phase by group are
represented by a solid and dotted line, respectively. (Figure 1, p. 371, from: Epstein, L. H., Beck,
S., Figueroa, J., Farkas, G., Kazdin, A. E., Daneman, D., & Becker, D. [1981]. The effects of
targeting improvements in urine glucose on metabolic control in children with insulin dependent
diabetes. Journal of Applied Behavior Analysis, 14, 365-375. Copyright 1981 by Society for
Experimental Analysis of Behavior. Reproduced by permission.)
FIGURE 7-13. Frequency of hazards across departments as a function of the introduction of the
“feedback package.” Data for days following unplanned safety meetings are indicated by an open
circle. At point “a” there was a change in supervisors. (Figure 1, p. 293, from: Sulzer-Azaroff,
B., & deSantamaria, M. C. [1980]. Industrial safety hazard reduction through performance
feedback. Journal of Applied Behavior Analysis, 13, 287-295. Copyright 1980 by Society for
Experimental Analysis of Behavior. Reproduced by permission.)
FIGURE 7-14. Effects of self-monitoring and self-administered overcorrection in the school and
home: David. (Figure 1, p. 81, from: Ollendick, T. H. [1981]. Self-monitoring and
self-administered overcorrection: The modification of nervous tics in children. Behavior Modification, 5,
75-84. Copyright 1981 by Sage Publications. Reproduced by permission.)
that was carried out for only four 30-minute sessions per day.
Results of this single-case analysis appear in Figure 7-15. Data clearly
indicate the controlling effects of the treatment, both in terms of its initial
application on a time-lagged basis (baselines were independent) and when it
was removed and reintroduced simultaneously in all four settings. Rate of
hyperventilation episodes increased dramatically when the punishment
contingency was removed in the second baseline and decreased to near zero levels
FIGURE 7-15. Number of hyperventilation responses per minute and condition means across
experimental phases and settings. (Figure 1, p. 565, from: Singh, N. N., Dawson, J. H., &
Gregory, P. R. [1980]. Suppression of chronic hyperventilation using response-contingent
aromatic ammonia. Behavior Therapy, 11, 561-566. Copyright 1980 by Association for Advancement
of Behavior Therapy. Reproduced by permission.)
populations 49,978 and 65,910). During baseline, the mean number of home
burglaries committed per day was computed for each area (means = 2.83 and
2.25).
After 17 days of baseline (standard police patrolling) in Area 1, an
In this . . . design, the researcher initially determines the length of each of several
baseline designs (e.g., 5, 10, 15 days). When a given subject becomes available
(e.g., a client referred who has the target behavior of interest, and is amenable to
the use of a specific treatment of interest), s(he) is randomly assigned to one of
the pre-determined baseline lengths. Baseline observations are then carried out;
and assuming the responding has reached acceptable stability criteria, treatment
is implemented at the pre-determined point in time. Observations are continued
through the treatment phase, as in a simple A-B design. Subjects who fail to
display stable responding would be dropped from the formal investigation;
however, their eventual reaction to treatment might serve as useful replication
data.
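The assignment procedure in the quoted passage can be sketched in a few lines of Python. The 5-, 10-, and 15-day lengths come from the passage's own example; the stability rule (last three points within 20% of their mean), the function names, and the sample observations are hypothetical assumptions, since the passage leaves "acceptable stability criteria" to the researcher.

```python
import random
from statistics import mean
from typing import Optional

# Predetermined baseline lengths in days, from the example in the quoted passage.
BASELINE_LENGTHS = (5, 10, 15)


def assign_baseline_length(rng: random.Random) -> int:
    """Randomly assign a newly available subject to one predetermined baseline length."""
    return rng.choice(BASELINE_LENGTHS)


def is_stable(observations: list[float], window: int = 3, tolerance: float = 0.2) -> bool:
    """Hypothetical stability criterion: the last `window` points all lie
    within +/- tolerance (as a proportion) of their own mean."""
    recent = observations[-window:]
    center = mean(recent)
    return all(abs(x - center) <= tolerance * center for x in recent)


def treatment_start_day(baseline_length: int, observations: list[float]) -> Optional[int]:
    """Day on which treatment begins, or None if the subject is dropped
    from the formal analysis for unstable responding."""
    baseline = observations[:baseline_length]
    if len(baseline) < baseline_length:
        raise ValueError("baseline not yet complete")
    return baseline_length + 1 if is_stable(baseline) else None


rng = random.Random(42)  # seeded only so the example is reproducible
length = assign_baseline_length(rng)
print(length in BASELINE_LENGTHS)                    # True
print(treatment_start_day(5, [10, 10, 11, 10, 10]))  # 6
print(treatment_start_day(5, [10, 2, 20, 3, 18]))    # None
```

Fixing the lengths before any subject arrives, and drawing among them at random, is what keeps the timing of treatment from being chosen after the researcher has seen a subject's baseline data.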
Multiple-probe technique
To this point in our descriptions of multiple baseline strategies, baseline
measurement has been continuous for all designs, including the
nonconcurrent multiple baseline design. However, as noted by Horner and Baer (1978),
there are situations in which repeated measurements will result in reactivity
(i.e., a change simply as a result of repetition of the assessment). When
treatment is subsequently introduced under these circumstances, changes may
not be detected or may be masked, due to the inflated or deflated baseline as a
function of reactivity. In addition, there are some instances when continuous
measurement is not feasible and when (on the basis of prior experimentation)
an “a priori assumption of stability can be made” (Homer & Baer, 1978,
p. 193). This being the case, instead of having 6, 9, and 12 assessments in
three successive baselines, these can be more interspersed, resulting in two,
three, and four measurement points. An example of this approach is pre
sented in Figure 7-19. Probes (hypothetical) in our example are represented
by closed triangles, whereas actual reported data appear as open circles.
In commenting on this graph, Horner and Baer (1978) argued that:
The multiple-probe technique, with probes every five days, would have provided one, two, three, and five probe sessions to establish baselines across the four subjects. The multiple-probe technique probably could have provided a stable baseline with five or fewer probe sessions for the subject who had 15 days of continuous baseline in the original study. The use of the multiple-probe procedure might have precluded the increase in irrelevant and competing behaviors by this subject because such behavior began to increase after the tenth baseline session. (p. 195)
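The arithmetic behind Horner and Baer's comment, where continuous baseline sessions are replaced by one probe every few sessions, can be sketched as follows. This is an illustration under an assumption: the baseline lengths below are hypothetical values chosen to reproduce the one, two, three, and five probe counts. They are not the baselines of the study Horner and Baer reanalyzed, and the helper names are invented.

```python
def probe_sessions(baseline_length, interval):
    """Sessions (1-based) on which a baseline probe is taken, assuming one
    probe every `interval` sessions up to the last baseline session."""
    return list(range(interval, baseline_length + 1, interval))


def probe_counts(baseline_lengths, interval):
    """Pair each continuous baseline length with its number of probe sessions."""
    return [(n, len(probe_sessions(n, interval))) for n in baseline_lengths]


# Hypothetical staggered baselines (in days) for four subjects; these are
# illustrative values, not the baselines of the study Horner and Baer reanalyzed.
counts = probe_counts([5, 10, 15, 25], interval=5)
for continuous, probes in counts:
    print(f"{continuous:>2} continuous sessions -> {probes} probe sessions")
```

This simple rule does not guarantee a probe immediately before treatment begins in each tier. An applied version would probably add one so that every baseline ends with a current measurement.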
It should be noted that, over the years, a variety of researchers have applied
this variant of baseline assessment in the multiple baseline design (Baer &
Guess, 1971; Schumaker & Sherman, 1970; Striefel, Bryan, & Aikins, 1974;
Striefel & Wetherby, 1973). In each of these studies the design used was the
multiple baseline design across behaviors. But, as in Figure 7-19, it could be
across subjects, and it certainly might also be across settings.
If reactivity is the primary reason for using this variant, the probe technique should be continued when treatment is instituted. However, if feasibility is questionable in baseline or if an a priori assumption of baseline stability can be made, more frequent measurements during treatment may be desirable.
Kazdin (1982b) recommended use of the probe technique for assessment of behaviors that were not targeted for treatment (i.e., evaluation of generalization or transfer of treatment effects, say, in the naturalistic environment). Use of probes here is particularly valuable if reactivity is to be avoided. This was specifically carried out in a multiple baseline design across behaviors evaluating generalization effects of social skill training in three chronic schizophrenics (Bellack, Hersen, & Turner, 1976). In each case, baseline assessment involved evaluation of verbal and nonverbal behaviors from videotaped role-play scenarios requiring assertive responding. One set of eight scenarios (Training Scenes) was repeatedly used for assessment during baseline, treatment, and follow-up phases; this set also served as the training vehicle (see left side of Figure 7-20). A second set of eight scenarios (Generalization Scenes) was also repeatedly used for assessment during baseline, treatment, and follow-up phases, but the patient did not receive training on these scenes (see right side of Figure 7-20). However, since the patient was repeatedly exposed to the Generalization Scenes, reactivity was considered a good possibility. Therefore, a third set of eight scenarios (Novel Scenes) was used for an additional generalization assessment during baseline, treatment, and follow-up phases on a probe basis (see open circles on the right side of Figure 7-20).
Examination of Figure 7-20 confirms the controlling effects of treatment
on individual behaviors in Training Scenes, with the exception of “ratio of
words spoken to speech duration.” Data also confirm transfer of training
from Training to Generalization Scenes, but again with the exception of
[Figure 7-20 panels: Training Scenes (left) and Generalization Scenes (right), plotted across probe sessions and follow-up]
FIGURE 7-20. Probe sessions during baseline, treatment, and follow-up for Subject 3. (Figure 3, p. 396, from: Bellack, A. S., Hersen, M., & Turner, S. M. [1976]. Generalization effects of social skills training in chronic schizophrenics: An experimental analysis. Behaviour Research and Therapy, 14, 391-398. Copyright 1976 by Pergamon. Reproduced by permission.)
“ratio of words spoken to speech duration.” Probe data (open circles) suggest further transfer of training to the Novel Scenes, again with the exception of “ratio of words spoken to speech duration.” Finally, for all three sets of scenes, the data indicate gradual improvements in overall assertiveness throughout treatment, which appeared to be maintained at follow-up.
As we have seen, the probe technique can be most useful in a number of
instances. However, as in the case of the nonconcurrent multiple baseline
design, it should not be employed as a substitute for continuous measurement
when that is feasible. That is, data accrued from use of probe measures are
suggestive rather than confirmatory of the controlling effects of a given
treatment.
With the exception of the multiple baseline across subjects, the multiple baseline strategies are generally unsuitable for evaluating the effects of pharmacological agents on behavior. For example, it will be recalled that, in the multiple baseline design across behaviors, the same treatment is applied to independent behaviors within the same individual under time-lagged conditions. Clearly, in the case of drug evaluations this is an impossibility, as no drug is so specific in its action that it can be expected to effect changes in this manner. However, it would be possible to apply different drugs under time-lagged conditions to separate behaviors following baseline placebo administrations for each. But this kind of design would involve a radical departure from the basic assumptions underlying the multiple baseline strategy across behaviors and would permit only very tentative conclusions based on separate A1-B designs for each targeted behavior. In addition, the possible interactive effects of drugs might obscure specific results. Indeed, the interaction design (see chapter 6) is better suited for evaluation of combined effects of therapeutic strategies.
Similarly, the use of the multiple baseline across different settings in drug
evaluations would prove difficult unless the particular drug being applied
worked immediately, had extremely short-term effects, and could be rapidly
eliminated from body tissues. However, as most drugs used in controlling
behavior disorders do not meet these three requirements, this kind of design
strategy is not useful in drug research.
Of the three types of multiple baseline strategies currently in use, the multiple baseline across subjects is most readily adaptable to drug evaluations. The multiple baseline design across subjects could be most useful in drug evaluations when withdrawal procedures (return to A1, baseline placebo) are unwarranted for either ethical or clinical reasons. Using this type of strategy across matched subjects, baseline administration of a placebo (A1) could be followed by the sequential administration (under time-lagged conditions) of an active drug (B). Thus a series of A1-B (quasi-experimental) designs would result, with inferences made in accordance with changes observed when the B (drug) condition was applied. Although an approximation of a double-blind procedure is feasible (observer and patient blind to conditions in force), it is more likely that single-blind (patient only) conditions would prevail.
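The time-lagged sequence described above (placebo baseline A1 for every subject, then the active drug B introduced one subject at a time) can be laid out as a simple schedule. This is a minimal sketch with hypothetical parameters: the session counts, the lag, and the function name `staggered_schedule` are assumptions made for illustration. Blinding, dosing, and washout are not modeled.

```python
def staggered_schedule(n_subjects, first_change, lag, total_sessions):
    """Phase label per session for each subject in a multiple baseline design
    across subjects: placebo baseline (A1) until that subject's change point,
    then active drug (B). Subject i (0-based) switches at session
    first_change + i * lag (sessions are numbered from 1)."""
    plan = []
    for i in range(n_subjects):
        change = first_change + i * lag
        plan.append(["A1" if s < change else "B" for s in range(1, total_sessions + 1)])
    return plan


# Hypothetical parameters: three matched subjects, the first switching to the
# drug at session 6 and each later subject four sessions after the previous one.
plan = staggered_schedule(n_subjects=3, first_change=6, lag=4, total_sessions=20)
for i, phases in enumerate(plan, start=1):
    print(f"Subject {i}: " + " ".join(phases))
```

Each row is one A1-B design. Inference rests on whether each subject changes only when B begins and not before, which is why the change points must be staggered.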
Many other design options are possible in the application of the multiple
baseline design across subjects when evaluating pharmacological effects. For
example, V. J. Davis, Poling, Wysocki, and Breuning (1981) looked at the
effects of decreasing phenytoin drug dosage on the workshop performance of
three mentally retarded individuals. Thus one can use the multiple baseline
8.1. INTRODUCTION
Alternating Treatments Design 253
[Figure: conditions A and B alternated across Weeks 1-8 in the order A B B A B A A B]
Terminology
While this basic research strategy has been used for years within a number
of experimental contexts, a confusing array of terminology has delayed a
widespread understanding of the basic logic of this design. In the first edition
of this book, we termed this strategy a multiple schedule design. Others have
termed the same design a multi-element baseline design (Sidman, 1960;
Ulman & Sulzer-Azaroff, 1973, 1975), a randomization design (Edgington,
1967), and a simultaneous treatment design (Kazdin & Hartmann, 1978;
McCullough, Cornell, McDaniel, & Meuller, 1974). These terms originated for somewhat different reasons, reflecting the multiple historical origins
Multiple-treatment interference
Multiple-treatment interference (Barlow & Hayes, 1979; Campbell & Stanley, 1963) raises the following question: Will the results of Treatment B in an ATD, where it is alternated with Treatment A, be the same as when Treatment B is the only treatment used? In other words, is Treatment A somehow interfering with Treatment B, so that we are not getting a true picture of the effects of Treatment B? This concern has considerable common-sense appeal, because at first glance
*Kazdin (1982b) has, very accurately in our view, used the term multiple-treatment designs to subsume both alternating and simultaneous treatment designs. However, since simultaneous treatment designs are so rare and would seem to have so little applicability in applied research, this book will concentrate on the description and illustration of alternating treatments designs.
there are few strictly “applied” situations where treatments are ever alternated. Thus it is not immediately apparent to practitioners how these results could generalize to their own situations.
On closer analysis, however, we will suggest that this is a relatively small
problem, and in some cases not a problem at all, for applied researchers
(although it is a major issue in basic research). Also, there are steps applied
researchers can take to minimize multiple-treatment interference. After a
discussion of the nature of multiple-treatment interference, the remainder of
this section will describe procedures for minimizing it.
In a sense, all applied research is fraught with potential multiple-treatment interference. Unlike rats in the splendid isolation of the animal laboratory, who are returned to their cages for 23 hours to await the next session, the children and adults who are the subjects of applied research experience a variety of events before and between treatment sessions. A college student on the way to an experiment may have just failed an examination. A subject in a fear-reduction experiment may have been mugged on the way to the session. Another experimental patient may have lost a family member in recent weeks or just had sexual intercourse before the session. It is possible that these subjects respond differently to the treatment than they otherwise would have, and it is these historical factors that account for some of the enormous intersubject variability in between-group designs comparing two treatments. ATDs, on the other hand, control for this kind of confounding experience perfectly by “dividing the subject in two” and administering two or more treatments to the same subject within the same time period. Thus, if a family member died during the previous week, that experience would presumably affect each rapidly alternated treatment equally. The one remaining concern is the possibility that one experimental treatment is interfering with the other within the experiment itself. Essentially, there are three related concerns: sequential confounding, carryover effects, and alternation effects (Barlow & Hayes, 1979; Ulman & Sulzer-Azaroff, 1975).
We earlier discussed sequential confounding as referring to the fact that
Treatment B might be different if it always followed Treatment A. Another
name for sequential confounding is order effects. That is, much of the benefit
of Treatment B might be due simply to the order in which it is administered
vis-à-vis other treatments. Sequential confounding with A-B-A withdrawal
designs has been discussed in section 5.3. The solution, of course, is to
arrange for a random (or semirandom) sequencing of treatments. One can
view this random order of sequencing treatments in a typical ATD in the
hypothetical data presented in Figure 8-1. Such counterbalancing also allows
for statistical analyses of ATDs for those who so desire (see chapter 9).
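One common way to obtain the "semirandom" sequencing mentioned above is block randomization. Each successive block contains every treatment once, and the order within each block is shuffled independently. The sketch below illustrates that technique under the assumption that this is the kind of constraint wanted; the function names are invented for this example, and a fully random order is an equally legitimate alternative.

```python
import random
from itertools import groupby


def block_randomized_sequence(treatments, n_blocks, rng):
    """Semirandom ATD order: every block contains each treatment exactly once,
    shuffled independently, so treatments stay balanced over time."""
    sequence = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


def longest_run(sequence):
    """Longest stretch of consecutive sessions given the same treatment."""
    return max(len(list(run)) for _, run in groupby(sequence))


seq = block_randomized_sequence(["A", "B"], n_blocks=4, rng=random.Random(7))
print(" ".join(seq), "| longest run:", longest_run(seq))
```

With two treatments and blocks of two, no treatment can occur more than twice in a row, and the counts stay equal at the end of every block. The eight-week order A B B A B A A B shown earlier in the chapter happens to satisfy this constraint, because each successive pair contains both A and B.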
Carryover effects, on the other hand, refer to the influence of one treatment on an adjacent treatment, irrespective of overall sequencing. Terms such
The experimental design and the results are represented in Figure 8-2,
where the average responses of the five subjects are presented. (Individual
data were also presented, but this figure will suffice for purposes of illustration.) Thus this experiment really consisted of four separate ATDs after the
baseline condition, in which token reinforcement was alternated with either
baseline or response costs. Each of these ATDs was repeated twice. The
elegance of this design for examining multiple-treatment interference is found
in the fact that one can examine the effects of token reinforcement when
alternated with either another treatment or baseline. If multiple-treatment
interference is evident when token reinforcement is alternated with the other
treatment, response cost, then the effects of token reinforcement should be
different during that part of the experiment from when token reinforcement
is alternated with baseline.
First, it is important to note here that both token reinforcement and
response costs produced strong and comparable effects in increasing on-task
behavior, and that token reinforcement was clearly effective when compared
to baseline. The investigators decided, however, that token reinforcement was
the preferable treatment because they noticed that more disruptive behavior
occurred during the response-cost procedure than during the token reinforcement procedure. Thus token procedures were continued during both sessions
in the last phase.
The investigators reported three different sets of findings from their examination of potential multiple-treatment interference. First, they found no evidence that the overall level of on-task behavior during token reinforcement differed depending on whether token reinforcement was alternated with baseline or with response cost. This, of course, is an extremely important finding, particularly in terms of estimating what the effects of token reinforcement in this context would be when applied in isolation; that is, without the potentially interfering effects of another treatment. In other words, the investigator or clinician can be reasonably confident that the effects of token reinforcement, when alternated with response cost, are about what they would be if response cost were not present. Of course, this still is not a “pure” test, because it is possible that alternating token reinforcement with baseline in an ATD yields a somewhat different effect from token reinforcement administered in isolation. Strict adherence to Sidman’s method of independent verification would be necessary to estimate whether any carryover effects were present when a treatment was alternated with a baseline condition.

[Figure 8-2: phases BL, Tkn/BL, Tkn/RC, Tkn/BL, Tkn/RC, and Tkn/Tkn plotted across sessions; legend: BL or Token; BL or Response Cost]
Nevertheless, the investigators do point out that on-task behavior was more variable during token reinforcement when it was alternated with response cost than when it was alternated with baseline. Visual inspection of the data indicates that this was particularly true for 3 of the 5 subjects. While this finding in no way affects the interpretation of the results, it is an interesting observation in itself that could be followed up in a number of ways. It is possible, for example, that the “disruptiveness” noted during response cost temporarily carried over into the next token phase, thereby causing some of the variability. A greater spacing of sessions and subsequent sharpening of stimulus control might have decreased this variability.
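The two checks discussed above can be made concrete. One asks whether token reinforcement keeps the same level when alternated with baseline versus response cost, and the other whether it is more variable in one pairing. The sketch pools token-session observations by the condition they were alternated with and reports a mean and standard deviation for each pool. All numbers below are invented for illustration and are not the study's data, and the function name is hypothetical. Visual inspection, as used by the investigators, remains the primary judgment.

```python
from statistics import mean, pstdev


def summarize_by_comparison(phases):
    """Pool the token-reinforcement observations by the condition they were
    alternated with, and return (mean, population SD) for each pool."""
    pooled = {}
    for alternated_with, values in phases:
        pooled.setdefault(alternated_with, []).extend(values)
    return {cond: (mean(v), pstdev(v)) for cond, v in pooled.items()}


# Hypothetical percentages of on-task behavior during token sessions only,
# invented for illustration; these are not the study's data.
phases = [
    ("baseline", [70, 72, 71, 69]),
    ("response cost", [60, 82, 65, 79]),
    ("baseline", [73, 71, 72, 72]),
    ("response cost", [62, 80, 68, 78]),
]
for cond, (m, sd) in summarize_by_comparison(phases).items():
    print(f"token alternated with {cond}: mean {m:.1f}%, SD {sd:.1f}")
```

In this invented example the two pools have nearly identical means but very different spreads, which mirrors the pattern of "same level, more variability" described in the text.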
Also, the investigators observed a sequence effect, in that token reinforcement was more effective when applied in the morning session than in the afternoon session. Once again, this demonstrates the importance of counterbalancing. Finally, the investigators observed another possible example of multiple-treatment interference not directly connected with the comparison of the two treatments. In the first phase, where token reinforcement and baseline were alternated, on-task behavior averaged 14 percent during the baseline condition. In the second phase in which this same alternation occurred, however, on-task behavior averaged approximately 30 percent during the baseline sessions. Inspection of individual data revealed that this trend occurred in four of the five children. This may represent a positive carryover or a generalization of treatment effects to the baseline condition; thus, the first phase probably presents a truer picture of baseline responding. Studies of this type will be critical in the future for mapping out the exact nature of multiple-treatment interference and improving our ability to draw causal inferences from ATDs.
The study of carryover effects, or treatment interactions, when they occur,
can be interesting in its own right (Barlow & Hayes, 1979; Sidman, 1960). For
example, it is possible that carryover effects might increase the efficacy of
Table 8-1
Time | Treatment
AM | A T-1 | B T-2 | A T-2 | B T-1
PM | B T-2 | A T-1 | B T-1 | A T-2
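The balance built into Table 8-1 can be checked mechanically. The sketch transcribes the eight cells and counts how often each treatment appears at each time of day and with each T-1/T-2 label. It reads T-1 and T-2 as a second counterbalanced factor. That reading, and the left-to-right transcription of the cells, are assumptions made for this illustration.

```python
from collections import Counter

# Table 8-1 transcribed cell by cell, left to right: each cell pairs a
# treatment (A or B) with a T-1/T-2 label.
AM = [("A", "T-1"), ("B", "T-2"), ("A", "T-2"), ("B", "T-1")]
PM = [("B", "T-2"), ("A", "T-1"), ("B", "T-1"), ("A", "T-2")]

cells = [("AM", t, label) for t, label in AM] + [("PM", t, label) for t, label in PM]

# Each treatment should appear equally often at each time of day...
by_time = Counter((time, t) for time, t, _ in cells)
# ...and equally often with each T label.
by_label = Counter((t, label) for _, t, label in cells)
# Within every column, morning and afternoon receive different treatments.
columns_alternate = all(am[0] != pm[0] for am, pm in zip(AM, PM))

print(by_time, by_label, columns_alternate, sep="\n")
```

Every treatment-by-time and treatment-by-label combination occurs exactly twice, so neither time of day nor the T label is confounded with treatment. This is the kind of protection against the morning/afternoon sequence effect discussed above.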
ATDs have been used in at least two ways: to compare the effect of treatment and no treatment (baseline) and to compare two distinct treatments. Some examples of ATDs with specification of the experimental comparison are presented in Table 8-2.
TABLE 8-2. Examples of Alternating Treatment Designs (Continued)
Study | Subjects | Target behavior | Conditions compared
(continued) | | | b. Shock as punishment for incorrect responses
Mann & Baer (1971) | 4 normal 4-year-olds | Language skills | a. Articulation training; b. No training
Doke & Risley (1972) | 14 normal children | Group participation | a. Scheduled activities; b. Optional activities
Johnson & Lobitz (1974) | 12 families | Children’s disruptive behavior | a. Instruction to parents to make their child look “bad”; b. Instruction to parents to make their child look “good”
Ulman & Sulzer-Azaroff (1975) | 6 retarded adults | Academic behavior | a. Group reinforcement contingencies; b. Individual reinforcement contingencies
Bittle & Hake (1977) | 8-year-old autistic boy | Self-stimulatory behavior | Treatment procedures applied in 4 different settings
Kazdin & Geesey (1977) | 2 mentally retarded boys aged 7 and 9 | Disruptive and inattentive behavior | a. Earning tokens for oneself; b. Earning tokens for the entire class
Rojahn, Mulick, McCoy, & Schroeder (1978) | 2 blind, profoundly retarded men | Self-injurious behavior | a. Adaptive clothing; b. Adaptive clothing and time-out
Weinrott, Garrett, & Todd (1978) | 6 boys in kindergarten through 3rd grade | Social aggression | a. Observer present; b. Observer absent
E. B. Fisher (1979) | 13 chronic psychiatric patients | Toothbrushing | a. Reward with 5 tokens; b. Reward with 1 token; c. No token reward
G. Martin, Palotta-Cornick, Johnstone & Celso-Goyos (1980) | 16 retarded clients in institutionalized sheltered workshop | Work performance | a. Multiple component strategy to increase work production; b. “Normal” procedure
Neef, Iwata & Page (1980) | 3 mentally retarded students | Spelling acquisition and retention | a. High-density reinforcement; b. Interspersal training
Ollendick, Matson, Esveldt-Dawson, & Shapiro (1980) | Exp. 1: 2 emotionally disturbed, hospitalized children aged 8 and 10; Exp. 2: 2 emotionally disturbed, hospitalized children | Increase spelling achievement | Exp. 1: a. Positive practice overcorrection plus positive reinforcement; b. Positive practice alone; c. No-remediation control
Van Houten, Nau, Mackenzie-Keating, Sameoto, & Colavecchia (1982) | Exp. 1: 2 elementary school boys aged 9 and 12; Exp. 2: 2 elementary school boys aged 9 | Disruptive behavior | Exp. 1: a. Verbal reprimands with eye contact and grasp; b. Verbal reprimands without eye contact and grasp. Exp. 2: a. Reprimands delivered from 1 m away; b. Reprimands delivered from 7 m away
Hurlbut, Iwata & Green (1982) | 3 severely handicapped, nonvocal adolescents | Language acquisition | a. Bliss symbol system; b. Iconic picture system
Last, Barlow, & O’Brien (1983) | 32-year-old married female | Generalized Anxiety Disorder | a. Coping self-statements; b. Paradoxical intention
Singh, Winton, & Dawson (1982) | 2-year-old developmentally | Screaming behavior | a. 1-minute facial screening
NOTE: In some cases these designs were mislabeled in the original article. In other cases the data were misanalyzed.
before beginning the experiment, the investigators ruled out the use of an A-B-A withdrawal design because even temporary increases in stereotypic behavior during withdrawal phases were unacceptable in this setting.
Furthermore, previous experience of these investigators suggested that there
was a chance the two treatments might be equally effective. Thus a no-
treatment condition might be necessary to determine if these treatments were
effective at all. Of course, this problem also arises in between-group research
because, if two treatments were equally effective (on the average) in two
groups, a control group would be necessary to determine if any clinical
effects occurred over and above no treatment.
In this procedure, three 15-minute sessions were administered by the same
experimenter each day. Individual sessions were separated by at least one
hour. Following baseline conditions for all three time periods, the two treatments and the no-treatment condition were administered in a counterbalanced order across sessions. When one of the treatments produced a zero or
near-zero rate of stereotypic behavior, that treatment was then selected and
implemented across all three time periods during the remainder of the study.
During sessions, each child was escorted to a small table in a classroom and
instructed to work on one of several visual motor tasks. One treatment was
physical restraint, consisting of a verbal warning and manual restraint of the
child’s hand on the tabletop for 30 seconds contingent on each occurrence of
stereotypic behavior. The second treatment, positive-practice overcorrection,
involved the same verbal warning but was followed by manual guidance in
appropriate manipulation of the task materials for 30 seconds. Measures
taken included number of stereotypic behaviors during each session and
performance on the task.
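The passage says only that the three conditions were given "in a counterbalanced order across sessions" and does not state the exact scheme. One standard way to do this is cyclic (Latin-square) rotation, in which each day shifts the order of conditions by one time period. The sketch below shows that technique as an illustrative assumption; it is not a reconstruction of the investigators' actual schedule.

```python
def rotated_schedule(conditions, n_days):
    """Cyclic (Latin-square) counterbalancing: on day d the list of conditions
    is rotated by d positions, so within each block of len(conditions) days
    every condition occupies every daily time period exactly once."""
    k = len(conditions)
    return [[conditions[(d + p) % k] for p in range(k)] for d in range(n_days)]


conditions = ["physical restraint", "positive practice", "no treatment"]
for day, periods in enumerate(rotated_schedule(conditions, n_days=3), start=1):
    print(f"Day {day}: " + " | ".join(periods))
```

A pure rotation keeps the same cyclic succession of conditions every day, so some sequence effects could remain. Randomly permuting the rows, or drawing a fresh random square for each block of days, would weaken that dependency.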
The results for two of the three subjects are presented in Figures 8-3 and 8-4. In Figure 8-3 it is apparent during the ATD phase of this experiment that physical restraint was the superior treatment for John. Therefore, this treatment was chosen for the remainder of the experiment. Task performance increased rather steadily throughout the experiment, but was greatest during physical restraint. On the other hand, Figure 8-4 shows that positive-practice overcorrection was the superior treatment for Tim.
Several features of this experiment are noteworthy. First, the ATD part of this experiment was concluded in 3 or 4 days (three sessions per day), and proper determinations of the effective treatment in each case were made. This is a relatively brief amount of time for an experiment in applied research, and yet it is typical of ATDs, particularly in this context (e.g., McCullough et al., 1974). Second, the addition of a baseline phase prior to introduction of the ATD allowed further identification of the naturally occurring frequencies of the target problem and the absolute amount of reduction in the target problem when treatments were introduced.
Of course, this is not necessary in order to determine which of three condi-
FIGURE 8-3. Stereotypic hair twirling and accurate task performance for John across experimental conditions. The data are plotted across the three alternating time periods according to the schedule on which the treatments were in effect. The three treatments were presented only during the alternating-treatments phase. During the last phase, physical restraint was used during all three time periods. (Figure 1, p. 573, from: Ollendick, T. H., Shapiro, E. S., & Barrett, R. P. (1981). Reducing stereotypic behaviors: An analysis of treatment procedures utilizing an alternating treatments design. Behavior Therapy, 12, 570-577. Copyright 1981 by Association for Advancement of Behavior Therapy. Reproduced by permission.)
FIGURE 8-4. Stereotypic hand posturing and accurate task performance for Tim across experimental conditions. The data are plotted across the three alternating time periods according to the schedule on which the treatments were in effect. The three treatments were presented only during the alternating-treatments phase. During the last phase, positive practice overcorrection was used during all three time periods. (Figure 2, p. 574, from: Ollendick, T. H., Shapiro, E. S., & Barrett, R. P. (1981). Reducing stereotypic behaviors: An analysis of treatment procedures utilizing an alternating treatments design. Behavior Therapy, 12, 570-577. Copyright 1981 by Association for Advancement of Behavior Therapy. Reproduced by permission.)
differences, which in fact they did. Because of this, they were in a position to
examine more carefully client-treatment interactions that would predict
which treatment would be successful in an individual case. Once again,
highlighting intersubject variability in this way can only increase the precision
with which one can generalize the effects of these specific treatments to other
individual clients (see chapter 2).
Finally, the discerning reader will notice that stereotypic behavior during the no-treatment condition of the ATD is somewhat higher with John and Tim than
during baseline, where the same condition was in effect across all three time
periods (but this increased response during no treatment was not true for the
third subject). It is possible that this is an example of negative carryover
effects, because responding during no treatment was worse when it was
alternated with treatment than it was alone; that is, in baseline. In this
experiment the authors purposefully blurred the discriminability of the three
conditions as part of their experimental strategy, which may account, in part,
for the carryover effects. This finding, once again, occurred in baseline and
did not affect the ability of the investigators to determine the most effective
treatment and then to apply it successfully during the last phase.
Of course, the effectiveness of a single treatment compared to no treatment can also be examined via the most common A-B-A-B withdrawal design (see chapter 6, section 6-3). In this particular experiment, however, the authors were interested in comparing the effects of two treatments with each other as well as the effects of each compared to no treatment, and thus the ATD was the only choice. Furthermore, they had determined clinically that it was not possible to allow an increase in stereotypic responding in the absence of treatment, a condition that would obtain during the withdrawal phase of any A-B-A design. Nevertheless, when one wishes to compare treatment with no treatment, one has a choice between a more standard withdrawal design and an ATD. The advantages of the ATD have already been mentioned. In addition to not requiring a withdrawal of treatment for a period of time, the comparison within the ATD can usually be made more quickly, and it can proceed without a formal baseline if this is necessary. On the other hand, there is no single phase in the ATD where treatment is applied in isolation as it would be in a clinical situation. Therefore, estimating the generalizability of any given treatment is less certain if one has any reason to worry about multiple-treatment interference effects. Investigators will have to weigh these advantages and disadvantages in choosing a particular design to compare treatment and no treatment.
Ollendick and his colleagues have also produced two other excellent examples of ATDs comparing three conditions. In each case two treatments were
compared to no treatment (Barrett, Matson, Shapiro, & Ollendick, 1981;
Ollendick, Matson, Esveldt-Dawson, & Shapiro, 1980). In the Barrett et al.
study, punishment and DRO procedures were compared to no treatment in
FIGURE 8-6. The effects of each treatment (COG = cognitive treatment; SS = social skill treatment) in a multiple baseline design across the 3 subjects experiencing difficulties in social skills on the weekly dependent measures administered. (Total score on the Lubin Depression Adjective Checklist; Average score on the Personal Beliefs Inventory; Mean cross-product score on the Interpersonal Events Schedule.) (Figure 2 from: McKnight, D. L., Nelson, R. O., Hayes, S. C., & Jarrett, R. B. (in press). Importance of treating individually assessed response classes in the amelioration of depression. Behavior Therapy. Copyright 1984 by Association for Advancement of Behavior Therapy. Reproduced by permission.)
FIGURE 8-7. The effects of each treatment (COG = cognitive treatment; SS = social skill treatment) in a multiple baseline design across the 3 subjects experiencing difficulties in irrational cognitions on the weekly dependent measures administered. (Total score on the Lubin Depression Adjective Checklist; Average score on the Personal Beliefs Inventory; Mean cross-product score on the Interpersonal Events Schedule.) (Figure 4, from: McKnight, D. L., Nelson, R. O., Hayes, S. C., & Jarrett, R. B. (in press). Importance of treating individually assessed response classes in the amelioration of depression. Behavior Therapy.)
[Figure (caption not recoverable): panels labeled Token RFT and Token RFT 2; x-axis: Days.]
The various strengths and weaknesses of the ATD have been reviewed
before (Barlow & Hayes, 1979; Barlow et al., 1983; Ulman & Sulzer-Azaroff,
1975) and mentioned throughout this chapter. The major advantages and
disadvantages will be listed briefly once again. First, the ATD does not require
withdrawal of treatment. If two or more therapies are being compared,
questions on relative effectiveness can be answered without a withdrawal
phase at all. If one is comparing treatment with no treatment, then one still
would not require a lengthy phase where no treatment was administered.
Rather, no-treatment sessions are alternated with treatment sessions, usually
within a relatively brief period of time.
Second, an ATD will produce useful data more quickly than a withdrawal
design, all things being equal. This is because the relatively lengthy baseline,
treatment, and withdrawal phases necessary to establish trends in A-B-A
withdrawal designs are not required in an ATD. The examples
provided in this chapter illustrate this point. In fact, the relative rapidity of an
ATD will often make it more suitable in situations where measures can be
taken only infrequently. For example, if it is only practical to take measures
If enough data points have been collected for each treatment, and if one is
so inclined, a variety of statistical procedures are appropriate for analyzing
alternating treatment designs (see chapter 9). However, visual analysis should
suffice for most ATDs. Throughout this book, the visual analysis of single
FIGURE 8-9. Total mean frequency of grandiose bragging responses throughout study and for
each reinforcement contingency during experimental period. (Figure 3, p. 241, from: Browning,
R. M. (1967). A same-subject design for simultaneous comparison of three reinforcement
contingencies. Behaviour Research and Therapy, 5, 237-243. Copyright 1967 by Pergamon Press.
Reproduced by permission.)
by Alan E. Kazdin**
9.1. INTRODUCTION
Data evaluation consists of methods that are used to draw conclusions about
behavior change. In applied research where single-case designs are used,
experimental and therapeutic criteria are invoked to evaluate data (Risley,
1970). The experimental criterion refers to the way in which data are evaluated
to determine if an intervention has had a reliable or veridical effect on
behavior. The experimental criterion is based on a comparison of behavior under
different conditions, usually during intervention and nonintervention
(baseline) phases. To the extent that performance reliably varies under these separate
conditions, the experimental criterion has been met.
The therapeutic criterion refers to whether the effects of the intervention are
important. This criterion entails a comparison between behavior change that
has been accomplished and the level of change required for the client’s
adequate functioning in society. Even if behavior change is reliable and clearly
related to the experimental intervention, the change may not be of clinical or
applied significance. To achieve the therapeutic criterion, the intervention
needs to make an important change in the client’s everyday functioning.
Completion of this chapter was facilitated by a Research Scientist Development
Award (MH00353) from the National Institute of Mental Health.
*Please address all correspondence to: Alan E. Kazdin, Department of Psychiatry,
University of Pittsburgh School of Medicine, Western Psychiatric Institute and Clinic,
3811 O’Hara Street, Pittsburgh, PA 15213.
Serial dependency
In applications of analyses of variance in group research, researchers are
familiar with the fact that the tests are “robust” and can handle the violation of
various assumptions (e.g., Atiqullah, 1967; G. V. Glass, Peckham, & Sanders,
1972; Scheffe, 1959). There is one assumption which, if violated, seriously
affects analysis of variance and makes t or F tests inappropriate. The
assumption is the independence of error components. The assumption refers to the
correlation between the error (e) components of pairs of observations (within
and across conditions) for subjects i and j. The expected value of the
correlation for pairs of observations is assumed to be zero (i.e., $r_{e_i e_j} = 0$). Typically, in
between-group designs, independence of error components is assured by
randomly assigning subjects to conditions. In the case of continuous or
repeated measures over time, the assumption of independence of observations
often is not met. Successive observations in a time series tend to be correlated,
in which case the data are said to be serially dependent. The correlation among
successive data points means that knowing the level of performance of a
subject at a given time allows one to predict subsequent points in the series.
The extent to which there is dependency among successive observations can
FIGURE 9-1. Correlograms for data with (upper portion)
and without serial dependency (lower portion). [Autocorrelation r, from -1.0 to 1.0, plotted by lag.]
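The lag-by-lag autocorrelations that make up a correlogram such as Figure 9-1 can be computed directly. The sketch below is a minimal illustration, assuming the common estimator that divides the lag-k sum of cross-products of deviations by the total sum of squared deviations about the series mean; other estimators exist. The ±2/√N band is only a rough rule of thumb for spotting coefficients that depart from zero, and the function names are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Lag-k autocorrelation: sum of (x_t - mean)(x_{t+k} - mean) over the
    N - k available pairs, divided by the sum of squared deviations."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series has no variability")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def correlogram(series, max_lag):
    """Autocorrelations for lags 1..max_lag, plus a rough +/- 2/sqrt(N) band."""
    band = 2 / math.sqrt(len(series))
    return [(k, autocorrelation(series, k)) for k in range(1, max_lag + 1)], band


# Hypothetical, steadily rising data: adjacent points are similar, so the
# lag-1 coefficient is large and positive.
trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
coeffs, band = correlogram(trend, 3)
for lag, r in coeffs:
    flag = "*" if abs(r) > band else ""
    print(f"lag {lag}: r = {r:.2f} {flag}")
```

Coefficients that shrink toward zero as the lag grows correspond to the pattern in the upper portion of the figure; coefficients that hover near zero at every lag correspond to the lower portion.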
General comments
Serial dependency is not a necessary characteristic of single-case data or
observations over time. However, significant autocorrelation is a likely
characteristic of continuous data and is a central consideration in deciding if
particular statistical tests should be applied to single-case data. Several statistical tests
for single-case data, including variations of t and F, are presented below. The
tests vary as to whether they acknowledge, take into account, or are influenced
by serial dependency in the data.
Sources of controversy
The use of statistical analyses has been a major source of controversy
because the approach embraced by such analyses appears to conflict with the
purposes of single-case research and the criteria for identifying effective
interventions. To begin with, identifying reliable intervention effects does not
necessarily require statistical evaluation, as implicitly assumed in
between-group research. In single-case research, demonstration of a reliable effect (i.e.,
meeting the experimental criterion) is determined by replication of intervention
and baseline levels of performance over the course of an experiment, as is
commonly illustrated in A-B-A-B designs. Other single-case experimental
designs replicate intervention effects in different ways and permit comparisons to
be made between what performance would be with and without treatment. In
practice, whether the results clearly meet the experimental criterion depends
upon the pattern of the data in light of the requirements of the specific design.
Several characteristics such as changes in means or slope across phases, abrupt
shifts or repeated changes in performance as an intervention is presented and
withdrawn, and similar characteristics can be used to evaluate intervention
effects without inferential statistics (Kazdin, 1982b).
Statistical criteria are objected to in part because of the goal of applied
single-case research. The goal is to identify and evaluate potent interventions
(Baer, 1977a; Michael, 1974). Visual inspection, the method commonly used to
Potential contributions
Statistical analyses in single-case research may provide a valuable
supplement rather than an alternative to visual inspection. In many applications,
inferences about the effects of the intervention can be readily drawn through
visual inspection. Statistical analyses in such situations may not add an
increment of useful information unless a specific question arises about a particular
facet of the data at a given point in time. In other situations, the pattern of data
required for visual inspection may not be met, and statistical tests may provide
important advantages.
Evaluation of intervention effects can be difficult when performance during
baseline is systematically improving. An intervention may still be required to
accelerate the rate of change. For example, self-destructive behavior of an
General comments
The controversy over statistical analyses is not whether all data in single-case
research should be evaluated statistically. Single-case research designs, the
tradition from which they derive, and the dual concerns in applied work for
experimental and therapeutic criteria for evaluating change all place limits on
the role of statistical analysis. Within the approach of single-case research, the
question is whether statistical tests can be of use in situations where visual
inspection might be difficult to apply. There are different reasons for posing an
affirmative answer. Although visual inspection can be readily applied to many
investigations, the method has its own weaknesses. In a variety of
circumstances, researchers often have difficulty in judging (via visual inspection)
whether reliable effects have been produced and disagree in their
interpretations of the data (DeProspero & Cohen, 1979; Gottman & Glass, 1978; R. R.
Jones et al., 1978). Also, systematic biases may operate when invoking visual
inspection criteria, such as ignoring the impact of autocorrelation and being
influenced by the metric by which data are graphed (R. R. Jones et al., 1978;
Knapp, 1983; Wampold & Furlong, 1981a). An attractive feature of statistical
analyses is that once the statistic is decided, the results are (or should be)
consistent among different investigators. Judgment plays less of a role in
applying a statistical analysis to the data. Thus statistical analyses can be a
useful tool in cases where the idealized data patterns required for visual
inspection are not obtained.
There are a large number of statistical tests that can be applied to data
obtained from a single subject over time. The range of available tests has not
been conveniently codified or illustrated. Indeed, the task is rather large
because a given test might be applied in a variety of different ways depending
on the specific variant of single-subject designs and the statement the
investigator wishes to make about the intervention. Several tests discussed below
illustrate major variants currently available but do not exhaust the range of
appropriate tests.
BASELINE (A)        INTERVENTION (B)
  1   12              13   88
  2   10              14   28
  3   12              15   40
  4   22              16   63
  5   19              17   86
  6   10              18   90
  7   14              19   82
  8   29              20   95
  9   26              21   39
 10    5              22   51
 11   11              23   56
 12   34              24   86
                      25   31
                      26   77
                      27   76
tions is the analysis proposed by Gentile, Roden, and Klein (1972). When
autocorrelation exists, these investigators suggested that nonadjacent phases
that employed the same treatment can be combined and will reduce the effect
of serial dependency. For example, in an A-B-A-B design, the two A phases are
not adjacent and could be combined and compared with the two B phases. The
rationale for combining phases is based on the fact that autocorrelations tend
to decrease as the lag between observations increases. Assuming serial
dependency in the data, Observation 1 in Phase A1 would be more highly correlated
with Observation 1 in Phase B1 (i.e., the immediately adjacent phase) than with
Observation 1 in Phase A2 (i.e., a nonadjacent phase). Since the error
components of all observations in A1 are more like the components for the
observations in B1 than in A2, it is assumed that combining phases separated in time
will reduce the dependency. Combining phases that are not adjacent should
make A and B treatments more dissimilar, due to dependency in the data. The
resulting t (or F) should be reduced because the dependency of adjacent
observations will minimize treatment differences. Additional variations of t
and F have been proposed, some of which attempt to address the issue of serial
dependency by developing special error terms to make statistical comparisons
of treatment effects (see Gentile et al., 1972; Shine & Bower, 1971).
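The phase-combining step can be sketched as follows. This is a minimal illustration, assuming an A-B-A-B design whose phases are supplied as separate lists and assuming an ordinary pooled-variance two-sample t once all A phases and all B phases have been combined. The special error terms proposed by Gentile et al. (1972) and Shine and Bower (1971) are not implemented, and the function name is hypothetical. Pooling does not remove serial dependency; it rests only on the assumption, stated above, that dependency is weaker between nonadjacent phases.

```python
import math
from statistics import mean, variance


def pooled_phase_t(phases):
    """phases: list of (label, observations) in temporal order, e.g.
    [("A", a1), ("B", b1), ("A", a2), ("B", b2)].
    Combines all phases that share a label, then returns the conventional
    pooled-variance t for mean(B) - mean(A) and its degrees of freedom."""
    a = [x for label, obs in phases if label == "A" for x in obs]
    b = [x for label, obs in phases if label == "B" for x in obs]
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two observations per condition")
    # Pooled within-condition variance: sample variances weighted by their df.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical A-B-A-B data (not from any study).
phases = [("A", [8, 9, 7, 8]), ("B", [3, 4, 2, 3]),
          ("A", [7, 8, 8, 9]), ("B", [2, 3, 3, 2])]
print(pooled_phase_t(phases))
```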
Data analysis
The actual analysis itself cannot be outlined in a fashion that permits simple
computation. Time series analysis depends upon more than entering raw data
into a single formula. Several models of time series analysis exist that make
different assumptions about the data and require different equations to
achieve the final statistics. The analysis begins by evaluating serial dependency
in the data. Different patterns of dependency may emerge that depend upon
the pattern of autocorrelations, which are computed with different lags or
intervals, as noted earlier. Once the pattern of serial dependency is identified, a
model is applied to the data. The analysis consists of several steps, including
adoption of a model that best fits the data, evaluation of the model, estimation
of parameters for the statistic, and generation of t for level and slope changes
(G. V. Glass et al., 1974; Gorsuch, 1983; Gottman, 1981; Horne, Yang, &
Ware, 1982; Stoline, Huitema, & Mitchell, 1980). Computer programs are
available to handle these steps (see Gottman, 1981; Hartmann et al., 1980).
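What a "change in level" and a "change in slope" mean can be made concrete with a simpler, swapped-in technique: an ordinary least-squares segmented (interrupted) regression for a single A-B comparison. This sketch is for intuition only. It assumes independent errors, which is exactly the assumption that serially dependent data violate, so it does not replace the model-fitting procedures cited above and should not be used to generate t tests for such data. The function names are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting (a is a list of rows)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: each phase needs at least two points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def segmented_fit(y, n_baseline):
    """OLS fit of y_t = b0 + b1*t + b2*D_t + b3*(t - n_baseline)*D_t, where
    D_t = 1 during the intervention phase (t >= n_baseline) and 0 otherwise.
    b2 estimates the change in level at the phase change, b3 the change in slope."""
    rows = []
    for t in range(len(y)):
        d = 1.0 if t >= n_baseline else 0.0
        rows.append([1.0, float(t), d, (t - n_baseline) * d])
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = _solve(xtx, xty)
    return {"baseline_level": b0, "baseline_slope": b1,
            "level_change": b2, "slope_change": b3}


# Hypothetical A-B series: 8 baseline points, then 8 intervention points.
series = [12, 14, 11, 13, 15, 12, 14, 13, 6, 5, 7, 5, 4, 6, 5, 4]
print(segmented_fit(series, n_baseline=8))
```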
It is useful to examine the results of a time series analysis for illustrative
purposes and to evaluate the results in light of the characteristics of the data
that might be inferred from visual inspection. As an illustration, one program
focused on the frequency of inappropriate talking in a second-grade classroom
(C. Hall et al., 1971, Exp. 6). Although there were many children in class, the
class as a whole was treated as a single subject. The intervention consisted of
praise and other reinforcers provided to children for their appropriate
classroom behavior. The effects of the intervention, evaluated in an A-B-A-B
design, are plotted in Figure 9-3. The results suggest that inappropriate talking
out was generally high during the two different baseline phases and was much
lower during the different reinforcement phases (praise, tokens plus a
surprise). The first two phases (AB) have been analyzed using time series analysis
(R. R. Jones, Vaught, & Reid, 1975). Analyses conducted with a computer program
revealed that the data were serially dependent, that is, the adjacent
points were significantly correlated. Indeed, autocorrelation for lag 1 was .96
(p < .01). Thus conventional t and F test analyses would be inappropriate.
Time series analyses revealed a significant change in level across the first two
phases (AB) (t(39) = 3.90, p < .01) but no significant change in slope. A change
in level with no change in slope also implies a change in mean performance,
which is obvious from visual inspection of the graphical display of the data. The data
analysis only addresses the changes in the first two phases of the design. In
principle, comparisons could be made across the other phases as well, although
restrictions on the number of data points in this particular study present a
limiting condition, discussed later.
The analysis is not restricted to variations of an A-B-A-B design. In any
design where there is a change across phases, time series analysis provides a
potentially useful tool. For example, in multiple baseline designs, time series
analysis can evaluate change from baseline to intervention phases for each of
the responses, persons, or situations, depending upon the precise design.
[Figure 9-3 (caption not recoverable). Phase labels: Baseline; Praise plus a favorite activity; Straws plus surprise; B2 Praise.]
Jenkins, 1970). The nature of the underlying data is revealed through
autocorrelations of different lags. In conventional analyses, large sample sizes are
important to achieve statistical power. In time series analysis, the large sample
(of data points) is necessary to identify the processes within the series itself and
to select a model that fits the data.
Precisely what constitutes a large or sufficient number of observations
depends on several factors such as the nature of the data, the types of changes
across phases, variability within a phase, and other parameters that
characterize a given series. However, the number of data points usually advocated is
much greater than the number typically available in applied or clinical
investigations. For example, various authors have suggested that at least 50 (G. V.
Glass et al., 1974), and preferably 100 (Box & Jenkins, 1970), observations are
required for estimating autocorrelations. Fewer observations have been used
(e.g., data with 10 to 20 observations) in applied research and have detected
statistically significant changes (R. R. Jones et al., 1977). Yet applied
investigations often employ relatively short phases lasting only a few days to
demonstrate intervention effects. In such cases, time series analyses will not be
applicable.
itself requires multiple data points to detect a statistically significant effect, and
a small number of data points may not permit precise evaluation of the
processes involved in the data.
General Comments. Time series analysis has been used increasingly within the
last several years. The increased availability of publications on the topic (e.g.,
Gottman, 1981; McCleary & Hay, 1980) and several computer programs
(Hartmann et al., 1980; Horne et al., 1982) may be fostering increased use of
time series analyses. Nevertheless, use of the analysis has been relatively limited
for several reasons. The tests are complex and involve multiple steps that are
not easily described in terms familiar to most researchers. For example, serial
dependency and autocorrelation, two of the less esoteric notions underlying
time series analysis, are not part of the usual training of researchers who
conduct group studies in the social sciences. More in-depth examination of
time series analysis and its underlying rationale introduces many concepts that
depart from conventional statistical techniques and training (see Gottman,
1981). In addition, requirements for conducting time series analysis may not
foster widespread adoption within applied behavioral research. The relatively
brief phases typically used in single-case experimental designs make the test
difficult to apply and perhaps, simply, inappropriate. Recent controversy over
whether single-case data as a rule are serially dependent raises questions for
some about the need for time series analysis. Nevertheless, time series analyses
have been appropriately applied in several demonstrations and provide a
valuable addition to statistical analyses of single-case data.
Several different tests useful for single-case experiments are based on the
notion of assigning treatments randomly to different occasions (e.g., days or
sessions) (Edgington, 1980b, 1984; Levin, Marascuilo, & Hubert, 1978;
Wampold & Furlong, 1981b). At least two treatments, or conditions, are required,
one of which may be baseline (A) and the other an intervention (B);
therefore these tests are useful for evaluating ATDs (see chapter 8). Prior to the
experiment, the total number of occasions on which the treatments will be
implemented must be specified, along with the number of occasions on which each
specific condition will be applied. Once these decisions are made, A and B (or
A, B, C, . . . n) conditions are assigned randomly to each session or day of the
experiment, with the restriction that the number of occasions for each meets
the prespecified totals. Each day, one of the conditions is administered
according to the randomized schedule planned in advance.
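A schedule of this kind can be generated before the study begins by shuffling a list that contains each condition its prespecified number of times. This is a minimal sketch, assuming Python's standard `random` module is an adequate source of randomness for planning purposes; the optional seed exists only so the planned schedule can be recorded and reproduced, and the function name is hypothetical.

```python
import random
from math import comb


def randomized_schedule(counts, seed=None):
    """counts maps each condition to its prespecified number of occasions,
    e.g. {"A": 4, "B": 4}. Returns one random ordering, fixed in advance."""
    rng = random.Random(seed)
    schedule = [cond for cond, k in counts.items() for _ in range(k)]
    rng.shuffle(schedule)  # every arrangement with these counts is equally likely
    return schedule


# Hypothetical plan: 8 days, 4 baseline (A) days and 4 intervention (B) days.
plan = randomized_schedule({"A": 4, "B": 4}, seed=1)
print(plan)
print("possible schedules:", comb(8, 4))
```

Because every arrangement with the fixed counts is equally likely, the number of possible schedules (here C(8, 4) = 70) is also the number of equally likely data allocations that the randomization test later enumerates.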
The null hypothesis of the randomization test is that the client’s response on
the dependent measure(s) is not influenced by the condition in effect on that
occasion (e.g., baseline or intervention). If the condition makes no difference,
Data analysis
Consider as an illustration an investigation designed to evaluate the effect of
teacher praise on the attentive behavior of a disruptive student. To use the
randomization test, the investigator must decide in advance the number of days
of the study and the number of days that each of two (or more) conditions will
be administered. Assume for present purposes that the investigator wishes to
compare the effects of ordinary classroom practices (baseline or A Condition)
with a reinforcement program based on praise (intervention or B Condition).
To facilitate computations, suppose that the duration of the study is decided in
advance to be 8 days and that each condition will be in effect for 4 days. (The
statistical test does not require an equal number of days for each condition.)
On each of the 8 days, either Condition A or Condition B is in effect, until each is
administered for 4 different days. Each day, observations of teacher and child
performance are made, and they provide the data to evaluate the effects of the
different conditions.
The prediction is that praise (Condition B) will lead to higher levels of
attentive behavior than ordinary classroom practices (Condition A). Stated as
a one-tailed (directional) hypothesis, Condition B is expected to lead to higher
scores than Condition A. Under the null hypothesis, any difference between
means for the two conditions is due solely to chance differences in performance
on the occasions to which A and B conditions were randomly assigned. To
determine whether the differences are sufficient to reject the null hypothesis,
the mean level of performance is computed separately for each condition, and
the difference between these means is derived.
Hypothetical data for the example appear in Table 9-2 (upper portion). The
mean difference between A and B Conditions is 43.75, also shown in the table
(lower portion). Whether this difference is statistically significant is determined
by estimating the probability of obtaining scores this discrepant in the
predicted direction when conditions have been assigned randomly to occasions.
TABLE 9-2.
DAYS
  A    B    A    A    B    A    B    B
 20   50   15   10   60   25   65   70

COMPARING TREATMENT MEANS
      A        B
     20       50
     15       60
     10       65
     25       70
ΣA = 70       ΣB = 245
x̄A = 17.50    x̄B = 61.25
x̄B - x̄A = 43.75
TABLE 9-3. Critical Region for the Obtained Data from the Hypothetical Example
Note. All other combinations of the obtained data (allocated to A and B treatments) are not in the
critical region using .05 as a level of significance for a one-tailed test.
to conditions that yielded the greatest difference between A and B, then the
combination of data points that could show the next greatest difference, and so
on. A total of four combinations was selected because this is the number of
combinations that reflects the critical region for the .05 level of confidence.
Thus the critical region consists of the n data combinations in the
predicted direction that are the least likely to have occurred by chance (where n
is the number of combinations that constitutes the critical region). The
question for the randomization test is whether the difference between means
obtained in the original data is equal to or greater than one of the mean
differences included in the critical region. The obtained mean difference
(43.75) equals the most extreme value in the critical region and hence is a
statistically significant effect. The actual probability of the difference being
this large, given random assignment of conditions to occasions, is 1/70, or p =
.014, where 70 is the number of ways of allocating the 8 days to the two
conditions, 4 days each (8!/(4!4!) = 70). When the data represent the least
probable combination of data (given a one-tailed null hypothesis), the probability
equals 1 divided by the total number of possible data combinations.
In the above example, a one-tailed test was performed. For a two-tailed test,
the critical region is at both ends (tails) of the distribution. The number of data
combinations that constitute the critical region is unchanged for a given level of
confidence. However, the number of combinations is divided among the two
tails. Because of the division of the critical region into two tails, the probability
level of an obtained mean difference is doubled. Thus, if the above example
utilized a two-tailed test, the probability level of the obtained difference would
be 2/70, or p = .029.
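The enumeration behind this test can be reproduced by listing every way of allocating the eight observed scores to four A days and four B days. The sketch below assumes exactly two conditions and the one-tailed prediction that B exceeds A; it counts the allocations whose B minus A mean difference is at least as large as the obtained one and divides by the number of allocations. With the hypothetical Table 9-2 data it should reproduce the 1/70 probability given above, and doubling it gives the two-tailed value described in the text. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def randomization_test(scores, labels, predicted_higher="B"):
    """One-tailed randomization test for two conditions.
    scores[i] is the observation on occasion i and labels[i] the condition
    in effect on that occasion. Returns (observed mean difference, exact p)."""
    n = len(scores)
    k = labels.count(predicted_higher)

    def mean_diff(high_idx):
        high = set(high_idx)
        high_scores = [scores[i] for i in range(n) if i in high]
        low_scores = [scores[i] for i in range(n) if i not in high]
        return sum(high_scores) / len(high_scores) - sum(low_scores) / len(low_scores)

    observed = mean_diff([i for i, lab in enumerate(labels) if lab == predicted_higher])
    # Under the null hypothesis every allocation of k occasions to the
    # predicted-higher condition is equally likely.
    diffs = [mean_diff(idx) for idx in combinations(range(n), k)]
    at_least_as_large = sum(1 for d in diffs if d >= observed - 1e-9)
    return observed, Fraction(at_least_as_large, len(diffs))


labels = ["A", "B", "A", "A", "B", "A", "B", "B"]
scores = [20, 50, 15, 10, 60, 25, 65, 70]
diff, p_one = randomization_test(scores, labels)
print(diff, p_one, float(p_one), float(2 * p_one))
```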
dency may exist in the data. Yet the test is based on the null hypothesis that
there would be identical responses across occasions if the conditions were
presented in a different order. Every order of presenting treatments should lead
to an identical pattern of data (assuming the null hypothesis). Serial
dependency does not affect the estimation of the sampling distribution of the statistic
from which the inference of significance is drawn.
A test of ranks, referred to as Rn, has been proposed for evaluating data
obtained in multiple baseline designs (Revusky, 1976; Wolery & Billingsley,
1982). The test requires that data be collected across several different
baselines (e.g., different individuals, behaviors, or situations). Whether the
intervention produces a statistically reliable effect is determined by evaluating the
performance of each of the baselines at the point when the intervention is
introduced. For example, in a multiple baseline design across individuals, the
statistical comparison is completed by ranking scores of each subject at the
point when the intervention is introduced for any one of the subjects. Each
individual is considered a subexperiment. When Condition B is introduced
for a subject, the performance of all subjects (including those for whom
treatment is withheld) is ranked. The sum of the ranks across all
subexperiments each time the treatment is introduced constitutes the statistic Rn.
An essential feature of the test is that the intervention is applied to different
baselines in a random order. Thus the rationale underlying Rn follows that of
randomization tests as outlined earlier. Because the baseline (e.g., person or
behavior) that receives the intervention is determined randomly, the
combination of ranks at the point of intervention for all subjects will be randomly
distributed if the intervention has no effect. On the other hand, if the
behavior of the client who receives the intervention changes at the point of
intervention, compared with persons who have yet to receive the intervention,
this should be reflected in the ranks. If each subject in turn shows a change
when the intervention is introduced, this would be reflected in the sum of the
ranks (or Rn) across all subjects, and it suggests that the ranks are not the
likely result of random factors. Rn requires several different baselines or
subexperiments to evaluate whether change at the point of treatment is
reliable. At the .05 level of confidence the minimum requirement for detecting
a statistically significant effect is four baselines (i.e., persons, behaviors, or
situations).
Data analysis
Application of the Rn test can be illustrated in a hypothetical example in which
an intervention is applied to increase the amount of time that five aggressive
children engage in appropriate and cooperative play during recess at school.
To fulfill the requirements of the multiple baseline design, data are gathered
for the target behaviors. For present purposes, assume that the data consist of
the percentage of intervals (e.g., 30 sec) observed during recess in which the
child engages in appropriate play. Treatment is introduced to different
children at different points in time. The child who receives treatment first,
second, and so on is always determined randomly.
Table 9-4 provides hypothetical data on the percentage of intervals of
appropriate play across 10 days. As is evident in the table, baseline is in effect
for everyone for 5 days. On the sixth day, one child is randomly selected to
receive the intervention (B), whereas all other children continue under
baseline (A) conditions. On successive days, a different child is exposed to the
intervention. The ranking procedure is applied to each subexperiment at the
point when the intervention is introduced. On each occasion that the
intervention is introduced (which includes Days 6-10 in the example), the children are
ranked. The lowest rank is given to the child who has the highest score (if a
high score is in the desired direction).6 In the example, on Days 6-10, the
child with the highest amount of appropriate play at each point of
intervention receives the rank of 1, the next highest the rank of 2, and so on. When
the intervention is introduced to the first child, all children are ranked. When
the intervention is introduced on subsequent occasions, all children except
those who previously received the intervention are ranked. Even though all
eligible subjects are ranked when the intervention is introduced, not all ranks are
used. Rn consists of the sum of the ranks for those subjects who receive the
intervention at the point that the intervention is introduced. If treatment is
ineffective, the ranks of these persons should be randomly distributed, i.e.,
include numbers ranging anywhere from 1 to the number of baselines ranked on that occasion. If treatment is
effective, the point of intervention should result in low ranks for each subject
at that point (if low numbers are assigned to the most extreme score in the
predicted direction of change).
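The ranking procedure can be sketched as follows, using the hypothetical Table 9-4 values at each point of intervention. This is a minimal illustration, assuming that a higher score is better (rank 1 goes to the highest score among the children being ranked) and, as a simplifying tie rule not addressed in the text, that only strictly higher scores push the treated child down. On Day 10 only Subject 5 remains under baseline, so its rank is necessarily 1; its Day 10 score does not appear in the table and is left as None. The function name is hypothetical.

```python
def rank_of_treated(scores_today, treated, eligible):
    """Rank (1 = highest score) of the child who receives the intervention,
    among the children ranked that day. Only strictly higher scores push the
    treated child down (a simplifying tie rule)."""
    return 1 + sum(
        1 for child in eligible
        if child != treated and scores_today[child] > scores_today[treated]
    )


# (day, treated child, scores of every child ranked that day), from Table 9-4.
occasions = [
    (6, 3, {1: 30, 2: 70, 3: 80, 4: 40, 5: 30}),
    (7, 1, {1: 70, 2: 50, 4: 75, 5: 30}),
    (8, 4, {2: 65, 4: 90, 5: 40}),
    (9, 2, {2: 80, 5: 35}),
    (10, 5, {5: None}),  # only Subject 5 is left; its score is never compared
]

ranks = [rank_of_treated(s, child, s.keys()) for _, child, s in occasions]
r_n = sum(ranks)
print(ranks, "Rn =", r_n)
```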
TABLE 9-4.
                              DAYS
SUBJECT     1    2    3    4    5    6    7    8    9   10
   1       45   30   35   50   40  30a  70b
   2       60   75   80   60   50  70a  50a  65a  80b
   3       20   20   25   10   30  80b
   4       55   60   40   45   50  40a  75a  90b
   5       30   25   20   30   20  30a  30a  40a  35a
Ranks                                1    2    1    1    1    ΣR = 6
Note. Days 1 through 5 served as baseline (a) days for all subjects and are unmarked.
a = control or baseline; b = experimental or intervention point for a child.
As is evident in Table 9-4, hypothetical data show that the child who
receives the intervention at a given point in time, with the exception of
Subject 1, receives the lowest rank (i.e., 1 or 1st place) for performance on
that occasion. Summing the ranks for all children exposed to the intervention
yields Rn = 6. The significance of the ranks for designs employing different
numbers of subjects (or baselines) can be determined by examining Table 9-5.
The table provides a one-tailed test for Rn. (A two-tailed test, of course, can
be computed by doubling the probability level for the tabled columns.) To
return to the above example, Rn = 6 for 5 subjects (one-tailed test) is equal to
the tabled value required for the .05 level (see arrow). Thus the data in the
hypothetical example permit rejection of the null hypothesis of no treatment
effect.
TABLE 9-5. Maximum values of Rn significant at the indicated one-tailed probability levels when the
experimental scores tend to be smaller than the control scores.

NO. OF                SIGNIFICANCE LEVEL
SUBJECTS     0.05    0.025    0.02    0.01    0.005
   4            4
   5            6 ←      5       5       5
   6            8        7       7       7        6
   7           11       10      10       9        8
   8           14       13      13      12       11
   9           18       17      16      15       14
  10           22       21      20      19       18
  11           27       25      24      23       22
  12           32       30      29      27       26
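The entries in Table 9-5 can be checked by computing the exact null distribution of Rn. The sketch below assumes no ties and follows the random-order rationale described above: at the k-th introduction of the intervention, n - k + 1 baselines are ranked, and under the null hypothesis the treated baseline is equally likely to hold any of those ranks, independently of earlier occasions. Rn is then a sum of independent discrete uniform ranks, whose distribution can be built up by repeated convolution. For 5 subjects, for example, P(Rn ≤ 6) = 5/120, about .042, which is why 6 is the tabled .05 value. The function names are hypothetical.

```python
from math import factorial


def rn_null_counts(n):
    """Exact null distribution of Rn for n baselines, assuming no ties.

    At the k-th introduction of the intervention, m = n - k + 1 baselines are
    ranked and the treated one is equally likely to hold any rank 1..m.
    Returns {value of Rn: number of equally likely rank sequences};
    the counts sum to n!.
    """
    counts = {0: 1}
    for m in range(n, 0, -1):  # m = number of baselines ranked on this occasion
        new_counts = {}
        for total, ways in counts.items():
            for rank in range(1, m + 1):
                new_counts[total + rank] = new_counts.get(total + rank, 0) + ways
        counts = new_counts
    return counts


def critical_value(n, alpha):
    """Largest Rn whose one-tailed probability P(Rn <= value) is at most alpha,
    or None when even the smallest possible Rn is not that improbable."""
    counts = rn_null_counts(n)
    total = factorial(n)
    cumulative, best = 0, None
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative <= alpha * total:
            best = value
        else:
            break
    return best


levels = [0.05, 0.025, 0.02, 0.01, 0.005]
for n in range(4, 13):
    print(n, [critical_value(n, a) for a in levels])
```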
rankings could be made on the basis of the mean performance across the
entire week while the intervention was in effect. Mean performance of the
target child would be compared with the mean of the other persons, and
ranks would be assigned on the basis of each person’s mean for that time
period. Using means across days is likely to provide a more stable estimate of
actual performance, to allow the intervention to operate on behavior, and
consequently to reflect intervention effects more readily than evaluation
based on the first day that the intervention is applied. Also, by using averages,
the statistic takes into account the usual manner in which multiple baseline
designs are conducted where the intervention is continued for several days for
one person (baseline) before being introduced to the next person.7
If ranks are to be based on several days rather than a single day, additional
considerations become important. First, the duration employed to evaluate
treatment changes within subjects should be specified in advance. If intervention
effects are expected to take a certain period of time, the precise number
of days (or a conservative estimate) should be specified. The mean for that
period is then used when the ranks are assigned. Second, the duration for
introducing the treatment and for computing mean performance should be
constant across all subjects. These two features ensure that randomness will
not be influenced by post hoc treatment of the data and capitalization on
chance fluctuations in performance.
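A minimal sketch of ranking on means over a window, under the two conditions just stated: the window (start day and number of days) is fixed before the data are seen, and the same window is used for every subject compared. The function `window_rank` and the scores below are hypothetical, for illustration only; the tie rule (ties count against the treated subject) is an assumption.

```python
from statistics import mean


def window_rank(daily_scores, treated, start_day, window_days, higher_is_better=True):
    """Rank the treated subject by mean performance over a prespecified window.

    daily_scores maps each subject to {day: score}; the same window is applied
    to every subject compared on this occasion. Rank 1 is the best mean, and
    ties are counted against the treated subject (a conservative assumption).
    """
    days = range(start_day, start_day + window_days)
    means = {s: mean(by_day[d] for d in days) for s, by_day in daily_scores.items()}
    target = means[treated]
    if higher_is_better:
        as_good = sum(1 for s, m in means.items() if s != treated and m >= target)
    else:
        as_good = sum(1 for s, m in means.items() if s != treated and m <= target)
    return 1 + as_good


# Hypothetical scores: subject "A" receives the intervention on Day 6.
scores = {
    "A": {6: 70, 7: 80, 8: 90},
    "B": {6: 60, 7: 55, 8: 65},
    "C": {6: 85, 7: 50, 8: 60},
}
print(window_rank(scores, "A", start_day=6, window_days=1))  # 2 (Day 6 alone)
print(window_rank(scores, "A", start_day=6, window_days=3))  # 1 (mean of Days 6-8)
```

In this toy example the first day alone understates a change that the prespecified 3-day mean picks up, which is the situation the averaging procedure is meant to handle.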
Data description
The split-middle technique involves multiple steps. The technique begins
with graphically plotting the data. From the data within a given phase, a
trend, or celeration line, is constructed to characterize the rate of performance
over time. (The term celeration derives from the notions of acceleration
and deceleration if the trend is ascending or descending, respectively.)
The celeration line predicts the direction and the rate of change.
To illustrate computation of the celeration line, consider hypothetical data
plotted in Figure 9-4. (The example will utilize rate of performance and
semilog units to illustrate recommended use of the method.) The data in the
upper panel are from one phase of an A-B-A-B (or other) design plotted on a
semilog chart. The manner in which the celeration line is computed will be
conveyed with data from only one phase, although in practice celeration lines
would be computed and plotted separately for each phase.
The first step for computing a celeration line in a phase is to divide the
phase in half by drawing a vertical line at the median number of sessions (or
days). The second step is to divide each of these halves in half again. (When
there is an uneven number of days, the vertical line is drawn through the data
point that is the median day rather than between two data points.) The
dividing lines should always result in an equal number of points on each side
of the division.
FIGURE 9-4. Hypothetical data during one phase of an A-B-A-B design (top panel, a), with steps to determine the median data points in each half of the phase (middle panel, b), and with the original (dashed) and adjusted (solid) celeration line (bottom panel, c). (Ordinate: rate of behavior. The celeration line is labeled slope = 1.65, level = 39.)
Statistical Analyses for Single-case Experimental Designs 315
The next step is to determine the median rate of performance
for the first and second halves of the phase. This median refers to the data
points that form the dependent measure rather than to the number of
sessions.
Two potentially confusing points should be resolved. First, although the
sessions are divided into quarters, only the first division (halves) is employed
at this stage. Second, the median data value within each half of the sessions is
selected. These medians are based on the ordinate (dependent variable values)
rather than the abscissa (number of days). To obtain the data point that is the
median within each half, one merely counts from the bottom (ordinate) up
toward the top data point for each half. The data point that constitutes the
median value within each half is selected. A horizontal line is drawn through
the median at each half of the phase until the line intersects the vertical line
dividing each half.
Figure 9-4b shows the above three steps, namely, a division of the data into
quarters and the selection of median values within each half. Within each half
of the data, a vertical and horizontal line intersect. The next step is finding the
slope, which entails drawing a line connecting the points of intersection
between the two halves.
The final step is to determine whether the line that results “splits” all of the
data, in other words, is the split-middle line or slope. The split-middle slope is
that line that is situated so that 50% of the data fall on or above the line and
50% fall on or below the line. The line is adjusted to divide the data in this
fashion. In practice the line is moved up or down until the data are divided
equally. The adjusted line remains parallel to the original line.
Figure 9-4c shows the original line (dashed) and the line (solid) after it has
been adjusted to achieve the split-middle slope. Note that the original line did
not divide the data so that an equal number of points fell above and below the
line. The adjustment achieves this “middle” slope by altering the level of the
line (and not the slope). (In some cases, the original line may not have to be
adjusted.)
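The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a standard implementation: one phase's data arrive as parallel lists of days and rates; with `log_scale=True` the line is fitted to log10 of the rates, mimicking the semilog chart (so all rates must be positive); when the number of days is odd the middle point is assigned to neither half; and when a half has an even number of points its median is the average of the two middle values, a convention the text does not spell out. The names `split_middle_line` and `line_value` are hypothetical.

```python
import math
from statistics import median


def split_middle_line(days, values, log_scale=True):
    """Fit a split-middle celeration line to one phase; return (intercept, slope).

    With log_scale=True the line is fitted to log10(values), as on a semilog
    chart, so every value must be positive.
    """
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two (day, value) pairs")
    pts = sorted(zip(days, values))
    xs = [d for d, _ in pts]
    ys = [math.log10(v) if log_scale else v for _, v in pts]
    half = len(pts) // 2  # with an odd count the middle point belongs to neither half
    # Halves of the phase: each intersection point is the median day (the
    # quarter line) and the median value within that half.
    x1, y1 = median(xs[:half]), median(ys[:half])
    x2, y2 = median(xs[-half:]), median(ys[-half:])
    # The line through the two intersection points.
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    # Shift the line up or down, keeping the slope, so that it splits the data:
    # adding the median residual puts half the points on or above the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    intercept += median(residuals)
    return intercept, slope


def line_value(intercept, slope, day, log_scale=True):
    """Read the celeration line at a given day, in the original units."""
    y = intercept + slope * day
    return 10 ** y if log_scale else y


print(split_middle_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], log_scale=False))  # (0.0, 2.0)
```

Shifting by the median residual is one simple way to carry out the final adjustment; it moves the line only vertically, so the slope found from the two intersection points is preserved.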
The celeration line reflects the rate of behavior change, which can also be
expressed numerically. White (1974) has used the weekly rate of change as the
basis of calculating rate, although any time period that might be more
meaningful for a given situation can be employed. To calculate the rate of
change, the point on the celeration line at a given day (Day x) is located and
its value on the ordinate is read. The value of the celeration line on the
ordinate 7 days later (i.e., Day x + 7) is then obtained. To compute the rate of
change, the numerically larger value (at either Day x or Day x + 7) is divided by the
smaller value.
The procedure can be applied to the data in Figure 9-4c. At Day 1, the
celeration line is at 20. Seven days later, the line is at approximately 33.
Applying the above computations, the ratio for the rate of change is 1.65.
Because the celeration line is accelerating, this indicates that the average rate
of responding for a given week is 1.65 times greater than it was for the prior
week. The ratio merely expresses the slope of the line.
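The arithmetic for the weekly rate of change is small enough to show directly. In this sketch (function names hypothetical), `celeration_ratio` divides the larger reading by the smaller, as the text describes, and reports the direction separately; `ratio_from_log_slope` gives the same figure from a line fitted to log10 rates, where a constant daily slope b implies a 7-day ratio of 10 raised to 7|b|.

```python
import math


def celeration_ratio(value_start, value_end):
    """Ratio of two readings of a celeration line taken one period apart.

    The larger value is divided by the smaller, so the ratio is at least 1;
    whether the line accelerates or decelerates is returned separately.
    """
    if value_start <= 0 or value_end <= 0:
        raise ValueError("rates on a semilog chart must be positive")
    ratio = max(value_start, value_end) / min(value_start, value_end)
    if value_end > value_start:
        direction = "accelerating"
    elif value_end < value_start:
        direction = "decelerating"
    else:
        direction = "flat"
    return ratio, direction


def ratio_from_log_slope(slope_per_day, period_days=7):
    """The same ratio from the slope of a line fitted to log10(rate) per day."""
    return 10 ** (period_days * abs(slope_per_day))


ratio, direction = celeration_ratio(20, 33)  # Figure 9-4c: Day 1 and Day 8 readings
print(round(ratio, 2), direction)  # 1.65 accelerating
```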
The level of the slope can be expressed by noting the level of the celeration
line on the last day of the phase. In the above example, the level is approximately
39. When separate phases are evaluated (e.g., baseline and intervention),
the levels of the celeration lines refer to the last day of the first phase
and the first day of the second phase, as will be discussed below.
For each phase in the experimental design, separate celeration lines are
drawn. The slope of each line is expressed numerically. The change across
phases is evaluated by comparing the levels and slopes. Consider hypothetical
data for A and B phases, each with its separate celeration line, in Figure 9-5.
To estimate the change in level, a comparison is made between the last data
point in baseline (approximately 22) and the first data point during the
intervention (approximately 28). The larger value is divided by the smaller
value, yielding a ratio of 1.27. The ratio merely expresses how much higher
(or lower) the intersection of the different celeration lines is. Similarly, for a
change in slope, the larger slope is divided by the smaller slope, yielding a
value in the example of 1.52. The change in level and slope summarizes the
differences in performance across phases.
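Both summaries are ratios of the larger to the smaller value. A minimal sketch with a hypothetical `change_ratio` helper: the "x" label for an increase follows the "x 1.27" notation that accompanies Figure 9-5, while using "÷" for a decrease is an assumed counterpart, not something stated in the text. The same function applies to the change in slope when each phase's slope has been expressed as a weekly ratio.

```python
def change_ratio(before, after):
    """Express a change across phases as larger / smaller, with a direction label.

    "x" marks an increase; "÷" for a decrease is an assumed counterpart of the
    "x" notation, not taken from the text.
    """
    if before <= 0 or after <= 0:
        raise ValueError("values must be positive")
    ratio = max(before, after) / min(before, after)
    sign = "x" if after >= before else "÷"
    return ratio, f"{sign} {ratio:.2f}"


# Change in level in Figure 9-5: last baseline value (about 22)
# versus the first intervention value (about 28).
print(change_ratio(22, 28)[1])  # x 1.27
```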
Statistical analysis
It should be reiterated that the split-middle procedure has been advocated
as a technique to describe the process of change in an individual’s behavior
rather than as a tool to assess statistical significance. However, statistical
significance of change across phases can be evaluated once the celeration lines
have been calculated.
To determine whether there is a statistically significant change in behavior
across phases, a simple statistical test has been proposed (White, 1972).
Again, consider change across A and B phases in an A-B-A-B design. The null
hypothesis upon which the test is based is that there is no change in performance
across A and B phases. If this hypothesis is true, then the celeration
line of the baseline phase should be a valid estimate of the celeration line of
the intervention phase. Assuming the intervention had no effect, the split-
middle slope of baseline should be the split-middle slope of the intervention
phase, as well. Thus 50% of the data in the intervention or B phase should
fall on or above and 50% of the data should fall on or below the slope of
baseline when that slope is projected into the intervention phase.
To complete the statistical test, the slope of the baseline phase is extended
or projected through the intervention phase. Consider the example of hypothetical
data in Figure 9-5, which shows the celeration line computed and
extended from baseline into the intervention phase. For purposes of the
statistical test, it is assumed that the probability of a data point during the
intervention phase falling above the projected celeration line of baseline is
50% (i.e., p = .5), given the null hypothesis of no change across phases. A
binomial test can be used to determine if the number of data points that are
above the projected slope in the intervention phase is of a sufficiently low
probability to reject the null hypothesis.9
Using this procedure for the data in Figure 9-5, 10 of 10 data points during
the intervention phase fall above the projected slope of baseline. Applying the
binomial test to determine the probability of obtaining all 10 data points
above the slope, $p = \binom{10}{10}(.5)^{10} \approx .00098$, that is, p < .001. Thus the null hypothesis can
be rejected; the data in the intervention phase are significantly different from
the data of the baseline phase. The results do not convey whether the level
and/or slope account for the differences but only that the data overall depart
from one phase to another.
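The binomial computation can be sketched as follows; `split_middle_binomial_p` is a hypothetical helper name. It sums the upper tail P(X ≥ x) for X distributed as Binomial(n, p = .5), the one-tailed probability of x or more points above the projected baseline line; for 10 of 10 points this equals the single term computed in the text. How to count points falling exactly on the projected line is not specified here, so the sketch leaves that to the caller.

```python
from math import comb


def split_middle_binomial_p(n_points, n_above, p=0.5):
    """One-tailed probability of n_above or more of n_points falling above the
    projected baseline celeration line, if each point does so with probability p."""
    q = 1 - p
    return sum(
        comb(n_points, k) * p**k * q ** (n_points - k)
        for k in range(n_above, n_points + 1)
    )


p_value = split_middle_binomial_p(10, 10)  # Figure 9-5: 10 of 10 points above
print(p_value)  # 0.0009765625
```

For fewer than all points above the line, the tail sum matters: 9 or more of 10 gives 11/1024, or about .011.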
Single-case designs provide a wide array of options for the applied researcher.
Statistical techniques available for such designs are numerous.
Selected tests were reviewed to convey the breadth of options available.
Additional variations of these analyses, as well as different tests, have also
been described (e.g., Edgington, 1982; Tryon, 1982).
Some of the analyses discussed have wider applicability than others. Single-case
designs generally involve a comparison of two or more phases. This one
characteristic raises the possibility of time-series, split-middle, randomization,
and t tests. The options were illustrated and discussed in the context of
A-B-A-B and multiple-baseline designs, but they can also be applied to other
designs such as changing-criterion designs and alternating- or
simultaneous-treatment designs.10 Despite the flexibility of various tests, several
considerations and sources of caution warrant mention.
First, statistical evaluation of single-case (or any other) data only addresses
the issue of whether the change is statistically significant over the course of
separate conditions. When statistical significance is obtained, this does not of
course provide any necessary clues about the basis for a change in behavior.
Conclusions about the basis for the change derive from the experimental
design rather than from the mere demonstration of statistical significance.
Thus statistical evaluation of an A-B design does not elevate the sophistication
of the comparison. Drawing conclusions between the effect of an inter
9.10. CONCLUSIONS
The present chapter has discussed specific statistical tests for single-case
experimental designs and considerations dictated by their use. The availability
of multiple statistics provides the investigator with diverse options for
evaluating single-case data. A few salient considerations underlying all of the tests warrant
reiteration. To begin with, the appropriateness of utilizing statistical criteria
for the evaluation of applied behavioral interventions remains a major source
of controversy. Statistical analysis is seen by many proponents of single-case
research as a violation of the rationale for conducting research with the
individual subject. Thus whether statistical tests should be used to draw
inferences from single-case research remains an issue.
On this issue, it is important to distinguish experimental designs (e.g.,
single-case and between-group designs), methods of data evaluation (e.g.,
visual inspection and statistical analyses), and types of research (e.g., basic or
applied). There are no necessary connections between particular types of
research, designs, and analyses. Thus use of statistical analyses does not
necessarily conflict with single-case designs or their purposes. When research
attempts to develop a technology of behavior change and to achieve clinically
important effects, statistical analyses will definitely be of limited value. Small
effects that pass beyond a threshold of traditional levels of confidence may
not address the priorities of applied research. Yet there are several uses of
statistics, detailed earlier, that may contribute to the goals of applied
research.
Another issue important to mention is that the use of statistical tests may
NOTES
1. As the lag increases, the correlation becomes somewhat less stable, in part,
because of the decrease in the number of pairs of observations upon which the
coefficient can be based (Holtzman, 1963).
3. Baer (1977a) has articulately stated the similarities and differences in the
rationales underlying statistical analysis and visual inspection. Both methods of data
evaluation attempt to avoid Type I and Type II errors. A Type I error refers to
concluding that the intervention produced a veridical effect when in fact the
results are attributable to chance. A Type II error refers to concluding that the
intervention did not produce a veridical effect when in fact it did. Typically,
researchers give a higher priority to avoiding a Type I error. In statistical analyses,
the probability of committing a Type I error is specified (by the significance level,
α, of the statistical test). With visual inspection, the probability of a Type I error
is not known. Hence, to avoid chance effects, the investigator searches for highly
consistent effects that can be readily seen. By minimizing the probability of a Type
I error, researchers increase the probability of making a Type II error. Investigators
who rely on visual inspection are more likely to commit Type II errors than
investigators who rely on statistical analyses. Thus reliance on visual inspection
will tend to overlook and discount many reliable but weak effects. From the
standpoint of developing an effective applied technology of behavior change,
Baer (1977a) has argued persuasively that minimizing Type I errors leads to
identification of a few variables whose effects are consistent and potent across a
wide range of conditions. Thus visual inspection may be suited for the special
goals of applied research. For other research purposes (e.g., testing of alternative
theories), weak but reliable effects may be important to detect, and the priorities
of erring in one direction rather than another might change.
4. The randomization test discussed and illustrated here is one of many available
tests (see Edgington, 1969, 1984). The specific one selected, which compares
means from different conditions, is likely to be of special interest in single-case
experiments where performance is compared across phases.
6. As a general guideline, ranks are assigned so that the lowest number is given to the
baseline that shows the highest level of performance in the desired direction. An
easy rule of thumb is to assign “first place” (a rank of 1) to the highest or lowest
score that represents the “best” performance in terms of the dependent measure.
Thus 1 might be assigned to the highest performance of social skills or the lowest
performance of self-abusive behavior. Second, third, and subsequent ranks are
assigned accordingly for lower scores in the therapeutic direction.
8. The semilog units refer to the fact that the scale on the ordinate is logarithmic but
the scale on the abscissa is not. The effect of this arrangement is to ensure that
there is no zero origin on the graph and that low and high rates of performance
can be readily represented. The chart can be used for behaviors with extremely
high or low rates. Rates of behavior can vary from .0006944 per minute (i.e., one
every 24 hours) to 1000 per minute. (The semilog chart paper has been developed
by Behavior Research Company, Kansas City, KS.) Adoption of the charting
procedure has not been widespread in applied research. Hence it is useful to note
that the split-middle technique can be used with ordinary graph paper.
9. The binomial applied to the split-middle slope test would be the probability of
attaining x data points above the projected slope:
$f(x) = \binom{n}{x} p^{x} q^{n-x}$ (or simply $\binom{n}{x} p^{n}$, since $p = q$),
where n = the number of total data points in Phase B
x = the number of data points above (or below) the projected slope
p = q = .5 by definition of the split-middle slope
p and q = the probability of data points appearing above or below the slope given
the null hypothesis
10. Other design options may raise special issues for statistical tests. For example, in a
changing criterion design, the intervention may be introduced in such a way that
only gradual and small changes in behavior are sought. Obviously, one might not
wish to test for changes in level in such instances, because abrupt changes at the
point of introducing the intervention might not be expected. In an alternating- or
simultaneous-treatment design, the question of special interest is not the change from one
phase to another but rather whether separate interventions implemented in the
same phase differ significantly. Analyses discussed previously can be adapted to
these circumstances (e.g., see Edgington, 1982; Kratochwill & Levin, 1980).
CHAPTER 10
10.1 INTRODUCTION
in applied research. Although each patient was severely agoraphobic, all had
numerous associated fears and obsessions. The extent and severity of
agoraphobic fears differed. One subject was a 36-year-old male with a 15-year
agoraphobic history. He was incapacitated to the extent that he could manage
a 5-minute drive to work in a rural area only with great difficulty. A second
subject was a 23-year-old female with only a one-year agoraphobic history.
This patient, however, could not leave her home unaccompanied. The third
subject, a 36-year-old female, also could not leave her home unaccompanied,
but had a 16-year agoraphobic history. In fact, this patient had to be sedated
and brought to the hospital in an ambulance. In addition, these 3 patients
presented different background variables such as personality characteristics
and cultural variations (one patient was European).
The results from one of the cases (the male) are presented in Figure 10-1.
Reinforcement produced a marked increase in distance walked, and withdrawal
of reinforcement resulted in a deterioration in performance. Reintroduction
of reinforcement in the final phase produced a further increase in
distance walked. These results were replicated on the remaining 2 patients.
At least three conclusions can be drawn from these data. The first conclu
sion is that the treatment was effective in modifying agoraphobic behavior.
The second conclusion is that within the limits of these data, the results are
reliable and not due to idiosyncrasies present in the first experiment, since two
replications of the first experiment were successful. The third conclusion,
however, is of most interest here. The procedure was clearly effective with 3
patients of different ages, sex, duration of agoraphobic behavior, and cultural
backgrounds. For purposes of generality of findings, this series of experiments
would be strengthened by a third replication (a total of 4 subjects). But
the consistency of the results across 3 quite different patients enables one to
draw initially favorable conclusions on the general effectiveness of this procedure
across the population of agoraphobic clients through the process of
logical generalization (Edgington, 1967).
On the other hand, if one client had failed to improve or improved only
slightly such that the result was clinically unimportant, an immediate search
would have had to be made for procedural or other variables responsible for
the lack of generality across clients. Given the flexibility of this experimental
design, alterations in procedure (e.g., adding additional reinforcers, changing
the criterion for reinforcement) could be made in an attempt to achieve
clinically important results. If mixed results such as these were observed,
further replication would be necessary to determine which procedures were
most efficacious for given clients (see section 2.2, chapter 2).
In this series, however, these steps were not necessary due to the uniformly
successful outcomes, and some preliminary statements about client generality
were made. The next step in this series, then, would be an attempt to replicate
the results systematically, that is, across different situations and therapists.
Beyond the Individual: Replication Procedures 329
FIGURE 10-1. The effects of reinforcement and nonreinforcement upon the performance of an agoraphobic patient (Subject 2). (Abscissa: blocks of 5 trials.) (Figure 2, p. 425, from: Agras, W. S., Leitenberg, H., and Barlow, D. H. [1968]. Social reinforcement in the modification of agoraphobia. Archives of General Psychiatry, 19, 423-427. Copyright 1968 by American Medical Association. Reproduced by permission.)
It is evident that the preliminary series, which was carried out in Burlington,
Vermont, does not address questions on effectiveness of techniques in different
settings or with different therapists. It is entirely possible that characteristics
of the therapist or the particular structure of the course that the
agoraphobic walked facilitated the favorable results. Thus these variables
must be systematically varied to determine generality of findings across all
important clinical domains. In fact, this step was taken many times. Using
procedures that were operationally quite similar to those described above, but
carrying different labels, Marks (1972) successfully treated a variety of severe
agoraphobics in an urban European setting (London) using, of course,
different therapists, and Emmelkamp (1974, 1982) treated a long series of
Dutch agoraphobics.
the treatment, if successful, must remain the same, and the comparison is
between treatment and no treatment or treatment and placebo control.
The first 4 subjects in this experiment were severe compulsive hand
washers. The fifth subject presented with a different ritual. All patients were
hospitalized on a research unit. All hand washers encountered articles or
situations throughout the experiment that produced hand washing. Response
prevention consisted of removing the handles from the wash basin wherein all
hand washing occurred. The placebo phase consisted of saline injections and
oral placebo medication with instructions suggesting improvement in the
rituals, but no response prevention. Once again, the design was either A-B-A,
with A representing baseline and B representing response prevention, or
A-B-BC-B-A, where A was baseline, B was placebo, and C was response prevention.
Both self-report measures (number of urges to wash hands) and an
objective measure (occasions when the patient approached the sink, recorded
by a washing pen—see chapter 4) were administered.
As in the previous series, the patients were relatively heterogeneous. The
first subject was a 31-year-old woman with a 2-year history of compulsive
hand washing. Previous to the experiment, she had received over one year of
both inpatient and outpatient treatment including chemotherapy, individual
psychotherapy, and desensitization. She performed her ritual 10 to 20 times a
day, each ritual consisting of eight individual washings and rinsings with
alternating hot and cold water. The associated fear was contamination of
herself and others through contact with chemicals and dirt. These rituals
prevented her from carrying out simple household duties or caring for her
child.
The second subject was a 32-year-old woman with a 5-year history of hand
washing. Frequency of hand washing ranged from 30 to 60 times per day,
with an average of 39 during baseline. Unlike those of the previous subject, these
rituals had strong religious overtones concerning salvation, although fear of
contamination from dirt was also present. Prior treatments included two
series of electric shock treatment, which proved ineffective.
A third subject was a 25-year-old woman who had a 3-year history of the
hand-washing compulsion. Situations that produced the hand washing in this
case were associated with illness and death. If an ambulance passed near her
home, she engaged in cleansing rituals. Hand washings averaged 30 per day,
and the subject was essentially isolated in her home before treatment.
The fourth subject was a 20-year-old male with a history of hand washing
for 1½ years. He had been hospitalized for the previous year and was hand
washing at the rate of 20 to 30 times per day. The fifth subject, whose rituals
differed considerably from the first 4 subjects, will be described below.
Representative results from one case are presented below. Hand washing
remained high during baseline and placebo phases and dropped markedly
after response prevention. Subjective reports of urges to wash declined
slightly during response prevention and continued into follow-up. This decline
continued beyond the data presented in Figure 10-2 until urges were
minimal. These results were essentially replicated in the remaining three hand
washers.
Before discussion of issues relative to replication, experimental design
considerations in this series deserve comment. The dramatic success of response
prevention in this series is obvious, but the continued reduction of
hand washing after response prevention was removed presents some problems
in interpretation. Since hand washing did not recover, it is difficult to
attribute its reduction to response prevention using the basic A-B-A withdrawal
design.
FIGURE 10-2. In the upper half of the graph, the frequency of hand washing across treatment phases is represented. Each point represents the average of 2 days. In the lower portion of the graph, total urges reported by the patient are represented. (Figure 3, p. 527, from: Mills, H. L., Agras, W. S., Barlow, D. H., and Mills, J. R. [1973]. Compulsive rituals treated by response prevention: An experimental analysis. Archives of General Psychiatry, 28, 524-529. Copyright 1973 by American Medical Association. Reproduced by permission.)
From the perspective of this design, it is possible that some
correlated event occurred concurrently with response prevention that was actually
responsible for the gains. Fortunately, the aforementioned flexibility in
adding new control phases to replication experiments afforded an experimental
analysis from a different perspective. In all patients, hand washing was
reasonably stable by history and through both baseline and placebo phases.
Hand washing showed a marked reduction only when response prevention
was introduced. In these cases, baseline and placebo phases were administered
for differing amounts of time. In fact, then, this becomes a multiple
baseline across subjects (see chapter 7), allowing isolation of response prevention
as the active treatment.
Again, this series demonstrates that response prevention works, and replications
ensure that this finding is reliable. In addition, the clinical significance
of the result is easily observable by inspection, since rituals were entirely
eliminated in all 4 patients. More importantly, however, the fact that this
clinical result was consistently present across 4 patients lends considerable
confidence to the notion that this procedure would be effective with other
patients, again through the process of logical generalization. It is common
sense that confidence in generality of findings across clients increases with
each replication, but it is our rule of thumb that a point of diminishing
returns is reached after one successful experiment and three successful replications
for a total of 4 subjects. At this point, it seems efficient to publish the
results so that systematic replication may begin in other settings.
An alternative strategy would be to administer the procedure in the same
setting to clients with behavior disorders demonstrating marked differences
from those of the first series. Some behavior disorders such as simple phobias
lend themselves to this method of replication since a given treatment (e.g., in vivo
exposure) should theoretically work on many different varieties of
simple phobia. Within a disorder such as compulsive rituals, this is also
feasible because several different types of rituals are encountered in the clinic
(Mavissakalian & Barlow, 1981a; Rachman & Hodgson, 1980). The question
that can be answered in the original setting then is: Will the procedure
work on other behavior disorders that are topographically different but
presumably maintained by similar psychological processes? In other words,
would rituals quite different from hand washing respond to the same procedure?
The fifth case in this series was the beginning of a replication along
these lines.
The fifth subject was a 15-year-old boy who performed a complex set of
rituals when retiring at night and another set of rituals when arising in the
morning. The night rituals included checking and rechecking the pillow
placement and folding and refolding pajamas. The morning rituals were
concerned mostly with dressing. This type of ritual has come to be known as
checking as opposed to previous washing rituals. The rituals were extremely
time consuming and disruptive to the family’s routine. After a baseline phase
in which rituals remained relatively stable, the night rituals were prevented,
but the morning rituals were allowed to continue. Here again, response
prevention dramatically eliminated nighttime rituals. Morning rituals gradu
ally decreased to zero during prevention of night rituals.
The experiment further suggests that response prevention can be effective in the treatment of ritualistic behavior. The implications of this replication, however, are somewhat different from those of the previous three replications, where the behavior in question was topographically similar. Although the treatment was administered by the same therapists in the same setting, this case does not represent a direct replication because the behavior was topographically different. To consider this case as part of a direct replication series, one would have to accept, on an a priori basis, the theoretical notion that all compulsive rituals are maintained by similar psychological processes and therefore will respond to the same treatment. Although classifying them under one name (compulsive rituals) implies this, there is in fact some evidence that these rituals differ somewhat and may react differently to response prevention treatments (Rachman & Hodgson, 1980). As such, it was probably inappropriate to include the fifth case in the present series, because the clear implication is that response prevention is applicable to all rituals, yet only one case was presented in which the rituals differed.
From the perspective of sound replication procedures, the proper tactic would be to include this case in a second series containing different rituals. This second series would then be the first step in a systematic replication series, in that generality of findings across different behaviors would be established in addition to generality of findings across clients. In fact, response prevention and exposure, combined occasionally with medication, has become the treatment of choice for obsessive-compulsive disorders, based on an extended systematic and clinical replication series that began in the early 1970s (Rachman & Hodgson, 1980; Steketee & Foa, in press; Steketee, Foa, & Grayson, 1982). This series, relying on individual experimental analyses and close examination of individual data from group studies, has also begun to identify patient characteristics that predict failure (e.g., Foa, 1979; Foa et al., 1983), a critical function of any replication series (see section 10.4).
when the heterosexual film was reinstated. This last increase, however, does not become clear until the last point in the phase, which represents only one session. Prior commitments then prevented the patient from continuing treatment, precluding an extension of this phase that would have confirmed (or disconfirmed) the increase represented by that one point.
Reports of sexual fantasies and behavior were consistent with the modest increases in heterosexual arousal. While some increase in heterosexual fantasies was noted, the patient continued to employ homosexual fantasies occasionally during sexual intercourse with his wife and was still unable to ejaculate.
Beyond the Individual: Replication Procedures 337
Again, conclusions in three general areas can be drawn from these data. First, exposure to explicit heterosexual films can be an effective variable for increasing heterosexual arousal, as demonstrated by the experimental analysis of the first patient. Second, to the extent that the results were replicated directly on three patients, the data are reliable and are not due to idiosyncrasies in the first case. It does not follow, however, that generality of findings across patients has been firmly established. Although the results were clear and clinically significant for the first three patients, results from the fourth patient
[Figure 10-5: percentage of delusional talk (0 to 100) plotted by day, days 1 to 43, across five numbered phases that include Baseline, Feedback, and Token conditions.]
FIGURE 10-5. Percentage delusional talk of Subject 1 during therapist sessions and on ward for each experimental day. (Figure 1, p. 254, from: Wincze, J. P., Leitenberg, H., and Agras, W. S. [1972]. The effects of token reinforcement and feedback on the delusional verbal behavior of chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262. Copyright 1972 by Society for Experimental Analysis of Behavior. Reproduced by permission.)
tal analysis to determine which variables were responsible for the improvement. The lack of replication, however, suggests that this would not be a fruitful line of inquiry.
The results from token reinforcement were quite different. This procedure was administered to 9 patients. Six (Subjects 1, 2, 4, 5, 8, and 9) improved, and in each case the improvement was confirmed by a return of delusional speech when token reinforcement was removed. Subject 7 also improved, but delusional speech did not reappear when token reinforcement was removed. In all of these patients, the decrease was substantial, both in percentage of delusional speech and in trends across the token phase.
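The three outcome categories just described (improvement confirmed by withdrawal, improvement without reversal, no improvement) follow the logic of a baseline, treatment, and withdrawal (A-B-A) sequence. The Python sketch below makes that logic explicit for a behavior meant to decrease. It is illustrative only: the function name, the use of phase means, and the 10-percentage-point threshold `min_change` are hypothetical choices, not criteria taken from Wincze et al. (1972), and the data are invented.

```python
from statistics import mean

def classify_withdrawal(baseline, treatment, withdrawal, min_change=10.0):
    """Classify one subject's A-B-A outcome for a behavior meant to decrease.

    Each argument is a list of percentages (e.g., % delusional talk per day).
    ASSUMPTION: min_change is a hypothetical threshold in percentage points,
    chosen for illustration only.
    """
    a1, b, a2 = mean(baseline), mean(treatment), mean(withdrawal)
    improved = (a1 - b) >= min_change    # behavior fell when treatment began
    reversed_ = (a2 - b) >= min_change   # behavior returned when treatment stopped
    if improved and reversed_:
        return "improved, confirmed by reversal"
    if improved:
        return "improved, no reversal"
    return "not improved"

# Hypothetical data (not from the study):
print(classify_withdrawal([60, 55, 58], [20, 15, 10], [50, 52]))  # -> improved, confirmed by reversal
print(classify_withdrawal([60, 55, 58], [20, 15, 10], [18, 16]))  # -> improved, no reversal
print(classify_withdrawal([40, 42], [38, 39], [41]))              # -> not improved
```

The middle category corresponds to a case like Subject 7: behavior improved, but because the withdrawal phase did not reproduce the baseline level, the data by themselves leave the role of treatment less certain.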
Several conclusions can be drawn from these data. In terms of reduction of delusional speech within sessions, the experimental analysis demonstrated that token reinforcement was effective, and replication indicated that the finding had some reliability. Generality of findings across clients, however, is limited. Two patients did not improve during administration of token reinforcement. As Sidman (1960) noted, the failure to replicate on all subjects does not detract from the successes in the remaining subjects. Token reinforcement is clearly responsible for improvement in those subjects to the extent that the experimental design was sound (internally valid). However, applied researchers cannot stop here, satisfied that the procedure seems to work well enough in most cases, since the practicing clinician would be at a loss to predict which cases would improve with this procedure. In fact, because the authors (Wincze et al., 1972) noted that these two cases actually deteriorated on the ward during this treatment, the search for accurate predictions of success becomes all the more important to the clinician. Thus a careful search for differences that might be important in these cases should ensue, leading to a more intensive functional investigation and experimental manipulation of those factors that contribute to success or failure.
TABLE 10-1. Mean Percentage Delusional Talk of Each S Based on Last Two Data Points of Each Phase in Therapist Sessions and on the Ward
SUBJECTS | PHASE SEQUENCES
Baseline | Feedback | Baseline | Token: Sessions | Baseline | Token: Ward and Sessions Bonus | Baseline
Baseline | Token: Sessions | Baseline | Feedback | Baseline | Token: Ward and Sessions Bonus | Baseline
Note. Table 2, p. 258, from: Wincze, J. P., Leitenberg, H., and Agras, W. S. (1972). The effects of token reinforcement and feedback on the delusional verbal behavior of chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262. Copyright 1972 by Society for Experimental Analysis of Behavior. Reproduced by permission.
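Table 10-1 summarizes each phase by the mean of its last two data points rather than by the mean of the whole phase. A minimal Python sketch of that summary follows; the function names and the percentages are hypothetical, and the sketch assumes every phase contains at least two observations.

```python
from statistics import mean

def last_two_mean(phase_data):
    """Mean of the final two data points of one phase."""
    if len(phase_data) < 2:
        raise ValueError("each phase needs at least two data points")
    return mean(phase_data[-2:])

def summarize_phases(phases):
    """Summarize an ordered list of (phase_name, data) pairs.

    A list is used rather than a dict because phase names such as
    'Baseline' recur within one sequence.
    """
    return [(name, last_two_mean(data)) for name, data in phases]

# Hypothetical percentages of delusional talk for one subject:
phases = [
    ("Baseline", [62, 58, 60, 64]),
    ("Feedback", [55, 50, 48, 46]),
    ("Baseline", [57, 61]),
]
print(summarize_phases(phases))
```

Here the Feedback phase is summarized as 47 (the mean of 48 and 46), even though its earlier points were higher, so the summary reflects where each phase ended rather than its overall average.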
In view of the additional fact that all subjects in this series demonstrated little generalization of improvement from session to ward behavior, analysis of this treatment is in a very preliminary state and, as Wincze et al. (1972) pointed out, “. . . much work needs to be done in order to predict when a given type of behavioral intervention is likely to succeed in a given case” (p. 262).
Finally, it seems important to make a methodological point on the size of this series. While the nine replications in this series yielded a wealth of data, a more efficient approach might have been to stop after four or five replications and conduct a functional analysis of the failures encountered. In the unlikely event that failures did not occur in the initial replication series, the results would be strong enough to generate systematic replication in other research settings, where failures would almost certainly appear, leading to a search for critical differences at that point. If failures did appear in this shorter series, the investigators could immediately begin to determine the factors responsible for the variant data rather than continue direct replications that would yield less and less new information as subjects accumulated. Perhaps for this reason, one encounters few direct replication series with an N of seven or more. One notable exception is a multiple-baseline-across-subjects experiment on seven anorexics, where, unfortunately for both experimental and clinical reasons, all patients improved substantially (Pertschuk, Edwards, & Pomerleau, 1978).
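The stopping tactic suggested above can be sketched as a short procedure: run cases in order up to a small cap, collect any failures, and then either turn to a functional analysis of those failures or, if there are none, move on to systematic replication elsewhere. In the Python sketch below, the cap of five, the function names, and the outcomes are hypothetical illustrations; `outcomes` can be any iterable, including a generator that yields each result as a case is completed, and items beyond the cap are never examined.

```python
def run_direct_series(outcomes, max_replications=5):
    """Run a direct replication series up to a fixed cap.

    outcomes: (subject_id, succeeded) pairs in the order subjects are treated.
    max_replications: ASSUMPTION, a hypothetical cap reflecting the
    suggestion to stop after four or five replications.
    Returns the subjects actually run and the failures among them.
    """
    subjects_run, failures = [], []
    for subject_id, succeeded in outcomes:
        if len(subjects_run) >= max_replications:
            break  # stop: further direct replications add little information
        subjects_run.append(subject_id)
        if not succeeded:
            failures.append(subject_id)
    return subjects_run, failures

def next_step(failures):
    """Decide what follows the short series."""
    if failures:
        return "functional analysis of failures"
    return "systematic replication in other settings"

# Hypothetical outcomes for seven available subjects:
outcomes = [("S1", True), ("S2", True), ("S3", False), ("S4", True),
            ("S5", True), ("S6", True), ("S7", False)]
run, failed = run_direct_series(outcomes)
print(run, failed, next_step(failed))
# -> ['S1', 'S2', 'S3', 'S4', 'S5'] ['S3'] functional analysis of failures
```

Note that the failure of S7 is never reached: under this tactic, the investigator would already be analyzing S3 before treating further subjects.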
cal strategy, the experimental design was a multiple baseline across behaviors
for six subjects. Three different aspects of social skills were repeatedly
assessed by role playing. Intervention then proceeded for all six subjects on
the first social skill, followed by the second social skill, and so on. In this
hypothetical example, of course, all subjects did very well, with particular
aspects of social skills improving only when treated. Naturally, this strategy
need not be limited to a multiple-baseline-across-behaviors design. Almost
any single-subject design, such as an alternating treatments design or a
standard withdrawal design, could be simultaneously replicated.
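The scheduling logic of a simultaneous replication can be sketched in Python: one staggered multiple-baseline schedule is built, and every member of the group receives an identical copy, so procedures and timing are held constant across subjects. The labels "A" (baseline) and "B" (treatment), the 12-day span, the start days, and the function names are hypothetical choices for illustration.

```python
def multiple_baseline_schedule(n_sessions, start_sessions):
    """Phase labels ('A' = baseline, 'B' = treatment) for each behavior.

    start_sessions[k] is the 1-based session at which treatment of
    behavior k begins. Starts must be strictly increasing so that each
    behavior remains in baseline while earlier behaviors are treated.
    """
    if any(later <= earlier for earlier, later in zip(start_sessions, start_sessions[1:])):
        raise ValueError("treatment starts must be staggered in time")
    return [
        ["A" if session < start else "B" for session in range(1, n_sessions + 1)]
        for start in start_sessions
    ]

def simultaneous_replication(subjects, n_sessions, start_sessions):
    """Assign every group member an identical copy of the same schedule."""
    schedule = multiple_baseline_schedule(n_sessions, start_sessions)
    return {subject: [row[:] for row in schedule] for subject in subjects}

# Hypothetical: six group members, three skills, treatment starting on days 3, 6, and 9.
plan = simultaneous_replication([f"S{i}" for i in range(1, 7)], 12, [3, 6, 9])
for row in plan["S1"]:
    print("".join(row))
# -> AABBBBBBBBBB
# -> AAAAABBBBBBB
# -> AAAAAAAABBBB
```

Because the schedule is shared, any subject whose data fail to follow the staggered pattern points the investigator toward characteristics of that subject rather than toward differences in procedure or timing.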
From the point of view of replication, this is a very economical and conservative way to proceed. It is economical because it is less time consuming to treat six clients in a group than it is to treat six clients individually, yet one still has the advantage of observing repeatedly measured individual data from six different subjects. Naturally, this is only possible where opportunities for group therapy exist. Furthermore, the procedure is conservative because fewer variables differ from client to client. The gamble taken by the investigator in a replication series with increasing heterogeneity or diversity of subjects or settings was mentioned above. To repeat, if a replication fails, the more differences there are in subjects, settings, timing of the intervention, and so forth, the harder it is to track down the cause of the failure to replicate during subsequent experimentation. If all subjects are treated simultaneously in the same group, then one can be relatively sure that the intervention procedures, as well as setting and temporal factors, are identical. If there is a failure to replicate, then the investigator should look elsewhere for possible causes, most likely in background variables or personality differences in the subjects themselves.
Of course, group therapy is itself a special kind of setting. If one were interested in the generality of these findings to individual treatment settings, the first step in a systematic replication series would be to test the procedure in subjects treated individually. Also, when groups of individuals are treated simultaneously, one cannot stop the series at just any time to begin examining the causes of failures if they occur. However, this is not really a problem as long as the groups remain reasonably small (e.g., three to six), such that the investigator would be unlikely to accumulate a large number of failures before having an opportunity to begin the search for causes. Other examples of simultaneous replication can be found in an experiment by E. B. Fisher (1979) mentioned in chapter 8.
[Figure 10-6: three stacked panels plotting the frequency of the first, second, and third component skills in role play (y-axes) across days (x-axis).]
FIGURE 10-6. Graphed hypothetical data of simultaneous replications design. (Figure 2, p. 306, from: Kelly, J. A., Laughlin, C., Claiborne, M., & Patterson, J. [1979]. A group procedure for teaching job interviewing skills to formerly hospitalized psychiatric patients. Behavior Therapy, 10, 299-310. Copyright 1979 by Association for Advancement of Behavior Therapy. Reproduced by permission.)
these populations (e.g., Paul & Lentz, 1977; Wallace et al., in press). In
retrospect, however, there are many methodological faults with this series,
leading to large gaps in our knowledge, which could have been avoided had
replication been more systematic.
While differential attention was successfully administered on psychiatric wards in several different parts of the country, across the range of therapists or ward personnel typically employed in these settings, and across a variety of psychotic behaviors, from motor behavior through inappropriate speech, only a few studies contained experimental analyses. Many of the reports would instead come under the category of case studies (A-B designs with measurement). Certainly, this preliminary series on institutionalized patients would have been much improved had each class of behavior (e.g., verbal behavior, withdrawn behavior, inappropriate behavior, aggressive or other motor behaviors) been subjected to a direct replication series with three or four patients and then systematically replicated in other settings with other therapists.
This procedure most likely would have produced some failures. Reasons
for these failures could then have been explored, providing considerably more
information to clinicians and ward personnel on the limitations of differential
attention. As it stands, Ayllon and Michael (1959) reported a failure but described neither the patient nor the circumstances surrounding the failure in any detail. This type of reporting leads to undue confidence in a procedure
among naive clinicians; when failures do occur, disappointment is followed
by a tendency to eliminate the procedure entirely from therapeutic programs.
In this specific case, however, what has happened is that differential attention
has been incorporated into more comprehensive programs without adequate
analysis of its contribution. With some cases or in some settings it may be
either important or superfluous. In other cases it may even be detrimental (see
Herbert et al., 1973).
This early series also illustrated a second use of the single-case study (A-B).
In chapter 1 we noted that case studies can suggest initially that a new
technique is clinically effective, which can lead to more rigorous experimental
demonstration and direct replication. In a systematic replication series the
single-case study makes another appearance. Many reports are published that
include only one case, but replicate an earlier direct replication series in either
an experimental or an A-B form. Usually the reports are from different
settings and contain a slight twist, such as a new form of the behavior
disorder or a slight modification of the procedure. While these reports are less
desirable from the larger viewpoint of a systematic replication series, the fact
is that they are published. When a sufficient number accumulate, these
reports can provide considerable information on generality of findings. We
will return to this point later.
the most part not followed the type of detailed technique-building approach
described in chapter 2 that would ensure that treatment programs, such as
marital therapy, be as powerful as they might be.
(e.g., Davison, 1965) also suggested that this procedure was applicable to a wide variety of behavior problems in children while at the same time providing additional information on generality of findings across therapists and settings.
Although studies of successful application of differential attention to single cases demonstrated that this procedure is applicable in a wide range of situations, a more important development in the series was the appearance of direct replication efforts containing three or more cases within the systematic replication series. Reports of single cases are uniformly successful, or they would not have been published; exceptions to these reports of success, however, can and do appear in series of cases, and these exceptions or failures begin to define the limits of the applicability of differential attention.
For this reason, it is particularly impressive that many series of three or more cases reported consistent success across many different clients, with such behavior disorders as inappropriate social behavior in disturbed hospitalized children (e.g., Laws, Brown, Epstein, & Hocking, 1971), disruptive behavior in the elementary classroom (e.g., Cormier, 1969; R. V. Hall et al., 1971; R. V. Hall, Lund, & Jackson, 1968) or high school classroom (e.g., Schutte & Hopkins, 1970), chronic thumb-sucking (Skiba, Pettigrew, & Alden, 1971), disruptive behavior in the home (Veenstra, 1971; Wahler, Winkel, Peterson, & Morrison, 1965), and disruptive behavior in brain-injured children (R. V. Hall & Broden, 1967). These improvements occurred in many different settings, such as elementary and high school classrooms, hospitals, homes, kindergartens, and various preschools. Therapists included professionals, teachers, aides, parents, and nurses (see Table 10-2).
The consistency of this success was impressive, but as these series of cases accumulated, the inevitable but extremely valuable reports of failures began to appear. Almost from the beginning, investigators noted that differential attention was not effective with self-injurious behavior in children. For instance, Tate and Baroff (1966) noted that in the length of time necessary for differential attention to work, severe injury would result. In place of differential attention, a strong aversive stimulus, electric shock, proved effective in suppressing this behavior. Later, Corte, Wolf, and Locke (1971) found that differential attention was totally ineffective on mild self-injurious behavior in retarded children, but, again, electric shock proved effective. Because there are no reports of success in the literature using differential attention for self-injurious behavior, it is unlikely that these cases would have been published at all if differential attention had not proven effective on other behavior disorders. Thus this is an example of a systematic replication series setting the stage for reports of limitations of a procedure.
More subtle limitations of the procedure are reported in series of cases
wherein the technique worked in some cases, but not in others. In an early
series, Wahler et al. (1965) trained mothers of young, oppositional children in
TABLE 10-2. Summary of Studies on Differential Attention with Children
AUTHORS | CLIENT(S) | N | BEHAVIOR | SETTING | THERAPIST | EXPERIMENTAL ANALYSIS
Sloane, Johnston, & Bijou (1967) | 4-yr.-old male | 1 | Extreme aggression, temper tantrums, and excessive fantasy play | Remedial nursery school | Teachers | Yes
Buell, Stoddard, Harris, & Baer (1968) | 3-yr.-old female | 1 | Lack of cooperative play and participation in preschool program | Preschool program | Teacher | Yes
Carlson, Arnold, Becker, & Madsen (1968) | 8-yr.-old female | 1 | Tantrums | Classroom | Teacher | No
Ellis (1968) | 4- and 5-yr.-old males | 5 | Aggressive behavior | Lab. school | Teacher and helper | Yes
R. V. Hall, Lund, & Jackson (1968) | Elementary school pupils | 6 | Disruptive and dawdling study behavior | Poverty area classroom | Teachers | Yes
R. V. Hall, Panyan, Rabon, & Broden (1968) | 3 classrooms (1st, 6th, 7th grades) | 24 | Study behavior | Classroom | Teachers | Yes
Hart, Reynolds, Baer, Brawley, & Harris (1968) | 5-yr.-old female | 1 | Uncooperative play | Preschool | Teacher | Yes
Madsen, Becker, & Thomas (1968) | Elementary school pupils | 3 | Classroom disruption | Classroom | Teachers | Yes
N. J. Reynolds & Risley (1968) | 4-yr.-old female | 1 | Low frequency of talking | Preschool | Teacher | Yes
D. R. Thomas, Becker, & Armstrong (1968) | 6- to 11-yr.-old males and females | 10 | Disruptive behavior | Classroom | Teacher | Yes
D. R. Thomas, Nielson, Kuypers, & Becker (1968) | 6-yr.-old male | 1 | Disruptive behavior | Classroom | Teacher | Yes
Wahler and Pollio (1968) | 8-yr.-old male | 1 | Excessive dependency and lack of aggressive behavior | University clinic | Parents and therapist | Yes
Ward & Baker (1968) | 1st-grade children | 4 | Disruptive behavior | Classroom | Teacher | Yes
J. Wright, Clayton, & Edgar (1970) | Severely retarded children | 15 | Negative behaviors | State residential institution | Ward technicians | No
Buys (1971) | 9 problem and 9 control elementary school pupils | 18 | Deviant classroom behavior | Classroom | Teacher | Yes
Corte, Wolf, & Locke (1971) | Profoundly retarded adolescents | 4 | Self-injurious behavior | Hospital training lab. | Professional | No
R. V. Hall et al. (1971) | Individual pupils and classroom groups from 1st grade to junior high school | Ex. 1-4: 1 each; Ex. 5: 30; Ex. 6: 27 | Disruptive and talking-out behavior | White, middle-class and black poverty classrooms | Teacher | Yes
Laws, Brown, Epstein, & Hocking (1971) | Severely disturbed 8- and 9-yr.-old males | 3 | Behavior that interferes with speech and language | State hospital | Speech therapist | Yes
Nordquist (1971) | 5½-yr.-old male | 1 | Enuresis and oppositional behavior | Home | Parents | Yes
Skiba, Pettigrew, & Alden (1971) | 8-yr.-old females | 3 | Thumbsucking | Classroom | Teacher | Yes
J. D. Thomas & Adams (1971) | Well-behaved and remedial primary school pupils | 16 | Task-related behavior and lowering sound levels | Classroom | Teacher | Yes
Veenstra (1971) | 5- to 14-yr.-old siblings | 4 | Disruptive behavior | Home | Mother | Yes
Vukelich & Hake (1971) | 18-yr.-old severely retarded female | 1 | Choking and grabbing | State hospital | Ward staff | Yes
Yawkey (1971) | 7-yr.-old female; 7-yr.-old male | 2 | Poor attending behavior | Classroom | Teacher | Yes
Barnes, Wootton, & Wood (1972) | 3- and 4-yr.-old males and females | 24 | Immature play | Mental health center | Public health nurse | Yes
R. V. Hall et al. (1972) | 4- and 8-yr.-old males and 5- and 10-yr.-old females | 4 | Whining and failure to wear orthodontic device | Home | Parents | Yes
Hasazi & Hasazi (1972) | 8-yr.-old male | 1 | Digit reversal | Classroom | Teacher | Yes
Herbert & Baer (1972) | 5-yr.-old male and female | 2 | Inappropriate behavior in home | Home | Mother | Yes
Kirby & Shields (1972) | 13-yr.-old male | 1 | Nonattending and poor arithmetic | Classroom | Teacher | Yes
Sajwaj, Twardosz, & Burke (1972) | 7-yr.-old retarded male | 1 | Excessive conversation with teacher | Remedial preschool | Teacher | Yes
Twardosz & Sajwaj (1972) | 4-yr.-old hyperactive retarded male | 1 | Sitting | Remedial preschool | Teacher | Yes
Cossairt, Hall, & Hopkins (1973) | 3rd- and 4th-grade males and females | 12 | Low attending and instruction-following behavior | Elementary schools | Teachers | Yes
Herbert et al. (1973) | 5- and 6-yr.-old females; 5-, 7-, and 8-yr.-old males | 6 | Deviant | Preschool classroom and observation lab. | Mothers | Yes
Pinkston, Reese, LeBlanc, & Baer (1973) | 3½-yr.-old male | 1 | Aggressive behaviors with peers and low peer interaction | Preschool classroom | Teacher | Yes
Budd, Green, & Baer (1976) | 3-yr.-old female | 1 | Noncompliance with instructions and considerable demands for attention | University lab. room | Mother | Yes
Munford & Liberman (1978) | 13-yr.-old male | 1 | Operant coughing | 1. Hospital; 2. Home | 1. Hospital staff; 2. Parents | Yes
Varni, Russo, & Cataldo (1978) | 11-yr.-old male | 1 | Delusional speech | Psychiatric hospital | Graduate student | Yes
Comment on replication
In our view, data on failures are a sign of the maturity of a systematic replication series. Only when a procedure has proven successful through many replications do negative results assume this importance. But these failures do not detract from the successful replications. The effectiveness of differential attention has been established repeatedly. These data do, however, indicate that conditions still not fully understood limit the generality of its effectiveness, and that practitioners must proceed with caution (Wahler et al., 1979).
In conclusion, this advanced systematic replication series on differential attention has generated a great deal of confidence among practitioners. The evidence indicates that it can be effective with adults and children with a variety of behavioral problems in almost any setting. The clinically oriented books and monographs widely advocating its use, most often in combination with other procedures as part of a treatment package (Forehand & McMahon, 1981; Jacobson & Margolin, 1979; Patterson, 1982; Paul & Lentz, 1977), have made this procedure available to numerous professionals concerned with behavior change, as well as to the consuming public. In fact, most editors of appropriate journals probably would not consider accepting another article on differential attention unless it illustrated a clear exception to the effectiveness of this procedure, as did the Herbert et al. (1973) report.
However, the process of establishing generality of findings across all relevant domains is a slow one indeed, and it will probably be years before we know all we should about this treatment or other treatments currently undergoing systematic replication. As we pointed out in the context of adult psychotic behavior, investigators probably proceeded too quickly in incorporating differential attention into various package treatments without fully understanding the limits of its effects. Even with the very informative and complete systematic replication series on childhood problems, we do not yet know what predicts failure from differential attention. There are, in fact, many promising hypotheses to account for these failures (Paris & Cairns, 1972; Sajwaj & Dillon, 1977; Wahler, 1969a; Warren & Cairns, 1972), but these have not yet been explored in the applied setting. Until the process of systematic replication reveals the precise limitations of a procedure, clinicians and other behavior change agents should proceed with caution, but also with hope and confidence that this powerful process will ultimately establish the conditions under which a given treatment is effective or ineffective.
Guidelines for systematic replication
The formulation of guidelines for conducting systematic replication is
more difficult than for direct replication due to the variety of experimental
typically found in a systematic replication series (e.g., see Table 10-2), two
fall into this category: the experimental analysis containing only one case
and the group study.
As noted above, the report of a single case, particularly when accompanied by an experimental analysis, can be a valuable addition to a series in that it describes another setting, behavior disorder, or other item where the procedure was successful. Reports of single cases also may lead to direct and systematic replication, as in the differential attention series. Unfortunately, however, failures in a single case are seldom published in journals. Among the numerous successful reports of single-case studies contained in the differential attention series, very few reported a failure, although it is our guess that differential attention has failed on many occasions and these failures simply have not been reported.
The group study suffers from the same limitation because failures are lost in the group average. Again, group studies can play an important role in systematic replication, in that demonstrating that a technique is successful with a given group, as opposed to individuals in the group, may serve an important function (see section 2.9). In the differential attention series, several investigators thought it important to demonstrate that the procedure could be effective in a classroom as a whole (e.g., Ward & Baker, 1968). These data contributed to generality of findings across several domains. The fact remains, however, that failures will not be detected (unless the whole experiment fails, in which case it would not be published), which leaves us no closer to the goal of defining the conditions in which a successful technique fails. In clinical replication, or field testing, described below, one has more flexibility in examining results from large groups of treated clients as long as it is possible to pinpoint individuals who succeed or fail.
4. Finally, the question arises: When is a systematic replication series over? For direct replication series, it was possible to make some tentative recommendations on the number of subjects, given experimental findings. With systematic replication, no such recommendations are possible. In applied research, we would have to agree with Sidman’s (1960) conclusion concerning basic research that a series is never over, because scientists will always attempt to find exceptions to a given principle, as well they should. It may be safe to say that a series is over when no exception to a proven therapeutic principle can be found, but, as Sidman pointed out, this is entirely dependent on the complexity of the problem and the inductive reasoning of clinical researchers, who will have to judge in the light of new and emerging knowledge which conditions could provide exceptions to old principles. Of course, series will eventually begin to “fade away,” as with the differential attention series, when wide generality of applicability has been established.
been developed for all coexisting problems, the next step would be to establish generality of findings by replicating this treatment package on additional patients who present a similar combination of problems. This would be clinical replication (e.g., Wallace, 1982). The insertion of differential attention, time-out, and other well-tested procedures into a “parenting” package is a good example of technique building resulting in a treatment ready for clinical replication.
Another name for clinical replication, then, could be field testing, because this is where clinicians and practitioners take newly developed or newly modified treatments and apply them to the common, everyday problems encountered in their practice. While this process can be carried out by either full-time clinical investigators or scientist-practitioners (Barlow et al., 1983), establishing the widest possible client and setting generality would require substantial participation by full-time practitioners. The job of these practitioners, then, would be to apply these treatments to large numbers of their clients while observing and recording successes and failures and, where possible, analyzing the reasons for this individual variation through experimental strategies. But even if practitioners are not inclined to analyze causes for failures in the application of a particular treatment package, full descriptions of these failures will be extremely important for those investigators who are in a position to carry on this search (Barlow et al., 1983).
Thus, while all facets of single-case experimental research are much closer to the procedures in clinical or applied practice than to other types of research methodology (see below), clinical replication in its most elementary form becomes almost identical with the activities of practitioners.
clear and clinically significant for several children, but the results were also
weak and clinically unimportant for several children. Thus the package has
only limited generality across clients, and the task remains to pinpoint
differences between children who improved and those who did not improve.
From these differences, possible causes for limitations on client generality
should emerge.
In fact, children in this series were quite heterogeneous. In many respects,
this was due to an inherent difficulty in clinical replication: the vagueness
and unreliability of many diagnostic categories. As Lovaas et al. (1973)
pointed out, “. . . the delineation of ‘autism’ is one area that will demand
considerably more work. It has not been a particularly useful diagnosis. Few
people agree on when to apply it” (p. 156). It follows that heterogeneity of
clients will most likely be greater than in a direct replication series, where the
target behavior is well defined and clients can be matched more closely.
Thus the causes of failure in a series with mixed results are more difficult to
ascertain, due to the greater number of differences among individuals.
Nevertheless, it is necessary to pinpoint these differences and begin the search
for the sources of intersubject variability. As Lovaas et al. (1973) concluded:
Finally a major focus of future research should attempt more functional
descriptions of autistic children. As we have shown, the children responded in vastly
different ways to the treatment we gave them. We paid scant attention to
individual differences when we treated the first twenty children. In the future, we
will assess such individual differences. (p. 163)
schedule arrangement for a large group study, where years may pass before
publishable data are available.
Third, the experimental analysis of the single case is close to the clinic. As
noted in chapter 1, this approach tends to merge the role of scientist and
practitioner. Many an important series has started only after the clinician
confronted an interesting case. Subsequently, measures were developed, and
an experimental analysis of the treatment was performed (Mills et al., 1973).
As a result, the data increase one’s understanding of the problem, and the
client also receives and benefits from treatment. If one plans to treat the
patient, it is an easy enough matter to develop measures and perform the
necessary experimental analyses. The recent book mentioned above (Barlow
et al., 1983) was designed to explore this potential among full-time
practitioners by demonstrating how they can incorporate these principles into their
practices and thereby participate in the research process. This ability to work
with ease within the clinical setting, more than any other factor, may ensure the
future of meaningful replication efforts.
Finally, as noted above, the results of the series are cumulative, and each
new replicative effort has some immediate payoff for the practicing clinician.
As this is the ultimate goal of the applied researcher, it is far more satisfactory
than participating in a multiyear collaborative study where knowledge or
benefit to the clinician is a distant goal.
Nevertheless, the advancement of a systematic replication series is a long
and arduous road full of pitfalls and dead ends. In the face of the immediate
demands on clinicians and behavior change agents to provide services to
society, it is tempting to “grab the glimmer of hope” provided by treatments
that prove successful in preliminary reports or case studies. That these hopes
have been repeatedly dashed as therapeutic techniques and schools of therapy
have come and gone supplies the most convincing evidence that the slow but
inexorable process of the scientific method is the only way to meaningful
advancement in our knowledge. Although we are a long way from the
sophistication of the physical sciences, the single-case experimental design
with adequate replication may provide us with the methodology necessary to
overcome the complex problems of human behavior disorders.
Hiawatha Designs an Experiment
Maurice G. Kendall
(Originally published in The American Statistician, Dec. 1959, Vol. 13,
No. 5. Reprinted by Permission).
Thus it happened in the contest
That their scores were most impressive
With one notable exception
This (I hate to have to say it)
Was the score of Hiawatha,
Who, as usual, shot his arrows
Shot them with great strength and swiftness
Managing to be unbiased
Not, however, with his salvo
Managing to hit the target.

There, they said to Hiawatha
That is what we all expected.

Hiawatha, nothing daunted,
Called for pen and called for paper
Did analyses of variance
Finally produced the figures
Showing, beyond peradventure,
Everybody else was biased
And the variance components
Did not differ from each other

All the same, his fellow tribesmen
Ignorant, benighted heathens,
Took away his bow and arrows,
Said that though my Hiawatha
Was a brilliant statistician
He was useless as a bowman.
As for variance components,
Several of the more outspoken
Made primeval observations
Hurtful to the finer feelings
Even of a statistician.

In a corner of the forest
Dwells alone my Hiawatha
Permanently cogitating
On the normal law of error,
Wondering in idle moments
Whether an increased precision
Might perhaps be rather better,
Even at the risk of bias,
If thereby one, now and then, could
Register upon the target.
References
Abel, G. G., Blanchard, E. B., Barlow, D. H., & Flanagan, B. (1975, December). A controlled behavioral treatment of a sadistic rapist. Paper presented at the meeting of the Association for Advancement of Behavior Therapy, San Francisco.
Agras, W. S. (1975). Behavior modification in the general hospital psychiatric unit. In H. Leitenberg (Ed.), Handbook of behavior modification (pp. 547-565). Englewood Cliffs, NJ: Prentice-Hall.
Agras, W. S., Barlow, D. H., Chapin, H. N., Abel, G. G., & Leitenberg, H. (1974). Behavior modification of anorexia nervosa. Archives of General Psychiatry, 30, 279-286.
Agras, W. S., Kazdin, A. E., & Wilson, G. T. (1979). Behavior therapy: Toward an applied clinical science. San Francisco: W. H. Freeman.
Agras, W. S., Leitenberg, H., & Barlow, D. H. (1968). Social reinforcement in the modification of agoraphobia. Archives of General Psychiatry, 19, 423-427.
Agras, W. S., Leitenberg, H., Barlow, D. H., Curtis, N. A., Edwards, J. A., & Wright, D. E. (1971). Relaxation in systematic desensitization. Archives of General Psychiatry, 25, 511-514.
Agras, W. S., Leitenberg, H., Barlow, D. H., & Thomson, L. E. (1969). Instructions and reinforcement in the modification of neurotic behavior. American Journal of Psychiatry, 125, 1435-1439.
Alford, G. S., Blanchard, E. B., & Buckley, M. (1972). Treatment of hysterical vomiting by modification of social contingencies: A case study. Journal of Behavior Therapy and Experimental Psychiatry, 3, 209-212.
Alford, G. S., Webster, J. S., & Sanders, S. H. (1980). Covert aversion of two interrelated deviant sexual practices: Obscene phone calling and exhibitionism. A single case analysis. Behavior Therapy, 11, 15-25.
Allen, K. E., & Harris, F. R. (1966). Elimination of a child’s excessive scratching by training the mother in reinforcement procedures. Behaviour Research and Therapy, 4, 79-84.
Allen, K. E., Hart, B. M., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social reinforcement on isolate behavior of a nursery school child. Child Development, 35, 511-518.
Allen, K. E., Henke, L. B., Harris, F. R., Baer, D. M., & Reynolds, N. J. (1967). Control of hyperactivity by social reinforcement of attending behavior. Journal of Educational Psychology, 58, 231-237.
Allison, M. G., & Ayllon, T. (1980). Behavioral coaching in the development of skills in football, gymnastics, and tennis. Journal of Applied Behavior Analysis, 13, 297-314.
Allport, G. W. (1961). Pattern and growth in personality. New York: Holt, Rinehart and Winston.
Allport, G. W. (1962). The general and the unique in psychological science. Journal of Personality, 30, 405-422.
Altman, J. (1974). Observational study of behavior: Sampling methods. Behaviour, 49, 227-267.
American Psychological Association. (1973). Ethical principles in the conduct of research with human participants. Washington, DC: Author.
Anderson, R. L. (1942). Distribution of the serial correlation coefficient. Annals of Mathematical Statistics, 13, 1-13.
References 375
Cuvo, A. J., & Riva, M. T. (1980). Generalization and transfer between comprehension and production: A comparison of retarded and nonretarded persons. Journal of Applied Behavior Analysis, 13, 215-231.
Dalton, K. (1959). Menstruation and acute psychiatric illness. British Medical Journal, 1, 148-149.
Dalton, K. (1960a). Menstruation and accidents. British Medical Journal, 2, 1425-1426.
Dalton, K. (1960b). Schoolgirls’ behavior and menstruation. British Medical Journal, 2, 1647-1649.
Dalton, K. (1961). Menstruation and crime. British Medical Journal, 2, 1752-1753.
Davidson, P. O., & Costello, C. G. (1969). N = 1: Experimental studies of single cases. New York: Van Nostrand Reinhold.
Davis, K. V., Sprague, R. L., & Werry, J. S. (1969). Stereotyped behavior and activity level in severe retardates: The effect of drugs. American Journal of Mental Deficiency, 73, 721-727.
Davis, V. J., Poling, A. D., Wysocki, T., & Breuning, S. E. (1981). Effects of Phenytoin withdrawal on matching to sample and workshop performance of mentally retarded persons. Journal of Nervous and Mental Disease, 169, 718-725.
Davison, G. C. (1965). The training of undergraduates as social reinforcers for autistic children. In L. P. Ullmann & L. Krasner (Eds.), Case studies in behavior modification (pp. 146-148). New York: Holt, Rinehart and Winston.
DeProspero, A., & Cohen, S. (1979). Inconsistent visual analysis of intrasubject data. Journal of Applied Behavior Analysis, 12, 573-579.
Doke, L. A. (1976). Assessment of children’s behavioral deficits. In M. Hersen & A. S. Bellack (Eds.), Behavioral assessment (pp. 493-536). Elmsford, New York: Pergamon Press.
Doke, L. A., & Risley, T. R. (1972). The organization of day-care environments: Required vs. optional activities. Journal of Applied Behavior Analysis, 5, 405-420.
Dollard, J., Doob, L. W., Miller, N. E., Mowrer, O. H., & Sears, R. R. (1939). Frustration and aggression. New Haven: Yale University Press.
Domash, M. A., Schnelle, J. E., Stomatt, E. L., Carr, A. F., Larson, L. D., Kirchner, R. E., & Risley, T. R. (1980). Police and prosecution systems: An evaluation of a police criminal case preparation program. Journal of Applied Behavior Analysis, 13, 397-406.
Drabman, R. S., Hammer, D., & Rosenbaum, M. S. (1979). Assessing generalization in behavior modification with children: The generalization map. Behavioral Assessment, 1, 203-219.
Dukes, W. F. (1965). N = 1. Psychological Bulletin, 64, 74-79.
du Mas, F. M. (1955). Science and the single case. Psychological Reports, 1, 65-75.
Dunlap, G., & Koegel, R. L. (1980). Motivating autistic children through stimulus variation. Journal of Applied Behavior Analysis, 13, 619-627.
Dunlap, K. (1932). Habits: Their making and unmaking. New York: Liveright.
Dyer, K., Christian, W. P., & Luce, S. C. (1982). The role of response delay in improving the discrimination performance of autistic children. Journal of Applied Behavior Analysis, 15, 231-240.
Edelberg, R. (1972). Electrical activity of the skin. In N. S. Greenfield & R. A. Sternbach (Eds.), Handbook of psychophysiology (pp. 367-418). New York: Holt, Rinehart and Winston.
Edgington, E. S. (1966). Statistical inference and nonrandom samples. Psychological Bulletin, 66, 485-487.
Edgington, E. S. (1967). Statistical inference from N = 1 experiments. Journal of Psychology, 65, 195-199.
Edgington, E. S. (1969). Statistical inference: The distribution-free approach. New York: McGraw-Hill.
Edgington, E. S. (1972). N = 1 experiments: Hypothesis testing. Canadian Psychologist, 13, 121-135.
Edgington, E. S. (1980a). Randomization tests. New York: Marcel Dekker.
Edgington, E. S. (1980b). Validity of randomization tests for one-subject experiments. Journal of Educational Statistics, 5, 235-251.
Edgington, E. S. (1982). Nonparametric tests for single-subject multiple schedule experiments. Behavioral Assessment, 4, 83-91.
Edgington, E. S. (1983). Response-guided experimentation. Contemporary Psychology, 28, 64-65.
Edgington, E. S. (1984). Statistics and single case analysis. In M. Hersen, R. M. Eisler, & P. M. Monti (Eds.), Progress in behavior modification (Vol. 16). New York: Academic Press.
Edwards, A. L. (1968). Experimental design in psychological research (3rd ed.). New York: Holt, Rinehart and Winston.
Egel, A. L., Richman, G. S., & Koegel, R. L. (1981). Normal peer models and autistic children’s learning. Journal of Applied Behavior Analysis, 14, 3-12.
Eisler, R. M., & Hersen, M. (1973, August). The A-B design: Effects of token economy on behavioral and subjective measures in neurotic depression. Paper presented at the meeting of the American Psychological Association, Montreal.
Eisler, R. M., Hersen, M., & Agras, W. S. (1973). Effects of videotape and instructional feedback on nonverbal marital interaction: An analog study. Behavior Therapy, 4, 551-558.
Eisler, R. M., Miller, P. M., & Hersen, M. (1973). Components of assertive behavior. Journal of Clinical Psychology, 29, 295-299.
Elkin, T. E., Hersen, M., Eisler, R. M., & Williams, J. G. (1973). Modification of caloric intake in anorexia nervosa: An experimental analysis. Psychological Reports, 32, 75-78.
Ellis, D. P. (1968). The design of a social structure to control aggression. Dissertation Abstracts, 29, 672A.
Emmelkamp, P. M. G. (1974). Self-observation versus flooding in the treatment of agoraphobia. Behaviour Research and Therapy, 12, 229-237.
Emmelkamp, P. M. G. (1982). Phobic and obsessive-compulsive disorders: Theory, research and practice. New York: Plenum.
Emmelkamp, P. M. G., & Kwee, K. G. (1977). Obsessional ruminations: A comparison between thought stopping and prolonged exposure in imagination. Behaviour Research and Therapy, 15, 441-444.
Epstein, L. H., Beck, S. J., Figueroa, J., Farkas, G., Kazdin, A. E., Daneman, D., & Becker, D. (1981). The effects of targeting improvements in urine glucose on metabolic control in children with insulin dependent diabetes. Journal of Applied Behavior Analysis, 14, 365-375.
Epstein, L. H., & Hersen, M. (1974). Behavioral control of hysterical gagging. Journal of Clinical Psychology, 30, 102-104.
Epstein, L. H., Hersen, M., & Hemphill, D. P. (1974). Music feedback in the treatment of tension headache: An experimental case study. Journal of Behavior Therapy and Experimental Psychiatry, 5, 59-63.
Etzel, B. C., & Gewirtz, J. L. (1967). Experimental modifications of caretaker-maintained high-rate operant crying in a 6- and 20-week-old infant (Infans tyrannotearus): Extinction of crying with reinforcement of eye contact and smiling. Journal of Experimental Child Psychology, 5, 303-317.
Evans, I. M. (1983). Behavioral assessment. In C. E. Walker (Ed.), Handbook of clinical psychology: Vol. 1. Theory, research, and practice (pp. 391-419). Homewood, IL: Dow Jones-Irwin.
Evans, I. M., & Wilson, F. E. (1983). Behavioral assessment on decision making: A theoretical analysis. In M. Rosenbaum, C. M. Franks, & Y. Jaffe (Eds.), Perspectives on behavior therapy in the eighties (Vol. 9, pp. 35-53). New York: Springer Publishing.
Eyberg, S. M., & Johnson, S. M. (1974). Multiple assessment of behavior modification with families: Effects of contingency contracting and order of treated problems. Journal of Consulting and Clinical Psychology, 42, 594-606.
Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319-324.
Analysis, 1, 315-322.
Hallahan, D. P., Lloyd, J. W., Kneedler, R. D., & Marshall, K. J. (1982). A comparison of the effects of self- versus teacher-assessment of on-task behavior. Behavior Therapy, 13, 715-723.
Halle, J. W., Baer, D. M., & Spradlin, J. E. (1981). Teachers’ generalized use of delay as a stimulus control procedure to increase language use in handicapped children. Journal of Applied Behavior Analysis, 14, 389-409.
Harbert, T. L., Barlow, D. H., Hersen, M., & Austin, J. B. (1974). Measurement and modification of incestuous behavior: A case study. Psychological Reports, 34, 79-86.
Harris, F. R., Johnston, M. K., Kelley, C. S., & Wolf, M. M. (1964). Effects of positive social reinforcement on regressed crawling of a nursery school child. Journal of Educational Psychology, 55, 35-41.
Hart, B. M., Allen, K. E., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social reinforcement on operant crying. Journal of Experimental Child Psychology, 1, 145-153.
Hart, B. M., Reynolds, N. J., Baer, D. M., Brawley, E. R., & Harris, F. R. (1968). Effect of contingent social reinforcement on the cooperative play of a preschool child. Journal of Applied Behavior Analysis, 1, 73-76.
Hartmann, D. P. (1974). Forcing square pegs into round holes: Some comments on “An analysis-of-variance model for the intrasubject replication design.” Journal of Applied Behavior Analysis, 7, 635-638.
Hartmann, D. P. (1976). Some restrictions in the application of the Spearman-Brown prophecy formula to observational data. Educational and Psychological Measurement, 36, 843-845.
Hartmann, D. P. (1977). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10, 103-116.
Hartmann, D. P. (1982). Assessing the dependability of observational data. In D. P. Hartmann (Ed.), Using observers to study behavior: New directions for methodology of social and behavioral science (pp. 51-65). San Francisco: Jossey-Bass.
Hartmann, D. P. (1983). Editorial. Behavioral Assessment, 5, 1-3.
Hartmann, D. P., & Gardner, W. (1979). On the not so recent invention of interobserver reliability statistics: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior Analysis, 12, 559-560.
Hartmann, D. P., & Gardner, W. (1981). Considerations in assessing the reliability of observations. In E. E. Filsinger & R. A. Lewis (Eds.), Assessing marriage (pp. 184-196). Beverly Hills: Sage.
Hartmann, D. P., Gottman, J. M., Jones, R. R., Gardner, W., Kazdin, A. E., & Vaught, R. S. (1980). Interrupted time-series analysis and its application to behavioral data. Journal of Applied Behavior Analysis, 13, 543-559.
Hartmann, D. P., & Hall, R. V. (1976). The changing criterion design. Journal of Applied Behavior Analysis, 9, 527-532.
Hartmann, D. P., Roper, B. L., & Bradford, D. C. (1979). Some relationships between behavioral and traditional assessment. Journal of Behavioral Assessment, 1, 3-21.
Hartmann, D. P., Roper, B. L., & Gelfand, D. M. (1977). Evaluation of alternative modes of child psychotherapy. In B. Lahey & A. Kazdin (Eds.), Advances in child clinical psychology (Vol. 1, pp. 1-46). New York: Plenum.
Hartmann, D. P., & Wood, D. D. (1982). Observation methods. In A. S. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International handbook of behavior modification and therapy (pp. 109-138). New York: Plenum.
Hasazi, J. E., & Hasazi, S. E. (1972). Effects of teacher attention on digit-reversal behavior in an elementary school child. Journal of Applied Behavior Analysis, 5, 157-162.
Hawkins, R. P. (1975). Who decided that was the problem? Two stages of responsibility for applied behavior analysis. In W. S. Wood (Ed.), Issues in evaluating behavior modification (pp. 95-214). Champaign, IL: Research Press.
Hawkins, R. P. (1979). The functions of assessment: Implications for selection and development of devices for assessing repertoires in clinical, educational, and other settings. Journal of Applied Behavior Analysis, 12, 501-516.
Hawkins, R. P. (1982). Developing a behavior code. In D. P. Hartmann (Ed.), Using observers to study behavior: New directions for methodology of social and behavioral science (pp. 21-35). San Francisco: Jossey-Bass.
Hawkins, R. P., Axelrod, S., & Hall, R. V. (1976). Teachers as behavior analysts: Precisely monitoring student performance. In J. A. Brigham, R. P. Hawkins, J. Scott, & J. F. McLaughlin (Eds.), Behavior analysis in education: Self-control and reading (pp. 274-2%). Dubuque, IA: Kendall/Hunt.
Hawkins, R. P., & Dobes, R. W. (1977). Behavioral definitions in applied behavior analysis: Explicit or implicit. In B. C. Etzel, J. M. LeBlanc, & D. M. Baer (Eds.), New directions in behavioral research: Theory, methods, and applications. In honor of Sidney W. Bijou (pp. 167-188). Hillsdale, NJ: Erlbaum.
Hawkins, R. P., & Dotson, V. A. (1975). Reliability scores that delude: An Alice in Wonderland trip through the misleading characteristics of interobserver agreement scores in interval recording. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp. 359-376). Englewood Cliffs, NJ: Prentice-Hall.
Hawkins, R. P., & Fabry, B. D. (1979). Applied behavior analysis and interobserver reliability: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior Analysis, 12, 545-552.
Hawkins, R. P., Peterson, R. F., Schweid, E., & Bijou, S. W. (1966). Behavior therapy in the home: Amelioration of problem parent-child relations with the parent in a therapeutic role. Journal of Experimental Child Psychology, 4, 99-107.
Hay, L. R., Nelson, R. O., & Hay, W. M. (1980). Methodological problems in the use of participant observers. Journal of Applied Behavior Analysis, 13, 501-504.
Hayes, S. C. (1981). Single case experimental design and empirical clinical practice. Journal of Consulting and Clinical Psychology, 49, 193-211.
Haynes, S. N. (1978). Principles of behavioral assessment. New York: Gardner Press.
Haynes, S. N., & Wilson, C. C. (1979). Behavioral assessment. San Francisco: Jossey-Bass.
Hendrickson, J. M., Strain, P. S., Tremblay, A., & Shores, R. E. (1982). Interactions of behaviorally handicapped children: Functional effects of peer social interactions. Behavior Modification, 6, 323-353.
Herbert, E. W., & Baer, D. M. (1972). Training parents as behavior modifiers: Self-recording of contingent attention. Journal of Applied Behavior Analysis, 5, 139-149.
Herbert, E. W., Pinkston, E. M., Hayden, M. L., Sajwaj, T. E., Pinkston, S., Cordua, G., & Jackson, C. (1973). Adverse effects of differential parental attention. Journal of Applied Behavior Analysis, 6, 15-30.
Herman, S. H., Barlow, D. H., & Agras, W. S. (1974a). An experimental analysis of classical conditioning as a method of increasing heterosexual arousal in homosexuals. Behavior Therapy, 5, 33-47.
Herman, S. H., Barlow, D. H., & Agras, W. S. (1974b). An experimental analysis of exposure to “explicit” heterosexual stimuli as an effective variable in changing arousal patterns of homosexuals. Behaviour Research and Therapy, 12, 335-345.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Hersen, M. (1973). Self-assessment of fear. Behavior Therapy, 4, 241-257.
Hersen, M. (1978). Do behavior therapists use self-report as major criteria? Behavioral Analysis and Modification, 2, 328-334.
Hersen, M. (1981). Complex problems require complex solutions. Behavior Therapy, 12, 15-29.
Hersen, M. (1982). Single-case experimental designs. In A. S. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International handbook of behavior modification and therapy (pp. 167-201). New York: Plenum.
Hersen, M., & Bellack, A. S. (1976). A multiple-baseline analysis of social-skills training in chronic schizophrenics. Journal of Applied Behavior Analysis, 9, 239-245.
Hersen, M., & Bellack, A. S. (Eds.). (1981). Behavioral assessment: A practical handbook (2nd ed.). Elmsford, New York: Pergamon Press.
Hersen, M., & Breuning, S. E. (Eds.). (in press). Pharmacological and behavioral treatment: An integrated approach. New York: Wiley.
Hersen, M., Eisler, R. M., Alford, G. S., & Agras, W. S. (1973). Effects of token economy on neurotic depression: An experimental analysis. Behavior Therapy, 4, 392-397.
Hersen, M., Eisler, R. M., & Miller, P. M. (1973). Development of assertive responses: Clinical, measurement, and research considerations. Behaviour Research and Therapy, 11, 505-522.
Hersen, M., Gullick, E. L., Matherne, P. M., & Harbert, T. L. (1972). Instructions and reinforcement in the modification of a conversion reaction. Psychological Reports, 31, 719-722.
Hersen, M., Miller, P. M., & Eisler, R. M. (1973). Interactions between alcoholics and their wives: A descriptive analysis of verbal and non-verbal behavior. Quarterly Journal of Studies on Alcohol, 34, 516-520.
Hilgard, J. R. (1933). The effect of early and delayed practice on memory and motor performances studied by the method of co-twin control. Genetic Psychology Monographs, 14, 493-567.
Hinson, J. M., & Malone, J. C., Jr. (1980). Local contrast and maintained generalization. Journal of the Experimental Analysis of Behavior, 34, 263-272.
Hoch, P. H., & Zubin, J. (Eds.). (1964). The evaluation of psychiatric treatment. New York: Grune & Stratton.
Hollandsworth, J. G., Glazeski, R. C., & Dressel, M. E. (1978). Use of social skills training in the treatment of extreme anxiety and deficit verbal skills in the job interview setting. Journal of Applied Behavior Analysis, 11, 259-269.
Hollenbeck, A. R. (1978). Problems of reliability in observational research. In G. P. Sackett (Ed.), Observing behavior: Vol. 2. Data collection and analysis methods (pp. 79-98). Baltimore: University Park Press.
Hollon, S. D., & Bemis, K. M. (1981). Self-report and the assessment of cognitive functions. In M. Hersen & A. S. Bellack (Eds.), Behavioral assessment: A practical handbook (2nd ed., pp. 125-174). Elmsford, New York: Pergamon Press.
Holm, R. A. (1978). Techniques of recording observational data. In G. P. Sackett (Ed.), Observing behavior: Vol. 2. Data collection and analysis methods (pp. 99-108). Baltimore: University Park Press.
Holmes, D. S. (1966). The application of learning theory to the treatment of a school behavior problem: A case study. Psychology in the Schools, 3, 355-359.
Holtzman, W. H. (1963). Statistical models for the study of change in the single case. In C. W. Harris (Ed.), Problems in measuring change (pp. 199-211). Madison, WI: University of Wisconsin Press.
Honig, W. K. (Ed.). (1966). Operant behavior: Areas of research and application. New York: Appleton-Century-Crofts.
Hopkins, B. L., Schutte, R. C., & Garton, K. L. (1971). The effects of access to a playroom on the rate and quality of printing and writing of first- and second-grade students. Journal of Applied Behavior Analysis, 4, 77-87.
Horne, G. P., Yang, M. C. K., & Ware, W. B. (1982). Time series analysis for single-subject designs. Psychological Bulletin, 91, 178-189.
Horner, R. D., & Baer, D. M. (1978). Multiple-probe technique: A variation of the multiple baseline. Journal of Applied Behavior Analysis, 11, 189-196.
House, A. E., House, B. J., & Campbell, M. B. (1981). Measures of interobserver agreement: Calculation formulas and distribution effects. Journal of Behavioral Assessment, 3, 37-57.
Hubert, L. J. (1977). Kappa revisited. Psychological Bulletin, 84, 289-297.
Kazdin, A. E. (1973a). The effect of response cost and aversive stimulation in suppressing punished and nonpunished speech dysfluencies. Behavior Therapy, 4, 73-82.
Kazdin, A. E. (1973b). Methodological and assessment considerations in evaluating reinforcement programs in applied settings. Journal of Applied Behavior Analysis, 6, 517-531.
Kazdin, A. E. (1977). Assessing the clinical or applied significance of behavior change through social validation. Behavior Modification, 1, 427-453.
Kazdin, A. E. (1978). History of behavior modification: Experimental foundations of contemporary research. Baltimore: University Park Press.
Kazdin, A. E. (1979). Unobtrusive measures in behavioral assessment. Journal of Applied Behavior Analysis, 12, 713-724.
Kazdin, A. E. (1980a). Obstacles in using randomization tests in single-case experimentation. Journal of Educational Statistics, 5, 253-260.
Kazdin, A. E. (1980b). Research design in clinical psychology. New York: Harper & Row.
Kazdin, A. E. (1981). Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology, 49, 183-192.
Kazdin, A. E. (1982a). Observer effects: Reactivity of direct observation. In D. P. Hartmann (Ed.), Using observers to study behavior: New directions for methodology of social and behavioral science (pp. 5-19). San Francisco: Jossey-Bass.
Kazdin, A. E. (1982b). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.
Kazdin, A. E. (1982c). Symptom substitution, generalization, and response covariation: Implications for psychotherapy outcome. Psychological Bulletin, 91, 349-365.
Kazdin, A. E. (in press). Behavior modification in applied settings (3rd ed.). Homewood, IL: Dorsey Press.
Kazdin, A. E., & Bootzin, R. R. (1972). The token economy: An evaluative review. Journal of Applied Behavior Analysis, 5, 343-372.
Kazdin, A. E., & Geesey, S. (1977). Simultaneous-treatment design comparisons of the effects of earning reinforcers for one's peers versus for oneself. Behavior Therapy, 8, 682-693.
Kazdin, A. E., & Hartmann, D. P. (1978). The simultaneous-treatment design. Behavior Therapy, 9, 912-923.
Kazdin, A. E., & Kopel, S. A. (1975). On resolving ambiguities of the multiple-baseline design: Problems and recommendations. Behavior Therapy, 6, 601-608.
Kelly, D. (1980). Anxiety and emotions: Physiological basis and treatment. Springfield, IL: Charles C Thomas.
Kelly, J. A. (1980). The simultaneous replication design: The use of a multiple baseline to establish experimental control in single group social skills treatment studies. Journal of Behavior Therapy and Experimental Psychiatry, 11, 203-207.
Kelly, J. A., Laughlin, C., Claiborne, M., & Patterson, J. T. (1979). A group procedure for teaching job interviewing skills to formerly hospitalized psychiatric patients. Behavior Therapy, 10, 299-310.
Kelly, J. A., Urey, J. R., & Patterson, J. T. (1980). Improving heterosocial conversational skills of male psychiatric patients through a small group training procedure. Behavior Therapy, 11, 179-188.
Kelly, M. B. (1977). A review of observational data-collection and reliability procedures reported in the Journal of Applied Behavior Analysis. Journal of Applied Behavior Analysis, 10, 97-101.
Kendall, P. C., & Butcher, J. N. (1982). Handbook of research methods in clinical psychology. New York: Wiley.
Kennedy, R. E. (1976). The feasibility of time-series analysis of single-case experiments. Unpublished manuscript.
Kent, R. N., & Foster, S. L. (1977). Direct observational procedures: Methodological issues in naturalistic settings. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of
Lacey, J. I. (1959). Psychophysiological approaches to the evaluation of psychotherapeutic process and outcome. In E. A. Rubinstein & M. B. Parloff (Eds.), Research in psychotherapy (pp. 160-208). Washington, DC: National Publishing Co.
Lang, P. J. (1968). Fear reduction and fear behavior: Problems in treating a construct. In J. M. Shlien (Ed.), Research in psychotherapy (Vol. 3, pp. 90-102). Washington, DC: American Psychological Association.
Last, C. G., Barlow, D. H., & O'Brien, G. T. (1983). Comparison of two cognitive strategies in treatment of a patient with generalized anxiety disorder. Psychological Reports, 53, 19-26.
Laws, D. R., Brown, R. A., Epstein, J., & Hocking, N. (1971). Reduction of inappropriate social behavior in disturbed children by an untrained paraprofessional therapist. Behavior Therapy, 2, 519-533.
Lawson, D. M. (1983). Alcoholism. In M. Hersen (Ed.), Outpatient behavior therapy: A clinical guide (pp. 143-172). New York: Grune & Stratton.
Lazarus, A. A. (1963). The results of behavior therapy in 126 cases of severe neurosis. Behaviour Research and Therapy, 1, 69-80.
Lazarus, A. A. (1973). Multi-modal behavior therapy: Treating the BASIC ID. Journal of Nervous and Mental Disease, 156, 404-411.
Lazarus, A. A., & Davison, G. C. (1971). Clinical innovation in research and practice. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change: An empirical analysis (pp. 196-213). New York: Wiley.
Leitenberg, H. (August, 1973). Interaction designs. Paper read at American Psychological Association, Montreal.
Leitenberg, H. (1973). The use of single-case methodology in psychotherapy research. Journal of Abnormal Psychology, 82, 87-101.
Leitenberg, H. (1976). Behavioral approaches to treatment of neuroses. In H. Leitenberg (Ed.), Handbook of behavior modification and behavior therapy (pp. 124-167). Englewood Cliffs, NJ: Prentice-Hall.
Leitenberg, H., Agras, W. S., Edwards, J. A., Thomson, L. E., & Wincze, J. P. (1970). Practice as a psychotherapeutic variable: An experimental analysis within single cases. Journal of Psychiatric Research, 7, 215-225.
Leitenberg, H., Agras, W. S., Thomson, L. E., & Wright, D. E. (1968). Feedback in behavior modification: An experimental analysis of two phobic cases. Journal of Applied Behavior Analysis, 1, 131-137.
Leonard, S. R., & Hayes, S. C. (1983). Sexual fantasy alternation. Journal of Behavior Therapy and Experimental Psychiatry, 14, 241-249.
Levin, J. R., Marascuilo, L. A., & Hubert, L. J. (1978). N = Nonparametric randomization tests. In T. R. Kratochwill (Ed.), Single-subject research: Strategies for evaluating change (pp. 167-197). New York: Academic Press.
Levy, R. L., & Olson, D. G. (1979). The single-subject methodology in clinical practice: An overview. Journal of Social Service Research, 3, 25-49.
Lewin, K. (1933). Vectors, cognitive processes and Mr. Tolman's criticism. Journal of General Psychology, 8, 318-345.
Lewinsohn, P. M., & Libet, J. (1972). Pleasurable events, activity schedules, and depression. Journal of Abnormal Psychology, 79, 291-295.
Lewinsohn, P. M., Mischel, W., Chaplin, W., & Barton, R. (1980). Social competence and depression: The roles of illusory self-perceptions. Journal of Abnormal Psychology, 89, 203-212.
Liberman, R. P., Davis, J., Moon, W., & Moore, J. (1973). Research design for analyzing drug-environment-behavior interactions. Journal of Nervous and Mental Disease, 156, 432-439.
Liberman, R. P., Nuechterlein, K. H., & Wallace, C. J. (1982). Social skills training in the nature of schizophrenia. In J. P. Curran & P. M. Monti (Eds.), Social skills training (pp. 1-56). New York: Guilford Press.
Liberman, R. P., & Smith, V. (1972). A multiple baseline study of systematic desensitization in a patient with multiple phobias. Behavior Therapy, 3, 597-603.
Liberman, R. P., Wheeler, E. G., DeVisser, L. A., Kuehnel, J., & Kuehnel, T. (1980). Handbook of marital therapy. New York: Plenum.
Lick, J. R., Sushinsky, L. W., & Malow, R. (1977). Specificity of Fear Survey Schedule items and the prediction of avoidance behavior. Behavior Modification, 1, 195-204.
Light, R. J. (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76, 365-377.
Lindsley, O. R. (1962). Operant conditioning techniques in the measurement of psychopharmacological response. In J. H. Nodine & J. H. Moyer (Eds.), Psychosomatic medicine: The first Hahnemann symposium on psychosomatic medicine (pp. 373-383). Philadelphia: Lea & Febiger.
Linehan, M. M. (1980). Content validity: Its relevance to behavioral assessment. Behavioral Assessment, 2, 147-159.
Lovaas, O. I., Berberich, J. P., Perloff, B. F., & Schaeffer, B. (1966). Acquisition of imitative speech by schizophrenic children. Science, 151, 705-707.
Lovaas, O. I., Freitas, L., Nelson, K., & Whalen, C. (1967). The establishment of imitation and its use for the development of complex behavior in schizophrenic children. Behaviour Research and Therapy, 5, 171-181.
Lovaas, O. I., Koegel, R., Simmons, J. Q., & Long, J. D. (1973). Some generalization and follow-up measures on autistic children in behavior therapy. Journal of Applied Behavior Analysis, 6, 131-166.
Lovaas, O. I., Schaeffer, B., & Simmons, J. Q. (1965). Experimental studies in childhood schizophrenia: Building social behaviors using electric shock. Journal of Experimental Research in Personality, 1, 99-109.
Lovaas, O. I., & Simmons, J. Q. (1969). Manipulation of self-destruction in three retarded children. Journal of Applied Behavior Analysis, 2, 143-157.
Luborsky, L. (1959). Psychotherapy. In P. R. Farnsworth & Q. McNemar (Eds.), Annual review of psychology (pp. 317-344). Palo Alto, CA: Annual Review.
Lyman, R. D., Richard, H. C., & Elder, I. R. (1975). Contingency management of self-report and cleaning behavior. Journal of Abnormal Child Psychology, 3, 155-162.
Madsen, C. H., Becker, W. C., & Thomas, D. R. (1968). Rules, praise, and ignoring: Elements of elementary classroom control. Journal of Applied Behavior Analysis, 1, 139-150.
Malan, D. H. (1973). Therapeutic factors in analytically oriented brief psychotherapy. In R. H. Gosling (Ed.), Support, innovation and autonomy (pp. 187-205). London: Tavistock.
Malone, J. C., Jr. (1976). Local contrast and Pavlovian induction. Journal of the Experimental Analysis of Behavior, 26, 425-440.
Mandell, R. M., & Mandell, M. P. (1967). Suicide and the menstrual cycle. Journal of the American Medical Association, 200, 792-793.
Mann, R. A. (1972). The behavior-therapeutic use of contingency contracting to control an adult behavior problem: Weight control. Journal of Applied Behavior Analysis, 5, 99-109.
Mann, R. A., & Baer, D. M. (1971). The effects of receptive language training on articulation. Journal of Applied Behavior Analysis, 4, 291-298.
Mann, R. A., & Moss, G. R. (1973). The therapeutic use of a token economy to manage a young and assaultive inpatient population. Journal of Nervous and Mental Disease, 157, 1-9.
Mansell, J. (1982). Repeated direct replication of AB designs (Letter to the Editor). Journal of Behavior Therapy and Experimental Psychiatry, 13, 261-262.
Marks, I. M. (1972). Flooding (implosion) and allied treatments. In W. S. Agras (Ed.), Behavior modification: Principles and clinical applications (pp. 151-213). Boston: Little, Brown.
Marks, I. M. (1981). New developments in psychological treatments of phobias. In M. R. Mavissakalian & D. H. Barlow (Eds.), Phobia: Psychological and pharmacological treatment (pp. 175-199). New York: Guilford Press.
Marks, I. M., & Gelder, M. G. (1967). Transvestism and fetishism: Clinical and psychological changes during faradic aversion. British Journal of Psychiatry, 113, 711-729.
Martin, G., Pallotta-Cornick, A., Johnstone, G., & Celso-Goyos, A. (1980). A supervisory strategy to improve work performance for lower functioning retarded clients in a sheltered workshop. Journal of Applied Behavior Analysis, 13, 185-190.
Martin, P. J., & Lindsey, C. J. (1976). Irregular discharge as an unobtrusive measure of . . . something: Some additional thoughts. Psychological Reports, 38, 627-630.
Mash, E. J., & Makohoniuk, G. (1975). The effects of prior information and behavioral predictability on observer accuracy. Child Development, 46, 513-519.
Mash, E. J., & Terdal, L. G. (Eds.). (1981). Behavioral assessment of childhood disorders. New York: Guilford Press.
Matson, J. L. (1981). Assessment and treatment of clinical fears in mentally retarded children. Journal of Applied Behavior Analysis, 14, 287-294.
Matson, J. L. (1982). The treatment of behavioral characteristics of depression in the mentally retarded. Behavior Therapy, 13, 209-218.
Mavissakalian, M. R., & Barlow, D. H. (1981a). Assessment of obsessive-compulsive disorders. In D. H. Barlow (Ed.), Behavioral assessment of adult disorders (pp. 209-239). New York: Guilford Press.
Mavissakalian, M. R., & Barlow, D. H. (1981b). Phobia: An overview. In M. R. Mavissakalian & D. H. Barlow (Eds.), Phobia: Psychological and pharmacological treatment (pp. 1-35). New York: Guilford Press.
Mavissakalian, M. R., & Barlow, D. H. (Eds.). (1981c). Phobia: Psychological and pharmacological treatment. New York: Guilford Press.
Max, L. W. (1935). Breaking up a homosexual fixation by the conditioned reaction technique: A case study. Psychological Bulletin, 32, 734.
May, P. R. A. (1973). Research in psychotherapy and psychoanalysis. International Journal of Psychiatry, 1, 78-86.
McCallister, L. W., Stachowiak, J. G., Baer, D. M., & Conderman, L. (1969). The application of operant conditioning techniques in a secondary school classroom. Journal of Applied Behavior Analysis, 2, 277-285.
McCleary, R., & Hay, R. A., Jr. (1980). Applied time series analysis for the social sciences. Beverly Hills: Sage.
McCullough, J. P., Cornell, J. E., McDaniel, M. H., & Meuller, R. K. (1974). Utilization of the simultaneous treatment design to improve student behavior in a first-grade classroom. Journal of Consulting and Clinical Psychology, 42, 288-292.
McFall, R. M. (1970). Effects of self-monitoring on normal smoking behavior. Journal of Consulting and Clinical Psychology, 35, 135-142.
McFall, R. M. (1977). Analogue methods in behavioral assessment: Issues and prospects. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 152-177). New York: Brunner/Mazel.
McFall, R. M., & Lillesand, D. B. (1971). Behavior rehearsal with modeling and coaching in assertion training. Journal of Abnormal Psychology, 77, 313-323.
McFarlain, R. A., & Hersen, M. (1974). Continuous measurement of activity level in psychiatric patients. Journal of Clinical Psychology, 30, 37-39.
McKnight, D. L., Nelson, R. O., Hayes, S. C., & Jarrett, R. B. (1983). Importance of treating individually assessed response classes in the amelioration of depression. Behavior Therapy.
McLaughlin, T. F., & Malaby, J. (1972). Intrinsic reinforcers in a classroom token economy. Journal of Applied Behavior Analysis, 5, 263-270.
McLean, A. P., & White, K. G. (1981). Undermatching and contrast within components of multiple schedules. Journal of the Experimental Analysis of Behavior, 35, 283-291.
McMahon, R. J., & Forehand, R. L. (1983). Consumer satisfaction in behavioral treatment of children: Types, issues, and recommendations. Behavior Therapy, 14, 209-225.
McNamara, J. R. (1972). The use of self-monitoring techniques to treat nailbiting. Behaviour Research and Therapy, 10, 193-194.
McNamara, J. R., & MacDonough, T. S. (1972). Some methodological considerations in the design and implementation of behavior therapy research. Behavior Therapy, 3, 361-378.
Melin, L., & Gotestam, K. G. (1981). The effects of rearranging ward routines on communication and eating behaviors of psychogeriatric patients. Journal of Applied Behavior Analysis, 14, 47-51.
Metcalfe, M. (1956). Demonstration of a psychosomatic relationship. British Journal of Medical Psychology, 29, 63-66.
Michael, J. (1974). Statistical inference for individual organism research: Mixed blessing or curse? Journal of Applied Behavior Analysis, 7, 647-653.
Miller, P. M. (1973). An experimental analysis of retention control training in the treatment of nocturnal enuresis in two institutionalized adolescents. Behavior Therapy, 4, 288-294.
Miller, P. M., Hersen, M., Eisler, R. M., & Watts, J. G. (1974). Contingent reinforcement of lowered blood/alcohol levels in an outpatient chronic alcoholic. Behaviour Research and Therapy, 12, 261-263.
Mills, H. L., Agras, W. S., Barlow, D. H., & Mills, J. R. (1973). Compulsive rituals treated by response prevention: An experimental analysis. Archives of General Psychiatry, 28, 524-529.
Minkin, N., Braukmann, C. J., Minkin, B. L., Timbers, G. D., Timbers, B. J., Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1976). The social validation and training of conversational skills. Journal of Applied Behavior Analysis, 9, 127-139.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mitchell, S. K. (1979). Interobserver agreement, reliability, and generalizability of data collected in observational studies. Psychological Bulletin, 86, 376-390.
Montague, J. D., & Coles, E. M. (1966). Mechanism and measurement of the galvanic skin response. Psychological Bulletin, 65, 261-279.
Monti, P. M., Corriveau, E. P., & Curran, J. P. (1982). Social skills training for psychiatric patients: Treatment and outcome. In J. P. Curran & P. M. Monti (Eds.), Social skills training (pp. 185-223). New York: Guilford Press.
Moses, L. E. (1952). Nonparametric statistics for psychological research. Psychological Bulletin, 49, 122-143.
Munford, P. R., & Liberman, R. P. (1978). Differential attention in the treatment of operant cough. Journal of Behavioral Medicine, 1, 280-289.
Nathan, P. E., Titler, N. A., Lowenstein, L. M., Solomon, P., & Rossi, A. M. (1970). Behavioral analysis of chronic alcoholism. Archives of General Psychiatry, 22, 419-430.
National Institute of Mental Health. (1980). Behavior therapies in the treatment of anxiety disorders: Recommendations for strategies in treatment assessment research (Final report of NIMH conference HRFP NIMH ER-79-003). Unpublished manuscript.
Nay, W. R. (1977). Analogue measures. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 233-279). New York: Wiley.
Nay, W. R. (1979). Multimethod clinical assessment. New York: Gardner Press.
Neale, J. M., & Oltmanns, T. (1980). Schizophrenia. New York: Wiley.
Neef, N. A., Iwata, B. A., & Page, T. J. (1980). The effects of interspersal training versus high-density reinforcement on spelling acquisition and retention. Journal of Applied Behavior Analysis, 13, 153-158.
Nelson, R. O. (1977). Methodological issues in assessment via self-monitoring. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 217-254). New York: Brunner/Mazel.
Nelson, R. O., & Hayes, S. C. (1979). Some current dimensions of behavioral assessment. Behavioral Assessment, 1, 1-16.
Nelson, R. O., & Hayes, S. C. (1981). Nature of behavioral assessment. In M. Hersen & A. S. Bellack (Eds.), Behavioral assessment: A practical handbook (2nd ed., pp. 3-37). Elmsford,
Reynolds, G. S. (1968). A primer of operant conditioning. Glenview, IL: Scott, Foresman.
Reynolds, N. J., & Risley, T. R. (1968). The role of social and material reinforcers in increasing talking of a disadvantaged preschool child. Journal of Applied Behavior Analysis, 1, 253-262.
Rickard, H. C., Dignam, P. J., & Horner, R. F. (1960). Verbal manipulation in a psychotherapeutic relationship. Journal of Clinical Psychology, 16, 164-167.
Rickard, H. C., & Dinoff, M. (1962). A follow-up note on "Verbal manipulation in a psychotherapeutic relationship." Psychological Reports, 11, 506.
Rickard, H. C., & Saunders, T. R. (1971). Control of "clean-up" behavior in a summer camp. Behavior Therapy, 2, 340-344.
Risley, T. R. (1968). The effects and side-effects of punishing the autistic behaviors of a deviant child. Journal of Applied Behavior Analysis, 1, 21-34.
Risley, T. R. (1970). Behavior modification: An experimental-therapeutic endeavor. In L. A. Hamerlynck, P. O. Davidson, & L. E. Acker (Eds.), Behavior modification and ideal health services (pp. 103-127). Calgary, Alberta, Canada: University of Calgary Press.
Risley, T. R., & Wolf, M. M. (1972). Strategies for analyzing behavioral change over time. In J. Nesselroade & H. Reese (Eds.), Life-span developmental psychology: Methodological issues (pp. 175-183). New York: Academic Press.
Roberts, M. W., Hatzenbuehler, L. C., & Bean, A. W. (1981). The effects of differential attention and timeout on child noncompliance. Behavior Therapy, 12, 93-99.
Rogers, C. R., Gendlin, E. T., Kiesler, D. J., & Truax, C. B. (1967). The therapeutic relationship and its impact: A study of psychotherapy with schizophrenics. Madison: University of Wisconsin Press.
Rogers-Warren, A., & Warren, S. F. (1977). Ecological perspectives in behavior analysis. Baltimore: University Park Press.
Rojahn, J., Mulick, J. A., McCoy, D., & Schroeder, S. R. (1978). Setting effects, adaptive clothing, and the modification of head-banging and self-restraint in two profoundly retarded adults. Behavioural Analysis and Modification, 2, 185-196.
Rosen, J. C., & Leitenberg, H. (1982). Bulimia nervosa: Treatment with exposure and response prevention. Behavior Therapy, 13, 117-124.
Rosenblum, L. A. (1978). The creation of a behavioral taxonomy. In G. P. Sackett (Ed.), Observing behavior: Vol. 2. Data collection and analysis methods (pp. 15-24). Baltimore: University Park Press.
Rosenthal, R. (1976). Experimenter effects in behavioral research (enlarged ed.). New York: Irvington.
Rosenzweig, S. (1951). Idiodynamics in personality therapy with special reference to projective methods. Psychological Review, 58, 213-223.
Ross, A. O. (1981). Child behavior therapy: Principles, procedures, and empirical basis. New York: Wiley.
Roxburgh, P. A. (1970). Treatment of persistent phenothiazine-induced oral dyskinesia. British Journal of Psychiatry, 116, 277-280.
Rubenstein, E. A., & Parloff, M. B. (1959). Research problems in psychotherapy. In E. A. Rubenstein & M. B. Parloff (Eds.), Research in psychotherapy (Vol. 1, pp. 276-293). Washington, DC: American Psychological Association.
Rugh, J. E., & Schwitzgebel, R. L. (1977). Instrumentation for behavioral assessment. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 79-113). New York: Wiley.
Rusch, F. R., & Kazdin, A. E. (1981). Toward a methodology of withdrawal designs for the assessment of response maintenance. Journal of Applied Behavior Analysis, 14, 131-140.
Rusch, F. R., Walker, H. M., & Greenwood, C. R. (1975). Experimenter calculation errors: A potential factor affecting interpretation of results. Journal of Applied Behavior Analysis, 8, 460.
Russell, M. B., & Bernal, M. E. (1977). Temporal and climatic variables in naturalistic observa-
Sackett, G. P. (1978). Measurement in observational research. In G. P. Sackett (Ed.), Observing behavior: Vol. 2. Data collection and analysis methods (pp. 25-43). Baltimore: University Park Press.
St. Lawrence, J. S., Bradlyn, A. S., & Kelly, J. A. (1983). Interpersonal adjustment of a homosexual adult: Enhancement via social skills training. Behavior Modification, 7, 41-55.
Sajwaj, T. E., & Dillon, A. (1977). Complexities of an "elementary" behavior modification procedure: Differential adult attention used for children's behavior disorders. In B. C. Etzel, J. M. LeBlanc, & D. M. Baer (Eds.), New developments in behavioral research: Theory, methods, and application (pp. 303-315). Hillsdale, NJ: Erlbaum.
Sajwaj, T. E., & Hedges, D. (1971). Functions of parental attention in an oppositional retarded boy. In Proceedings of the 79th Annual Convention of the American Psychological Association (pp. 697-698). Washington, DC: American Psychological Association.
Sajwaj, T. E., Twardosz, S., & Burke, M. (1972). Side effects of extinction procedures in a remedial preschool. Journal of Applied Behavior Analysis, 5, 163-175.
Sanson-Fisher, R. W., Poole, A. D., Small, G. A., & Fleming, I. R. (1979). Data acquisition in real time: An improved system for naturalistic observations. Behavior Therapy, 10, 543-554.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Schindele, R. (1981). Methodological problems in rehabilitation research. International Journal of Rehabilitation Research, 4, 233-248.
Schleien, S. J., Wehman, P., & Kiernan, J. (1981). Teaching leisure skills to severely handicapped adults: An age-appropriate darts game. Journal of Applied Behavior Analysis, 14, 513-519.
Schreibman, L., Koegel, R. L., Mills, D. L., & Burke, J. C. (in press). Training parent-child interactions. In E. Schopler & G. Mesibov (Eds.), Issues in autism: Vol. III. The effects of autism on the family. New York: Plenum.
Schumaker, J., & Sherman, J. A. (1970). Training generative verb usage by imitation and reinforcement procedure. Journal of Applied Behavior Analysis, 3, 273-287.
Schutte, R. C., & Hopkins, B. L. (1970). The effects of teacher attention following instructions in a kindergarten class. Journal of Applied Behavior Analysis, 3, 117-122.
Sechrest, L. (Ed.). (1979). Unobtrusive measurement today: New directions for methodology of behavioral science. San Francisco: Jossey-Bass.
Shapiro, D. A., & Shapiro, D. (1983). Comparative therapy outcome research: Methodological implications of meta-analysis. Journal of Consulting and Clinical Psychology, 51, 42-53.
Shapiro, E. S., Barrett, R. P., & Ollendick, T. H. (1980). A comparison of physical restraint and positive practice overcorrection in treating stereotypic behavior. Behavior Therapy, 11, 227-233.
Shapiro, E. S., Kazdin, A. E., & McGonigle, J. J. (1982). Multiple-treatment interference in the simultaneous- or alternating-treatments design. Behavioral Assessment, 4, 105-115.
Shapiro, M. B. (1961). The single case in fundamental clinical psychological research. British Journal of Medical Psychology, 34, 255-263.
Shapiro, M. B. (1966). The single case in clinical-psychological research. Journal of General Psychology, 74, 3-23.
Shapiro, M. B. (1970). Intensive assessment of the single case: An inductive-deductive approach. In P. Mittler (Ed.), Psychological assessment of mental and physical handicaps. London: Methuen.
Shapiro, M. B., & Ravenette, A. T. (1959). A preliminary experiment on paranoid delusions. Journal of Mental Science, 105, 295-312.
Shine, L. C., & Bower, S. M. (1971). A one-way analysis of variance for single-subject designs. Educational and Psychological Measurement, 31, 105-113.
The
W eick, K. E . (1968). S ystem atic o b serv a tio n a l m eth o d s. In G . L indzey & E . A ro n so n (E d s.).,
handbook o f social psychology, (V ol. 2, 2nd e d .). (pp. 3 5 7 -4 5 1 ). M en lo P ark , C A : A d d iso n -
Wesley.
W einrott, M . R ., G arrett, B ., & T odd, N . (1978). T h e influence o f o bserver presence o n classroom
behavior. Behavior Therapy; 9, 9 0 0 -9 1 1 .
W einrott, M . R ., J o n es, R . R ., & Boler, G . R . (1981). C on vergen t and discrim inant valid ity o f
five cla ssro o m o b serv a tio n system s: A se co n d a ry an alysis. Journal o f Educational Psychology;
75, 6 7 1 -6 7 9 .
W ells, K. C ., H ersen, M ., B ellack , A . S ., & H im m elh o ck , J. M ., (1979). Social skills training in
unipolar n o n p sy ch o tic d ep ressio n . American Journal o f Psychiatry, 756, 1331-1 3 3 2 .
Werner, J. S ., M in k in , N ., M in k in , B . L ., F ixsen , D . L ., P h illip s, E . L ., & W olf, M . M . (1975).
“ In tervention p a ck a g e” : A n an alysis to prepare ju v en ile delin q uents for en cou n ters w ith p olice
officers. Criminal Justice and Behavior; 2 , 5 5 -8 3 .
W hang, P. L ., Fletcher, R . K ., & F aw cett, S. B . (1982). Training cou n selin g skills: A n exp erim en
tal an alysis and so cia l v a lid a tio n . Journal o f Applied Behavior Analysis, 75, 3 2 5 -3 3 4 .
W heeler, A . J ., & Sulzer, B . (1970). O perant training and gen eralization o f a verbal response form
Journal o f Applied Behavior Analysis, 5 , 139 -1 4 7 .
in a speech-deficient ch ild .
White, O. R. (1971). A glossary of behavioral terminology. Champaign, IL: Research Press.
White, O. R. (1972). A manual for the calculation and use of the median slope: A technique of progress estimation and prediction in the single case. Eugene, OR: University of Oregon, Regional Resource Center for Handicapped Children.
White, O. R. (1974). The “split middle”: A “quickie” method of trend estimation. Seattle, WA: University of Washington, Experimental Education Unit, Child Development and Mental Retardation Center.
Wildman, B. G., & Erickson, M. T. (1977). Methodological problems in behavioral observation. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 255-273). New York: Brunner/Mazel.
Williams, C. D. (1959). Case report: The elimination of tantrum behavior by extinction procedures. Journal of Abnormal and Social Psychology, 59, 269.
Williams, J. G., Barlow, D. H., & Agras, W. S. (1972). Behavioral measurement of severe depression. Archives of General Psychiatry, 27, 330-334.
Wilson, C. W., & Hopkins, B. L. (1973). The effects of contingent music on the intensity of noise in junior high home economics classes. Journal of Applied Behavior Analysis, 6, 269-275.
Wilson, G. T., & Rachman, S. J. (1983). Meta-analysis and the evaluation of psychotherapy outcome: Limitations and liabilities. Journal of Consulting and Clinical Psychology, 51, 54-64.
Wincze, J. P. (1982). Assessment of sexual disorders. Behavioral Assessment, 4, 257-271.
Wincze, J. P., & Lange, J. D. (1981). Assessment of sexual behavior. In D. H. Barlow (Ed.), Behavioral assessment of adult disorders (pp. 301-329). New York: Guilford Press.
Wincze, J. P., Leitenberg, H., & Agras, W. S. (1972). The effects of token reinforcement and feedback on the delusional verbal behavior of chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262.
Winkler, R. C. (1977). What types of sex-role behavior should behavior modifiers promote? Journal of Applied Behavior Analysis, 10, 549-552.
Winett, R. A., & Winkler, R. C. (1972). Current behavior modification in the classroom: Be still, be quiet, be docile. Journal of Applied Behavior Analysis, 5, 499-504.
Wittlieb, E., Eifert, G., Wilson, F. E., & Evans, I. M. (1979). Target behavior selection in recent child case reports in behavior therapy. Behavior Therapist, 7, 15-16.
Wolery, M., & Billingsley, F. F. (1982). The application of Revusky’s Rn test to slope and level changes. Behavioral Assessment, 4, 93-103.
Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-215.
Wolf, M. M., Birnbrauer, J. S., Williams, T., & Lawler, J. (1965). A note on apparent extinction of the vomiting behavior of a retarded child. In L. P. Ullmann & L. Krasner (Eds.), Case studies in behavior modification (pp. 364-366). New York: Holt, Rinehart and Winston.
Wolf, M. M., & Risley, T. R. (1971). Reinforcement: Applied research. In R. Glaser (Ed.), The nature of reinforcement (pp. 310-325). New York: Academic Press.
Wolfe, J. L., & Fodor, I. G. (1977). Modifying assertive behavior in women: A comparison of three approaches. Behavior Therapy, 8, 567-574.
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford: Stanford University Press.
Wolpe, J. (1976). Theme and variations: A behavior therapy casebook. Elmsford, New York: Pergamon Press.
Wolstein, B. (1954). Transference: Its meaning and function in psychoanalytic therapy. New York: Grune & Stratton.
Wong, S. E., Gaydos, G. R., & Fuqua, R. W. (1982). Operant control of pedophilia. Behavior Modification, 6, 73-84.
Wood, D. D., Callahan, E. J., Alevizos, P. N., & Teigen, J. R. (1979). Inpatient behavioral assessment with a problem-oriented psychiatric logbook. Journal of Behavior Therapy and Experimental Psychiatry, 10, 229-235.
Wood, L. F., & Jacobson, N. S. (in press). Marital disorders. In D. H. Barlow (Ed.), Behavioral treatment of adult disorders. New York: Guilford Press.
Wright, H. F. (1960). Observational child study. In P. Mussen (Ed.), Handbook of research methods in child development (pp. 71-139). New York: Wiley.
Wright, J., Clayton, J., & Edgar, C. L. (1970). Behavior modification with low-level mental retardates. Psychological Record, 20, 465-471.
Yarrow, M. R., & Waxler, C. Z. (1979). Dimensions and correlates of prosocial behavior in young children. Child Development, 47, 118-125.
Yates, A. J. (1970). Behavior therapy. New York: Wiley.
Yates, A. J. (1975). Theory and practice in behavior therapy. New York: Wiley.
Yawkey, T. D. (1971). Conditioning independent work behavior in reading with seven-year-old children in a regular early childhood classroom. Child Study Journal, 2, 23-34.
Yelton, A. R., Wildman, B. G., & Erickson, M. T. (1977). A probability-based formula for calculating interobserver agreement. Journal of Applied Behavior Analysis, 10, 127-131.
Zeilberger, J., Sampen, S. E., & Sloane, H. N. (1968). Modification of a child’s problem behaviors in the home with the mother as therapist. Journal of Applied Behavior Analysis, 1, 47-53.
Zilbergeld, B., & Evans, M. B. (1980). The inadequacy of Masters and Johnson. Psychology Today, 14, 28-43.
Zimmerman, E. H., & Zimmerman, J. (1962). The alteration of behavior in a special classroom situation. Journal of the Experimental Analysis of Behavior, 5, 59-60.
Zimmerman, J., Overpeck, C., Eisenberg, H., & Garlick, B. (1969). Operant conditioning in a sheltered workshop. Rehabilitation Literature, 30, 326-334.
Subject Index
Actuarial issues, 62-63
Agoraphobic disorder, 55, 59, 326, 329-330, 366
Alcoholism, 145, 165, 170-171
Alternating Treatments Design, 65, 69, 95, 99, 210, 211, 252-283, 302, 319, 338, 344
Analysis of variance, 7, 56, 59, 60, 193, 287-290, 294
Anorexia Nervosa, 45-46, 69, 82, 197-201, 343
Anxiety, 34, 87, 136, 145, 241, 273
Assessment, 107-139
  direct, 108
  See also Repeated measurement
Autism, 226-228, 232-233, 292, 354-355, 362, 366, 368-369
Autocorrelation, 288, 293, 294, 295, 296, 299, 301, 302
Averaging of results, 14-15, 16, 23, 54, 55, 60, 61, 66, 226

Baseline, 39-45, 71-79
Behavioral observation measures, 109, 110, 131, 146, 182
  behavioral products, 131-132
  codes, 125, 126, 130
  observers, 113, 115, 117, 118-122, 124-129, 130, 132, 282
  procedures, 113, 115, 116-118, 120, 129, 130
  settings, 109, 110, 112-114
Between Series Design, 254
Bidirectionality, 206
Blocking, 45

Carry-over effects, 96, 99-101
Case study, 1, 8-13, 17, 19, 22, 23, 24-25, 56, 140-142, 351
Celeration line, 313-315, 316, 317
Central tendency, 5
Changing Criterion Design, 175, 205-208, 319
Classification, 26
Clinical significance, 35, 36, 45, 48, 282, 285, 320, 333, 369
Component analysis, 110
Concurrent Schedule Design. See Simultaneous Treatment Design
Confound, 19, 20, 142, 253, 256, 275
Control groups, 14, 56, 59, 60, 61, 143, 226, 269
Correlation, 6, 17, 19, 28, 38, 45, 127
Correlogram, 288
Counterbalancing, 259, 260, 262, 263, 264, 269, 273, 274, 280, 284
Criterion Reference Tests, 109, 110
Critical Ratio Test, 6

Demand effects, 70, 134, 203
Dependent variables, 10, 12, 17, 33, 35, 37, 39, 142, 236, 302
Depression, 15, 34, 35, 36, 54, 57-58, 61, 64, 100, 109, 145, 146, 147, 154, 155, 156, 274, 275, 278, 366
Deterioration, 16, 17, 36, 37, 44, 55, 59, 64, 65, 74, 77, 88, 94, 104, 150, 152, 153, 154, 163, 228, 233, 328, 343
Diagnostic category, 37
Differential attention, 347-362, 365, 366
Direct observation. See Behavioral observation
Drug evaluation, 28, 87-88, 100, 101, 170, 183-192, 209, 249-251, 264

Enuresis, 98, 230-232
Error, 3, 5, 6, 26, 33
Equivalent Time Series Design, 28, 157-166
Ethics, 14, 74, 90, 96, 98, 100, 153, 209, 249
Expectancy effects, 42, 184, 189, 219
Experimental analysis of behavior, 8, 29-31
Experimental criterion, 285, 286
Experimental psychology, 1, 2-5, 6, 14, 30, 35

Factor analysis, 6
Factorial Design. See Analysis of variance
Field testing, 365, 367
Follow-up, 44, 89, 110, 145, 150, 151, 234, 236, 247, 248
Functional manipulation, 260

Generality of findings, 2, 4, 7, 8, 14, 16, 25, 28, 32, 33, 49-66, 84, 112, 113, 127, 130, 150, 153, 154, 162, 204, 205, 211, 216, 226, 232, 239, 241, 247, 252, 257, 260, 272, 325, 325-371
Group comparison. See Group design
Group Comparison Design, 1, 2, 3, 5-8, 11-13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 28, 29, 30, 31, 33, 35, 36, 51-66, 99, 108, 167, 178, 179, 191, 193, 205, 226, 238, 252, 259, 286, 287, 291, 320, 321, 365, 370

Habituation, 138
Headache, 135, 136, 161-162
Homosexuality, 10, 39-42, 70, 86, 103-105, 147, 334-339

Independent variables, 9, 10, 17, 18, 27, 28, 29, 30, 33, 34, 35, 39, 48, 67, 154
Independent verification, 259
Individual differences, 5, 6, 7
Instrumentation, 108
Intensive Design, 28
Interaction effects, 193-205, 249, 272
Intelligence, 5
  tests, 6
Intrasubject averaging, 45-48
Introspection, 3-4
Irreversible procedures, 101-105

Law of Initial Values, 138
Learning Theory, 4, 6, 30, 31
  classical conditioning, 39, 40, 90
  operant conditioning, 8, 30, 99
Logical generalization, 253, 333, 369

Maintenance, 68, 105-106, 144, 230, 236, 239, 248, 250
Matching, 15, 54, 68, 213, 214
Merit Method, 6
Mixed Schedule Design, 255
Multi-Element Baseline Design, 254, 255, 299, 319
Multi-Element Experimental Designs, 30
Multiple Baseline Design, 9, 64, 66, 88, 95, 101, 102, 106, 164, 209-251, 275, 281, 308, 309, 311, 321, 333
  across behaviors, 215-230, 247, 344
  across individuals, 244
  across settings, 238-244, 247, 249
  across subjects, 230-238, 249, 251, 278, 343
Multiple Probe Technique, 245-248
Multiple Schedule Design, 254, 255
Multiple treatment interference, 143, 153, 179, 205, 256-263, 272, 273, 281

Naturalistic studies, 2, 17, 18-20, 21
Nonconcurrent Multiple Baseline Design, 244, 248
Norm Reference Tests. See Criterion reference tests
Normal distribution, 3, 5, 305

Obsessive Compulsive Disorder, 15, 16
Operational definition, 111

Paranoid delusions, 26
Patient Uniformity Myth, 16
Percent of success, 12, 17, 19, 56
Period treatment design, 175, 206
Phase, 26, 67, 72, 93, 95-101, 154, 162, 165, 280, 286, 292, 295, 299, 301, 302, 316, 319
Phobia, 53, 82, 195-197, 201, 216-219, 273, 284, 333, 343, 346, 347
Physiological measures, 108, 131, 135-138, 150
Physiological psychology, 1, 2-5, 8, 23
Placebo effects, 39, 60-61, 75, 78, 87, 101, 104, 105, 141, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 209, 249, 251, 255, 330, 331, 333, 335
Population, 8, 16, 305
Post Traumatic Stress Disorder, 241
Probe measures, 241
Process research, 2, 17, 20-21, 23, 25, 26, 27, 38

Quasi-Experimental Designs, 27-28, 71, 142, 143, 186, 206, 249
Questionnaires, 29

Random assignment, 15, 18, 19, 287
Random sampling, 52-54, 55, 65, 305
Randomization Design, 254, 255
Randomization Tests, 302-308, 319, 320
Reactivity, 118, 120, 130, 135, 143, 245, 247, 282
Regression techniques, 110
Reliability, 68, 109, 114, 118, 122, 124-129, 134, 158, 239, 286, 290, 293, 308, 322, 325, 326, 327, 333, 338, 341, 346, 364
Repeated measurement, 3, 4, 20, 21, 26, 27, 30, 32, 37-38, 39, 41, 42, 43, 44, 48, 64, 65, 67, 68-71, 72, 108, 110, 142, 179, 245, 287
Replication, 5, 11, 25, 26, 33, 51, 56-62, 111, 143, 153, 154, 156, 162, 165, 179, 193, 196, 200, 204, 205, 212, 225, 226, 232, 241, 244, 253, 260, 264, 286, 325-371
  clinical, 325, 366-369
  direct, 50, 58, 61, 325, 326-347, 351, 364, 365
  systematic, 56, 59, 61, 62, 63, 101, 325, 334, 339, 343, 344, 346, 347-354, 363-366
Representative case, 25-26
Response dimensions, 114-116
Response guided experimentation, 38
Response specificity, 138
Reversal design, 9, 30, 67, 88-95, 101, 209, 210
Rn Test of Ranks, 308-312, 320

Sample, 8, 15, 16, 107
Sampling theory, 1, 8
Schizophrenic Disorder, 15, 52, 80, 87, 91, 167-168, 187, 205, 339-343, 366
Scientist-Practitioner Split, 21-22
Self-report measures, 70, 108, 109, 131, 132-135, 136, 150, 218-219, 284
  behavioral, 133
  questionnaires, 108, 109, 133-134
  self-monitoring, 108, 109, 133, 134-135, 203, 239
  structured interviews, 108, 109
Serial dependency, 287-290, 295, 296, 299, 300, 301, 302, 305-306
Sexual disorders, 86, 194, 220-222, 367
Simultaneous Replication Design, 226, 254
Simultaneous Treatment Design, 255, 282-284, 319
Social psychology, 30
Social validation procedures
  social comparison, 109, 110
  subjective evaluation, 109, 110
Split middle technique, 312-319, 321
Spontaneous remission, 12, 19, 42
Statistical analysis, 3, 5, 6, 22, 28, 34, 36, 126, 128, 129, 255, 257, 281, 282, 321
  descriptive statistics, 3, 6, 22, 319, 321
  inferential statistics, 1, 7-8, 16, 53, 60, 65, 252, 318, 319, 321
  single-case, 285-324
Statistical significance, 35, 36, 48, 58, 294, 302, 303-304, 308, 309, 313, 316, 318, 319-320
Structuralism, 4

Target behaviors, 107, 108, 109-112, 126, 129, 131, 134, 142, 145, 146, 156, 158, 187, 212, 228, 251, 309
Term Series Design, 27
Therapeutic criterion, 285, 286
Time sampling, 70, 222, 224
Time Series Analysis, 71, 142, 288, 296-302, 308, 319, 321, 353
Trend, 37, 38, 45, 73, 77
Trend analysis, 28
Triple response system, 108, 132

Validity, 109, 129-131, 134, 135, 137
  construct, 130
  content, 130
  convergent, 111
Name Index
  25, 33, 35, 36, 41, 51, 54, 55, 61, 63, 74, 366, 370
Berk, R. A., 126, 127
Berler, E. S., 214
Bernal, M. E., 112
Bernard, M. E., 175, 202
Bickman, L., 113
Bijou, S. W., 95, 99, 108, 117, 118, 356, 357
Billingsley, F. F., 308, 318, 323
Birkimer, J. C., 129
Birney, R. C., 6
Bittle, R., 255, 266, 280
Blackburn, B. L., 215
Blanchard, E. B., 71, 136, 263, 352
Blewitt, E., 171, 172
Blough, P. M., 258
Blunden, R., 171, 172
Boer, A. P., 118
Boler, G. R., 131
Bolger, H., 9
Bolstad, O. D., 120, 121, 125, 129, 131, 139
Boone, S. E., 52
Bootzin, R. R., 94, 99
Borakove, L. S., 110
Boring, E. G., 3, 4, 6
Bornstein, M. R., 214, 215, 217, 218, 347
Bornstein, P. H., 108, 113, 133, 135
Bower, S. M., 295
Bowdler, C. M., 25
Box, G. E. P., 300, 301, 306
Boyer, E. G., 139
Boykin, R. A., 139
Bradley, L. A., 136
Bradlyn, A. S., 147, 149
Brady, J. P., 352
Brawley, E. R., 357, 358
Breuer, J., 9
Breuning, S. E., 214, 249, 250, 251
Bridgwater, C. A., 108
Brill, A. A., 10
Brinbauer, J. S., 209, 265, 352, 366
Broden, M., 355, 356, 357, 358
Brody, G. H., 287
Brookshire, R. H., 352
Brouwer, R., 215
Brown, J. H., 129
Brown, R. A., 355, 359
Browning, R. M., 142, 256, 283
Bruce, C., 358
Brunswick, E., 53
Bryan, K. S., 247
Bryant, L. E., 214
Bucher, B., 268
Buckley, N. K., 156, 157, 352
Budd, K. S., 214, 360
Buell, J. S., 89, 90, 354, 356, 357
Bugle, C., 106
Burgio, L. D., 214
Burke, M., 162, 163, 360, 369
Butcher, J. N., 19, 31
Buys, C. J., 359

Cairns, R. B., 363
Calhoun, K. S., 139
Callahan, E. J., 102, 104, 115
Campbell, D. T., 27, 28, 45, 57, 71, 111, 121, 126, 132, 138, 140, 142, 143, 153, 157, 244, 252, 256
Capparell, H. V., 191, 192
Carey, R. G., 268
Carkhuff, R. R., 167, 168, 169
Carlson, C. S., 357
Carmody, T. B., 135
Carr, A., 243
Carter, V., 358
Carver, R. P., 110
Cataldo, M. F., 360
Catania, A. C., 212
Celso-Goyos, A., 267
Chai, H., 320
Chapin, H. N., 198, 199, 201, 275
Chapin, J. P., 45, 46, 47
Christian, W. P., 214, 231, 232
Christie, M. H., 137
Chassan, J. B., 15, 16, 20, 28, 35, 36, 55, 87, 95, 99, 100, 183, 184, 185
Ciminero, A. R., 139
Clairborne, M., 343, 345
Clark, R., 358
Clayton, J., 359
Coates, T. J., 136
Cohen, D. C., 22, 70
Cohen, J., 127
Cohen, S., 293
Colavecchia, B., 268
Coleman, R. A., 175
Coles, E. M., 138
Conderman, L., 358
Cone, J. D., 108, 109, 115, 118, 122,

Herman, S. H., 39, 40, 41, 42, 43, 334, 336, 337, 338
Herrnstein, R. J., 212
Hersen, M., 25, 35, 61, 67, 68, 69, 70, 71, 73, 74, 79, 80, 82, 85, 86, 88, 94, 95, 96, 102, 105, 133, 137, 139, 140, 142, 144, 146, 148, 150, 152, 153, 154, 155, 156, 158, 161, 164, 165, 166, 167, 170, 171, 175, 183, 184, 185, 191, 192, 209, 212, 214, 215, 217, 218, 228, 229, 247, 248, 347, 352, 366
Hickey, J. S., 108
Hilgard, J. R., 213
Himmelhock, J. M., 68, 347
Hinson, J. M., 258
Hoch, P. H., 17, 20
Hocking, N., 355, 357
Hodgson, R. J., 333, 334
Hoffman, A., 320
Hollandsworth, J. G., 120
Hollenbeck, A. R., 121, 127
Hollon, S. D., 72
Holm, R. A., 115
Holmberg, M., 114
Holmes, D. S., 356
Holtzman, W. H., 322
Holz, W., 122
Homer, A. L., 138
Honig, W. K., 38, 212
Hopkins, B. L., 116, 175, 179, 355, 358, 360
Horne, G. P., 299, 302
Horner, R. D., 245, 246, 349
House, A. E., 126
House, B. J., 126
Hubert, L. J., 127, 302
Huitema, B. E., 299
Hundert, J., 214
Hutt, C., 111, 112
Hutt, S. J., 111, 112
Hyman, R., 10, 17

Inglis, J., 29
Iwata, B. A., 267-268

Jackson, D., 355, 357
Jacobson, N. S., 353, 363
Jarrett, R. B., 268, 274, 276, 277
Jayaratne, S., 31
Jenkins, G. M., 301
Johnson, S. M., 110, 120, 121, 125, 129, 131, 139, 266
Johnston, J. M., 31, 37, 72, 90, 94, 95, 96, 100, 111, 128, 132, 175, 182, 291, 347, 354
Johnston, M. K., 356, 357
Johnstone, G., 267
Jones, R. R., 125, 131, 290, 293, 296, 297, 299, 301
Jones, R. T., 214, 233, 234

Kanowitz, J., 121
Katz, R. C., 214, 230
Kaufman, K. F., 175
Kazdin, A. E., 9, 19, 24, 25, 30, 31, 53, 56, 59, 60, 67, 88, 94, 95, 99, 101, 102, 105, 106, 109, 110, 112, 113, 115, 118, 120, 121, 130, 132, 139, 141, 142, 153, 162, 202, 204, 206, 209, 211, 212, 214, 215, 216, 223, 228, 229, 234, 235, 247, 254, 256, 260, 261, 266, 267, 278, 279, 282, 286, 290, 291, 292, 307, 318
Kane, M., 214, 241, 242
Keefauver, L. W., 175, 202
Kelley, C. S., 354, 356
Kelly, J. A., 149, 214, 226, 343, 345
Kelly, M. G., 111, 115, 117, 126, 147
Kendall, P. C., 19, 31, 116
Kennedy, R. E., 215, 301
Kent, R. N., 118, 121
Kernberg, O. F., 18
Kessel, L., 10
Kiernan, J., 110
Kiesler, D. J., 16, 17, 18, 20, 49, 55, 60
Kirby, F. D., 360
Kircher, A. S., 266
Kirchner, R. E., 215, 243
Kirk, R. E., 307
Kistner, J., 215
Klein, R. D., 295
Knapp, T. J., 293
Kneedler, R. D., 267
Koegel, R. L., 106, 214, 215, 226, 227, 368, 369
Kopel, S. A., 209, 211, 212, 216
Kraemer, H. C., 55, 117
Krasner, L., 30, 57, 94, 99, 141
Kratochwill, T. R., 31, 67, 142, 175, 202, 287, 296, 301, 324
Kulp, S., 117

Wincze, J. P., 69, 137, 174, 178, 179, 330, 339, 341, 342, 343, 366
Winett, R. A., 138, 292
Winkel, G. H., 355, 356
Winkler, R. C., 138
Winton, A. S., 268
Wittlieb, E., 109
Wodarski, J. S., 215
Wolery, M., 308, 318, 323
Wolf, M. M., 64, 71, 89, 90, 110, 142, 143, 175, 212, 266, 286, 290, 352, 354, 355, 356, 359
Wolfe, J. L., 134, 215
Wolstein, B., 37
Wonderlich, S. A., 138
Wong, S. E., 215
Wood, D. D., 122
Wood, L. F., 108, 115, 116, 117, 123, 125, 129, 353
Wood, S., 360
Wooton, M., 360
Workman, E. A., 244, 245
Wright, D. E., 80, 81, 195
Wright, H. F., 114, 116
Wright, J., 358
Wysocki, T., 249

Yang, M. C. K., 299
Yarrow, M. R., 123
Yates, A. J., 29
Yawkey, T. D., 359
Yelton, A. R., 129
Yule, W., 215

Zegiob, L. E., 120
Zeilberger, J., 99, 358
Zilbergeld, B., 367
Zimmerman, E. H., 356
Zimmerman, J., 265, 266, 356
Zubin, J., 17, 20
About the Authors
DAVID H. BARLOW received his Ph.D. from the University of Vermont in
1969 and has published over 150 articles and chapters and seven books,
mostly in the areas of anxiety disorders, sexual problems, and clinical
research methodology. He was formerly Professor of Psychiatry at the University
of Mississippi Medical Center and Professor of Psychiatry and Psychology at
Brown University, and founded clinical psychology internships in both
settings. Currently he is Professor in the Department of Psychology at the State
University of New York at Albany and has been a consultant to the National
Institute of Mental Health and the National Institutes of Health since 1973.
He is Past President of the Association for Advancement of Behavior
Therapy, past Associate Editor of the Journal of Consulting and Clinical
Psychology, past Editor of the Journal of Applied Behavior Analysis, and
currently Editor of Behavior Therapy. At present he is also Director of the
Phobia and Anxiety Disorders Clinic and the Sexuality Research Program at
SUNY at Albany. He is a Diplomate in Clinical Psychology of the American
Board of Professional Psychology and maintains a private practice.