What Do We Know about the Effectiveness of Software Design Patterns?

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 38, NO. 5, SEPTEMBER/OCTOBER 2012
Cheng Zhang and David Budgen, Member, IEEE Computer Society
Abstract—Context. Although research in software engineering largely seeks to improve the practices and products of software
development, many practices are based upon codification of expert knowledge, often with little or no underpinning from objective
empirical evidence. Software design patterns seek to codify expert knowledge to share experience about successful design structures.
Objectives. To investigate how extensively the use of software design patterns has been subjected to empirical study and what
evidence is available about how and when their use can provide an effective mechanism for knowledge transfer about design. Method.
We conducted a systematic literature review in the form of a mapping study, searching the literature up to the end of 2009 to identify
relevant primary studies about the use of the 23 patterns catalogued in the widely referenced book by the “Gang of Four.” These
studies were then categorized according to the forms of study employed, the patterns that were studied, as well as the context within
which the study took place. Results. Our searches identified 611 candidate papers. Applying our inclusion/exclusion criteria resulted in
a final set of 10 papers that described 11 instances of “formal” experimental studies of object-oriented design patterns. We augmented
our analysis by including seven “experience” reports that described application of patterns using less rigorous observational forms. We
report and review the profiles of the empirical evidence for those patterns for which multiple studies exist. Conclusions. We could not
identify firm support for any of the claims made for patterns in general, although there was some support for the usefulness of patterns
in providing a framework for maintenance, and some qualitative indication that they do not help novices learn about design. For future
studies we recommend that researchers use case studies that focus upon some key patterns, and seek to identify the impact that their
use can have upon maintenance.
Index Terms—Design patterns, systematic literature review, empirical software engineering
1 INTRODUCTION
Although software engineering is intrinsically an “empirical” subject, this aspect has so far largely manifested itself through the (relatively informal) reuse of “experience” and expert opinion rather than through the use of the outcomes from systematic empirical investigations [37]. Various authors have noted this: Back in 1997, Whitley commented that “a common pattern in software engineering research is the development of system-building techniques, such as object-oriented design, which are strongly advocated in the absence of evidence” [61], while, stemming from their systematic surveys of the literature, Glass et al. have made similar observations about the emphasis upon advocacy and the predominance of analysis as the means for evaluation [23], [24].

The use of software design patterns for designing object-oriented (OO) systems would appear to fit into this category. The large number of papers and books published about patterns supports the view that the concept of the design pattern is valued by many experienced developers as a categorization of recurring issues (and solutions), as well as providing a widely used vocabulary for discussing design. However, the available literature on patterns appears to be largely in the form of “advocacy” or “experience,” and much of it is also focused upon identifying and documenting patterns rather than reporting and analyzing experience with using them. Probably the most widely known textbook is that by the “Gang of Four” [21] (usually abbreviated to GoF, as we will do here), and while this employs a template for cataloguing patterns that has been widely adopted, the template offers little that would encourage the notion of providing evidence about where and when a given pattern might best be used, beyond the headings “Applicability” and “Known Uses.”

Like most software engineering practices, design patterns have been distilled from the experiences of software developers. For patterns to provide an effective vehicle for conveying design experience, it is necessary for them to be usable by less experienced designers and also for that use to lead to successful designs. In this paper, we describe a mapping study we have undertaken to determine the scale and extent to which empirical studies have been undertaken to determine how effective patterns are as a knowledge transfer mechanism and the forms of evidence for this.

The evidence-based paradigm is concerned with systematically and objectively finding and aggregating all of the available empirical data on a chosen topic, on the basis that the outcomes from such a “secondary” study can reduce any bias that might occur in the outcomes from individual “primary” studies. The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR), often abbreviated to “Systematic Review” [38], [48], which provides a framework for systematically searching the literature, extracting the data, and performing the necessary analysis.

• The authors are with the Science Laboratories, School of Engineering and Computing Sciences, Durham University, South Road, Durham DH1 3LE, United Kingdom. E-mail: {cheng.zhang2, david.budgen}@durham.ac.uk.
Manuscript received 1 Apr. 2010; revised 3 June 2011; accepted 14 July 2011; published online 29 July 2011.
Recommended for acceptance by M.-A. Storey.
For information on obtaining reprints of this article, please send e-mail to: tse@computer.org, and reference IEEECS Log Number TSE-2010-04-0094.
Digital Object Identifier no. 10.1109/TSE.2011.79.
0098-5589/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society
The evidence-based paradigm has become strongly established in clinical medicine, where the use of secondary studies has become well codified and widely used, with conduct and publication of studies coordinated through the “not for profit” Cochrane Collaboration.1 Between 2004, when its adoption for use in software engineering was proposed by Kitchenham et al. [35], [18], and mid-2008, over 50 relevant published SLRs and mapping studies have been identified [36], [39].

TABLE 1
Differences between Mapping Studies and SLRs

However, there are some differences between the domains of clinical medicine and software engineering that constrain its effectiveness when employed in the latter [11]. In particular, we should note that:

• The interventions used in primary studies in clinical medicine are performed upon the participants, whereas for software engineering, they are performed by the participants. Any differences in the skills of the individual participants and the influence of the context (the specific task being performed) will increase the complexity.
• The human body, and the terminology associated with it, does not change significantly, making it easier to identify relevant keywords to employ when searching for primary studies of relevance as well as to perform longitudinal studies over a period of time. In contrast, software engineering practices are often only imprecisely defined, and may change with increasing knowledge and experience. Likewise, the terminology also may change or evolve, creating problems for the searching phase.

Both of these are significant factors when looking at studies of the design process. The first creates a particular problem for the experimenter, in that the individual skill levels of participants form a significant confounding factor. However, the second factor is at least in part offset for the case of design patterns by the way that the predominance of the book by the GoF has created a relatively stable codification standard and vocabulary, although, as we observe later, some of the earlier papers did use different terminology.

A mapping study (also termed a scoping review) is sometimes used before undertaking a systematic literature review, in order to identify the extent of, and categorize the form of, the literature on a particular topic. Here, we have performed a mapping study to help identify those primary studies that evaluate aspects of design patterns in any way and hence to determine what forms and issues have been studied, as well as by what means. Table 1 highlights some key distinctions between a mapping study and a systematic literature review in terms of some of the main elements [40]. (The boundaries are of course not rigid, and in this mapping study we do undertake a modest amount of quality evaluation.)

In undertaking this study, we set out to address the following research questions:

• Which of the GoF patterns have been evaluated empirically?
• What lessons about their use, and any consequences of this, particularly regarding constraining effects upon any subsequent maintenance tasks, are available from these empirical studies?
• What further research, using which forms, might be needed to address any “gaps” in the available evidence?

Conducting empirical studies in software engineering is nontrivial, and especially so for studies that address complex cognitive processes such as design. Hence, we did not anticipate finding other than a relatively modest corpus of primary studies that addressed issues relating to design patterns. As we explain in Section 3, despite a large number of “hits” when searching, the final set of experimental studies used for our analysis only contained a small number of papers. To assist with analysis and interpretation, we therefore widened our interpretation of “empirical” and looked to see what supplementary forms of evidence might be available. We were then able to identify a small set of “experience” reports that provided reasonably well-documented observations about specific patterns and could be used to augment our analysis.

In the rest of this paper we first discuss some of the characteristics of software design patterns, and then describe how we organized our mapping study, the outcomes from this, and the degree to which it answers the questions. Finally, we discuss the implications that our findings have for design pattern use and adoption.

1. See www.cochrane.org.

2 KNOWLEDGE TRANSFER THROUGH DESIGN PATTERNS
This section examines the issue of knowledge transfer for software design.

2.1 Knowledge Schema
Software design is widely recognized as being a “wicked” or “ill-structured” problem, characterized by ambiguous
specifications, no true/false solutions (only ones that are “better” or “worse” from a particular perspective), the lack of any “stopping rule” to determine when a solution has been reached, and no ultimate test of whether a solution meets the requirements [25], [54]. The processes for seeking solutions to such problems (in our case, software designs) cannot therefore be a procedural or “defined” process, and so must employ more empirical or “opportunistic” strategies [28], [62].

Studies of expert software designers support this view. A study of experienced and inexperienced designers by Adelson and Soloway noted a range of techniques being used, with these being dependent upon both the expertise of the designer, and also their familiarity with the problem domain [2]. In particular, they observed the use of “labels for plans” by experts, whereby a designer recognized that for part of their task they could reuse earlier design experience, and made a note of this at a relatively abstract level.

In a later study of expert designers by Guindon [25], the author discusses the observed use of a range of forms of knowledge schema, from the simple rule to the part solution. Such a schema incorporates a pattern that specifies both the characteristics of the problem and also the form of the corresponding solution, and hence includes the concept of labels for plans.

For object-oriented development, Détienne identifies three forms of relevant knowledge schema: application domain schemas, function schemas, and procedure schemas. Such a schema is a “knowledge structure” that represents “generic concepts stored in memory,” rather than just a solution plan [17].

One of the distinctions between expert and novice designers is that the former possess a much richer and fuller set of knowledge schemas. Over the years, various techniques such as design methods and design patterns have been employed, both to help less experienced designers to develop their own knowledge schemas by acquiring additional knowledge drawn from the experiences of experts (at least partially), and also to augment the set of schemas possessed by other experts. We might also note that some authors view the second of these as being more practical and appropriate for the case of design patterns [56].

2.2 Design Patterns
The concept of the design pattern was formulated by the architect Christopher Alexander, who describes it as follows:

    Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice. [3]

As such, the concept is clearly one that is applicable to the design process in any domain, including software. Such behavior also occurs at different levels of abstraction, for example: Buschmann et al. have discussed the idea of “architectural patterns” that describe overall system organization [13], while the software design patterns identified in the GoF essentially address “part-system” issues.

Software design patterns may therefore be regarded as a means of codifying expert knowledge schemas derived from software design practice. However, there are two important assumptions implicit in such a view:

1. That knowledge schema can be codified in such a way that they can be shared between different designers. (The observational studies discussed in the previous section only considered the situation where a designer reused their own previous experiences.)
2. That reuse will lead to “better” designs. There is an obvious question here of what constitutes “better,” but for the studies that we have identified there seems to be a general consensus that the key measure for this concerns the maintainability of a design.

Curiously, this first assumption does not seem to have been explored until quite recently [41]. The authors of this paper observe that knowledge schema are an internal representation of knowledge which cannot be directly shared, and that a design pattern is one means of externalizing a schema for transfer to others, and also of coming to a consensus about its nature. However, because each individual who learns a pattern must gain an understanding of the pattern to create and store their own form of schema within their own knowledge store, multiple examples of that pattern’s application may well be needed to assist with this process, potentially adding a further confounding factor for experimental studies of their use. The second assumption has been noted by some researchers [8], [51] and Bieman particularly observes that: “there is very little empirical evidence of the claimed benefits of design patterns and other design practices when applied to real development projects.”

For this study, the original idea was that we would seek studies that investigated the use of design patterns and see how extensively these met the claims for design patterns. In practice, it proved difficult to identify a coherent set of claims since many are entwined with a definition of what a pattern is and the comparisons implied do not readily relate to any specific measures. For example: “design patterns make it easier to reuse successful designs” is taken from the GoF and is really a definition of the role of a pattern with no clear sense of what measures should be related to “easier.” However, one of the papers we analyzed [52] has identified the following advantages as being claimed for design patterns:

1. Using patterns improves programmer productivity and program quality.
2. Novices can increase their design skills significantly by studying and applying patterns.
3. Patterns encourage best practices, even for experienced designers.
4. Design patterns improve communication, both among developers and from developers to maintainers.

While these still lack baseline values for comparison purposes, we have adapted these as the relevant issues to explore in our mapping study, while also recognizing that they should really be separately assessed for each pattern.

To address these issues, we have focused our mapping study on identifying which patterns have been investigated empirically, the forms this has taken, the extent to which different studies agree or disagree, and how far they
address each issue. As observed above, in doing so, we have added an element of quality assessment to the analysis of the papers in the clusters in order to assess how much weight to ascribe to specific studies.

3 FINDING THE EVIDENCE: THE MAPPING STUDY
A mapping study is: “designed to provide a wide overview of a research area, to establish if research evidence exists on a topic, and provide an indication of the quality of the evidence” [38].

So, while the planning phase is similar to that of a systematic review, much of the focus of a mapping study is upon the activities undertaken for the first three stages of the second phase of a review, namely:

• identification of research evidence (searching),
• selection of primary studies (inclusion/exclusion),
• assessment of study quality (bias/validity).

Any data extracted about the outcomes from the primary studies included are generally less detailed than for a systematic review, and are mainly for the purposes of classification and categorization.

For this study, since we also wanted to “profile” the patterns literature, we conducted the second (inclusion/exclusion) stage using the following three substages:

• selection of all papers that were about software design patterns,
• categorization of these papers in terms of their purpose,
• selection of papers that provided actual empirical data about the use of patterns.

3.1 Conduct of the Study
For the searching stage, the general scope of the study was identified as being:

• Population: Published scientific literature reporting software design pattern studies.
• Intervention: Studies involving the use of software design patterns.
• Outcomes of relevance: Quantity and type of evidence relating to specific instances of design patterns.
• Experimental design: Any form of empirical study.

Because the GoF [21] is regarded as the milestone textbook on design patterns, the start for the search period was chosen as 1995 (publication of GoF).

Searching for relevant papers was organized as a three-stage process.

• Round 1 involved performing an electronically based search covering the period from start of 1995 until the end of 2009. After some initial prototyping of possible terms, we used the terms “software + pattern” together with the following: experience, investigation, experiment, study, experimental, empirical, apply, use, implementation, application, investigate, investigating, experimentation, utilise, utilisation, employ, practice, survey, work, sketch, analyze, analysis, usage, exercise, implement, construct. Where appropriate we checked that different spellings (such as “utilize” and “utilise”) were recognized as equivalent by the search engines. Our choice of search terms was based upon those that inspection of other studies suggested were commonly used in the research literature, together with a dictionary check for synonyms.
  Following the advice of the guidelines for systematic reviews and in order to perform an exhaustive search [7], we selected six search engines/sources which are relevant to software engineering: ACM, IEEE Xplore, Google Scholar, CiteSeer, ScienceDirect, and Web of Science. The search queries were used with all six search engines/sources, restricting their scope to title, abstract, and keywords where the interfaces permitted this. Collectively these accessed the main digital libraries considered to be appropriate to the study.
• Round 2 consisted of a manual search through four journals that were identified as major sources of papers (IEEE Transactions on Software Engineering, Empirical Software Engineering, Journal of Systems & Software, Information & Software Technology) covering the same period (1995-end 2009). This round was used primarily to act as a check on the reliability of the electronic searches, and in particular, upon our choice of search strings. Since the key journals were easier to identify than conferences, we confined our search to the journals.
• Round 3 consisted of a “snowball” search, checking the references used in the empirical papers found in the first two rounds, to see if these identified any further papers.

For the searching stages of our study we employed the following inclusion criteria:

• papers describing software design patterns (not just empirical papers, as we wanted to be able to categorize the overall patterns literature, although only the empirical papers are actually analyzed in this paper),
• where several papers reported the same study, only the most comprehensive (usually the most recent) was included,
• where several studies were reported in the same paper, each relevant study was treated as an independent primary study;

and the following exclusion criteria:

• literature that was only available in the form of abstracts or PowerPoint presentations,
• technical reports and submitted papers.

We should note that no quality assessment was performed at this stage in order to ensure maximum coverage. Our search terms also meant that we found a wide range of papers other than simply empirical papers since a longer term concern is to determine how well the available empirical studies address the issues identified in concept papers and tutorial papers. In order to perform this analysis, we therefore needed to identify as wide a spectrum of the literature related to design patterns (both concept and specific instances) as possible.
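
The way the Round 1 search strings combine the base phrase with each qualifier term can be sketched as follows. This is only an illustration under stated assumptions: the list of qualifier terms is taken from the Round 1 description, but the boolean syntax (`AND`, `OR`, parentheses), the function names, and the spelling-variant table (apart from the utilise/utilize pair mentioned in the text) are hypothetical, and each real search engine has its own query syntax.

```python
# Qualifier terms as listed in the Round 1 description.
QUALIFIERS = [
    "experience", "investigation", "experiment", "study", "experimental",
    "empirical", "apply", "use", "implementation", "application",
    "investigate", "investigating", "experimentation", "utilise",
    "utilisation", "employ", "practice", "survey", "work", "sketch",
    "analyze", "analysis", "usage", "exercise", "implement", "construct",
]

# Assumed spelling variants. Only utilise/utilize is mentioned in the text;
# the other two pairs are hypothetical additions for illustration.
SPELLING_VARIANTS = {
    "utilise": ["utilize"],
    "utilisation": ["utilization"],
    "analyze": ["analyse"],
}


def expand(term):
    """Return the term followed by any assumed spelling variants."""
    return [term] + SPELLING_VARIANTS.get(term, [])


def build_queries(qualifiers):
    """Build one query per qualifier: software AND pattern AND (term OR variant)."""
    queries = []
    for term in qualifiers:
        alternatives = " OR ".join(expand(term))
        queries.append(f"software AND pattern AND ({alternatives})")
    return queries


queries = build_queries(QUALIFIERS)
print(len(queries), "queries generated")
print(queries[13])
```

Expanding variants explicitly is one way to cope with an engine that does not treat different spellings as equivalent, which is the check described for Round 1.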
TABLE 2
Round 1 Results by Search Engine

3.2 Outcomes
The three rounds of searching identified a total of 611 papers after removing papers that did not refer to design patterns as well as any duplicates (found by more than one search engine). Table 2 provides the final counts from the initial search (Round 1) for each of the search engines. Google Scholar was used last, due to its lack of precision, and the count for this is for those papers that were not identified through the other search engines. Since we were interested in knowing the proportion of papers that had an empirical focus, the titles and abstracts were used by one of us (CZ) to classify papers (broadly). After inspection of the papers found, we adopted a scheme of classifying these by using the themes of tutorial, support tool, empirical, construction, and index to describe their focus. Our choice of these themes was based upon the categories discussed on the patterns web site (hillside.net), although some of our terms such as “construction” differ from those used there. Table 3 explains how these classifications were interpreted, together with the figures for each of these from the three rounds. We should note that most of the papers from Round 2 were from the early years, reflecting some variation in terminology during that period.

In the remainder of this section, we examine the profile for the empirical papers alone. These were considered to be any paper that provided some form of data about patterns and employed one of the major empirical forms most commonly used in software engineering, namely, experiment, case study, survey, or experience report.

3.3 The Empirical Papers
For our initial filtering of these papers we took a liberal interpretation of what was meant by “empirical,” including papers that ranged from those that were essentially informal observational “experience” papers through to those describing controlled experiments. Our motivation here was to ensure that we did not miss any potentially relevant material.

As has been observed in other secondary studies, this process is invariably confounded by the rather casual use of technical terms in many papers. Particular examples of this are:

• Few papers that describe themselves as conducting a case study are actually treating this as an empirical method, such as the positivist form documented by Yin [64] that can be readily adapted for software engineering use [9], [29]. Many of them are observational narratives that could perhaps be more correctly described as “use cases.”
• The term experiment is used very casually. While in empirical software engineering this is generally taken to refer to a randomized controlled experiment or a quasi experiment employing human participants, many authors use the term in quite cavalier fashion. This is particularly evident in papers that describe tools (such as those used to extract patterns from code) and then describe the application of the tool to a chosen software artifact as being an experiment, despite the lack of randomization or of any form of control group.

The 219 papers in the “empirical” category formed the core set for the formal systematic review process itself. Following the SLR guidelines, we performed the formal inclusion/exclusion phase as follows:

• Exclusion on the basis of title.
• Exclusion after reading the abstract.
• Exclusion after reading the full paper.

Fig. 1 provides a schematic of this process, indicating the number of papers remaining after each step. For each phase, we both performed an independent analysis of the candidate papers and then produced an agreed list. When excluding on the basis of title and abstract we sought to retain any papers that might possibly contain useful experience about the use of patterns, and hence took a conservative approach, retaining a paper if there was any possibility. We also calculated the Kappa score for interrater agreement for our initial levels of agreement. For exclusion on title this was 0.44 (moderate agreement) and for exclusion on abstracts it was 0.60 (verging closely on good).

The size of the final set is also generally consistent with those found in mapping studies performed on other design-oriented topics such as [19], [27], [44], [53].
TABLE 3
Classification Themes Used and Related Counts of Papers
Fig. 1. Overview of the review process (figures between the boxes are the number of papers remaining at each step).
The initial filter identified nine candidate survey papers, but none of them provided suitable evidence about specific patterns. However, one survey did assess 10 quality attributes for all of the GoF patterns using a sample of 20 experienced developers [33].

The 13 “experiment” reports involved a range of rigor and experimental form (for this reason we will refer to these as “formal studies (FS),” coded as FS1 to FS13). These were largely published as Conference or Workshop papers, only three being published in journals.

4 THE EVIDENCE IN DETAIL
In this section, we describe the nature of the evidence provided in the papers that described formal studies such as controlled experiments. To do so, we examine the research questions used for the studies, the way that they were organized, the specific instances of patterns that were studied, the types of participant and task, and the extent to which the outcomes of similar or replication studies agreed. One study included (FS1) did not study object-oriented patterns and so is not analyzed in this section (it was originally included because it studied the use of the pattern concept, but in another context, namely ubiquitous computing.)

During the more detailed analysis, it also became evident that FS6 [50] was a preliminary report on the first part of the study later reported in FS8, with the latter including additional data from a second group who repeated the study in a different venue. To avoid double-counting we therefore removed FS6 from the set, leaving 11 papers. There was also overlap between FS5 [49] and both FS7 and FS8, with the second experiment of FS5 being reported (more fully) in FS7, and the first in FS8 (but reporting a different count of participants). So FS5 was also removed from our analysis, although where relevant we use qualitative comments reported in FS5. FS8 (and hence FS5) also reported upon two instantiations of their experimental study, occurring in two well-separated locations, with different participant profiles, different programming languages, different working modes, and with some different results. So we have labeled these as FS8₁ and FS8₂ and, where appropriate, have treated them as being separate studies for purpose of analysis. We should also note that two of the studies (FS9 and FS11) were replications of previous studies, although with different types of participant.
TABLE 4
Research Questions Addressed
4.1 Research Questions
We examined the papers to identify the type of research question that they were addressing, and these are listed in Table 4. A significant characteristic was that all of the experiments on object-oriented patterns were concerned with studying and modifying existing systems.

4.2 Which Patterns Have Been Studied?
All of the formal studies of OO patterns used only patterns from the GoF [21], and in terms of purpose they used a mix of the five creational, seven structural, and 11 behavioral patterns that are catalogued there, as well as including both class and object patterns (scope). Table 5 summarizes the frequency with which specific patterns were employed. While 15 of the GoF patterns were used, many were only used in a single study. The patterns that were not studied were: Builder, Prototype, Flyweight, Proxy, Interpreter, Iterator, Mediator, and Memento. Only two studies focused upon a specific pattern and its properties (FS2 and FS12). From Table 5 we can therefore conclude that only Composite, Observer, and Visitor have been studied very extensively.

TABLE 5
Patterns Used in the Different Studies

4.3 Participants and Tasks
We looked at the type of participant used in each study and their degree of experience with using patterns. We also examined the way the studies were organized, the tasks assigned to participants, and the measures used. Our summaries of these are presented in Tables 6 and 7.

TABLE 6
Details of the Participants

TABLE 7
Tasks and Measures Used

Details of the participants were generally reported quite fully and while many were (perhaps inevitably) taking advanced undergraduate or graduate courses, there was also quite a large contingent who had extensive programming experience and industry experience. Given the not
uncommon view that the use of patterns does require a degree of maturity with object-oriented design and implementation [56], this did seem to be recognized in this set of experiments.

The way that the experimental aspects of the study were organized so as to address the research question is summarized in Table 7. Where a study is classified as being “within-subjects” in form, this was generally of the form: pretest, training, posttest.

One feature of Table 7 is that most of the OO studies (FS3-FS13) involved modification and that most of them also involved an element of coding. Given that many of the qualities advocated for design patterns are concerned with the more abstract activities of design, such a strong emphasis on coding seems surprising, although probably reflecting something of the difficulty of conducting empirical studies involving design activities. Indeed, the only studies that did not involve an element of coding were FS1, FS12, and FS13.

Finally, Table 8 summarizes the details of the different programs used for the purposes of comprehension and/or modification for those studies that used the GoF patterns. For each study we indicate how many programs were employed and indicate their sizes (when known). Where more than one version was provided, we quote the size of the version that used patterns.

TABLE 8
Programs Used in the Studies

4.4 Degree of Agreement between Studies
In order to assess the extent to which the different studies find similar effects we have looked at the following two questions.

4.4.1 Replication Studies
There are two cases of replicated studies. Here our question is: “Do the replication studies demonstrate any degree of consistency in their findings?”

• FS9 uses the material from FS8 (a study of patterns as documentation) although extending it to use Javadoc tags, the use of which actually forms the focus of the paper. So, although this is a replication study, its purpose is not actually the study of patterns. Both studies used student participants and (ignoring the effects of the changes) FS9 claims to have produced comparable findings. The only difference reported in FS9 was:

    Compared to the reference experiment, the main difference is the improvement in time obtained with the web-based exercise compared to the paper based.

  which was quantified as “the average reduction in time is 8 percent.” However, the author of FS9 also observed that he found the opposite results to those from FS8 for one of the tasks and the hypothesis related to time needed to perform the task. The authors of FS8 had noted that their results for this task did not support the hypothesis for instantiation
FS81 and offered an explanation of why they
thought this had occurred. This related to the use
of paper-based solutions for this part of the study,
making it difficult for participants to check correct-
ness—whereas for FS82 Unix workstations were
used, allowing checking (although there were also too few participants for analysis). Unfortunately, the author of FS9 does not offer any explanation for the difference that he observed.

• FS11 is a replication of FS7 (comparing the maintainability of designs developed with patterns to those developed without them). Both studies used professional software engineers, but for FS11 these used a software development environment rather than paper printouts. The conclusions of FS11 emphasized the need to assess patterns on an individual basis and found differences in the results for two patterns (Visitor and Observer). With regard to these two patterns the authors of the original study (FS7) observed:

    A negative effect from unnecessary application of the Observer pattern, particularly for subjects with low pattern knowledge.

While for the replication (FS11) the authors noted:

    Our results differ somewhat from those found by Prechelt et al. (2001) especially in the case of Visitor and Observer. While they found Visitor to be without significant harmful effects, few of our subjects achieved a good solution with it, even after the course. By contrast, we observed no significant harm done by using the Observer pattern

Neither FS9 nor FS11 can be considered to be close replications [42]. As is apt to occur with software engineering replications, both were differentiated replications, as the researchers made some changes to the form of the experiment. (Lindsay and Ehrenberg argue that close replications are needed first in order to establish consistency of results, with differentiated studies then helping to determine how widely these apply [42].) So, the lack of any close replications does make it more difficult to draw any firm conclusions about consistency. However, in both cases the replications used participants with similar backgrounds to those used in the originals (students for FS9 and FS8, software development professionals for FS11 and FS7). A possible explanation for the differences in the second pair may have arisen largely from the difference in the extent to which participants were familiar with patterns (which could have been greater for the professionals than would be likely to occur for students), but equally it could have arisen from the change in working mode, paper versus software tool.

The variations produced in these two replication studies do suggest that there is a need to perform close replications before differentiated ones. In both cases there was sufficient variation in the experimental conditions to explain the differences in the results, with no means of determining the actual cause.

4.4.2 Patterns Studied Most
For this group, we asked the question: “Where the same pattern is studied in a number of experiments, do these reinforce one another or show different effects?”

Three patterns have been studied rather more than others: Composite, Observer, and Visitor, largely in the same set of studies. However, in addressing their research questions, many of the papers focus upon the tasks involved in the studies, and it is not uncommon for these to involve more than one design pattern. Only the studies reported in FS5 (for FS8), FS7, and FS11 provided qualitative comments about the effects of individual patterns upon the participant’s activities and solutions, and even for these, only limited information is available.

• For Composite, FS5 (for FS8₁ and FS8₂) considered its use as beneficial when used with Visitor (as noted above, many tasks involved multiple patterns, making it hard to separate their effects, with this being particularly true for Composite). However, in FS11, its dependence on knowledge about recursion created problems, which the authors considered might be because recursion was unfamiliar to the participants, while FS7 noted no specific effects.

• For Observer, FS11 noted no problems arising from its use (see quote above). The only comments in FS5 related to FS8₁ and concerned the benefits arising from the use of pattern-specific documentation. However, the replication in FS11 “did not find the negative effect that was observed in the original experiment” in terms of the expectation of easier implementation when provided with pattern documentation. FS7 observed that participants needed longer to understand the task involving this pattern, and that the greater complication introduced by its use was only justified where the increased flexibility was needed.

• For Visitor, FS5 noted that (when used with Composite) its use resulted in less time being taken (FS8₂) or led to fewer errors (FS8₁). In contrast, FS7 noted no specific effects, and that its presence did not increase the time needed to perform the tasks. FS11 considered it complicated, leading to longer development time and poor correctness, to the extent that participants avoided it. FS12 specifically set out to study this one pattern, using an eye tracker to determine how participants moved their attention around a class diagram. The authors concluded that it “does not reduce the subjects’ efforts for comprehension tasks” but that where a UML diagram is provided in the canonical form for this pattern then it “reduces the developer’s efforts for modification tasks.”

Factors such as task size and background of participants may well have influenced these differences, and the extent of these does suggest that care is needed when using the outcomes from individual studies.

5 TRIANGULATION WITH EXPERIENCE
Given the limited set of (relatively rigorous) experimental studies, as well as the limited and sometimes differing conclusions from these about individual patterns, we decided to investigate whether other forms of empirical study might help resolve or explain these differences. To do this, we reviewed the set of papers that we had noted as containing “experience” during inclusion/exclusion. For this purpose we regarded an experience paper as being one that both provides a set of observations that are based upon
practical experience with using design patterns, and also where these observations are summarized as “lessons” that have explicitly been derived from the experience. We examined 22 candidate papers in detail, eventually selecting seven of these.

5.1 Data Extraction
As these papers are far less homogeneous than those describing experiments, in order to extract the data consistently we employed a data extraction form that was adapted from the one used for the experiments. The form (summarized in Table 9) consisted of three groups of questions. Where we provided specific codes for classifying an element, this is indicated; otherwise, entries were free text.

1. Q1-4. Citation details. Describing the source and how it was found.
2. Q5-11. Study context. These questions were used to identify the characteristics that might influence the way that any outcomes should be interpreted and weighted. We were interested in knowing how independent the authors were (by looking at the references to see if they were authors of patterns or books about patterns), in knowing about the type and size of system(s) that provided the source of the experiences, in knowing whether these experiences were first hand or not, in determining the level of abstraction at which the experiences were discussed (design or coding), and in establishing how these were related to the software life cycle.
3. Q12-14. Information provided. These addressed the details of the patterns involved, the conclusions about them, and how these were derived from the experiences.

TABLE 9
Data Extraction Form Used for Experience Papers

For each paper this form was completed by both authors, working independently. Our conclusions were then put into a spreadsheet, checked for consistency, and then any differences were resolved. The final selection process was then based upon two main criteria:

• That the paper identified specific design patterns (necessary for our original purpose of triangulation with the results from the experiments).
• That the “lessons” described in a paper were linked to specific experiences.

For the second and third group of questions it proved difficult to answer every question for all papers due to the standards of reporting (we discuss this further in Section 6). For the second group (study context), this was made more problematical both because our chosen classification schemes proved to be a poor match and also because it often proved difficult to determine the right category from the available information (where one of us extracted a value and the other did not, we noted this as being a disagreement and then resolved it later). Examining the disagreements revealed that these were essentially differences of interpretation. This was far less of a problem for the third group (information provided).

5.2 Outcomes
Our expectation was that we might gain some qualitative assessments about individual patterns from these studies. These could then be combined with the qualitative elements that are provided in some of the reports on experiments (the triangulation element). However, a major limitation affecting almost all of the papers was the lack of any clear link between the experiences described in the paper and the conclusions (or “lessons”) drawn by the authors. By failing to provide the cause-effect audit trail that is needed for any findings to be considered scientifically reliable, the authors made it impossible to assess the validity of their conclusions. In particular, this does mean the “expert assessment” elements of these reports are implicit rather than explicit. So the only papers that we retained for further analysis were
those where we judged that there was sufficient information provided about the linkage between cause and effect for us to have confidence in the assessment element.

In the next sections we summarize the set of papers that did provide some element of linkage between experiences with specific patterns and their conclusion. We then report on how well we were able to extract data from the wider set of candidate papers, identifying some of the reasons why this set was selected.

5.3 The Included Papers
To distinguish this group of papers, we will refer to them as O1 to O7 in the rest of this paper (using “O” for “observational”). The variety they exhibit makes it difficult to tabulate their characteristics in the same way as for the formal studies, so we have provided a brief summary of each one here.

1. O1. The focus is upon reuse in developing communications software [55]. A relatively early paper, and partly proselytizing in nature, it provides a long list of lessons but does link these (directly and indirectly) to the experiences gained from developing three systems. The paper concludes that using the Reactor pattern (a variant of Observer) has the benefit of increasing portability and avoiding the need for threads when handling events from multiple devices, offset by the loss of handler preemption as well as creating more complex flow of control to complicate debugging. As such it assesses the outcomes in terms of both benefits and disadvantages.
2. O2. Provides a description of experiences gained in developing an IDE [65]. Again, it is a relatively early paper, but it does specifically report on the use of three specific patterns (Composite, State, and Abstract Factory). However, their assessment that Composite eases the use of similar operations with different elements, while State simplifies the design of event responses, and Abstract Factory makes it easy to extend to new notations is not illustrated by any specific examples from the system they developed. The emphasis is also upon benefits, with any description of disadvantages being related to rather generic issues rather than the specific patterns.
3. O3. This report, by Wendorff, provides a valuable example of clear reporting of experiences [60]. Its focus is upon the experiences derived from maintenance rather than development, related to a large system that had been developed with patterns being used for many parts. It links the discussion of four particular patterns (Proxy, Observer, Bridge, and Command) to specific experiences with changing the structure of the system. The paper distinguishes between situations where the characteristics of specific patterns may be considered undesirable (such as the added indirection involved in using Proxy), as well as where they were used inappropriately (and why this was so), together with the consequences for the system maintenance.
4. O4. This report describes the experience of applying design patterns to construct a flexible system [43]. While the lessons are based upon specific patterns, they are not directly linked to them and we primarily retained this paper because of its contribution in terms of the experiences related to flexibility.
5. O5. The basis for this report was the experiences from developing a customisable diagram editor [63]. The authors discuss one architectural pattern (not used here) and six design patterns. For these, they assess their experiences in terms of the GoF claim that the insight from using patterns would “make your own designs more flexible, modular, reusable, and understandable,” assessing the experience for each pattern in terms of those qualities. Their summary focuses on the different effects noted for behavioral and structural patterns.
6. O6. This study draws upon the industrial experience of one of the authors in order to assess three patterns (Singleton, Abstract Factory, and Facade) with the aim of assisting readers with recognizing potential pitfalls [16]. It particularly assesses the problems that can arise in the context of implementing these using Java, and its main limitation is that the industrial experience is distilled into abstract examples rather than cited directly.
7. O7. While at different points the authors describe this paper as an experiment, a quasi experiment, and a case study, it is largely observational in nature [14]. It reports on the experiences of 23 postgraduate students who were asked to develop two versions of a system (chosen from a range of topics) working either as individuals or in pairs. The first version was required to be without pattern use, and in the second they were asked to resolve any problems identified in the first by using at least two different patterns. The study notes that “students had difficulties in demonstrating exactly which problems have been solved by each pattern.” There is little detailed analysis, but the authors did identify three issues as “most frequently reported by the students” that were resolved through the use of Bridge, State, and Observer. Unfortunately there is very little in the way of causal links in the reporting.

We now describe how well the different characteristics were reported in the experience papers we examined. While our discussion ranges across the full set of papers that were candidates for inclusion, we will particularly draw examples from the group above.

5.4 Describing the Source of Experiences (Q6-Q11)
Q6 (“form of study”) proved difficult to code for this group of papers (experience papers are mostly observational, but organized in different ways), and most of the issues involved were essentially captured by the remaining questions. Hence, we did not separately analyze the results for Q6.

For Q7 (“type of system”), many papers did not report this, although it is quite important for the reader who might want to be reassured that the experience provided is relevant to their needs. Of the papers described in the
previous section, O1, O2, O3, and O5 did report the basic purpose of the systems involved (although Wendorff gave only a very vague indication). There may well be good commercial reasons for not giving much detail, but that should not prevent enough being given for general classification purposes. For O7 the range of project topics was listed.

TABLE 10
Patterns Studied in Experiments and Experience Papers

Q8 was intended to extract the size of the system that formed the source of these experiences. Only four papers from those considered gave any indication of this, and then mostly indirectly. In addition, these four used a range of different measures, two of which were internal measures (KLOC, number of classes) while the third was an external measure (years of development plus number of developers). Of the set used in this study, O3 provided a single measure of size (1,000 KLOC) and O5 provided two measures: 50 KLOC and 173 classes. For O7 there was a range of systems produced, with average size around 800 LOC.

Q9 addressed the issue of whether the experience was from the authors’ own development work or that of others (allowing for the possibility of being both). Apart from O7, authors appeared to largely report on their own experiences.

Q10 sought to distinguish between experience that was gained from working at a design level of abstraction or from using (code-oriented) realizations of patterns. Because these do not have clear boundaries, we sometimes disagreed in our interpretations.

Finally, Q11 focused on a rather important distinction, which was whether a paper described experiences gained from development of a system or from maintenance activities. While the former was by far the most common (unlike the experiments), if not always clearly identified, studies based on maintenance seem to provide a better retrospective view of how the use of patterns to structure a system can affect its form and performance.

Overall, while these characteristics proved to be useful in terms of providing the context for supporting the decision to include the set of papers O1-O7 described in Section 5.3, in many papers they were either poorly reported or not reported at all.

5.5 Describing the Experiences (Q12-Q14)
Few of the papers provided descriptions that were explicitly related to specific patterns and derived from their experiences regarding these. Even fewer provided descriptions that linked their experiences to their conclusions. The most positive example was the paper by Wendorff (O3), which did discuss specific patterns and provided some specific conclusions about these, illustrated by descriptions derived from the experiences. O5 and O6 also linked experiences to specific patterns, but in each case the experience itself was only described in the abstract.

5.6 Triangulation with the Experiments
Our original purpose was to identify if any patterns were studied using more than one form, and the extent to which these agreed. Table 10 lists the patterns studied in both the formal studies as well as the experience papers.

Table 10 indicates that the patterns which have the widest mix of study types, and hence which offer good opportunity for triangulation, are Composite, Observer, and Visitor. (For Observer, O1 is in brackets as the paper reports upon a variation of Observer (namely Reactor), and so we cannot be sure that any comments about that will be applicable to Observer.)

Overall, there was disappointingly little overlap between the sets of patterns investigated using the two forms. In particular, the results from experience that were reported most effectively (by Wendorff) were all for patterns that were only covered to a very limited degree in the experiments. In the rest of this section, we summarize the outcomes from the studies of the three patterns that have been examined most comprehensively: Composite, Observer, and Visitor.
TABLE 11
Summary of Qualitative Assessment for Composite

TABLE 12
Summary of Qualitative Assessment for Observer

TABLE 13
Summary of Qualitative Assessment for Visitor

5.6.1 Composite
Unfortunately, the available experience data for Composite lacks any examples that might make it possible to compare it with the results from the experimental papers (which also produce conflicting conclusions). Table 11 summarizes the different conclusions about this pattern.

In many studies the use of Composite went alongside that of other patterns, and while these sometimes attracted comments, Composite did not. Overall, we can assume that the use of Composite, while needing some knowledge of recursion, does not tend to create particular problems.

5.6.2 Observer
As noted earlier, there were somewhat differing conclusions about Observer from the experiments. Table 12 summarizes the qualitative data from the experiments (where available), plus that from the three experience papers.

From Table 12 we can see some benefits of triangulation, even if this is limited in scope. Both O3 and O5 provide some agreement with the results for Observer from the different experiments. The observations from O3 show some agreement with FS7, although providing no specific explanation of why this overcomplication might occur. O5 does, however, offer some explanation of the results from FS8₁ and FS11 in noting the likely distinction between the ease with which the two groups of participants were likely to be able to understand the code. There is also some small reinforcement from one survey [33], where understandability was seen as being decreased by the use of Observer.
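
For readers less familiar with the pattern, the following minimal Java sketch shows the kind of indirection that Observer introduces. It is purely illustrative: the class and method names (ObserverSketch, Counter, subscribe, increment) are hypothetical and are not taken from any of the programs used in the studies discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class ObserverSketch {
    /** Subject (hypothetical): holds a value and notifies observers when it changes. */
    public static class Counter {
        private final List<IntConsumer> observers = new ArrayList<>();
        private int value;

        /** Registers an observer; the subject never learns what the observer does. */
        public void subscribe(IntConsumer observer) {
            observers.add(observer);
        }

        /** Changes the state, then notifies every observer in registration order. */
        public void increment() {
            value++;
            for (IntConsumer observer : observers) {
                observer.accept(value);
            }
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        List<String> log = new ArrayList<>();
        counter.subscribe(v -> log.add("display shows " + v));
        counter.subscribe(v -> log.add("audit saw " + v));
        counter.increment();
        System.out.println(log); // prints [display shows 1, audit saw 1]
    }
}
```

Nothing in increment() names the code that reacts to the change, so a reader has to locate every subscribe() call to follow the flow of control. This may be one reason for the understandability concerns reported above, although the sketch itself is not evidence for any of the findings.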
5.6.3 Visitor
The results for Visitor are reported in Table 13. These are
complicated by its being coupled with the use of Composite
in many of the experiments that reported on its use (FS7,
FS8, and FS11). (While the use of multiple patterns is
realistic in the sense that pattern instances do interconnect
within systems, from an experimental viewpoint it does
make it harder to disaggregate the contributions of specific
patterns.) FS8 provides little real information (the focus of
the paper was on documentation), and as already observed,
FS7 and FS11 provide conflicting results, with FS11
observing that despite thorough training: “most subjects
either ignored the pattern or were confused by it” and also
noting the poor correctness of solutions. FS12, which
specifically studied this pattern, was ambivalent about its
effect upon modification tasks. The one available observa-
tional study (O5) tends to agree with FS11 and FS12. Again,
the survey of expert developers in [33] suggests a negative
effect for understandability when using Visitor. Taken
together, these results do at least partially indicate that
designs based upon the use of Visitor, while likely to be
harder to understand than nonpattern equivalents, may
possibly be easier to modify.
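
The trade-off suggested here can be illustrated with a minimal Java sketch of Visitor applied to a Composite structure. All names (VisitorSketch, Node, Leaf, Group, TotalSize, Depth) are hypothetical and chosen only for illustration; this is not the code used in FS7, FS11, or FS12.

```java
import java.util.List;

public class VisitorSketch {
    /** Visitor: one method per concrete node type. */
    public interface NodeVisitor<R> {
        R visitLeaf(Leaf leaf);
        R visitGroup(Group group);
    }

    /** Component of the Composite structure. */
    public interface Node {
        <R> R accept(NodeVisitor<R> visitor);
    }

    public static final class Leaf implements Node {
        public final int size;
        public Leaf(int size) { this.size = size; }
        @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitLeaf(this); }
    }

    public static final class Group implements Node {
        public final List<Node> children;
        public Group(Node... children) { this.children = List.of(children); }
        @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitGroup(this); }
    }

    /** One operation over the tree: sums the sizes of all leaves. */
    public static final class TotalSize implements NodeVisitor<Integer> {
        @Override public Integer visitLeaf(Leaf leaf) { return leaf.size; }
        @Override public Integer visitGroup(Group group) {
            int total = 0;
            for (Node child : group.children) {
                total += child.accept(this); // recursion through the composite
            }
            return total;
        }
    }

    /** A second operation, added without editing Leaf or Group. */
    public static final class Depth implements NodeVisitor<Integer> {
        @Override public Integer visitLeaf(Leaf leaf) { return 1; }
        @Override public Integer visitGroup(Group group) {
            int deepest = 0;
            for (Node child : group.children) {
                deepest = Math.max(deepest, child.accept(this));
            }
            return 1 + deepest;
        }
    }

    public static void main(String[] args) {
        Node tree = new Group(new Leaf(2), new Group(new Leaf(3), new Leaf(5)));
        System.out.println(tree.accept(new TotalSize())); // prints 10
        System.out.println(tree.accept(new Depth()));     // prints 3
    }
}
```

Following a single call means tracing accept() into visitGroup() and back again through the recursion, which is consistent with the comprehension difficulties reported. Adding Depth, on the other hand, required no change to Leaf or Group, which is the kind of modification the pattern is intended to ease.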
So, while we do have some examples of how observational studies may help with the interpretation of the results from experiments, we were unable to perform any significant degree of triangulation between the different empirical forms used in studying the other patterns. In part, this was largely because the poor reporting standards encountered resulted in our having too few usable data points, but equally because there were simply not enough studies available. In addition, most of the more formal studies were primarily concerned with examining issues such as documentation of patterns, rather than the effects of the patterns themselves.

We have drawn upon these experiences to propose a set of preliminary reporting standards for experience papers [12]. (Major headings for which we offer specific ideas are: title & ownership, structured abstract, keywords, introduction, background, lessons, threats to validity, conclusions & further work.) This is in the hope of encouraging better reporting since such papers clearly do have the potential to provide an explanation of experimental results, as well as to provide some assessment of how well they represent practice.

6 DISCUSSION
6.1 Threats to Validity
For a secondary study such as this, we can identify three quite specific threats to validity that need to be considered:

• the effectiveness of the search process (internal),
• the selection and classification processes (internal),
• the completeness and the extent of the analysis as reported in the primary studies (external),

and we briefly examine each of these in turn.

For the search process, experience from conducting other studies of this form in software engineering suggests that a wide range of search engines should be employed since no search engine will find all of the relevant primary studies [10, Lesson 8]. Hence, we have used a broad set of search engines, backed up by a quite thorough manual search, as well as snowballing. So we can be reasonably confident that we have identified most of the relevant studies published in the computer science and software engineering literature.

The process of inclusion/exclusion was performed in two stages. The first, which consisted of excluding papers that were not about software design patterns and then categorizing these, was performed by one person. For the empirical papers, the subsequent process involved sifting first on title, then on abstract, and finally on the complete paper and was undertaken by both of us, working independently and then resolving any differences. Subsequent decisions about classification of the empirical papers proved to be rather more difficult, as demonstrated by the interrater measures: For some aspects, such as the form of a study, we achieved good agreement, but more subjective elements, such as the issue (from our predefined list) addressed by a paper, required joint discussion and resolution. Again though, since we have erred on the side of caution at each step, it seems unlikely that we have inappropriately excluded any material. We might also observe that for both selection and classification purposes, the abstracts provided were largely inadequate, making it necessary to read parts of the paper in order to make a decision (reinforcing [10, Lesson 10]).

The external validity of our findings in terms of aggregated knowledge about the patterns is influenced by the following two factors:

• No patterns have been studied very extensively;
• The experimental studies of OO patterns mostly involved tasks that required a mix of understanding and modification, rather than the development of new systems. So the influence of patterns upon development has only been studied for a few patterns.

One other aspect that should at least be noted is that most of the experimental papers are from two research groups, in Karlsruhe (with a preference for within-subject studies) and Hong Kong (preferring between-subject studies).

6.2 Answering the Research Questions
The answer to our first question (“which of the GoF patterns have been evaluated empirically?”) is essentially summarized in Table 10. In all, 15 of the 23 patterns have been subjected to some form of formal empirical evaluation, albeit six of them (Singleton, Adapter, Bridge, Chain of Responsibility, Command, and Strategy) were only addressed in one study.

There was relatively little explanation in the studies that we found as to why particular patterns were chosen. This may perhaps have been because all of these studies were conducted by software engineers, with the choice of patterns used arising from a mix of opportunism (availability of suitable material) and their own experiences. Had they been conducted by cognitive scientists to explore particular design activities, then the choice might have been explained, but might have used patterns of less interest to software developers. Certainly, for all but one of the papers addressing the use of OO patterns, the choice was determined by the set of applications employed to provide tasks for the participants, and the selection of these seems to have been based on availability and the need for a tractable size so that participants could understand and change the code in the available time. In some cases, the choice was, of course, determined by the decision to perform a replication.

The only two studies that did choose a pattern for a specific reason were FS2, where the purpose was to study the use of the Factory Method pattern for API design, and FS12, which studied the Visitor pattern. Both of these were motivated by informal experiences of the researchers and, in the case of FS12, by differences in the outcomes of previous studies.

Formulating an answer to the second question (“what lessons about their use, and its effectiveness, are available from these empirical studies?”) is constrained by the focus upon maintenance activities (understand and modify) in so many of the studies. While this is very relevant to the usefulness of patterns (and to their effectiveness), we are still left with little knowledge that might form useful lessons about their role in development.

This is arguably the consequence of the emphasis upon the use of laboratory experiments, a point that we return to below. Overall, therefore, we can currently draw little in the
way of conclusions about the conditions necessary for a given pattern to be successful (or unsuitable).
For the three patterns where there are enough studies for us to seek a more comprehensive answer to this question, Tables 11, 12, and 13 do offer some indication of how we can use observational studies to interpret the results from experiments, and there is also a clear warning about the risk of overly complicated solutions if a pattern is used inappropriately.
The third question (“what further research”) poses a dilemma. Would it be better to augment the data available for those patterns that have already been studied, or instead to explore those patterns that either have limited coverage or are entirely unassessed? Our recommendation would be for the former: there is at least a baseline here, and some better reporting of experience could potentially create a more substantial corpus. Also, many of these patterns have been chosen on the basis of experience or because their use is quite widespread. Related to this is the question of the form of study that might be used: the experiences from this study and from the other studies reported in Table 15, discussed below, suggest that wider use of forms such as case studies might usefully be considered by researchers to allow deeper exploration, and that close replications are also needed.
It is also useful to look at the extent to which the formal studies address the four “claims” from [52] that we noted in Section 2.2. Table 14 summarizes how far these were addressed in the conclusions from each of the studies.

TABLE 14
What the Studies Concluded about the Claimed Advantages

From this we suggest that:
• There is reasonably good support for the claim that using patterns improves communication between developers and maintainers, at least when these are appropriately documented.
• There is no support for the claim that using patterns helps novices learn about design.
• In terms of their effect upon productivity and quality, the results are ambivalent; arguably these issues are more strongly affected by the nature of individual patterns than the other two.
Overall, we can only conclude that what is clearly needed is more studies about patterns.

6.3 The Excluded Papers
There are two groups of studies that fall into the “empirical” category and that, while not meeting the criteria for inclusion, should be noted here as providing valuable knowledge about patterns and their use.
The first group is the surveys. We have already noted something about one of these that examined quality attributes for the GoF patterns [33], [34]. While the sample size used in this was small (20 developers), the paper does make a useful contribution in terms of identifying the aspects of patterns that matter to developers. A rather different form of survey is reported in [26], which involved
analyzing design documents from 988 open-source projects. The author notes the lack of empirical data to support the claims for patterns and concentrates upon the GoF patterns alone. The analysis identifies how widely individual patterns were used across projects, but is otherwise focused upon developer behavior. As such, it can help future researchers to identify which patterns should be investigated from the perspective of how widely they are deployed. We might also note that, while it is not a survey as such, the literature review performed in [59] reports on the set of patterns found in the code that was analyzed.
This last paper provides a link into the second group of papers, which are based upon what we have termed “code analysis.” These papers expand upon the ideas in two studies led by Bieman, using metrics to examine how the part of a design that is based upon design patterns evolves over time [7], [8]. Other papers in this group include [4], [5], [6], [22], [30], [47], [59]. While for our purposes these papers do not provide any pattern-specific data that could be included in our analysis, they do point the way toward making wider use of code analysis in studies of patterns.

6.4 Comparison with Other Studies of Design Practices
The secondary studies related to design that have been conducted so far are rather unevenly distributed across the knowledge headings used by the SWEBOK to categorize software engineering activities and knowledge [1]. For design activities, we can identify three studies that are clearly relevant. Two of them are primarily concerned with adoption rather than form of design practices: model-driven engineering [44] and the Rational Unified Process [27]. The remaining one comprises a study of what is known about agile methods [19]. However, this was effectively limited to reviewing studies of XP as there were only a few studies of other methods. Table 15 summarizes the main features of these three studies.

TABLE 15
Characteristics of Secondary Studies of Design Practices

Looking at Table 15, a significant difference with this study is evident in the nature of the primary studies involved. For the three previous studies, many of the primary studies were case studies (however defined), with significantly less emphasis upon coding and/or laboratory experiments. Examining these a little more closely is quite informative.
The study of adoption of the Rational Unified Process [27] is a topic where the use of experiments might be considered less appropriate. Consequently, the authors designed their study protocol to address qualitative data, and identified a set of papers that seem to correspond well with those we have described as experience. The process of inclusion/exclusion is described very thoroughly, and we summarize the three key stages below:
• Exclusion on title and abstract. Their key criterion here was that “the study must present empirical data beyond anecdotal evidence,” and resulted in reducing their initial set of 100 candidate papers to 36.
• Exclusion on inspection of full text. This involved making a quality judgment about the following issues and excluding any papers considered to be “insufficient” with respect to them.
  - a well-defined and limited study aim,
  - adequate description of the study method,
  - sufficient description of study context,
  - presentation of study effects,
  - a thorough analysis of results,
  - conclusions and answer related to the study aim.
  This had the effect of reducing the 36 candidates to five (5).
• Exclusion on team review. This again used the six criteria above, and resulted in the five studies being reduced to a final two.
Overall, the process is generally similar to that which we used, in demanding relative rigor from qualitative studies, and the effect on the final count is not dissimilar.
The study on Model-Driven Engineering [44] provides little detail about either the searching process, although this did seem to include snowballing, or the inclusion/exclusion process. Thirty-three papers were found, but eight (8) were then excluded as lacking any detail about the application concerned, leaving 25, although not all of these appear to have been used. Their third research question was generally similar to ours: “What evidence do we have on the impact of MDE on productivity and software quality?” The authors make the comment about data extraction that “it was not possible to extract information on the size of the projects for the majority of papers,” again reflecting our experiences. Of their final set of papers, only three provided information that could be used to answer the third research question described above, with these consisting of: a company case study, an EU project with six small case studies and quasi experiments, and one report of the redevelopment of a legacy project.
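As a minimal sketch of the arithmetic behind the screening counts quoted above for the RUP study [27] and the MDE study [44], the following Python fragment tabulates how many candidates each stage removes. The `Stage` class and the `funnel` function are hypothetical names introduced purely for illustration and are not taken from either study; only the counts (100, 36, 5, 2 and 33, 25) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One screening stage (hypothetical helper, not from the cited studies)."""
    name: str
    remaining: int  # papers still included once this stage is complete


def funnel(initial, stages):
    """Return (stage name, papers in, papers excluded, papers out) per stage."""
    rows = []
    before = initial
    for stage in stages:
        # A stage can never keep more papers than it was given.
        if not 0 <= stage.remaining <= before:
            raise ValueError(
                f"stage {stage.name!r} cannot keep {stage.remaining} of {before}"
            )
        rows.append((stage.name, before, before - stage.remaining, stage.remaining))
        before = stage.remaining
    return rows


# Counts as reported in the text for the RUP study [27].
rup = funnel(100, [
    Stage("title and abstract", 36),
    Stage("full-text inspection", 5),
    Stage("team review", 2),
])

# Counts as reported in the text for the MDE study [44].
mde = funnel(33, [Stage("application detail", 25)])

for name, papers_in, excluded, papers_out in rup + mde:
    print(f"{name}: {papers_in} in, {excluded} excluded, {papers_out} kept")
```

Recording only the number of papers remaining after each stage means the excluded count is always derived rather than entered by hand, so the two figures cannot drift out of step when a review protocol is reported.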
The systematic review of agile methods [19] describes quite rigorous screening procedures and criteria for quality. They found more primary studies than expected (based on the three nonsystematic reviews that preceded it), although the focus of the primary studies was almost entirely upon XP. The case studies and surveys were predominantly undertaken in industrial settings. They noted that in many cases: “methods were not well described; issues of bias, validity, and reliability were not always addressed; and methods of data collection and analysis were often not explained well.” Given the variation in the subject matter, it was perhaps not surprising that many of the primary studies were concerned with issues such as adoption. Of the four that did comparative studies of productivity, three found improvements from adopting XP, although the authors add a caveat to the effect that “none of these studies had an appropriate recruitment strategy to ensure an unbiased comparison.”
When comparing our study with these, we can see many parallels: relatively small numbers of primary studies, quality issues with the primary studies, and poor reporting standards for observational studies.

6.5 Implications for Future Studies
One purpose for a mapping study such as this is to identify where further studies are needed and to help researchers determine how they might be organized. While the set of empirical studies reviewed in this paper provides a valuable contribution to our understanding of how and when design patterns might best be deployed, they also exhibit some significant limitations.
First, the emphasis upon the use of experiments clearly forms a limitation when studying design activities. Their use imposes a restriction upon the depth (duration) of such studies and, for within-subject forms in particular, also limits the experience of participants (who are so often students). Table 6 emphasizes this quite clearly. Looking at Table 7, we might also question whether measures such as time to complete a task or the number of errors in the code are suitable measures to employ when studying design activities. Recommendation 1: Experiments, particularly those using student participants and short-term tasks, should be used with care for studies of design. Case studies may be more appropriate for exploring the complex cognitive issues involved [64].
A second limitation is the basis for choosing which patterns to study. Only two studies set out to address the question of how a specific pattern was to be employed. There are some obvious issues here that it might be valuable to investigate, for example, whether (say) behavioral forms are more easily learned and used by novices than creational or structural ones, and the influence of scope. Recommendation 2: Studies of design patterns should use research questions that are related to specific patterns and their roles. (As in the example of [20].)
Third, and relating especially to the experience papers, reporting standards are simply too poor for the data to be reliably aggregated with that from more rigorous forms of study. While we have sought to use this experience positively by proposing a set of reporting guidelines [12], in this case it unfortunately provides little that is useful for the study of design patterns. Recommendation 3: Observational studies need to be reported rigorously and ensure that the links between any conclusions and the reported experiences are explicit. (See [12].)
Indeed, noting the potential that a survey offers for a more short-term aggregation of experience, we have undertaken a survey into experiences with patterns for software development, using a set of questions that are based upon the data extraction model that we used for the experience papers [66]. Our survey received 206 usable responses and identified a small number of GoF patterns as being highly valued (e.g., Observer), rather mixed views about another group (such as Visitor), and adverse opinions about another group (particularly Flyweight).
We would therefore suggest that the “gap” between the existing experimental and observational studies might be most effectively spanned by performing a set of case studies with experienced practitioners, spanning development to maintenance, and focusing upon a small number of the more controversial patterns (chiefly Visitor, Singleton, and Facade). Such studies could reveal more insight into the longer term consequences of design decisions regarding the use of patterns and upon their effect during maintenance.

7 CONCLUSIONS
Our mapping study has shown that, despite their prominence in software engineering, design patterns have not been subjected to other than limited empirical evaluation, and that much of this has also only been studying patterns indirectly. However, Table 14 indicates that there is some qualitative support for the value of patterns as an aid to maintenance (when documented as such), and also that patterns do not appear to help novices learn about design. Beyond that, the quality of the available studies generally proved inadequate for us to be able to identify any firm guidelines about when to use (or not to use) particular patterns, and more design-centric evidence is very much needed.
Design patterns can potentially provide the “knowledge schema” role described by Détienne [17]. However, the variety of form and scope that arises means that “blind” application of patterns without any sense of the potential limitations is unwise (as is illustrated by the example of the conclusions of FS2 that “the factory pattern erodes the usability of APIs in which it is used” [20]).
Our study indicates that we are currently far from having the necessary degree of knowledge for making evidence-based judgments about when to employ individual patterns. Hence, we hope that the recommendations will provide a valuable framework for future studies. A key role for a mapping study is to provide a baseline against which further research can be positioned to best effect. As such, the outcomes from this study provide some important pointers as to where this should be focused and how it might be organized.

ACKNOWLEDGMENTS
This work was partly supported by an award from the UK Engineering & Physical Sciences Research Council (EP/E046983/1: Evidence-based Practices Informing Computing (EPIC)). The authors would like to acknowledge the help
received from Professor Barbara Kitchenham and they are very grateful to the anonymous reviewers for their observations and suggestions.

REFERENCES
[1] Guide to the Software Engineering Body of Knowledge, A. Abran, J.W. Moore, P. Bourque, and R. Dupuis, eds. IEEE Computer Society, 2004.
[2] B. Adelson and E. Soloway, “The Role of Domain Experience in Software Design,” IEEE Trans. Software Eng., vol. 11, no. 11, pp. 1351-1360, Nov. 1985.
[3] C. Alexander, S. Ishikawa, M. Silverstein, M. Jacobson, I. Fiksdahl-King, and S. Angel, A Pattern Language. Oxford Univ. Press, 1977.
[4] L. Aversano, L. Cerulo, and M. Di Penta, “Relationship between Design Patterns Defects and Crosscutting Concern Scattering Degree: An Empirical Study,” IET Software, vol. 3, no. 5, pp. 395-409, Oct. 2009.
[5] L. Aversano, G. Canfora, and L. Cerulo, “An Empirical Study on the Evolution of Design Patterns,” Proc. Sixth Joint Meeting of the European Software Eng. Conf. and the ACM SIGSOFT Symp. The Foundations of Software Eng., pp. 385-394, 2007.
[6] L. Aversano, L. Cerulo, and M. Di Penta, “Relating the Evolution of Design Patterns and Crosscutting Concerns,” Proc. Seventh IEEE Int’l Working Conf. Source Code Analysis and Manipulation, pp. 180-192, 2007.
[7] J.M. Bieman, G. Straw, H. Wang, P.W. Munger, and R.T. Alexander, “Design Patterns and Change Proneness: An Examination of Five Evolving Systems,” Proc. Ninth Int’l Software Metrics Symp., pp. 40-49, 2003.
[8] J.M. Bieman, D. Jain, and H.J. Yang, “OO Design Patterns, Design Structure, and Program Changes: An Industrial Case Study,” Proc. IEEE Int’l Conf. Software Maintenance, pp. 580-589, 2001.
[9] O.P. Brereton, B.A. Kitchenham, D. Budgen, and Z. Li, “Using a Protocol Template for Case Study Planning,” Proc. 12th Int’l Conf. Evaluation and Assessment in Software Eng., 2008.
[10] O.P. Brereton, B.A. Kitchenham, D. Budgen, M. Turner, and M.A. Khalil, “Lessons from Applying the Systematic Literature Review Process within the Software Engineering Domain,” J. Systems & Software, vol. 80, no. 4, pp. 571-583, 2007.
[11] D. Budgen, J. Bailey, M. Turner, B. Kitchenham, P. Brereton, and S. Charters, “Cross-Domain Investigation of Empirical Practices,” IET Software, EASE special section, vol. 3, no. 5, pp. 410-421, Oct. 2009.
[12] D. Budgen and C. Zhang, “Preliminary Reporting Guidelines for Experience Papers,” Proc. 13th Int’l Conf. Evaluation and Assessment in Software Eng., pp. 1-10, 2009.
[13] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-Oriented Software Architecture. Wiley, 1996.
[14] A. Chatzigeorgiou, N. Tsantalis, and I. Deligiannis, “An Empirical Study on Students’ Ability to Comprehend Design Patterns,” Computers & Education, vol. 51, pp. 1007-1016, 2008.
[15] E. Chung, J. Hong, M. Prabaker, J. Landay, and A. Liu, “Development and Evaluation of Emerging Design Patterns for Ubiquitous Computing,” Proc. Conf. Designing Interactive Systems, pp. 233-242, 2004.
[16] M.O. Cinnéide and P. Fagan, “Design Patterns: The Devils in the Detail,” Proc. Conf. Pattern Languages of Programs, 2006.
[17] F. Détienne, Software Design: Cognitive Aspects, practitioner series. Springer, 2002.
[18] T. Dybå, B.A. Kitchenham, and M. Jørgensen, “Evidence-Based Software Engineering for Practitioners,” IEEE Software, vol. 22, no. 1, pp. 58-65, Jan./Feb. 2005.
[19] M. Jørgensen, T. Dybå, and T. Dingsøyr, “Empirical Studies of Agile Software Development: A Systematic Review,” Information & Software Technology, vol. 50, pp. 833-859, 2008.
[20] B. Ellis, J. Stylos, and B. Myers, “The Factory Pattern in API Design: A Usability Evaluation,” Proc. 29th Int’l Conf. Software Eng., pp. 302-311, 2007.
[21] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[22] M. Gattrell, S. Counsell, and T. Hall, “Design Patterns and Change Proneness: A Replication Using Proprietary Csharp Software,” Proc. 16th Working Conf. Reverse Eng., pp. 160-164, 2009.
[23] R.L. Glass, V. Ramesh, and I. Vessey, “An Analysis of Research in Computing Disciplines,” Comm. ACM, vol. 47, pp. 89-94, June 2004.
[24] R.L. Glass, I. Vessey, and V. Ramesh, “Research in Software Engineering: An Analysis of the Literature,” Information & Software Technology, vol. 44, pp. 491-506, 2002.
[25] R. Guindon, “Knowledge Exploited by Experts during Software System Design,” Int’l J. Man-Machine Studies, vol. 33, pp. 279-304, 1990.
[26] M. Hahsler, “A Quantitative Study of the Adoption of Design Patterns by Open Source Software Developers,” Free/Open Source Software Development, S. Koch, ed., Idea Group, Inc., 2004.
[27] G.K. Hanssen, F.O. Bjørnson, and H. Westerheim, “Tailoring and Introduction of the Rational Unified Process,” Proc. 14th European Conf. European Systems and Software Process Improvement, pp. 7-18, 2007.
[28] B. Hayes-Roth and F. Hayes-Roth, “A Cognitive Model of Planning,” Cognitive Science, vol. 3, no. 4, pp. 275-310, 1979.
[29] M. Höst and P. Runeson, “Checklists for Software Engineering Case Study Research,” Proc. First Int’l Symp. Empirical Software Eng. and Measurement, pp. 479-481, 2007.
[30] C. Izurieta and J.M. Bieman, “How Software Designs Decay: A Pilot Study of Pattern Evolution,” Proc. First Int’l Symp. Empirical Software Eng. and Measurement, pp. 459-461, 2007.
[31] M. Abdul Jalil and S.A. Mohd Noah, “The Difficulties of Using Design Patterns among Novices: An Exploratory Study,” Proc. Fifth Int’l Conf. Computational Science & Applications, pp. 97-103, 2007.
[32] S. Jeanmart, Y.-G. Guéhéneuc, H. Sahraoui, and N. Habra, “Impact of the Visitor Pattern on Program Comprehension and Maintenance,” Proc. Third Int’l Symp. Empirical Software Eng. & Measurement, pp. 69-78, 2009.
[33] F. Khomh and Y.-G. Guéhéneuc, “Perception and Reality: What Are Design Patterns Good For?” Proc. 11th ECOOP Workshop Quantitative Approaches in Object Oriented Software Eng., p. 7, 2007.
[34] F. Khomh and Y. Guéhéneuc, “Do Design Patterns Impact Software Quality Positively?” Proc. 12th European Conf. Software Maintenance and Reeng., pp. 274-278, 2008.
[35] B.A. Kitchenham, T. Dybå, and M. Jørgensen, “Evidence-Based Software Engineering,” Proc. 26th Int’l Conf. Software Eng., pp. 273-281, 2004.
[36] B. Kitchenham, P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic Literature Reviews in Software Engineering: A Systematic Literature Review,” Information & Software Technology, vol. 51, no. 1, pp. 7-15, 2009.
[37] B. Kitchenham, D. Budgen, P. Brereton, M. Turner, S. Charters, and S. Linkman, “Large-Scale Software Engineering Questions-Expert Opinion or Empirical Evidence?” IET Software, vol. 1, no. 5, pp. 161-171, Oct. 2007.
[38] B. Kitchenham and S. Charters, “Guidelines for Performing Systematic Literature Review in Software Engineering,” Technical Report EBSE 2007-001, Keele Univ. and Durham Univ. Joint Report, 2007.
[39] B. Kitchenham, R. Pretorius, D. Budgen, P. Brereton, M. Turner, M. Niazi, and S. Linkman, “Systematic Literature Reviews in Software Engineering: A Tertiary Study,” Information & Software Technology, vol. 52, pp. 792-805, 2010.
[40] B.A. Kitchenham, D. Budgen, and O. Pearl Brereton, “Using Mapping Studies as the Basis for Further Research: A Participant-Observer Case Study,” Information & Software Technology, special section from EASE, vol. 53, no. 4, pp. 638-651, 2011.
[41] C. Kohls and K. Scheiter, “The Relation between Design Patterns and Schema Theory,” Proc. 15th Conf. Pattern Languages of Programs, pp. 1-14, 2008.
[42] R. Murray Lindsay and A.S.C. Ehrenberg, “The Design of Replicated Studies,” The Am. Statistician, vol. 47, no. 3, pp. 217-228, 1993.
[43] G. Masuda, N. Sakamoto, and K. Ushijima, “Applying Design Patterns to Decision Tree Learning System,” Proc. Sixth ACM SIGSOFT Int’l Symp. Foundations of Software Eng., pp. 111-120, 1998.
[44] P. Mohagheghi and V. Dehlen, “Where Is the Proof? A Review of Experiences from Applying MDE in Industry,” Proc. Fourth European Conf. Model Driven Architecture: Foundations and Applications, pp. 432-443, 2008.
[45] T.H. Ng, S.C. Cheung, W.K. Chan, and Y.T. Yu, “Toward Effective Deployment of Design Patterns for Software Extension: A Case Study,” Proc. Int’l Workshop Software Quality, pp. 51-56, 2006.
[46] T.H. Ng, S.C. Cheung, W.K. Chan, and Y.T. Yu, “Do Maintainers Utilize Deployed Design Patterns Effectively?” Proc. 29th Int’l Conf. Software Eng., pp. 168-177, 2007.
[47] M. Di Penta, L. Cerulo, Y.-G. Guéhéneuc, and G. Antoniol, “An Empirical Study of the Relationships between Design Patterns Roles and Class Change Proneness,” Proc. IEEE Int’l Conf. Software Maintenance, pp. 217-226, 2008.
[48] M. Petticrew and H. Roberts, Systematic Review in the Social Sciences: A Practical Guide. Blackwell Publishing, 2006.
[49] L. Prechelt and B. Unger, “A Series of Controlled Experiments on Design Patterns: Methodology and Results,” Proc. Softwaretechnik ’98, 1998.
[50] L. Prechelt, B. Unger, and M. Philippsen, “Documenting Design Patterns in Code Eases Program Maintenance,” Proc. ICSE Workshop Process Modelling and Empirical Studies of Software Evolution, pp. 72-76, 1997.
[51] L. Prechelt, B. Unger, W.F. Tichy, P. Brossler, and L.G. Votta, “A Controlled Experiment in Maintenance Comparing Design Patterns to Simpler Solutions,” IEEE Trans. Software Eng., vol. 27, no. 12, pp. 1133-1144, Dec. 2001.
[52] L. Prechelt, B. Unger-Lamprecht, M. Philippsen, and W.F. Tichy, “Two Controlled Experiments Assessing the Usefulness of Design Pattern Documentation in Program Maintenance,” IEEE Trans. Software Eng., vol. 28, no. 6, pp. 595-606, June 2002.
[53] R. Pretorius and D. Budgen, “A Mapping Study on Empirical Evidence Related to the Models and Forms Used in the UML,” Proc. ACM/IEEE Second Int’l Symp. Empirical Software Eng. and Measurement, pp. 342-344, 2008.
[54] H.J. Rittel and M.M. Webber, “Planning Problems Are Wicked Problems,” Developments in Design Methodology, N. Cross, ed., pp. 135-144, Wiley, 1984.
[55] D. Schmidt, “Using Design Patterns to Develop Reusable Object-Oriented Communication Software,” Comm. ACM, vol. 38, no. 10, pp. 65-74, 1995.
[56] I. Sommerville, Software Engineering, eighth ed. Addison-Wesley, 2007.
[57] M. Torchiano, “Documenting Pattern Use in Java Programs,” Proc. Int’l Conf. Software Maintenance, pp. 230-233, 2002.
[58] B. Unger and W. Tichy, “Do Design Patterns Improve Communication? An Experiment with Pair Design,” Proc. Int’l Workshop Empirical Studies of Software Maintenance, pp. 1-5, 2000.
[59] M. Vokáč, W.F. Tichy, D.I.K. Sjøberg, E. Arisholm, and M. Aldrin, “A Controlled Experiment Comparing the Maintainability of Programs Designed with and without Design Patterns: A Replication in a Real Programming Environment,” Empirical Software Eng., vol. 9, pp. 149-195, 2004.
[60] P. Wendorff, “Assessment of Design Patterns during Software Reengineering: Lessons Learned from a Large Commercial Project,” Proc. Fifth European Conf. Software Maintenance and Reeng., pp. 77-84, 2001.
[61] K.N. Whitley, “Visual Programming Languages and the Empirical Evidence For and Against,” J. Visual Languages and Computing, vol. 8, pp. 109-142, 1997.
[62] L. Williams and A. Cockburn, “Agile Software Development: It’s about Feedback and Change,” Computer, vol. 36, no. 4, pp. 39-43, Apr. 2003.
[63] B. Wydaeghe, K. Verschaeve, B. Michiels, B. Van Damme, E. Archens, and V. Jonckers, “Building an OMT-Editor Using Design Patterns: An Experience Report,” Proc. Conf. Technology of Object-Oriented Languages, 1998.
[64] R. Yin, Case Study Research: Design & Methods, third ed. Sage Books, 2003.
[65] W. Yuanhong, M. Hong, and S. Weizhong, “Experience Report: Using Design Patterns in the Development of JB System,” Proc. Technology of Object Oriented Languages and Systems, pp. 159-165, 1997.
[66] C. Zhang and D. Budgen, “A Survey of Experienced User Perceptions about Design Patterns,” submitted for publication, 2011.

Cheng Zhang received the MSc and PhD degrees from the University of Durham in 2006 and 2011, respectively. His research interests include the study of how software design patterns are used, empirical software engineering, and the use of evidence-based software engineering techniques.

David Budgen received the BSc and PhD degrees in theoretical physics from the University of Durham in 1969 and 1973, respectively. He is a professor of software engineering in the School of Engineering & Computing Sciences at the University of Durham, United Kingdom. His current research interests include software design, design of service-based systems, software development environments, and empirical software engineering, with a particular emphasis upon evidence-based software engineering. He is the author of Software Design (second edition, Pearson Addison Wesley) and of more than 100 refereed publications on software engineering topics. He is a member of the IEEE Computer Society, ACM, and IET.
