Black and William Assessment
Phi Delta Kappa International is collaborating with JSTOR to digitize, preserve and extend access to The Phi
Delta Kappan.
Inside the Black Box
By Paul Black and Dylan Wiliam
OCTOBER 1998 139
ning and management; and more frequent and thorough inspection are all means toward the same end. But the sum of all these reforms has not added up to an effective policy because something is missing.

Learning is driven by what teachers and pupils do in classrooms. Teachers have to manage complicated and demanding situations, channeling the personal, emotional, and social pressures of a group of 30 or more youngsters in order to help them learn immediately and become better learners in the future. Standards can be raised only if teachers can tackle this task more effectively. What is missing from the efforts alluded to above is any direct help with this task. This fact was recognized in the TIMSS video study: "A focus on standards and accountability that ignores the processes of teaching and learning in classrooms will not provide the direction that teachers need in their quest to improve."¹

In terms of systems engineering, present policies in the U.S. and in many other countries seem to treat the classroom as a black box. Certain inputs from the outside - pupils, teachers, other resources, management rules and requirements, parental anxieties, standards, tests with high stakes, and so on - are fed into the box. Some outputs are supposed to follow: pupils who are more knowledgeable and competent, better test results, teachers who are reasonably satisfied, and so on. But what is happening inside the box? How can anyone be sure that a particular set of new inputs will produce better outputs if we don't at least study what happens inside? And why is it that most of the reform initiatives mentioned in the first paragraph are not aimed at giving direct help and support to the work of teachers in classrooms?

The answer usually given is that it is up to teachers: they have to make the inside work better. This answer is not good enough, for two reasons. First, it is at least possible that some changes in the inputs may be counterproductive and make it harder for teachers to raise standards. Second, it seems strange, even unfair, to leave the most difficult piece of the standards-raising puzzle entirely to teachers. If there are ways in which policy makers and others can give direct help and support to the everyday classroom task of achieving better learning, then surely these ways ought to be pursued vigorously.

This article is about the inside of the black box. We focus on one aspect of teaching: formative assessment. But we will show that this feature is at the heart of effective teaching.

The Argument

We start from the self-evident proposition that teaching and learning must be interactive. Teachers need to know about their pupils' progress and difficulties with learning so that they can adapt their own work to meet pupils' needs - needs that are often unpredictable and that vary from one pupil to another. Teachers can find out what they need to know in a variety of ways, including observation and discussion in the classroom and the reading of pupils' written work.

We use the general term assessment to refer to all those activities undertaken by teachers - and by their students in assessing themselves - that provide information to be used as feedback to modify teaching and learning activities. Such assessment becomes formative assessment when the evidence is actually used to adapt the teaching to meet student needs.²

There is nothing new about any of this. All teachers make assessments in every class they teach. But there are three important questions about this process that we seek to answer:
* Is there evidence that improving formative assessment raises standards?
* Is there evidence that there is room for improvement?
* Is there evidence about how to improve formative assessment?

In setting out to answer these questions, we have conducted an extensive survey of the research literature. We have checked through many books and through the past nine years' worth of issues of more than 160 journals, and we have studied earlier reviews of research. This process yielded about 580 articles or chapters to study. We prepared a lengthy review, using material from 250 of these sources, that has been published in a special issue of the journal Assessment in Education, together with comments on our work by leading educational experts from Australia, Switzerland, Hong Kong, Lesotho, and the U.S.³

The conclusion we have reached from our research review is that the answer to each of the three questions above is clearly yes. In the three main sections below, we outline the nature and force of the evidence that justifies this conclusion. However, because we are presenting a summary here, our text will appear strong on assertions and weak on the details of their justification. We maintain that these assertions are backed by evidence and that this backing is set out in full detail in the lengthy review on which this article is founded.

We believe that the three sections below establish a strong case that governments, their agencies, school authorities, and the teaching profession should study very carefully whether they are seriously interested in raising standards in education. However, we also acknowledge widespread evidence that fundamental change in education can be achieved only slowly - through programs of professional development that build on existing good practice. Thus we do not conclude that formative assessment is yet another "magic bullet" for education. The issues involved are too complex and too closely linked to both the difficulties of classroom practice and the beliefs that drive public policy. In a final section, we confront this complexity and try to sketch out a strategy for acting on our evidence.

Does Improving Formative Assessment Raise Standards?

A research review published in 1986, concentrating primarily on classroom assessment work for children with mild handicaps, surveyed a large number of innovations, from which 23 were selected.⁴ Those chosen satisfied the condition that quantitative evidence of learning gains was obtained, both for those involved in the innovation and for a similar group not so involved. Since then, many more papers have been published describing similarly careful quantitative experiments. Our own review has selected at least 20 more studies. (The number depends on how rigorous a set of selection criteria are applied.) All these studies show that innovations that include strengthening the practice of formative assessment produce significant and often substantial learning gains. These studies range over age groups from 5-year-olds to university undergraduates, across several school subjects, and over several countries.

For research purposes, learning gains of this type are measured by comparing the average improvements in the test scores of pupils involved in an innovation with the range of scores that are found for typical groups of pupils on these same tests.
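
The comparison just described can be made concrete with a small calculation. The sketch below is an illustration only, not the procedure used in any of the studies reviewed. It assumes that the "range of scores" for typical pupils is summarized by a sample standard deviation, the function name `standardized_gain` is hypothetical, and every score is invented for the example.

```python
from statistics import mean, stdev


def standardized_gain(innovation_gains, comparison_gains, typical_scores):
    """Extra average improvement, in units of the typical spread of scores.

    Assumption (not from the article): the "range of scores" found for
    typical pupils is summarized by the sample standard deviation.
    """
    extra_improvement = mean(innovation_gains) - mean(comparison_gains)
    spread = stdev(typical_scores)
    if spread == 0:
        raise ValueError("typical scores show no spread; the ratio is undefined")
    return extra_improvement / spread


# Invented numbers, for illustration only.
innovation_gains = [12, 15, 9, 14, 10]  # score gains for pupils in the innovation
comparison_gains = [8, 9, 7, 10, 6]     # score gains for a similar group not involved
typical_scores = [40, 50, 60, 45, 55]   # scores of a typical group on the same test

ratio = standardized_gain(innovation_gains, comparison_gains, typical_scores)
print(round(ratio, 2))  # prints 0.51
```

With these invented figures, pupils in the innovation improve by about half of the typical spread of scores more than the comparison group does. Dividing by the spread is what allows results from different tests and age groups to be set side by side.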
quality in relation to learning.

The second issue is negative impact.
* The giving of marks and the grading function are overemphasized, while the giving of useful advice and the learning function are underemphasized.
* Approaches are used in which pupils are compared with one another, the prime purpose of which seems to them to be competition rather than personal improvement; in consequence, assessment feedback teaches low-achieving pupils that they lack "ability," causing them to come to believe that they are not able to learn.

The third issue is the managerial role of assessments.
* Teachers' feedback to pupils seems to serve social and managerial functions, often at the expense of the learning function.
* Teachers are often able to predict pupils' results on external tests because their own tests imitate them, but at the same time teachers know too little about their pupils' learning needs.
* The collection of marks to fill in records is given higher priority than the analysis of pupils' work to discern learning needs; furthermore, some teachers pay no attention to the assessment records of their pupils' previous teachers.

Of course, not all these descriptions apply to all classrooms. Indeed, there are many schools and classrooms to which they do not apply at all. Nevertheless, these general conclusions have been drawn by researchers who have collected evidence - through observation, interviews, and questionnaires - from schools in several countries, including the U.S.

An empty commitment. The development of national assessment policy in England and Wales over the last decade illustrates the obstacles that stand in the way of developing policy support for formative assessment. The recommendations of a government task force in 1988¹¹ and all subsequent statements of government policy have emphasized the importance of formative assessment by teachers. However, the body charged with carrying out government policy on assessment had no strategy either to study or to develop the formative assessment of teachers and did no more than devote a tiny fraction of its resources to such work.¹² Most of the available resources and most of the public and political attention were focused on national external tests. While teachers' contributions to these "summative assessments" have been given some formal status, hardly any attention has been paid to their contributions through formative assessment. Moreover, the problems of the relationship between teachers' formative and summative roles have received no attention.

It is possible that many of the commitments were stated in the belief that formative assessment was not problematic, that it already happened all the time and needed no more than formal acknowledgment of its existence. However, it is also clear that the political commitment to external testing in order to promote competition had a central priority, while the commitment to formative assessment was marginal. As researchers the world over have found, high-stakes external tests always dominate teaching and assessment. However, they give teachers poor models for formative assessment because of their limited function of providing overall summaries of achievement rather than helpful diagnosis. Given this fact, it is hardly surprising that numerous research studies of the implementation of the education reforms in the United Kingdom have found that formative assessment is "seriously in need of development."¹³ With hindsight, we can see that the failure to perceive the need for substantial support for formative assessment and to take responsibility for developing such support was a serious error.

In the U.S. similar pressures have been felt from political movements characterized by a distrust of teachers and a belief that external testing will, on its own, improve learning. Such fractured relationships between policy makers and the teaching profession are not inevitable - indeed, many countries with enviable educational achievements seem to manage well with policies that show greater respect and support for teachers. While the situation in the U.S. is far more diverse than that in England and Wales, the effects of high-stakes state-mandated testing are very similar to those of the external tests in the United Kingdom. Moreover, the traditional reliance on multiple-choice testing in the U.S. - not shared in the United Kingdom - has exacerbated the negative effects of such policies on the quality of classroom learning.

How Can We Improve Formative Assessment?

The self-esteem of pupils. A report of schools in Switzerland states that "a number of pupils ... are content to 'get by.' . . . Every teacher who wants to practice formative assessment must reconstruct the teaching contracts so as to counteract the habits acquired by his pupils."¹⁴

The ultimate user of assessment information that is elicited in order to improve learning is the pupil. There are negative and positive aspects of this fact. The negative aspect is illustrated by the preceding quotation. When the classroom culture focuses on rewards, "gold stars," grades, or class ranking, then pupils look for ways to obtain the best marks rather than to improve their learning. One reported consequence is that, when they have any choice, pupils avoid difficult tasks. They also spend time and energy looking for clues to the "right answer." Indeed, many become reluctant to ask questions out of a fear of failure. Pupils who encounter difficulties are led to believe that they lack ability, and this belief leads them to attribute their difficulties to a defect in themselves about which they cannot do a great deal. Thus they avoid investing effort in learning that can lead only to disappointment, and they try to build up their self-esteem in other ways.

The positive aspect of students' being the primary users of the information gleaned from formative assessments is that negative outcomes - such as an obsessive focus on competition and the attendant fear of failure on the part of low achievers - are not inevitable. What is needed is a culture of success, backed by a belief that all pupils can achieve. In this regard, formative assessment can be a powerful weapon if it is communicated in the right way. While formative assessment can help all
of questions and by accepting answers from a few, can keep the lesson going but is actually out of touch with the understanding of most of the class. The question/answer dialogue becomes a ritual, one in which thoughtful involvement suffers.

There are several ways to break this particular cycle. They involve giving pupils time to respond; asking them to discuss their thinking in pairs or in small groups, so that a respondent is speaking on behalf of others; giving pupils a choice between different possible answers and asking them to vote on the options; asking all of them to write down an answer and then reading out a selected few; and so on. What is essential is that any dialogue should evoke thoughtful reflection in which all pupils can be encouraged to take part, for only then can the formative process start to work. In short, the dialogue between pupils and a teacher should be thoughtful, reflective, focused to evoke and explore understanding, and conducted so that all pupils have an opportunity to think and to express their ideas.

Tests given in class and tests and other exercises assigned for homework are also important means of promoting feedback. A good test can be an occasion for learning. It is better to have frequent short tests than infrequent long ones. Any new learning should first be tested within about a week of a first encounter, but more frequent tests are counterproductive. The quality of the test items - that is, their relevance to the main learning aims and their clear communication to the pupil - requires scrutiny as well. Good questions are hard to generate, and teachers should collaborate and draw on outside sources to collect such questions.

Given questions of good quality, it is essential to ensure the quality of the feedback. Research studies have shown that, if pupils are given only marks or grades, they do not benefit from the feedback. The worst scenario is one in which some pupils who get low marks this time also got low marks last time and come to expect to get low marks next time. This cycle of repeated failure becomes part of a shared belief between such students and their teacher. Feedback has been shown to improve learning when it gives each pupil specific guidance on strengths and weaknesses, preferably without any overall marks. Thus the way in which test results are reported to pupils so that they can identify their own strengths and weaknesses is critical. Pupils must be given the means and opportunities to work with evidence of their difficulties. For formative purposes, a test at the end of a unit or teaching module is pointless; it is too late to work with the results. We conclude that the feedback on tests, seatwork, and homework should give each pupil guidance on how to improve, and each pupil must be given help and an opportunity to work on the improvement.

[Cartoon. Surviving caption text: "It has been said that a fool can ask more questions than a wise man can answer"]

All these points make clear that there is no one simple way to improve formative assessment. What is common to them is that a teacher's approach should start by being realistic and confronting the question "Do I really know enough about the understanding of my pupils to be able to help each of them?"

Much of the work teachers must do to make good use of formative assessment can give rise to difficulties. Some pupils will resist attempts to change accustomed routines, for any such change is uncomfortable, and emphasis on the challenge to think for yourself (and not just to work harder) can be threatening to many. Pupils cannot be expected to believe in the value of changes for their learning before they have experienced the benefits of such changes. Moreover, many of the initiatives that are needed take more class time, particularly when a central purpose is to change the outlook on learning and the working methods of pupils. Thus teachers have to take risks in the belief that such investment of time will yield rewards in the future, while "delivery" and "coverage" with poor understanding are pointless and can even be harmful.

Teachers must deal with two basic issues that are the source of many of the problems associated with changing to a system of formative assessment. The first is the nature of each teacher's beliefs about learning. If the teacher assumes that knowledge is to be transmitted and learned, that understanding will develop later, and that clarity of exposition accompanied by rewards for patient reception are the essentials of good teaching, then formative assessment is hardly necessary. However, most teachers accept the wealth of evidence that this transmission model does not work, even when judged by its own criteria, and so are willing to make a commitment to teaching through interaction. Formative assessment is an essential component of such instruction. We do not mean to imply that individualized, one-on-one teaching is the only solution; rather we mean that what is needed is a classroom culture of questioning and deep thinking, in which pupils learn from shared discussions with teachers and peers. What emerges very clearly here is the indivisibility of instruction and formative assessment practices.

The other issue that can create problems for teachers who wish to adopt an interactive model of teaching and learning relates to the beliefs teachers hold about the potential of all their pupils for learning. To sharpen the contrast by overstating it, there is on the one hand the "fixed I.Q." view - a belief that each pupil has a fixed, inherited intelligence that cannot be altered much by schooling. On the other hand, there is the "untapped potential" view - a belief that starts from the assumption that so-called ability is a complex of skills that can be learned. Here, we argue for the underlying belief that all pupils can learn more effectively if one can clear away, by sensitive handling, the obstacles to learning, be they cognitive failures never diagnosed or damage to personal confidence or a combination of the two. Clearly the truth lies between these two extremes, but the evidence is that ways of managing formative assessment that work with the assumptions of "untapped potential" do help all pupils to learn and can give particular help to those who have previously struggled.

Policy and Practice

Changing the policy perspective. The assumptions that drive national and state policies for assessment have to be called into question. The promotion of testing as an important component for establishing a competitive market in education can be very harmful. The more recent shifting of emphasis toward setting targets for all, with assessment providing a touchstone to help check pupils' attainments, is a more mature position. However, we would argue that there is a need now to move further, to focus on the inside of the "black box" and so to explore the potential of assessment to raise standards directly as an integral part of each pupil's learning work.

It follows from this view that several changes are needed. First, policy ought to start with a recognition that the prime locus for raising standards is the classroom, so that the overarching priority has to be the promotion and support of change within the classroom. Attempts to raise standards by reforming the inputs to and measuring the outputs from the black box of the classroom can be helpful, but they are not adequate on their own. Indeed, their helpfulness can be judged only in light of their effects in classrooms.

The evidence we have presented here establishes that a clearly productive way to start implementing a classroom-focused policy would be to improve formative assessment. This same evidence also establishes that in doing so we would not be concentrating on some minor aspect of the business of teaching and learning. Rather, we would be concentrating on several essential elements: the quality of teacher/pupil interactions, the stimulus and help for pupils to take active responsibility for their own learning, the particular help needed to move pupils out of the trap of "low achievement," and the development of the habits necessary for all students to become lifelong learners. Improvements in formative assessment, which are within the reach of all teachers, can contribute substantially to raising standards in all these ways.

Four steps to implementation. If we accept the argument outlined above, what needs to be done? The proposals outlined below do not follow directly from our analysis of assessment research. They are consistent with its main findings, but they also call on more general sources for guidance.¹⁶

At one extreme, one might call for more research to find out how best to carry out such work; at the other, one might call for an immediate and large-scale program, with new guidelines that all teachers should put into practice. Neither of these alternatives is sensible: while the first is unnecessary because enough is known from the results of research, the second would be unjustified because not enough is known about classroom practicalities in the context of any one country's schools.

Thus the improvement of formative assessment cannot be a simple matter. There is no quick fix that can alter existing practice by promising rapid rewards. On the contrary, if the substantial rewards promised by the research evidence are to be secured, each teacher must find his or her own ways of incorporating the lessons and ideas set out above into his or her own patterns of classroom work and into the cultural norms and expectations of a particular school community.¹⁷ This process is a relatively slow one and takes place through sustained programs of professional development and support. This fact does not weaken the message here; indeed, it should be seen as a sign of its authenticity, for lasting and fundamental improvements in teaching and learning must take place in this way. A recent international study of innovation and change in education, encompassing 23 projects in 13 member countries of the Organisation for Economic Co-operation and Development, has arrived at exactly the same conclusion with regard to effective policies for change.¹⁸ Such arguments lead us to propose a four-point scheme for teacher development.

1. Learning from development. Teachers will not take up ideas that sound attractive, no matter how extensive the research base, if the ideas are presented as general principles that leave the task of translating them into everyday practice entirely up to the teachers. Their classroom lives are too busy and too fragile for all but an outstanding few to undertake such work. What teachers need is a variety of living examples of implementation, as practiced by teachers with whom they can identify and from whom they can derive the confidence that they can do better. They need to see examples of what doing better means in practice.

So changing teachers' practice cannot begin with an extensive program of training for all; that could be justified only if it could be claimed that we have enough "trainers" who know what to do, which is certainly not the case. The essential first step is to set up a small number of local groups of schools - some primary, some secondary, some inner-city, some from outer suburbs, some rural - with each school committed both to a school-based development of formative assessment and to collaboration with other schools in its local group. In such a process, the teachers in their classrooms will be working out the answers to many of the practical questions that the evidence presented here cannot answer. They will be reformulating the issues, perhaps in relation to fundamental insights and certainly in terms that make sense to their peers in other class
incompetent in tackling a problem under test conditions can look quite different in the more realistic conditions of an everyday encounter with an equivalent problem. Indeed, the conditions under which formal tests are taken threaten validity because they are quite unlike those of everyday performance. An outstanding example here is that collaborative work is very important in everyday life but is forbidden by current norms of formal testing.²¹ These points open up wider arguments about assessment systems as a whole - arguments that are beyond the scope of this article.

4. Research. It is not difficult to set out a list of questions that would justify further research in this area. Although there are many and varied reports of successful innovations, they generally fail to give clear accounts of one or another of the important details. For example, they are often silent about the actual classroom methods used, the motivation and experience of the teachers, the nature of the tests used as measures of success, or the outlooks and expectations of the pupils involved.

However, while there is ample justification for proceeding with carefully formulated projects, we do not suggest that everyone else should wait for their con

such an effort, although success would clearly depend on cooperation among government agencies, academic researchers, and school-based educators.

The main plank of our argument is that standards can be raised only by changes that are put into direct effect by teachers and pupils in classrooms. There is a body of firm evidence that formative assessment is an essential component of classroom work and that its development can raise standards of achievement. We know of no other way of raising standards for which such a strong prima facie case can be made. Our plea is that national and state policy makers will grasp this opportunity and take the lead in this direction.

1. James W. Stigler and James Hiebert, "Understanding and Improving Classroom Mathematics Instruction: An Overview of the TIMSS Video Study," Phi Delta Kappan, September 1997, pp. 19-20.
2. There is no internationally agreed-upon term here. "Classroom evaluation," "classroom assessment," "internal assessment," "instructional assessment," and "student assessment" have been used by different authors, and some of these terms have different meanings in different texts.
3. Paul Black and Dylan Wiliam, "Assessment and Classroom Learning," Assessment in Education, March 1998, pp. 7-74.
4. Lynn S. Fuchs and Douglas Fuchs, "Effects of ...

... National Curriculum Science Policy for the 5-14 Age Range: Findings and Interpretations from a National Evaluation Study in England," International Journal of Science Education, vol. 17, 1995, pp. 481-92.
14. Philippe Perrenoud, "Towards a Pragmatic Approach to Formative Evaluation," in Penelope Weston, ed., Assessment of Pupils' Achievement: Motivation and School Success (Amsterdam: Swets and Zeitlinger, 1991), p. 92.
15. D. Royce Sadler, "Formative Assessment and the Design of Instructional Systems," Instructional Science, vol. 18, 1989, pp. 119-44.
16. Paul J. Black and J. Myron Atkin, Changing the Subject: Innovations in Science, Mathematics, and Technology Education (London: Routledge for the Organisation for Economic Co-operation and Development, 1996); and Michael G. Fullan, with Suzanne Stiegelbauer, The New Meaning of Educational Change (London: Cassell, 1991).
17. See Stigler and Hiebert, pp. 19-20.
18. Black and Atkin, op. cit.
19. Peter Johnston et al., "Assessment of Teaching and Learning in Literature-Based Classrooms," Teaching and Teacher Education, vol. 11, 1995, p. 359.
20. Dylan Wiliam and Paul Black, "Meanings and Consequences: A Basis for Distinguishing Formative and Summative Functions of Assessment," British Educational Research Journal, vol. 22, 1996, pp. 537-48.
21. These points are developed in some detail in Sam Wineburg, "T. S. Eliot, Collaboration, and the Quandaries of Assessment in a Rapidly Changing World," Phi Delta Kappan, September 1997, pp. 59-65.