Inside the Black Box: Raising Standards through Classroom Assessment

Author(s): Paul Black and Dylan Wiliam

Source: The Phi Delta Kappan, Vol. 80, No. 2 (Oct., 1998), pp. 139-144, 146-148
Published by: Phi Delta Kappa International

Firm evidence shows that formative assessment is an essential component of classroom work and that its development can raise standards of achievement, Mr. Black and Mr. Wiliam point out. Indeed, they know of no other way of raising standards for which such a strong prima facie case can be made.
PAUL BLACK is professor emeritus in the School of Education, King's College, London, where DYLAN WILIAM is head of school and professor of educational assessment.

Raising the standards of learning that are achieved through schooling is an important national priority. In recent years, governments throughout the world have been more and more vigorous in making changes in pursuit of this aim. National, state, and district standards; target setting; enhanced programs for the external testing of students' performance; surveys such as NAEP (National Assessment of Educational Progress) and TIMSS (Third International Mathematics and Science Study); initiatives to improve school planning and management; and more frequent and thorough inspection are all means toward the same end. But the sum of all these reforms has not added up to an effective policy because something is missing.

Learning is driven by what teachers and pupils do in classrooms. Teachers have to manage complicated and demanding situations, channeling the personal, emotional, and social pressures of a group of 30 or more youngsters in order to help them learn immediately and become better learners in the future. Standards can be raised only if teachers can tackle this task more effectively. What is missing from the efforts alluded to above is any direct help with this task. This fact was recognized in the TIMSS video study: "A focus on standards and accountability that ignores the processes of teaching and learning in classrooms will not provide the direction that teachers need in their quest to improve."¹

In terms of systems engineering, present policies in the U.S. and in many other countries seem to treat the classroom as a black box. Certain inputs from the outside - pupils, teachers, other resources, management rules and requirements, parental anxieties, standards, tests with high stakes, and so on - are fed into the box. Some outputs are supposed to follow: pupils who are more knowledgeable and competent, better test results, teachers who are reasonably satisfied, and so on. But what is happening inside the box? How can anyone be sure that a particular set of new inputs will produce better outputs if we don't at least study what happens inside? And why is it that most of the reform initiatives mentioned in the first paragraph are not aimed at giving direct help and support to the work of teachers in classrooms?

The answer usually given is that it is up to teachers: they have to make the inside work better. This answer is not good enough, for two reasons. First, it is at least possible that some changes in the inputs may be counterproductive and make it harder for teachers to raise standards. Second, it seems strange, even unfair, to leave the most difficult piece of the standards-raising puzzle entirely to teachers. If there are ways in which policy makers and others can give direct help and support to the everyday classroom task of achieving better learning, then surely these ways ought to be pursued vigorously.

This article is about the inside of the black box. We focus on one aspect of teaching: formative assessment. But we will show that this feature is at the heart of effective teaching.

The Argument

We start from the self-evident proposition that teaching and learning must be interactive. Teachers need to know about their pupils' progress and difficulties with learning so that they can adapt their own work to meet pupils' needs - needs that are often unpredictable and that vary from one pupil to another. Teachers can find out what they need to know in a variety of ways, including observation and discussion in the classroom and the reading of pupils' written work.

We use the general term assessment to refer to all those activities undertaken by teachers - and by their students in assessing themselves - that provide information to be used as feedback to modify teaching and learning activities. Such assessment becomes formative assessment when the evidence is actually used to adapt the teaching to meet student needs.²

There is nothing new about any of this. All teachers make assessments in every class they teach. But there are three important questions about this process that we seek to answer:
* Is there evidence that improving formative assessment raises standards?
* Is there evidence that there is room for improvement?
* Is there evidence about how to improve formative assessment?

In setting out to answer these questions, we have conducted an extensive survey of the research literature. We have checked through many books and through the past nine years' worth of issues of more than 160 journals, and we have studied earlier reviews of research. This process yielded about 580 articles or chapters to study. We prepared a lengthy review, using material from 250 of these sources, that has been published in a special issue of the journal Assessment in Education, together with comments on our work by leading educational experts from Australia, Switzerland, Hong Kong, Lesotho, and the U.S.³

The conclusion we have reached from our research review is that the answer to each of the three questions above is clearly yes. In the three main sections below, we outline the nature and force of the evidence that justifies this conclusion. However, because we are presenting a summary here, our text will appear strong on assertions and weak on the details of their justification. We maintain that these assertions are backed by evidence and that this backing is set out in full detail in the lengthy review on which this article is founded.

We believe that the three sections below establish a strong case that governments, their agencies, school authorities, and the teaching profession should study very carefully whether they are seriously interested in raising standards in education. However, we also acknowledge widespread evidence that fundamental change in education can be achieved only slowly - through programs of professional development that build on existing good practice. Thus we do not conclude that formative assessment is yet another "magic bullet" for education. The issues involved are too complex and too closely linked to both the difficulties of classroom practice and the beliefs that drive public policy. In a final section, we confront this complexity and try to sketch out a strategy for acting on our evidence.

Does Improving Formative Assessment Raise Standards?

A research review published in 1986, concentrating primarily on classroom assessment work for children with mild handicaps, surveyed a large number of innovations, from which 23 were selected.⁴ Those chosen satisfied the condition that quantitative evidence of learning gains was obtained, both for those involved in the innovation and for a similar group not so involved. Since then, many more papers have been published describing similarly careful quantitative experiments. Our own review has selected at least 20 more studies. (The number depends on how rigorous a set of selection criteria are applied.) All these studies show that innovations that include strengthening the practice of formative assessment produce significant and often substantial learning gains. These studies range over age groups from 5-year-olds to university undergraduates, across several school subjects, and over several countries.

For research purposes, learning gains of this type are measured by comparing the average improvements in the test scores of pupils involved in an innovation with the range of scores that are found for typical groups of pupils on these same tests.
The ratio of the former divided by the latter is known as the effect size. Typical effect sizes of the formative assessment experiments were between 0.4 and 0.7. These effect sizes are larger than most of those found for educational interventions. The following examples illustrate some practical consequences of such large gains.
* An effect size of 0.4 would mean that the average pupil involved in an innovation would record the same achievement as a pupil in the top 35% of those not so involved.
* An effect size gain of 0.7 in the recent international comparative studies in mathematics⁵ would have raised the score of a nation in the middle of the pack of 41 countries (e.g., the U.S.) to one of the top five.

Many of these studies arrive at another important conclusion: that improved formative assessment helps low achievers more than other students and so reduces the range of achievement while raising achievement overall. A notable recent example is a study devoted entirely to low-achieving students and students with learning disabilities, which shows that frequent assessment feedback helps both groups enhance their learning.⁶ Any gains for such pupils could be particularly important. Pupils who come to see themselves as unable to learn usually cease to take school seriously. Many become disruptive; others resort to truancy. Such young people are likely to be alienated from society and to become the sources and the victims of serious social problems.

Thus it seems clear that very significant learning gains lie within our grasp. The fact that such gains have been achieved by a variety of methods that have, as a common feature, enhanced formative assessment suggests that this feature accounts, at least in part, for the successes. However, it does not follow that it would be an easy matter to achieve such gains on a wide scale in normal classrooms. Many of the reports we have studied raise a number of other issues.
* All such work involves new ways to enhance feedback between those taught and the teacher, ways that will require significant changes in classroom practice.
* Underlying the various approaches are assumptions about what makes for effective learning - in particular the assumption that students have to be actively involved.
* For assessment to function formatively, the results have to be used to adjust teaching and learning; thus a significant aspect of any program will be the ways in which teachers make these adjustments.
* The ways in which assessment can affect the motivation and self-esteem of pupils and the benefits of engaging pupils in self-assessment deserve careful attention.

Is There Room for Improvement?

A poverty of practice. There is a wealth of research evidence that the everyday practice of assessment in classrooms is beset with problems and shortcomings, as the following selected quotations indicate.
* "Marking is usually conscientious but often fails to offer guidance on how work can be improved. In a significant minority of cases, marking reinforces underachievement and underexpectation by being too generous or unfocused. Information about pupil performance received by the teacher is insufficiently used to inform subsequent work," according to a United Kingdom inspection report on secondary schools.⁷
* "Why is the extent and nature of formative assessment in science so impoverished?" asked a research study on secondary science teachers in the United Kingdom.⁸
* "Indeed they pay lip service to [formative assessment] but consider that its practice is unrealistic in the present educational context," reported a study of Canadian secondary teachers.⁹
* "The assessment practices outlined above are not common, even though these kinds of approaches are now widely promoted in the professional literature," according to a review of assessment practices in U.S. schools.¹⁰
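
The effect-size figures quoted in the preceding section (0.4 corresponding to "the top 35%") can be checked with a short calculation. The sketch below is only an illustration. It assumes that test scores in the comparison group are roughly normally distributed, which the article does not state, and it uses the comparison group's standard deviation as the measure of spread. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def effect_size(innovation_gains, comparison_gains):
    """Difference in average gain, divided by the spread of the comparison group.

    Assumption: "spread" is taken to be the sample standard deviation of the
    comparison group; the article itself only speaks of the "range of scores".
    """
    return (mean(innovation_gains) - mean(comparison_gains)) / stdev(comparison_gains)


def share_of_comparison_group_above(effect):
    """Fraction of a normally distributed comparison group that scores above
    the average pupil of the innovation group (normality is an assumption)."""
    return 1.0 - NormalDist().cdf(effect)


for d in (0.4, 0.7):
    share = share_of_comparison_group_above(d)
    print(f"effect size {d}: average pupil matches the top {share:.1%}")
```

Under the normality assumption, an effect size of 0.4 places the average pupil of the innovation group at about the top 34.5% of the comparison group, which agrees with the rounded 35% figure given earlier.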
The most important difficulties with assessment revolve around three issues. The first issue is effective learning.
* The tests used by teachers encourage rote and superficial learning even when teachers say they want to develop understanding; many teachers seem unaware of the inconsistency.
* The questions and other methods teachers use are not shared with other teachers in the same school, and they are not critically reviewed in relation to what they ac[...]

[...] quality in relation to learning.

The second issue is negative impact.
* The giving of marks and the grading function are overemphasized, while the giving of useful advice and the learning function are underemphasized.
* Approaches are used in which pupils are compared with one another, the prime purpose of which seems to them to be competition rather than personal improvement; in consequence, assessment feedback teaches low-achieving pupils that they lack "ability," causing them to come to believe that they are not able to learn.

The third issue is the managerial role of assessments.
* Teachers' feedback to pupils seems to serve social and managerial functions, often at the expense of the learning function.
* Teachers are often able to predict pupils' results on external tests because their own tests imitate them, but at the same time teachers know too little about their pupils' learning needs.
* The collection of marks to fill in records is given higher priority than the analysis of pupils' work to discern learning needs; furthermore, some teachers pay no attention to the assessment records of their pupils' previous teachers.

Of course, not all these descriptions apply to all classrooms. Indeed, there are many schools and classrooms to which they do not apply at all. Nevertheless, these general conclusions have been drawn by researchers who have collected evidence - through observation, interviews, and questionnaires - from schools in several countries, including the U.S.

An empty commitment. The development of national assessment policy in England and Wales over the last decade illustrates the obstacles that stand in the way of developing policy support for formative assessment. The recommendations of a government task force in 1988¹¹ and all subsequent statements of government policy have emphasized the importance of formative assessment by teachers. However, the body charged with carrying out government policy on assessment had no strategy either to study or to develop the formative assessment of teachers and did no more than devote a tiny fraction of its resources to such work.¹² Most of the available resources and most of the public and political attention were focused on national external tests. While teachers' contributions to these "summative assessments" have been given some formal status, hardly any attention has been paid to their contributions through formative assessment. Moreover, the problems of the relationship between teachers' formative and summative roles have received no attention.

It is possible that many of the commitments were stated in the belief that formative assessment was not problematic, that it already happened all the time and needed no more than formal acknowledgment of its existence. However, it is also clear that the political commitment to external testing in order to promote competition had a central priority, while the commitment to formative assessment was marginal. As researchers the world over have found, high-stakes external tests always dominate teaching and assessment. However, they give teachers poor models for formative assessment because of their limited function of providing overall summaries of achievement rather than helpful diagnosis. Given this fact, it is hardly surprising that numerous research studies of the implementation of the education reforms in the United Kingdom have found that formative assessment is "seriously in need of development."¹³ With hindsight, we can see that the failure to perceive the need for substantial support for formative assessment and to take responsibility for developing such support was a serious error.

In the U.S. similar pressures have been felt from political movements characterized by a distrust of teachers and a belief that external testing will, on its own, improve learning. Such fractured relationships between policy makers and the teaching profession are not inevitable - indeed, many countries with enviable educational achievements seem to manage well with policies that show greater respect and support for teachers. While the situation in the U.S. is far more diverse than that in England and Wales, the effects of high-stakes state-mandated testing are very similar to those of the external tests in the United Kingdom. Moreover, the traditional reliance on multiple-choice testing in the U.S. - not shared in the United Kingdom - has exacerbated the negative effects of such policies on the quality of classroom learning.

How Can We Improve Formative Assessment?

The self-esteem of pupils. A report of schools in Switzerland states that "a number of pupils ... are content to 'get by.' . . . Every teacher who wants to practice formative assessment must reconstruct the teaching contracts so as to counteract the habits acquired by his pupils."¹⁴

The ultimate user of assessment information that is elicited in order to improve learning is the pupil. There are negative and positive aspects of this fact. The negative aspect is illustrated by the preceding quotation. When the classroom culture focuses on rewards, "gold stars," grades, or class ranking, then pupils look for ways to obtain the best marks rather than to improve their learning. One reported consequence is that, when they have any choice, pupils avoid difficult tasks. They also spend time and energy looking for clues to the "right answer." Indeed, many become reluctant to ask questions out of a fear of failure. Pupils who encounter difficulties are led to believe that they lack ability, and this belief leads them to attribute their difficulties to a defect in themselves about which they cannot do a great deal. Thus they avoid investing effort in learning that can lead only to disappointment, and they try to build up their self-esteem in other ways.

The positive aspect of students' being the primary users of the information gleaned from formative assessments is that negative outcomes - such as an obsessive focus on competition and the attendant fear of failure on the part of low achievers - are not inevitable. What is needed is a culture of success, backed by a belief that all pupils can achieve. In this regard, formative assessment can be a powerful weapon if it is communicated in the right way. While formative assessment can help all
pupils, it yields particularly good results with low achievers by concentrating on specific problems with their work and giving them a clear understanding of what is wrong and how to put it right. Pupils can accept and work with such messages, provided that they are not clouded by overtones about ability, competition, and comparison with others. In summary, the message can be stated as follows: feedback to any pupil should be about the particular qualities of his or her work, with advice on what he or she can do to improve, and should avoid comparisons with other pupils.

Self-assessment by pupils. Many successful innovations have developed self- and peer-assessment by pupils as ways of enhancing formative assessment, and such work has achieved some success with pupils from age 5 upward. This link of formative assessment to self-assessment is not an accident; indeed, it is inevitable.

To explain this last statement, we should first note that the main problem that those who are developing self-assessments encounter is not a problem of reliability and trustworthiness. Pupils are generally honest and reliable in assessing both themselves and one another; they can even be too hard on themselves. The main problem is that pupils can assess themselves only when they have a sufficiently clear picture of the targets that their learning is meant to attain. Surprisingly, and sadly, many pupils do not have such a picture, and they appear to have become accustomed to receiving classroom teaching as an arbitrary sequence of exercises with no overarching rationale. To overcome this pattern of passive reception requires hard and sustained work. When pupils do acquire such an overview, they then become more committed and more effective as learners. Moreover, their own assessments become an object of discussion with their teachers and with one another, and this discussion further promotes the reflection on one's own thinking that is essential to good learning.

Thus self-assessment by pupils, far from being a luxury, is in fact an essential component of formative assessment. When anyone is trying to learn, feedback about the effort has three elements: recognition of the desired goal, evidence about present position, and some understanding of a way to close the gap between the two.¹⁵ All three must be understood to some degree by anyone before he or she can take action to improve learning.

Such an argument is consistent with more general ideas established by research into the way people learn. New understandings are not simply swallowed and stored in isolation; they have to be assimilated in relation to preexisting ideas. The new and the old may be inconsistent or even in conflict, and the disparities must be resolved by thoughtful actions on the part of the learner. Realizing that there are new goals for the learning is an essential part of this process of assimilation. Thus we conclude: if formative assessment is to be productive, pupils should be trained in self-assessment so that they can understand the main purposes of their learning and thereby grasp what they need to do to achieve.

The evolution of effective teaching. The research studies referred to above show very clearly that effective programs of formative assessment involve far more than the addition of a few observations and tests to an existing program. They require careful scrutiny of all the main components of a teaching plan. Indeed, it is clear that instruction and formative assessment are indivisible.

To begin at the beginning, the choice of tasks for classroom work and homework is important. Tasks have to be justified in terms of the learning aims that they serve, and they can work well only if opportunities for pupils to communicate their evolving understanding are built into the planning. Discussion, observation of activities, and marking of written work can all be used to provide those opportunities, but it is then important to look at or listen carefully to the talk, the writing, and the actions through which pupils develop and display the state of their understanding. Thus we maintain that opportunities for pupils to express their understanding should be designed into any piece of teaching, for this will initiate the interaction through which formative assessment aids learning.

Discussions in which pupils are led to talk about their understanding in their own ways are important aids to increasing knowledge and improving understanding. Dialogue with the teacher provides the opportunity for the teacher to respond to and reorient a pupil's thinking. However, there are clearly recorded examples of such discussions in which teachers have, quite unconsciously, responded in ways that would inhibit the future learning of a pupil. What the examples have in common is that the teacher is looking for a particular response and lacks the flexibility or the confidence to deal with the unexpected. So the teacher tries to direct the pupil toward giving the expected answer. In manipulating the dialogue in this way, the teacher seals off any unusual, often thoughtful but unorthodox, attempts by pupils to work out their own answers. Over time the pupils get the message: they are not required to think out their own answers. The object of the exercise is to work out - or guess - what answer the teacher expects to see or hear.

A particular feature of the talk between teacher and pupils is the asking of questions by the teacher. This natural and direct way of checking on learning is often unproductive. One common problem is that, following a question, teachers do not wait long enough to allow pupils to think out their answers. When a teacher answers his or her own question after only two or three seconds and when a minute of silence is not tolerable, there is no possibility that a pupil can think out what to say.

There are then two consequences. One is that, because the only questions that can produce answers in such a short time are questions of fact, these predominate. The other is that pupils don't even try to think out a response. Because they know that the answer, followed by another question, will come along in a few seconds, there is no point in trying. It is also generally the case that only a few pupils in a class answer the teacher's questions. The rest then leave it to these few, knowing that they cannot respond as quickly and being unwilling to risk making mistakes in public. So the teacher, by lowering the level
of questions and by accepting answers teacher. Feedback has been shown to im routines, for any such change is uncom
from a few, can keep the lesson going but prove learning when it gives each pupil fortable, and emphasis on the challenge
is actually out of touch with the under specific guidance on strengths and weak to think for yourself (and not just towork
standing of most of the class. The ques nesses, preferably without any overall harder) can be threatening tomany. Pupils
tion/answer dialogue becomes a ritual, marks. Thus the way inwhich test results cannot be expected to believe in the value
one inwhich thoughtful involvement suf are reported to pupils so that they can of changes for their learning before they
fers. identify their own strengths and weak have experienced the benefits of such chang
There are several ways to break this nesses is critical. Pupils must be given the es.Moreover, many of the initiatives that
particular cycle. They involve giving pu means and opportunities towork with ev are needed take more class time, particu
pils time to respond; asking them to dis idence of their difficulties. For formative larly when a central purpose is to change
cuss their thinking in pairs or in small purposes, a test at the end of a unit or teach the outlook on learning and the working
groups, so that a respondent is speaking ing module is pointless; it is too late to methods of pupils. Thus teachers have to
on behalf of others; giving pupils a choice work with the results. We conclude that take risks in the belief that such invest
between different possible answers and thefeedback o01 tests, seatwork, and home ment of time will yield rewards in the fu
asking them to vote on the options; ask work shouild give each pupil guidance on ture,while "delivery" and "coverage" with
ing all of them to write down an answer how to improve, and each pupil must be poor understanding are pointless and can
and then reading out a selected few; and given help and an opportunity towork on even be harmful.
so on. What is essential is that any dia the improvement. Teachers must deal with two basic is
logue should evoke thoughtful reflection All these points make clear that there sues that are the source of many of the
in which all pupils can be encouraged to is no one simple way to improve forma problems associated with changing to a
take part, for only then can the formative tive assessment. What is common to them system of formative assessment. The first
process start to work. In short, the dia is that a teacher's approach should start by is the nature of each teacher's beliefs about
logue between pupils and a teacher should being realistic and confronting the ques learning. If the teacher assumes that knowl
be thoughtful, reflective, focused to evoke tion "Do I really know enough about the edge is to be transmitted and learned, that
and explore understanding, and conduct understanding of my pupils to be able to understanding will develop later, and that
ed so that all pupils have an opportunity help each of them?" clarity of exposition accompanied by re
to think and to express their ideas. Much of the work teachers must do to wards for patient reception are the essen
Tests given in class and tests and oth make good use of formative assessment tials of good teaching, then formative as
er exercises assigned for homework are can give rise to difficulties. Some pupils sessment is hardly necessary. However,
also important means of promoting feed will resist attempts to change accustomed most teachers accept the wealth of evi
back. A good test can be an occasion for
learning. It is better to have frequent short
tests than infrequent long ones. Any new
learning should first be tested within about
a week of a first encounter, but more fre
quent tests are counterproductive. The qual
ity of the test items that is, their rele
vance to themain learning aims and their
clear communication to the pupil - re
quires scrutiny as well. Good questions
are hard to generate, and teachers should
collaborate and draw on outside sources
to collect such questions.
Given questions of good quality, it is
essential to ensure the quality of the feed
back. Research studies have shown that,
if pupils are given only marks or grades,
they do not benefit from the feedback. The
worst scenario is one in which some pu
pils who get low marks this time also got -
v AF7 Al
low marks last time and come to expect
to get low marks next time. This cycle of "It has been said that a fool can ask more questions than a wise man can an
repeated failure becomes part of a shared swer
belief between such students and their

dence that this transmission model does not work, even when judged by its own criteria, and so are willing to make a commitment to teaching through interaction. Formative assessment is an essential component of such instruction. We do not mean to imply that individualized, one-on-one teaching is the only solution; rather we mean that what is needed is a classroom culture of questioning and deep thinking, in which pupils learn from shared discussions with teachers and peers. What emerges very clearly here is the indivisibility of instruction and formative assessment practices.

The other issue that can create problems for teachers who wish to adopt an interactive model of teaching and learning relates to the beliefs teachers hold about the potential of all their pupils for learning. To sharpen the contrast by overstating it, there is on the one hand the "fixed I.Q." view - a belief that each pupil has a fixed, inherited intelligence that cannot be altered much by schooling. On the other hand, there is the "untapped potential" view - a belief that starts from the assumption that so-called ability is a complex of skills that can be learned. Here, we argue for the underlying belief that all pupils can learn more effectively if one can clear away, by sensitive handling, the obstacles to learning, be they cognitive failures never diagnosed or damage to personal confidence or a combination of the two. Clearly the truth lies between these two extremes, but the evidence is that ways of managing formative assessment that work with the assumptions of "untapped potential" do help all pupils to learn and can give particular help to those who have previously struggled.

Policy and Practice

Changing the policy perspective. The assumptions that drive national and state policies for assessment have to be called into question. The promotion of testing as an important component for establishing a competitive market in education can be very harmful. The more recent shifting of emphasis toward setting targets for all, with assessment providing a touchstone to help check pupils' attainments, is a more mature position. However, we would argue that there is a need now to move further, to focus on the inside of the "black box" and so to explore the potential of assessment to raise standards directly as an integral part of each pupil's learning work.

It follows from this view that several changes are needed. First, policy ought to start with a recognition that the prime locus for raising standards is the classroom, so that the overarching priority has to be the promotion and support of change within the classroom. Attempts to raise standards by reforming the inputs to and measuring the outputs from the black box of the classroom can be helpful, but they are not adequate on their own. Indeed, their helpfulness can be judged only in light of their effects in classrooms.

The evidence we have presented here establishes that a clearly productive way to start implementing a classroom-focused policy would be to improve formative assessment. This same evidence also establishes that in doing so we would not be concentrating on some minor aspect of the business of teaching and learning. Rather, we would be concentrating on several essential elements: the quality of teacher/pupil interactions, the stimulus and help for pupils to take active responsibility for their own learning, the particular help needed to move pupils out of the trap of "low achievement," and the development of the habits necessary for all students to become lifelong learners. Improvements in formative assessment, which are within the reach of all teachers, can contribute substantially to raising standards in all these ways.

Four steps to implementation. If we accept the argument outlined above, what needs to be done? The proposals outlined below do not follow directly from our analysis of assessment research. They are consistent with its main findings, but they also call on more general sources for guidance.16

At one extreme, one might call for more research to find out how best to carry out such work; at the other, one might call for an immediate and large-scale program, with new guidelines that all teachers should put into practice. Neither of these alternatives is sensible: while the first is unnecessary because enough is known from the results of research, the second would be unjustified because not enough is known about classroom practicalities in the context of any one country's schools.

Thus the improvement of formative assessment cannot be a simple matter. There is no quick fix that can alter existing practice by promising rapid rewards. On the contrary, if the substantial rewards promised by the research evidence are to be secured, each teacher must find his or her own ways of incorporating the lessons and ideas set out above into his or her own patterns of classroom work and into the cultural norms and expectations of a particular school community.17 This process is a relatively slow one and takes place through sustained programs of professional development and support. This fact does not weaken the message here; indeed, it should be seen as a sign of its authenticity, for lasting and fundamental improvements in teaching and learning must take place in this way. A recent international study of innovation and change in education, encompassing 23 projects in 13 member countries of the Organisation for Economic Co-operation and Development, has arrived at exactly the same conclusion with regard to effective policies for change.18 Such arguments lead us to propose a four-point scheme for teacher development.

1. Learning from development. Teachers will not take up ideas that sound attractive, no matter how extensive the research base, if the ideas are presented as general principles that leave the task of translating them into everyday practice entirely up to the teachers. Their classroom lives are too busy and too fragile for all but an outstanding few to undertake such work. What teachers need is a variety of living examples of implementation, as practiced by teachers with whom they can identify and from whom they can derive the confidence that they can do better. They need to see examples of what doing better means in practice.

So changing teachers' practice cannot begin with an extensive program of training for all; that could be justified only if it could be claimed that we have enough "trainers" who know what to do, which is certainly not the case. The essential first step is to set up a small number of local groups of schools - some primary, some secondary, some inner-city, some from outer suburbs, some rural - with each school committed both to a school-based development of formative assessment and to collaboration with other schools in its local group. In such a process, the teachers in their classrooms will be working out the answers to many of the practical questions that the evidence presented here cannot answer. They will be reformulating the issues, perhaps in relation to fundamental insights and certainly in terms that make sense to their peers in other classrooms. It is also essential to carry out such development in a range of subject areas, for the research in mathematics education is significantly different from that in language, which is different again from that in the creative arts.

The schools involved would need extra support in order to give their teachers time to plan the initiative in light of existing evidence, to reflect on their experience as it develops, and to offer advice about training others in the future. In addition, there would be a need for external evaluators to help the teachers with their development work and to collect evidence of its effectiveness. Video studies of classroom work would be essential for disseminating findings to others.

2. Dissemination. This dimension of the implementation would be in low gear at the outset - offering schools no more than general encouragement and explanation of some of the relevant evidence that they might consider in light of their existing practices. Dissemination efforts would become more active as results and resources became available from the development program. Then strategies for wider dissemination - for example, earmarking funds for inservice training programs - would have to be pursued.

We must emphasize that this process will inevitably be a slow one. To repeat what we said above, if the substantial rewards promised by the evidence are to be secured, each teacher must find his or her own ways of incorporating the lessons and ideas that are set out above into his or her own patterns of classroom work. Even with optimum training and support, such a process will take time.

3. Reducing obstacles. All features in the education system that actually obstruct the development of effective formative assessment should be examined to see how their negative effects can be reduced. Consider the conclusions from a study of teachers of English in U.S. secondary schools.

Most of the teachers in this study were caught in conflicts among belief systems and institutional structures, agendas, and values. The point of friction among these conflicts was assessment, which was associated with very powerful feelings of being overwhelmed, and of insecurity, guilt, frustration, and anger.... This study suggests that assessment, as it occurs in schools, is far from a merely technical problem. Rather, it is deeply social and personal.19

The chief negative influence here is that of short external tests. Such tests can dominate teachers' work, and, insofar as they encourage drilling to produce right answers to short, out-of-context questions, they can lead teachers to act against their own better judgment about the best ways to develop the learning of their pupils. This is not to argue that all such tests are unhelpful. Indeed, they have an important role to play in securing public confidence in the accountability of schools. For the immediate future, what is needed in any development program for formative assessment is to study the interactions between these external tests and formative assessments to see how the models of assessment that external tests can provide could be made more helpful.

All teachers have to undertake some summative assessment. They must report to parents and produce end-of-year reports as classes are due to move on to new teachers. However, the task of assessing

pupils summatively for external purposes is clearly different from the task of assessing ongoing work to monitor and improve progress. Some argue that these two roles are so different that they should be kept apart. We do not see how this can be done, given that teachers must have some share of responsibility for the former and must take the leading responsibility for the latter.20 However, teachers clearly face difficult problems in reconciling their formative and summative roles, and confusion in teachers' minds between these roles can impede the improvement of practice.

The arguments here could be taken much further to make the case that teachers should play a far greater role in contributing to summative assessments for accountability. One strong reason for giving teachers a greater role is that they have access to the performance of their pupils in a variety of contexts and over extended periods of time.

This is an important advantage because sampling pupils' achievement by means of short exercises taken under the conditions of formal testing is fraught with dangers. It is now clear that performance in any task varies with the context in which it is presented. Thus some pupils who seem incompetent in tackling a problem under test conditions can look quite different in the more realistic conditions of an everyday encounter with an equivalent problem. Indeed, the conditions under which formal tests are taken threaten validity because they are quite unlike those of everyday performance. An outstanding example here is that collaborative work is very important in everyday life but is forbidden by current norms of formal testing.21 These points open up wider arguments about assessment systems as a whole - arguments that are beyond the scope of this article.

4. Research. It is not difficult to set out a list of questions that would justify further research in this area. Although there are many and varied reports of successful innovations, they generally fail to give clear accounts of one or another of the important details. For example, they are often silent about the actual classroom methods used, the motivation and experience of the teachers, the nature of the tests used as measures of success, or the outlooks and expectations of the pupils involved.

However, while there is ample justification for proceeding with carefully formulated projects, we do not suggest that everyone else should wait for their conclusions. Enough is known to provide a basis for active development work, and some of the most important questions can be answered only through a program of practical implementation.

Directions for future research could include a study of the ways in which teachers understand and deal with the relationship between their formative and summative roles or a comparative study of the predictive validity of teachers' summative assessments versus external test results. Many more questions could be formulated, and it is important for future development that some of these problems be tackled by basic research. At the same time, experienced researchers would also have a vital role to play in the evaluation of the development programs we have proposed.

Are We Serious About Raising Standards?

The findings summarized above and the program we have outlined have implications for a variety of responsible agencies. However, it is the responsibility of governments to take the lead. It would be premature and out of order for us to try to consider the relative roles in such an effort, although success would clearly depend on cooperation among government agencies, academic researchers, and school-based educators.

The main plank of our argument is that standards can be raised only by changes that are put into direct effect by teachers and pupils in classrooms. There is a body of firm evidence that formative assessment is an essential component of classroom work and that its development can raise standards of achievement. We know of no other way of raising standards for which such a strong prima facie case can be made. Our plea is that national and state policy makers will grasp this opportunity and take the lead in this direction.

1. James W. Stigler and James Hiebert, "Understanding and Improving Classroom Mathematics Instruction: An Overview of the TIMSS Video Study," Phi Delta Kappan, September 1997, pp. 19-20.
2. There is no internationally agreed-upon term here. "Classroom evaluation," "classroom assessment," "internal assessment," "instructional assessment," and "student assessment" have been used by different authors, and some of these terms have different meanings in different texts.
3. Paul Black and Dylan Wiliam, "Assessment and Classroom Learning," Assessment in Education, March 1998, pp. 7-74.
4. Lynn S. Fuchs and Douglas Fuchs, "Effects of Systematic Formative Evaluation: A Meta-Analysis," Exceptional Children, vol. 53, 1986, pp. 199
5. See Albert E. Beaton et al., Mathematics Achievement in the Middle School Years (Boston: Boston College, 1996).
6. Lynn S. Fuchs et al., "Effects of Task-Focused Goals on Low-Achieving Students with and Without Learning Disabilities," American Educational Research Journal, vol. 34, 1997, pp. 513-43.
7. OFSTED (Office for Standards in Education), Subjects and Standards: Issues for School Development Arising from OFSTED Inspection Findings 1994-5: Key Stages 3 and 4 and Post-16 (London: Her Majesty's Stationery Office, 1996), p. 40.
8. Nicholas Daws and Birendra Singh, "Formative Assessment: To What Extent Is Its Potential to Enhance Pupils' Science Being Realized?," School Science Review, vol. 77, 1996, p. 99.
9. Clement Dassa, Jesús Vazquez-Abad, and Djavid Ajar, "Formative Assessment in a Classroom Setting: From Practice to Computer Innovations," Alberta Journal of Educational Research, vol. 39, 1993, p. 116.
10. D. Monty Neill, "Transforming Student Assessment," Phi Delta Kappan, September 1997, pp. 35-36.
11. Task Group on Assessment and Testing: A Report (London: Department of Education and Science and the Welsh Office, 1988).
12. Richard Daugherty, National Curriculum Assessment: A Review of Policy, 1987-1994 (London: Falmer Press, 1995).
13. Terry A. Russell, Anne Qualter, and Linda McGuigan, "Reflections on the Implementation of National Curriculum Science Policy for the 5-14 Age Range: Findings and Interpretations from a National Evaluation Study in England," International Journal of Science Education, vol. 17, 1995, pp. 481-92.
14. Phillipe Perrenoud, "Towards a Pragmatic Approach to Formative Evaluation," in Penelope Weston, ed., Assessment of Pupils' Achievement: Motivation and School Success (Amsterdam: Swets and Zeitlinger, 1991), p. 92.
15. D. Royce Sadler, "Formative Assessment and the Design of Instructional Systems," Instructional Science, vol. 18, 1989, pp. 119-44.
16. Paul J. Black and J. Myron Atkin, Changing the Subject: Innovations in Science, Mathematics, and Technology Education (London: Routledge for the Organisation for Economic Co-operation and Development, 1996); and Michael G. Fullan, with Suzanne Stiegelbauer, The New Meaning of Educational Change (London: Cassell, 1991).
17. See Stigler and Hiebert, pp. 19-20.
18. Black and Atkin, op. cit.
19. Peter Johnston et al., "Assessment of Teaching and Learning in Literature-Based Classrooms," Teaching and Teacher Education, vol. 11, 1995, p. 359.
20. Dylan Wiliam and Paul Black, "Meanings and Consequences: A Basis for Distinguishing Formative and Summative Functions of Assessment," British Educational Research Journal, vol. 22, 1996, pp. 537-48.
21. These points are developed in some detail in Sam Wineburg, "T. S. Eliot, Collaboration, and the Quandaries of Assessment in a Rapidly Changing World," Phi Delta Kappan, September 1997, pp. 59-65.


