Introduction to Research Methods
Robert B. Burns
SAGE Publications
London • Thousand Oaks • New Delhi
© 2000 Pearson Education Australia Pty Limited
First published 1990
Reprinted 1991
Second Edition 1994
Reprinted 1995
Third Edition 1997
Reprinted 1998
Fourth Edition 2000
A catalogue record for this book is available from the British Library
Preface to the new expanded edition
Being asked to prepare an international edition is both exciting and a challenge. It is
exciting because it means the continuation of a project that has already been successful in
its aims: to provide a simple yet reasonably thorough basic grounding in the essential
concepts and practical application of quantitative and qualitative research methods,
without the heavy mathematical notation and number manipulation that many other
texts focus on and which make the subject a big turn-off.
It is challenging because the request demands a response of further development.
Therefore, in this edition I have introduced several additional topics. First, there are new
chapters on regression and meta-analysis, both of which have been asked for by users of
the text. Secondly, I have included a chapter which introduces the reader in simple terms
to the role of effect size and power, both increasingly regarded as important alternatives
to conventional significance testing in evaluating the results of investigations.
The ubiquity of statistical programs for the PC has encouraged me to include, at the
end of each chapter covering statistical tests, a short set of instructions on how to use
SPSS for data analysis for that statistical procedure and how to interpret the output.
The choice of SPSS rather than any other program was based on the fact that it is the
most popular statistical software used in the social sciences.
Little else has altered in the text but I hope the additional material will be to the
liking of those who have used the text as teachers and students over the last decade.
Preface to the first edition
Education is a complex process and we know, even now, only a small part of how it
operates and of the reciprocal interaction between the process and the pupils, teachers,
parents and the many others involved. There are so many things we wish to know and
the only safe way to produce knowledge in which we can put our faith is to conduct
systematic research.
The overriding purpose of this book is to provide a basic understanding of the main
techniques, concepts and paradigms for conducting research in education in both
quantitative and qualitative modes.
Methodology and statistics have been integrated into one text, since neither is much
use without the other. By the end of the book, the student should be able to evaluate the
research of others, define a problem, formulate hypotheses about the problem, design
and carry out a valid and reliable study of the problem, apply the correct statistics, discuss
the results and implications, and write it all up in a sensible and logical manner.
Experience has shown that many students and teachers are reluctant to study this area
as it is seen as mathematical and the milieu of experts. This book attempts to demystify
the role of experts and dispel such fears and negative attitudes by the logical sequencing
of material and by using simple examples and practice exercises employing only basic
arithmetic. It is hoped that even the mathematically inept will enjoy and understand the
material. What is a very complex array of concepts, designs and statistics has been
presented in everyday language, and questions for self-testing current understanding
(STQs) are presented within the text as well as at the ends of most chapters.
The text is addressed to a wide variety of students, but primarily to those in pre-
service and in-service teacher education courses. It would also be suitable for any social
science and para-medical course needing an introductory text in the area. The book can
be used in class or in conditions of minimum support in distance education, as reference
to previous material is deliberately made throughout the text since this distributed
repetition improves retention, understanding and transfer.
The text organisation encourages students to proceed through the book, chapter by
chapter, rather than dipping in here and there; for as students master the material in one
chapter, they are providing a basis for the understanding of future material.
I am grateful to the Literary Executor of the late Sir Ronald A. Fisher, FRS, to Dr
Frank Yates, FRS, and the Longman Group Ltd, London, for permission to reprint
Tables II, IV and VI from their book Statistical Tables for Biological, Agricultural and
Medical Research (6th edition, 1974).
Finally, I wish all those who use the text ‘good luck’ in their studies. Many of your
predecessors have contributed to the viewpoints and thinking that characterise this text.
To them I owe a debt of gratitude. I am particularly indebted to Shelley and Caroline
who produced such a splendid manuscript out of my hieroglyphics.
Finally, to my family, who endured patiently the long periods of selfish devotion I
spent on the production of this text, I am profoundly grateful for their support.
R.B. Burns
Perth, Western Australia
CONTRASTING PERSPECTIVES
Introduction
Research is a systematic investigation to find answers to a problem. Research in
professional social science areas, like research in other subjects, has generally followed the
traditional objective scientific method. Since the 1960s, however, a strong move towards
a more qualitative, naturalistic and subjective approach has left social science research
divided between two competing methods: the scientific empirical tradition, and the
naturalistic phenomenological mode.
In the scientific method, quantitative research methods are employed in an attempt
to establish general laws or principles. Such a scientific approach is often termed
nomothetic and assumes social reality is objective and external to the individual.
The naturalistic approach to research emphasises the importance of the subjective
experience of individuals, with a focus on qualitative analysis. Social reality is regarded
as a creation of individual consciousness, with meaning and the evaluation of events
seen as a personal and subjective construction. Such a focus on the individual case rather
than general law-making is termed an ideographic approach.
Each of these two perspectives on the study of human behaviour has profound
implications for the way in which research is conducted.
Write down in your notebook a couple of sentences explaining why research has tended
to adopt a systematic approach to the study of human behaviour/human issues.
Methods of knowing
There are four general ways of knowing, according to Kerlinger (1986):
1 Method of tenacity. Here one holds to the truth because one knows it to be true. The
more frequent the repetition of the 'truth', the more its validity is enhanced. It is
self-evident then, and people will cling to such beliefs. For example, even in the face
of contrary evidence some people believe that all communists are spies.
2 Method of authority. A thing must be true if it is in the Bible, or the prime minister
says it, or a teacher said so. The method of authority is not always unsound but we
never know when it is or isn’t. But we lack individual resources to investigate
everything, so the presumed competence of authority offers advantages.
3 Method of intuition (a priori method). This claims that reason is the criterion of
truth. It ‘stands to reason’ that learning difficult subjects must build moral character.
But whose reason is to carry the judgement, if two eminent persons using rational
processes reach different conclusions?
4 Method of science. This method has one characteristic none of the other methods
has—that is, self-correction. The checks verify and control the scientist’s activities
and conclusions. Even if a hypothesis seems to have support, the scientist will also test
alternative hypotheses. Knowledge is attained through a controlled systematic process
because science ultimately appeals to evidence; hypotheses are subjected to test. None
of the other ideas, opinions, theories and methods above provides any procedure for
establishing the superiority of one belief over another. Science is not just a body of
knowledge but a logic of inquiry, for generating, replenishing and correcting
knowledge.
Operational definition
Operational definition means that terms must be defined by the steps or operations
used to measure them. Such a procedure is necessary to eliminate confusion in meaning
and communication. Consider the statement, “Anxiety causes students to score poorly
in tests’. One might ask, “What is meant by anxiety?’. Stating that anxiety refers to being
tense or some other such term only adds to the confusion. However, stating that anxiety
refers to a score over a criterion level on an anxiety scale enables others to realise what
you mean by anxiety. Stating an operational definition forces one to identify the
empirical referents, or terms. In this manner, ambiguity is minimised. Again, introversion
might be defined as a score on a particular personality scale, hunger as so many hours
since last fed, and social class as defined by father’s occupation.
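The idea can be made concrete in code. Here is a minimal Python sketch (not from the original text; the criterion value is an assumption chosen purely for illustration) in which 'anxious' is defined entirely by the measurement rule:

CRITERION = 30                          # hypothetical cut-off on an anxiety scale

def is_anxious(scale_score):
    # The operational definition: 'anxiety' IS a score above the criterion.
    return scale_score > CRITERION

print(is_anxious(34), is_anxious(22))   # True False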
Replication
To be replicable, the data obtained in an experiment must be reliable; that is, the same
result must be found if the study is repeated. That science has such a requirement is quite
obvious, since it is attempting to obtain knowledge about the world. If observations are not
repeatable, our descriptions and explanations are likewise unreliable and therefore useless.
Hypothesis testing
The lay person uses theories and concepts in a loose fashion, often accepting ludicrous
explanations for human behaviour. For instance, “Being ill is a punishment for being
sinful'; 'An economic depression may be attributed to Asian immigrants'. The scientist, on the
other hand, demands explanations that can be tested. Popper recounts an encounter with Adler:
... I reported to him a case which to me did not seem particularly Adlerian, but in which he
found no difficulty in analysing in terms of his theory of inferiority feelings, although he had
not even seen the child. Slightly shocked, I asked him how he could be so sure. “Because of
my thousand-fold experience’, he replied; whereupon I could not help saying, ‘And with
this new case, I suppose your experience has become thousand-and-one-fold’. (Popper 1957,
p. 35).
Without rules which could be followed by any trained observer for operationalising
‘inferiority complex’, observations of the concept are not observations in a scientific
sense because there are no checks on what is and what is not an instance of the concept.
Science is a ‘public’ method, open to all who learn its methods, and a similar
observation of the same subject using the same operationalised concept should give the
same result. Where reliance is placed, argues Popper, on the ‘private’ opinion of the
expert (such as Adler) without strict rules of observation, the test of any hypothesis
linking ‘inferiority complex’ to another concept is unsatisfactory.
Summarise the main characteristics of the scientific approach. How does this approach
differ from a common sense approach?
1 Examine some recent newspapers and select two claims that are not accompanied
by supporting evidence. Indicate briefly how support might be obtained.
2 How does the scientific approach differ from the commonsense approach to
problem-solving?
3 Why are only testable ideas of worth in science?
4 Scientific study is empirical and objective. What is meant by this statement?
These problems and issues may well illustrate reasons for the sometimes defensive
reactions of more traditionally-oriented researchers when encountering alternative
conceptions and methodologies.
References
Barton, A. & Lazarsfeld, P. (1969), 'Some functions of qualitative analysis', in Issues in Participant
Observation, eds G. McCall & J. Simmons, Addison-Wesley, Reading.
Cronbach, L.J. (1975), “Beyond the two disciplines of scientific psychology’, American Psychologist,
30, pp. 116-26.
Eisner, E. (1979), 'Recent developments in educational research affecting art education', Art
Education.
Kerlinger, F. (1986), Foundations of Behavioral Research, Holt, New York.
Malinowski, B. (1922), Argonauts of the Western Pacific, Routledge, London.
Parlett, M. (1975), ‘Evaluating innovations in teaching’, in Curriculum Design, (eds) J. Greenwald &
R. West, Croom Helm, London.
Popper, K. (1957), The Poverty of Historicism, Routledge, London.
Popper, K. (1963), Conjectures and Refutations, Routledge, London.
Rist, R.C. (1975), ‘Ethnographic techniques and the study of an urban school’, Urban Education 10,
pp. 86-108.
ETHICS OF RESEARCH

Ethical principles, rules and conventions distinguish socially acceptable behaviour from
that which is considered socially unacceptable. However, in social science research a few
workers consider their work beyond scrutiny, presumably guided by a disinterested
virtue which justifies any means to attain hoped-for ends. In education, numerous
children have been forced to learn nonsense syllables, native children have been taken
from their natural mothers and brought up as though a member of another culture,
many left-handed children have been forced to write with their right hands. Most
children at some time are unwittingly involved in questionable experiments in teaching
methods or medical procedures, and all of us have at some time felt obliged to complete
some meaningless questionnaire.
Of course, some researchers feel that ethical rules or guidelines that attempt to define
limits may be too rigid, limiting the effectiveness of research and denying research into
aspects of human behaviour where knowledge would be valuable.
Ethical problems can relate to both the subject matter of the research as well as to its
methods and procedures, and can go well beyond courtesy or etiquette regarding
appropriate treatment of persons in a free society. Social scientists have often been
criticised for lack of concern over the welfare of their subjects. The researcher often
misinforms subjects about the nature of the investigation, and/or exposes them to
embarrassing or emotionally painful experiences. Many subjects may feel obliged to
volunteer for a variety of reasons. Professionals also feel troubled by ethical issues. It was
found in a survey by the British Psychological Society that the two major areas of
dilemma for members were confidentiality and research. Issues reported in this latter area
included unethical procedures, informed consent, harm to participants, deception, and
deliberate falsification of results.
Voluntary participation
The problem with volunteers is that they are not likely to be a random sample of the
population. They tend to be better educated, of a higher social class, more intelligent,
more social, less conforming and possess a higher need for approval than non-volunteers.
This means that the external validity (the confidence to generalise to the population) is
reduced.
Ethical requirements about volunteering can therefore act in direct opposition to the
methodological requirements of good research. Some volunteers may not be as free to
choose as the researcher may think. Much educational and social science research is
conducted on students at school or university. Most will agree to participate, but often
do so because they may believe that some undesired effect on their marks, report or
references will occur if they do not. They may be free but do not feel so. Many parents
may also be in this same circumstance, agreeing to their child’s or their own participation
for fear of possible consequences.
The following are examples of experiment recruitment practices that may raise ethical
questions:
• subjects who are inmates of prisons who participate in anticipation of more favourable
treatment;
Involuntary participation
In naturalistic covert observation the observed person is usually unaware of their
participation. This is not objectionable when unobtrusive observations are made and
each observation is simply one more in a frequency count, such as children’s playground
behaviour. However, in other observation studies private lives can be invaded, such as
studies on bystander intervention.
Informed consent
This is the most fundamental ethical principle that is involved. Participants must
understand the nature and purpose of the research and must consent to participate
without coercion. Many researchers have their potential participant sign an informed
consent form which describes the purpose of the research, its procedures, risks and
discomforts, its benefits and the right to withdraw. This makes the situation clear and
provides a degree of proof that the person was informed and consented to take part.
Participants who are explicitly or implicitly coerced to get involved, such as prison
inmates hoping for more beneficial treatment or students seeking money or points towards unit
credit, cannot be regarded as freely consenting.
Deception
The primary justification for deception is that knowledge of the purpose of the investigation
might contaminate results; subjects who are unaware of the real purpose will behave more
naturally. Yet it is basically unethical in human relationships. Moreover, just being in an
experiment even without a specific purpose can alter behaviour, so deception may not work
as expected. Informed consent does lead to non-random samples as it implies voluntary
involvement.
There are some situations where you would not wish to disclose the purpose of the
study, or even that a study is proceeding, e.g. observation of play in preschool children,
participant observation in a delinquent gang, etc. Active deception includes
misrepresenting the purpose of the study, use of placebos, false diagnoses, false promises.
Passive deception includes secret recording of behaviour, concealed observation, use of
personality tests where the participant is unaware of the rationale. While deception is a
well used tactic in social psychology and some ethnographic activities, a deceived subject
may feel they have been part of an elaborate hoax, lose self-esteem and develop negative
attitudes to research. Milgram’s 1974 study on obedience is a classic deception study in
which subjects were deceived into thinking that they were administering electric shocks
to another participant.
Milgram’s experiment also produced considerable stress as well as deception. Most
deception is produced in studies on emotion, motivation, social behaviour and
ethnography. There is little deception in memory and intellectual studies, although even
here deception which does not harm the subject can be used, such as telling a subject they
are reading newspaper stories in a study of readability when it is actually examining
memory errors.
Some deception is somewhat innocuous, such as participants being told the baby is
a male and others that it is female and then asked to describe the baby’s personality and
behaviour. The use of a placebo is deceptive but usually not harmful. Some students may
be told their experimental rats are bright.
Other studies are more dangerous in their effect on human behaviour, for instance
Rosenthal and Jacobsen’s (1968) ‘self fulfilling prophecy’ studies, in which teachers were
told of the degree of talent to expect from children in their classes. Subsequent
performance did improve for those who at random were indicated as those who should
‘bloom’. What about those who by chance were allocated to the group not expected to
perform? Were their life chances lowered as a result of expectation fuelled by the teachers
in the experiment? A number of studies have been similar to Milgram's and involved
comparable deception and stress.
Role-playing
Role-playing has been used as a means of avoiding deception. Subjects are fully informed
about the investigation and then asked to act as though they were subject to a particular
treatment condition. It is assumed that they understand they are not part of a real
situation. This relies heavily on the subject’s ability to role-play the required role adroitly.
The Stanford Prison study (Zimbardo & Ruch 1973) showed that subjects role-playing
guards and prisoners could become immersed in a role even when they knew the
experimental nature of the situation. This study had to be stopped after six days of a
planned fourteen days as the student prison guards became brutal and sadistic, while the
student prisoners developed passive dependency. However, it is argued conversely that
subjects who are fully informed of the experiment will produce results different from
those produced by uninformed subjects. Other approaches have been to forewarn
subjects that some of the experiments they might be asked to take part in may involve
deception and only then ask for them to volunteer.
Debriefing
In a debriefing session you inform the subjects about the nature of the study, any
deception and why it was necessary. You must restore the subject’s self-esteem and trust
in the motives of researchers. The debriefing should include the following:
• disclosure as to the purpose of the experiment, interviews, questionnaires, etc.;
• description of any deception and why it was used;
• an attempt to make the research appear scientifically respectable and important;
• you may wish to allow subjects to view later experimental sessions showing another
subject being deceived so that they fully realise what happened.
Right to discontinue
Ethical research practice respects this right. It is an important safeguard. It is often used
by those completing questionnaires or interviews when they refuse to respond to an
item. It is more difficult in a captive group in an experimental situation; there are subtle
forces at play between a researcher and a subject that make it difficult for the participant
to discontinue.
Experimenter obligations
Researchers make several implicit contracts with their subjects. For example, if the
subjects agree to be present at a specific time and place then the researcher must also. If
the researcher has promised to send a summary of results to the subjects then that must
be done. The researcher must not run overtime as many subjects may have made
arrangements to fit round the time requirement already notified.
Publication of findings
Researchers should be open with their results, allowing disinterested colleagues to vet the
research and its implications, because no one wants newspapers to seize on half-truths,
misinterpreting information and making unnecessary waves, particularly if the issue
affects people’s lives. Nor does anyone want politicians and bureaucrats rushing off to
create new policy before verification and replication among the academic community.
Education and social science research on such matters as racial differences,
immigration policy, selection of specific populations for special programs, sexual
behaviour, etc. will always stir up controversy, so that the most responsible researchers
should announce their findings and implications with great qualification and caution.
Remember it is always difficult to prevent unqualified persons from using research
findings for their own discriminatory and abusive ends.
Stress
Some studies involve providing subjects with unfavourable feedback about their
personalities or abilities in order to assess the effect on their self-concept or self-esteem.
Intervention studies
These studies often involve willing participation, for example, working with parents at
home to improve parental stimulation and to show the effect on child learning and
intellectual performance. Ethical issues arise in selecting one group to be given special
treatment, particularly if it endows the participants with some beneficial ability. Other
interventions that examine the effect of an independent variable on a dependent
behavioural variable which are of questionable ethics include raising the level of
aggression in young children by showing them violent videos.
Conclusion
All in all it looks fairly difficult to conduct much research without running into ethical
arguments. Codes of ethics have been developed by many professions which deal with
human subjects. The most comprehensive and credible code of ethics is that issued by
the American Psychological Association (1992). This has become a major standard and
model for researchers in the social sciences. The British Psychological Society, which is
the other major worldwide association of psychologists, also has a Code of Conduct
(BPS 1993).
The Australian Association for Research in Education has recently published an
annotated bibliography on ethics in educational research which has general application
to research across the behavioural and social sciences. This bibliography can be accessed
at http://www.swin.edu.au/aare/welcome.html.
Generally such codes include the following requirements:
• that risks to participants are minimised by procedures which do not expose subjects
to unnecessary risk;
• that risks to participants are outweighed by the anticipated benefits of the research;
• that the rights and welfare of participants are protected; the research should avoid
unnecessary psychological harm or discomfort to the subjects;
• that participation is voluntary;
• that the subject has the right to know the nature, purposes and duration of the study, i.e.
informed consent. Participants should sign an informed consent form which outlines
the study, who is conducting it, for what purpose, and how it is to be carried out, and
which provides assurances of confidentiality and voluntary participation. The participant
should sign, acknowledging that they freely consent to participate. Should the subject
be below the age of consent or incapacitated due to age, illness or disability, a parent,
guardian or responsible agent must sign.
1 If it is inappropriate to explain the reason for the research to the subjects before
data collection, which of the following should the experimenter do?
a Inform them anyway since cooperation is vital.
b Not disclose any information.
c Tell the subjects they will be informed at the end of the experiment.
2 Because in the school setting it is essential to have cooperation of parents, teachers,
students and administrators, which of the following must the researcher do?
a Ask the school principal to explain the research to all involved.
b Devise a plan to gain whole school cooperation.
c Work only where immediate cooperation is available.
3 If a student drops out of a research project which of the following should be done?
a The student should be required to provide a substitute.
b The student's personal records should be amended to note the fact.
c Nothing should be done.
4 In meeting with a school principal explain what steps you will tell him or her you are
taking to protect the rights of the teachers and students in an experiment (e.g. give
details of the research, right to privacy, who is involved, right to withdraw, no
penalties for refusal to take part etc.)
5 Suppose you have conducted a study in which students have been given false scores
on a maths test in order to see if this information has any effect on a subsequent
similar test. Describe your procedures at the end of the experiment. (Design a
program to explain why deception is necessary. Meet each student individually and
show test papers with correct scores. Offer to give another similar test so they can
be sure of real performance level etc.)
6 Discuss the contention that the end never justifies the means where research with
humans is concerned.
Ethical problems are likely to occur in social science research since human subjects are involved.
Researchers must be aware of ethical considerations involved in voluntary and non-voluntary
participation, deception, informed consent, privacy and confidentiality, the right to discontinue,
and obligations of the experimenter.
References
American Psychological Association (1992), “Ethical principles in the conduct of research with human
participants’, American Psychologist, vol. 47.
British Psychological Society (BPS) (1993), Code of Conduct: Ethical Principles and Guidelines,
Leicester, BPS.
Milgram, S. (1974), Obedience to Authority, Harper Row, New York.
Rosenthal, R. & Jacobsen, L. (1968), Pygmalion in the Classroom, Holt, New York.
Zimbardo, P. & Ruch, F. (1973), Psychology and Life, Scott Foresman, New York.
Further reading
Australian Association for Research in Education (1998), Ethics in Educational Research: Annotated
Bibliography, ed. K. Halasa. AARE, Coldstream, Victoria.
Burgess, R. (1989), The Ethics of Educational Research, Falmer Press, London.
Clark, J. (1995), Ethical and Political Issues in Qualitative Research from a Philosophical Point of
View. Paper presented at the annual meeting of the American Educational Research Association,
San Francisco.
Doig, S. (1994), The Placement of Teacher Voice in Educational Research, Paper presented at the
AARE conference, Newcastle.
Evans, T. & Jakupec, V. (1996), ‘Research ethics in open and distance education’, Distance Education,
vol. 17, no. 1, pp. 15-22.
Jenkins, D. (1993), “An adversary’s account of SAFARI’s ethics of case study’, in Controversies in
Classroom Research, ed. M. Hammersley, Open University Press, Milton Keynes.
Kimmel, A. (1988), Ethics and Values in Applied Social Research, Sage, Beverly Hills.
Mohr, M.M. (1996), Ethics and Standards for Teacher Research. Paper delivered at American
Educational Research Association conference, New York.
Osbourne, B. (1995), Indigenous Education: Is There a Place for Non-indigenous Researchers? Paper
delivered at AARE conference.
Thompson, A. (1992), 'The ethics and politics of evaluation', Issues in Educational Research, vol. 2.
Wadeley, A. (1991), Ethics in Research and Practice, British Psychological Society, Leicester.
The difficulty is not due to a shortage of researchable problems in education. In fact,
there are so many that researchers usually have trouble choosing among them. The main
difficulty is that a problem must be selected and a question formulated early, when the
beginner’s understanding of how to do research is more limited. In addition,
uncertainties about the nature of research problems, the isolation of a problem, the
criteria for acceptability, and how to solve the problem often seem overwhelming. Even
experienced researchers usually find it necessary to make several attempts before they
arrive at a research problem that meets generally accepted criteria. The first attempt at
formulation may, on closer examination, be found to be unfeasible or not worth doing.
Skill in doing research is to a large extent a matter of making wise choices about what
to investigate.
Experience
A researcher must first of all decide on the general subject of investigation. Such choices
are necessarily very personal but should lead to an area that holds deep interest or about
which there is a real curiosity. Otherwise, the motivation to complete the research may
be difficult to sustain. The researcher’s own knowledge, experience, and circumstances
usually determine these choices.
For example, on a daily basis teachers make decisions about the probable effects of
educational practices on pupil behaviour. For instance, primary teachers may question
the effectiveness of their methods of teaching maths, or any of several other well-known
methods, in order to decide what is the most effective approach to use. Secondary social
studies teachers might wish to find out whether teaching about the problems of Third
World countries changes students’ attitudes to such countries and their inhabitants.
Observations of certain relationships for which adequate explanation does not exist
are another source of problems for investigation. A teacher may notice a decrease in self-
esteem in students at certain times. To investigate this the teacher can formulate various
tentative explanations, then proceed to test them empirically. This investigation may not
only solve the immediate problem but also make some small contribution to an
understanding of how self-esteem is affected by classroom influences.
Similarly, there are decisions to be made about practices that have become routine in
various professional areas—for example, penalties for lateness—which are based mainly
on tradition or authority, with little or no support from scientific research. Why not
evaluate some of these practices? Are there alternatives that would be more effective for
the purpose intended than those now being used?
Thus everyday experiences can yield worthwhile problems for investigation and, in
fact, most of the research ideas developed by beginning researchers tend to come from
their personal experiences. Such studies can often be justified on the basis of their
practical value.
From your everyday experience in your own professional areas as teacher or student, try
to isolate a problem suitable for investigation.
Theory
There are many theories in behavioural science that are popular theories rather than
scientific ones. These need to be tested by a variety of specific hypotheses to see in what
ways/contexts/conditions they may or may not hold. In this way, research contributes
to theory generation. For example, we are now aware that differences in performance
between boys and girls in specific school subjects such as reading, maths and science—
once believed in folklore theory to be innate sex differences—are in fact a function of
such variables as social expectation, conditioning, self-esteem and individual attribution.
In another example, failure to follow ‘instructions’ written on medical and social benefits
leaflets is not due to unwillingness, but often to the material being written at too high
a level for clients, i.e. a readability problem.
Review of literature
The review of literature is normally undertaken in two stages. The first stage involves a
general overview of the relevant area using secondary sources, such as general textbooks
which include relevant topics and literature reviews. Once the problem has been isolated,
a more specific and structured review involving primary sources of salient research can be
undertaken, using, for example, journal articles.
The preliminary review of the literature concentrates on more general texts and on
existing reviews of previous research which summarise the state of knowledge in a
particular area. Secondary sources, such as textbooks and reviews, are useful because
they combine knowledge from many primary sources into a single publication. A good
textbook, for example, combines the work of many other persons and simplifies or
eliminates much of the technical material which is not of interest to the general reader,
thus providing a quick and relatively easy method of obtaining a good overall
understanding of the field.
The review of the literature can help in limiting the individual’s research problem and
in defining it more clearly. Many attempted studies are doomed to failure before the
student starts because the problem has not been limited to an area small enough and
sufficiently specific to work with satisfactorily. It is far better in research to select a
limited problem and treat it well than to attempt the study of a broad general problem
and do it poorly. Many students also commit themselves to a research problem before
they have thought it out adequately.

[Cartoon: 'Where do I begin?']

A fuzzy or poorly defined problem can sometimes
result in the student collecting data and then learning that the data cannot be applied
to the problem the student wishes to attack. Before starting a review of the literature, the
student should do sufficient background reading from secondary sources to permit a
tentative outline of the research problem. The review of the literature will give the
student the knowledge needed to convert the tentative research problem to a detailed and
concise plan of action.
Reviews of previous research are a fertile source of research problems. Many research
reviews suggest extensions of the research topic and new questions are raised frequently as old
ones are answered. Many existing studies need replicating with different samples, for example,
in cross-cultural modes.
Education index
Psychological Abstracts provides a list of numbers under each key word in the index
volume. These numbers should then be located in the correct volume. Here, a brief
abstract will be provided of each article. This abstract is much more useful than the
brief bibliographic data found in Education Index, as the abstract helps a researcher
to decide whether a journal article is of value to them or not. Any potentially useful
article title should be written on a library index card. The amassing of these cards
will help when the actual articles are being looked for and will provide a basis for
the reference section of the thesis or article under production.
ERIC, an acronym for the Educational Resources Information Centre, was
initiated in 1965 by the US Office of Education to transmit the findings of current
educational researchers to teachers, administrators, researchers, and the public. Two
very useful preliminary sources are published by ERIC. These are Resources in
Education (RIE) and Current Index to Journals in Education (CIJE). Although ERIC
abstracts some of the same documents as Education Index and Psychological Abstracts,
it includes many documents not abstracted by these services. For example, RIE
provides abstracts of papers presented at education conferences, progress reports of
ongoing research studies, studies sponsored by federal research programs, and final
reports of projects conducted by local agencies such as school districts, which are not
likely to appear in education journals. Thus, ERIC will be valuable to the student
in providing an overview of the most current research being done in education. In
contrast, many of the studies currently referenced in Education Index and
Psychological Abstracts were completed several years previously because of the time
lag between completion of the study, publication in a journal, and abstracting by the
service.
The positions of the key fields are shown in the following sample record:
AN EJ330143 IR514912
TI VCRs Silently Take over the Classroom
AU Reider, William L.
JN TechTrends, v30 n8 p14-18 Nov-Dec 1985
AV Available from: UMI
LA Language: English
DT Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120); PROJECT DESCRIPTION (141)
JA Journal Announcement: CIJMAY86
TA Target Audience: Practitioners
AB Discusses the rapid growth of video cassette recorder (VCR) use in
schools; compares ways in which VCRs, audiovisual materials, and micro-
computers are used in classrooms; and suggests reasons for the dramatic
increase in VCR use. The successful implementation of VCR technology in
the Baltimore County School System (Maryland) is described. (MBR)
DE Descriptors: Adoption (Ideas); *Audiovisual Aids; *Educational Trends;
Financial Support; Futures (of Society); *Micro-computers; Teacher Role;
*Videotape Cassettes; *Videotape Recorders.
ID Identifiers: *Baltimore County Public Schools MD; Standardisation.
The CIJE works in the same way. Should the researcher want a hard copy
of the full document, it can be ordered from the source listed in the abstract entry.
If you discover a controversial article or theory and want to find out what later authors
thought of it or what successive researchers have discovered in relation to it you can use
the Social Science Citation Index (SSCI). Later works are located by looking up the key
author and seeing in which later studies the work has been mentioned.
Test sources
In conducting research, a test or measuring device is often required. Buros’s Mental
Measurement Yearbooks are the major reference sources that list and critically review
tests. These books are specifically designed to assist users in education, psychology and
industry to make more intelligent use of standardised tests. Each yearbook is arranged
in the same pattern and is meant to supplement rather than supersede the earlier
volumes. Tests are grouped by subject, and descriptions of each test are followed by
critical reviews and references to studies in which the test has been used. Each volume
has cross-references to reviews, excerpts, and bibliographic references in earlier volumes.
The volumes include aptitude and achievement tests in various subject areas, personality
and vocational tests, and intelligence tests. Complete information is provided for each
test, including cost and ordering instructions. Tests in Print III serves as an index and
supplement to the first eight Mental Measurement Yearbooks. Buros also organises the
material in the Mental Measurement Yearbooks into specialised monographs on tests of
personality, reading, intelligence, vocational and business skills, English, foreign
languages, mathematics, science, and social studies. The Psychological Test Bulletin
published by ACER twice per year includes independent test reviews, research reports,
articles on testing, Australian norms and information on new tests.
We are asking the computer for any references that include any combination of each
pairing, such as (1 or 2) and (3 or 4) and (5 or 6). 'Or' connections increase the number of
references selected, while 'and' connections reduce the number selected.
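The effect of 'or' and 'and' can be sketched in a few lines of Python, using sets to stand in for the references matching each keyword (the keyword names and reference IDs are hypothetical):

refs = {
    "keyword_1": {1, 2, 3, 4},
    "keyword_2": {3, 4, 5},
    "keyword_3": {2, 4, 6},
}

either = refs["keyword_1"] | refs["keyword_2"]   # 'or' = union: widens to 5 references
both = either & refs["keyword_3"]                # 'and' = intersection: narrows to 2
print(len(either), len(both))                    # 5 2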
The finished review should:
• describe the work which has been done, being critical where necessary;
• summarise the main facts and conclusions which emerge, synthesising to produce
main themes, directions, contradictions etc.; and
• point out those areas of the field which are still inadequately covered.
Further reading
Brown, J., Sitts, M. & Yarborough, J. (1975), ERIC: What It Can Do For You. How To Use It.
Syracuse, ERIC Clearinghouse on Information Services.
Buros, O. (ed.) (1978), Mental Measurements Yearbooks, 8th edn, University of Nebraska Press, Lincoln.
Cooper H. (1998), Synthesising Research: A Guide for Literature Reviews, Sage, London.
Fink, A. (1998), Conducting Research Literature Reviews, Sage, London.
Hart, C. (1998), Doing a Literature Review, Open University, Milton Keynes.
Mann, L. & Sabatino, D. (1980), Reviews of Special Education, JSE Press, Philadelphia.
Mitchell, J.V. (1983), Tests in Print III: An Index to Test Reviews and the Literature on Specific Tests,
University of Nebraska Press, Lincoln, NE.
Mitzel, H. (ed.) (1982), Encyclopedia of Educational Research, 5th edn, Glencoe, New York.
Thesaurus of ERIC Descriptors (1980), Onyx Press, Phoenix.
QUANTITATIVE METHODS
4 DESCRIPTIVE STATISTICS
8 LEVELS OF MEASUREMENT
9 VARIABLES
CHI SQUARE
16 TESTING HYPOTHESES OF RELATIONSHIP II:
CORRELATION
21 META-ANALYSIS
[Flowchart: define a research problem → formulate hypotheses → design the study → select samples and instruments]
Descriptive statistics
The descriptive aspect of statistics allows researchers to summarise large quantities of
data using measures that are easily understood by an observer. It would always be
possible, of course, simply to present a long list of measurements for each characteristic
observed. In a study of ages at leaving school, for example, we might just present the
reader with a listing of the ages of all students leaving school within the past year in a
particular state, or in a study of sex offenders, the sex of persons convicted of child abuse
in 1998. This kind of detail, however, is not easy to assess—the reader simply gets
bogged down in numbers. Instead of presenting all observations we could use one of
several statistical measures that would summarise, for example, the typical age at leaving
school in the collection of data. This would be much more meaningful to most people
than the complete listing.
Thus descriptive statistics consist of graphical and numerical techniques for
summarising data, i.e. reducing a large mass of data to simpler, more understandable
terms.
Inferential statistics
Other statistical methods, termed inferential, consist of procedures for making
generalisations about characteristics of a population based on information obtained from
a sample taken from that population.
Over the last couple of decades, social scientists have increasingly recognised the
power of inferential statistical methods. Hence, a discussion of statistical inference
occupies a large portion of this textbook. The basic inferential procedures of estimation
and hypothesis testing are explained in chapters 6 and 7.
Overall we can say that statistics consist of a set of methods and rules for organising
and interpreting data. By common usage, statistics means facts and figures. In our
approach it means techniques and procedures for analysing data.
The mean
By far the most common measure of central tendency in educational research is the
arithmetic mean. The mean (M) is simply the sum of all the scores (ΣX) divided by the
number of scores (N), or

M = ΣX / N
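As a quick check, a couple of lines of Python (a sketch, not part of the original text) reproduce the formula on the five scores used later in this chapter:

scores = [15, 12, 9, 10, 14]
M = sum(scores) / len(scores)   # M = sum of all scores / number of scores
print(M)                        # 12.0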
The median
The word median means ‘middle item’. Thus, when we have a series of scores which
contain an extreme value or values, it would be sensible to arrange them in rank order
so that the highest value is at the top of the list, and the remaining scores are placed in
descending order of magnitude with the score ofleast value at the bottom of the list. The
median value will be the central value.
For example, if we have a series of nine scores, there will be four scores above the
median and four below. This is illustrated as follows:
16 6 11 24 17 4 19 9 20
Arranged in order of magnitude these scores become:
24 20 19 17 16 11 9 6 4
The central (fifth) value, 16, is the median.
In our example we had a set of odd numbers which made the calculation of the
median easy. Suppose, however, we had been faced with an even set of numbers.
This time there would not be a central value, but a pair of central values. No real
difficulty is presented here, for the median is to be found halfway between these two
values.
Let us put the following numbers in rank order and find the median score:
16 29 20 9 34 10 23 4 15 12
In rank order these numbers appear as follows:
34 29 23 20 16 15 12 10 9 4
median = (16 + 15) / 2 = 15.5
Look at this mean of 63 in relation to the data and comment on what you notice.
Hopefully you realised that the mean was larger than any of the other scores except the
extreme score. This example of having one number much greater in value than the other
numbers presents a real problem, for it renders the mean untypical, unrealistic and
unrepresentative.
The median has the desirable property of being insensitive to extreme scores. In the
distribution ofscores of 66, 70, 72, 76, 80 and 96, the median of the distribution would
remain exactly the same if the lowest score were 1 rather than 66, or the highest score
were 1223 rather than 96. The mean, on the other hand, would differ widely with these
other scores. Figure 4.2 shows the effect of extreme scores. The mean of 20.3 is hardly
representative. It would not matter in this case if the extreme score were 1000, the
median would remain the same at 11.50.
[Figure 4.2: A distribution with an extreme score. Most scores lie between 10 and 15 and the median is 11.50, but the extreme score at 100 pulls the mean up to 20.30]
However, using the median often severely limits any statistical tests that can be used to
analyse the data further, since the median is an element of ordinal or ranked data, whereas
the mean is a major feature of interval data (chapter 8).
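The point is easy to verify in Python with the distribution quoted above (a sketch using the standard library, not part of the original text):

import statistics

scores = [66, 70, 72, 76, 80, 96]
extreme = [1, 70, 72, 76, 80, 1223]     # lowest and highest replaced by extreme values

print(statistics.median(scores), statistics.mean(scores))    # 74.0 and about 76.7
print(statistics.median(extreme), statistics.mean(extreme))  # still 74.0, but mean ~253.7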
Why should the mean be used rather than the median or mode?
Range
One method of considering variability is to calculate the range between the lowest and
the highest scores. This is not a very good method, however, since the range is
considerably influenced by extreme scores.
Variance
The shortcoming of the range as a measure of variability is that it reflects the values of
only two scores in the entire sample. A better measure of variability would incorporate
every score in the distribution rather than just two scores. One might think that the
average deviation from the mean would serve:

Σ(X − M) / N
This hypothetical measure is unworkable, however, because some of the scores are
greater than the mean and some are smaller, so that the numerator is a sum of both
positive and negative terms. (In fact, it turns out that the sum of the positive terms
equals the sum of the negative terms, so that the expression shown above always equals
zero.)
The solution to this problem is simply to square all the terms in the numerator, thus
making them all positive. The resulting measure of variability is called the variance (V):

V = Σ(X − M)² / N
Variance is the average squared deviation from the mean, that is, the sum of the
squared deviations from the mean divided by the number of scores. The sum of (X − M)²
can also be written as SS which stands for the Sum of Squares or sum of squared
deviations from the mean.
Standard deviation
This is the most important measure of dispersal. It is often symbolised as 'σ' or 'SD'.
Some researchers use σ with populations and SD with samples. It reflects the amount of
spread that the scores exhibit around some central tendency measure, usually the mean.
The standard deviation is derived from the variance (V); it is obtained by taking the
square root of the variance.
Thus,

σ or SD = √V = √(Σ(X − M)² / N) = √(SS / N), or in raw-score terms √((ΣX² − (ΣX)²/N) / N)
In our example in Table 4.1, SD is about 2.1, the square root of the variance which
is 4.4.
Here is another example. Imagine our data is 15, 12, 9, 10 and 14.
1 Obtain the mean of the values (M = 12).
2 Calculate the differences of the values from the mean (3; 0; 3; 2; 2).
3 Obtain the squares of these differences (9; 0; 9; 4; 4).
4 Find the sum of the squares of the differences (26).
5 Divide by the number of items: 26/5 = 5.2. This is the variance.
6 Obtain the square root of the variance, which is the standard deviation: √5.2 ≈ 2.3.
The formula for obtaining the standard deviation is:
σ = √(Σ(X − M)² / N), which in our example = √(26/5) = √5.2 ≈ 2.3
sample variance = Σ(X − M)² / (N − 1), or SS / (N − 1)
N — 1 will provide a better estimate of the population SD. As you will realise, as
sample N gets larger and the difference between size of sample and size of population
becomes much reduced, the difference between dividing by N or N — 1 becomes
negligible. Purely by chance, a sample may include an extreme case. But the larger the
size of that sample, the less the effect of an extreme case on the average. If a sample of
three persons includes an exceptionally short person, the average height will be unusually
far from the population mean. In a sample of 3000, one exceptionally short person will
not affect the average very markedly. However, as you will see, research generally involves
samples which are considerably smaller than the population from which they are drawn.
Hence the use of N — 1 as denominator is strongly advised.
This formula,

σ = √(Σ(X − M)² / (N − 1))

works well if M is a whole number, but often it is not, and the computation then involves
unwieldy deviations running to perhaps two decimal places. An equivalent computational
formula using raw scores avoids this:

σ = √((ΣX² − (ΣX)²/N) / (N − 1))

where ΣX² is the sum of the squared raw scores and (ΣX)² is the sum of the raw scores
squared.
This formula is mathematically equivalent to the first formula above. Let us check that
we do get the same answer.
X        X²
15      225
12      144
 9       81
10      100
14      196
ΣX = 60    ΣX² = 746
σ = √((746 − (60)² / 5) / (5 − 1)) = √((746 − 720) / 4) = √(26 / 4) = √6.5 ≈ 2.55
This formula too must employ N — | as the denominator when used with samples.
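The equivalence of the two formulas is easy to confirm in Python (a sketch reproducing the worked example above, not part of the original text):

scores = [15, 12, 9, 10, 14]
N = len(scores)
M = sum(scores) / N                                          # 12.0

ss_dev = sum((x - M) ** 2 for x in scores)                   # sum of squared deviations = 26
ss_raw = sum(x * x for x in scores) - sum(scores) ** 2 / N   # 746 - 60**2/5 = 26

print(ss_dev == ss_raw)                                      # True
print((ss_dev / N) ** 0.5)                                   # 2.28..., dividing by N
print((ss_dev / (N - 1)) ** 0.5)                             # 2.55..., dividing by N - 1 for samples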
Figure 4.3 shows two different standard deviations: one with a clustered appearance,
the other with scores well spread out illustrating clearly the relationship of spread to
standard deviation.
Generally, the larger the σ, the greater the dispersal of scores; the smaller the σ, the
smaller the spread of scores: the SD increases in proportion to the spread of the scores around
M as the marker point. But measures of central tendency tell us nothing about the
standard deviation and vice versa. Like the mean, the standard deviation should be used
with caution with highly skewed data, since the squaring of an extreme score would
carry a disproportionate weight. It is therefore recommended where M is also
appropriate.
Adding a constant to every score does not change a standard deviation. This is because
each score still remains at the same distance from other scores as it did before, and the
mean simply increases by the constant too. Therefore deviations from the mean remain
the same. However, multiplying each score by a constant causes the standard deviation
to be multiplied by that constant as each distance between scores is also multiplied by
the same constant. For example, given three scores 10, 12 and 14 with a mean of 12, we
can multiply by 3 and obtain 30, 36 and 42 with a mean of 36. The deviation of each
score from the mean has increased three times and so will the standard deviation.

[Figure 4.3: Two distributions with the same mean (M = 20) but different standard deviations: σ = 2 (clustered) and σ = 6 (well spread out)]
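Both effects can be demonstrated in a few lines of Python (a sketch, not from the original text):

import statistics

scores = [10, 12, 14]                     # M = 12, as in the example above
sd = statistics.pstdev(scores)

added = [x + 5 for x in scores]           # adding a constant: SD unchanged
tripled = [x * 3 for x in scores]         # multiplying by 3: SD is tripled

print(sd, statistics.pstdev(added), statistics.pstdev(tripled))
# 1.63...  1.63...  4.89...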
So in describing an array of data, researchers typically present two descriptive statistics:
the mean and the standard deviation. Although there are other measures of central
tendency and dispersion, these are the most useful for descriptive purposes. Variance is
used extensively, as we shall see, in inferential statistics.
Example procedure
1 Select the column you wish to label with a variable. This is usually the first free
column from the left in the data editor window. To select the column make sure
your cursor or the active cell which is framed in bold is in that column.
2 Click on Data from the menu bar to produce a drop-down menu.
3 On the drop down menu, select Define Variable which opens the Define Variable
dialogue box.
4 Variable Name: Variable names can be up to eight characters long, so the SPSS
default name 'var00001', which is to represent the qualifications of the teachers, will
be shortened to 'qualif'. Typing 'qualif' deletes the highlighted default name
'var00001' in the Variable Name: box.
Next click on the Label button to display the Define Labels box.
In the Variable Label box, type the full variable name ‘qualification’.
In the Value space type the figure ‘1’, and in the Value label space type ‘Certificate’.
Select Add which puts 1.00 = ‘certificate’ in the bottom box.
Repeat this procedure for the remaining four values of 2 for ‘diploma’, 3 for ‘bachelors
degree’, 4 for ‘masters degree’ and 5 for ‘Ph.D’.
Descriptive statistics
1 Click on Statistics to display a drop-down menu.
2 From this menu, select Summarize, then Explore to obtain the Explore dialogue box.
                    Gender                                Statistic   Std error
STATS      Female   Mean                                    45.1429      1.4842
                    95% confidence      Lower bound         42.1455
                    interval for mean   Upper bound         48.1402
                    5% trimmed mean                         44.9418
                    Median                                  44.0000
                    Variance                                92.516
                    Std deviation                            9.6185
                    Minimum                                 29.00
                    Maximum                                 65.00
                    Range                                   36.00
                    Interquartile range                     14.2500
                    Skewness                                  .225       .365
                    Kurtosis                                 -.610       .717
           Male     Mean                                    42.8571      1.5365
                    95% confidence      Lower bound         39.7542
                    interval for mean   Upper bound         45.9601
                    5% trimmed mean                         42.9444
                    Median                                  44.0000
                    Variance                                99.150
                    Std deviation                            9.9574
                    Minimum                                 25.00
                    Maximum                                 59.00
                    Range                                   34.00
                    Interquartile range                     15.5000
                    Skewness                                 -.121       .365
                    Kurtosis                                 -.987       .717
SELFCONC   Female   Mean                                    44.0476      1.4293
                    95% confidence      Lower bound         41.1611
                    interval for mean   Upper bound         46.9342
                    5% trimmed mean                         44.0423
                    Median                                  45.0000
                    Variance                                85.803
                    Std deviation                            9.2630
                    Minimum                                 26.00
                    Maximum                                 62.00
                    Range                                   36.00
                    Interquartile range                     10.0000
                    Skewness                                  .067       .365
                    Kurtosis                                 -.290       .717
           Male     Mean                                    39.3333      1.9226
                    95% confidence      Lower bound         35.4505
                    interval for mean   Upper bound         43.2161
                    5% trimmed mean                         40.0556
                    Median                                  40.0000
                    Variance                               155.252
                    Std deviation                           12.4600
                    Minimum                                  7.00
                    Maximum                                 58.00
                    Range                                   51.00
                    Interquartile range                     12.7500
                    Skewness                                 -.620       .365
                    Kurtosis                                  .590       .717
[Boxplot from the Explore output: Statistics scores plotted by Gender; N = 42 female, 42 male]
[Two histograms of Marks drawn on different scales: one distribution spanning roughly 35 to 80, the other roughly 30 to 60]
• they are all converted into the M and σ of one of the existing distributions.
When an entire distribution is standardised, each individual standard score occupies the same relative position among the other standard scores as the raw score did originally. A standard score is a transformed score that provides information about its location in a distribution.
A standardised distribution is composed of standardised scores that result in a
predetermined value for M and standard deviation regardless of their values in the
raw score distribution. Standardised scores do not alter the rank order of
performance; the highest scoring person in raw score terms will still maintain that
position though the numerical value of the score may change.
The most common standard scale in test measurement is known as the Z score. Such Z scores take account of both the M and the σ of the distribution. The formula is:

Z = (score − Mean) / SD of distribution = (X − M) / σ

The mean of the Z score distribution is always 0 and the σ is always 1.
Let us look at some examples.
EXAMPLE 1
The mean score on a test is 50 and the standard deviation is 10. What is the standard
score for John who scores 65?
Z = (X − M) / σ = (65 − 50) / 10 = +1.5

John's score of 65 therefore lies 1.5 standard deviations above the mean.
EXAMPLE 2
As part of an apprenticeship selection assessment, school leavers are required to take three
tests consisting of Test 1 (numeracy), Test 2 (literacy) and Test 3 (general knowledge).
Given the following results for Candidate R, and assuming normal distribution of
results, on which of the three tests does that individual do best?
Z = (X − M) / SD is computed for each of the three tests. The resulting standard scores place the general knowledge score at about −2, the literacy score at about −0.3, and the numeracy score well above the mean.

[Number line: the three test scores plotted in standard deviation units from −3 to +3, with the general knowledge score lowest, the literacy score just below the mean, and the numeracy score highest]

Candidate R therefore does best on numeracy.
You can probably detect the source of the major disadvantage in adding and averaging raw scores from the above example. It lies in the size of the σ of each distribution. If you do badly in a test with a small σ, it will not affect you as much as doing poorly in a test with a large σ. In raw score terms, this latter situation implies a larger loss of marks, whereas your position relative to others in the group might be very similar in both cases. The converse applies too. Doing well in a test with a large σ will be a better boost to the raw score average than a similar relative position in a test with a small σ. Tests with a large σ carry more weight in the summation and averaging of raw scores than tests with a small σ because there is a greater range of marks around M in the former. Z scores provide a standard unit of relative worth; for example, +1Z above the mean in any test is always a position exceeded by only 16 per cent of the population, provided the distribution is normal.
T scores and deviation IQs are other forms of standard scores. T scores have a mean
of 50 and a standard deviation of 10, while deviation IQs have a mean of 100 and a
standard deviation of 15. Standard scores combine all the information needed to evaluate
a raw score into a single value that specifies a location within a normal distribution.
Stanines are formed by dividing the normal curve into nine equal intervals along the
baseline. Rather than provide a score, a stanine provides a grade from 1 to 9. Within each
grade are a set percentage of scores in the following sequence from the left-hand extreme
end of the normal curve to the extreme right-hand end: 4%, 8%, 12%, 16%, 20%,
16%, 12%, 8%, 4%.
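To illustrate these conversions, the following short Python sketch (mine, not the book's; the function names are invented for the example) converts a raw score to a Z score and then on to the T score and deviation IQ scales just described.

    def z_score(x, mean, sd):
        # Standard score: how many SDs the raw score lies from the mean
        return (x - mean) / sd

    def rescale(z, new_mean, new_sd):
        # Place a Z score on a standardised scale with a chosen M and SD
        return new_mean + z * new_sd

    # Example 1 from the text: M = 50, SD = 10, John scores 65
    z = z_score(65, 50, 10)     # +1.5
    t = rescale(z, 50, 10)      # T score: 65.0
    iq = rescale(z, 100, 15)    # deviation IQ: 122.5

    print(z, t, iq)             # 1.5 65.0 122.5

Because each scale is just a linear rescaling of Z, the rank order of candidates is preserved, exactly as noted above.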
1 If a student received a Z score of 0, one would know that this student's raw score was
a below
b above
c equal to
the mean.
2 Below are listed the scores made by five students on maths and spelling tests. The
maths test had a mean of 30 and a standard deviation of 3. The spelling test had a
mean of 45 and a standard deviation of 5. For each, say whether the student did
better on the maths or the spelling test, or did equally well on both:
             John   Hui   Rachel   Chris   Zola
MATHS          36    33      27      33     36
SPELLING       35    55      40      45     55
3 A student was told that she had made a Z score of +1.5 on a test where the mean
was 82 and the standard deviation was 6. What raw score did this student obtain?
4 What IQs are represented by the following standard scores?
(mean = 100, σ = 15) (a) Z = 2 (b) Z = -1 (c) Z = 1.5 (d) Z = -0.66
(e) Z=0
5 What information does a Z score provide?
6 Given a mean of 45 and a standard deviation of 5, find the Z scores of the following:
47, 39, 56
and the raw scores of the following:
+1, -3.0,+2.8
7 Why is it possible to compare scores from different distributions after each
distribution is transformed into Z scores?
8 Distribution A has a M = 20 and SD = 7 while distribution B has a M = 23 and SD =2.
In which distribution will a raw score of 27 have higher standing?
9 A population has a mean of 37 and a standard deviation of 2. If it is transformed into
a distribution with a mean of 100 and a standard deviation of 20, what values will
the following scores have: 35, 36, 37, 38 and 39?
10 Given scores of 2, 4, 6, 10 and 13 in a distribution with a mean of 7 and a standard
deviation of 4, transform these scores into a distribution with a mean of 50 and a
standard deviation of 20.
*Answers on p. 594.
Frequency distributions
So far, we have considered data only in terms of its central tendency and its dispersal or scatter. If we plot the data we have obtained from any of our previous observations, we
would find a host of differently shaped curves when the graph lines were drawn. These
graphs are often called frequency distributions. The X-axis, or base line, supplies values
of the scores with lowest values placed to the left and increasing values to the right.
Every possible score should be capable of being located unambiguously somewhere along
the X-axis. In a frequency distribution, the vertical, or Y-axis, represents the frequency of
occurrences, i.e. values of N.
These many differently shaped frequency distributions can be classified as normal or
skewed. Normal distributions are symmetrical, affected only by random influences—i.e.
influences that are just as likely to make a score larger than the mean as to make it
smaller—and will tend to balance out, as the most frequently found scores are located
in the middle of the range with extreme scores becoming progressively rarer (see Figure 5.1).
Skewed frequency distributions are biased by factors that tend to push scores one way more than another (see Figures 5.2 and 5.3). The direction of skew is named after the direction in which the longer tail is pointing. Imagine Figures 5.2 and 5.3 were the distributions of scores of two end-of-year examinations. What reasons could you suggest might have caused the skewness in each case?
FIGURE 5.1 Normal distribution (Gaussian curve)
[Symmetrical curve: Frequency plotted against Score, with the mean, median and mode coinciding at the peak]

FIGURE 5.2 A skewed distribution (the median marked on the Score axis)

FIGURE 5.3 A skewed distribution (the mode marked on the Score axis)
EXAMPLE 1
What proportion of the total area lies between 0 and 1.5σ? To find the answer, look up 1.50 in the Z column. Then look across to the next column. The answer is 43.32% (since the figures are given as proportions of 1, read 0.4332 as 43.32%).
EXAMPLE 2
What proportion of the total area lies beyond −2.3σ? Look up 2.3 in the Z column (ignore the negative sign, since the curve is symmetrical). It is 48.93%. But remember we want the area beyond; therefore the answer is 1.07%, i.e. 50.00% − 48.93%. Remember this table only covers one half of the curve.
EXAMPLE 3
If one of the Z scores is positive and the other is negative, we find the proportion of the curve between them by adding values from column 2 in Table 5.1. What proportion of the curve lies between a Z of −1.6 and a Z of +0.5? Column 2 indicates that the proportion between Z = −1.6 and the mean is 0.4452, and from the mean to Z = +0.5 is 0.1915. Therefore, the proportion between Z = −1.6 and Z = +0.5 is 0.4452 + 0.1915, or 0.6367.
This means that 63.67 per cent of the cases in a normal distribution will fall between
these Z scores.
EXAMPLE 4
When we want the proportion of the normal curve falling between two Z scores with
the same sign, we subtract the area for the smaller Z score from the area for the larger Z
score. For example, let us find the proportion of cases between a Z of −0.68 and a Z of −0.98 in a normal distribution. Column 2 in Table 5.1 indicates that the area between the mean and a Z of 0.98 is 0.3365, while the area between the mean and a Z of 0.68 is 0.2517. Thus, the area between Z = 0.68 and Z = 0.98 is found by subtracting the area for the smaller Z score from the area for the larger Z score; in this case, 0.3365 − 0.2517
= 0.0848. We would expect 8.48 per cent of the cases in a normal distribution to fall
between these Z score points.
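Where Table 5.1 is not to hand, the same areas can be computed directly from the normal curve. Here is a minimal Python sketch (an added illustration using only the standard library) that reproduces Examples 1 to 4:

    import math

    def phi(z):
        # Cumulative area under the standard normal curve up to z
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Example 1: area between the mean (Z = 0) and Z = 1.5
    print(phi(1.5) - phi(0))        # 0.4332 -> 43.32%

    # Example 2: area beyond Z = -2.3
    print(phi(-2.3))                # 0.0107 -> 1.07%

    # Example 3: area between Z = -1.6 and Z = +0.5
    print(phi(0.5) - phi(-1.6))     # 0.6367 -> 63.67%

    # Example 4: area between Z = -0.98 and Z = -0.68
    print(phi(-0.68) - phi(-0.98))  # 0.0848 -> 8.48%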
Frequency distributions are thus convenient ways of describing all the basic information contained in the measurement of any variable. They are charts or tables showing the frequency with which each of a number of values of a variable is observed.
FIGURE 5.5 [a frequency distribution curve; caption illegible in the source]
There is a family of normal distribution curves whose shapes depend on the mean and
standard deviation of the distribution. However, they all retain the same mathematical
characteristics. Figure 5.6 illustrates this.
[FIGURE 5.6: normal curves of varying shape, each symmetrical about the point where its mean, median and mode coincide; Frequency plotted against Score]
The normal distribution curve is the theoretical distribution of chance or random
occurrences, and is thus centrally related to probability. Normal distribution curves
possess some notable characteristics which are always present; otherwise normality would
not exist. Figure 5.7 depicts these constants.
FIGURE 5.7 The shape and important characteristics of the normal, or Gaussian,
distribution
[The curve is bilaterally symmetrical; its inflection points lie at ±1σ; the effective range is about ±3σ (or Z); the baseline is marked in σ units; the midpoint, 0, is the mean, median and mode]
FIGURE 5.8 [A normal distribution with M = 50 and σ = 10; the baseline shows raw scores 20 to 80 aligned with standard deviation units −3 to +3]
As you see, for every unit of Z the score increases or decreases by 10 marks, i.e. +1Z
is 10 marks higher than M since we regard the Z as a standard deviation unit. Most
standardised published tests of intelligence, aptitude, and attitude are standardised to give
a mean of 100 and a standard deviation of 15, i.e. a child who scores an IQ of 130 is two
standard deviations above the mean, and referring to Table 5.1, we note that this score
is only bettered by approximately 2.5 per cent of the child population of this age.
Probability
Probability can best be defined as the likelihood of something happening. Probability is usually given the symbol 'p' and can be thought of as the percentage of occurrences of a particular event to be expected in the long run. For example, a fair dice will produce
in the long run a ‘5’ on 1/6, or 16.66 per cent of all rolls because each of the six sides is
equally likely to come up on each roll.
A roulette wheel has thirty-six numbers on it. On average, how many times would
we expect any number to occur in 360 spins? In thirty-six spins?
In the first case you should have answered ten, and in the second case the answer is
one. On average, over a large number of spins, we might expect each number to occur
one time in every thirty-six spins. This fact is reflected in the odds given by a casino to
It is possible, if not very probable, that we can get a very long run of heads by chance
alone, we argue, but at some point we have to draw the line and start to believe in foul
play rather than chance. At what point would you definitely believe the coin to be biased?
What is the probability of getting such a run of heads by chance alone? Look at Table 5.2.
Here are the actual probabilities of getting a run of heads for each size of run.
TABLE 5.2

Toss                           1st     2nd     3rd     4th     5th
Probability                   .500    .250    .125    .063    .031
No. of times likely to
occur by chance in 100          50      25    12.5     6.3     3.1

Toss                           6th     7th     8th     9th    10th
Probability                   .016    .008    .004    .002    .001
No. of times likely to
occur by chance in 100         1.6     0.8     0.4     0.2     0.1
You probably notice that on the first toss the probability is 0.5, i.e. a 50/50 chance
of obtaining heads or tails. For succeeding heads, the probability is half the preceding
probability level. These probability levels are based on the assumption that only chance
factors are operating, i.e. the coin is symmetrical, uniform and is not double-headed!
Thus, the coin has no tendency whatsoever to fall more often on one side rather than
another; there are absolutely even chances of heads and tails.
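The halving pattern in Table 5.2 is simply p = 0.5 raised to the power of the run length. A short Python loop (my illustration) regenerates the table; Table 5.2 shows these values rounded to three decimal places:

    # Probability of an unbroken run of n heads from a fair coin: 0.5 ** n
    for n in range(1, 11):
        p = 0.5 ** n
        print(f"run of {n:2d} heads: p = {p:.4f} "
              f"(about {100 * p:.3g} times in 100)")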
Look at Figure 5.9. It is a normal distribution curve with the three standard deviations above and below the mean marked as you saw earlier. In addition, the approximate p level of each point is added. At M there is a probability of 0.5; at +1σ from M, the
Statistical significance
By now you should have grasped the principles that:
• the normal distribution curve depicts the distribution of chance occurrences round a mean;
The probabilities of a chance occurrence of greater than 0.05 are usually designated
‘not statistically significant’ or, ‘NS’.
Can you now explain to yourself why such results are regarded as not statistically
significant?
You may also encounter another form of expressing statistical significance, in that
probability may be expressed as a percentage which is termed a significance level; for
example, p < 0.01 becomes the 1% significance level.
The probability of a particular event A is the number of outcomes classified as A divided by the total
number of possible outcomes. All probability problems can be restated as a proportion problem,
e.g. the probability of selecting a king from a deck of cards is equivalent to stating the proportion
of the deck that consists of kings.
Why sample?
We could not carry out everyday life and business if we did not employ sampling in our
decisions. The food purchaser examines the fruit on display and, using this as a sample,
decides whether to buy or not. The official consumer survey behaves likewise with certain
goods, a sample of which are tested. A teacher samples the increase in learning among his
or her students by an examination which usually only tests a part of that learning. Many
of our decisions are based on sampling—possibly inadequate sampling in some cases.
We are all guilty on many occasions of making generalisations about groups of people,
or making inferences about individuals based on very limited experience or knowledge
of them. We might meet one member of a group, say a Welshman who can sing, and
this causes us to attribute such vocal ability to all natives of that country. One student
once said to me, ‘I don’t like Norwegians; I met one once’. Likewise, we read in
newspapers that ‘people have no moral values now’, or that ‘politicians are corrupt’.
Such generalisation is invalid, yet generalisation is necessary in research.
The educational researcher is not just interested in the students chosen for a particular
survey; they are interested in students in general. They hope to demonstrate that the
results obtained would be true for other groups of students.
The concept of sampling involves taking a portion of the population, making
observations on this smaller group and then generalising the findings to the large
population. Generalisation is a necessary scientific procedure, since rarely is it possible
to study all members of a defined population.
Population and sample
When you are wearing your researcher’s hat, always remember that a population is a
complete set of all those things (people, numbers, societies, bacteria, etc.) which
completely satisfy some specification; for example, all colour-blind male secondary school
pupils. In this context, the population is not a demographic concept.
As the total number of potential units for observation, it can have relatively few units
(e.g. all the students in the social science faculty); a large number of units (all Australian
teachers); or an infinite number of units (e.g. all the possible outcomes obtained by
tossing a coin an infinite number of times). So a population is an entire group of people
or objects or events which all have at least one characteristic in common, and must be
defined specifically and unambiguously.
The first task in sampling is to identify and define precisely the population to be
sampled. If we are studying immigrant children we must define the ‘population’ of
immigrant children: what ages are included by the term ‘children’; what countries are
implied by ‘immigrant’; whether we refer to migration from one’s place of birth or
merely migration from a former place of residence, and so on. Careful attention must be given to the precise limits of the population, and to whether or not to include individuals
whose position is marginal. For example, in defining a population of primary school
children, does one exclude children from private schools? children attending special
schools for the handicapped? children outside the normal age-range of primary school
who happen to be at primary schools? and so on.
A sample is any part of a population, regardless of whether it is representative or not. The concept of representativeness is not implicit in the concept of a sample. One of our
great concerns in this current chapter will be to distinguish between samples which are
in some sense representative of a population and those which are not, and to demonstrate
ways of drawing samples that will be representative.
The major task in sampling is to select a sample from the defined population by an
appropriate technique that ensures the sample is representative of the population and as
far as possible not biased in any way. Since we can rarely test the defined population, our
only hope of making any generalisation from the sample is if the latter is a replica of that
population reflecting accurately the proportion or relative frequency of relevant
characteristics in the defined population.
The key word in the sample population relationship is representativeness. We cannot
make any valid generalisation about the population from which the sample was drawn
unless the sample is representative. But representative in terms of what?—weight? IQ?
political persuasion? cleanliness? The answer is that the sample must be representative
in terms of those variables which are known to be related to the characteristics we wish
to study. The size of the sample is important too, as we shall see, but it will suffice for
now to remember from our Welshman who could sing, and from the student who
disliked his one Norwegian contact, that we are on very dangerous ground trying to
generalise from a sample of one. Usually, the smaller the sample, the lower the accuracy.
However, size is less important than representativeness. There is a famous example of a
sample of ten million voters supposedly representative of the electorate in the USA to
poll voting intentions for the presidential election in 1936. The forecast, despite the
[Diagram: sample statistics are used to estimate population parameters]
A research worker interested in studying ego strength among 12-year-old males found that
the average ego strength level in a selected group from the Gas Works High School was
higher than that of all the 12-year-old males in the city.
Educationists have few lists they can use and the tendency is to grab whatever sample is
handy; often whatever group they are lecturing to, or schools where they have a good
relationship with the principal (opportunity sampling). As a result, much research is
carried out on captive groups of pupils and students, who are a biased sample of the
general population. Can you think of the ways in which they are biased?
Techniques of sampling
In order to draw representative samples from which valid generalisations can be made to the
population, a number of techniques are available. These techniques are, of course, the ideal.
Few researchers, apart from government bodies, have the resources and time to obtain truly
representative samples. For most research, investigators often have to make do with whatever
subjects they can gain access to. Generalisations from such samples are not valid and the
results only relate to the subjects from whom they were derived. This lack of representative
samples is a deficiency in many research studies and should be noted in any research project
write-up or evaluation of a research paper.
This method can be speeded up by the use of computers which select at random. By
random we are implying ‘without bias’ and the sample is drawn unit by unit, with all
members of the population having an equal chance of selection. Results may be
generalised to the population but even a random selection is never a completely accurate
reflection of the population from which it was drawn. There is always some sampling
error and the generalisation is an inference, not a certainty, because the sample can never
be exactly the same as the population.
Random sampling can be ‘staged’; that is, a sample of schools from a population of
schools (Stage 1), then a sample of children from each selected school (Stage 2). There
can be more than two stages but this is dealt with later under cluster sampling.
Try this exercise to prove to yourself that sampling error does occur even with random
sampling. In Table 6.2, there are ten columns. In each column, you are going to list ten
single digit numbers drawn at random from Table 6.1. (The first column is done for
you.)
To draw these random samples, write the numbers 1-15 on separate slips of paper
representing the rows of Table 6.1, and 1-10 on separate slips of paper representing the
columns of Table 6.1. Place each set of slips in a separate box and shuffle them around.
Draw out one slip from the first box and one from the second. The number of the first
slip locates the row for your first random numbers and the second slip locates its column.
Take the first number in the set of four in the table you have located. Thus, if the numbers
you draw first are 15 and 2, your first random number will be 3. After drawing each slip
replace it in its box. Continue the drawing process until you have selected all the random
numbers you need. When you have drawn the remaining nine samples of ten random
numbers, compute the arithmetic mean for each column.
The population of single digits has a mean of

(0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) / 10 = 4.50

The first column of Table 6.2, completed for you, contains 9, 7, 6, 7, 3, 8, 4, 1, 3, 2:

total = 50, so the sample mean = 50/10 = 5.0
Now we can see if any sampling error has occurred in your random samples in Table 6.2.
Are any of your sample means exactly 4.5? This is not likely, though many will be close to
it. When we tried it, our sample averages ranged from 2.9 to 6.2, with 4.3 as the sample
average closest to the population average. In other words, there is always some error. We
shall see later how this error can be estimated. Now, compute the average of the ten sample
averages. Hopefully, this average is a more accurate reflection of the population mean than
most of the sample averages. This illustrates an important point that usually the larger the
sample, the smaller the sampling error.
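If you would rather let the computer do the drawing, the following Python sketch (an added illustration) repeats the exercise: it draws ten random samples of ten digits and shows how the sample means scatter around the population mean of 4.5, while their grand mean comes closer to it.

    import random
    import statistics

    population_mean = statistics.mean(range(10))   # digits 0-9 -> 4.5

    random.seed(1)                                 # reproducible draws
    sample_means = []
    for _ in range(10):
        sample = [random.randint(0, 9) for _ in range(10)]
        sample_means.append(statistics.mean(sample))

    print(sample_means)                    # each differs a little from 4.5
    print(statistics.mean(sample_means))   # grand mean: closer to 4.5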
Given a population of 100 pupils in a room, would the following constitute methods for
selecting a random sample?
a Enter the room and grab the nearest twenty children.
b Open the door and take the first twenty children to emerge.
Do these procedures qualify as random sampling? Why or why not?
FIGURE 6.2 Two possible samples resulting from a systematic sampling scheme

Population: 20 21 22 23 24 25 26 27
If heads, the sample is 20, 22, 24 and 26; if tails, the sample is 21, 23, 25 and 27.
The major disadvantage of systematic sampling is where a periodic cycle exists in the
sampling frame, or population list, which would bias the sample. For example, if all the
school classes were twenty-five strong, with boys listed before girls, then a sampling
interval of twenty-five would generate a sample all of the same sex. With lists of names,
selecting those whose name begins with M may give a sample with too many Scots,
while any other letter may result in too few Scots. Whether such bias matters depends
on the topic. Date of birth, especially a number of dates spread evenly over the year,
provides a sound basis for obtaining a representative sample which is easy to apply to
school populations.
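A systematic sample of this kind is easy to express in code. This minimal Python sketch (mine; the function name is invented) selects every kth unit from a sampling frame after a random start, as in Figure 6.2:

    import random

    def systematic_sample(frame, k):
        # Every kth unit from the sampling frame after a random start
        start = random.randrange(k)      # random starting point 0..k-1
        return frame[start::k]

    population = list(range(20, 28))     # the frame from Figure 6.2
    print(systematic_sample(population, 2))   # [20, 22, 24, 26] or [21, 23, 25, 27]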
If the number 2 was the starting point on a selection ratio of 1 to 5, what are the next four successive selections?
[Diagram: cluster sampling. The population consists of units 1 to 10; subgroups (clusters) 1, 4, 8 and 9 are randomly chosen, and samples are then randomly selected from within the chosen clusters]
In educational research, there is often a gap between sampling theory and practice
because the designs are often unrealistic to use. Can you explain why random sampling
of all the fourteen-year-old pupils in the state is unrealistic?
The efficiency of cluster sampling depends on the number and size of the clusters used.
At one extreme, a large number of small groups could approach simple random sampling.
At the other extreme, taking one large group such as a school at random, although more
convenient, is likely to provide an unsatisfactory sample because of increased sampling error.
The error is greatest when the clusters are large and homogeneous with respect to the variable
under study. The term homogeneous in this context refers to groups whose members are very
similar with respect to the characteristics one is studying. For example, there are dangers of
taking the level of reading attainment in any one school as representative of the reading level
for a whole area.
Stage sampling
In the preceding section a distinction was drawn between simple random sampling of
individuals, which in the practical situation may involve testing two or three children in each
of a hundred schools, and cluster sampling, which describes the sampling of relatively few
entire schools or classes, and usually has the disadvantage of a much greater error variance. A
third possibility exists, intermediate between these two. It is possible to take a random sample
of schools, and within each school a random sample of children.
Thus, a compromise may be obtained which, for the same size of sample, avoids the virtually impossible rigour of a simple random sample, and at the same time ensures a wider representation than the sampling of entire groups.
This is stage sampling. A two-stage design would first take a random selection of
schools, and within each school a random sample of children. A two-stage design could
be used by a primary school survey to investigate the necessity for auxiliary help in such
schools. Such a study could be concerned with classes rather than children, and therefore
you would first select schools at random, and then select classes within these schools on
a random basis. For larger studies, multi-stage designs can be introduced: regions within the state would be randomly selected; within these, districts would be randomly selected; within districts, schools would be randomly selected; within schools, classes would be
selected; within the classes, children would be selected, and so on. Such designs obviously
lie beyond the resources of most individual researchers. But the great strength of such
designs is that they permit accurate estimates of the sampling error. This is because at
each stage the principle of random selection is maintained.
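A two-stage design of the kind just described can be sketched in a few lines of Python (an added illustration; the school and pupil identifiers are invented for the example):

    import random

    def two_stage_sample(schools, n_schools, n_pupils):
        # Stage 1: sample schools at random
        chosen_schools = random.sample(list(schools), n_schools)
        # Stage 2: sample pupils at random within each chosen school
        return {
            school: random.sample(schools[school], n_pupils)
            for school in chosen_schools
        }

    # Hypothetical frame: school name -> list of pupil IDs
    schools = {f"school_{i}": list(range(i * 100, i * 100 + 30))
               for i in range(20)}
    print(two_stage_sample(schools, n_schools=3, n_pupils=5))

At each stage the selection is random, which is what preserves the ability to estimate sampling error.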
Opportunity sampling
This form of sampling involves considerable error but is often used because no other
alternative is open to the research worker. This happens when, due to constraints of
finance and even permission, research is carried out on conveniently accessible groups,
such as students in one’s own college, people living in your neighbourhood etc. There
Sample size
Size versus representativeness
In general, the larger the sample the better, simply because a large sample tends to have
less error, as we found in the exercise using the table of random numbers. This is not to
say that a large sample is sufficient to guarantee accuracy of results. Although for a given
design an increase in sample size increases accuracy, it will not eliminate or reduce any
bias in the selection procedure. We have already seen an example of this principle in the
American presidential telephone survey mentioned at the beginning of this part of the
chapter. As another example, if a 1-in-2 sample of the whole country consisted of one
sex only, it would be large, but unrepresentative. Size is therefore less important than
representativeness.
Non-response
In deciding size, account should also be taken of possible attrition of numbers; even the
best designed projects lose about 10 per cent of their sample, depending on the nature
of the study. This is particularly important in follow-up studies in which the same
children are re-tested at a later date. There is always the likelihood that children have left
the area between visits, but of equal importance is the level of school absenteeism. Losses
of this order could seriously affect the representativeness of the sample and cause
increased error. The children who stay away may have quite different characteristics
from those who attend.
2 Systematic
3 Stratified
4 Cluster
5 Opportunity
1 Below are examples of the selection of samples. Decide for each which sampling technique was used:
   a Restricted to a 5 per cent sample of the total population, the researcher chose every twentieth person on the electoral register.
   b A social worker investigating juvenile delinquency and school attainment obtained her sample from children appearing at the juvenile court.
   c A research organisation took its sample from public schools and state schools so that the samples were exact replicas of the actual population.
2 Explain how you would acquire a random sample of first-year college students in a
city with two colleges.
3 Explain why a random sample from a population in which certain subjects were
inaccessible would be a contradiction in terms.
4 If, from an extremely large population, a very large number of samples were drawn
randomly and their mean values calculated, which of the following statements are
true?
a The sample means would each be equal to the population mean.
b The sample means would vary from the population mean only by chance.
c The sample means, if averaged, would have a grand mean grossly different from
the population mean.
d The sample means would form a distribution whose standard deviation is equal
to zero.
e The sample means would be very different from each other if the sample sizes
were very large.
5 Why is a stratified sample superior to a simple random sample?
6 In what context would a multi-stage cluster sample be particularly useful?
7 If a sample of psychologists were randomly selected from the Yellow Pages in a particular city, would you necessarily have a representative sample?
*Answers on p. 596.
Standard error
The general difficulty of working with samples is that samples are generally not identical
to the population from which they were drawn. The statistics collected from samples will
therefore differ from the corresponding parameters for the population.
1 For each graph in Figure 6.4, what is the 95% probability limit of a mean of a sample
of the size given, drawn at random?
2 Given a population of scores normally distributed with a mean of 100 and standard
deviation of 16, what is the standard error you would expect on average between
the population mean and sample mean for four scores and for sixty-four scores?
*Answers on p. 596.
[FIGURE 6.4 Two sampling distributions of the mean, each centred on M = 100 and marked from −3 to +3 standard errors: in the first the standard error is 0.5 (scale 98.5 to 101.5); in the second it is 0.33 (scale 99 to 101)]
TABLE 6.4 The distribution of sample means for the 96 samples taken by students in Horowitz's class. Each sample mean was based on ten observations (after Horowitz 1974, Table 8.1)
Interval       Frequency
62.0-63.9           1
60.0-61.9           1
58.0-59.9           3
56.0-57.9           7
54.0-55.9           9
52.0-53.9          12      Mean of sample means = 49.99
50.0-51.9          15      Standard deviation of sample means (SE_M)
48.0-49.9          15
46.0-47.9          13
44.0-45.9           9
42.0-43.9           6
40.0-41.9           3
38.0-39.9           1
36.0-37.9           1
                   96 samples
Suppose we draw a random sample of 100 twelve-year-old children from the state
school system. It would be difficult to measure the whole universe of 12-year-old
children in the state for obvious reasons. We compute the mean and the standard
deviation from a test we give the children and find these statistics to be M = 110, SD = 10.
An important question we must now ask ourselves is, 'How accurate is this mean?'
Or, if we were to draw a large number of random samples of 100 pupils from this
same population, would the means ofthese samples be 110 or near 110? And, if they are
near 110, how near? What we do, in effect, is to set up a hypothetical distribution of
sample means, all computed from samples of 100 pupils each drawn from the parent
population of 12-year-old pupils.
If we could compute the mean of this population of means, or if we knew what it was, everything would be simple. But we do not know this value, and we are not able to know it since the possibilities of drawing different samples are so numerous. The best we can do is to estimate it with our sample value or sample mean. We simply say, in this case, let the sample mean equal the mean of the population of means, and hope we are
Can you now calculate the boundary scores within which 99% of sample means will lie?
*Answer on p. 596.
The location of each sample mean in the distribution of sample means can be specified
by a Z score just as a single score can be in a distribution of scores.
We can use the Z scores to find the probabilities for specific sample means and thereby
determine which sample means are unlikely to be obtained from a particular population
using the standard levels of significance.
Suppose we did obtain quite a number of other samples, most of whose means lay close to 110; but suddenly, out of the blue, we found one which was quite different: a sample mean of 114. Its location in the distribution of sample means is

Z = (M − M_pop) / SE_M = (114 − 110) / 1.00 = 4.0

The probability of obtaining a sample M at this point is well beyond the 0.01 level: in fact, about three times in 100 000.
This would be so rare a sample mean as to suggest that either it is a biased or non-
random sample, or that the samples do not come from the same sampling distribution,
i.e. they are from a different population of children; perhaps an older age group, for
example.
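Using the figures from this example, a short Python sketch (an added illustration) shows the whole chain: the standard error, the Z score of the deviant sample mean, and the probability of such a mean arising by chance:

    import math

    def standard_error(sd, n):
        # Standard error of the mean: SD / sqrt(N)
        return sd / math.sqrt(n)

    def z_for_sample_mean(m, pop_mean, se):
        return (m - pop_mean) / se

    se = standard_error(10, 100)          # 10 / 10 = 1.00
    z = z_for_sample_mean(114, 110, se)   # (114 - 110) / 1.00 = 4.0

    # One-tailed probability of a Z this large, via the normal CDF
    p = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
    print(se, z, p)                       # 1.0 4.0 about 3e-05 (3 in 100 000)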
1 Which sample size will provide the smallest SE_M, assuming the same σ: 25, 100, 5 or 70?
2 A random sample of 290 10-year-old children were given a test of reading abilities
with the intention of estimating the reading ability of all 10-year-old children; the
mean was 104, standard deviation 5.67.
a Calculate the standard error.
b What limits does the population mean lie within at a 95% confidence level?
c What limits does the population mean lie within at a 99% confidence level?
3 Evaluate each of the following hypothetical situations, in terms of whether the
method of selecting the sample is appropriate for getting information about the
population of interest. How would you improve the sample design?
a A principal in a large high school is interested in student attitudes toward a
proposed general achievement test to determine which students should
graduate. She lists all of the first-period classes, assigning a number to each.
Then, using a random number table, she chooses a class at random and
interviews every student in that class about the proposed test.
b An anthropology professor wanted to compare physical science majors with
social science majors with respect to their attitudes toward premarital sex. She
administered a questionnaire to her large class of Anthropology 437,
Comparative Human Sexuality. She found no appreciable difference between
her physical science and social science majors in their attitudes, so she concluded
that the two student groups were about the same in their relative acceptance
of premarital sex.
*Answers on p. 596.
The mean of the distribution of sample means is identical to the mean of the population.
Reference
Horowitz, L.M. (1974), Elements of Statistics, McGraw Hill, New York.
Introduction
What is a hypothesis?
Chapter 3 described how research topics can arise and pointed out that the problem, still
probably rather vague at this stage, has to be translated into precise operational
hypotheses on which a research plan can be designed. This planning stage is possibly the
most demanding and certainly the most important part of the research process.
The word hypothesis is generally used in a more restricted sense in research to refer to
conjectures that can be used to explain observations. A hypothesis is a hunch, an
educated guess which is advanced for the purpose of being tested. If research were limited
to gathering facts, knowledge could not advance. Without some guiding idea or
something to prove, every experiment would be fruitless, since we could not determine
what was relevant and what was irrelevant. Try this everyday example of hypothesis
formation. Suppose the only light you had on in the bedroom was the bedside table
lamp. Suddenly it went off. You would no doubt ponder the reason for it. Try to think of several reasons now, and write them down. There could be a number, of course. I
wrote:
• lamp bulb failure;
• plug fuse failure;
• main power fuse failure.
Whatever you wrote is an implied hypothesis—an educated guess. In practice you
would test each one in turn until the cause was located. Let us imagine the cause was a
fuse failure in the plug. The fact that the lights came on after I changed the fuse only
lends support to the hypothesis. It does not prove it. The fault could have been caused
by a temporary faulty connection which in turn caused the fuse to blow. In mending the fuse, I might have corrected the connection by chance, unbeknown to me, as I caught the wire with my screwdriver. 'Proved' carries the connotation of finality and certainty. Hypotheses
are not proved by producing evidence congruent with the consequences; they are simply
not disproved. On the other hand, if the observed facts do not confirm the prediction
made on the basis of the hypothesis, then it is rejected conclusively. This distinguishes
the scientific hypothesis from everyday speculation. A hypothesis must be capable of
being tested and disproved.
This mode of accounting for problems is the characteristic pattern of scientific
thinking. It possesses three essential steps:
1 The proposal of a hypothesis to account for a phenomenon.
2 The deduction from the hypothesis that certain phenomena should be observed in
given circumstances.
3 The checking of this deduction by observation.
Let us look at an educational example. An educational researcher may have reasoned
that deprived family background causes low reading attainment in children. He or she
may have tried to produce empirical evidence that low family income and overcrowding
are associated with poor reading attainment. If no such evidence was forthcoming then,
as we have seen, the hypothesis must be decisively rejected. But if the predicted
relationship was found, could the researcher conclude that the hypothesis was correct,
i.e. that poor family background does cause low reading attainment? The answer must
be ‘no’. It might equally have been the case that a low level of school resources is also to
blame. These alternative hypotheses will have other deducible consequences. For
example, if lack of resources is related to reading backwardness, then improved
resourcing should improve reading attainment among the children in these schools. To
underline the main point again, the scientific process never leads to certainty in
explanation, only to the rejection of existing hypotheses and the construction of new ones
which stand up best to the test of empirical evidence.
Try to formulate operational definitions of affectionate, popular, morale, and anxiety. How
do you propose to measure such concepts?
1 How would you define and measure the variables underlined in the following?
a Teachers who suffer stress are less child centred in their teaching method than
teachers who are not so stressed.
b Stimulation at home advances language development in young children.
c Adolescents with substance abuse problems tend to come from disrupted home
environments.
Write down now two operational hypotheses which could be derived from the research
hypothesis.
Unconfirmed hypothesis
But what if the hypothesis is not confirmed? Does this invalidate the prior literature? If
the hypothesis is not confirmed then either the hypothesis is false or some error exists
in its conception. Some of the previous information may have been erroneous, or other
relevant information overlooked; the experimenter might have misinterpreted some
previous literature or the experimental design might have been incorrect. When the
experimenter discovers what he or she thinks is wrong, a new hypothesis is formulated
and a different study conducted. Such is the continuous ongoing process of the scientific
method. Even if a hypothesis is refuted, knowledge is advanced.
Now consider the following examples in terms of the five criteria above. Look at each
hypothesis and say whether it satisfies all the criteria. If it does not, say why not.
EXAMPLE 1
Among 15-year-old male school children, introverts, as measured by the Eysenck
Personality Inventory, will gain significantly higher scores on a vigilance task involving the
erasing of every ‘e’ on a typescript than extroverts.
EXAMPLE 2
Progressive teaching methods have led to a decline in academic standards in primary
schools.
EXAMPLE 3
The introduction of politics into the curriculum of secondary schools will produce better
citizens.
EXAMPLE 4
Anxious pupils do badly in school.
[Diagram: General problem → Research hypothesis → Operational or experimental hypothesis]
Null hypotheses should take the following form: 'That there is no significant relationship (difference) between . . .'.
By placing the term significant in the proposition, we are emphasising the fact that our
test of the null hypothesis invokes the test against a stated and conventionally acceptable
level of statistical significance. Only if such a defined level is reached can we discard the
null hypothesis and accept the alternative one, always remembering that we are never
proving a hypothesis, only testing it, and eventually rejecting it or accepting it at some
level of probability. An example will add clarity.
Suppose an experiment tests the retention of lists of words under two conditions:
1 The lists are presented at a fixed pace determined by the experimenter.
2 The lists are perused at a rate determined by the subject attempting to memorise
them—self-pacing.
[Diagram: a two-tailed test, with critical values at −1.96 and +1.96 standard errors]
In this case, to reject the null hypothesis at the p < 0.01 level, we have to find a value
that is greater than the population mean value by only 2.33 standard errors. Similarly,
to reject the null hypothesis at the p < 0.05 level, our sample value needs to exceed a
critical value of only 1.65 standard errors. Thus, our directional null hypothesis about the effect of discovery learning on arithmetic attainment is now even more strongly rejected. In other words, if we can confidently state the direction of a null hypothesis, we do not need such large observed differences to reject it at particular significance levels.
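The critical values quoted here can be recovered numerically. The Python sketch below (my illustration; bisection is just one of several ways to invert the normal curve) shows why a one-tailed test needs a smaller critical value (about 1.65 at p = 0.05 and 2.33 at p = 0.01) than a two-tailed one (1.96 and 2.58):

    import math

    def phi(z):
        # Standard normal CDF
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def critical_z(p, two_tailed=True):
        # Smallest z whose tail area equals p (split over both tails if two-tailed)
        tail = p / 2 if two_tailed else p
        lo, hi = 0.0, 10.0
        for _ in range(60):              # bisection on the upper tail
            mid = (lo + hi) / 2
            if 1.0 - phi(mid) > tail:
                lo = mid
            else:
                hi = mid
        return round((lo + hi) / 2, 2)

    print(critical_z(0.05), critical_z(0.01))   # 1.96 2.58
    print(critical_z(0.05, two_tailed=False),   # 1.64 (the text's 1.65, a rounding difference)
          critical_z(0.01, two_tailed=False))   # 2.33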
Hypothesis testing is an inferential procedure for using the limited data from a sample to draw a
general conclusion about a population. The null hypothesis states that the treatment has not had
any significant effect beyond that due to chance.
We set a level of significance that creates a critical level to distinguish chance from a statistically
significant effect, usually at 0.05 (5% level) or 0.01 (1% level). Sample data that fall beyond the
critical point in the tails of the distribution would imply that the effect is unlikely due to chance as
The research data with which the quantitative approach deals are evaluated statistically.
Much of the power of statistics results from the fact that numbers (unlike responses to
a questionnaire, videotapes of social interactions in the classroom, or lists of words
recalled by a subject in a memory experiment) can be manipulated with the rules of
arithmetic. As a result, researchers prefer to use response measures that are in or can be
converted to numerical form. Consider a hypothetical study of aggression and sex. The
investigators who watched the subjects might rate their aggression in various situations
(from, say, ‘extremely aggressive’ to ‘extremely docile’), or they might count the number
of aggressive acts (say, hitting or insulting another child) and so on. This operation of
assigning numbers to observed events (usually, a subject’s responses) is called
measurement.
There are several levels of measurement that will concern us. They differ by the
arithmetical operations that can be performed on them. The particular statistical
technique that is appropriate for analysing a set of variables depends on the way in which
those variables are measured.
Levels of measurement
We will refer to four distinct levels of measurement. These differ in the extent to which
observations can be compared. From lowest to highest in the degrees to which
comparisons can be made, the levels are nominal, ordinal, interval and ratio. It is
important to understand clearly the distinction between these levels, since the use of
inappropriate methods for the measurement levels of variables can lead to meaningless
or incorrect results.
Nominal measurement
Nominal means to name; hence a nominal scale does not actually measure, but rather
names. Observations are simply classified into categories with no necessary relationship
existing between the categories. Nominal is the most primitive level of measurement, and
only requires that one can distinguish two or more relevant categories and know the
criteria for placing individuals or objects into one category or another. The relationship
between the categories is that they are different from each other.
For example, when children in a poll are asked to name the television channel they
watch most frequently, they might respond ‘7’, ‘9’, or ‘10’. These numbers serve only
to group the responses into categories. They obviously cannot be subjected to any
arithmetic operations. Again, marriage form would be measured on a nominal scale,
with levels such as monogamy, polygyny, and polyandry. States of residence such as
NSW, Victoria, etc. would be another example. Other common variables measured with
nominal scales are religious affiliation, sex, race, occupation, method of teaching, and
political party preference. Variables measured on nominal scales are referred to as
nominal variables. Nominal variables are often called qualitative.
Names or labels can be used to identify the categories of a nominal variable, but those
names do not represent different magnitudes of the variable. Bus route 10 is not twice
as long as bus route 5. It is simply a labelling system; the numbers assigned to categories
cannot be added, multiplied or divided. The major analytic procedure available for
nominal data is chi square (χ²).
[Diagram: the measurement hierarchy, from Nominal up through Ordinal and Interval to Ratio]
with interval variables as well, by using just the order characteristics of the numerical
measurement. Normally, we would want to apply the statistical technique specifically
appropriate for the actual scale of measurement (for example, interval level techniques
for interval variables) since we then utilise the characteristic of the data to the fullest.
It is not possible to move in the other direction in the measurement hierarchy. If a
variable is measured only on a nominal scale it is not possible to treat it on the ordinal
level, since there is no natural ordering of the categories.
In general, it is important to try to measure variables at as high a level as possible,
because more powerful statistical techniques can be used with higher level variables.
1 What is the scale of measurement that is most appropriate for each of the following variables?
   a Attitude toward legalisation of marijuana (in favour, neutral, oppose).
   b Sex (male, female).
   c Helen was born in 1968; Richard in 1975.
   d Church affiliation (Roman Catholic, Baptist, Methodist, . . .).
   e Political philosophy (liberal, moderate, conservative).
   f The IQ of students.
   g Highest degree obtained (bachelor, master, doctorate).
   h Peter is the second most popular pupil in class.
   i Average score in class test.
   j Occupational status (blue collar, white collar).
   k Numbering of houses along a road.
   l Population size (number of people).
The characteristics that are measured for each of the members of a sample are usually referred to as variables. A variable is a characteristic that can take on more than one value among members of a sample or population.
Examples of commonly used variables in educational research are sex (with values
male and female); age at last birthday (with values 0, 1, 2, 3 and so on); religious
persuasion (Baptist, Methodist, Roman Catholic, Unitarian and so forth); social status
(upper class, middle class and working class); and exam results (say, measured in scores
out of 100). To speak tautologically, a variable is something that varies. A variable must
have a minimum of two values, but most are characterised by continuous values.
FIGURE 9.1 Two levels of IV. Hypothesis: students who have completed a speed
reading course will make significantly higher grades than students who have never
taken such a course.
[Diagram: IV₁, students who completed the speed reading course; IV₂, students who did not take the course; both groups compared on the DV, grades]
Instructions as an IV
So far, the manipulation of the IV has been discussed in terms of the manipulation of events. Another way variation can be introduced is by manipulating instructions. For
example, in a memory experiment one group might be asked to rehearse the presented
words in the period between learning and recall, while another group might be requested
to think of other words which would remind them of the original words. There are two
dangers inherent in the manipulation of instructions. Firstly, some subjects might be
IV: Students who plan to pursue careers in marketing versus those who do not.
DV: Aggressiveness, conformity, independence, need for achievement.
Hypothesis 3 Students with positive academic self-concepts will gain significantly higher
grades than students with negative academic self-concepts.
IV: Students with positive academic self-concepts and students with negative
academic self-concepts.
DV: Grades.
Control variables
A control variable is a potential IV that is held constant during an experiment. It is not allowed to vary. For any one experiment, the list of variables that it is desirable to control is large; far larger than can ever be accomplished in practice. The problem is that such variables have their potential effects on the DV, so that it becomes impossible to separate those variations in the DV that are due to the IV and those that are due to other variables. The potential effects are unsystematic too, sometimes causing improvements and at other times deficits, so their influence is unmeasurable. Consider,
for example, a simple experiment in which pupils are required to solve five-letter
Hypothesis: First-born college students with an introverted personality get higher grades than their extroverted counterparts of equal intelligence, while no such differences are found among 'later borns'.
Control variable: Intelligence
Hypothesis: Among boys there is a correlation between IQ and social maturity, while for
girls in the same age group there is no correlation between these two variables.
Control variable: Age
In each of the above, there are undoubtedly other variables, such as the subjects’
relevant prior experiences or the noise level during treatment, which are not specified
in the hypothesis but which must be controlled. Because they are controlled by routine
design procedures, universal variables such as these are often not systematically
labelled.
Intervening variables
All the variables so far discussed have been under the control of the experimenter. Each
IV and control variable can be manipulated and each variation observed on the DV.
However, what the experimenter may be trying to find out in some experiments is not
necessarily concrete but hypothetical. The intervening variable is a hypothetical one whose effects are inferred from the effects of the IV on the DV. Significant effects suggest
support for the hypothetical construct. Look at the following hypothesis.
Hypothesis 1 Pupils subject to high levels of criticism exhibit more aggressive acts
than those not so criticised.
IV: Criticised or not criticised.
DV: Number of aggressive acts.
Intervening variable: Frustration.
[Diagram: independent variables stand in a RELATIONSHIP to intervening variables, whose EFFECTS appear on the dependent variables]
The purpose of research design is to minimise experimental error, thereby increasing the
likelihood that an experiment will produce reliable results. Entire books have been
written about experimental design. Here, we will cover only a sample of some common
techniques used to improve the design of experiments. While this treatment is necessarily less complete than that of an entire text devoted to the subject, it should give you an understanding of the aims of designing an experiment, even though it will not give you
all the techniques that could be used.
Experimental error
Experimental error occurs when a change in the DV (dependent variable) is produced by any variable other than the IV (independent variable). What we are wholly interested in is the effect of the IV on the DV. When other variables that are causally related to the DV are confounded with the IV, they produce experimental error; they produce differences on the DV between the experimental conditions that add to or subtract from the difference that would have been produced by the IV alone. Experimental error covers up the effect you are interested in and can make it difficult or impossible to assess.
Let us imagine that we obtain some subjects, assign half to a formally taught
educational psychology course and half to the same course taught by programmed
instruction, and at the end of the term measure all subjects on an attainment test.
There are a number of variables which, unless we are careful, may be confounded with the IV. One is time of day: one course may be taught in the middle of the morning, when students are alert, and the other at the very end of the day. Another is intelligence: students in one course may be brighter than those in the other. Since both of these variables are likely to affect the attainment score obtained at the end of the term, allowing either to be confounded with the IV is likely to result in experimental error.
There are two kinds of experimental errors: constant or systematic error and random
error. An understanding of these kinds of errors and of ways to deal with them
constitutes a fundamental basis of experimental reasoning.
• Systematic or constant error is an error or bias that favours the same experimental
condition every time the experiment is repeated. Any error due to time of day would
be a constant error, for whichever course is taught at the more favourable time of day
is taught at that time for all subjects in that experimental condition.
Constant error operates systematically so that it affects performance in one
condition but not the other. The different manner and personality of the experimenter
with the control group compared to those of the person handling the experimental
group would affect each group in a different way. If, for instance, one group
undertakes their experiment while sitting in a cold or noisy room but the other group
is in more amenable surroundings, the results are likely to include effects from this
variable. If one group has more practice than the other, this will produce improved
performance and will be a constant error, provided practice is not designated as the
independent variable.
• A random error is an error which, on repetitions of the experiment, sometimes favours
one experimental condition and sometimes the other, on a chance basis. If subjects
are assigned to conditions randomly, then any error resulting from differences in
intelligence will be random error. Very often, the error from any particular source has
both constant and random components. This would be true of the error produced by
intelligence if the subjects volunteered for the conditions, and if the brighter subjects
usually, but not always, chose the same experimental condition. We shall continue to
consider just the pure cases of constant and random error, and it is perfectly legitimate
to do so. But we do this with the understanding that the error from any particular
source may involve these components in any proportions.
The effect of a constant or systematic error is to distort the results in a particular
direction, so that an erroneous difference masks the true state of affairs. The effect of a random error
is not to distort the results in any particular direction, but to obscure them. In designing
an experiment, controls are employed to eliminate as much error as possible and then
randomisation is employed to ensure that the remaining error will be distributed at
random. Controlling or randomising sources of constant error eliminates bias.
Do you remember what a random sample is? Refer back to page 85 if you cannot recall.
Randomisation is the most important and basic of all control methods, providing
control not only for known sources of variation but for unknown ones too. In fact, it is
the only technique for dealing with the latter source. It is like an insurance policy, a
precaution against disturbances that may or may not occur, and clearly stated procedures
involving tossing coins and random number tables should be employed.
Between-subjects design
There are some kinds of experiment in which it is very difficult to use the same people
for all conditions. What about an experiment testing the relationship between sex and
induced learned helplessness? There is simply no way (apart from a split-second sex
change!) in which the same people can perform in both the male and female groups. So
there have to be different people (men and women) in the two groups. This design is
known as a between-subjects design because the comparison is between two independent
groups or unrelated people.
There are other experiments too in which it is easier to use different people for the
different experimental conditions. These include experiments involving very long tasks
which would exhaust the patience and lower the motivation of subjects if the same
people had to perform all the conditions. Another reason is that if you use different
people you avoid the possibility of practice effects transferring from one task to another.
But individuals do differ markedly from each other. The only way to deal with this
is to allocate the different people at random to the different experimental conditions.
Here, random means that it is purely a matter of chance which people end up doing
which condition or level of the IV. The reasoning is that if subjects are randomly allocated
to experimental conditions on a chance basis, then people of different ages or abilities
are just as likely to be found in all the experimental groups. For example, you might find
that all the subjects who arrive first to volunteer for an experiment are the most highly
motivated people who would tend to score more highly, quite regardless of experimental
condition. So they should not be placed in the same group. It would be better to allocate
alternate subjects as they arrived, or perhaps to toss a coin to decide which group each
subject should be allocated to; in which case, unless your coin is biased, the allocation
of subjects to groups should be truly random.
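The logic of random allocation is easy to demonstrate in a few lines of code. The sketch below is purely illustrative (Python; the randomly_allocate function is our own, not part of the SPSS procedures used elsewhere in this book): shuffling the pool guarantees that arrival order, motivation and the like play no part in which condition a subject ends up in.

import random

def randomly_allocate(subjects, conditions=("experimental", "control")):
    # Shuffle the pool so chance alone decides group membership,
    # then deal the subjects alternately into the conditions.
    pool = list(subjects)
    random.shuffle(pool)
    groups = {name: [] for name in conditions}
    for i, subject in enumerate(pool):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

groups = randomly_allocate(range(1, 21))
print(groups["experimental"])
print(groups["control"])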
The groups should also be assigned their level of the IV by a random procedure,
i.e. toss a coin to decide which group shall be the experimental one and which the
control group. Random assignment exerts control by virtue of the fact that the remaining differences between subjects are distributed across the conditions by chance alone.
1 Briefly outline what you see as the advantages and disadvantages of the between-
subjects design.
2 How can we generally avoid error due to assignment?
[Figure: subjects are randomly assigned to Treatment 1 or Treatment 2, and the two sets of scores are then compared.]
Within-subjects design
So far, we have presented exclusively the advantages of using different rather than the
same subjects for all the experimental conditions, i.e. the between-subjects design.
But, against all these advantages, there is one crucial disadvantage about using
different subjects for each experimental condition. This is that there may be all sorts of
individual differences in the way different subjects tackle the experimental task.
What major individual differences affect the way different subjects tackle an experimental
task? List as many as you can.
There are many. For instance, in a memory experiment some people might be more
intelligent than others, some might think the experiment a bore, others that it would get them
extra marks, some might not even be able to read the items, some might be old, some young,
some take three minutes over each item, others four seconds, some might be anxious, and
others might be thinking about the nightclub they are going to that night. Such variability
might affect their ability to remember the items, from nil by a person who cannot read, to
100 per cent by an ambitious person who thinks they are going to get an A grade for their
performance. The existence of all these other factors might mean that each subject's behaviour reflects these individual differences as much as the experimental conditions themselves.
Counterbalancing
One way to minimise the effect of a systematic confounding variable like learning is to
counterbalance the order in which you present the levels of the independent variable.
One of the more frequently used techniques is called ABBA counterbalancing. If we
call the manual typewriter 'A' and the electric typewriter 'B', then ABBA simply indicates
the pattern in which subjects will learn typing. This pattern serves to counterbalance the
confounding effects across the two levels of our independent variable.
An ABBA counterbalancing technique attempts to counterbalance order effects in a
completely within-subject manner; the same subject gets both orders. Other
counterbalancing techniques make order a between-subjects variable by
counterbalancing order across subjects. In intra-group counterbalancing, groups of
subjects are counter-balanced rather than each subject. So, for example, with two
conditions (i.e. A and B) of the IV, half the subjects chosen at random receive sequence
AB, and the other half receive BA. A completely randomised counterbalancing is possible
too, with each subject receiving one of the sequences chosen by a random process.
However, in all counterbalancing, you are making the assumption that the effect of
having B follow A is just the reverse of the effect of having A follow B. This assumption
is sometimes called an assumption of symmetrical transfer.
As we add more levels to our independent variable, we increase the complexity of a
complete counterbalancing procedure. In a completely counterbalanced design, every
level has to occur an equal number of times. Complete counterbalancing can become a
monumental task when you have a large number of levels.
Counterbalancing will not remove an order effect. Hopefully, it will make it constant
in both levels. Moreover, demand characteristics are likely to be a big problem in this
design since taking several levels of the IV may allow a subject a greater chance of
correctly guessing what the experiment is about or what the experimenter ‘wants’,
whatever order the conditions are taken in.
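If you want to see how the counterbalancing schemes described above translate into actual presentation orders, the following sketch may help (plain Python; the function names are ours, invented for illustration):

import random

def abba_order():
    # ABBA counterbalancing: each subject takes the two levels
    # in the order A, B, B, A.
    return ["A", "B", "B", "A"]

def intra_group_orders(n_subjects):
    # Intra-group counterbalancing: half the subjects, chosen
    # at random, receive the sequence AB; the other half receive BA.
    subjects = list(range(1, n_subjects + 1))
    random.shuffle(subjects)
    half = n_subjects // 2
    return {s: ("AB" if i < half else "BA") for i, s in enumerate(subjects)}

print(abba_order())           # ['A', 'B', 'B', 'A']
print(intra_group_orders(8))  # e.g. {3: 'AB', 7: 'AB', ..., 2: 'BA'}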
[Figure: matched-pairs design — subjects are measured and matched into pairs, and one member of each pair is randomly assigned to each group.]
Matching causes the pool of potential subjects to be smaller, too. For example, we can
hold sex constant across each experimental group by only using males, IQ by only using
subjects at one IQ level, age by only having 21-year-olds, and ethnic group by only using
Iranians, etc. But it would be difficult to locate sufficient numbers of male 21-year-old
Iranian subjects with the same IQ. However, it is clear that experimental variation will
not include any variation due to sex, or age, or ethnic group, or IQ variation, hence less
of the variation is error variation.
Standardising
This term can be applied in two ways. It can refer to the standardising of the experimental
conditions, or the standardising of the marking or assessment of test instruments. It is the
former which is to be considered briefly here. Standardising of the experimental conditions
ensures that every subject undertakes the IV level applicable to himself or herself in the same
conditions as everyone else. For example, think how the results of an experiment could be
interpreted if some subjects had a longer time to complete the task, or were given different
instructions, or suffered from noisy disturbances while undertaking the task. (Of course,
experiments can be devised in which these variables could act as the IV.) These
unstandardised conditions would be reflected in unknown ways in the DV variations, and
would be impossible to disentangle from the real effects of the IV. So, for these types of
variables, control of error is effected by holding them constant for all subjects.
Holding a variable constant ensures that it will produce no experimental error, for the
obvious reason that variables which do not change cannot produce changes on any other
variable.
Can you write down a few variables that need to be held constant in most experiments?
It is now possible to see what each design and control technique achieves. In the
between-subjects design, no subject variables are controlled. In the matched-pairs design,
one or more between-subject variables (such as age, IQ, sex, social class) are controlled.
In the within-subjects design, all between-subject variables are controlled, though the
variables (on which a single person varies from time to time) such as fatigue and
motivation, are not controlled.
In the between-subjects and matched-subjects designs, subject variables that have not
been controlled are randomised by assigning the members of each pair randomly to
conditions. In the within-subjects design, within-subject variables that are related to
order of presentation, such as fatigue and adjustment to the experimental situation, can
be counterbalanced with order of presentation. But the experimenter has no control
over other within-subject variables, such as anxiety, and nature must be trusted to
randomise these. The essential difference, then, between the three kinds of design is the
degree to which they control error due to between-subject variables.
Research design
A research design is essentially a plan or strategy aimed at enabling answers to be obtained
to research questions. In its simplest form, an experiment has three characteristics:
1 An IV is manipulated.
2 All other variables except the IV are controlled.
3 The effect of the manipulation of the IV on the DV is measured.
[Figure: two classic experimental designs. Post-test-only control group design — the experimental group receives the treatment X and is measured on Y, the control group is measured on Y only, and the two Y measures are compared. Pre-test/post-test control group design — both groups take a pre-test Y1 and a post-test Y2, only the experimental group receives X in between, and the gains Y2 − Y1 are compared.]
Factorial designs
We have been considering so far the classical design in which an IV is manipulated with
the effect measured on the DV. But in any research context, there can be a variety of
variables interacting simultaneously. Such techniques as randomisation, matching,
counterbalancing and standardisation have been discussed as ways of controlling the effects
of variables. However, in some studies, the interaction effect is important. For example,
particular teaching methods may have differential effects with pupils of different levels of
IQ, so that Method A is effective with low IQ pupils and Method B with high IQ pupils.
Similarly, changes in the DV may be effected by the interaction of sex, ethnic group,
personality traits, family size, school size with the IV. By employing a factorial design, the
effects of other variables (other IVs) can be determined. In essence, the researcher
investigates the effect of the main /V at each level of one or more other attributes. This
increases the precision of the findings and the validity and reliability of the research.
Factorial designs can be quite complex and the reader should consult more advanced
texts for these.
Quasi-experimental designs
The goal of the experimenter is to use designs that provide full experimental control
through the use of randomisation procedures. There are many situations in educational research, however, where random assignment of subjects to treatments is not possible, and the researcher must fall back on quasi-experimental designs.
[Figure: the non-randomised control group pre-test/post-test design — Experimental group: Y1  X  Y2; Control group: Y1  Y2.]
A researcher might conduct an experiment with the four parallel English classes of a
high school. Because the classes meet at different times, subjects cannot be randomly
assigned to treatments. However, the researcher can use a random procedure to
determine which two sections will be experimental and which two will be control. Since
both experimental and control groups take the same pre-tests and post-tests, and the
experiment occupies the same time period for all subjects, it then follows that testing,
instrumentation, maturation and mortality are not internal validity problems.
If the researcher teaches all four classes, history is not a problem. If the researcher
only supervises the regular teachers who deliver the experimental and control treatments,
differences among teachers can systematically influence results.
Another problem is the ceiling effect. Because no random allocation to groups occurs,
the pre-test means can differ considerably. Thus any change measured on the post-test
is constrained by how high one group scored initially.
For example, if the non-randomised control group pre-test/post-test design is used to
compare the effects of two methods of maths instruction in which equivalent forms of
a 100-item test are used as pre-test and post-test, and one group has a pre-test mean of
80 and the other a pre-test mean of 50, the ceiling effect would restrict the possible gain
of the former more than the latter.
Demand characteristics
Researchers call the influence of an experimenter’s expectations, or the subject’s
knowledge that an experiment is underway, demand characteristics. To the extent that the
behaviour of research participants is controlled by demand characteristics instead of the IV,
experiments are invalid and cannot be generalised beyond the test situation.
A well-known example of a demand characteristic is the Hawthorne effect, named after
the Western Electric Company plant where it was first observed. The company was
interested in improving worker morale and productivity and conducted several experiments on working conditions; performance improved whatever changes were made, apparently because the workers knew they were part of an experiment.
Experimenter bias
Experimenters are not impersonal, anonymous people, all capable of identical
observation and recording. Researchers too are human! They have attitudes, values,
needs and motives which, try as they might, they cannot stop from contaminating their
experiments. The researcher has a motive for choosing and carrying out the
particular study in the first place. He or she has certain expectations regarding the
outcome. This is implicit if a sensible and critical review of previous work in the area has
been carried out. The researcher would hence like to see their hypothesis confirmed. The
experimenter, knowing about the hypothesis and projected outcome, is likely to provide unintentional cues to the subjects about the behaviour expected of them.
What do you feel are the main strengths and weaknesses of the experimental method?
Consider an aspect of classroom behaviour in which you are interested, and think how you
could investigate it with an experimental treatment, yet make it as ‘real life’ as possible.
Research design tries to ensure that it is the manipulation of the IV, or the experimental effect, that
produces the changes in the DV and not experimental error.
Experimental error can be systematic or random. The first distorts or biases results in a particular
direction; the latter obscures results. Experimental error can result from sampling procedures, subject
assignment, experimental conditions, measurement, experimenter bias and demand characteristics.
Reference
Orne, M. (1962), ‘On the social psychology of the psychological experiment’, American Psychologist,
17, 776-83.
Further reading
Rosenthal, R. (1966), Experimenter Effects in Behavioural Research, Appleton Century Crofts, New York.
Shaughnessy, J. & Zechmeister, E. (1997), Research Methods in Psychology, McGraw-Hill, Singapore.
Winer, B.J. (1971), Statistical Principles in Experimental Design, McGraw-Hill, New York.
There is a bewildering variety of statistical tests for almost every purpose. How do we
pick an appropriate test from all those available? There is no hard-and-fast rule. Tests
vary in the assumptions they make, their power, and the types of research design for
which they are appropriate. The major considerations that influence the choice of test
are reviewed below.
Distribution of scores
A second factor influencing the choice between parametric and non-parametric tests is
the distribution of data. For a parametric test, data should be normally distributed or
closely so. Extremely asymmetrical distributions should not be the basis for parametric
testing.
Homogeneity of variance
The final assumption about the research data for use with parametric tests is that the
amount of random, or error, variance should be equally distributed among the different
experimental conditions. This goes back to the idea that all the variability due to variables
which cannot be controlled should be equally distributed among experimental
conditions by randomisation. The formal term for this is homogeneity of variance: the
word ‘homogeneity’ simply indicates sameness; for example, that there should not be
significantly different amounts of variance in the different conditions of the IV.
The normal procedure is for the experimenters to check for these three assumptions
before using a parametric test. The point is that, if these theoretical assumptions are not
met, then the probability you look for in a statistical table may not be the correct one.
However, some statisticians have claimed that parametric tests are, in fact, relatively
robust. This means that it is unlikely that the percentage probability will be very
inaccurate unless your data do not meet the assumptions at all, i.e. are not on an interval
scale and/or are distributed in a very asymmetrical fashion.
Tests which are appropriate for categorical data, such as χ², or data in the form of
ranks, such as the rank-order correlation rho, in general involve fewer assumptions than
t or F tests. Many such tests do not specify conditions about the shape or character of
the distribution of the population from which samples are drawn. Such tests are called
distribution-free or non-parametric.
In using distribution-free tests, we do not test hypotheses involving specific values of
population parameters. This eliminates hypotheses involving values of M, σ or other
descriptive parameters of population distributions.
Type of hypothesis
We saw in chapter 7 that hypotheses in educational research are very often tested from
estimates of population differences on some variable; for example, whether children who
attend pre-school play groups differ from children who did not attend play groups in
social development on entry to infant school. But at other times, a hypothesis can state
explicitly that one variable, say perceptual rigidity, is associated with another variable,
say anxiety, among children. In general terms, therefore, we can distinguish two types
of hypothesis in educational research:
• difference hypotheses between samples;
• hypotheses of association (or correlation) between variables.
Levels of data
The data may be nominal, ordinal, interval or ratio. The level of data obtained influences
the choice of test.
To conclude therefore, in evaluating techniques of statistical analysis, you determine:
• Whether the statistics employed are appropriate for the type of data available.
• Whether the test employed is appropriate for the type of hypothesis being tested.
• Whether the test employed is suitable for the design.
• Whether the assumptions involved in the test are realistic.
Figure 11.1 is a guide to the selection of the correct statistical test.
Degrees of freedom
In using some statistical tables to evaluate the obtained statistic, we need to know the
degrees of freedom for the particular sample rather than N. Why do we use degrees of
freedom or 'df' as they are symbolised?
FIGURE 11.1 Choosing your test flow chart
[The flow chart leads from the type of hypothesis (difference or relationship), through the design (between-subjects, or within-subjects/matched) and the type of data (parametric or non-parametric), to the appropriate test: independent or related t test, Mann-Whitney, Wilcoxon, Pearson product-moment correlation or Spearman rank-order correlation.]
Imagine that you are holding a dinner party for twelve people. You wish to seat them
round a table each in a pre-arranged place. When you have labelled the places for eleven
of the guests, the twelfth or final person must sit at the only place left. We have no
‘freedom’ about where we will place them. So we can say that although there are twelve
people (N) there are only eleven (N — 1) degrees of freedom. The final place is always
determined, no matter how many people we are trying to seat. This principle applies in
statistics, whether we are concerned with ΣX or M.
Let us consider ΣX and M first of all. Suppose we have a sample size N = 10. The mean value
can be calculated and all but one of the scores can be altered to other values without altering
the mean. One score’s value is, however, determined by the remaining nine because of the
necessity that their total scores sum to N × M. Thus one score is not free to vary, but is
controlled by what the other nine values are. The number of degrees of freedom is one less
than N, or N − 1. In another example, if we had three numbers which sum to 24 and hence
have a mean of 8, then with two of the numbers known, for example 10 and 6, the third is
fixed, in this case 8. If we alter the 10 to 11 and the 6 to 7 then we have determined that the
third must be 6 if the ΣX and M are to remain the same.
In experiments with two groups of subjects, each group must lose one degree of
freedom. To show this, let us return to our dinner party. We are old-fashioned enough
to try to seat men and women in alternate seats round the table. In this case, once you
had seated five men and five women, everyone would know where the last man and the
last woman would have to sit. This is like the case where there are two groups of six
subjects, N₁ and N₂. In fact, for each group there are only five degrees of freedom (N₁ − 1
and N₂ − 1).
A range of considerations govern the appropriateness of any statistical test for a particular study.
The major considerations involve assumptions about the distribution of the data, types of
hypothesis used, research design employed and the level of the data.
So far in this book we have focused strongly on the concept of statistical significance and
p values as being extremely important when interpreting conclusions from research.
However, they can also be misused and can lead to misleading interpretations:
1 The play of chance can make it appear that a worthless treatment has in fact worked,
causing the importance of minor findings to be inflated in the researcher's mind if
they hit the threshold of significance.
2 Of more concern is the way important information is overlooked in studies that fall
short of significance. It is almost as if a study that does not reach significance is worthless.
In fact, most never get published and much interesting work and findings which could
be the basis of further investigation are lost for ever. Thus, focusing too heavily on
significance can have a major effect on the development of cumulative knowledge.
3 Dividing research findings into two categories—significant and not significant—is a
gross oversimplification, since probability is a continuous scale. A rich data set reduced
to a binary accept—reject statement may lead to false conclusions due to naturally
occurring sampling error, even when using appropriate sampling methods in a
rigorous way.
4 A vital yet often ignored question in statistical analysis is—are my findings of any real
substance? We are usually asking one of three things:
• Is this a theoretically important issue?; or
• Is this issue one of social relevance?; or
• Will the results of this study actually help people?
While statistics can help to quantify the strength of the findings of the research, none
of these questions are essentially statistical. This is due to confusion over the word
‘significance’, as it has a different meaning within statistics from that in normal daily
use. This leads us frequently to imply and believe that even in a statistical context it
also has the same implications as found in the normal sense of the word. However,
by now you should have fully grasped the idea that statistical significance only means
that you can be confident that your results are unlikely to be a random variation in
samples (sampling error) but signify differences and relationships which rarely occur
by chance, and that therefore the findings reflect real differences and relationships.
But it can happen, perhaps more frequently than we might suspect, that a
statistically significant result has little significance whatsoever in the everyday sense. If
we focus too much on statistical significance it can blind us to the dangers of
exaggerating the importance of our findings.
Thus, researchers are now realising that there is more to the story of a research result
than p < .05, or ns. This chapter will help you become sophisticated about other ways
of interpreting research results. This sophistication means learning about three closely
interrelated issues: power, effect size, and types of error.
Type I error
Suppose you conducted a study and set the significance level cut-off at a high probability
level, say 20% or p = 0.2. This level would enable you to reject the null hypothesis very
easily. If you conducted many studies like this you would often believe (about 20 per cent
of the time) that you had support for the research hypothesis when you did not. This is
called a Type I error. Even when we set the probability at .05 or .01 we can still
sometimes make a Type I error because we will obtain a chance result at and beyond that
level for 5 per cent or 1 per cent of the time, but we never know which occasion it is.
Imagine a study where stress management techniques have been taught to 100
teachers with the intention of reducing teacher absenteeism through stress-related health
problems. Suppose the special instructions in reality made no difference. However, in
doing the study the researchers just happened to pick a sample of 100 teachers to receive
the instructions who were generally not the stressed-out sort anyway. This would lead
to rejection of the null hypothesis and a conclusion that the special instructions do make
a difference. This decision to reject the null hypothesis would be a mistake—a Type I
error. (Note that the researchers could not know that they have made an error of this kind.)
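The long-run meaning of a Type I error rate is easy to verify by simulation. Here is a minimal sketch (Python, assuming NumPy and SciPy are available; the population values of 100 and 15 are arbitrary): because both groups are always drawn from the same population, the null hypothesis is true in every study, yet roughly 20 per cent of the studies reach 'significance' at the lenient p = .20 cut-off.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.20                # the lenient cut-off discussed above
n_studies = 2000
false_rejections = 0

for _ in range(n_studies):
    # The null hypothesis is true: both groups come from one population
    group_1 = rng.normal(loc=100, scale=15, size=30)
    group_2 = rng.normal(loc=100, scale=15, size=30)
    t, p = stats.ttest_ind(group_1, group_2)
    if p < alpha:
        false_rejections += 1   # a Type I error

print(false_rejections / n_studies)   # close to 0.20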
Type II error
If you set a very stringent significance level, such as .001, there is a different kind of risk.
You may conduct a study where in reality the research hypothesis is true, but the result
does not come out extremely enough to reject the null hypothesis. Thus, the error you
would make is in not rejecting the null hypothesis when the reality is that the null
hypothesis is false (that the research hypothesis is true). This is the Type II error.
Consider again our study of teacher absence through stress. Suppose that, in reality,
practising stress management does improve their attendance records. However, in
conducting your particular study, the random sample that you selected to try this out
on happened to include many teachers who were already too stressed to manage the
stress reduction exercises properly. Even though your procedure may have helped
somewhat, their attendance records may still be lower than the average of all teachers.
The results would not be significant. The decision not to reject the null hypothesis
would constitute a Type II error.
Type II errors especially concern social scientists who are interested in practical
applications, because a Type II error could mean that a useful theory or practical
procedure is not implemented. As with a Type I error, we never know when we have
made a Type II error; however, we can try to conduct our studies so as to reduce the
probability of making a Type II error. One way of buying insurance against a Type II
error is to set a more lenient significance level, such as p < .10. In this way, even if a study
results in only a very small difference, the results have a good chance of being significant.
There is a cost to this insurance policy too—an increased risk of a Type I error. The trade-off between these two types of error is a basic consideration in designing any study.
Briefly explain why protecting against one type of error increases the chance of making
the other.
Answer on p. 599.
The research hypothesis (one-tailed) is that the experimental group (Exp Gp) would
record significantly higher attendance records than the control group (Con Gp). The
null hypothesis is that the Exp Gp will not record significantly different attendance
records from the Con Gp. The top distribution in Figure 12.1 shows the situation in
which the research hypothesis is true. The bottom distribution shows the distribution
for the Con Gp. Because we are interested in means of samples of 100 individuals,
both distributions are distributions of means. The means are number of days in
attendance.
[Figure 12.1: the sampling distribution of experimental group means lies above the null hypothesis sampling distribution of means; the two distributions are centred near 200 and 205 days in attendance.]
Answers on p. 599.
Effect size
What a statistically significant t test does not tell us is how large an effect the independent
variable had. Measures of effect size are used to determine the strength of the relationship
between the independent and dependent variables. That is, measures of effect size reflect
how large the effect of an independent variable was.
[The worked example divides the difference between the two sample means by the pooled standard deviation of 2.34, giving d = 1.71.]
6 To interpret our value of d = 1.71, use the following classification of effect sizes:
d = .20 for a small effect size,
d = .50 for a medium effect size, and
d = .80 for a large effect size.
7 Because our value of d is larger than .80, we would conclude that the independent
variable had a large effect on test performance.
We are only concerned with one population’s SD, because in hypothesis testing we
usually assume that both populations have the same or similar standard deviation.
In the teacher stress example in Figure 12.1 that we began with, the difference
between the two population means is 8, and the standard deviation of the populations
of individuals is 24. Thus, the effect size is:
Effect size = (Population M₁ − Population M₂) / Population SD = (208 − 200)/24 = 8/24 = .33
If the mean difference had been 15 points and the population standard deviation was
still 24, the effect size would be virtually doubled: 15/24, or .63. By dividing the
difference between means by the standard deviation of the population of individuals, we
standardise the difference between means in the same way that a Z score gives a standard
for comparison to other scores—even scores on different scales.
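Because effect size is nothing more than a standardised difference, it takes only a line of code to compute. A minimal sketch (Python; the function name is ours), using the figures from the teacher stress example:

def effect_size(pop_mean_1, pop_mean_2, pop_sd):
    # Difference between population means, standardised
    # by the population standard deviation.
    return (pop_mean_1 - pop_mean_2) / pop_sd

print(effect_size(208, 200, 24))   # 0.333..., i.e. .33
print(effect_size(215, 200, 24))   # 0.625, i.e. about .63 — virtually doubled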
Here is information about several different versions of a planned study, each involving a
single sample. (This assumes the researcher can affect the population standard deviation
and predicted mean by changing procedures.) Calculate the effect size for each study.
(continued)
Answers on p. 599.
For a t test:
r = √(t² / (t² + df))
or
r = Z/√N, where Z is the Z value of the reported p-value and N is the sample size.
The relation between r and d is:
r = d / √(d² + 1/(pq))
where p and q are the proportions of the total sample in each of the two groups.
To compute the effect size from analysis of variance (ANOVA—see chapter 19), a
correlation measure termed eta² is employed. This is analogous to a correlation but
describes a curvilinear relationship rather than a linear one. It is used with the various
forms of ANOVA because it is difficult to ascertain which of the independent variables
is explaining the most variance.
To calculate eta² for any of the independent variables in an ANOVA, that variable's sum of squares is divided by the total sum of squares.
[Binomial effect size display — the correlation r re-expressed as the difference in success rates between two groups:]
r       Success rate: control (%)    Success rate: experimental (%)
.10              45                             55
.20              40                             60
.30              35                             65
.40              30                             70
.50              25                             75
.60              20                             80
.70              15                             85
.80              10                             90
.90               5                             95
1.00              0                            100
Source: Based on Rosenthal & Rubin (1982), p. 168.
You read a study in which the result is just barely significant at the .05 level. You then
look at the size of the sample. If the sample is very large (rather than very small), how
should this affect your interpretation of (a) the probability that the null hypothesis is
actually true and (b) the practical importance of the result?
Answers on p. 599.
[Table: approximate number of participants needed to achieve various levels of power for small, medium and large effect sizes at the .05 significance level; the smaller the effect and the higher the desired power, the larger the sample required.]
Source: Based on Cohen (1988), p. 92.
References
Cohen, J. (1988), Statistical Power Analysis for the Behavioural Sciences, Lawrence Erlbaum, New York.
Cohen, J. (1992), 'A power primer', Psychological Bulletin, 112, pp. 155-9.
Keppel, G. (1991), Design and Analysis, Prentice Hall, Englewood Cliffs.
Smith, M., Glass, G. & Miller, T. (1980), The Benefits of Psychotherapy, Johns Hopkins University Press,
Baltimore.
SE²diff = SE²M₁ + SE²M₂
The subscripts indicate sample mean 1 and sample mean 2 respectively. Since
SEM = σ/√N, then SE²M = σ²/N. We can then substitute into the SE formula and
obtain the following easier-to-compute formula, i.e.:
SEdiff = √(σ₁²/N₁ + σ₂²/N₂)
the subscripts again referring to the first and second samples respectively. As with the
standard error of the mean, a critical ratio is formed to find the deviation in standard
error unit terms of the difference between the means. This ratio is called the t ratio.
t = (M₁ − M₂)/SEdiff   or   t = (M₁ − M₂)/√(σ₁²/N₁ + σ₂²/N₂)
The obtained t is compared to the tabled entry of t (Table 13.1) for the relevant df
and level of significance. df in a t ratio is (N₁ − 1) + (N₂ − 1). If t is the same as or greater
than the relevant tabled entry then the t is significant at that level of significance.
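As a check on hand calculation, the t ratio can be computed directly from the two means, variances and sample sizes. A minimal sketch (Python; the numbers are invented purely for illustration):

import math

def t_ratio(m1, m2, var1, var2, n1, n2):
    # t = (M1 - M2) / sqrt(variance1/N1 + variance2/N2)
    se_diff = math.sqrt(var1 / n1 + var2 / n2)
    return (m1 - m2) / se_diff

t = t_ratio(52.0, 47.0, 110.0, 95.0, 40, 40)
df = (40 - 1) + (40 - 1)
print(round(t, 2), df)   # 2.21 78 — compare with the tabled t for df = 78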
t distributions
In our outline of confidence intervals and limits, we used Table 5.1 to estimate, for
example, that the 0.05 level of confidence is situated at + 1.96 standard deviations or
standard errors from the mean of the population (i.e. Z = 1.96). To find the exact limits
in actual scores for a particular sample mean, we multiplied the calculated SE,,, by 1.96.
Table 5.1 can only be used, of course, when the distribution of sample means is
normal. If the distribution is not normal, then the 0.05 level will not be located at a Z
score of ± 1.96. As samples become smaller, their distributions become flatter and more
spread out. These non-normal distributions are called t distributions.
The use of the t statistic requires that the data satisfy the homogeneity of variance
assumption and that both sets of data are approximately normal. There is a family of t
distributions which approximates to the normal distribution of Z. (Z cannot be used as
the population SD is not known.) The shape of each t distribution depends on the
degrees of freedom, i.e. on the sample size.
[Figure: a t distribution compared with the normal (Z) distribution — the t distribution is flatter and more spread out in the tails.]
TABLE 13.1 Critical values of t (between groups)
Level of significance for one-tailed test
.10 .05 .025 .01 .005 .0005
Source: Table III, Fisher & Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th edn,
Addison Wesley Longman Ltd, London.
If you are comparing two means, the degrees of freedom are the sum of the two samples minus 2, i.e. if one
sample consists of 10 people and the other of 8, df = 10 + 8 − 2 = 16.
The table gives ‘critical values’—the minimum value of t which is significant at the desired level. For
example, with 19 degrees of freedom a t of 2.093 or larger is significant at the .05 level (two-tailed test);
with 16 degrees of freedom you would need a t of 2.120 or larger. (Again you need to consider whether a
one-tailed or a two-tailed test is appropriate.)
If your df is not represented in the left-hand column of the table take the next lowest figure, unless your
sample size is well in excess of 120, in which case you may use the bottom row (∞, or infinity).
[Figure 13.2: the distribution of mean differences, centred on M = 0 with σ = 0.444; ±0.888 marks two standard errors either side of the mean, and our obtained difference of 2 falls far beyond this.]
Let us look at the above result in a more visual way. The SEdiff is the estimate of the
dispersal of all possible mean differences between an infinite number of similarly drawn
pairs of samples. The SEdiff = 0.444. This is shown on the normal distribution above
(Figure 13.2) which represents a population of differences with M = 0, σ = 0.444 (the
denominator in our example).
We would expect 95 per cent of sample mean differences to be between —0.888 and
+0.888. Our difference of 2 is equal to 4.50 standard errors, which is way beyond 3σ from the mean.
Obviously something is happening here besides chance! It is, provided we have
controlled other variables, the experimental effect.
The formula
is unfortunately appropriate only when the samples are large and both groups are of
approximately equal size. In some t test situations, the samples are small and of different sizes,
and the way one must cope with this and compensate is by adopting this modified formula.
SEdiff = √[ ((σ₁² + σ₂²) / ((N₁ − 1) + (N₂ − 1))) × ((N₁ + N₂)/(N₁N₂)) ]
The σ² in the formula can be restated mathematically as ΣX² − (ΣX)²/N. Thus the
total t formula for a between-subjects design when samples are small, say 30 or less, is:
t = (M₁ − M₂) / √[ ((ΣX₁² − (ΣX₁)²/N₁) + (ΣX₂² − (ΣX₂)²/N₂)) / ((N₁ − 1) + (N₂ − 1)) × (N₁ + N₂)/(N₁N₂) ]
[Figure 13.3: (a) the two sample distributions overlap heavily — no significant difference, so each sample is likely to come from the same population; (b) the two sample distributions are well separated — a significant difference between samples, no support for the null hypothesis, so each sample is likely to come from a different population.]
The following figures illustrate the influence of variability and what it is we are trying
to determine in applying a t test. Both figures show a sample of seven from each of two
populations, with the same sample means in both. The only difference is variability.
The first figure (Figure 13.4) shows each sample's data clustered around the mean of
that sample. Sample variability is small but the means are significantly different (in this
case t = 9.16).
The second figure (Figure 13.5) shows that the two samples overlap and it is not easy
to see a difference between the two samples. In fact all fourteen scores could have come
from the same population. In this case t = 1.18 and we conclude that there is not
sufficient evidence to reject the null hypothesis.
[Figures 13.4 and 13.5: frequency histograms of attitude scores for two samples of seven. In Figure 13.4 each sample's scores cluster tightly around its own mean (M = 8.00 and M = 12.00, small SS) and t = 9.16; in Figure 13.5 the same means are accompanied by much larger variability, the samples overlap, and t = 1.18.]
1 If a near infinite number of pairs of samples were taken randomly from a population
and the mean differences between each pair plotted on a graph, what would we
term the standard deviation of these plotted differences? What would the average
of these differences be?
2 An independent group experiment is carried out correctly and yields the following
information:
M    σ    N
t = (M₁ − M₂) / √[ ((ΣX₁² − (ΣX₁)²/N₁) + (ΣX₂² − (ΣX₂)²/N₂)) / ((N₁ − 1) + (N₂ − 1)) × (N₁ + N₂)/(N₁N₂) ]

t = (7.4 − 4.7) / √[ ((588 − 5476/10) + (249 − 2209/10)) / (9 + 9) × (10 + 10)/((10)(10)) ]

  = 2.7 / √[ (40.4 + 28.1)/18 × 20/100 ]

  = 2.7 / √(3.806 × 0.2)

  = 2.7 / 0.872

t = 3.09
X₁    X₁²    X₂    X₂²
11    121     3      9
 6     36     2      4
 7     49     8     64
 4     16     5     25
10    100     5     25
 9     81     6     36
 8     64     3      9
 6     36     6     36
 7     49     4     16
 6     36     5     25
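The arithmetic in the worked example can be verified line by line. A minimal sketch (Python; the group scores follow the table above, as reconstructed from the totals ΣX₁ = 74 and ΣX₁² = 588):

import math

x1 = [11, 6, 7, 4, 10, 9, 8, 6, 7, 6]    # group 1 scores
x2 = [3, 2, 8, 5, 5, 6, 3, 6, 4, 5]      # group 2 scores

n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2               # 7.4 and 4.7
ss1 = sum(x * x for x in x1) - sum(x1) ** 2 / n1  # 588 - 547.6 = 40.4
ss2 = sum(x * x for x in x2) - sum(x2) ** 2 / n2  # 249 - 220.9 = 28.1
se_diff = math.sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)) * (n1 + n2) / (n1 * n2))
t = (m1 - m2) / se_diff
print(round(t, 2))   # 3.09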
1 A researcher wishes to find out whether students of education are more empathetic
than students of engineering. An empathy scale is applied to a random sample of
each faculty. The results were:
Education 25 30 ZF 19 21 36 Vv 26 24
Engineering 20 7 15 a7 18 24 19 16 29
Effect size and power for the t test for independent groups (means)
Effect size
Effect size for the t test for independent means is the difference between the population
means divided by the standard deviation of the population ofindividuals. When using
data from a completed study the effect size is estimated as the difference between the
sample means divided by the pooled estimate of the population standard deviation (the
square root of the pooled estimate of the population variance).
Stated as formulas:
d = t √((N₁ + N₂)/(N₁N₂))
A d can range in value from negative infinity to positive infinity. The value of 0 for
d indicates that there is no difference between the means. As d diverges from 0, the effect size
becomes larger. Regardless of sign, d values of .2, .5, and .8 traditionally represent small,
medium, and large effect sizes respectively.
Eta² may be computed as an alternative to d. It too ranges in value from 0 to 1. It is
interpreted as the proportion of variance of the test variable that is a function of the
group variable. A value of 0 indicates that the difference in the mean scores is equal to
0, while a value of 1 indicates that there are differences between the sample means, but
within each group there are no differences in the test scores (i.e. perfect replication).
You can compute eta² using the following formula:
eta² = t² / (t² + (N₁ + N₂ − 2))
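Both indices can be obtained from the reported t and the two sample sizes. A minimal sketch (Python; the function names are ours), using the t = 3.09 from the worked example earlier in this chapter:

import math

def cohen_d_from_t(t, n1, n2):
    # d = t * sqrt((N1 + N2) / (N1 * N2))
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def eta_squared(t, n1, n2):
    # eta squared = t^2 / (t^2 + (N1 + N2 - 2))
    return t**2 / (t**2 + (n1 + n2 - 2))

print(round(cohen_d_from_t(3.09, 10, 10), 2))  # 1.38 — a large effect
print(round(eta_squared(3.09, 10, 10), 2))     # 0.35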
Power
Table 13.3 gives the approximate power for the .05 significance level for small, medium,
and large effect sizes and one- or two-tailed tests.
Power is greatest when the participants in a study are divided into two equal groups.
For example, an experiment with 10 people in the control group and 30 in the
experimental group is much less powerful than one with 20 in both groups.
There is a practical problem in deriving power from tables when sample sizes are not
equal. Like most power tables, Table 13.3 assumes equal numbers in each of the two
groups. What do you do when your two samples have different numbers of people in
them? It turns out that in terms of power, the harmonic mean of the two unequal sample
sizes gives the equivalent sample size for what you would have with two
equal samples. The harmonic mean sample size is given by the formula:
Harmonic mean = (2)(N₁)(N₂) / (N₁ + N₂)
Consider an extreme example in which there are 6 people in one group and 34 in the
other. The harmonic mean comes out to about 10:
Harmonic mean = (2)(6)(34)/(6 + 34) = 408/40 = 10.2
TABLE 13.3 Approximate power for studies using the t test for independent means
testing hypotheses at the .05 significance level
Number of participants in each group Effect size
Small Medium Large
(.20) (.50) (.80)
One-tailed test
10      .11     .29     .53
20      .15     .46     .80
30      .19     .61     .92
40      .22     .72     .97
50      .26     .80     .99
100     .41     .97      —
Two-tailed test
10      .07     .18     .39
20      .09     .33     .69
30      .12     .47     .86
40      .14     .60     .94
50      .17     .70     .98
100     .26     .94      —
How to proceed
1 Select Statistics and choose Compare Means from the drop-down menu.
2 On the next drop-down menu click on Independent-Samples T Test to open the
Independent-Samples T Test dialogue box.
3 Select your dependent variable and click on the arrow button to place it into the Test
Variable(s) box.
4 Select your independent variable and move it to the Grouping Variable area.
5 Choose Define Groups.
6 Type in the box beside Group 1 the coding for one category of your grouping variable.
7 Do the same with the other box beside Group 2.
8 Click on Continue and then OK to produce the output.
[TABLE 13.5: example of SPSS output for the independent-samples t test on a dependent variable.]
How to report the output in Table 13.5
You should state the results of this analysis like this: 'An independent-samples t test was
conducted to evaluate the hypotheses that males and females differ significantly in their
attitudes to research and in their length of work experience. The mean attitude to
research score of females (M = 47.52, SD = 8.31) is not statistically significantly different
(t = .453, df = 82, two-tailed p = .652) from that of male students (M = 46.76,
SD= 7.05). The effect size is d = .02, implying virtually no effect at all’.
A similar statement indicating that the t test for unequal variances was used would be
made for the length of work experience data but would emphasise in this case the
statistical significance of the differences between the genders.
Many social scientists/psychologists are probably unaware of the existence of the
unequal variance t test. If you have to use one, you should write: 'Because the variances
for the two groups were significantly unequal (F = 11.05, p < 0.001), a test for unequal
variances was used’.
[Figure: two dot plots of X values arranged in rank order, illustrating how the scores of two groups are ranked together for the Mann-Whitney test.]
Suppose we predict that it is easier to remember a set of words which are organised in categories than a set of words that has
no organisation.
We allocate six subjects randomly to a group where they will learn material without
the aid of organised categories, and allocate eight subjects to a group where they will. The
effect of organisation on memory will be measured by the number of words each subject
recalls (Table 13.6).
1 Place all the scores in rank order, taking both groups together. Give ‘rank 1’ to the
lowest score, ‘rank 2’ to the next lowest score, etc. Add the ranks for each group
separately.
2 Use the following formula to find Uj.
U₁ = T₁ − N₁(N₁ + 1)/2
U₁ = 44 − (6)(6 + 1)/2
   = 44 − 21
   = 23
Similarly, U₂ = T₂ − N₂(N₂ + 1)/2
   = 61 − (8)(8 + 1)/2
   = 61 − 36
   = 25
The smaller of these two U values is used to test significance. The smaller this value
the more significant it is.
3 Table 13.7 (1) to (4) gives you the critical values of U at different levels of
significance for one-tailed and two-tailed tests and different combinations of N₁ and
N₂ for the groups of subjects. You have to locate the appropriate table from (1) to (4).
In our case, as we have selected a significance level of p < 0.05 two-tailed, choose Table
13.7 (3) on p. 193; N₁ is shown on the top row and N₂ on the left-hand column. At
the intersection of the appropriate N₁ column (in our case, 6) with the appropriate
N₂ row (in our case, 8) you will find the critical two-tailed value of U = 8. Our
smallest calculated U value is larger than this and therefore you can retain the null
hypothesis and conclude that there is no significant difference in the scores due to the
effect of the independent variable, i.e. the degree of organisation of the material.
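If a statistics package is available, the same test can be run directly on the raw scores. A minimal sketch (Python with SciPy; the recall scores below are hypothetical stand-ins for Table 13.6, which is not reproduced here):

from scipy import stats

unorganised = [10, 12, 13, 9, 8, 11]          # N1 = 6, no categories
organised = [14, 9, 12, 16, 10, 13, 11, 8]    # N2 = 8, organised categories

u, p = stats.mannwhitneyu(unorganised, organised, alternative="two-sided")
print(u, p)   # retain the null hypothesis if p >= .05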
In another example, a psychologist studying the manual dexterity of children recorded
the amount of time in seconds required by each child to arrange blocks in a specified
pattern. The data was as follows with scores ranked as though from one group:
[The boys' and girls' times were ranked as one combined set, and the rank totals T₁ and T₂ were entered into U₁ = T₁ − N₁(N₁ + 1)/2 and U₂ = T₂ − N₂(N₂ + 1)/2, giving U₁ = 7 and U₂ = 33.]
[TABLE 13.7 (3): critical two-tailed values of U at p < 0.05 for combinations of N₁ (columns, 1 to 20) and N₂ (rows, 1 to 20).]
As U₁ = 7 is the smaller of the two values it is used for comparison with the tabled value
of U. Because this is greater than the tabled value of 6, we fail to reject the null
hypothesis. At the .05 level of significance the data does not provide sufficient evidence
that there is a significant difference between boys and girls in manual dexterity.
1 Using random samples of 750 children, an investigator tested the hypothesis that
children in Perth receive significantly less pocket money than children in Adelaide.
Which one of the following tests should he apply?
a t test for related samples.
b Z test.
c Mann-Whitney test.
d t test for independent samples.
In the hypothesis above, which one of the following procedures should the
investigator use?
a A two-tailed test.
b A one-tailed test.
c Either a one-tailed or a two-tailed test.
2 What is the null hypothesis of the Mann-Whitney test?
3 Two random groups of pupils are used to determine whether a programmed
textbook is worse than an ordinary textbook. The following data are the final test
scores on a common examination.
Programmed text Bic LS WO 2 he FO 76
Ordinary text’ 6634 — BS GO" BF 94 FSG 90
Perform a Mann-Whitney U test and come to a conclusion using the 0.05 level.
4 Twenty six-letter words were presented on a screen to two groups of different
subjects. For both groups, the words were all presented at a very fast level of
exposure. For one group, the words were presented on the left-hand side of the
screen; for the other group, the words were presented on the right-hand side of the
screen. The experimenter was testing the effects of reading (left to right in our
culture) on a subject’s ability to recognise words at very fast levels of presentation.
The results were as follows:
Group 1 (left-hand side presentation): 18, 15, 17, 13, 11, 16, 10, 17.
Group 2 (right-hand side presentation): 17, 13, 12, 16, 10, 15, 11, 13, 12.
The level of significance chosen was p < 0.05. The experimental hypothesis stated
that subjects given the words on the left-hand side of the screen would perform
better. Use a Mann-Whitney U test.
a Why do we use a Mann-Whitney test in this experiment?
b Is the experimental hypothesis one-tailed or two-tailed?
c Can the experimental hypothesis be accepted at the p < 0.05 level of
significance?
5 Children’s tendency to stereotype according to traditional sex roles was observed.
Two groups were drawn, one with mothers who had full-time paid employment
b Test statistics
                     HGHT      RESEARCH   SELFCONC    STATS     SALNOW
Mann-Whitney U      72.000     840.000    674.000    790.000    784.000
Wilcoxon W         975.000    1743.000   1577.000   1693.000   1687.000
Z                   −7.260       −.376     −1.862      −.824      −.878
Asymp. Sig.
(2-tailed)            .000        .707       .063       .410       .380
The table displays data comparing males and females on five variables: height,
attitudes to research, self-concept, statistics test score, and their current salary.
How to interpret the output in Table 13.8
• The focal data is in the top table where the average rank given to each group on each
variable is located. For example, female height rank mean is 23.21, while the average
rank given to male height is 61.79. This means that the heights for males tend to be
larger than those for females, as the ranking is done from smallest person = rank 1 to
tallest person = rank 84. The other variables can be interpreted in the same way.
t = D / S_D
where S_D = the standard error of the difference between two means when observations
are paired, and D = the mean of the difference scores.
Subject    X₁    X₂     D    D²
1           6     8    −2     4
2          10    14    −4    16
3           4     2     2     4
4          15    18    −3     9
5           5     8    −3     9
N = 5            ΣD = −10   ΣD² = 42
If you compare this with the between-subjects t test formula, you will notice that we
have substituted D for M₁ − M₂ and S_D for SEdiff.
The denominator of the above formula, S_D, is calculated from the following formula:
S_D = √[ (ΣD² − (ΣD)²/N) / (N(N − 1)) ]
where: ΣD² = sum of the squared difference scores
(ΣD)² = the sum of the difference scores, squared
N = number of pairs
Thus the full formula for t when the samples are correlated is:
t = D / √[ (ΣD² − (ΣD)²/N) / (N(N − 1)) ]
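The whole computation runs directly from the difference scores. A minimal sketch (Python; the function name is ours), using the D column of the worked table above (ΣD = −10, ΣD² = 42):

import math

def paired_t(d_scores):
    # t = D / sqrt((sum(D^2) - (sum(D))^2 / N) / (N(N - 1)))
    n = len(d_scores)
    sum_d = sum(d_scores)
    sum_d2 = sum(d * d for d in d_scores)
    mean_d = sum_d / n
    s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n * (n - 1)))
    return mean_d / s_d

print(round(paired_t([-2, -4, 2, -3, -3]), 2))   # -1.91, with df = 4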
1 Use the t table (Table 13.1, page 178) and decide whether the null hypothesis is
retained or rejected in this example at the 0.05 level, two-tailed test.
2 What is the df when a matched-pairs design experiment is performed with a total of
twenty subjects in all?
3 In a matched-group design with the following statistics, would you retain the null
hypothesis using the 5% level of significance in a two-tailed test?
D = 2.10; S_D = 0.70; N in each group = 18
4 To test the effects of organisation on recall, twelve out of twenty-four matched
subjects were told to try and relate the forty words contained in a list. The other
twelve subjects were not given this hint. After the first trial, the scores of the
instructed subjects were 15, 12, 11, 16, 14, 11, 9, 15, 16, 12, 10, 15 (words recalled)
and those of the naive subjects were 12, 13, 12, 13, 10, 13, 13, 12, 11, 14, 13, 12
(words recalled). Can you reject the null hypothesis at p < 0.01 (two-tailed)?
5 An investigator is interested in assessing the effect of a film on the attitudes of
students toward a certain issue. A sample of six students is administered an attitude
scale before and after viewing the film. The before-and-after scores are as follows (a
high score reflects positive attitudes).
(continued)
The concept of sampling error underpins the whole subject of statistical testing.
Can you explain why? Illustrate your answer by reference to two statistical tests.
A psychologist believes that relaxation can reduce the severity of asthma attacks
in children. The researcher measures the severity of asthma attacks by the number
of doses of medicine required. Then relaxation training is given. The week
following training, the number of doses is again recorded. The data are reported
below.
Subject Before X After Y D D2
A 9 4 -5 25
B 4 1 -3 9
Cc 5 5 0 0
D 4 0 -4 16
E 5 1 -4 16
sum of D =-16 sum of D2 = 66
What conclusion could we reach about the effects of relaxation training using
p < .05 two-tailed test?
A researcher wishes to examine the effect of hypnosis treatment on teenage smokers.
A sample of four adolescent smokers records the average number of cigarettes
smoked per day in the week prior to treatment. One month into the hypnosis
treatment they record the average daily cigarette consumption over a week. The
data are as follows:
Computing the effect size statistic for the parametric paired-samples t test
SPSS supplies all the information necessary to compute two types of effect size indices,
d and eta². The d statistic may be computed using the following equations:
d = mean/SD   or   d = t/√N
where the mean and standard deviation are reported in the output under ‘paired
differences’.
The d statistic evaluates the degree that the mean of the difference scores deviates
from 0 in standard deviation units. If d equals 0, the mean of the difference scores is
equal to 0. As d diverges from 0, the effect size becomes larger. The value of d can range
from negative infinity to positive infinity. The eta² index may be computed as:
eta² = t² / (t² + (N − 1))
Traditionally, values of .01, .06 and .14 represent small, medium and large effect
sizes respectively.
How to proceed
1 Choose Statistics to display a drop-down menu.
2 Select Compare Means to open a second drop-down menu from which you select
Paired Sample T Test.
3 In the open Paired-Samples T Test dialogue box click on your first dependent variable
which moves it to the Current Selections box as variable 1.
4 Select your other variable which puts it beside Variable 2 in the Current Selections box.
5 Click on the arrow button to move both variables into the Paired Variables box.
6 Choose OK to produce the output.
Paired samples statistics

                        Mean      N    Std deviation   Std error mean
Pair 1   POST-TEST    47.1429    84       7.6727           .8372
         PRE-TEST     42.4048    84       9.7833          1.0674

Paired samples test

                           Paired differences
                            Std        Std error   95% confidence interval            Sig.
                  Mean    deviation      mean        of the difference       t    df  (2-tailed)
Pair 1 POST-TEST/
       PRE-TEST  4.7381   12.0842       1.3185      2.1157 to 7.3605      3.594   83    .001
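The figures in this output hang together arithmetically: t is the mean difference
divided by its standard error, and the confidence interval is the mean difference plus
or minus the critical t. A short Python check (purely an aside; scipy supplies the
critical t) reproduces them from the printed mean and standard deviation.

import numpy as np
from scipy import stats

mean_diff, sd_diff, n = 4.7381, 12.0842, 84
se = sd_diff / np.sqrt(n)                    # standard error = 1.3185
t = mean_diff / se                           # 3.594
t_crit = stats.t.ppf(0.975, df=n - 1)        # two-tailed 5% point, df = 83
low, high = mean_diff - t_crit * se, mean_diff + t_crit * se
p = 2 * stats.t.sf(t, df=n - 1)              # about .001

print(round(t, 3), round(low, 4), round(high, 4), round(p, 3))
# 3.594  2.1156  7.3606  0.001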
Rationale
The aim of the Wilcoxon signed ranks test is to compare the performance of the same
subjects or matched pairs of subjects across two occasions or conditions, to determine
whether there are significant differences between the scores from the two performances.
The scores of Occasion 2 or Condition B are subtracted from those of Occasion 1 or
Condition A, and the resulting differences given a plus (+), or, if negative, a minus (-)
sign. The differences are then ranked in order of their absolute size, the smallest
difference being given the rank of 1.
a We arrived at our value of T by adding together the ranks with the plus (+) sign
because their total is undoubtedly smaller than that of the ranks with the minus (-)
sign. Sometimes the difference between the two totals is not obvious. If there is any
doubt, the totals of the ranks for both plus and minus signs should be computed to
see which is smallest.
b We reduced the number of pairs of subjects N from 10 to 9 in view of the tie which
occurred between the scores of one pair.
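The whole procedure is automated in most statistics packages. Here is a minimal
sketch in Python with scipy (the ten pairs of scores are hypothetical and include one
tie, which the routine drops just as note b does); the function returns T, the smaller
sum of like-signed ranks, together with its probability.

from scipy import stats

occasion_1 = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]   # hypothetical data
occasion_2 = [14, 14, 12, 17, 15, 13, 14, 18, 15, 17]
T, p = stats.wilcoxon(occasion_1, occasion_2)   # tie pair (13, 13) is dropped
print(T, p)                                     # T = 1.0 with its p value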
Critical values of T (level of significance for a two-tailed test)

N     .10   .05   .02   .01        N     .10   .05   .02   .01
5       1     -     -     -        28    130   116   102    92
6       2     1     -     -        29    141   127   111   100
7       4     2     0     -        30    152   137   120   109
8       6     4     2     0        31    163   148   130   118
9       8     6     3     2        32    175   159   141   128
10     11     8     5     3        33    188   171   151   138
11     14    11     7     5        34    201   183   162   149
12     17    14    10     7        35    214   195   174   160
13     21    17    13    10        36    228   208   186   171
14     26    21    16    13        37    242   222   198   183
15     30    25    20    16        38    256   235   211   195
16     36    30    24    19        39    271   250   224   208
17     41    35    28    23        40    287   264   238   221
18     47    40    33    28        41    303   279   252   234
19     54    46    38    32        42    319   295   267   248
20     60    52    43    37        43    336   311   281   262
21     68    59    49    43        44    353   327   297   277
22     75    66    56    49        45    371   344   313   292
23     83    73    62    55        46    389   361   329   307
24     92    81    69    61        47    408   378   345   323
25    101    90    77    68        48    427   397   362   339
26    110    98    85    76        49    446   415   380   356
27    120   107    93    84        50    466   434   398   373
Source: F. Wilcoxon, Some Rapid Approximate Statistical Procedures, American Cyanamid Co. 1949.
The symbol T denotes the smaller sum of ranks associated with differences that are all of the same sign. For
any given N (number of subjects or pairs of subjects), the observed T is significant at a given level if it is
equal to or less than the value shown in the table.
1 Twelve mothers were tested for their attitudes to pre-school education, before and
after their own children attended a pre-school. Here are the data, with high scores
representing positive attitudes:
Before 7 92 8 6 4. 10 12 7 a 9
After Zo le 1) ie / 9 13° «14 8 9 73
a What is the null hypothesis?
b Using the Wilcoxon, interpret the results at the 5% level with a two-tailed test.
*Answers on p. 601.
c Test statistics

                          SALNOW - SALBEG
Z                              -7.963
Asymp. sig. (2-tailed)           .000
This table displays results comparing the starting salaries and current salaries of 84
persons in one organisation.
Demonstrate this smaller standard error for the related t test, by recalculating the
example in Table 14.3 by the between-subjects t test as though the sets of scores came
from independent groups. What difference do you note? Why is the standard error now
larger?
In a related t test, which of the following make it easier to reject the null hypothesis?
a Increasing N
b Decreasing N
c Decreasing D
d Using a one-tailed rather than a two-tailed test
*Answers on p. 601.
In repeated measures or within-groups design a single sample of subjects is randomly selected and
measurements repeated on this sample for both treatment conditions. This often takes the form
of a before and after study. The null hypothesis states that there is no significant difference between
conditions. The matched-pairs design employs two groups which are matched on a pair-by-pair
basis, which can then be regarded as the same group tested twice.
The repeated measures design has the advantage of reducing error variance due to the removal
of individual differences. This increases the possibility of detecting real effects from the
experimental treatment. A problem is the likely presence of carry-over effects, such as practice and
motivation. Counterbalancing should be used to prevent such order effects.
The Wilcoxon test is the non-parametric test for the within-groups design. It involves testing
for significant differences between ranks in the two conditions.
We have seen how research data obtained to test hypotheses of difference are analysed.
We now turn to research data obtained to test hypotheses of relationships and see how
that is analysed.
Research on relationships is concerned with the association of variables; essentially
how strongly variables are related to each other. In these situations there are a number
of statistical techniques for analysing the data depending on the level of data. Three
techniques will be considered:
1 Chi square for testing associations with nominal data.
2 Rank order correlation for testing relationships with ordinal data.
3 Product-moment correlation for testing relationships with interval data.
Chi-square rationale
Chi square is a simple non-parametric test of significance, suitable for nominal data
where observations can be classified into discrete categories and treated as frequencies.
Chi-square tests hypotheses about the independence (or alternatively the association) of
frequency counts in various categories.
For example, the data may be the proportions of students preferring each one of three
brands of low-calorie colas, or the proportion of students voting for particular candidates
in an election for student union president. Categories of responses are set up, such as
Brands A, B and C of colas or ‘For and Against’ in the election, and the number of
individuals or events which fall into each category is recorded. In such a situation one
can obtain nothing more than the frequency, or the number of times, that a particular
category is chosen. This constitutes nominal data. With such data the only analysis
possible is to determine whether the frequencies observed in the sample differ
significantly from hypothesised frequencies. There are many educational and social
science issues which involve nominal data for which chi square is a simple and
appropriate means of analysis; for example, social class levels, academic subject categories,
age groups, sex, voting preferences, pass-fail dichotomies, etc. The symbol is the Greek
letter chi, which is pronounced 'kye' to rhyme with 'sky', and written as χ².
The hypotheses for the chi-square test are H₀, that the variables are statistically
independent, and H₁, that the variables are statistically dependent.
We would expect an even distribution. That is 25 in each season. Now it might turn
out that when we actually grouped the observed birth dates of the 100 pupils they were
distributed thus:
Spring Summer Autumn Winter
The question now would be, ‘Is the fact that the observed frequencies are different
from what we expected more likely due to chance, or does it more likely represent actual
population differences in birth rate during the different seasons?’.
To arrive at an estimate of the probability that the observed frequency distribution is
due to chance, the χ² test is applied. The chi-square test permits us to estimate the
probability that observed frequencies differ from expected frequencies through chance
alone.
If the null hypothesis is true, any departure from these frequencies would be the result
of pure chance. But how far can a departure from these frequencies go before we can say
that such a discrepancy would occur so infrequently on a chance basis that our
observations are significantly different from those expected? Well, when χ² is computed
it can be compared with its table value (Table 15.1) at the usual levels of significance to
see if it reaches or exceeds them. If it does, the null hypothesis of chance variation is
rejected.
What are the two major conventional levels of significance we would employ?
In calculating χ², we need to enter our observed and expected data into a table of cells.
The cells are filled on the lines of the following model, each cell containing the observed
frequency (O) alongside the expected frequency (E).
A relatively large chi square indicates that the Os differ more from the Es than
is likely by chance. As to how large a value for chi square is needed to reject the null
hypothesis of no significant association between season and births, consult Table 15.1.
To enter the table we need to know the df.
Do you recall what degrees of freedom (df) represent? Look back at p. 154 if you don’t.
The number of observations free to vary in our example is 3 because once we have
fixed the frequency of three categories, say spring, summer and autumn, the fourth has
to be 22 to make the total 100. So the fourth category is fixed. The same principle holds
true for any number of categories. The degrees of freedom in a goodness-of-fit test is one
less than the number of categories (k — 1).
If we were dealing with preferences for five brands of a product, how many df would we
have?
The final step is to refer to Table 15.1 in order to determine whether the obtained
value is statistically significant or not. Look at it now. The probability values along the
top refer to the likelihood of the values of χ² listed in the columns below being reached
or exceeded by chance. Our old friend df forms the left-hand column. In our example,
it is entered with 3 df.
As is true with the t table, if the χ² value calculated is equal to or greater than the value
required for significance at a predetermined probability level for the df, then the null
hypothesis of no real difference between the observed and expected frequencies is rejected
at that level of significance. If the calculated χ² is smaller than the tabled value required for
Source: Table IV, Fisher & Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th edn,
1974, Addison Wesley Longman, London.
To use Table 15.1 you first compute the df—which is explained in the text—and locate the appropriate row.
You then look across the table to see how large x? needs to be in order to reach a desired level of
significance. For example with 1 df, χ² must be at least 3.84 to be significant at the 5% level; with 10 df it
must be at least 18.31.
significance at that level, then one may not reject the null hypothesis. Table 15.1 tells us that
with 3 df a χ² value of at least 7.82 is necessary to reject the null hypothesis at the 0.05 level.
The value of χ² in our example is 8.72. Since 8.72 is more than the value required for
significance, we can reject the null hypothesis, and accept the alternative hypothesis that
births are not randomly distributed through the year.
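If you have Python and scipy to hand, the tabled value can be verified in one line (this
is only a cross-check on Table 15.1, not part of the chi-square calculation itself):

from scipy import stats

crit = stats.chi2.ppf(0.95, df=3)      # 7.81, Table 15.1's 7.82
print(round(crit, 2), 8.72 >= crit)    # True, so the null hypothesis is rejected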
At this point, it is important to note the effect of the degrees of freedom on the
significance of any calculated χ². In Table 15.1, it can be seen that the χ² required for
significance at any given level gets larger as the number of degrees of freedom gets larger.
For 2 df a χ² of 5.99 is needed for significance at the 5% level. With 3 df this χ² value
rises to 7.82.
Where df = 1, Yates' correction for continuity should be applied, adjusting the formula to:

χ² = Σ [ (|O - E| - 0.5)² / E ]

For example, if in one cell O = 60 and E = 80 then |O - E| = 20. From this would
be taken 0.5, so that |O - E| is corrected to 19.5 before squaring.
1 A random sample of teachers in a local authority were asked, ‘Should the local
council economise by cutting down the number of hours the town swimming pool
is open?’. The results were:
Agree Disagree Indifferent
12 24 2
a What is the null hypothesis?
b What is the value of chi square?
c What are the degrees of freedom (df)?
d Is the value significant at the 0.05 level?
e What conclusion would be reached?
2 An investigator studying sex stereotypes in basic reading texts finds that there are
16 episodes in which Joe plays the role of leader and four in which Jane plays the role
of leader. Could these proportions be a function of chance?
Candidate Frequency
Smith 5
Jones 20
Brown 5
Test the null hypothesis that there is no difference in preference for any of the
candidates.
A group of college students were shown a series of pictures illustrating a new clothing
style. After viewing the pictures, each student was asked if they approved or
disapproved of the style. Below are the data received from 54 students.
Approve 35
Disapprove 19
Compute the chi square for the above data.
Using p = 0.05 as your designated level for significance, use Table 15.1 to evaluate
the chi square obtained. State the null hypothesis being tested. On the basis of the
chi-square test, do you accept or reject the null hypothesis?
A researcher wished to examine children’s preferences among four types of
transportation. A sample of 90 children was randomly selected and asked which
type they preferred. The following data were obtained:
                          Family size
Protestant          40    22    10    | 72
Roman Catholic      12    36    42    | 90
The expected frequency for each cell is calculated by multiplying the row total for that
cell by the column total for that cell and then dividing by the grand total.
For example, for a cell with a row total of 44 and a column total of 74 in a sample of 106:
Expected for that cell = (44 × 74) / 106 = 30.7
Look up in Table 15.1 the intersection of 4 df and 0.05 level of significance. Is our value
of 38.05 significant at the 0.05 level? Is it significant at the 0.01 level as well?
Since our computed chi-square far exceeds the critical tabled values of 9.488 (0.05 level)
and 13.277 (0.01 level), we are well justified in claiming that there is a significant
association between family size and religious persuasion in our sample. We reject the null
hypothesis that there is only a chance relationship, i.e. that both categories are independent.
2 × 2 contingency table
One of the most common uses of the chi-square test of the independence of categorical
variables is with the 2 × 2 contingency table, in which there are two variables, each
divided into two categories. Let us imagine our categories are 1) adult/adolescent, and
2) approve of abortion/don't approve. The χ² table would be as follows:
                              Adolescent    Adult
Approve of abortion              120          18      138
Don't approve of abortion         20          98      118
                                 140         116      256

(df = 1 always in a 2 × 2 table)
It is up to you now to work out E, (O - E) and (O - E)². Does Yates' correction mean
anything to you? When you have done all this, calculate χ² and check for significance
at the 0.05 and 0.01 levels. How many degrees of freedom are you to use? What is your
conclusion about the results in terms of the null hypothesis? A completed table and
interpretation of results will be found below for the cowardly.
                        Adolescent                  Adult
Approve of       O = 120    E = 75.47       O = 18    E = 62.53      138
abortion         |O - E| = 44.5, (|O - E| - 0.5)² = 1936 in each cell
Don't approve    O = 20     E = 64.53       O = 98    E = 53.47      118
of abortion      |O - E| = 44.5, (|O - E| - 0.5)² = 1936 in each cell
                    140                        116                   256
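For the non-cowardly who would still like their working checked, here is a minimal
sketch in Python with scipy. The routine reproduces the expected frequencies above
and, with correction=True, applies Yates' correction to the 2 × 2 table.

import numpy as np
from scipy import stats

table = np.array([[120, 18],     # approve: adolescent, adult
                  [20, 98]])     # don't approve: adolescent, adult
chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)
print(round(chi2, 1), dof, p < 0.01)   # about 123.0 with 1 df, p < .01
print(np.round(expected, 2))           # 75.47, 62.53, 64.53, 53.47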
effect size = √[ chi square / (total sample size across all categories × (number of categories - 1)) ]
TABLE 15.3 Approximate power for the chi-square test for independence for testing
hypotheses at the .05 significance level

                                   Effect size
Total df   Total N   Small (φ = .10)   Medium (φ = .30)   Large (φ = .50)
1            25          .08                .32               .70
             50          .11                .56               .94
            100          .17                .85              1.00
            200          .29                .99              1.00
2            25          .07                .25               .60
             50          .09                .46               .90
            100          .13                .77               .99
            200          .23                .97              1.00
3            25          .07                .21               .54
             50          .08                .40               .86
            100          .12                .71               .99
            200          .19                .96              1.00
4            25          .06                .19               .50
             50          .08                .36               .82
            100          .11                .66               .97
            200          .17                .94              1.00
Table 15.4 indicates the approximate total number of participants needed for
80 per cent power with small, medium, and large effect sizes at the .05 significance level
for chi-square tests of independence of 2, 3, 4, and 5 degrees of freedom. For example,
suppose you are planning a study with a 3 x 3 (df= 4) contingency table. You expect a
large effect size and will use the .05 significance level. According to the table, you would
only need 48 participants.
TABLE 15.4 Approximate number of subjects needed for 80 per cent power for the
chi-square test for independence for testing hypotheses at the .05 level of significance

                          Effect size
Total df   Small (φ = .10)   Medium (φ = .30)   Large (φ = .50)
1               785                87                26
2               964               107                39
3              1090               121                44
4              1194               133                48
How to proceed
1 Click on Statistics on the menu bar of the Applications window which produces a
drop-down menu.
2 Select Nonparametric Tests from this drop-down menu to open a second drop-down
menu.
3 Choose Chi-square which opens the Chi-Square Test dialogue box.
4 Select the variable that codes the three categories of agreement and then click on the
arrow button which transfers this variable to the Test Variable List: box.
5 Select OK. The results of the analysis are then displayed like the dummy data below.
b Test statistics

                  DIETHROW
Chi square a        4.571
df                      5
Asymp. sig.          .470

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 14.0.
Effect size
This is computed as:

effect size = √[ chi square / (total sample size across all categories × (number of categories - 1)) ]

            = √[ 4.571 / (84 × 5) ] = √.011 = .10
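A sketch of the same arithmetic in Python with scipy (the six observed counts are
hypothetical stand-ins, since only the summary output is shown; N = 84 over six
categories as above):

import numpy as np
from scipy import stats

observed = np.array([14, 19, 10, 17, 12, 12])    # hypothetical counts, N = 84
chi2, p = stats.chisquare(observed)              # expected = 14 per category
effect = np.sqrt(chi2 / (observed.sum() * (len(observed) - 1)))
print(round(chi2, 2), round(p, 2), round(effect, 2))   # effect is about .10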
a 0 cells (0%) have expected count less than 5. The minimum expected count is 11.00.
d Symmetric measures

                                    Value    Approx. sig.
Nominal by nominal   Phi             .321        .013
                     Cramer's V      .321        .013
N of valid cases                       24
Are these response differences significant at the 5% level? What is the hypothesis you
will accept?
A sample of children was classified into those who took paper rounds out every
evening and those who didn’t. The teacher was then asked to indicate which children
had failed to hand in homework at least once during the last month.
Failure to hand Homework always
in homework handed in
Paper round 12 8
No paper round 6 19
Pass 50 47 56
Fail 5 14 8
*Answers on p. 601.
A chi-square test is a non-parametric technique for testing hypotheses about the form of a frequency
distribution using nominal data. Two types of chi-square test are the goodness-of-fit test and the
test for independence. The former compares how well the frequency distribution for a sample
fits the frequency distribution predicted by the null hypothesis. The latter assesses
relationships between two variables, with the null hypothesis stating that there is no relationship
between the two variables. Rejecting the null hypothesis implies a relationship between the two
variables. Both tests are based on the assumptions that each observation is independent of the
others, and that one observation cannot be classified in more than one category.
The chi-square statistic is distorted when a cell's expected frequency is less than five, and the
test should not be performed if this situation exists. Yates' correction should be employed in cross
tabulations with 1 df.
Reference
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum, Hillsdale, NJ.
Introduction
My dictionary defines correlation as ‘the mutual relation of two or more things; the act
or process of showing the existence of a relationship between things’.
You may note that the relationship is mutual or reciprocating and that we do not
include in our concept of correlation any idea at all of the one thing being the cause and
the other thing being the effect. We play safe. We merely say that we have discovered
that two things are connected. Now, it may well be that one thing is a cause of another,
but correlation does not delve that far down on its own.
In principle, correlation is different from any of the inferential statistics you have so
far studied, because these techniques compare groups as groups, and not the individuals
who compose them. Ask yourself, “What happens to the individual in chi square?’. We
throw him or her into a cell with a number of others and forget all about the individual.
(Indeed, we like large numbers in chi-square cells in case the expected frequency is less
than 5!) In t tests, we do not analyse the individual, but only the performance of the
group to which he or she has been allocated; t tests cannot function with only one
individual making up a 'group'.
Basically, however, all these techniques are 'difference' tests. All are quite practical:
‘Are the persons in Group A (as a group) significantly better than those in Group B at
doing something or other, etc.?’.
We are using the razor of difference to settle a question. But in relationship testing
we are examining the strength of a connection between two characteristics which both
belong to the same individual, or, at least, two variables with a common basis to them.
Many variables or events in nature are related to each other. As the sun sets, the
temperature decreases; as children increase in age their size of vocabulary also increases;
persons bright in one academic area tend to be bright in other areas. These relationships are
correlations. If the river rises when it rains then the two events are said to have a positive
correlation, i.e. when an increase in one variable coincides with an increase in another there
exists the positive correlation. There is a negative correlation between altitude and air
pressure, as an increase in altitude brings with it a decrease in pressure. In children there is
a negative correlation between age and bed-wetting. A negative correlation thus occurs when
an increase in one variable coincides with a decrease in another.
Most of you have tried to play the piano at some time. The movement of the hands
over the keys also illustrates correlation. Imagine you are practising the scale of C major
and both hands are commencing on C, an octave apart, to travel up the keyboard in
unison. This is a positive correlation between the movements of the hands (or scores).
If both my hands stay in the same position on the keyboard this is not a correlation
since there is no movement (or scores) to calculate. Correlation is here a measure of
mutual movement up and down a scale of scores.
left hand →          right hand →         positive correlation (up the scale)
← left hand          ← right hand         positive correlation (down the scale)
left hand still      right hand still     no correlation of movement possible
If you commence both hands on the same note and play the scale simultaneously in
different directions, the right hand going up as the left hand travels down, then there is
a negative correlation.
left hand →    ← right hand        negative correlation
← left hand    right hand →        (scales in contrary motion)
When the movements of my hands on the keyboard bear no systematic relationship
in direction to each other then there is a zero correlation. But when the hands sometimes
move in a systematic relationship with each other, a modest correlation, negative or
positive as the case may be, can be calculated, as when I am playing a piece of music.
Now that we have some glimpse of what correlation is concerned with we will desert
our piano practice and turn to drawing graphs.
[Scattergrams of scores on test Y plotted against scores on test X: (c) perfect positive
correlation, r = +1.0; (d) perfect negative correlation, r = -1.0; (e) zero correlation]
Thus, a correlation coefficient indicates both the direction and the strength of
relationship between two variables. The direction of relationship is indicated by the sign
(+ or —), and the strength of relationship is represented by the absolute size of the
coefficient, i.e. how close it is to +1.00 or —1.00.
Below are the final examination scores in algebra and English for eight students. Draw a
scatter diagram of these scores. Let the algebra scores be the X variable.
Student A B C D E F G H
Algebra scores    81  84  86  82  85  82  83  84
English scores os 97 98 94 96 95 94 o>
Be careful never to confuse negative correlation with zero correlation. The latter
simply means no correlation or relationship whatever between two sets of data, whereas
a negative correlation is a definite relationship, the strength of which is indicated by its
size. It is absolutely necessary to place the algebraic sign (+ or -) before the numerical
value of the correlation, as the interpretation of the correlation is affected by its positive
or negative condition.
A correlation coefficient is not a direct measure of the percentage of relationship
between two variables; however, its square is (see p. 239). One cannot say that a
correlation of 0.90 is three times as close as a relationship of +0.30, but merely that it
indicates a much stronger degree of relationship. It is a pure number that has no
connection with the units in which the variables are measured. Correlations are usually
expressed to two and often three places of decimals. There are a number of different
correlation coefficients which can be calculated to express the strength of the relationship
between two variables, depending on the level of data. All correlation coefficients,
however, share in common the property that they range between +1.00 and —1.00.
r = Σ(Z_X × Z_Y) / N
EXAMPLE

Average product = ΣZ_X Z_Y / N = 7.00 / 7 = +1.00
The scores for Test X and those for Test Y varied systematically to the highest degree,
i.e. each person did precisely as well or as badly, on Y as on X, and so the r coefficient
came out as +1.0 (the highest possible). With this degree of correlation one could have
predicted a subject’s score on X from his or her score on Y and saved oneself the trouble
of testing him or her on X, or vice versa.
Look closely at the middle column in the above example, the products of the Z_X scores
and Z_Y scores. Some people score a lot in this column and some score little; one subject
scores zero. In other words, these subjects are each contributing quite different amounts
to the sum of the cross-products (Z_X × Z_Y). Why should they differ so much in their
contributions when they all duplicate exactly their score on X and Y? Each person did
precisely as well or as badly. The reason is seen in their Z score on X or on Y. Those
people who had large Z scores, i.e. those people who were most different from their
average colleagues, and who were consistently different (on Y as well as on X), contributed
a high Z_X Z_Y score in the middle column. (If you bet that you can pick the winner in both
horse races you should be promised more winnings than if you bet only that you can pick
the winner in one race, quite divorced from your ability to do so in any other.)
X     Y     XY     X²     Y²
20    14    280    400    196
26    10    260    676    100
17     8    136    289     64
14     2     28    196      4
 8     6     48     64     36
85    40    752   1625    400

ΣX = 85    ΣY = 40    ΣXY = 752    ΣX² = 1625    ΣY² = 400
(ΣX)² = 7225    (ΣY)² = 1600    N = 5
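As a check, the same five pairs can be run through the raw-score formula (introduced
formally below) in a few lines of Python; scipy's built-in routine gives the identical
answer.

import numpy as np
from scipy import stats

X = np.array([20, 26, 17, 14, 8])
Y = np.array([14, 10, 8, 2, 6])
n = len(X)

num = n * (X * Y).sum() - X.sum() * Y.sum()        # 5(752) - (85)(40) = 360
den = np.sqrt((n * (X**2).sum() - X.sum()**2) *
              (n * (Y**2).sum() - Y.sum()**2))     # root(900 x 400) = 600
print(num / den)                                   # r = +0.60
print(stats.pearsonr(X, Y)[0])                     # the same from scipy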
EXAMPLE
1 If the correlation (r) between variable X and variable Y = 0, then the coefficient of
determination = 0² × 100 = 0% (see Figure 16.2).
FIGURE 16.2 [Total variance of X and total variance of Y drawn as two separate
circles: no shared variance]

FIGURE 16.3 [The two variance circles overlapping: 64% of the variance is shared]
Now, let us bring the algebra to life by looking at a few examples. Imagine that we found
a correlation of +0.73 between IQ scores and self-concept scores. This means that
(0.73)² = 0.53, or 53 per cent, of the variance in the IQ scores is predictable from the variance
in the self-concept scores. Similarly, if we find an estimated correlation of +0.16 between
desired age for leaving school and verbal fluency, this means that if we designate the former
variable as dependent and the latter variable as independent, only (+0.16)² = 0.03, or
3 per cent, of the variance in desired age of leaving school is predicted from verbal fluency.
You can now see that we can use the product-moment correlation coefficient to
interpret the strength of relationship between two variables in a much more precise way.
We can define the strength of relationship as the proportion of variance in one variable
which is predictable from variance in the other. We can go further and say now that a
correlation of 0.71 is twice as strong as a correlation of 0.50 in the sense that the former
correlation predicts twice the amount of the variance in a dependent variable than is
predicted by the latter (50 per cent as against 25 per cent).
The variance interpretation of correlation emphasises the point that even with
strongly correlated measures a substantial amount of variance in the dependent
variable remains unaccounted for. It is well to bear this in mind when looking at
correlations reported in research. Many researchers set their sights at finding
statistically significant correlations, which simply means that the correlation is
unlikely to have occurred by chance. When they come to draw causal inferences from
their findings, however, the amount of variance they have actually explained by a
significant correlation is very small indeed.
Take, for example, Entwistle’s correlations (Entwistle 1972) between a number of
personality characteristics and academic performance in higher education. The two
largest correlations are —0.41 for ‘neuroticism’ among women students in
polytechnics, and 0.39 for A-level grades among female students at colleges of
education. It is notable that in the former case, only (0.41)² = 17% of the variance
in academic performance is explained by neuroticism and in the latter case, 15 per
cent. Such correlations are significant because N is large, yet personality factors are
here explaining only a small part of the variance in academic performance.
For example, with a sample of 20, df = 18, and r must be at least .4438 to be significant at the .05
level (two-tailed test). With a sample of 12, df = 10, you need an r of at least .5760. If your df is not
represented in the left-hand column of the table, take the next lowest figure.
Child    X    Y    X²    Y²    XY
A        4    9    16    81    36
B        1    4     1    16     4
C        3    1     9     1     3
D        6    7    36    49    42
E        5    3    25     9    15
F        4    2    16     4     8

N = 6    ΣX = 23    ΣY = 26    ΣX² = 103    ΣY² = 160    ΣXY = 108

r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}
Calculate r and make a sensible comment about the relationship of X and Y. Some
calculation has already been done for you.
2 For the following set of data, the researcher has made this hypothesis: ‘There is a
relationship between the arithmetic scores and the English scores’. Compute the
Pearson product-moment correlation coefficient for these data.
x y
2 9
1 10
3 6
0 8
4 2
A researcher obtains an r = -.41 with N = 30. Using the 5% level, is this significant?
In a sample of 20, how large a correlation is needed for significance at the .05 level?
As sample size increases, the magnitude of the correlation necessary for significance
decreases. Why?
10 On the next page are some data correlating children’s reading test scores with
their spelling scores.
where d is the difference between ranks for each pair of observations, and N is the
number of pairs of observations. For example, if a subject is ranked first on one measure
but only fifth on another, d = 1 - 5 = -4 and d² = 16.
Rank order correlation follows the same principles as outlined earlier; that is, it ranges
from +1 to —1. It is very simple to use, but always remember to convert data into rank
order. It also demonstrates very clearly the effect of changes in covariation between the
two sets of data.
For example (X and Y expressed in ranks):

(a) With perfect agreement between the two sets of ranks, Σd² = 0, so

rho = 1 - (6 × 0) / (5 × 24) = +1.0
(b)   X    Y    d    d²
      1    5   -4    16
      2    4   -2     4
      3    3    0     0
      4    2    2     4
      5    1    4    16
                Σd² = 40

rho = 1 - (6 × 40) / (5 × 24) = 1 - 240/120 = -1.0
(c)   X    Y    d    d²
      1    1    0     0
      3    2    1     1
      5    3    2     4
      4    4    0     0
      2    5   -3     9
                Σd² = 14

rho = 1 - (6 × 14) / (5 × 24) = 1 - 84/120 = 1 - 0.7 = +0.3
Now you try switching the ranks around and notice how the correlation changes. The
combination of the resulting rho with the visual impression of the rankings will bring
home to you (if you do not fully comprehend already) what correlation reveals.
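You can also let the computer do the switching. A minimal sketch in Python with scipy
reproduces examples (b) and (c) above:

from scipy import stats

natural = [1, 2, 3, 4, 5]
print(stats.spearmanr(natural, [5, 4, 3, 2, 1])[0])   # -1.0, as in example (b)
print(stats.spearmanr([1, 3, 5, 4, 2], natural)[0])   # +0.3, as in example (c)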
Rank-order-correlation example
The order in which the students finish the exam (column a) serves as one set of
rankings. The test scores need to be ranked, however. The lowest score will be assigned
the rank of 1, the next lowest a score of 2 etc., as has been done in column c of the
data. Where two or more subjects have the same score the rank is calculated by
averaging the ranks those subjects cover. In our example above, two candidates
obtained 29. They were in seventh equal position, i.e. covering ranks 7 and 8. Hence,
they are both ranked as 7.5.
Column d is the difference between the rankings in column a and c irrespective of
+ or — signs, since squaring them in column e eliminates these.
Using the sum of column e, the rank order correlation can be determined as
follows:
rho = 1 - (6 × 38) / (10 × 99) = 1 - 228/990
    = 1 - 0.23 = +0.77
Can you interpret this relationship? Use Table 16.3. What does it imply?
On page 247 is a further example which shows the scores of thirteen boys in a class
of twenty-nine children on two variables—‘verbal fluency’ and ‘desired age of leaving
school’.
You can see from the columns X and Y again how we deal with the problem of 'tied
ranks'. On verbal fluency Y, for example, two boys obtained a score of 8. These scores
take up the ranks of 3 and 4 in the class. So that all ranks 1-13 will finally be taken up,
each of these boys is given the average of the two ranks, 3.5.
This suggests barely any relation between these two variables in this sample of boys.
That is to say, ‘verbal fluency’ and ‘desired age of leaving school’ show only a small
positive correlation with each other.
While rho is only suitable for ranked (or ordinal) data, you will have realised that
interval data can be used, provided it is first ranked. However, rho will generally provide
a lower estimate of correlation than r because the data is degraded, i.e. rho throws away
information in changing interval data into ranks.
Interpretation of rho
As with r, the obtained value of rho must be compared with the critical values in Table
16.2, (p. 247). If the obtained rho equals or exceeds the tabled value, the null hypothesis
is to be rejected at the level of significance chosen.
FIGURE 16.6 [The effect of restriction of range: a strong positive correlation over the
full range becomes a near-zero correlation when X values are restricted to a limited
range, and similarly when Y values are restricted to a limited range]
Calculating power
Table 16.3 gives approximate power while Table 16.4 provides minimum sample size
for 80 per cent power.
TABLE 16.3 Approximate power of studies using the correlation coefficient for
testing hypotheses at the .05 level of significance

                                Effect size
                 Small (r = .10)   Medium (r = .30)   Large (r = .50)
Two-tailed
Total N: 10          .06               .13                .33
         20          .07               .25                .64
         30          .08               .37                .83
         40          .09               .48                .92
         50          .11               .57                .97
        100          .17               .86               1.00
One-tailed
Total N: 10          .08               .22                .46
         20          .11               .37                .75
         30          .13               .50                .90
         40          .15               .60                .96
         50          .17               .69                .98
        100          .26               .92               1.00

TABLE 16.4 Approximate total number of subjects needed for 80 per cent power at
the .05 level of significance

                 Small (r = .10)   Medium (r = .30)   Large (r = .50)
Two-tailed            783                85                28
One-tailed            617                68                22
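Values close to those in Table 16.3 can be generated with the Fisher z approximation;
the short Python function below is a sketch of that approximation, not the exact
method behind the table.

import numpy as np
from scipy import stats

def corr_power(r, n, alpha=0.05, tails=2):
    # Fisher z approximation to the power of a test of r against zero
    z_r = np.arctanh(r) * np.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha / tails)
    power = stats.norm.sf(z_crit - z_r)
    if tails == 2:
        power += stats.norm.cdf(-z_crit - z_r)
    return power

print(round(corr_power(0.30, 50), 2))            # about .56 (table: .57)
print(round(corr_power(0.50, 30, tails=1), 2))   # about .89 (table: .90)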
How to proceed
1 Click Statistics from the menu bar to display a drop-down menu.
2 From the drop-down menu, select Correlate which produces a smaller drop-down menu.
3 Choose Bivariate.
4 Select your variables and then click on the arrow button which places them in the
Variables box. You can either select the two variables in two separate operations or
drag the highlight down over the second variable using the mouse.
5 The Pearson option has already been pre-selected (i.e. it is a default option), so if
only Pearson’s correlation is required select OK; this closes the Bivariate Correlations
dialogue box and produces output such as that shown in Table 16.6.
6 Should you wish, you can also obtain means and standard deviations by clicking
Options, followed by those two items in the Statistics box.
Additionally, you can create a scatterplot to visually represent the relationship. This
would certainly enhance the results section. It also has the value of letting you see whether
it is a linear or non-linear relationship. You should never report a correlation coefficient
without examining the scattergram for problems such as curvilinear relationships.
In a student project it should always be possible to include a scattergram of this sort.
Unfortunately, journal articles and books tend to be restricted in the numbers they
include because of economies of space and cost.
To produce a scatterplot
1 Click Graph and then Scatter.
2 Simple is the default mode and is already selected for you. Click on Define. (If you
have a set of inter-correlations you click Matrix before Define.)
3 Move the two variables into the box by using the arrow button, then select OK to
produce the graph.
4 With a correlation it does not really matter which variable represents the horizontal or
X-axis (the abscissa), and which variable represents the vertical or Y-axis (the ordinate).
[Scatterplot: Experience (X-axis, 0 to 50) against Age (Y-axis, 20 to 70)]
• Correlations are produced as a matrix. The diagonal of this matrix (from top left to
bottom right) consists of the variable correlated with itself, which obviously gives a
perfect correlation of 1.000. Obviously no significance level is quoted for this value.
• The values of the correlations are symmetrical around the diagonal from top right to
bottom left in the matrix.
• The scattergram provides a strong visual demonstration of a strong positive
correlation.
How to proceed
In order to correlate the scores in ranks rather than as raw scores, the scores have to be
turned into ranks. To do this:
1 Select Transform and from the drop-down menu click on Rank Cases.
2 Select the arrow button to move the variable into the Variables text box.
3 Select the largest value button in the Assign Rank 1 to area.
4 Click OK and a new variable is created reflecting the ranking of the original variable.
This new variable will carry the same name as the original but is preceded by a letter
‘r’ to designate it as a ranked variable (e.g. ‘age’ will become ‘rage’; ‘salnow’ will
become ‘rsalnow’). Rank all variables to be used with Spearman this way.
To produce Spearman’s correlation
1 Select Spearman in the Bivariate Correlations dialogue box and de-select Pearson,
which is the default statistic.
2 Select OK to close the Bivariate Correlations dialogue box and display the output like
that shown in Table 16.6.
                                          RANK of      RANK of
                                          SALNOW       EXPER
Spearman's   Correlation    RANK of
rho          coefficient    SALNOW        1.000        -.006
                            RANK of
                            EXPER         -.006        1.000
             Sig.           RANK of
             (2-tailed)     SALNOW          .           .987
                            RANK of
                            EXPER          .987          .
             N              RANK of
                            SALNOW         10           10
                            RANK of
                            EXPER          10           10
STQ80*
A group of apprentices was given instruction in welding. A study was conducted to
determine if the number of hours spent in practice was related to proficiency.
Compute the rank order correlation for these data.
Using Table 16.3, determine if the rho computed is significant. Use p = 0.01 as your
designated level for significance. Would you accept or reject the null hypothesis?
A teacher was interested in knowing the extent to which her evaluation of her
children’s cooperativeness was related to their evaluation of themselves. She rated
each child on cooperativeness, using a scale ranging from 1 for very cooperative to 10
for uncooperative. The children also rated themselves, using the same scale. Below are
the data obtained for ten children. Compute the rank order correlation for this data.
Child                 A   B   C   D   E   F   G   H   I   J
Teacher's rating eBid 16) 1 6 2 8 4 4 6 6
Child's self-rating 4 2 3 1 3 2 3 = 5 6
Entry A B C D E F G H | J
Judge X 5 2 6 8 1 7 4 9 3 10
Judge Y 1 7 6 10 4 5 3 8 2 9
4 Some pupils are asked to report the hours per week they spend watching TV. Their
average academic grades are available. What is the relationship between the two?
State your null hypothesis. Can the null hypothesis be rejected at p = 0.01?
Chi square and correlation assess relationships between variables. Chi square uses nominal data
while correlation techniques mainly employ ordinal and interval data. Chi square assesses the
goodness-of-fit of data to theoretical distributions and evaluates relationships between categories.
Correlation techniques measure relationships but do not indicate cause and effect. The correlation
coefficient varies between +1, a perfect relationship, and -1, a perfect inverse relationship. The
more random the relationship, the closer the coefficient is to zero.
Essay      a    b    c    d
Judge P    3    4    2    1
Judge Q    3    1    4    2
The order of the essays is then rearranged so that the first judge’s ranks appear in
numerical order.
We are now in a position to determine the degree of correspondence between the two
sets of judgements. We now determine how many pairs of rank in judge Q’s set are in
their natural order with respect to each other. We start by considering all possible pairs
of ranks in which judge Q’s rank 2, the farthest to the left, is in a natural order with the
other ranks. The first pair, 2 and 4, has the correct order, so we assign a score of +1. The
second pairing, 2 with 3, is also in the correct order, so it earns a +1. The third pairing,
2 and 1, is not in the natural order, so we assign —1 to this pairing. Thus for all pairs with
rank 2 we total the scores as follows: +1 +1 —1 and obtain +1. We continue the process
by looking at all pairings with the second rank from left, which is 4. Both pairings with
4 are not in a natural order (4 with 3, and 4 with 1), therefore the sum is —2. Finally,
we consider the third rank from the left, which is 3. We only have one pairing here and
that is in the wrong order, so we allocate a score of -1.
The total of all scores assigned is +1, —2 and —1, which gives an overall score of —2.
Now we need to determine what is the maximum possible total we could have
obtained for the scores assigned to all the pairs in judge Q’s ranking. The maximum
possible would occur if both judges agreed on their rankings. This would place all judge
Q’s rankings in natural order. The maximum total then in the case of perfect agreement
between P and Q would be four things taken two at a time, or 6.
That is, tau = -2/6 = -0.33. This is the measure of agreement between the ranks assigned by
judge P and those assigned by judge Q. One may think of tau as a function of the
minimum number of inversions or interchanges between neighbours which is required
to transform one ranking into the other. It is a sort of coefficient of disarray.
The actual formula for tau is:

tau = S / [½N(N - 1)]

where S = the actual total score of all pairing orders as calculated above (i.e. in our
example, -2) and N = number of subjects ranked.
The effect of tied ranks is to change the denominator of the formula to:

√[½N(N - 1) - T_x] × √[½N(N - 1) - T_y]

where T_x = ½Σt(t - 1), t being the number of tied observations in each group of ties
on the X variable, and T_y is the corresponding quantity for ties on the Y variable.
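scipy's kendalltau routine carries out the whole count, including the tie adjustment; a
quick check of the essay example in Python gives the same figure:

from scipy import stats

tau, p = stats.kendalltau([3, 4, 2, 1], [3, 1, 4, 2])   # judges P and Q
print(round(tau, 2))                                    # -0.33, i.e. S/6 = -2/6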
Subject    1    2    3    4    5    6
Rank X     1    2    3    4    5    6
Rank Y     1    2    4    5    3    6
*Answers on p. 603.
X Y
0 1
1 1
2 0
4 1
5 1
6 0
7 0
8 0
8 1
9 0
*Answer on p. 603.
Tetrachoric correlation j
In a situation where both variables are in the form of dichotomies the tetrachoric
correlation is used. Two examples are where passing or failing is correlated with the sex
of the subjects, or where you are determining a relationship between leaving school or
staying on at school after the compulsory years and the possession of some behavioural
characteristic, such as smoking or not smoking. This statistic is sound when there are
large numbers in the sample and when the dichotomies divide into almost equal groups.
It has a large standard error.
                             X
                        1         0
Achievement Y    1    2 (a)     4 (b)     6
                 0    3 (c)     1 (d)     4
                        5         5      10

2 Multiply a × d to get ad, and b × c to get bc. ad = 2 and bc = 12. When bc is greater
than ad the correlation is positive; when ad is greater than bc the correlation is
negative.
3 Divide the larger of the two products by the smaller. This gives us 6.
4 Enter the table below with bc/ad or ad/bc, whichever is larger, and read the
corresponding value of r_t to the left of it. In our example we enter the table with a
bc/ad ratio of 6 and find the tabled value to be 0.61.
*Answer on p. 603.
r_t   bc/ad or ad/bc    r_t   bc/ad or ad/bc    r_t   bc/ad or ad/bc    r_t    bc/ad or ad/bc
.00   1.000-1.012       .26   1.941-1.993       .51   4.068-4.205       .76    11.513-12.177
.01   1.013-1.039       .27   1.994-2.048       .52   4.206-4.351       .77    12.178-12.905
.02   1.040-1.066       .28   2.049-2.105       .53   4.352-4.503       .78    12.906-13.707
.03   1.067-1.093       .29   2.106-2.164       .54   4.504-4.662       .79    13.708-14.592
.04   1.094-1.122       .30   2.165-2.225       .55   4.663-4.830       .80    14.593-15.574
.05   1.123-1.151       .31   2.226-2.288       .56   4.831-5.007       .81    15.575-16.670
.06   1.152-1.180       .32   2.289-2.353       .57   5.008-5.192       .82    16.671-17.899
.07   1.181-1.211       .33   2.354-2.421       .58   5.193-5.388       .83    17.900-19.287
.08   1.212-1.242       .34   2.422-2.491       .59   5.389-5.595       .84    19.288-20.865
.09   1.243-1.275       .35   2.492-2.563       .60   5.596-5.813       .85    20.866-22.674
.10   1.276-1.308       .36   2.564-2.638       .61   5.814-6.043       .86    22.675-24.766
.11   1.309-1.342       .37   2.639-2.716       .62   6.044-6.288       .87    24.767-27.212
.12   1.343-1.377       .38   2.717-2.797       .63   6.289-6.547       .88    27.213-30.105
.13   1.378-1.413       .39   2.798-2.881       .64   6.548-6.822       .89    30.106-33.577
.14   1.414-1.450       .40   2.882-2.968       .65   6.823-7.115       .90    33.578-37.815
.15   1.451-1.488       .41   2.969-3.059       .66   7.116-7.428       .91    37.816-43.096
.16   1.489-1.528       .42   3.060-3.153       .67   7.429-7.761       .92    43.097-49.846
.17   1.529-1.568       .43   3.154-3.251       .68   7.762-8.117       .93    49.847-58.758
.18   1.569-1.610       .44   3.252-3.353       .69   8.118-8.499       .94    58.759-71.035
.19   1.611-1.653       .45   3.354-3.460       .70   8.500-8.910       .95    71.036-88.964
.20   1.654-1.697       .46   3.461-3.571       .71   8.911-9.351       .96    88.965-117.479
.21   1.698-1.743       .47   3.572-3.687       .72   9.352-9.828       .97    117.480-169.503
.22   1.744-1.790       .48   3.688-3.808       .73   9.829-10.344      .98    169.504-292.864
.23   1.791-1.838       .49   3.809-3.935       .74   10.345-10.903     .99    292.865-923.687
.24   1.839-1.888       .50   3.936-4.067       .75   10.904-11.512    1.00    923.688-∞
.25   1.889-1.940

* If bc/ad is greater than 1, the value of r_t is read directly from this table. If ad/bc is greater than 1, the table
is entered with ad/bc and the value of r_t is negative.
These data are then collated into a 2 × 2 table with cells labelled a, b, c and d. Below
is the 2 × 2 table for the above data:

                    X
               1         0
Y       1    2 (a)     4 (b)
        0    5 (c)     1 (d)

phi = (ad - bc) / √[(a + b)(c + d)(a + c)(b + d)] = +0.507
*Answer on p. 603.
C = √[χ² / (χ² + N)] = √[9.78 / (9.78 + 50)] = +0.404

C never reaches +1.0 as its value depends on the number of cells in the chi-square
table. The maximum value in a 2 × 2 table is .707, in a 3 × 3 table .816, and in a
4 × 4 table .866, and so on.
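Both indices are easily computed by hand or in a couple of lines of Python (shown
here as a sketch using the cell counts and the chi-square figure given above):

import numpy as np

a, b, c, d = 2, 4, 5, 1
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(abs(phi), 3))    # 0.507 in magnitude; the sign depends on how
                             # the cells happen to be arranged

chi_sq, N = 9.78, 50
print(round(np.sqrt(chi_sq / (chi_sq + N)), 3))   # C = 0.404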
Match the statistic with the definition. (There is one statistic with no definition provided.)
Curvilinear relationships
All the correlation techniques described above assume the relationship is linear, i.e. a
straight line best represents the relationship. However, the relationship may not be linear
but curvilinear. For example, performance is related to anxiety, but at both low and
high levels of anxiety performance must be low, since in the former low effort will result,
and in the latter the high anxiety level can disrupt performance. Performance, then, is
maximised somewhere in between, thus producing an inverted U-shaped graph. Other
examples producing non-linear plots, as shown in Figure 17.1, are size of schools against
number of schools, i.e. fewer very small and very large schools; or decreasing disturbed
nights with increasing age of infant; or a sudden increase in vocabulary, then a tapering off
with age in the young child.
FIGURE 17.1 [Three non-linear plots: number of schools against size; disturbed
nights against age; vocabulary against age]
Partial correlation
This is a valuable technique when you wish to eliminate the influence of a third variable
on the correlation between two other variables. You may be studying the relationships
between self-concept, racial attitudes and authoritarianism. The results show that self-
concept is strongly related to the other two: the less positive people feel about
themselves, the more intolerant they become of others, the more they feel threatened by
others, and the more they believe in coercive controls to protect the way they wish society
to be organised, all in an attempt to protect their threatened self-concepts. By
partialling out self-concept, a 'truer' sense of the relationship between racial intolerance
and authoritarianism is obtained.
[Diagram: controlling for Z partially explains the relationship between X and Y]
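A first-order partial correlation can also be computed directly from the three
zero-order correlations with the standard formula; here is a sketch in Python with
made-up values:

import numpy as np

r_xy, r_xz, r_yz = 0.60, 0.70, 0.65     # hypothetical zero-order correlations
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(round(r_xy_z, 2))   # about 0.27: the X-Y correlation with Z held constant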
How to proceed
1 Select Statistics and on the drop-down menu click on Correlate.
2 Next choose Partial, which brings the Partial Correlations box into view.
3 Select the variables that represent length of work experience and current salary and
move them into the variables box by using the arrow button.
4 Select the age variable and move it into the Controlling for box.
5 Click the Two tail option in the Test of Significance box.
6 Choose Options and select Zero-order correlations in the Statistics box.
7 Click Continue.
8 Finally select OK, which produces output like Table 17.2 below.
              EXP        SALNOW
EXP          1.0000       .0822
             (    0)      (  81)
             P= .         P= .460
SALNOW        .0822      1.0000
             (  81)      (    0)
             P= .460      P= .

(Coefficient / (df) / 2-tailed significance)
'.' is printed if a coefficient cannot be computed
*Answers on p. 603.
Multiple correlation
If a researcher wishes to know the relationship between one variable and two or more
other variables considered simultaneously, then multiple correlation is needed. The
general formula is:

R₁.₂₃ = √[ (r₁₂² + r₁₃² - 2 r₁₂ r₁₃ r₂₃) / (1 - r₂₃²) ]

where R₁.₂₃ is the multiple correlation between variable 1 and the combination of
variables 2 and 3.
Suppose a university admissions officer is dissatisfied with selection procedures for
undergraduate entry which rely on the applicant's tertiary entrance exam mark, and she
wishes to find an additional measure that will enable a better prediction of future success
to be made. The admissions officer decides on a personality measure, namely
neuroticism. Data is obtained on 100 students in respect of:
1 first year undergraduate exam results
2 tertiary entrance exam mark
3 neuroticism
The following correlations are computed:

r₁₂ = +0.50
r₁₃ = -0.60
r₂₃ = +0.20

We substitute these values into the formula:
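Completing the substitution (shown here as a quick Python check; the arithmetic can
equally be done by hand):

import numpy as np

r12, r13, r23 = 0.50, -0.60, 0.20
R = np.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))
print(round(R, 2))   # about 0.87

So the entrance mark and neuroticism together correlate about 0.87 with first-year
results, a considerably better prediction than the 0.50 given by the entrance mark alone.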
Factor analysis
This is a very popular and frequently used way of reducing a plethora of variables to a
few factors, by grouping together variables that are moderately or highly correlated with
each other to form a factor. It is an empirical way of reducing an unwieldy mass of
variables into a manageable few. In studying the performance of students across the
curriculum you would end up with a huge correlation matrix in which every subject’s
marks were correlated with those from every other subject. A factor analysis would show
which groups of subjects were closely related and we may end up with three or four
major factors, which in effect group similar disciplines together because performance is
similar, such as a science group, a languages group, a social sciences group, or a practical
subjects group.
Correlating the performance of athletes across a large number of events would also
produce a large correlation matrix. Here, a factor analysis might show that some events
were closely linked in terms of performance and three major factors might emerge in a
factor analysis such as speed, endurance and strength, with each event lying within one
of these factors. The aim is to see whether the original variables can be described by a
much smaller number of factors. Factor analysis has been much used by Cattell, trying
to determine the basic personality factors, and by many researchers attempting to provide
construct validity for, among other things, intelligence tests. The mathematical basis of
factor analysis is beyond the scope of this book and modern computer statistics programs
carry out all the computations and produce the factor details without too much hassle.
Regression is a powerful tool for allowing the researcher to make predictions of the likely
values of the dependent variable Y from known values of X (for example, when choosing
the best applicant for a job on the basis of an aptitude or ability test, or predicting future
university academic performance from a knowledge of HSC results). Regression is a widely
used technique closely linked to Pearson's r, and shares many of the assumptions of r:
• the relationship should be linear; and
• the measurements must be interval data.
FIGURE 18.1 [Scattergrams: (a) perfect positive correlation and (b) perfect negative
correlation, test scores on Y plotted against test scores on X]

FIGURE 18.2 [The regression slope as rise over run: for each 1 unit of Z moved along
X, the line rises r Z units on Y]
For example, if r = +0.60, for every Z unit increase (run) in X there is a 0.60Z unit
increase (rise) in Y (see Figure 18.2).
Thus, predicted Z_Y = Z_X(r). For example, if we know that the correlation between
reading test scores and spelling test scores is +0.7, we can predict that a student with a
Z score of +2.0 in reading will have a Z score of (2.0)(0.7), or 1.4Z, in spelling. This
formula can of course be used in either direction, so that we can predict reading Z scores
from spelling Z scores. The correlation between X and Y is known as the standardised
regression coefficient, which is symbolised by the Greek letter beta or β. So the
predicted Z_Y = Z_X(β), where β is the correlation coefficient.
FIGURE 18.3 [As Figure 18.2, but with the regression line falling: negative slope]

If r = -0.60, for every Z unit increase (run) in X there is a 0.60Z unit decrease (fall)
in Y (see Figure 18.3). So, using the formula above, a student with a Z score of -1.0
in reading will have a Z score of (-1.0)(0.7) = -0.7Z in spelling. The geometric relationship
between the two legs of the triangle determines the slope of the hypotenuse (the
regression line).
However, for practical purposes it is more convenient to use raw scores than Z scores.
EXAMPLE
Assume that research has shown that a simple test of manual dexterity is capable of
distinguishing between the good and poor students in a word processing end-of-course
test. Manual dexterity is a predictor variable, and input accuracy on the computer is the
criterion variable. So it should be possible to predict which future students are likely to
be the more accurate.
The scattergram (see Figure 18.4) depicts the data above. Notice that the scores on
the manual dexterity test form the horizontal dimension (X-axis), and the accuracy score
is on the vertical dimension (Y-axis). In regression, in order to keep the number of
formulae to the minimum:
• the horizontal dimension (X-axis) should always be used to represent the variable from
which the prediction is being made; and
• the vertical dimension (Y-axis) should always represent what is being predicted.
It is clear from the scattergram that accuracy is fairly closely related to scores on the
manual dexterity test. If we draw a straight line, the regression line or the line of best fit,
as best we can through the points on the scattergram, this line could be used as a basis
for making predictions about the most likely score on accuracy from the manual dexterity
aptitude test score. In order to predict the most likely accuracy score corresponding to
a score of 40 on the manual dexterity test, we can simply draw:
• a right angle from the score 40 on the horizontal axis (manual dexterity test score) to
the regression line; and then
• a right angle from the vertical axis to meet this point. In this way we can find the
accuracy score which best corresponds to a particular manual dexterity score (Figure
18.5).
Estimating from this scattergram and regression line, it appears that the best
prediction from a manual dexterity score of 40 is an accuracy rate of about 14.
There is only one major problem with this procedure: the prediction depends on the
particular line drawn through the points on the scattergram. Many different lines could
be drawn. This eyeballing of a line of best fit is not desirable. It is preferable to have a
method which is not subjective. So mathematical ways of determining the regression line
have been developed. Fortunately, the computations are generally straightforward, and
undertaken using SPSS.
FIGURE 18.6 [The regression line drawn on the scattergram: manual dexterity
(X-axis) against accuracy (Y-axis)]
Regression equations
The regression line involves one awkward feature. As we have seen, all values really
should be expressed in Z scores or standard deviation units. However, it is obviously
more practical to use actual scores to determine the slope of the regression line. But
because raw scores do not have the same means and standard deviations as Z scores, the
prediction procedure has to make allowances for this by converting to a slope known as
b, or the raw score regression coefficient (see below for the formula).
Fortunately, the application of two relatively simple formulae (see below) provides all
the information we need to calculate the slope and the intercept. A third formula is used
to make our predictions from the horizontal axis to the vertical axis.
EXAMPLE
Table 18.2 contains data on the relationship between anxiety scores and sociability scores
for a group of ten individuals. Remember, it is important with regression to make the
X scores the predictor variable; the Y scores are the criterion variable. N is the number
of pairs of scores, i.e. 10.
The slope b of the regression line is given by several equivalent formulae.
When raw scores are many and/or large:

b = Σ(X - M_X)(Y - M_Y) / Σ(X - M_X)²

An alternative formula is:

b = (SD_Y / SD_X) × r

With small numbers, as here, a more convenient formula is:

b = [ΣXY - (ΣX)(ΣY)/N] / [ΣX² - (ΣX)²/N]
Subject    X    Y    X²    XY
1          8    6    64    48
2          3    2     9     6
3          9    4    81    36
4          7    5    49    35
5          2    3     4     6
6          3    2     9     6
7          9    7    81    63
8          8    7    64    56
9          6    5    36    30
10         7    4    49    28
Sums      62   45   446   314
b = [314 - (62 × 45)/10] / [446 - (62)²/10]
  = (314 - 279) / (446 - 384.4)
  = 35 / 61.6 = 0.568
This tells us that the slope of the regression line is positive—it moves upwards from
bottom left to top right. Furthermore, for every unit one moves along the horizontal axis,
the regression line moves 0.568 units up the vertical axis.
We can now substitute b in the following formula to get the cutting point or intercept
(a) of the regression line on the vertical axis. a represents a constant factor, the value
of Y when X equals zero:

a = (ΣY - bΣX) / N

a = (45 - 35.22) / 10 = 0.98
This value for a is the point on the vertical axis (sociability) cut by the regression line.
If one wishes to predict the most likely score for a particular score on the horizontal
axis, one simply substitutes the appropriate values in the regression formula.
Thus if we wished to predict sociability for a score of 5 on anxiety, given that we
know the slope (b) is 0.568 and the intercept is 0.978, we simply substitute these values
in the formula:

Y (predicted score) = a (intercept) + [b (slope) × X (known score)]
Y = 0.978 + (0.568)(5) = 3.818

This is the best prediction, but it does not mean that individuals with a score of 5 on
anxiety inevitably get a score of 3.818 (or rather 3.8) on sociability. It is just our most
intelligent estimate. If in the future we have a person with an anxiety score of 5, we can
say that they would in all probability obtain a sociability score of around 4.
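Since the computations are simple, a minimal Python sketch (an illustration added here, assuming NumPy is available) can recompute b, a and the prediction from the Table 18.2 data:

import numpy as np

# Anxiety (predictor X) and sociability (criterion Y) scores from Table 18.2
X = np.array([8, 3, 9, 7, 2, 3, 9, 8, 6, 7])
Y = np.array([6, 2, 4, 5, 3, 2, 7, 7, 5, 4])
N = len(X)

# Raw score regression coefficient: b = [sum XY - (sum X)(sum Y)/N] / [sum X^2 - (sum X)^2/N]
b = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / N) / (np.sum(X**2) - np.sum(X)**2 / N)
# Intercept: a = (sum Y - b * sum X) / N
a = (np.sum(Y) - b * np.sum(X)) / N

print(round(b, 3), round(a, 3))   # 0.568 and 0.977 (0.978 when b is first rounded)
print(round(a + b * 5, 3))        # predicted sociability for anxiety = 5: 3.818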
The use of regression in prediction is a fraught issue, not because of the statistical
methods but because of the characteristics of the data used. Our predictions are based
on previously obtained data, such as the anxiety and sociability scores above. For future
predictions based on these data, we are assuming that the future sample is quite similar
to the sample from which our original data came.
EXAMPLE
A private school charges a one-off registration fee of $2000, plus a fee for every semester
of $1000. With this information the total cost of placing a student in the school for any
number of semesters can be predicted.
total cost = $2000 + $1000 (no. of semesters), i.e.

Y = a + bX

b is the slope that determines how much the Y variable will change when X is
increased by one unit (here, one semester costing $1000). a identifies the point where the
line intercepts the Y axis; in this case a = $2000, the registration fee, which is paid
however few semesters the student attends. This is the base to which $1000 for each
semester is added.
For example, if a student attends for ten semesters, we obtain (see Figure 18.7):

Y = $2000 + $1000(10) = $12 000
Answers on p. 603.
A FURTHER EXAMPLE
Imagine a researcher who wants to know whether adolescents working part-time who put
in extra hours tend to get on better in the organisation than others. The researcher finds
out the average amount of time a group of twenty new adolescent part-time employees
spend working extra hours. Several years later the researcher examines their hourly
earnings which are taken as a measure of promotion. Assume the regression equation
which is derived from an analysis of the raw data is Y = 7.50 + 0.50X. This line of best
fit is shown in Figure 18.8. The intercept a is 7.50, i.e. $7.50 per hour; the regression
coefficient (b) is 0.50, i.e. $0.50, meaning each extra hour worked per week has added
an extra $0.50 to the hourly wage in terms of promotion. We can therefore
calculate the likely income per hour of someone who puts in an extra seven hours per
week as follows:

Y = 7.50 + (0.50)(7) = 11.00, i.e. $11.00
FIGURE 18.7 Costs of private education (Y) plotted against number of semesters (X)
FIGURE 18.8 Hourly income (Y) plotted against extra hours worked each week (X), with intercept a = 7.50 and regression line Y = 7.50 + 0.50X
STQ 88
Two regression equations have been developed for different predictions based on
different samples of equal size. In one, r = 0.69, while in the other r = 0.58. In which
situation will the most accurate predictions be made?
Answer on p. 603.
1 Using the equation Y = −7 + 2X, determine the values of Y for the following values of
X: 3, 5, 7, 10.
(continued)
Answers on p. 603.
FIGURE 18.9
Because the correlation is not perfect, for any X score there is a range of scores which
may be obtained on the Y variable, the range increasing as the correlation decreases.
Thus in prediction it is important to obtain an estimate of the amount of variability in
the Y scores of persons who obtain the same X score. The variability of the Y distribution
reflects error, and it is this error that is measured by the standard error of the estimate,
which can be used as a measure of the accuracy of prediction.
The standard error of the estimate used to estimate the prediction error of Y is:

SE_est = SD_Y √(1 − r²)

As r increases the prediction error decreases.
Thus it is almost certain that the person’s score will actually fall in the range of 5.48
to 6.52, although the most likely value is 6.00.
Exactly the same applies to the other aspects of regression. If the slope is 2.00 with a
standard error of 0.10, then the 95 per cent confidence interval is 2.00 plus or minus
2 x 0.10, which gives a confidence interval of 1.80 to 2.20.
The use of confidence intervals is not as common as it ought to be, despite the fact that
they give us a realistic assessment of the precision of our estimates. Precise confidence
intervals can be obtained by multiplying the standard error by the value of t from Table
5.1 (chapter 5), using the df row corresponding to your number of pairs of scores minus
2, under the column for the 5 per cent significance level (i.e. if you have 10 pairs of
scores then you would multiply by 2.31).
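As a sketch of this calculation in Python (scipy assumed to be available), the exact t multiplier and interval for the slope example above would be:

from scipy import stats

slope, se, n_pairs = 2.00, 0.10, 10
t_crit = stats.t.ppf(0.975, df=n_pairs - 2)       # 2.306, the 2.31 quoted above
print(slope - t_crit * se, slope + t_crit * se)   # roughly 1.77 to 2.23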
Advice
• Drawing the scattergram (see use of SPSS in chapter 16) will helpfully illuminate the
trends in your data and strongly hint at the broad features of the regression
calculations. It will also provide a visual check on your computations.
• These regression procedures assume that the relationship is linear and that the
scatter of points is the same around the whole length of the line of best fit. Where the
amount of scatter around the line varies markedly at different points, the use of
regression is questionable. If it looks as if the regression line is curved or curvilinear,
do not apply these numerical methods.
How to proceed
1 Select Statistics and then Regression from the drop-down menu.
2 Choose Linear to open the Linear Regression dialogue box.
3 Click on the dependent variable and then the arrow button to place it in the
Dependent: box.
4 Select the independent variable and with the arrow button move it into the
Independent[s]: box.
5 Select Statistics to obtain the Linear Regression: Descriptives dialogue box.
6 Choose Descriptives and ensure Estimates and Model fit are also selected.
7 Next choose Continue and finally OK to produce the output.
b Correlations
                              EXP      AGE
Pearson correlation   EXP   1.000     .931
                      AGE    .931    1.000
Sig. (1-tailed)       EXP              .000
                      AGE    .000
N                     EXP      84       84
                      AGE      84       84

c Model summary
Model   Variables entered   R      R square   Adjusted R square   Std error of the estimate
1       AGE                 .931   .868       .868                3.3787

d ANOVA
Model          Sum of squares   df   Mean square   F         Sig.
1  Regression  6130.609          1   6130.609      537.029   .000
   Residual     936.094         82     11.416
   Total       7066.702         83

e Coefficients
                Unstandardised coefficients   Standardised coefficients                   95% confidence interval for B
                B          Std error          Beta                   t         Sig.       Lower bound   Upper bound
1  (Constant)   -17.945    1.513                                     -11.861   .000       -20.955       -14.935
   AGE             .830     .036              .931                    23.174   .000          .759          .902
a Dependent variable: EXP
Regression scatterplot
The production and inspection of the scattergram of your two variables is warranted
when doing regression, and including the scattergram in a report is also of benefit.
1 Select the Graphs option on the menu bar.
2 From the drop-down menu click on Scatter.
3 Since the Simple option is the default, select Define.
4 Move your dependent variable with the arrow button into the Y Axis: box.
5 Highlight your independent variable and transfer this to the X Axis: box.
6 Select OK and the scattergram will be displayed.
To draw the regression line on the displayed chart:
1 Double click on the chart to select it for editing and maximise the size of the chart
window.
2 In the Chart Dialogue box in the Output navigator, choose Chart and click on Options.
3 Select Total in the Fit Line box.
4 Select OK; the regression line is now displayed on the scatterplot.
FIGURE: Scatterplot of Exp (Y-axis) against Age (X-axis) with the fitted regression line
As hypothesised, there was a strong relationship between the two variables. The
correlation between the length of work experience and age was 0.931, p < .001.
Approximately 87 per cent of the variance of length of work experience was accounted
for by its linear relationship with age.
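The same regression can be run outside SPSS. Here is a hedged sketch using scipy.stats.linregress; the age and experience arrays are hypothetical stand-ins, since the 84 raw cases are not reproduced in the text:

from scipy import stats

age = [25, 30, 35, 40, 45, 50, 55]   # hypothetical predictor values
exp = [3, 7, 11, 15, 20, 24, 28]     # hypothetical years of work experience
result = stats.linregress(age, exp)
print(result.slope, result.intercept)   # b and a of the regression line
print(result.rvalue**2)                 # proportion of variance accounted for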
Multiple regression
It may be that you have more than one predictor variable that you wish to use. If so, you
will need multiple regression. However, multiple regression employs the same rationale,
and the formula is a logical extension of the one for linear regression:

Y = a + b1X1 + b2X2 + b3X3 + ... etc.

As an example, we may be interested in the sources and amount of stress that teachers
experience. This may be an effect of several variables in combination, such as class size,
the amount of administration they do, length of experience, etc. The Z-score multiple
regression prediction rule for this would be:

predicted Z_stress = (β1)(Z class size) + (β2)(Z admin hours) + (β3)(Z length of exp.)

Each of the betas is the standardised regression weight linking stress with that
variable. When working with raw scores, the standard formula is:

predicted Y = a + (b1)(X class size) + (b2)(X admin hours) + (b3)(X length of exp.)
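A minimal sketch of fitting such a raw-score multiple regression with NumPy least squares; all the data values here are hypothetical illustrations, not the book's:

import numpy as np

class_size  = np.array([20, 25, 30, 35, 28, 22])   # hypothetical predictors
admin_hours = np.array([5, 8, 12, 15, 10, 6])
experience  = np.array([10, 8, 3, 2, 5, 12])
stress      = np.array([30, 42, 55, 68, 50, 33])   # hypothetical criterion

# Design matrix with a leading column of ones for the intercept a
X = np.column_stack([np.ones(len(stress)), class_size, admin_hours, experience])
coeffs, *_ = np.linalg.lstsq(X, stress, rcond=None)
a, b1, b2, b3 = coeffs
print(a, b1, b2, b3)   # intercept and the three raw-score regression weights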
If two variables are correlated, knowledge of the score on one can be used to predict the score on
the other. The more scatter there is in a scatter diagram the less accurate the prediction, with
prediction improving as the correlation coefficient approaches +1 or −1. The use of the line of
best fit or regression line, which minimises the sum of the squared deviations, provides the best
possible prediction. The relationship between the two variables must be linear. The regression
line equation is Y = a + bX, where a is the intercept on the Y axis and b is the slope of the regression
line. This regression equation that defines the line of best fit will only provide a prediction or
estimate of likely Y values. The confidence interval round the prediction can be gauged using the
standard error of the estimate.
Hierarchical regression is used in research that is based on theory or some substantial previous
knowledge. Stepwise regression is useful in exploratory research where we do not know what to
expect or in applied research where we are looking for the best predictive formula without caring
about the theoretical meaning.
Analysis of variance (ANOVA) and t tests are two different ways of testing for mean
differences. ANOVA has the tremendous advantage that it can compare two or more
treatment conditions, whereas t tests are limited to two treatment conditions.
ANOVA is a hypothesis testing procedure used to determine if mean differences exist
for two or more samples or treatments. You are well aware by now that any samples
chosen from a population are likely to differ simply due to sampling error. They will have
slightly different means and standard deviations. The purpose of ANOVA is to decide
whether the differences between samples are simply due to chance (sampling error) or
whether there are systematic treatment effects that have caused scores in one group to
be different from scores in other groups.
For example, suppose a psychologist investigated learning performance under three
noise conditions: silence, background music and loud music. Three random samples of
subjects are selected. The null hypothesis states that there is no statistically significant
difference between the learning performance of the three groups. The alternative
hypothesis states that different noise conditions significantly affect learning, i.e. that at
least one group mean is significantly different from the other two. We have not provided
specific alternative hypotheses as there are a number of different possibilities as to which
group mean is different from the others, or whether all three are different from each
other, although the researcher may well have a good idea as to the outcome, from
intuition, commonsense or previous research literature, e.g. that performance is
significantly better in the silent condition than at the other levels of noise.
FIGURE 19.1 Are these means from different populations?
Population 1 Population 2 Population 3
(treatment 1) (treatment 2) (treatment 3)
M1 = 2          M2 = 4          M3 = 6
Why could a t test not be used in the study illustrated in Figure 19.1?
*Answer on p. 604.
The single difference between the numerator and the denominator is the variability caused
by the treatment effect. If the null hypothesis is true, the numerator and the denominator
are measuring the same thing because there is no treatment effect, and the F ratio then
equals approximately 1. If the null hypothesis is false, the treatment has some effect and
the F ratio will be greater than 1.
The structures of the t and F statistics are very similar. t compares the actual difference
between sample means with the difference expected by chance, as measured by the
standard error of the mean. In the same way, F compares differences between samples,
as measured by the variability between them, with the differences expected by chance,
as measured by the within group variability. Because the denominator of the F ratio
measures only uncontrolled and unexplained variability, it is called the error term or residual.
Because the numerator contains treatment effects as well as the same unsystematic variability
as the error term, any excess of the numerator over the denominator is due to the treatment.
When the treatment effect is negligible, the denominator or error term is measuring
approximately the same sources of variability as the numerator.
*Answers on p. 604.
(Note that these are three independent samples with n = 5 in each. The dependent variable is the number
of problems solved correctly.)
To compute the final F ratio, we need an SS and a df for each of the two variances.
Thus the process of analysing variability will occur in two parts. Firstly, we will compute
SS for the total experiment and analyse it into the two components, between and within.
Secondly, we will calculate the df for each component.
SS_total = ΣX² − (ΣX)²/N

To make this formula consistent with ANOVA terminology we substitute G, the grand
total of all the scores, for ΣX. We therefore obtain for our experiment:

SS_total = ΣX² − G²/N = 106 − 30²/15 = 106 − 60 = 46

Within treatments sum of squares, SS_within, is the sum of the SS within each treatment
group.
Between treatments sum of squares is the difference between SS_total and SS_within:

SS_between = 46 − 16 = 30

since both components of variability must add up to the total variability.
Should you wish to calculate it directly, the formula is:

SS_between = ΣT²/n − G²/N = 5²/5 + 20²/5 + 5²/5 − 30²/15 = 5 + 80 + 5 − 60 = 30
MS = SS/df

FIGURE 19.2 Structure and formulas for the independent measures ANOVA

Total: SS_total = ΣX² − G²/N, df = N − 1
Between treatments: SS_between = ΣT²/n − G²/N, df = k − 1
Within treatments: SS_within = Σ SS inside each treatment, df = N − k
For each component, MS = SS/df, and

F = MS_between treatments / MS_within treatments

For our experiment:

MS_between = 30/2 = 15     MS_within = 16/12 = 1.33

F = 15/1.33 = 11.28
The obtained value of 11.28 indicates that the numerator of the F ratio is substantially
bigger than the denominator. If you recall the conceptual structure of the F ratio you will
understand that this indicates a strong effect from the treatments and could not be expected
by chance. Degree of noise does appear to have a strong effect on learning performance. But
is it statistically significant? To determine this we shall have to see how F is interpreted.
Distribution of F ratios
We have seen that F is constructed so that the numerator and denominator are
measuring the same variance when the null hypothesis is true. F is expected to be around
1 in this situation. But how far does it have to be away before we can say there is a
significant effect from the treatment?
To answer this question we need to look at the distribution of F. Like t, there is a
whole family of F distributions which depend on the degrees of freedom. When graphed,
the distribution follows the typical shape shown in Figure 19.3. All values are positive because
variance is always positive. The values pile up around 1 and then taper off to the right.
Part of an F table is depicted below (see Table 19.1).
FIGURE 19.3 The distribution of F

TABLE 19.1 Extract of the F table. Source: from Table 18 of Biometrika Tables for Statisticians, vol. 1, eds E.S. Pearson and H.O. Hartley.
If you look at Table 19.1, with df = 2, 12, the tabled values are 3.88 (.05) and 6.93
(.01). In our experiment the obtained F was 11.28. This is well beyond the .01 level and
therefore we can be confident in rejecting the null hypothesis and concluding that degree
of noise does have a statistically significant effect on learning performance.
All parts of the ANOVA can be presented in the conventional summary table form:

Source of variance       SS    df    MS
Between treatments       30     2    15       F = 11.28
Within treatments        16    12     1.33
Total                    46    14
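A short Python check of this table (scipy assumed): the F ratio can be recomputed from the SS and df values, and the exact probability read from the F distribution:

from scipy import stats

ss_between, df_between = 30, 2
ss_within,  df_within  = 16, 12
F = (ss_between / df_between) / (ss_within / df_within)
print(F)                                     # 11.25 (11.28 when MS_within is rounded to 1.33)
print(stats.f.sf(F, df_between, df_within))  # about .002, well beyond the .01 level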
The Scheffe test
The Scheffe test is one of a number of post hoc tests that are used to determine where the
significant difference(s) lie after the null hypothesis has been rejected in ANOVA. Other
commonly used post hoc tests include Tukey’s HSD test and Bonferroni. In rejecting the
null hypothesis we are simply saying that there is at least one significant mean difference
and there may be more. But we do not know between which means the difference(s) lie.
The Scheffe test uses an F ratio to test for a significant difference between any two
treatment conditions. The numerator is the MS between treatments that is calculated
using only the two treatments you wish to compare. The denominator is the same MS
within treatments that was used for the overall ANOVA. The safety factor of the Scheffe
comes from the following:
• Although only two treatments are being compared, the Scheffe uses the value of k
from the original experiment to compute df between treatments. Thus, df for the
numerator of the F ratio is k − 1; and
• the critical value of the Scheffe F ratio is the same as was used to evaluate the F ratio
from the overall ANOVA.
Thus the Scheffe requires that every post-test satisfies the criteria used for the overall
ANOVA. The procedure is to start testing using the biggest mean difference and
continue testing until a non-significant difference is produced.
For the above example we will test treatment 2 (background music) against treatment
1 (silence).
SS_between = T1²/n + T2²/n − (T1 + T2)²/2n = 5²/5 + 20²/5 − 25²/10 = 5 + 80 − 62.5 = 22.5

df_between = k − 1 = 2

MS_between = 22.5/2 = 11.25

F = MS_between / MS_within = 11.25/1.33 = 8.46
With df = 2 and 12 and p < .05, the critical value in the F table is 3.88; therefore this
difference between silence and background music is significant. The difference between
background music and noise would also be significant, but the difference between silence
and noise is not likely to be significant.
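A sketch of this Scheffe comparison in Python (scipy assumed), working from the treatment totals T1 = 5 (silence) and T2 = 20 (background music) with n = 5 per group:

from scipy import stats

T1, T2, n = 5, 20, 5
ss_pair = T1**2 / n + T2**2 / n - (T1 + T2)**2 / (2 * n)   # 22.5
k, ms_within, df_within = 3, 16 / 12, 12                   # k from the original ANOVA
F_scheffe = (ss_pair / (k - 1)) / ms_within
print(F_scheffe)                             # about 8.44 (8.46 with MS_within rounded)
print(stats.f.ppf(0.95, k - 1, df_within))   # critical value, about 3.89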
With two treatment conditions, is a post hoc test necessary when the null hypothesis is
rejected?
Treatment 1    Treatment 2    Treatment 3
n = 10         n = 10         n = 10         N = 30
T = 10         T = 20         T = 30         G = 60
SS = 27        SS = 16        SS = 23        ΣX² = 206
*Answers on p. 604.
F = (treatment effect + experimental error) / experimental error
To obtain the error term we have to remove individual differences from the equation
we used in the independent measures approach. This is done by computing the variance
due to individual differences and subtracting it from the variance between treatments and
also from the variance within treatments.
EXAMPLE
Here is an example worked through.
A teacher wishes to test the effectiveness of a behaviour modification technique in
controlling the classroom behaviour of unruly children. Every time a child disrupts the
class, they lose a play period. The number of outbursts is monitored at various periods
to assess the effectiveness. The null hypothesis is that the behaviour modification
program will have no significant effect. The alternative hypothesis states that there will
be a significant difference without specifying between which time periods. We will use
the 5 per cent level of significance.
The data are as follows:
                Before       One week    One month    Six months
Subject         treatment    after       after        after          P
A               8            2           1            1              12
B               4            1           1            0               6
C               6            2           0            2              10
D               8            3           4            1              16
T               26           8           6            4              G = 44

SS_total = ΣX² − G²/N = 222 − 44²/16 = 222 − 121 = 101

SS_between = ΣT²/n − G²/N = 26²/4 + 8²/4 + 6²/4 + 4²/4 − 121
           = 169 + 16 + 9 + 4 − 121 = 77

SS_within = SS_total − SS_between = 101 − 77 = 24. This needs to be partitioned into
between subjects SS and error SS:

SS_between subjects = ΣP²/k − G²/N = 12²/4 + 6²/4 + 10²/4 + 16²/4 − 121
                    = 36 + 9 + 25 + 64 − 121 = 13

SS_error = SS_within − SS_between subjects = 24 − 13 = 11

df_error = df_within − df_between subjects = 12 − 3 = 9

MS_error = SS_error / df_error = 11/9 = 1.22

MS_between treatments = 77/3 = 25.67

F = MS_between treatments / MS_error = 25.67/1.22 = 21.04

Source of variance       SS    df    MS       F
Between treatments       77     3    25.67    21.04
Within treatments        24    12
  Between subjects       13     3
  Error                  11     9     1.22
Total                   101    15
If you consult the F table on p. 308, F(3, 9) tabled value is 3.86. Our value of 21.04
is therefore significant at p < .05, and the null hypothesis can be rejected. However, we
do not know between which treatments the significant differences lie and we would
have to apply the Scheffe test as we did earlier. The only change in the Scheffe test is to
substitute MS_error in place of MS_within in the formula and use df_error in place of df_within
when locating the critical value in the table.
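A NumPy sketch recomputing this repeated measures analysis directly from the data table above:

import numpy as np

# rows = subjects A to D, columns = before, one week, one month, six months
data = np.array([[8, 2, 1, 1],
                 [4, 1, 1, 0],
                 [6, 2, 0, 2],
                 [8, 3, 4, 1]], dtype=float)
n, k = data.shape                  # 4 subjects, 4 occasions
G, N = data.sum(), data.size

ss_total    = (data**2).sum() - G**2 / N                   # 101
ss_between  = (data.sum(axis=0)**2 / n).sum() - G**2 / N   # 77
ss_subjects = (data.sum(axis=1)**2 / k).sum() - G**2 / N   # 13
ss_error    = ss_total - ss_between - ss_subjects          # 11

F = (ss_between / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))
print(F)                                                   # about 21.0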
What sources contribute to within treatments and between treatments variability for a
repeated measures design?
Critical values of F at the .01 level (upper table) and the .05 level (lower table), for
combinations of numerator and denominator degrees of freedom.
Source: Abridged from Table 18 of Biometrika Tables for Statisticians, vol. 1, eds E.S. Pearson and H.O. Hartley.
FIGURE 19.4(a)-(d) Possible patterns of results in a two-factor study: mean scores for girls and boys under methods A and B, illustrating the various combinations of main effects and interaction
Since the between treatments variability is split between the two factors and the
interaction (see Figure 19.5) the two factor or two-way ANOVA has three distinct
hypotheses:
1 The main effect of factor A. The null hypothesis states that there are no statistically
significant mean differences between levels of factor A.
2 The main effect of factor B. There is a similar null hypothesis for factor B.
3 The A × B interaction. The null hypothesis states that the effect of one factor does
not depend on the levels of the other factor.

As with the one-way independent measures analysis, each F ratio has the conceptual form:

F = (treatment effect + individual differences + experimental error) / (individual differences + experimental error)
The general format for a two-factor experiment is depicted below. The example has
two levels of factor A and three levels of factor B. There could of course be any number
of levels for each factor. Each cell corresponds with a particular treatment condition. It
is possible to use either an independent measures or a repeated measures design in a two
factor experiment. The example to be covered is an independent design. Calculations can
become complicated and most ANOVAs of this sort will be performed on computer. It
is unlikely you would ever attempt one manually.
                   Factor B
                   B1 (classroom)   B2 (computer)   B3 (discussion)
                   1                7               3
                   6                7               1
A1 (female)        1                11              1
A1 = 60            1                4               6
                   1                6               4
                   A1B1 = 10        A1B2 = 35       A1B3 = 15
                   SS = 20          SS = 26         SS = 18
Factor A (gender)
                   0                0               0
                   3                0               2
A2 (male)          7                0               0
A2 = 30            5                5               0
                   5                0               3
                   A2B1 = 20        A2B2 = 5        A2B3 = 5
                   SS = 28          SS = 20         SS = 8

                   B1 = 30          B2 = 40         B3 = 20

N = 30     G = 90     ΣX² = 520

SS_total = ΣX² − G²/N = 520 − 90²/30 = 520 − 270 = 250

SS_between = Σ(AB)²/n − G²/N
           = 10²/5 + 35²/5 + 15²/5 + 20²/5 + 5²/5 + 5²/5 − 90²/30
           = 20 + 245 + 45 + 80 + 5 + 5 − 270 = 130
Now we need to partition the SS_between of 130 into components relating to factors A and
B, and interaction effects.
FIGURE 19.5 Partition of variability for the two-factor ANOVA

Total variability: SS_total = ΣX² − G²/N, df = N − 1
Between treatments (cells): SS = Σ(AB)²/n − G²/N, df = pq − 1
Within treatments: SS = Σ SS in each cell, df = N − pq
Factor A variability: SS_A = ΣA²/qn − G²/N, df = p − 1
Factor B variability: SS_B = ΣB²/pn − G²/N, df = q − 1
Interaction variability: SS_A×B found by subtraction, df = (p − 1)(q − 1)
MS is computed for both factors and the interaction; MS_within is used as the error term for all F ratios.
Sums of squares

SS_A = ΣA²/qn − G²/N = 60²/15 + 30²/15 − 90²/30 = 240 + 60 − 270 = 30

SS_B = ΣB²/pn − G²/N = 30²/10 + 40²/10 + 20²/10 − 90²/30 = 90 + 160 + 40 − 270 = 20

SS_A×B = SS_between − SS_A − SS_B = 130 − 30 − 20 = 80

SS_within = SS_total − SS_between = 250 − 130 = 120

Degrees of freedom

df_total = N − 1 = 30 − 1 = 29
df_between = pq − 1 = 6 − 1 = 5
df_A = p − 1 = 1
df_B = q − 1 = 2
df_A×B = (p − 1)(q − 1) = 2
df_within = N − pq = 30 − 6 = 24

The final step in the analysis is to compute the mean square values and the F ratios.
The MS is the sample variance and is SS divided by df. In our experiment we have:

MS_A = 30/1 = 30
MS_B = 20/2 = 10
MS_A×B = 80/2 = 40
MS_within = 120/24 = 5

F ratios

F_A = MS_A/MS_within = 30/5 = 6.00, with df = 1, 24; therefore F(1, 24) = 6.00
F_B = MS_B/MS_within = 10/5 = 2.00, with df = 2, 24; therefore F(2, 24) = 2.00
F_A×B = MS_A×B/MS_within = 40/5 = 8.00, with df = 2, 24; therefore F(2, 24) = 8.00
Source of variance        SS    df    MS    F
Between treatments       130     5
  Factor A (gender)       30     1    30    6.00
  Factor B (method)       20     2    10    2.00
  A × B interaction       80     2    40    8.00
Within treatments        120    24     5
Total                    250    29

A plot of the cell means shows the scores of the A1 (female) and A2 (male) groups diverging across the three levels of factor B (teaching methods).
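A NumPy sketch recomputing the two-factor analysis from the cell data above:

import numpy as np

# shape: (levels of A, levels of B, n per cell)
cells = np.array([[[1, 6, 1, 1, 1], [7, 7, 11, 4, 6], [3, 1, 1, 6, 4]],   # A1 female
                  [[0, 3, 7, 5, 5], [0, 0, 0, 5, 0], [0, 2, 0, 0, 3]]],   # A2 male
                 dtype=float)
p, q, n = cells.shape
G, N = cells.sum(), cells.size

ss_a      = (cells.sum(axis=(1, 2))**2 / (q * n)).sum() - G**2 / N   # 30
ss_b      = (cells.sum(axis=(0, 2))**2 / (p * n)).sum() - G**2 / N   # 20
ss_cells  = (cells.sum(axis=2)**2 / n).sum() - G**2 / N              # 130
ss_axb    = ss_cells - ss_a - ss_b                                   # 80
ss_within = (cells**2).sum() - (cells.sum(axis=2)**2 / n).sum()      # 120

ms_within = ss_within / (N - p * q)                                  # 5
print(ss_a / (p - 1) / ms_within)                 # F for factor A = 6.0
print(ss_b / (q - 1) / ms_within)                 # F for factor B = 2.0
print(ss_axb / ((p - 1) * (q - 1)) / ms_within)   # F for A x B = 8.0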
1 How does the F ratio for the repeated measures ANOVA differ from that of the one
for the independent measures?
2 A study is conducted to ascertain the effect of the level of anxiety of the learner and
the meaningfulness of the material on the speed of learning lists of words.
a Between which two variables might the interaction effect be shown?
b If a total of 60 subjects are equally distributed between the groups, what F values
are necessary for significance at the 5 per cent level?
3 In a two-way ANOVA, SS_between is divided into three parts. Which of the following
symbolises this?
a SS_between = SS_A×B + SS_W + SS_A
b SS_between = SS_A + SS_B + SS_W
c SS_between = SS_A×B + SS_A + SS_B
d SS_between = SS_A + SS_W + SS_B
*Answers on p. 604.
                   Method of teaching
                   Directed learning    Discovery learning
                   B1                   B2
                   10                    8
                   11                    9
English            12                   11
A1                 14                   13
                   16                   15
Home language
                   12                   15
                   14                   16
Thai               16                   18
A2                 17                   19
                   18                   23
            Factor B
            B1          B2          B3
            n = 10      n = 10      n = 10
            AB = 0      AB = 10     AB = 20
A1          mean = 0    mean = 1    mean = 2
            SS = 30     SS = 40     SS = 50
*Answers on p. 605.
For two-factor or multivariate ANOVA, the effect size is associated with Wilks'
lambda (Λ), and this is usually quoted in research findings.
In general, partial eta squared is interpreted as the proportion of variance of the dependent
variable that is related to a particular main or interaction source, excluding the other
main and interaction sources.
Power
Table 19.3 shows the approximate power at the .05 significance level for small,
medium and large effect sizes; sample sizes of 10, 20, 30, 40, 50 and 100 per group; and
three, four and five groups. These are the most common values of the various influences
on power.
More detailed tables are provided in Cohen (1988, pp. 289-354).

TABLE 19.3 Approximate power for studies using the analysis of variance testing
hypotheses at the .05 significance level

                                  Effect size
Participants per group (n)    Small (.10)    Medium (.25)    Large (.40)
For example, suppose you are planning a study involving four groups and you expect
a small effect size (and will use the .05 significance level). For 80 per cent power you
would need 274 participants in each group, a total of 1096 in all. However, suppose you
could adjust the research plan so that it was now reasonable to predict a large effect size
(perhaps by using more accurate measures and a more powerful experimental
manipulation). Now you would need only 18 in each of the four groups, for a total of 72.
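These planning figures can be approximated in Python; a hedged sketch, assuming statsmodels' FTestAnovaPower routine is available and that effect sizes are entered as Cohen's f:

from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
# Cohen's f effect sizes: small = .10, large = .40; four groups, alpha = .05
n_small = analysis.solve_power(effect_size=0.10, alpha=0.05, power=0.80, k_groups=4)
n_large = analysis.solve_power(effect_size=0.40, alpha=0.05, power=0.80, k_groups=4)
print(n_small, n_large)   # total N of roughly 1100 and roughly 70, respectively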
How to proceed
1 Select Statistics and then Compare Means from the drop-down menu.
2 Select One-Way ANOVA to open the One-Way ANOVA dialogue box.
TABLE 19.5 Example of one-way ANOVA output: post hoc multiple comparisons
(Tukey HSD, Scheffe and Bonferroni) between the individual counselling, group
counselling and control groups, giving the mean difference for each pair, its standard
error and significance, and the 95 per cent confidence interval (lower and upper bounds)
for each mean difference. The starred mean differences (7.3000 and 5.2000) are
significant at the .05 level.
TABLE 19.6 Example of repeated measures ANOVA output

a Descriptive statistics
(means, standard deviations and N for each weighing occasion)

b Multivariate tests
                          Value    F        Hypothesis df   Error df   Sig.   Eta squared   Noncent.   Observed power
TIME  Pillai's trace       .937    29.862   2.000           4.000      .004   .937          59.724     .994
      Wilks' lambda        .063    29.862   2.000           4.000      .004   .937          59.724     .994
      Hotelling's trace  14.931    29.862   2.000           4.000      .004   .937          59.724     .994
      Roy's largest root 14.931    29.862   2.000           4.000      .004   .937          59.724     .994

c Paired comparisons
                                        Mean       Std        Std error   95% confidence            t        df   Sig.
                                                   deviation  mean        interval
Pair 1  end weight − weight
        at halfway point               -9.8333     5.0761     2.0723      -15.1604 to -4.5063      -4.745    5    .005
Pair 2  end weight − start
        weight                        -19.1667     5.4559     2.2274      -24.8923 to -13.4411     -8.605    5    .000
Pair 3  weight at halfway
        point − start weight           -9.3333     4.5898     1.8738      -14.1501 to -4.5166      -4.981    5    .004
How to proceed
1 Select Statistics, then click on General Linear Model.
2 Choose GLM-General Factorial.
                              Learning method    Mean       Std deviation   N
Score on course test  Male    Computerised       20.8333    3.1885           6
                              Class teaching     13.6667    1.9664           6
                              Total              17.2500    4.5151          12
                      Female  Computerised       13.8333    1.7224           6
                              Class teaching     18.5000    1.8708           6
                              Total              16.1667    2.9797          12
                      Total   Computerised       17.3333    4.3970          12
                              Class teaching     16.0833    3.1176          12
                              Total              16.7083    3.7819          24
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: Intercept + GENDER + METHOD + GENDER*METHOD
Total 328.953 23
FIGURE: Estimated marginal means of score on course test for male and female students under the computerised and class teaching methods (the lines cross, showing the interaction)
significant differences lie. However, there is a problem associated with multiple pairwise
testing, or family-wise testing as it is sometimes called.
H = [12 / (N(N + 1))] Σ(R²/n) − 3(N + 1)

where
n = number of cases in each condition
R = sum of ranks in each condition
N = total number of cases
H is distributed as chi square with df = number of conditions − 1. If the observed value
of H is equal to, or larger than, the tabled value of chi square for the previously set level
of significance and the correct df, then the null hypothesis may be rejected. With fewer than
five cases in each group the chi square distribution is not as accurate. There is a correction
for tied scores but the effect of this correction is negligible and it is usually not applied.
Here is an example based on achievement motivation scores of three groups of educators.
Teaching-oriented teachers    Administration-oriented teachers    Administrators
Score    Rank                 Score    Rank                       Score    Rank
96       4                    82       2                          115      7
128      9                    124      8                          149      14
83       3                    132      10                         166      15
61       1                    135      12                         147      13
101      5                    109      6                          134      11
R1 = 22                       R2 = 38                             R3 = 60

H = [12 / (15(15 + 1))] (22²/5 + 38²/5 + 60²/5) − 3(15 + 1)
  = 0.05 (96.8 + 288.8 + 720.0) − 48 = 55.28 − 48 = 7.28

With df = 2, the tabled value of chi square at the .05 level is 5.99, so H is significant
and the null hypothesis can be rejected.
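A check of this worked example with scipy's built-in Kruskal-Wallis routine:

from scipy import stats

teaching = [96, 128, 83, 61, 101]
admin    = [82, 124, 132, 135, 109]
admins   = [115, 149, 166, 147, 134]
H, p = stats.kruskal(teaching, admin, admins)
print(H, p)   # H = 7.28, p of about .026, significant at the .05 level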
Analyse these data using the Kruskal-Wallis test. Do you reject or retain the null hypothesis
at p < 0.05?
Condition 1 Condition 2 Condition 3
Scores Scores Scores
10 15 21
16 14 28
8 12 13
1.9 11 16
rg Ew) +9
*Answer on p. 605.
An effect size for this test can be estimated as eta² = chi square / (N − 1), where N is the total number of cases.
We must first rank the scores in each row. The table then becomes:

           I    II   III   IV
Group A    4    2    1     3
Group B    3    2    1     4
Group C    4    1    2     3
R         11    5    4    10
If the null hypothesis were true, there would be no significant difference between the
sum of ranks for each condition. The distribution of ranks in each column would be on
a chance basis. We would expect the ranks 1, 2, 3 and 4 to occur in all columns in about
equal frequency, with rank totals about equal. If ranks depend on conditions then totals
would vary from column to column.
The calculation for the above data (where N is the number of sets of ranks and k the
number of conditions) is:

χ²r = [12 / (Nk(k + 1))] ΣR² − 3N(k + 1)
    = [12 / ((3)(4)(4 + 1))] (11² + 5² + 4² + 10²) − (3)(3)(4 + 1)
    = (0.2)(262) − 45 = 52.4 − 45 = 7.4

The tabled value with df = 3 and p set at 0.05 is 7.82. We must therefore retain the
null hypothesis.
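A sketch of the same test with scipy's Friedman routine, fed with the ranked data above (each argument is one condition across the three groups; scipy re-ranks within each row, so the result matches the hand calculation):

from scipy import stats

cond1, cond2, cond3, cond4 = [4, 3, 4], [2, 2, 1], [1, 1, 2], [3, 4, 3]
chi2_r, p = stats.friedmanchisquare(cond1, cond2, cond3, cond4)
print(chi2_r, p)   # 7.4 and p of about .06, so the null hypothesis is retained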
Analyse the following data (raw scores) using Friedman’s test and make a statement
about the null hypothesis.
Conditions
i Il iI IV
Group A 20 13 6
Group B 18 15 8
Group C At lve 11
Group D Ai 19 3 14
Group E 16 10 12 4
*Answers on p. 605.
Match the test on the left with the definitions on the right. (There is one test with no
definition.)
a Mann-Whitney U test    1 Used following a significant F ratio.
b Scheffe test           2 Used to determine whether a set of ranks on a
                           single factor differ significantly.
c Wilcoxon               3 Used to determine whether mean scores on two
                           or more factors differ significantly.
d Chi square             4 A non-parametric test used to determine whether
                           two uncorrelated means differ significantly.
Steps
1 Describe the statistical test(s), the variables, and the purpose of the statistical test(s).
For example: 'A one-way analysis of variance was conducted to evaluate the
relationship between multi-vitamin treatment and the change in the number of days
absent over a year.'
• Describe the factor or factors. If a factor is a within-subjects factor, be sure to label
it as such; otherwise the reader may assume that it is a between-subjects factor. If
a multifactorial design has one or more within-subjects factors, describe each factor
as a between-subjects or a within-subjects factor.
• Indicate the number of levels for each factor. It may also be informative to the
reader to have a description of each level if the levels are different treatments.
However, it is not necessary to report the number of levels and what the levels are
for factors with obvious levels such as gender.
• Describe what the dependent variable(s) are.
2 Report the results of the overall test(s).
• Describe any decisions about which test was chosen based on assumptions.
• Report the F value and significance level (e.g. F(2, 27) = 4.84, p = .016). For
p-values of .000, quote p < .001. For multifactor designs, report the statistics for
each of the main and interaction effects. Tell the reader whether the test(s) are
significant or not.
• Report statistics that allow the reader to make a judgment about the magnitude of
the effect for each overall test (e.g. eta² = .45).
3 Report the descriptive statistics, usually by reference to a table or figure that presents
the means and standard deviations.
4 Describe and summarise the general conclusions of the analysis. An example: 'The
results of the one-way ANOVA supported the hypothesis that different types of stress
management treatment had a differential effect on the reduction of absences for
individual teachers.'
5 Report the results of the follow-up tests:
• Describe the procedures used to conduct the follow-up tests.
• Describe the method used to control for Type I error across the multiple tests.
• Summarise the results by presenting the results of the significance tests among
pairwise comparisons with a table of means and standard deviations.
• Describe and summarise the general conclusions of the follow-up analyses. Make
sure to include in your description the directionality of the test.
6 Report the distributions of the dependent variable for levels of the factor(s) in a graph
if space is available.
Reference
Cohen, J. (1988), Statistical Power Analysis for the Behavioural Sciences, Lawrence Erlbaum, New York.
When data are obtained from a data gathering instrument or technique, we need to
know what faith we can put in the data as truly indicating the person’s performance or
behaviour. With all data we must ask:
Reliability
Synonyms for reliability are dependability, stability, consistency, predictability, accuracy.
A reliable person, for instance, is one whose behaviour is consistent, stable, dependable,
and predictable—what he or she will do tomorrow and next week will be consistent
with what he or she does today, and what he or she did last week. An unreliable person,
on the other hand, is one whose behaviour is much more variable, often unpredictably
variable. Sometimes he or she does this, sometimes that.
Psychological and social science measurements are similar to humans; they are more
or less variable from occasion to occasion. They are stable and relatively predictable, or
they are unstable and relatively unpredictable; they are consistent or not consistent.
If they are reliable, we can depend on them. If they are unreliable, we cannot depend
on them.
It is possible to approach the definition of reliability in three ways:
1 One approach asks the question, ‘Is the score which I have just obtained for student
A the same score I would obtain if I tested him or her tomorrow and the next day and
the next day?’.
This concept of reliability considers whether the obtained score is a stable indication of
the student’s performance on this particular test. This question implies a definition of
reliability in terms of stability, dependability and predictability. It is the definition most
often given in discussions of reliability.
2 A second approach asks, 'Is this test score which I have just obtained on student A an
accurate indication of his or her "true" ability?'.
This question really asks whether measurements are accurate. Compared to the
first definition, it is further removed from common sense and intuition, but it is also
more fundamental. These two approaches or definitions can be summarised in the
words stability and accuracy.
3 The third approach, which helps in the understanding of the theory of reliability,
also implies the previous two approaches. This approach asks how much error there
is in the measuring instrument.
The two sources of error are:
1 experimental variability induced by real differences between individuals in the ability
to perform on the test;
2 error variability, which is a combination of error from two other sources:
a random fluctuation: subtle variations in individual performance from day to day;
and
b systematic or constant error: the result of one or more confounding variables
which always push scores up or always push scores down, e.g. a practice effect.
The amount of error in a score is a measure of the unreliability of the score. The less
error, the more reliable the score, since the score then represents more closely the ‘true’
performance level of the subject.
Errors of measurement are assumed to all be random error. They are the sum or
product of a number of causes: the ordinary random or chance elements present in all
measures due to unknown causes, temporary fatigue, fortuitous conditions at a particular
time that temporarily affect the object measured or the measuring instrument,
fluctuations of memory or mood, and other factors that are temporary and shifting.
Reliability can be defined as the relative absence of errors of measurement in a measuring
instrument. Error and reliability are opposite sides of the same coin. The more error, the
less stable, and less accurate the measurement. Our three approaches above recognise that
reliability is the accuracy, stability and relative lack of error in a measuring instrument.
A homely example of reliability will further clarify the concept for those still in the
fog. Suppose we wished to compare the accuracy of two crossbows.
One is an antique, well used and a bit sloppy in its action. The other is a modern
weapon made by an expert. Both pieces are solidly fixed in granite bases and aimed and
zeroed in by an expert toxophilite. Equal numbers of darts are fired with each.
We can define the reliability of any set of measurements as the proportion of their
variance which is true variance; in other words, the ratio of true variance to observed
variance. When the true variance is equal to the observed variance, i.e. when there is no
error variance, this ratio has a value of +1.0. This is the 'perfect reliability' value:

r_tt = σ²_true / σ²_obs = +1.0     (r_tt symbolises reliability)

When there is no true variance present, i.e. when the observed variance is entirely
error, the ratio has a value of zero. This is the 'nil reliability' value:

r_tt = σ²_true / σ²_obs = 0

Since the observed variance is the sum of the true and error variances,

σ²_obs = σ²_true + σ²_error

or conversely,

σ²_true = σ²_obs − σ²_error

Hence, reliability equals 1 minus the error variance divided by the obtained variance:

r_tt = 1 − σ²_error / σ²_obs     Definition of reliability
Explain the meaning of X_obs = X_true + X_error. How does this expression help us to
understand the concept of reliability?
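A small simulation sketch (NumPy assumed; all numbers here are illustrative) shows how X_obs = X_true + X_error makes reliability the ratio of true to observed variance:

import numpy as np

rng = np.random.default_rng(0)
true  = rng.normal(100, 15, size=100_000)   # stable 'true' scores
error = rng.normal(0, 5, size=100_000)      # independent random measurement error
obs   = true + error                        # what the test actually records

# var(obs) is approximately var(true) + var(error)
reliability = true.var() / obs.var()
print(reliability)   # close to 15**2 / (15**2 + 5**2) = 0.9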
1 Why does a two- to three-month lapse of time between test and retest seem the
optimum?
2 What would you see as the problems stemming from:
a too short a time lapse?
b too long a time lapse?
If the period is too short, the subjects may remember the answers they gave on the first
occasion, and so spuriously increase the consistency of scores. On the other hand,
boredom, decreased motivation, etc. may influence the second testing, thus reducing the
congruence of results between the two occasions of testing. If the period is too long,
maturational factors—for example, learning experience, age—will influence changes of
score on the second occasion and cause an underestimation of the reliability. It is difficult
to state any general rule concerning an appropriate period of intervening time for all tests.
If the test is designed to measure a relatively stable trait or characteristic and the
individual is not subjected during the intervening time to experiences which may, in
some way, affect the particular characteristic involved, then the intervening period can
be relatively long. However, when measuring a trait which is influenced by the
individual's intervening experiences, the time should be shorter, but not short enough
to allow memory or practice effect to inflate artificially the relationship between
performance on the two administrations. Thus an appropriate period of intervening
time can only be decided on in the context of the situation.
For each of the following, indicate whether they would contribute to true score or error
score variance when a test-retest method is employed.
1 Noise in the next room on the first test.
Since the rationale of the test-retest method implies that the same level of cognitive,
intellectual, motivational and personality variables are demonstrated on each occasion,
so that any changes are due to the instability of the test itself, changes which occur
within the subjects during the interval between test and retest are the largest source of
error in the test-retest reliability estimate. However, the subject must always be a
‘different’ person for the sole fact of having taken the test on the first occasion.
If the correlation between the scores from two occasions of testing was +1.0, what does
this imply for reliability and error?
If you recall your studies of the correlation coefficient, you should realise that r= +1.0
would imply perfect reliability with no measurement error evident.
STQ 108
If reliability is less than +1.0 what does this imply?
As the test-retest correlation declines from unity, we are measuring the effects of
increasing random error.
Split-half method
Many of the problems which influence the test-retest and alternate forms methods of
estimating reliability could be eliminated if a reliability coefficient could be determined
from a single administration of a test. Two scores can be obtained simply by splitting the
test into halves. The scores obtained on one half of the test can be correlated with scores on
the other half of the test.
Can you think why splitting a test into a first half and a second half would not be a good
way of splitting the test for a split-half reliability estimate?
1 different types of items with different difficulty levels may occur in each half;
2 some subjects may run out of time and not finish the second half; and
3 boredom may set in on the second half.
Hence, a commonly accepted way of splitting a test into two halves is to divide it into
odd-numbered items and even-numbered items. If the test is constructed so that adjacent
items tend to be similar in difficulty level, discriminatory power and content, this is not
an unreasonable procedure. However, one might ask what difference in the computed
coefficient there would have been if the test had been split in some other way. A test of
twenty items, for example, may be divided into two equal parts in exactly 184 756
different ways. Unless the twenty items are exactly equivalent there would probably be
some variation in the reliability estimates obtained by splitting the test in so many
different ways. In spite of this difficulty, the split-half reliability coefficient based on the
odd—even split still provides the test constructor with useful information.
A further problem exists. When a test is divided into two parts and the scores are
correlated, the result is a correlation between scores on tests that have only one-half as
many items as were originally administered. The Spearman-Brown formula is used to
estimate the reliability of the test at its original length. The general formula for the
reliability of a test n times as long as the given test is:

r_nn = n r / [1 + (n − 1) r]

With the obtained half-test correlation of 0.50:

r_nn = 2(0.50) / [1 + (2 − 1)(0.50)] = 1.0 / 1.5 = 0.67

The value 2 has been substituted for n because it was necessary to determine the
reliability of the test twice as long as the two 10-item halves used to obtain the original
reliability coefficient. The formula indicates that the split-half reliability of the 20-item
test is 0.67.
It is important to remember that any time the split-half method of reliability estimate
is utilised, the Spearman-Brown formula must be applied to the correlation coefficient
in order to obtain a reliability estimate which is appropriate for the total test length.
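The correction is easy to express as a small Python function; a sketch:

def spearman_brown(r_half: float, n: float = 2.0) -> float:
    """Estimate the reliability of a test n times as long as the one
    that produced the half-test correlation r_half."""
    return (n * r_half) / (1 + (n - 1) * r_half)

print(round(spearman_brown(0.50), 2))   # 0.67, as in the worked example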
The split-half reliability of a 50-item test is 0.79. What is the reliability of the whole test?
Review of methods
Other things being equal, a researcher will choose the test with the highest quoted
reliability. A test with a reliability of +0.90 is surely better than one with a reliability of
+0.80!
However, the quoted values need interpretation, in terms of the method chosen to
estimate reliability data, in terms of the situation in which the particular data were
gained, and in terms of the sources of error they control (or don’t control).
An interesting way of differentiating between these kinds of reliabilities is to note
some of the causes of error variance in observed scores.
Complete Table 20.1, indicating where the appropriate error variance will be present.
The ‘Parallel form’ column has been completed for you. There is a completed Table 20.1
on p. 348.
Clearly, we cannot always be certain that no error variance will be present. For
example, if an immediate test-retest method is used, it is possible that error variance
associated with the children will still be present, since they may be a little more tired on
the retest. But it is unlikely to be as great as error variance associated with the children
if a delayed test-retest method is used, with a long time interval between retests. And the
longer the interval, the lower will be the reliability figures obtained. So the important
point is not the exact specification of the source of error variance. The point is that a
reliability quoted by a test constructor must be interpreted according to the method
used to calculate it.
TABLE 20.1
The three basic methods for computing a reliability coefficient that have been
presented often lead to differing estimates of the reliability of the test scores. The various
conditions which can influence the outcome of the computation of a reliability
coefficient further emphasise the notion that no assessment or technique has a single
reliability coefficient.
Length of test
One of the major factors which will affect the reliability of any test is the length of that
test. On an intuitive basis, one can see that as the number of items in any particular test
is increased, the chance factors which might enter into a score are greatly reduced or
balanced out. For example, performance on a three-item multiple-choice test could be
greatly influenced by various chance factors which might influence student responses.
Objectivity of assessment
Another major factor affecting reliability of measurement is subjectivity of judgement.
We tend to get low reliability coefficients for rating scales, essay examinations,
interviewing, projective tests, etc. Objective tests produce high reliabilities because the
marker does not have to decide how good an answer is; it is either right or wrong.
SE_meas = σ √(1 − r_tt)

where σ is the standard deviation of the test scores and r_tt is the reliability coefficient of
the test.
The interpretation of the standard error of measurement is similar to that of the SD
of a set of test scores. There is a probability of 0.68 that a student's obtained score does
not deviate from his or her true score by more than plus or minus one standard error of
measurement. The other probabilities are 0.95 for plus or minus two standard errors of
measurement and 0.997 for plus or minus three.
This interpretation of the SE_meas assumes that errors in measurement are equally
distributed throughout the range of test scores. If a subject obtains a score of 114 on a
standardised intelligence test which has M = 100, σ = 15 and r_tt = 0.96, what are the IQ
limits within which 95 per cent of the scores from an infinite number of testings would
lie? Here we need to know the SE_meas, since 2 × SE_meas will define the 95 per cent limits.
Substituting, we obtain:
SE_meas = 15√(1 − 0.96) = 15√0.04 = 15(0.2) = 3
That is, the IQ scores the student would obtain from an infinite number of testings
would have a σ = 3. The 95 per cent limits are therefore 2 × 3 IQ points above and below
the only real score we know he or she has. So the range of error using the 95 per cent
confidence limits is 114 ± 6, i.e. 108 to 120. This range is quite revealing, for it shows
that even with a highly reliable test there is still quite a considerable band of error round
an obtained score.
If the reliability is lower this range increases. For example, with an r_tt of 0.75 and σ = 15,
the 95 per cent limits are ±15 marks on each side of the obtained score. The limiting cases
are when r_tt = 0 or r_tt = 1. In the former case SE_meas is the same as the σ of the scores; in
the latter case SE_meas is zero, for there is no error.
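A sketch of this calculation as a small Python function, reproducing the IQ example:

import math

def se_meas(sd: float, r_tt: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - r_tt)

se = se_meas(15, 0.96)                                # 3.0
score = 114
print(round(score - 2 * se, 1), round(score + 2 * se, 1))   # 108.0 and 120.0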
1 Calculate the 95 per cent and 99 per cent score ranges for the following:
a raw score 114, σ = 10, r_tt = 0.91
b raw score 62, σ = 5, r_tt = 0.84
re) 10 25 10 20 25
4 Describe the usefulness in reporting the standard error of measurement as well as the
reliability for a test.
At step 2, if you wish to determine the alpha of a subscale (subset of the whole test)
then only transfer those items you require.
Inspection of the output identifies the items with the lowest item correlations with the
scale as a whole. Suppose four items, including variables 4, 5 and 12, have the lowest
correlations: these can be removed and the reliability assessed again on the remaining
twelve-item scale. A useful way of determining the structure of a test or scale in terms
of its subgroupings of items (factors) is to submit it to factor analysis.
How to report
'Internal reliability of a sixteen-item scale was assessed using the Cronbach alpha technique.
The scale produced an alpha of .7621. Inspection of the table suggested that four items
should be eliminated because of their low correlation with the test as a whole. A further
reliability test then produced an alpha of .8176, which is acceptable for an attitude scale.'

Reliability coefficients
Number of cases = 22.0
Number of items = 12
Alpha = .8176
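Alpha itself is straightforward to compute outside SPSS; a sketch with NumPy, where the data matrix is a random placeholder rather than the scale discussed above:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with one row per case and one column per item."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
base = rng.normal(size=(22, 1))                     # shared 'true' component
data = base + rng.normal(size=(22, 12))             # 22 cases, 12 items
print(cronbach_alpha(data))                         # roughly .9 for this simulation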
Validity
The subject of validity is complex, controversial, and peculiarly important in research.
Here perhaps more than anywhere else, the nature of reality is questioned. It is possible
to study reliability without inquiring into the meaning of the variables. It is not possible
to study validity, however, without sooner or later inquiring into the nature and meaning
of one's variables.
A measurement or assessment technique which is perfectly reliable would seem to be
quite valuable, but the test user should also raise the questions: ‘How valid is it? Does
the test measure what I want it to measure?’. A perfectly reliable test may not measure
anything of value, and it may not correlate with any other test score. Validity information
gives some indication of how well a test measures a given area, under certain
circumstances and with a given group. It is for this reason that any one test may have
many types of validity, and a unique validity for each circumstance and group tested.
Suppose we unknowingly buy a ruler which is marked as 12 inches long but is in fact
13 inches long. Is this ruler:
1 reliable?
2 valid?
Surprising as it may seem, the ruler is quite reliable, for it produces consistent results,
even though the measurements are not what we think they are. Every time we draw
what we assume to be a line 12 inches long, we produce a line 13 inches long
consistently. The ruler produces a reliable measurement but it is not a valid measure of
12 inches. It is a valid measure of 13 inches, but since we bought it on the presumption
that it measured 12 inches it cannot be measuring what it purports to measure. Similarly,
test instruments and techniques can be very reliable, producing consistent results from
occasion to occasion, but may not be valid as measures of what they set out to measure.
On the other hand, if an instrument is unreliable it cannot be valid.
So validity relates to the question, “What does the test measure?’. The importance of
this question would seem obvious; yet research literature contains many examples of
tests/techniques being used without proper consideration of their validity for the user’s
purpose. This lack of attention to validity may seem somewhat surprising, since most
tests are clearly labelled with a title intended to indicate quite specifically what is
measured. But one of the first steps in evaluating a new test is to disregard the title,
which may represent only what the test author had hoped to measure. A test of reading
comprehension may, in fact, measure only a general intelligence factor. A test of
achievement may be an equally good measure of general test-taking ability, particularly
if the items are poorly constructed. The possibility that a scale of neuroticism might
measure something else, such as a response set, or the ability to see through the test and
give a favourable impression, was seldom considered. It is important, therefore, that the
researcher be able to judge whether a test is valid for its purposes.
Types of validity
Five types of validity can be distinguished: predictive, concurrent, content, construct and
face. Each of these will be examined briefly, though we put the greatest emphasis on
construct validity, since it is probably the most important form of validity from the
research point of view.
Content validity
Content validity is most appropriately considered in connection with achievement testing. An achievement test has content validity if it represents faithfully the objectives of a given instructional sequence and reflects the emphasis accorded these objectives as the instruction was carried out.
Predictive validity
Predictive validity involves the wish to predict, by means of assessment or technique,
performance on some other criterion. An example of such a situation is the use of Year 12
matriculation for tertiary study. The correlation between performance on the Year 12 exams
and final degree results is a measure of the predictive validity of the Year 12 exams. Predictive
validity is vitally important for vocational selection research techniques, because a person
responsible for the selection of those likely to succeed in a given job, college or curriculum is
concerned with test scores as aids in doing a better job of selection.
Primary teachers use reading readiness and intelligence tests as predictors when they
use them as aids in grouping children. Predictive validity cannot be judged by an
examination of a test’s content. It can only be assessed by comparing a later performance
(perhaps several years later) with the original test scores. This later performance, which
is to be predicted, is often termed the criterion performance and may be a rather crude
one such as successfully completing an apprenticeship, achieving an acceptable shorthand
rate, making at least a C average on a course, or developing a particular neurotic or
psychiatric symptom. We rarely require the prediction of a precise score.
It is usually possible to express predictive validity in terms of the correlation coefficient between the predicted status and the criterion. Such a coefficient is called a validity coefficient.
Concurrent validity
Concurrent and predictive validity are very much alike. They differ only in the time
dimension. For example, if we developed a neuroticism scale, we would require an
answer to the question, ‘Will a high scorer on this test become a neurotic at some time
in the future?’ for predictive validity, but an answer to the question, ‘Is the high scorer
a neurotic now?’ for concurrent validity. Predictive and concurrent validity are both
characterised by prediction to an outside criterion and by checking a measuring
instrument, either now or in the future, against some outcome. A test predicts a certain
kind of outcome, or it predicts some present or future state of affairs. In a sense then,
all tests are predictive. Aptitude tests predict future achievement; achievement tests
predict present and future achievement; and intelligence tests predict the present and
future ability to learn and to solve problems.
Face validity
In certain circumstances, one may be concerned with the question, ‘Does the test appear, from
examination of the items, to measure what one wishes to measure?’, or with ‘Does the test
appear to test what the name of the test implies?’.
This is usually the concern of the lay person who knows little or nothing about
measurement, validity or reliability. Researchers often require ‘high face validity’ for
tests or techniques which they use in research programs for industry, the military and
schools. However, it is difficult, if not impossible, to measure a validity of this type.
The high face validity will, hopefully, motivate the subjects to tackle the test in a
businesslike way. If the naive subjects looked at a test and started thinking that the items
were ridiculous and seemed (to them) unrelated to the aim of the tests, then motivation
would be considerably reduced. Up to a few years ago, we often chuckled when we read
abusive letters in national newspapers from parents asking how on earth some particular
(quoted) question could ever measure IQ. Obviously face validity had failed in instances
like this, yet the items probably had high construct, predictive and concurrent validity.
1 In what research situations might you find it desirable to have low face validity in an
assessment instrument?
2 Which one of the following properties of a test would be taken as evidence of its
reliability and not its validity?
a The scores obtained by children on two successive administrations of the test
were correlated 0.95.
b The even-numbered questions on the test were found to yield a substantially
higher mean score than the odd-numbered questions.
c Scores on the test correlated highly with scores on another test designed to
measure the same ability.
d Scores on the test at the beginning of the school year predicted scores on the
final examination.
Maturation
Between any two observations, subjects change in a variety of ways. Such changes can produce differences that are independent of the experimental treatments. The problem becomes more acute the longer the interval between observations.
Statistical regression
Like maturation effects, regression effects increase systematically with the time interval
between pre-tests and post-tests. Statistical regression occurs in educational (and other)
research due to the unreliability of measuring instruments and to extraneous factors
unique to each experimental group. Regression means, simply, that subjects scoring
highest on a pre-test are likely to score relatively lower on a post-test; conversely, those
scoring lowest on a pre-test are likely to score relatively higher on a post-test. In a word,
in pre-test/post-test situations, there is regression to the mean. Regression effects can lead
the educational researcher mistakenly to attribute post-test gains and losses to low scoring
and high scoring respectively.
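A small simulation makes the mechanism concrete. All the numbers here (true-score mean 50, the error spreads, 1,000 cases) are illustrative assumptions, not values from the text:

import numpy as np

rng = np.random.default_rng(1)
true = rng.normal(50, 10, 1000)        # stable underlying ability
pre = true + rng.normal(0, 5, 1000)    # pre-test = true score + error
post = true + rng.normal(0, 5, 1000)   # post-test = same true score, fresh error

top = pre >= np.quantile(pre, 0.9)     # select the highest pre-test scorers
print(pre[top].mean())                 # roughly 70
print(post[top].mean())                # noticeably lower, roughly 66: regression
                                       # to the mean with no treatment at all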
Testing
Pretests at the beginning of experiments can produce effects other than those due to the
experimental treatments. Such effects can include sensitising subjects to the true
purposes of the experiment and practice effects which produce higher scores on post-
test measures.
Instrumentation
Unreliable tests or instruments can introduce serious errors into experiments. With
human observers or judges, error can result from changes in their skills and levels of
concentration over the course of the experiment.
Selection bias
Bias may be introduced as a result of differences in the selection of subjects for the
comparison groups or when intact classes are employed as experimental or control
groups. Selection bias, moreover, may interact with other factors (history, maturation,
etc.) to cloud even further the effects of the comparative treatments.
Dropout
The loss of subjects through dropout often occurs in long-running experiments and
may result in confounding the effects of the experimental variables, for whereas
initially the groups may have been randomly selected, the residue that stays the
course is likely to be different from the unbiased sample that began it.
Hawthorne effect
Medical research has long recognised the psychological effects that arise out of mere
participation in drug experiments, and placebos and double-blind designs are commonly
employed to counteract the biasing effects of participation. Similarly, so-called Hawthorne
effects threaten to contaminate experimental treatments in educational research when subjects
realise their role as guinea pigs.
What is meta-analysis?
Each strand of a rope contributes to the strength of that rope. But the rope is stronger than
any individual strand. Similarly, when a particular finding is obtained again and again
under a variety of conditions, we are strongly confident that there exists a general principle
of behaviour. The results of individual studies, no matter how well conducted, are unlikely
to be sufficient to provide us with confident answers to questions of general importance.
Meta-analysis is a quantitative tool for comparing or combining results across a set of
similar studies. In the individual study the unit of analysis is the responses of individual
subjects. In meta-analysis the unit of analysis is the results of individual studies.
The term ‘meta-analysis’ means an ‘analysis of analyses’. Many studies are replicated
in various degrees using, for example, differently sized samples, different age ranges, and
are conducted in different countries under different environmental conditions.
Sometimes results appear to be reasonably consistent; others less so. Meta-analysis
enables a rigorous comparison to be made rather than a subjective ‘eyeballing’.
In chapter 3 you were introduced to the traditional literature review, a strategy in
which you read research relevant to the topic you wish to investigate further, then
summarise the findings and integrate the existing knowledge. From this you may
conclude that a particular variable is of crucial importance, or that the relationships
between particular variables are worthy of note. However, the conclusions you are
drawing are essentially subjective, based on your critical evaluation of the literature. You
often use a ‘voting method’ as a crude index of where the balance of results lies. The
possibility exists that your subjective conclusion may not accurately reflect the actual
strength of the relationship. You can reduce this possibility by adding a meta-analysis to
your review. This allows you to compare or combine results from different studies,
facilitating statistically guided decisions about the strength of observed effects and the
reliability of results across a range of studies. Meta-analysis is a more efficient and
effective way to summarise the results of large numbers of studies. A good source of
reviews using meta-analysis is the journal Review of Educational Research.
Meta-analytic techniques have been used to answer questions such as:
• Are there gender differences in conformity?
• Is there a relationship between self-concept and academic attainment?
• What are the effects of class size on academic achievement?
A few examples
1 In the first classic meta-analysis study, Smith and Glass (1977) synthesised the results
of nearly 400 controlled evaluations of psychotherapy and counselling to determine
whether psychotherapy ‘works’. They coded and systematically analysed each study
for the kind of experimental and control treatments used, and the results obtained.
They were able to show that, on average, the typical psychotherapy client was better
off than 75 per cent of the untreated ‘control’ individuals.
2 Rosenthal (1994) used meta-analysis to summarise the results of 345 studies on
experimenter effects. Experimenter effects occur when the participants in an
experiment respond in ways that correspond to the expectancies of the experimenter.
Rosenthal investigated this effect in eight areas in which the effect had been studied
(e.g. learning material, person perception, athletic performance), and the mean effect
size was .70. This suggests a strong effect size, so we can confidently state that
interpersonal expectancies can influence behaviour as a general principle.
3 Horton et al. (1993) performed a meta-analysis on nineteen studies related to concept
mapping by secondary school students. The analysis revealed that concept mapping
had generally positive effects and raised individual student achievement by 0.46
standard deviations.
Although meta-analysts use a number of more advanced techniques, we will concentrate on some basic techniques that incorporate fundamental statistical procedures such as ‘significance’, ‘p’ and ‘r’, discussed earlier in the book, to give you the flavour of what is involved.
Conducting a meta-analysis
There are three stages to this:
TABLE 21.1 Tolerances for future null results as a function of the original average level of significance per study and the number of studies summarised

Number of studies      Original average significance level
summarised               .05        .01       .001
 1                         1          2          4
 2                         4          8         15
 3                         9         18         32
 4                        16         32         57
 5                        25         50         89
 6                        36         72        128
 7                        49         98        173
 8                        64        128        226
 9                        81        162        286
10                       100        200        353
15                       225        450        795
20                       400        800       1412
25                       625       1250       2206
30                       900       1800       3177
40                      1600       3200       5648
50                      2500       5000       8824

Note: Entries in this table are the total number of old and new studies required to bring an original average p of .05, .01, or .001 down to p > .05 (i.e. ‘non-significance’).
If, however, you determine (based on your analysis) that at least several thousand
studies must be in the file drawer before biasing of your results takes place, then you can
be reasonably sure that the file drawer phenomenon is not a serious source of bias.
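As a hedged sketch, the entries in Table 21.1 can be reproduced (to within a unit or two of rounding) from the logic of combining Z scores: a set of n studies averaging a given one-tailed p remains jointly significant until the total pool of old and new studies grows past (n x Z / 1.645) squared:

import math

def tolerance(n_studies, avg_p):
    # Total old + new studies needed before the combined Z of n studies,
    # each averaging the given one-tailed p, falls below 1.645
    # (i.e. before the combined p rises above .05)
    z_avg = {0.05: 1.645, 0.01: 2.326, 0.001: 3.090}[avg_p]
    return math.ceil((n_studies * z_avg / 1.645) ** 2)

print(tolerance(3, 0.001))   # 32, as in Table 21.1
print(tolerance(10, 0.01))   # 200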
For each approach, you can evaluate studies by comparing or combining either p-
values or effect sizes.
TABLE 21.2 Meta-analytic techniques for comparing and combining two studies
Technique Method/purpose
Comparing studies Used to determine if two studies produce significantly
different results.
Significance testing Record p-values from research and convert them to
exact p-values (such as a finding reported at
p < 0.05 that may actually be p = 0.036).
Used when information is not available to allow for
evaluation of effect sizes.
Effect-size estimation Record values of inferential statistics (F, t, for example),
along with associated degrees of freedom. Estimate
effect sizes from these statistics. Preferred over
significance testing.
Combining studies Used when you want to determine the potency of a
variable across studies.
Significance testing Can be used after comparing studies to arrive at an
overall estimate of the probability of obtaining the two
p-values under the null hypothesis.
Effect-size estimation Can be used after comparing studies to
evaluate the average impact across studies of an
independent variable on the dependent variable.
Comparison of effect sizes of two studies generally is more desirable than simply
looking at p-values. This is because effect sizes provide a better estimate of the degree of
impact of a variable than does the p-value. (Remember, all the p-value tells you is the
likelihood of making a Type I error.) The p-values are used when the information needed
to analyse effect sizes is not included in the studies reviewed. Consequently, the following
discussion focuses on meta-analytic techniques that look at effect sizes. For simplicity,
only the case involving two studies is discussed. The techniques discussed here can be
easily modified for the situation where you have three or more studies. For more
information, see Rosenthal (1979, 1984), and Mullen and Rosenthal (1985).
Comparing the effect sizes of two studies involves:
1 converting the quoted statistic from both studies, e.g. t or chi square, into r’s;
2 giving the r’s when calculated the same sign if both studies show effects in the same direction, but different signs if the results are in the opposite direction;
3 finding for each r the associated ‘Fisher z’ value. Fisher’s z (i.e. lower case z to differentiate this statistic from the upper case Z denoting the standard normal deviate, or Z score described in chapter 4) refers to a set of log transformations of r as shown in Tables 21.4 and 21.5;
4 substituting in the following formula to find the standard normal deviate or Z score:

Z = (z₁ − z₂) / √(1/(N₁ − 3) + 1/(N₂ − 3))
TABLE 21.4 Transformations of r to Fisher z: second digit of r

r      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
.0    .000   .010   .020   .030   .040   .050   .060   .070   .080   .090
.1    .100   .110   .121   .131   .141   .151   .161   .172   .182   .192
.2    .203   .213   .224   .234   .245   .255   .266   .277   .288   .299
.3    .310   .321   .332   .343   .354   .365   .377   .388   .400   .412
.4    .424   .436   .448   .460   .472   .485   .497   .510   .523   .536
.5    .549   .563   .576   .590   .604   .618   .633   .648   .662   .678
.6    .693   .709   .725   .741   .758   .775   .793   .811   .829   .848
.7    .867   .887   .908   .929   .950   .973   .996  1.020  1.045  1.071
.8   1.099  1.127  1.157  1.188  1.221  1.256  1.293  1.333  1.376  1.422

TABLE 21.5 Transformations of r to Fisher z: third digit of r

r     .000   .001   .002   .003   .004   .005   .006   .007   .008   .009
.90  1.472  1.478  1.483  1.488  1.494  1.499  1.505  1.510  1.516  1.522
.91  1.528  1.533  1.539  1.545  1.551  1.557  1.564  1.570  1.576  1.583
.92  1.589  1.596  1.602  1.609  1.616  1.623  1.630  1.637  1.644  1.651
.93  1.658  1.666  1.673  1.681  1.689  1.697  1.705  1.713  1.721  1.730
.94  1.738  1.747  1.756  1.764  1.774  1.783  1.792  1.802  1.812  1.822
.95  1.832  1.842  1.853  1.863  1.874  1.886  1.897  1.909  1.921  1.933
.96  1.946  1.959  1.972  1.986  2.000  2.014  2.029  2.044  2.060  2.076
.97  2.092  2.110  2.127  2.146  2.165  2.185  2.205  2.227  2.249  2.273
.98  2.298  2.323  2.351  2.380  2.410  2.443  2.477  2.515  2.555  2.599
.99  2.647  2.700  2.759  2.826  2.903  2.994  3.106  3.250  3.453  3.800
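These tables simply tabulate a standard log transformation, so if a computer is to hand the values (and their inverses) can be calculated rather than looked up. A minimal sketch:

import math

def fisher_z(r):
    # Fisher's z = 0.5 * ln((1 + r) / (1 - r)), i.e. arctanh(r)
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_z_to_r(z):
    # The inverse transformation, i.e. tanh(z)
    return math.tanh(z)

print(f"{fisher_z(0.50):.3f}")   # 0.549, as in Table 21.4
print(f"{fisher_z(0.939):.3f}")  # 1.730, as in Table 21.5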
EXAMPLE
Here is an example comparing two studies.
Imagine you are interested in comparing two experiments that investigated the impact
of the credibility of a communicator on persuasion for similarity of effect size, to determine
whether it is worthwhile combining them. In the results sections of the two studies, you
found the following information concerning the effect of credibility on persuasion:
Study 1: t = 2.57, p < 0.01, N = 22.
Study 2: t = 2.21, p < 0.05, N = 42.
The first thing you must do is to determine the size of the effect of communicator
credibility in both studies. Unfortunately, neither study provides that information (you
will rarely find such information). Consequently, you must estimate the effect size based
on the available statistical information, which is t. Using the formula relating t to r (see p. 170) gives the following results. The formula is:

r = √(t² / (t² + df))

Study 1: r = √(6.59 / (6.59 + 20)) = 0.50
Study 2: r = √(4.89 / (4.89 + 40)) = 0.33
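To carry the four-step comparison through, here is a minimal sketch of the whole procedure for these two studies, using the t values quoted above. The Z it prints (roughly 0.74) is well short of 1.96, suggesting the two effect sizes do not differ significantly and can sensibly be combined:

import math

def t_to_r(t, df):
    # Step 1: estimate the effect size r from a reported t statistic
    return math.sqrt(t * t / (t * t + df))

def compare_effect_sizes(r1, n1, r2, n2):
    # Steps 3 and 4: convert each r to a Fisher z, then compute the
    # standard normal deviate for the difference between the studies
    z1, z2 = math.atanh(r1), math.atanh(r2)
    return (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

r1 = t_to_r(2.57, 20)   # Study 1: about 0.50
r2 = t_to_r(2.21, 40)   # Study 2: about 0.33
print(round(compare_effect_sizes(r1, 22, r2, 42), 2))  # about 0.74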
Suppose you have used 100 subjects to try to replicate an experiment that reported a
large effect (r = .50) based on only ten subjects. You find a smaller sized effect in your study (r = −.31), but it is in the opposite direction of the one previously reported.
a Do you code your effect as negative or positive?
b From Table 21.4 what are the Fisher z’s corresponding to each r?
c Compute Z and find the associated p-value.
d What is your conclusion about the merits of combining these two studies?
Answers on p. 605.
Alternatively, suppose your result is in the same direction as the original one and of a
similar magnitude, and you have used the same number of subjects. This time, imagine
that the original effect size was r = .45 (N = 120), and your effect size is r = .40 (N = 120).
Following the same procedure as above, find:
a the corresponding Fisher z’s;
b Z; and
c p.
d Now comment on your findings.
Answers on p. 606.
Mean z = (z₁ + z₂) / 2

in which the denominator is the number of Fisher z scores in the numerator; the resulting value is an average (or mean z).
The first step to take when combining the effect sizes of two studies is to calculate r
for each and convert each r-value into corresponding z-scores. Using the data from the
example used above to demonstrate comparing studies, we already have z values of 0.55
and 0.34.
Mean z = (0.55 + 0.34) / 2 = 0.45
This mean z is reconverted back to an r using Table 21.4. The r-value associated with this
average Fisher z is 0.42. Hence, you now know that the average effect size across these
two studies is 0.42.
Here is another example. Given effect size estimates of r = .7 (N = 20) for study A and r = .5 (N = 80) for study B, find a combined estimate of the effect size.
We first convert each r into z scores and then substitute into the formula. This gives

Mean z = (0.867 + 0.549) / 2 = 0.708

This average Fisher z converts back to a combined effect of r = 0.61. This is larger than the mean of the two r’s.
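A minimal sketch of this combination step, using the simple unweighted averaging of Fisher z's shown in the text (weighting each z by sample size is a common refinement not shown here):

import math

def combine_effect_sizes(rs):
    # Average the Fisher z's, then back-transform the mean to r
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

print(round(combine_effect_sizes([0.50, 0.33]), 2))  # 0.42
print(round(combine_effect_sizes([0.70, 0.50]), 2))  # 0.61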
Remember, always compare the studies before combining them. If the effect sizes of
the two studies are statistically different, it makes little sense to average their effect sizes.
Given two studies, with effect sizes r = .45 and r = .40 (both coded as ‘positive’ to show
that both results were in the predicted direction),
a find Fisher z’s for each;
b compute mean z;
c find combined effect size.
Answers on p. 606.
The difference between the two Z’s divided by the square root of 2, i.e.

Z = (Z₁ − Z₂) / √2

is distributed as Z, so we can enter this newly calculated Z into a table of standard normal deviates (p) to find the p-value associated with a Z of the size obtained or larger.
EXAMPLE
Suppose that studies A and B yield results in opposite directions, and neither is
‘significant.’ One p is .075 one-tailed, and the other p is .109 one-tailed but in the
opposite tail. The Z’s corresponding to these p’s are found in Table 5.1 (p. 73) to be
+1.44 and —1.23 (note the opposite signs which indicate results in opposite directions).
Then, from our equation we have:

Z = (1.44 − (−1.23)) / √2 = 2.67 / 1.41 = 1.89

The p associated with this Z is .029 one-tailed: although neither study alone reached significance, the two results differ significantly from each other.
Suppose that studies A and B yield results in the same direction. However, although the p-
levels appear to be very similar, one result is identified as ‘significant’ by the author of A because
p = .05, and the other is termed ‘not significant’ by the author of B because p = .07.
a Find the Z’s corresponding to these p’s.
b Find the difference between the Z’s.
c Find the p-value associated with the new Z of the difference between the original p-
values.
d Comment on this p-value.
Answers on p. 606.
That is, the sum of the two Z’s divided by the square root of 2 yields a new Z. This
new Z corresponds to the p-value of the two studies combined if the null hypothesis of
no relation between X and Y were true.
Suppose studies A and B yield homogeneous results in the same direction but neither
is significant. One p is .121, and the other is .084; their associated Z’s are 1.17 and
1.38, respectively. From the preceding equation we have:
Z = (1.17 + 1.38) / √2 = 2.55 / 1.41 = 1.81
as our combined Z. The p associated with this Z is .035 one-tailed (or .07 two-tailed).
This is significant one-tailed even though the original p’s were not.
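A minimal sketch of this combination of p-values (the Stouffer-style sum of Z's used in the text), relying only on the Python standard library:

import math
from statistics import NormalDist

def combine_p_values(p1, p2):
    # Convert each one-tailed p to a Z, sum, divide by sqrt(2),
    # and convert the combined Z back to a one-tailed p
    z1 = NormalDist().inv_cdf(1 - p1)
    z2 = NormalDist().inv_cdf(1 - p2)
    z = (z1 + z2) / math.sqrt(2)
    return 1 - NormalDist().cdf(z)

print(round(combine_p_values(0.121, 0.084), 3))  # about 0.036; the text's
                                                 # .035 reflects rounded Z's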
As another example, imagine p-values (one-tailed) for studies A and B are p = .02
(significant), p = .07 (not significant).
The two p-values can be combined to obtain an estimate of the probability that the
two p-values might have been obtained if the null hypothesis of no relation between X
and Y were true.
The Z’s corresponding to these p’s are 2.05 and 1.48, so

Z = (2.05 + 1.48) / √2 = 3.53 / 1.41 = 2.51
Its p-value is .006 (one-tailed) or .012 (two-tailed). This combined p-value is
significant; it supports the significant study A.
Practical problems
The task facing a meta-analyst is a formidable one. Not only may studies on the same issue use widely different methods and statistical techniques, but some may not provide the necessary information to conduct a meta-analysis and so have to be eliminated.
The problem of insufficient or imprecise information (along with the file drawer
problem) may result in a non-representative sample of research being included in a meta-
analysis. Admittedly, the bias may be small, but it may nevertheless exist.
Meta-analysis is a quantitative tool for comparing or combining results across a set of similar
studies, facilitating statistically guided decisions about the strength of observed effects and the
reliability of results across a range of studies. Meta-analysis is a more efficient and effective way to
summarise the results of large numbers of studies.
• The first general technique is that of comparing studies. This comparison is made when you want to determine whether two studies produce significantly different effects.
• The second general technique involves combining studies to determine the average effect size of a variable across studies.
For each approach, you can evaluate studies by comparing or combining either p-values or
effect sizes.
The problem posed by the file drawer phenomenon is potentially serious for meta-analysis
because it results in a biased sample—a sample of only those results published because they
produced acceptable statistically significant results. But even published research may be of uneven
quality.
There is an art in writing an experimental report, and you will find that it becomes easier with
practice. Research work must be written up systematically and with great care, so that the
reader is not faced with the prospect of having to sort out ambiguities or misunderstandings.
You must be prepared to write up a report a couple of times at least before you are satisfied
that there is no room for improvement. Researchers who are experts in writing up their
results often have to make several drafts before writing up the final report. Extremely valuable
and interesting practical work may be spoiled at the last minute by a student who is not able
to communicate the results easily. The report should be written in an impersonal third person
style, with a minimum of rhetorical excess. Scientific writing is a stripped-down cool style that
avoids ornamentation.
• You must write in an accepted style. Most universities and journal editors will
provide a style manual which details the organisation, presentation, format and
language to be used. It will pay to examine some previous studies printed in the
journal in which you wish to publish, or some successful theses/dissertations at your
university. Your material will be subject to the same format and style requirements.
• It is essential to write clearly and precisely in presenting your material. You should
avoid jargon and assume that your readers will have a general understanding of, and
familiarity with, basic statistical concepts such as standard deviation and the normal
distribution curve. These concepts need not be explained. The entire report should
have a coherent structure with an orderly progression in the presentation of ideas, data
and arguments.
• Always acknowledge the work of others. If you quote from another person’s work,
present the passage within quotation marks with a reference at the end of the
quotation. Don’t try to pass off the work of others as your own; this is plagiarism. You
need to cite other authors in the text by placing the date of publication after their
name, e.g. ‘In a follow up study, Jones (1993) found . . .’. You can also include the
name and date in parentheses at the end of a sentence if the name is not part of the
narrative, e.g. ‘Students from single parent families had significantly lower self-
concepts (Smith 1994)’. The name and date can then be looked up in the references,
where full details of the texts referred to are given. Where there are two authors, join
their names with ‘and’. Where there are three or more use ‘and’ on the first occasion;
on subsequent occasions cite only the first author followed by the abbreviation ‘et al.’,
e.g. Jones, Smith and Wilson become ‘Jones et al.’.
• Avoid sexist language by rephrasing sentences, e.g. change ‘the teacher completed her
questionnaire...’ to ‘the teacher completed the questionnaire...’. ‘He’ and ‘she’ can
often be replaced by ‘they’.
• Begin writing up data as soon as possible. Do not wait until all the data have been
collected and all analyses have been completed before commencing to draft the first
sections of your thesis/paper. If you leave it all to the end you may find difficulty in
meeting deadlines. The Introduction and Methodology sections can certainly be
drafted while data collection is under way, as these include your review of the
literature, theory and previous research, the statement of the problem, the hypotheses,
the sample, design and any other matters that must have been dealt with before data
collection commenced. Your original research proposal could provide a basis for the
introduction as it should contain a detailed statement of the problem and hypotheses.
This could be extended with a fuller literature review. From your computer search of
the literature you can also start organising your bibliography.
The time factor is important. Writing up experiments cannot be rushed without
doing an injustice to the results and conclusions of these experiments. Unless enough
time is devoted to the writing up, serious errors can be made. The purpose of writing
up one’s findings is to enable others to understand clearly what you have done; to
replicate the work if they are interested or to modify aspects of it. Each part of the
experiment should be reported carefully and accurately. Most research articles in the
behavioural sciences are organised in essentially the same way; so too are theses and dissertations, although most aspects of the research are covered in greater detail than in
an article. The usual format is based on the Publication Manual of the American
Psychological Association and is used in most journals in education and psychology.
The following sequence is recommended as being one which students find easy to use
and which sets out the stages of the report in a logical fashion.
Title
Here is an example of a point referred to above about rewriting aspects of the report. The
title should be as short as possible but it should retain meaning. You may have to make several attempts: a first draft is usually a bit of a mouthful, and the researcher will realise this. Gradual refinement should lead to a title similar to the following:
‘Classroom teaching versus distance education in a first year university biology unit.’
Similarly a draft title,
‘The effects of lunchtime involvement in exercise on the stress levels of teachers in secondary
schools in inner city areas.’
might be just as clear and yet less of a mouthful if rewritten:
‘Exercise and stress levels in inner city secondary school teachers.’
Rewrite the following titles so that they are succinct yet informative.
a A study investigating the relationship between selected socioeconomic factors
among 7-year-old and 10-year-old boys and girls and their attitudes towards certain
aspects of their schooling.
b A pilot study of the effectiveness of individual versus group counselling on the self-
concepts of forty high school students selected on the basis of suffering physical and
emotional child abuse during their primary school years.
c An investigation to test the hypothesis that 10-year-old girls are significantly better
at reading but significantly poorer at mathematics than 10-year-old boys.
Summary
In printed research papers, this may be referred to as the abstract, and it may appear at
the beginning or end of the paper. For our purposes, it seems sensible to call it the
summary, as this acts as a reminder to us that its purpose is to present an overview of the
design, contents and results of the experiment. Another reason for placing the summary
early in the write-up is to enable the reader to determine whether or not it is worthwhile
ploughing through the whole report.
Introduction
This is best started with a general statement of the problem which enables the reader to gain
an appreciation of the issue, its importance, pertinence, and its place within the ambit of
education. A brief review of research findings and theories related to the topic follows. This
is to provide an understanding of previous work that has been done and a context into which the current study fits. If you have conducted a computer review of the pertinent literature, the
printout should provide the basis for a chronological or topic order of the review. Some
researchers put a summary of each study reviewed on their word processor as they progress
with their reading. This offers a basis for the writing of this section.
However, you should not simply regurgitate the summary of each reviewed study, but
attempt to synthesise and analyse the material, distilling the essential themes, issues,
methodologies, discrepancies, consistencies and conclusions, as well as the specific results
and conclusions of particular studies where appropriate. The aim is a clear, unified and
thorough picture of the status of research in the area and not a boring, stereotyped
sequential presentation of all the separate summaries of each study taken from the
indexes and abstracts consulted like a furniture sale catalogue commencing with
‘Professor X (19YY) discovered that... and Dr Smith (19ZZ) found that ...’. There
are of course some significant and seminal studies that should be reviewed individually.
The process of combining and interpreting the literature is more difficult than simply
reviewing. You should also avoid an excessive use of quotations. Any quotation used
should be a significant one that bears a relationship to what you propose to investigate
in your study.
Out of this review of the literature, and linked to the original problem, there should
emerge a statement of the hypothesis to be tested. This could be followed by a definition of
terms and concepts to be used in the study. It is imperative to know what is meant, for
example, by such terms as ‘disadvantaged student’, ‘holistic education’, ‘aggressive behaviour’
or ‘creativity’, as these can be deployed with a variety of meanings by different authors. The
hypothesis should stem out of the matrix of knowledge and interpretation already presented.
It should be succinct, be consistent with known facts and be testable.
After reading the introduction, a reader should understand why you decided to undertake
the research, how you decided to set about doing it and what your hypothesis is. This is a
general to specific organisation.
Test instruments/apparatus/materials
Details of all tests should be given, including name, source, reliability and validity data.
If the tests are self-created, evidence of reliability and validity from pilot studies is needed.
If you construct equipment or manufacture material specifically for your experiment,
then it is essential that you describe it in some detail. When you have done this, read over
what you have written and ask yourself whether another person, unfamiliar with your
work, would be able to reconstruct your equipment or material.
A photograph, or photographs, of new equipment will be welcomed by a reader who is
unacquainted with your work, and examples of your actual material should be appended to
the report where possible. For example, to refer to a list of anagrams without specifying
their length, their difficulty or even their number is singularly unhelpful. Moreover, if
material is available in the report, you will have little difficulty describing your work and
discussing it.
Design
Some researchers would regard this as being the most important section in the report.
Since, in theory, the purpose of writing a report is to enable others to replicate your
work, great care must be taken in describing your design clearly, concisely and logically.
You will need to furnish information as to how each stage of the experiment was
conducted. Once again, we must emphasise that this will demand much patience on your
part, and you must be prepared to rewrite the section several times before producing an
acceptable version.
Data analysis
This describes the statistical analyses undertaken. If they are commonly known—e.g. chi
square, independent t test—then the tests only need to be named. Unusual approaches
should be detailed. The significance level to be accepted should be stated.
Results
Clarity! This is the key word in this section. Sound studies which have produced
excellent results are often spoiled at this stage simply because students either do not
know the best way to present their results, or do not pay sufficient attention to their
presentation. The major failing seems to be in trying to present too much information
at once; graphs become confusing and tables are difficult to interpret simply because the
student has tried to convey too many results in one display. Present your results clearly,
simply and neatly. Your results should be presented in tables or other diagrams which
have appropriate, meaningful, short headings. Short comments on the results are
permissible, such as, ‘Introverts scored significantly higher than extroverts on this scale’,
but do try to avoid detailed and penetrating comments which are better placed in the
‘Discussion’ section.
Discussion
Instead of moving from the general to the specific, as we did in the introduction, we
move from the specific to the general in the discussion.
The purpose of this section is to enable you to assess your results and draw sensible
conclusions from them. It is quite common to find students who have painstakingly
undertaken an interesting experiment, produced a set of results, and applied appropriate
statistical procedures, yet are incapable of explaining their significance. What do the
results mean in terms of your hypothesis? What implications or inferences can be made
from your results? Are you able to summarise your conclusions, perhaps suggesting ideas
from your experiment which could be developed by another person?
Do not be afraid to mention any failings in your experimental design, your sampling
difficulties, or your procedure. The discussion section has a frame of reference—the
introductory section. The points raised in the introduction must be responded to in the
discussion. But within this frame of reference, the writer is free to use whatever art and
imagination they can to show the range and depth of significance of their study. The
discussion section ties the results of the study to both theory and application by pulling
together the theoretical background, literature reviews, potential significance for
application, and results of the study.
Conclude this section by summarising the major conclusions and results of the
experiment, and restate the hypothesis in its original wording, pointing out whether or
not it was supported.
References
You must record all the references you have mentioned in your experiment, but no others.
Accuracy is important, as other people may wish to follow up your work by reading some of
the references in the library, and much time can be wasted if a librarian has to search for a reference that has been cited inaccurately.
Within the text, any reference appears as the author’s surname and year of publication.
This model for writing an experimental report is, of course, only one particular model.
But whatever model or pattern you use, the basic criterion is that it should provide a
logical sequence, concisely expressed, enabling another interested person to understand
what has been done, why, and with what results.
• Introduction The introduction should contain a brief overview of issues and concepts to place the research in its context. Aim(s) and hypotheses should be stated clearly in a predictive form.
• Method Give enough detail to enable readers to repeat the study as you did it.
• Design Detail the variables, the design form, the statistics employed, the subjects, materials/tests, procedures/instructions.
• Results Verbal description of results plus summary tables clearly titled and labelled (raw data in an appendix if necessary) are essential, with significance levels stated and statements about rejection or support for null hypotheses.
• Discussion Relate results to hypotheses, background theory and previous research. Note and explain, if possible, unexpected results. Suggest modifications and future directions for the research area. Discuss limitations.
• References List all studies referred to using standard format.
Further reading
American Psychological Association (1994), Publication Manual, 4th edn, APA, Washington DC.
Campbell, W., Ballou, S. & Slade, C. (1986), Form and Style: Theses, Reports and Term Papers, 7th
edn, Houghton Mifflin, New York.
Light, R.J. & Pillemer, D.B. (1984), Summing Up: The Science of Reviewing Research, Harvard University Press, Cambridge.
Mauch, J.E. & Birch, J.W. (1983), Guide to the Successful Thesis and Dissertation, Dekker, New York.
QUALITATIVE METHODS
23 ETHNOGRAPHIC RESEARCH
24 UNSTRUCTURED INTERVIEWING
25 ACTION-RESEARCH
26 CASE STUDIES
27 HISTORICAL RESEARCH
28 THE QUALITATIVE RESEARCH REPORT
REPORT 1 ‘I FEEL SORRY FOR SUPPLY TEACHERS...’
REPORT 2 NASR’S DEVELOPMENT AS A WRITER IN HIS SECOND
LANGUAGE: THE FIRST SIX MONTHS
REPORT 3 DIMENSIONS OF EFFECTIVE SCHOOL LEADERSHIP:
THE TEACHER’S PERSPECTIVE
The purists assert that qualitative and quantitative methods are based in paradigms
that make different assumptions about the social world, about how science should be
conducted, and what constitutes legitimate problems, solutions, and criteria of ‘proof’.
So far, we have been considering a pervasive, scientific mode of inquiry—a mode
characterised by objectivity, reliability and prediction. The assumption that ‘truth’
and ‘knowledge’ are fixed and singular entities has predisposed research towards
numerical quantification procedures and technical controls, generally statistically
oriented.
During the late 1960s and throughout the decade of the 1970s, a new critical form
of inquiry began to emerge. A more diffuse recognition of the implicit relationship
between knowledge and human interests led to the advocacy of an alternative, more
humanistic, investigative paradigm. This paradigm is based on the concept of verstehen,
a form of subjective understanding.
In current research, movements towards humanness are based on a recognition of
the need for critical inquiry and meaning in educational action. The traditional
emphasis on ‘factual’ knowledge and singular truths has become obsolete as the avenues
for knowledge generation and cultural interchange increase. The qualitative researcher
attempts to gather evidence that will reveal qualities of life, reflecting the ‘multiple
realities’ of specific educational settings from participants’ perspectives.
Social reality
Qualitative researchers believe that since humans are conscious of their own behaviour,
the thoughts, feelings and perceptions of their informants are vital. How people attach
meaning and what meanings they attach are the bases of their behaviour. Only qualitative
methods, such as participant observation and unstructured interviewing, permit access
to individual meaning in the context of ongoing daily life. The qualitative researcher is not concerned with objective truth, but rather with the truth as the informant perceives it. If a student believes a teacher dislikes him or her, then every act of that teacher towards
the student will be interpreted by the latter in terms of that belief. This information is
necessary in order to fully understand the behaviour of the student towards the teacher.
In an objective sense, only a disruptive student is seen.
Social reality is the product of meaningful social interaction as perceived from the
perspectives of those involved, and not from the perspectives of the observer. Thus, the
central data-gathering techniques of a qualitative approach are participant observation
and unstructured interviewing. Qualitative methods attempt to capture and understand
individual definitions, descriptions and meanings of events. Quantitative methods, on
the other hand, count and measure occurrences. Abercrombie (1988) argues that social
science research can never be objective because of the subjective perceptions of those
involved, both informant and researcher; because all propositions are limited in their
meaning to particular language context and particular social groups; because all
researchers impose unwittingly their own value judgements and because all observations
are theory laden.
Sampling
Whereas quantitative research uses probability sampling, qualitative research employs
non-probability sampling, especially snowball sampling and theoretical sampling. In
snowball sampling, a person identified as a valid member of a specified group to be interviewed is asked to provide the names of others who fit the requirements. This
is because in many situations the interviewer would not know the potential members of
the sample; for example, in a study of delinquent gangs, one gang member can provide
names of other gangs and gang members—information that may only be known to a
select few and which would remain confidential. The gang member would be asked to
introduce the interviewer to the further potential interviewees. In theoretical sampling,
data collection is controlled by the developing theory. As information is gathered from
the first few cases the underlying theory becomes extended, modified, etc., and therefore
informs the investigator as to which group(s) are relevant to interview. For example, in
a study investigating how students respond to sports injuries that leave them unable to
take part in that sport again, successive interviews might gradually narrow down the
range of sports to be covered, as the investigator determines that some sports have no
participants that ever suffer such disabling injury. Eventually, because of the incidences
noted, the study begins to limit its focus to rugby and skiing participants. Thus the
purpose of theoretical sampling is the discovery and development of categories. It is a
recurrent process, as incoming data provides new evidence and suggests new categories.
Cases are analysed until new categories and disconfirming cases no longer appear to
change the theoretical and conceptual model. At this stage we have reached theoretical
saturation. This approach is linked to analytic induction in which a search for falsifying
evidence is made which leads to modification of the theory until no further
disconfirming evidence is found. There is a resemblance between analytic induction and
Popper's emphasis on the importance of setting up null hypotheses.
As Glaser and Strauss (1967) suggest, sampling is often guided by the search for
contrasts which are needed to clarify the analysis and achieve maximum identification
of emergent categories. So no representative sample is sought here; rather, particular samples are chosen to identify specific classes of phenomena as they emerge. This strategy permits the investigator to develop and study a range of types rather than determine their frequency.
Literature review
Qualitative researchers do not search for data that will support or disprove their
hypothesis. Rather as we read above, they develop theories and propositions from the
data they collect as the research develops. The literature review is a stimulus for your thinking, not a way of fixing in your mind the previous work in the area so firmly that you consider only existing concepts and conceptual schemes, as in the quantitative method. New findings cannot always be fitted into existing categories and
concepts, and the qualitative method, with its more open-minded approach, encourages
other ways of looking at the data. The literature review should be a sounding board for
ideas, as well as finding out what is already known and what specific methodologies
have been used. Often research reports identify additional questions that would be
fruitful to pursue.
The promise of the qualitative mode can be seen in its emphasis on naturalistic
investigative strategies. These methods could enable the researcher to focus on
complexities and qualities in educational action and interaction that might be
unattainable through the use of more standardised measures. An explication of
‘meaning’, rather than the isolation of ‘truth’, is identified as the goal.
Within social science research, the typical qualitative approaches involve ethnography, survey and action research, with observation and interviewing as the major techniques.
Qualitative                                   Quantitative

Purpose
Interpretation                                Prediction
Contextualisation                             Generalisation
Understanding the perspectives of others      Causal explanation

Method
Data collection using participant             Testing and measuring
observation, unstructured interviews
Concludes with hypothesis and                 Commences with hypothesis and
grounded theory                               theory
Emergence and portrayal                       Manipulation and control
Inductive and naturalistic                    Deductive and experimental
Data analysis by themes from                  Statistical analysis
informants’ descriptions
Data reported in language of informant        Statistical reporting
Descriptive write-up                          Abstract impersonal write-up

Role of researcher
Researcher as instrument                      Researcher applies formal instruments
Personal involvement                          Detachment
Empathic understanding                        Objective
As the table suggests, the choice of research method will be influenced by the
assumptions that the researcher holds about the social world and the people who inhabit
it, and by the sort of study required by the topic under investigation.
References
Abercrombie, N. (1988), The Penguin Dictionary of Sociology, Penguin, Harmondsworth.
Glaser, B. and Strauss, A. (1967), The Discovery of Grounded Theory, Aldine, Chicago.
Further reading
Berg, B. (1989), Qualitative Research Methods for the Social Sciences, Allyn & Bacon, Boston.
Eisner, E. & Peshkin, A. (eds) (1990), Qualitative Enquiry in Education, Teachers College Press,
New York.
Firestone, W. (1987), ‘Meaning in method: the rhetoric of quantitative and qualitative research’, Educational Researcher 16, pp. 16-21.
Flick, U. (1998), Introduction to Qualitative Research, Sage, London.
Glesne, C. & Peshkin, A. (1992), Becoming Qualitative Researchers, Longman, New York.
Goodwin, L. & Goodwin, W. (1984), ‘Qualitative vs quantitative research or qualitative and
quantitative research’, Nursing Research 33, pp. 378-9.
Jacob, E. (1988), ‘Clarifying qualitative research’, Educational Researcher 17, pp. 16-24.
Marshall, C. & Rossman, G. (1989), Designing Qualitative Research, Sage, Newbury Park.
Mason, J. (1996), Qualitative Researching, Sage, London.
Riessman, C.K. (ed.) (1993), Qualitative Studies in Social Work Research, Sage, London.
Strauss, A. & Corbin, J. (1999), Basics of Qualitative Research, Sage, London.
Richardson, J.T. (ed.) (1996), Handbook of Qualitative Research Methods for Psychology and Social
Science, BPS Books, Leicester, UK.
Introduction
The word ethnography literally means ‘writing about people’. In a broad sense,
ethnography encompasses any study of a group of people for the purpose of describing
their socio-cultural activities and patterns.
In early anthropological studies, ethnographers (for example, Malinowski 1922),
working through in-culture informants, gathered data first-hand about the ways in which
members of a group ordered their life by means of social custom, ritual and belief. By
compiling and organising this information, ethnographers constructed pictures of that
group’s cultural and perceptual world. In ethnography, people are not subjects; they are
experts on what the ethnographer wants to find out about. Over time, a greater range
of theory and method for ethnographic fieldwork has developed, involving concepts
and approaches suitable for describing such social subgroups as motorcycle gangs and
juvenile delinquents, social situations such as classrooms and courtrooms, and open
public scenes such as street corners and hospital wards.
Ethnography essentially involves descriptive data collection as the basis for
interpretation. It represents a dynamic ‘picture’ of the way of life of some interacting
social group. As a process, it is the science of cultural description. Ethnography is a
relevant method for evaluating school life, hospital life, prison life, etc., since these
contexts are essentially cultural entities.
Typical concerns have been the development of pupil identities, teachers’ perceptions
of pupils and their abilities, the ‘management’ of classroom knowledge, pupils’
definitions of school subjects, prisoner—guard relationships, sick role behaviour, and so
on. (Examples of such work are collected in Stubbs and Delamont (1976) and Woods
and Hammersley (1977).)
An ethnographic approach, for example, to the everyday tasks of teaching and
curriculum planning, whatever the professional area, does not define curriculum simply
as ‘a relationship between a set of ends and a set of means’, as a statement of intended
learning goals together with methods for goal achievement. We can view a curriculum
as a process in which there is constant interpretation and negotiation going on among
and between academic staff and students. In this sense, a curriculum is the everyday
activities in the classroom. The conceptual and methodological tools of ethnography
get at this aspect of curriculum planning and teaching.
Ethnography accepts that human behaviour occurs within a context. A classroom
never stands in isolation from larger cultural and social landscapes, such as local and
national, political or economic processes and values. Educational activities take place
against a background of premises, interests and values concerning what it means to be
a student or teacher, and what constitutes worthwhile knowledge and learning. These
features are implicit in the choices made, and in justifications given by participants. In
other words, academic tasks are accomplished with prior presuppositions, beliefs, and
anticipations. Inevitably, these perspectives are shaped within larger social and political
contexts. These relationships need to be examined as part of the classroom. Ethnography
takes this larger context into account.
Ethnography can be a useful way of providing descriptions of what actually happens
in a school district, health authority, etc. These descriptions may help administrators
ensure that policy development is based on, and directed to, the actual situation rather
than to an ideal or imaginary situation. For example, administrators developing policy
concerning ‘rowdiness’ in schools should have some idea about the extent and varying
interpretations of ‘rowdiness’ already occurring. To develop a policy dealing with ‘rowdy
acts’, descriptions of various contexts of such acts are needed. Players in the context of
a game in the gymnasium give particular meanings to ‘rowdiness’ recognised by all
participants as a part of physical education; on the other hand, similar relations in the
school cafeteria would incur the wrath of the administrators. Socio-cultural factors of an educational institution, time of day, relationship to holidays and community events,
or final examinations, are just a few of the factors which may give meaning and
situational legitimacy to the notion of rowdiness. To make sense at all, policy has to take
into account the contextualness of this student behaviour. Even more realistic examples
of policy development could be given in the areas of report cards and reporting
procedures, student attendance, community use of facilities, vandalism, and many other
school-related issues which may arise. These kinds of policy formulations benefit from
(and even require) explicit descriptions of the situationally defined rules, expectations,
intents, perceptions and interpretations held and utilised by participants. Without an
understanding of these contextual factors, policy makers misunderstand and distort the
issues they purport to redress. Ethnographic methods can be one way to help supply the
needed descriptive basis for policy development and its implementation in the changing
world of education, social work, health, etc.
Process
Ethnographers argue that meanings and interpretations are not fixed entities. They are
generated through social interaction and may change over the course of interaction.
Actors’ identities are also subject to processes of ‘becoming’, rather than being fixed and
static. No single meaning or identity is assumed; there are multiple and competing
definitions current in almost every social situation. The metaphor of negotiation is often
used to capture the processes of interaction whereby social meanings are generated, and
a precarious social order is produced.
Naturalism
Ethnographers recognise that the things people say and do depend on the social context in
which they find themselves. They urge, therefore, that social life be studied as it occurs, in
natural settings rather than ‘artificial’ ones created only for the purposes of the research.
Furthermore, they do not seek to manipulate and control what goes on in these settings, but
rather to minimise their own impact on events so as to be able, as far as possible, to observe
social processes as they occur naturally without the intervention of researchers. Their aim is
thereby to maximise the ecological validity of their findings. Just as Lorenz swims with his
goslings, or Schaller lives with mountain gorillas, so ethnographers live the lives of the people
they study. To expect more from the ethnographic study of teachers and children for a lesser
effort seems naive indeed.
Holism
Those working in the ethnographic tradition also stress the need to see social life within
the general context of a culture, subculture or organisation as a whole. The actions of
individuals are motivated by events within the larger whole and thus cannot be
understood apart from it.
The ethnographer must be aware of the classroom setting within a wider context: the
surrounding vicinity, the milieu of the values and beliefs, the larger social environment. A
school is a reflection in some way of the neighbourhood in which student and teacher live,
Multiple perspectives
Early anthropological ethnographers argued that ‘savages’ were not ‘superstitious’ or
‘mentally inferior’ to western observers, but rather employed different, equally rational
‘world-views’. Contemporary ethnographic approaches take a similar view and, rather
than imposing their own modes of rationality on those they study, attempt to
comprehend social action in terms of the actors’ own terms of reference. As a result,
they are well suited to the detection of ‘unofficial’ versions of social reality. What people
do and what they ought to do are very often different. Because of this, there is frequently
a discrepancy between what people do and what they say they do. Therefore, one must
look beyond the ‘public’ and ‘official’ versions of reality in order to examine the
unacknowledged or tacit understandings as well.
Thus, studies in education have highlighted ‘unofficial’ perspectives in a number of
contexts; for example, by drawing attention to the ‘hidden curriculum’. This is the set
of implicit messages and learning that go on in addition to, and sometimes in opposition
to, the ‘official’ curriculum. A classic example of such an approach was undertaken by
the authors of Boys in White: Student Culture in Medical School (Becker, Geer, Hughes
& Strauss, 1961), in which the authors documented in considerable detail the hidden
curriculum of medical education at the University of Kansas.
Individuals have interpretations based on their experience from the unique vantage
point of their life and biography. These personal interpretations include the perceptions,
intentions, expectations and relevances through which each one of us makes sense of
things. Participants understand classrooms and programs in accord with their own
subjective interpretations. A teacher may interpret a classroom and program in terms of
formal training (for example, theories of learning, psychological constructs, instructional
methodologies), instructional goals, and beliefs about students and the subject matter.
On the other hand, each student may interpret the same class in terms of past school
experiences, immediate goals, beliefs about teachers and the subject matter, expectations
of education, purposes for attending school, and views of their own potential. Factors
such as these influence each participant’s experiences in that classroom.
The purposes for using ethnography are to uncover and describe group social relations
such as:
* the understandings (e.g. beliefs, perceptions, knowledge) which participants share
about their situation;
* the routine methods (e.g. social rules, expectations, patterns, roles) by which their
situation is structured;
* the legitimisations by which participants justify the normality and unquestioned
character of their situation; and
* the motives and interests (for example, purposes, goals, plans) through which
participants interpret their situation.
Multiple techniques
Ethnographic ‘fieldwork’ is not a homogeneous method, but involves a variety of
techniques of data collection. The most commonly employed approach is that of
participant observation, whereby the fieldworker directly observes, and to some extent
takes part in, everyday life in a chosen setting (a school, prison, bureaucracy, rural
community, adolescent gang, etc.).
Observations are recorded in the form of detailed fieldnotes, which may be made on
the spot and amplified subsequently, or written up as soon as possible after leaving the
field. In recent years, audio and videotape recordings have been increasingly used to
obtain permanent records of social interaction. In addition, the ethnographer may
engage in interviewing, collecting and analysing documentary material, and may also use
the techniques of survey research to supplement the field notes. All the material gathered
is reported, described and interpreted to form the ethnographic study.
Ethnographic fieldwork
Ethnography does not fit a linear model of research. Instead, the major tasks follow
a kind of cyclical pattern, repeated over and over again, as outlined in Figure 23.1.
Compare this figure with the quantitative linear sequence in Figure 4.1, p. 42.
FIGURE 23.1 The ethnographic research cycle: selecting an ethnographic project; asking
ethnographic questions; collecting ethnographic data; making an ethnographic record;
analysing ethnographic data; and writing an ethnography report, which in turn prompts
further ethnographic questions.
Writing an ethnography
This last major task in the research cycle occurs toward the end of a research project.
However, it can also lead to new questions and more observations. Writing an
ethnography forces the investigator into a new and more intensive kind of analysis.
Those who begin their writing early, while they can still make observations, will find
that writing becomes part of the research cycle.
Pre-entry
Ethnography is conducted in the context of the situation under study. Ethnographers
recognise the fundamental need to go where participants spend their time. Therefore,
preparation for entry must be carefully planned, as the entire study depends in large part on
the group’s acceptance of the researchers.
In organisations, official agencies and so on, there are individuals who, by virtue of
their office, have the authority to act as gatekeepers. They can grant or withhold formal
permission to enter and participate in the life of the organisation. Dealings with such
gatekeepers can therefore be an extremely important part of the design and conduct of
ethnography.
Gaining entry is best accomplished through a mutual contact who can recommend
the researcher to the gatekeeper. Gatekeepers are often wary of the proposed length and
intimate participation of the fieldworker; they normally seek reassurances that the
research will not prove unduly disruptive, i.e. that the fieldworker will not prevent people
from getting on with their normal work, force an entry into private meetings or
conversations, and so on. Gatekeepers need to be reassured that relations of trust and
confidentiality will not be abused; as in any variety of research, ethnographers have to
ensure the anonymity of the members concerned and that nothing will be made public
that is detrimental to individuals. These issues are particularly critical in the context of
ethnography, since the actions and beliefs of the actors will be documented in some
detail, and ethnographers are more likely than most other researchers to see ‘behind the
scenes’.
Gatekeepers often engage in impression management. Quite naturally, they do not
want to find the ethnographer producing an unflattering portrait of them and their
work or their organisation. Consciously or unconsciously, they will ‘put on a show’,
attempting to influence the initial impressions the fieldworker receives. The fieldworker
must remain alert to such possibilities and record them systematically, since they
constitute valuable data in their own right—throwing light on the gatekeepers’
perceptions and preoccupations.
Most gatekeepers are not familiar with the ethnographic style of research. If they have
any expectation of social research, the survey will probably be their model. Fieldworkers
are often asked, therefore, to spell out their ‘hypotheses’ and to show their draft
questionnaires or interview schedules. Given a commitment to progressive focusing and
a flexible research strategy, it may be rather difficult to establish one’s legitimacy as a
social researcher.
It may be a good tactic for researchers and initial contact person(s) to form a
community steering committee of five or six. This provides ethnographers with several
research advantages. Members of this committee give the support and legitimacy needed
to help overcome possible suspicion when conducting a local study. In the eyes of
community members, the study may have greater relevance and acceptance if they
perceive the steering committee as having some control over what is done by the
outsiders; this in turn eases entry negotiations with members of the group. By
working through a steering committee, researchers find it easier and faster to acquire
documents such as old newspapers, minutes of meetings, pupil absence figures, former
curricula, or even letters. If this committee is representative of wide school interests—
for example, parents, labour, administration—it becomes invaluable for validating data
collected by ethnographers; the members determine whether the ethnographer’s
descriptions actually reflect the situation. In this manner, a self-correcting procedure
(triangulation, p. 419) is built into data collection and interpretation.
Costing the study is also an important pre-entry task. Ethnographic methods require
considerable time to use, as does the analysis of data and the writing of descriptions.
A school classroom is a familiar setting for research and potentially requires negotiation
with many gatekeepers. List the various persons with whom a researcher might have to
negotiate in order to gain access.
During the first few days of entry, observers should be somewhat unobtrusive and
learn how to act and behave in the setting. Collecting data is secondary to becoming
familiar with the setting and putting ‘locals’ at their ease. Rapport can be helped by
fitting in with established routines, showing interest in all that goes on, and being honest
about what you are doing.
No less important is the time of entry. Just about every group, organisation, social
movement (or whatever) has its own rhythms, timetables and calendars. It may therefore
be important to plan the timing of the research to take account of such timetables (for
example, the cycle of the school year). Particular periods of the school calendar can have
a marked effect on what the researcher is able to observe.
Sampling
Even when settings have been specified, it is usually necessary to be more selective still.
Even in fairly circumscribed social settings, there will be too much going on for it all to
be observed equally. As with other styles of research, therefore, samples must be drawn
for detailed investigation and recording.
The highly rational prescriptive procedures of experimental statisticians are seldom
truly applicable in sampling informants or events for participant observation studies.
Rather, the ‘most common’ data referents are the units themselves—the ‘persons, acts
(or events) and time’ that serve as representative dimensions of the study.
Make a list of some specific educational situations in which one could engage in
participant observation.
Observation
Most well-documented observation studies have involved the researcher spending many
months, or even a year or so, immersed in a community or group, and becoming generally
accepted as one of the group. Most of these studies start off largely unstructured, as the
researcher has little idea about precisely what they want to observe, or what might go
on. There are no initial checklists, simply observation of events, situations and behaviours,
which are then written up and gradually, as more data accumulates, tentative guiding
hypotheses, categorisation, conceptual frameworks and some theoretical underpinning
coalesce to give some body, focus and direction to later stages.
Observing groups
A great number of observation schedules have been produced for observing groups.
Many have been developed out of the interaction process analysis approach of Bales
(1950). This is a way of coding the individual’s behaviour within a group context under
twelve headings, sufficiently comprehensive to cover most behaviour exhibited in groups.
One of the most frequently used systems in education has been that of Flanders (1970)
which was derived from the original Bales’ method. Flanders established ten categories
of teacher/pupil behaviour which the observer could use to categorise and record
classroom behaviour. Observers are required to record the behaviour every three seconds
by entering an appropriate number on a prepared chart. This proves very demanding, as
many of the categories have subsections. Moreover, making a judgement every three
seconds is too exacting for most observers, even after memorising the categories. Hardly
have you put in a tally mark for one category than another judgement is immediately
required for the sample of behaviour you have just glanced at, and so on: a veritable
treadmill.
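The clerical side of such tallying can, of course, be handed to a computer. The sketch
below is a minimal illustration in Python, not part of Flanders’ published system: each
three-second judgement is entered as a category number, and the frequencies and
percentages are computed automatically (the short labels abbreviate Flanders’ category
definitions, and the observation data are invented).

from collections import Counter

# Short stand-in labels for Flanders' (1970) ten categories of
# teacher/pupil classroom behaviour.
CATEGORIES = {
    1: "accepts feeling", 2: "praises/encourages", 3: "uses pupil ideas",
    4: "asks questions", 5: "lectures", 6: "gives directions",
    7: "criticises", 8: "pupil response", 9: "pupil initiation",
    10: "silence/confusion",
}

def tally(judgements):
    """Summarise one category judgement per three-second interval."""
    counts = Counter(judgements)
    total = len(judgements)
    for code, label in CATEGORIES.items():
        n = counts.get(code, 0)
        print(f"{code:>2}  {label:<20} {n:>4}  ({100 * n / total:.0f}%)")

# Five minutes of observation = 100 three-second intervals (invented data).
tally([5] * 40 + [4] * 15 + [8] * 20 + [9] * 5 + [6] * 10 + [10] * 10)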
[Tally chart: rows for Secretary, José, Sasha, Kim, Mary, Pat and Leon; tally marks
entered under behaviour-category columns 1-6, with row and column totals. An
accompanying seating plan shows the positions of the chairperson, secretary and group
members.]
FIGURE 23.3 Seating plan recording individual verbal behaviour over the first five
minutes of discussion. The category numbers refer to the behaviour categories listed on
p. 409.
Interviews
Many fieldworkers complement data from participant observation with information
taken from interviews. In the course of an interview the researcher can, among other
things, investigate in more detail an informant’s typifications of persons and events (‘Is
he usually like that . . .?’, ‘Would you say that was typical . . .?’, ‘Could you give me an
outline of a typical . . .?’). Informants may be asked to reflect and comment on events
that have already been observed directly by the ethnographer. In addition, though, they
may be used to gain information about events which occurred in the setting before the
ethnographer arrived, and events within the setting to which the ethnographer does not
have access.
Survey techniques
Some fieldworkers employ survey techniques in the course of ethnographic projects, to
gather background data on populations or samples under investigation, or to try to assess
the generality of observations made in a limited range of situations. Such an approach
assumes that a survey can be used to ‘check’ the representativeness of the ethnographic
data, and hence the generality of the interpretations. (Chapters 29 and 30 consider
survey methods in more detail.)
In ethnographic investigations, surveys are based on information first gathered
through the preceding less formal and more unstructured methods. Once this
background work has been done, construction of survey instruments can begin. These
generally take the form of confirmation instruments.
Advantages of observation
The implicit assumption behind observation is that behaviour is purposive and expressive
of deeper values and beliefs. Perhaps the greatest asset of observational techniques is that
they make it possible to record behaviour as it occurs. All too many research techniques
depend entirely on people’s retrospective or anticipatory reports of their own behaviour.
Such reports are, as a rule, made in a detached mood, in which the respondent is
somewhat remote from the stresses and strains peculiar to the research situation. The
degree to which one can predict behaviour from statistical data is at best limited, and the
gap between the two can be quite large. In contrast, observational techniques yield data
that pertain directly to typical behavioural situations—assuming, of course, that they are
applied to such situations. Where the researcher has reason to believe that such factors as
detachment or distortions in recall may significantly affect the data, they will always prefer
observational methods. Sometimes a study demands that what people actually do and say
be compared with their account of what they did and said. Obviously, two methods of
collecting data must be employed in such inquiries—observation and interviewing.
Moreover, some investigations deal with subjects (for example, infants) who are not
able to give verbal reports of either their behaviour or their feelings, for the simple reason
that they cannot speak. Such investigations necessarily use observation as their method
of data collection. Spitz and Wolf (1946), through the observation of behaviour of babies
in a nursery, were led to the conclusion that prolonged separation of a child from a
previously attentive mother may lead to a severe depression.
In addition to its independence of a subject’s ability to report, observation is also
independent of the subject’s willingness to report. There are occasions when research
meets with resistance from the person or group being studied. Teachers may not have
the time, or they may not be inclined, to be interviewed; pupils may resent being singled out.
Limitations of observation
On the other hand, observation has its limitations. We have listed as an asset the possibility
of recording events simultaneously with their spontaneous occurrence. The other side of the
coin is that it is often impossible to predict the spontaneous occurrence of an event precisely
enough to enable us to be present to observe it; for example, incidents of aggressive behaviour
in the classroom.
One prevalent notion about a limitation of observational techniques, however—the
idea that observational data cannot be quantified—is a misconception. Historically,
observational data have, it is true, most frequently been presented without any attempt
at quantification. This is not to imply that all observational data must be quantified, but
it is important to note that they can be, for example, into categories for chi-square analysis.
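As an illustration, once observed incidents have been sorted into categories, a
chi-square test of association can be run on the resulting counts. A minimal sketch
using the widely available scipy library (the counts here are invented):

from scipy.stats import chi2_contingency

# Invented counts: incidents of three categories of classroom behaviour,
# cross-tabulated by time of day.
observed = [
    [12, 5, 8],   # morning lessons
    [20, 9, 15],  # afternoon lessons
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")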
Whatever the purpose of the study, four broad questions confront the observer:
1 What should be observed?
2 How should observations be recorded?
3 What procedures should be used to try to assure the accuracy of observation?
4 What relationship should exist between the observer and the observed, and how can
such a relationship be established?
1 Observe a young child and record in longhand everything you observe in his/her
behaviour during a 5-10 minute period. Choose a time when the youngster is
playing to simplify the observations. When you have done this, answer the following
questions:
a To what extent do you feel you were able to record all of the behaviour? What
might you have missed?
b Do you feel you were biased towards observing only certain features?
c Did you concentrate on motor activity or verbal activity?
d What did you learn about the child’s behaviour that you did not know before?
e Do you feel that observing the child altered his/her behaviour in any way? How?
Could you have avoided it?
f Did you interpret his/her behaviour from your point of view?
2 Repeat the above exercise with a different child. Use two observers simultaneously.
Compare the two records at the end.
a To what extent were the two observers looking at the same type of behaviour?
b Was one observer recording more general behaviour than the other?
c How might we increase inter-observer agreement?
d Was there any behaviour which was interpreted differently by each observer?
Emerging patterns
Once a researcher has established the categories within which the data are organised and
has sorted all bits of data into relevant categories, the ethnography, as a portrayal of a
complex whole phenomenon, begins to emerge. The process is analogous to assembling
a jigsaw puzzle. The edge pieces are located first and assembled to provide a frame of
reference. Then attention is devoted to those more striking aspects of the puzzle picture
that can be identified readily from the mass of puzzle pieces and assembled separately.
Next, the puzzle worker places the assembled parts in their general position within the
frame and, finally, locates and adds the connecting pieces until no holes remain. Thus,
analysis can be viewed as a staged process by which a whole phenomenon is divided into
its components and then reassembled under various new rubrics. The creativity of
ethnographic analysis, however, lies in the uniqueness of the data (or parts), and in the
singularity of reconstructed cultures (or pictures).
Problems of interpretation
Qualitative researchers, whether in the tradition of sociology or anthropology, have wrestled
over the years with charges that it is too easy for the prejudices and attitudes of the researcher
to bias the data. Particularly when the data must go through the researcher’s mind before it
is put on paper, the worry about subjectivity arises. Does the observer perhaps record only
what they want to see rather than what is actually there? Qualitative researchers are concerned
with the effect their own subjectivity may have on the data they produce. Is a pupil climbing
a tree in the playground adventurous/foolhardy/naughty, etc.?
How do you make interpretations about the emotional states of the persons you
observe? If you say the child was angry/pleased/contented/anxious, then you are making
inferences which are of a different order than those based on sex and age, or on role
relationships. We say they are of a different level because they are much less closely tied
to what you actually observe, and there is more room for alternative explanations.
Suppose, for instance, you said that the child was happy because he or she was smiling.
What do you regard as the relative merits of participant and non-participant observation?
Reliability
Reliability is based on two assumptions. The first is that the study can be repeated. Other
researchers must be able to replicate the steps of the original research, employing the same
categories of the study, the same procedures, the same criteria of correctness and the same
perspectives. But because ethnographic research occurs in natural settings and often is
undertaken to record processes of change, it is especially vulnerable to replication
difficulties. A study of a racial incident at an urban secondary school, for example, cannot
be replicated exactly because the event cannot be reproduced. Problems of uniqueness and
idiosyncrasy can lead to the claim that no ethnographic study can be assessed for reliability.
The second assumption is that two or more people can have similar interpretations by
using these categories and procedures. However, in ethnographic research, it is difficult for
an ethnographer to replicate the findings of another, because the flow of information is
dependent on the social role held within the group studied and the knowledge deemed
appropriate for incumbents of that role to possess. Thus, conclusions reached by
ethnographers are qualified by the social roles which investigators hold within the research
site. Other researchers will fail to obtain comparable findings unless they develop
corresponding social positions or have research partners who can do so.
Crucial also to reliability is inter-rater or inter-observer reliability, or the extent to which
the sets of meanings held by multiple observers are sufficiently congruent that they describe
and arrive at inferences about phenomena in the same way.
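A common index of such congruence is Cohen’s kappa, which corrects raw percentage
agreement for the agreement expected by chance. The sketch below is a minimal
illustration; the category labels and the two observers’ codings are invented.

from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers' category codes."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Two observers coding the same ten episodes (invented data).
a = ["play", "talk", "play", "fight", "talk", "play", "talk", "play", "fight", "talk"]
b = ["play", "talk", "play", "talk", "talk", "play", "talk", "play", "fight", "play"]
print(f"kappa = {cohen_kappa(a, b):.2f}")   # 0.68 here; 1.0 is perfect agreement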
This is a key concern to most ethnographers. Of necessity, a given research site may
admit only one or a few observers. Without the corroboration of other observers, such
investigations may be seen as idiosyncratic, lacking a careful and systematic recording of
phenomena.
Validity
Establishing validity necessitates demonstration that the propositions generated, refined or
tested, match the causal conditions that exist in human life. The issues involved in matching
scientific explanations of the world with its actual conditions resolve into two questions.
First, do scientific researchers actually observe or measure what they think they are
observing and measuring? This is the problem of internal validity. Solving it credibly is
considered to be a fundamental requirement for any research design.
Second, to what extent are the abstract constructs and postulates generated, refined or tested
by scientific researchers applicable across groups? This addresses the issue of external validity.
The claim of ethnography to high internal validity derives from the data collection and
analysis techniques used by ethnographers. First, the ethnographer’s common practice of
living among participants and collecting data for long periods provides opportunities for
continual data analysis and comparison to refine constructs and to ensure the match between
scientific categories and participant reality. Second, informant interviews, a major
ethnographic data source, are necessarily phrased close to the empirical categories of
participants, and so are less abstract than many instruments used in other research designs.
Third, participant observation—the ethnographer’s second key source of data—is con-
ducted in natural settings that reflect the reality of the life experiences of participants more
accurately than do more contrived or laboratory settings. Finally, ethnographic analysis
incorporates a process of researcher self-monitoring that exposes all phases of the research
to continual questioning and re-evaluation.
Triangulation
A commonly used technique to improve internal validity is triangulation.
Triangulation may be defined as the use of two or more methods of data collection in the
study of some aspect of human behaviour. In its original and literal sense, triangulation is
a technique of physical measurement: maritime navigators, military strategists and
surveyors, for example, use (or used to use) several locational markers in their endeavours
to pinpoint a single spot. By analogy, triangular techniques in the social sciences attempt
to map out, or explain more fully, the richness and complexity of human behaviour by
studying it from more than one standpoint and/or using a variety of methods, even
combining qualitative and quantitative methods in some cases.
Exclusive reliance on one method may bias or distort the researcher’s picture of the
particular slice of reality being investigated. The researcher needs to be confident that the
data generated are not simply artefacts of one specific method of collection. Where triangu-
lation is used in interpretive research to investigate different actors’ viewpoints, the same
method—for example, observation—will naturally produce different sets of data. Further,
the more the methods contrast with each other, the greater the researcher’s confidence. If,
for example, the outcomes of a questionnaire survey correspond to those of an observational
study of the same phenomena, the researcher will be more confident about the findings.
A more complex triangulation in a classroom ethnography study may involve teachers’
ratings of pupils, school records, psychometric data, sociometric data, case studies,
questionnaires and observation. Triangulation prevents the investigator from accepting
too readily the validity of initial impressions.
Triangulation contributes to verification and validation of qualitative analysis by:
* checking out the consistency of findings generated by different data-collection
methods; and
* checking out the consistency of different data sources within the same method.
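As a simple illustration of the first kind of check, the hypothetical sketch below
compares, case by case, the classifications produced by two data-collection methods and
lists where they diverge (all names and categories are invented):

# Each method's classification of the same pupils (invented findings).
from_questionnaire = {"Kim": "engaged", "Pat": "disengaged", "Leon": "engaged"}
from_observation = {"Kim": "engaged", "Pat": "engaged", "Leon": "engaged"}

def divergences(method_1, method_2):
    """Return the cases on which the two methods disagree."""
    return {case: (method_1[case], method_2[case])
            for case in method_1 if method_1[case] != method_2[case]}

print(divergences(from_questionnaire, from_observation))
# -> {'Pat': ('disengaged', 'engaged')}

A divergence of this kind is a prompt for further inquiry, not grounds for automatically
rejecting either method’s data.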
External validity
External validity depends on the identification and description of those characteristics
of phenomena salient for comparison with other similar types. Once the typicality or
atypicality of a phenomenon is established, bases for comparison then may be assumed, and
results may be translated for applicability across sites and disciplines.
A study is of little use to other researchers if its theoretical basis, or the constructs around
which it is organised, are so idiosyncratic that they are understood only by the person who
executed the study. The lack of comparability and translatability reduces the usefulness of
a study to interesting but unscientific reading. Ethnographic studies are generally case
studies from a single setting and it is difficult to translate them to other similar settings.
Ethnography is the study of people in their context. It aims to understand behaviour from the
perspective of the participants and to capture social reality through fieldwork in natural settings.
Ethnographic research is cyclic, as incoming data raise more questions and generate fresh hypotheses.
Observation and interviewing are the main data gathering techniques. Ethnographic research
cannot employ the conventional judgements of reliability and validity. Replication is impossible
given the subjective and once-only nature of the data. Generalisation is not feasible as statistical
sampling is not involved. Triangulation is the major way in which validity can be assessed in
ethnographic studies.
References
Argyle, M. (1969), Social Interaction, Methuen, London.
Bales, R.F. (1950), Interaction Process Analysis, Addison-Wesley, Reading, Massachusetts.
Becker, H.S., Geer, B., Hughes, E.C. & Strauss, A.L. (1961), Boys in White: Student Culture in Medical
School, University of Chicago Press, Chicago.
Ekman, P. (1982), Emotion in the Human Face, Cambridge University Press, New York.
Flanders, N. (1970), Analysing Teaching Behaviour, Addison-Wesley, Reading, Massachusetts.
Glaser, B.G. & Strauss, A. (1967), The Discovery of Grounded Theory, Aldine, Chicago.
Hamilton, D. (1976), Curriculum Evaluation, Open Books, London.
Hargreaves, D. (1967), Social Relations in a Secondary School, Routledge and Kegan Paul, London.
Further reading
Adelman, C. (1981), Uttering, Muttering: Collecting, Using and Reporting Talk for Social and Educational
Research, Grant McIntyre, London.
Bailey, K. (1989), Methods of Social Research, Free Press, New York.
Blaikie, N. (1988), Triangulation in Social Research: Origins, Use and Problem, Paper presented at the
Conference of the Sociological Association of Australia and New Zealand, Canberra.
Bliss, J., Monk, M. & Ogborn, J. (1983), Qualitative Data Analysis for Educational Research, Routledge,
London.
Burgess, R. (ed.) (1985), Field Methods in the Study of Education, Falmer Press, London.
Cobb, A. & Hagemaster, J. (1987), ‘Ten criteria for evaluating qualitative research proposals’, Journal of
Nursing Education, 26, pp. 138-43.
Dillon, D.R. (1989), ‘Showing them that I want to learn and that I care about who they are: A micro-
ethnography of an English-reading classroom’, American Educational Research Journal, 26, pp. 227-59.
Fetterman, D. (1998), Ethnography: Step by Step, Sage, London.
Glesne, C. & Peshkin, A. (1992), Becoming Qualitative Researchers, Longman, New York.
Goetz, J.P. & Le Compte, M.D. (1984), Ethnography and Qualitative Design in Educational Research,
Academic Press, Orlando.
Grills, S. (ed.) (1998), Doing Ethnographic Research, Sage, London.
Hammersley, M. (1991), What’s Wrong with Ethnography?, Routledge, London.
Henwood, K. & Pidgeon, N. (1995), ‘Grounded theory and psychological research’, The Psychologist,
March, pp. 115-18.
Jorgensen, D. (1989), Participant Observation: A Methodology for Human Studies, Sage, Newbury Park.
Kirk, J. & Miller, M. (1986), Reliability and Validity in Qualitative Research, Sage, Beverly Hills.
LeCompte, M.D. & Preissle, J. (1993), Ethnography and Qualitative Design in Educational Research,
Academic Press, New York.
Lincoln, Y.S. & Guba, E.G. (1985), Naturalistic Inquiry, Sage, Beverly Hills.
Miles, M.B. & Huberman, A. (1984), Qualitative Data Analysis: A Sourcebook of New Methods, Sage,
Beverly Hills.
Potter, J. & Wetherell, M. (1987), Discourse and Social Psychology, Sage, London.
Rossman, G.B. & Wilson, B.L. (1985), ‘Numbers and words: Combining quantitative and qualitative
methods in a single large-scale evaluation study’, Evaluation Review, 9(5), pp. 627-43.
Schensul, J. & LeCompte, M. (eds) (1999), The Ethnographer’s Toolkit, 7 vols, Sage, London.
Stringer, E., Agnello, M.F. & Conant-Baldwin, S. (eds) (1997), Community Based Ethnography, Lawrence
Erlbaum, New York.
Wolcott, H. (1986), Inside Schools: Ethnography in Educational Research, Routledge & Kegan Paul, New
York.
Wolcott, H. (1990), Writing up Qualitative Research, Sage, Newbury Park.
Semi-structured interviewing
This has been used either as part of a structured interview or an unstructured interview,
as investigators from both persuasions feel that this may help their study. Rather than
having a specific interview schedule or none at all, an interview guide may be developed
for some parts of the study in which, without fixed wording or fixed ordering of
questions, a direction is given to the interview so that the content focuses on the crucial
issues of the study. This permits greater flexibility than the closed-ended type and elicits
a more valid response, grounded in the informant’s perception of reality. However, the
comparability of the information between informants is difficult to assess and response
coding difficulties will arise.
Open-ended interviewing is the making public of private interpretations of reality.
According to Taylor and Bogdan, open-ended or in-depth interviews are:
. . . repeated face-to-face encounters between the researcher and informants directed towards
understanding informants’ perspectives on their lives, experiences or situations as expressed
in their own words (1984, p. 77).
The characteristic features of this approach are that:
* with the contacts being repeated, there is a greater length of time spent with the
informant, which increases rapport;
* the informant’s perspective is provided rather than the perspective of the researcher
being imposed;
* the informant uses language natural to them rather than trying to understand and fit
into the concepts of the study;
* the informant has equal status to the researcher in the dialogue rather than being a
guinea pig.
The rationale behind open-ended interviewing is that the only person who
understands the social reality in which they live is the person themselves. No structure
imposed by the interviewer will encapsulate all the subtleties and personal interpretations.
At the end of the academic year a student reports sick to the university’s health service
complaining of headaches, fatigue, stomach upset. The doctor imposes an often-used
structure on this and informs the student that they have a virus infection which will
clear up in a few days’ time. If the doctor had used a more open-ended approach and
encouraged the student to talk, the dialogue would reveal worry over exam failure, loss
of self-esteem, difficulty in facing parents and friends with the results, and indecision
over whether or not to drop out. The apparent lack of structure in the
dialogue will provide a window into the routinely constructed interpretations and
habitual responses of each individual. Open-ended interviewing depends on verbal
accounts.
Explain when you would use open-ended interviewing rather than structured
interviewing.
Questioning techniques
The techniques which counsellors—particularly non-directive counsellors—use in their
counselling sessions are equally valuable to open-ended non-directive interviewers.
In non-directive counselling the counsellor makes considerable use of parroting
(mirroring) and minimal encouragers to keep the informant conversing. These must be
used effectively too by the open-ended interviewer. ‘Parroting’ or ‘mirroring’ is repeating
back to the informant the last few words they said, or the gist of what they said, e.g. ‘You
were late’. When the mirroring involves feeling, it is often termed ‘reflecting’, e.g. ‘You
feel unhappy with the support you get from your principal’. Accurate mirroring shows
the informant that you are listening and understanding, encouraging them to continue.
Minimal encouragers are single words or short phrases that encourage or reinforce the
informant, e.g. ‘I see’, ‘Go on’, ‘Can you tell me more?’, ‘Yes’, ‘Hmm’, ‘What happened
next?’. Parroting and minimal encouragers combined with such non-verbal
communication techniques as eye contact and head nods will ensure that the informant
continues to speak in what they perceive as a warm, accepting interpersonal context.
Devise some minimal encouragers you could use as part of your interviewing technique.
Listening skills
These are better thought of as attending skills and involve the same qualities and skills
as those required by good counsellors. Only by displaying empathy and acceptance,
conveying respect and creating an ethos of trust will the interviewee be able to enter
into a valid relationship with you, in which they are willing to convey their real feelings,
thoughts and emotions. In attending to an interviewee you must be an active listener,
look interested, be sensitive to verbal and non-verbal cues, using, as it were, a third ear.
Remember, the interviewee is noting your verbal and non-verbal signals and building up
a picture of how open, genuine, interested and encouraging you are.
Words are not the whole message. Listening is not simply hearing, for we can hear
without listening. You must attend to the content of the words and the feeling behind
them. You can ask for clarification, you can summarise and check out inconsistencies so
you are sure you are picking up the message accurately. Never jump to conclusions, and
avoid personal prejudices and blocks that hinder understanding of what the other person
is really saying. Take time to listen and give time for the other person to finish.
Non-verbal communication
We communicate with our whole bodies, not just with our tongues. Actions, gestures,
facial expressions, body movements and body positions—all speak louder than words.
Interviewees often don’t realise they are communicating non-verbally, confirming,
emphasising, or even contradicting the verbal message. Learn about some of the more
significant non-verbal cues and how these vary from culture to culture. Non-verbal
signals are an important part of the interview data.
A pilot study
A pilot study can test many aspects of your proposed study. It does so under circumstances
that do not count, so that when they do count you have more faith in what you are doing.
A cover story
A cover story is the initial verbal or written presentation of yourself to the gatekeeper and
others who will be involved in your research. It does more than simply say who you are
and what your study is about; it also prepares others to take part more effectively. It must
cover what you will do with the results, their confidentiality, how you will record the
data, how long a session will last, how many sessions, reassurance that there are no right
answers, and that your role is not judgemental or evaluative, but understanding.
Solicited narrative
Here the researcher obtains a written account of the story and uses this as a source of
discussion points in follow-up sessions. This technique is often used in life history and
diary interview methods.
Fieldnotes
Many ethnographic and open-ended interview research projects can generate over 1000 pages
of fieldnotes which need to be analysed. Fieldnotes usually cannot be coded into numerical
data and usually are transcribed, category-coded and filed. The purpose of the coding and
filing is to enable the investigator to sort and organise the obtained information into patterns
and themes.
Fieldnote data will include not only records of conversations but also details of the setting
and the investigator’s impressions and observations. The fieldnotes will additionally include
the investigator’s reflections on the conversation and setting. Many interviewers like to keep
the descriptive content and the reflexive parts separate, as they serve different purposes.
Fieldnotes should be written up as soon as possible, and note-taking must be
considered compulsory. Like all other types of research, it involves hard work, time and
discipline.
During the first days of a research project the investigator will take down everything;
as the project becomes refined and focused, the notes will be more selective. Notes
should concentrate on answering who/what/where/when/how/why questions.
The fieldnotes really separate into three files. The transcript
file contains the records
of the interviews (see Figure 24.1). The transcript file should have a large margin on the
right in which comments can be placed, and on the left, numbers alongside, to locate
the conversation on the tape.
The personal file holds the reflections of the interviewer and a description of the
setting. All your thoughts and impressions should be included in a frank way. It should
read as you think—forget the grammar. The personal file should also contain full details
of how you gained permission, how you maintained relations, and how you left the field
and the success of these strategies. Comments on methodological problems are relevant
in the personal file.
The third file is the analytic file which identifies and discusses the conceptual issues
and emergent themes. It is usually organised around topic areas and is the basis for the
analysis of the data.
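Where the files are kept electronically, the same three-way division can be modelled
directly. The sketch below is a minimal illustration only; the class names and fields are
assumptions, not a prescribed format.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TranscriptEntry:
    tape_location: int        # counter number locating the passage on the tape
    speech: str               # verbatim record of the conversation
    margin_comment: str = ""  # comment placed in the right-hand margin

@dataclass
class Fieldnotes:
    transcript: List[TranscriptEntry] = field(default_factory=list)
    personal: List[str] = field(default_factory=list)      # reflections, setting, access
    analytic: Dict[str, List[str]] = field(default_factory=dict)  # theme -> notes

notes = Fieldnotes()
notes.transcript.append(TranscriptEntry(132, "We never get asked about homework.",
                                        "resentment of one-way reporting?"))
notes.personal.append("Interview held in a noisy staffroom; teacher watched the clock.")
notes.analytic.setdefault("home-school communication", []).append(
    "Several informants mention a one-way flow of information.")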
Coding
The first stage in analysing the interview data is coding, i.e. classifying material into
themes, issues, topics, concepts, propositions. Coding cannot be done overnight. Many
interviewers re-read their notes many times before they can begin to grasp the major
themes. Some of this coding may begin while the data is still being collected, as particular
issues are raised consistently across interviews. This early coding assists the interviewer
to focus on essential features of the project as they develop.
This is part and parcel of the analytic induction method where the general statement
about the topic is constantly refined, expanded and modified as further data are obtained.
For example, you may be studying why some tertiary students apply to live in university
halls/colleges. After interviewing the first six students you develop some tentative
propositions about why they make this choice. In subsequent interviews you will tend
to refine your questions along the line of these propositions. Further propositions will
emerge as more interviews are conducted. Later interviews will test the validity of the
propositions.
Woods (1976) used open-ended interviews to explore pupils’ experiences of school
and found that ‘having a laugh’ was a recurrent theme. Woods then set about classifying
types of laughter in the classroom and their functions.
Content analysis
Content analysis is used to identify themes, concepts and meaning. It is a form of classifying
content. These elements can be counted in numerical terms as well as examined for meaning.
But when looking at the latter there is the problem of hidden meaning, of reading between
the lines, and we can never be sure that our reading between the lines is what the informant
meant. Each interview is analysed for themes/topics. As the research focus becomes sharper,
the list of themes is progressively refined and narrowed.
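Counting the coded elements is straightforward once each interview has been reduced to
a list of theme codes. A minimal sketch with invented codes:

from collections import Counter

# Invented theme codes assigned to segments across several interviews.
coded_segments = ["laugh", "boredom", "laugh", "teacher-baiting",
                  "laugh", "boredom", "laugh"]

for theme, count in Counter(coded_segments).most_common():
    print(f"{theme}: {count}")

Frequencies of this kind show how often a theme recurs; what the theme means to
informants still requires qualitative interpretation.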
Stages in coding
1 The first stage in coding is to develop a list of coding categories. Then a short name
is assigned to each and a number to each subcategory. For example, classroom
activities (CA in short) may be a category, while teaching (CA1), marking (CA2) and
administration (CA3) are sub-categories.
2 In the margin of the transcript file, the data can now be coded by the appropriate
code, e.g. CA2, as the file is read. The code may refer to a phrase, a sentence or a
paragraph. On occasions, there may be a double reference in the verbal unit and in
this case it is double-coded.
3 After codes have been allocated to the text in the transcript file, data coded to each
category needs to be collected together. Here you can use either index cards on which
you paste cut-up sections of the text, or place the cuttings into manila folders. The
former method is the best. You must photocopy your transcript file or ensure it is held
on a PC so that you do not lose your only copy when you cut and paste. (An electronic
equivalent of this cut-and-paste filing is sketched below.)
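The sketch below is a minimal electronic equivalent of this cut-and-paste filing, assuming
simple margin codes of the kind described above (the transcript segments and codes are
invented):

from collections import defaultdict

# Each transcript segment carries the code assigned in the margin.
coded_transcript = [
    ("CA1", "I spend the first ten minutes revising yesterday's work."),
    ("CA2", "Marking takes up most of my evenings."),
    ("CA1", "Group work only happens on Fridays."),
    ("CA3", "The attendance returns have to be in by nine."),
]

# The 'manila folders': one list of cuttings per category code.
folders = defaultdict(list)
for code, segment in coded_transcript:
    folders[code].append(segment)

for code in sorted(folders):
    print(code, folders[code])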
ATLAS/ti
ATLAS/ti is extremely useful for text interpretation, text management and theory
building. It could be the preferred program if a researcher wishes to construct linkages
between any elements of the qualitative database—for example, text segments and
memos. This is a very flexible tool for constructing any kind of network.
The design of the user interface is such that most of the analysis is conducted on-screen.
Consequently, a wide variety of functions to support this style of working is
offered. The program is especially useful for research groups whose members want to do
coding, memoing and theory building independently, but want to share their results.
MAX
This program was originally developed to support the analysis of open-ended questions
in survey questionnaires, where the method of case-oriented quantification is employed.
Since then, additional features have been added, for example, for the retrieval of
co-occurring text segments. All files used and created by the program that contain code or
case information are stored in standard dBase format. This offers the competent user
extensive possibilities for using other software tools in order to modify these files, or to
subject them to a different type of analysis from that offered by the program.
All files (texts, codes, numerical and other case-oriented data) are saved in dBase
format and can therefore easily be exported to statistical programs such as SPSS and
SAS and re-imported to MAX after modification. SPSS files can also be directly created
for exporting numerical data. Documents can be divided into paragraphs, permitting a
structuring of texts such as open-ended questions. Code and word frequencies can be
calculated.
Code-a-Text
Dr A. Cartwright, Centre for the Study of Psychotherapy, University of Kent,
Canterbury, UK
Originally designed to aid analysis of therapeutic conversations, it has now been applied
to other texts such as fieldnotes, and responses to open-ended questions.
QSR NUD*IST
Lyn Richards and Tom Richards. Qualitative Solutions & Research Pty. Ltd., Box 171,
La Trobe University Post Office, Victoria 3083, Australia
Distributed by Sage Publications Ltd., 6 Bonhill Street, London EC2A 4PU, UK
NUD*IST is a program for facilitating theory building by searching for words and
phrases and coding data. From the coding it will search for links among the codes and
build a hierarchical network of code patterns, categories and relationships in the original
data. It will code data in more than one way to provide multiple perspectives and enable
changes in codes to be effected as a deeper understanding of the data emerges. This
makes it a very useful tool in strategies of hypothesis development and grounded theory.
The user is invited (but not forced) to develop a hierarchical code structure that can
be represented graphically and can be used for multiple types of retrievals. Among the
most powerful retrieval functions are COLLECT, which allows for retrieving all
segments or memos attached to a code and all of its subcodes, and INHERIT, which
QUALPRO
QUALPRO was originally a collection of routines for ordinary coding and retrieval that
could be executed via DOS and by using a simple command shell. This collection has
now been extended by the addition of functions for co-occurring code searches and
matrix displays. Algorithms for the calculation of intercoder reliability and for computing
matrices displaying agreement and disagreement between coders, are unique features of
this program. This information can be used to improve the code definitions and
procedures, and hence the precision of coding. The program is particularly useful for
research groups concerned with the robustness of the coding scheme.
Text can be entered into the program directly or imported as an ASCII file. The
smallest coding unit is the text line. Up to 1000 codes can be attached to one document.
Selective retrievals of text segments are supported and memos can be recorded. Memos
can be linked to whole documents and text segments.
In every ordinary retrieval, the program can retrieve, together with the text, the line
numbers of overlapping segments coded with another code. All Boolean
operators can be applied in a search for text segments so that nested and overlapping text
segments can be retrieved, and also text segments to which a certain combination of
codes does or does not apply. Code frequencies can be calculated. The intercoder
reliability can be determined too.
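The kind of co-occurring code search described here can be illustrated generically. The
sketch below is not the program’s own command language, simply an illustration of the
underlying idea: segments are line ranges tagged with codes, and a Boolean AND
retrieves the lines that carry both codes (the codes and line ranges are invented).

# Invented coded segments: (code, first_line, last_line) within a document.
segments = [("discipline", 10, 24), ("humour", 18, 30), ("humour", 40, 44)]

def overlapping_lines(code_a, code_b, segments):
    """Line numbers carrying both codes (a Boolean AND over segments)."""
    def lines(code):
        return {n for c, lo, hi in segments if c == code
                for n in range(lo, hi + 1)}
    return sorted(lines(code_a) & lines(code_b))

print(overlapping_lines("discipline", "humour", segments))
# -> [18, 19, 20, 21, 22, 23, 24]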
The ETHNOGRAPH v.4.0
John Seidel, Susanne Friese, D. Christopher Leonard
Qualis Research Associates, PO Box 2070, Amherst, MA 01004, USA
This was one of the earliest and most widely distributed programs in the field. The
strength of the program is its functions to assist researchers working in the tradition of
ethnography and interpretive sociology who are more concerned with the interpretive
analysis of texts than with theory building and hypothesis examination.
The software facilitates the management and analysis of text-based data, such as
transcripts of interviews, focus groups, fieldnotes, diaries, meetings and other documents.
Case summaries
Many researchers use case summaries as a means of analysing their data. For example, it
is useful to employ case studies of students to illustrate and support arguments in
investigating student behaviour or school processes. However, the basis on which case
studies are chosen is often obscure and may simply reflect a bias towards, or desire to
support, a particular theoretical rationale. Again, the use of index cards is advised. Each
card can contain information on the informant, including their categorisation on a
number of concept codes, a summary of the interview and analytic comments taken
from the analytic file. These case summaries not only clarify and sort information from
within a case but permit comparison across cases too.
Keep a record of all interviews with parents, their length, reason, content,
conclusion/result, etc.
Unstructured and semi-structured interviewing are the major tools of qualitative research. Their
advantage is that the informant’s perspectives are provided in language natural to them. This limits
the effect of the researcher’s preconceptions, biases and beliefs in directing the line of interviewing.
The interviewer requires listening skills and non-directive questioning techniques. Interview data requires
coding so that a content analysis can be used to identify themes, concepts and categories.
References
Burgess, R. (1984), In the Field, Unwin, London.
Glaser, B.G. & Strauss, A.L. (1967), The Discovery of Grounded Theory, Aldine, Chicago.
Miles, M. & Huberman, M. (1984), Qualitative Data Analysis, Sage, Beverly Hills.
Taylor, S. & Bogdan, R. (1984), Introduction to Qualitative Research Methods, Wiley, New York.
Woods, P. (1976), ‘Having a Laugh’, in The Process of Schooling, eds M. Hammersley & P. Woods,
Routledge and Kegan Paul, London.
Further reading
Berg, B. (1989), Qualitative Research Methods
for the Social Sciences, Allyn and Bacon, Boston.
Boyatzis, R. (1998), Transforming Qualitative Information: Thematic Analysis and Code Development,
Sage, London.
Brooks, M. (1989), Instant Rapport, Warner Books, New York.
Cohen, A.K. (1955), Delinquent Boys, Free Press, New York.
Foddy, W.H. (1988), Open Versus Closed Questions: Really a Problem of Communication. Paper
presented to the Australian Bicentennial Meeting of Social Psychologists. Leura, New South Wales,
August.
Gerson, E. (1985), ‘Computing in qualitative sociology’, Qualitative Sociology, 7, pp. 194-8.
Gahan, C. & Hannibal, M. (1998), Doing Qualitative Research Using QSR NUD*IST, Sage, London.
Green, A. (1995), ‘Verbal protocol analysis’, The Psychologist, March, pp. 126-9.
Hargreaves, D.H., Hester, S.K. & Mellor, F.J. (1975), Deviance in Classrooms, Routledge and Kegan
Paul, London.
Kelle, U. (ed.) (1995), Computer Aided Qualitative Data Analysis, Sage, London.
Lacey, C. (1970), Hightown Grammar, Manchester University Press, Manchester.
Minichiello, V. et al. (1990), In Depth Interviewing, Longman Cheshire, Melbourne.
Richards, L. & Richards, T. (1987), ‘Qualitative data analysis: Can computers do it?’, Australian and
New Zealand Journal of Sociology, 23, pp. 23-35.
What is action-research?
Action-research is the application of fact-finding to practical problem-solving in a social
situation with a view to improving the quality of action within it, involving the
collaboration and cooperation of researchers, practitioners and laymen.
Kemmis and Grundy (1981) define action-research in education as:
A family of activities in curriculum development, professional development, school
improvement programmes, and systems planning and policy development. These activities
have in common the identification of strategies of planned action which are implemented, and
then systematically submitted to observation, reflection and change. Participants in the action
being considered are integrally involved in all of these activities.
It aims to improve practical judgement in concrete situations, and the validity of the
‘theories’ it generates depends not so much on ‘scientific’ tests of truth, as on their
usefulness in helping people to function more intelligently and skilfully. In action-
research, ‘theories’ are not validated independently and then applied to practice. They
are validated through practice. Action-research is a total process in which a ‘problem
situation’ is diagnosed, remedial action planned and implemented, and its effects monitored,
if improvements are to get underway. It is both an approach to problem-solving and a
problem-solving process. The development of action-research philosophy and method
has had a strong Australian input through the work of Kemmis and his colleagues at
Deakin University.
The focus in action-research is on a specific problem in a defined context, and not on
obtaining scientific knowledge that can be generalised. An on-the-spot procedure
designed to deal with a concrete problem, it is a logical extension of the child-centred
progressive approach. If children can benefit intellectually, socially and emotionally from
working together to solve problems in group activities, so too, it is argued, can their
teachers.
There are four basic characteristics of action-research:
1 Action-research is situational: it diagnoses a problem in a specific context and
attempts to solve it in that context.
2 It is collaborative, with teams of researchers and practitioners working together.
3 It is participatory, as team members take part directly in implementing the research.
4 It is self-evaluative: modifications are continuously evaluated within the ongoing
situation to improve practice.
Kurt Lewin, who originated action-research in the 1940s, challenged an orthodoxy about
the role of the social scientist as the disinterested, ‘objective’ observer of human affairs.
Lewin’s model is an excellent basis for starting to think about what action-research
involves. Two major stages can be identified:
1 diagnostic, in which problems are analysed and hypotheses developed; and
2 therapeutic, in which hypotheses are tested by a consciously directed change
experiment in a real social life situation.
Action-research usually commences with observations in the real world that raise such
questions as ‘Why don’t my everyday experiences in the classroom fit with theory?’,
‘Why hasn’t practice led to predicted results?’. To cope with these and other similar
situations, we tend to formulate our own intuitive implicit theories. These are really the
start of the qualitative research process. Personal theory helps to bring a problem into
view and leads into a more systematic approach to investigating the issue. For example,
the intriguing real-world observation that boys in my class try to avoid obtaining verbal
reinforcement from me by not answering questions leads to a personal theory that
boys don’t want to counter peer group norms. This leads into a tentative guiding
hypothesis for research purposes: that the expectations of the boys’
subculture are stronger than teacher expectations in relation to aspects of classroom
behaviour.
[Figure: the cyclic action-research model, showing the diagnostic (planning) phase
leading into the therapeutic phase of action and evaluation.]
This cyclic model of problem identification, therapeutic action and evaluation can be
divided into seven substages.
Stage 1
This involves the identification, evaluation and formulation of the problem or general
idea perceived as critical in an everyday teaching situation. ‘Problem’ should be
interpreted as widely as possible, so that issues and ideas are not prematurely constrained
into too narrow a focus. Some examples of problems or general ideas are listed here.
* Students are dissatisfied with the methods by which they are assessed. How can we
collaborate to improve student assessment?
* Students seem to waste a lot of time in class. How can I increase the time students
spend ‘on-task’?
* Parents are fairly keen to help the school with the supervision of students’ homework.
How can we make their help more productive?
In other words, the ‘general idea’ refers to a state of affairs or situation a participant
wishes to change or improve on. The original problem may, in fact, change and be
revised in the cyclic process. For example, pupils’ dissatisfaction with the way they are
assessed may merely be a symptom of a more fundamental problem, which may ‘come
to light' during the course of action-research; for example, the real purposes of assessment. In that case, a teacher would want to undertake subsequent actions which tackle that deeper problem, rather than merely treating the symptom. Goals are variables,
not constants. They change over time as a result of the development project itself.
Stage 2
This is the time for fact finding, so that a full description can be given of the situation.
For example, if the problem is ‘pupils wasting time in class’, one will want to know
things such as: Which pupils are wasting time? What are they doing when they are
wasting time? Are they wasting time doing similar or different things? What should they
be doing when they are wasting time? What are they doing when they are not wasting
time? Is there a particular point in the lesson, or time of day, or set of topics, where
pupils waste time the most? What are the different forms in which ‘wasting time’
manifests itself?
All these facts help to clarify the nature of the problem. The collection of this
information can provide a basis for classifying the relevant facts; for example, generating
categories for classifying the different kinds of ‘time-wasting’ which go on.
It can also lead to some fairly radical changes in one’s understanding of the original
idea. For example, one may decide, in the light of this exercise, that many of the things
thought to be ‘time-wasting’ are not, and that many of the things thought not to be
‘time-wasting’ in fact are.
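For readers who keep their fact-finding records electronically, a minimal Python sketch (the pupils, lesson phases and behaviour labels below are invented purely for illustration) shows how such records might be tallied as a first step towards generating classification categories:

    from collections import Counter

    # Invented fact-finding records: (pupil, lesson phase, observed behaviour)
    observations = [
        ("Kim", "start", "chatting"),
        ("Lee", "middle", "out of seat"),
        ("Kim", "middle", "chatting"),
        ("Sam", "end", "daydreaming"),
    ]

    # Tally the kinds of 'time-wasting' and where in the lesson they occur.
    by_behaviour = Counter(b for _, _, b in observations)
    by_phase = Counter(p for _, p, _ in observations)
    print(by_behaviour.most_common())
    print(by_phase.most_common())

Such counts are only an aid to category building; the categories themselves still have to be argued for and revised against the data, as described above.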
Stage 3
This may involve a review of the research literature to find out what can be learned from
comparable studies, their objectives, procedures and problems encountered. All this is
related and synthesised with the critical review of the problem in Stage 2. Hypotheses
can now be formulated; for example, utterances such as ‘fine’, ‘interesting’, ‘correct’, by
teachers in response to ideas expressed by pupils can prevent the discussion of alternative
ideas, since pupils tend to interpret such feedback as attempts to legitimate the
development of some ideas rather than others. These hypotheses are general statements
which attempt to explain some of the facts of the problem.
Stage 4
Having, through ‘brainstorming’ around a problem, generated some hypotheses,
one can then proceed to gather information which is relevant to testing them. For
example, evidence can be gathered about the extent to which one uses terms such
as good, interesting, right; their effects on pupils’ classroom responses; and the ways
pupils interpret their use. The gathering of this evidence may also suggest further
explanations of the problem situation, which in turn leads to more gathering of
information, etc. This ‘testing’ of the hypothesis is not a statistical testing. It is
seeing whether the evidence is congruent with the hypothesis. Even when one has
tested hypotheses and found them to apply, they should retain the status of
'hypotheses' rather than 'conclusions', since one can always encounter instances which do not fit.
Stage 5
Before going into action, there is the need to decide on the selection of research procedures
such as choice of materials, resources, teaching method, allocation of tasks. Equally important
are discussion and negotiations among the interested parties—teachers, researchers, advisers,
sponsors. A teacher may need to negotiate some of the proposed actions with colleagues, or
a 'superior'. Their capacity to do their job properly could be influenced by the effects of the
proposed changes, or perhaps they will ‘carry the can for them’, or even intervene
unconstructively if not consulted. For example, a proposed change of syllabus might need to
be negotiated with the relevant head of department, departmental colleagues, the head
teacher, or even pupils and their parents.
As a general principle, the initial action-steps proposed should lie within areas where
the action-researchers have the maximum freedom of decision.
Stage 6
This stage involves the implementation of the action plan. Decisions must be made about the conditions and methods of data collection (for example, bi-weekly meetings, the keeping of records, interim reports, final reports, the submission of self-evaluation and group-evaluation reports, etc.); the monitoring of tasks and the transmission of feedback to the research team; and the classification and analysis of data.
Even if the action step is implemented with relative ease, it may create troublesome
side-effects which require a shift into fact finding in order to understand how these arise.
And this in turn may require some modifications and changes to the ‘general plan’ and
a revamped action-step.
The choice of evaluative procedures is considered here too, in order to monitor the
implementation:
• One needs to use monitoring techniques which provide evidence of how well the course of action is being implemented.
• One needs to use techniques which provide evidence of unintended as well as intended effects.
• One needs to use a range of techniques which will enable one to look at what is going on from a variety of angles or points of view (triangulation).
Stage 7
This final stage involves the interpretation of the data and the overall evaluation of the
project, often by writing a case study. Ideally, case study reports should be written at the
end of each cycle, each building on and developing previous reports. At least one full
report should be written at the point where one decides to end a particular spiral of
action and research, and switch to a quite different issue or problem.
A case study report should adopt a historical format; telling the story as it has unfolded
over time, showing how events hang together. It should include (but not necessarily in
separate sections) accounts of the following:
• How one's 'general idea' evolved over time.
• How one's understanding of the problem situation evolved over time.
• What action-steps were undertaken in the light of one's changing understanding of the situation.
• The extent to which proposed actions were implemented, and how one coped with the implementation problems.
• The intended and unintended effects of one's actions, and explanations for why they occurred.
• The techniques one selected to gather information about:
  — the problem situation and its causes; and
  — the actions one undertook and their effects.
• The problems one encountered in using certain techniques and how one 'resolved' them.
• Any ethical problems which arose in negotiating access to, and release of, information, and how one tried to resolve them.
• Any problems which arose in negotiating action-steps with others, or in negotiating the time, resources and cooperation one wanted during the course of the action-research.
The mode of explanation in case study is naturalistic rather than formalistic.
Relationships are ‘illuminated’ by concrete description rather than by formal statements
of causal laws and statistical correlations. As action-research necessarily involves participants in self-reflection about their situation—as active partners in the research—accounts of dialogue with participants about the interpretations and explanations emerging from the research should be an integral part of any action-research report.
Discussions of the findings will take place in the light of previously agreed evaluative
criteria. Errors, mistakes and problems will be considered. At this stage, the cycle is likely
to begin again, with the problem and action modified to meet the evaluation comments.
At the end of several cycles, outcomes of the project are reviewed, recommendations
made and arrangements for dissemination of results to interested parties decided.
Timing
With respect to classroom action-research, a teacher should decide exactly how much
time can be set aside for monitoring the next action-step, when, and its effects. It is no
good collecting more evidence than one can afford to ‘process’ and reflect about. And
it is no good ‘deciding’ to transcribe all recordings when one knows one hasn’t the time
to do it. So the number of lessons monitored and the techniques selected should all be matched to a realistic estimate of available time. The 'matching' process is helped by working out a timetable.
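The 'matching' can be made concrete with simple arithmetic. A sketch, using wholly invented figures and an assumed transcription ratio, of the kind of check a timetable embodies:

    # Invented figures: does the monitoring plan fit the time available?
    lessons_monitored = 6
    minutes_taped_per_lesson = 40
    transcription_ratio = 6      # assumed hours of transcribing per hour of tape
    hours_available = 20

    hours_needed = lessons_monitored * minutes_taped_per_lesson / 60 * transcription_ratio
    if hours_needed > hours_available:
        print(f"Needs {hours_needed:.0f} h, only {hours_available} h free: "
              "monitor fewer lessons or transcribe selectively.")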
In schools, the fact that ‘terms’ are usually interspersed by vacations suggests that this
is a natural organisational unit of time in which to complete a ‘cycle’ of classroom action-
research activity. I would normally feel it necessary to complete at least three, and perhaps four, cycles before one ought to be sufficiently satisfied with the improvements effected.
In the context of classroom action-research, this could well mean a commitment of at least
a year.
Design issues
Scientific research involves a fixed purpose and design. Action-research is adaptive, tentative and evolutionary. Scientific research cannot interpret the present until it knows the full results of its fixed design; action-research interprets and revises as it proceeds.
Evaluation criteria
The quality of fundamental research and the quality of action-research are judged by
somewhat different criteria. The former is considered to be superior in the degree to
which the methods and the findings warrant generalising to persons and situations
beyond those studied. An investigation is a good one if it adds knowledge to that already
recorded and is available to anyone who wants to read it. The value of action-research,
on the other hand, is determined by the extent to which the methods and findings make
possible improvements in practice.
What do you believe the purpose of research is?
Because teachers do tend to find one way of doing things as they advance in years,
action-research has been much more successful in the primary school than in the senior
high school. There are many other factors which may be responsible for this. The
departmental feeling is probably stronger in high school than in the primary school and
there is a strong possibility that if a high-school teacher attempts some action-research in their subject area, other members of the subject team may feel threatened, or believe
that the syllabus is being neglected. Also, the high school teacher probably handles more
pupils per day and has them for shorter periods of time than does the primary teacher.
This means less interaction with each pupil. As a result, the primary school teacher is
perhaps in a more favourable position to involve themselves in action-research. They are
not bound in tight subject matter compartments, remember more of their own academic
training, and are still selecting teaching methods and materials from a fairly wide
repertoire.
Conditions such as these, with flexibility over teaching methods and materials and sustained contact with pupils, are the most favourable ones for action-research.
References
Kemmis, S., & Grundy, S. (1981), ‘Educational Action Research in Australia’. Paper presented at
annual conference of AARE, November, Adelaide.
Lewin, K. (1952), ‘Group decision and social change’ in eds T. Newcomb & F. Hartley, Readings in
Social Psychology, Holt, New York.
Further reading
Atweh, B., Kemmis, S. & Weeks, P. (1998), Action Research in Practice, Routledge, London.
Ball, S. (1985), ‘Participant observation with pupils,’ in Strategies of Educational Research, ed. R.
Burgess, Falmer Press, Philadelphia.
Burgess, R. (ed.) (1984), The Research Process in Educational Settings: Ten Case Studies, Falmer Press,
London.
Carr, W. & Kemmis, S. (eds) (1986), Becoming Critical: Education, Knowledge and Action-research,
Falmer Press, London.
Clandinin, D.J. (1986), Classroom Practice: Teachers’ Images in Action, Falmer Press, London.
Connell, R.W. (1985), Teachers’ Work, George Allen & Unwin, Sydney.
Croll, P. (1986), Systematic Classroom Observation, Falmer Press, London.
Deakin University, (1988), The Action-research Reader, Revised edn, Geelong, Deakin University
Press.
Erickson, F. (1986), 'Qualitative methods in research on teaching', in Handbook of Research on Teaching, ed. M.C. Wittrock, Macmillan, New York.
Goswami, D. & Stillman, P. (eds) (1987), Reclaiming the Classroom: Teacher Research as an Agency for
Change, Boynton Cook, Upper Montclair, New Jersey.
Gregory, J.P. (1989), Action-Research in the Secondary School, Routledge, London.
Hart, E. & Bond, M. (1995), Action-Research for Health and Social Care, Open University, Milton
Keynes.
Hustler, D., Cassidy, T. & Cuff, T. (eds) (1986), Action-Research in Classrooms and Schools, Allen &
Unwin, Boston.
Kilpatrick, J. (1988), ‘Educational research: Scientific or political,’ Australian Educational Researcher,
15(2), pp. 13-30.
Mohr, M. & Maclean, M. (1987), Working Together: A Guide for Teacher Researchers, National Council of Teachers of English, Urbana, IL.
Nias, J. & Groundwater-Smith, S. (1988), The Enquiring Teacher: Supporting and Sustaining Teacher
Research, Falmer Press, London.
Walker, R. (1989), Doing Research: A Handbook for Teachers, Routledge, Cambridge.
Whyte, W. (ed.) (1991), Participatory Action-research, Sage Publications, Newbury Park.
Case study research is not new. Significant cases are central to the world of medicine and
law, and have long been included in the disciplines of anthropology, psychology, political
science, social work and management.
The case study has had a long history in educational research and has been used
extensively in such areas as clinical psychology and developmental psychology. For
example, both Freud and Piaget typically used case studies to develop their theories.
Criticism of their techniques damaged the case study approach, but the increased
acceptance of qualitative research and, in particular, participant observation has, as a
corollary, revived the acceptability of the case study.
The case study has unfortunately been used as a ‘catch-all’ category for anything that
does not fit into experimental, survey or historical methods. The term has also been
used loosely as a synonym for ethnography, participant observation, naturalistic inquiry
and fieldwork. This has occurred because case study is a method that can be usefully employed in most areas of education: from a historical case study of a particular school, or a case study of a particular child in psychology, counselling, special education or social work, or of a psychological process, such as Ebbinghaus' work on himself in the field of memory, to sociological case studies involving the role of a particular pressure group or religious order in education provision, or the hidden curriculum in a specific private school.
The case study is rather a portmanteau term, but typically involves the observation of
an individual unit, e.g. a student, a delinquent clique, a family group, a class, a school, a community, an event, or even an entire culture. It is useful to conceptualise a continuum of unit size from the individual subject to the ethnographic study. It can be
simple and specific, such as ‘Mr Brown, the Principal’, or complex and abstract, such as
‘Decision-making within a teacher union’. But whatever the subject, to qualify as a case
study it must be a bounded system—an entity in itself. A case study should focus on a
bounded subject/unit that is either very representative or extremely atypical.
The key issue in deciding what the unit of analysis shall be is to decide what it is you
want to be able to say something about in the report. As a sociologist you may wish to
focus on roles, subsystems, etc; as a psychologist you would tend to focus on individuals;
with an interest in educational management you could focus on change in a particular
establishment, the implementation of a particular program, or decision-making processes
of a board of school governors, etc.
A case study is not necessarily identical to naturalistic inquiry. Many are studies of
persons or events in their own environment with rigorous research design, but others are
not. A researcher’s report of observations of a school board is usually naturalistic, but the
school psychologist’s report of special tests on a child is a formal, not naturalistic, study.
While a case study can be either quantitative or qualitative, or even a combination of both, the constraints of a sample of one (a single unit being studied) and the restrictions that brings for statistical inference mean that most case studies lie within the realm
of qualitative methodology. Case study is used to gain in-depth understanding replete
with meaning for the subject, focusing on process rather than outcome, on discovery
rather than confirmation.
A case study must involve the collection of very extensive data to produce
understanding of the entity being studied. Shallow studies will not make any
contribution to educational knowledge. One study the writer is aware of involved
interviewing a teacher for several hours and observing the teacher teach for two periods.
The case study is the preferred strategy when ‘how’, ‘who’, ‘why’ or ‘what’ questions
are being asked, or when the investigator has little control over events, or when the focus
is on a contemporary phenomenon within a real life context. In brief, the case study
allows an investigation to retain the holistic and meaningful characteristics of real life
events. The main techniques used are observation (both participant and non-participant
depending on the case), interviewing (unstructured and structured), and document
analysis. In a case study the focus of attention is on the case in its idiosyncratic complexity,
not on the whole population of cases. It is not something to be represented by an array
of scores. We want to find out what goes on within that complex bounded system.
Finally, a case study may be valuable in its own right as a unique case. This is often
the position in clinical psychology or in special education, where a specific disorder,
behaviour manifestation or physical disability is worth documenting and analysing;
or in a school setting where an occasional event, such as a teacher being charged with assault by a pupil, or the planning of shared resources by a primary and secondary
school on the same site would be of interest. The case study may be the best possible
source of description of unique historical material about a particular case seen as
inherently interesting in its own right. Gruber’s (1974) study of Darwin and the
processes by which he arrived at the theory of evolution is an example of this.
Oral history
These are usually first person narratives that the researcher collects using extensive
interviewing of a single individual, for example, the development of a program for deaf
children as seen by a teacher closely associated with the scheme, or a retired person
recounting how they were taught in the early part of this century. The feasibility of this
approach is mostly determined by the nature of the respondent. Do they have a good
memory? Are they articulate? Do they have the time to spend with you? Often the researcher does not have a person in mind but, while exploring the topic, meets someone who strikes them as a good subject on the basis of initial conversations.
Situational analysis
Particular events are studied in this form of case study. Often the views of all participants
are sought as the event is the case. For example, an act of student vandalism could be
studied by interviewing the student concerned, the parents, the teacher, the local
magistrate, witnesses, etc. When all these views are pulled together they provide a depth
that can contribute significantly to the understanding of the event. Interviews,
documents and other records are the main sources of data.
Multi-case studies
A collection of case studies, i.e. the multi-case study, is not based on the sampling
logic of multiple subjects in one experiment. If the cases are not aggregated it is
convenient to apply the term ‘case study’ to such an investigation. It is a form of
replication, i.e. multiple experiments. If you had, for example, access to three cases
of a very rare psychological syndrome, the appropriate research design is to predict
the same results for each case—the replication logic. This logic argues that each case must be chosen so that it either predicts similar results or produces contrasting results for predictable reasons.
Sampling
You have already been introduced to probability sampling in chapter 6. However, non-
probability sampling is more often applied in a case study. The difference is that in
probability sampling one can specify the probability of including an element of the population in the sample, make estimates of the representativeness of the sample, and generalise the result back to the population. In non-probability sampling, there is no way
of estimating the probability of being included; there is no guarantee that every element
has had an equal chance of being included, or that the case is representative of some
population; and therefore there is no validity in generalising the account.
The usual form of non-probability sampling is termed purposive, purposeful or
criterion-based sampling; that is, a case is selected because it serves the researcher's real purpose and objectives: discovering, and gaining insight into and understanding of, a particular chosen phenomenon. This sort of sampling is based on defining the criteria
or standards necessary for a unit to be chosen as the case. A blueprint of attributes is
constructed and the researcher locates a unit that matches the blueprint recipe. Table
26.1 lists some types of cases often sought in purposive or criterion-based sampling.
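The 'blueprint of attributes' can be pictured as a simple matching exercise. A minimal Python sketch, with entirely hypothetical criteria and candidate units:

    # Hypothetical blueprint: criteria a unit must meet to be chosen as the case.
    blueprint = {"sector": "primary", "size": "small", "new_assessment_scheme": True}

    candidates = [
        {"name": "School A", "sector": "primary", "size": "small", "new_assessment_scheme": True},
        {"name": "School B", "sector": "secondary", "size": "large", "new_assessment_scheme": True},
    ]

    # Keep only the units whose attributes match every criterion in the blueprint.
    matches = [c["name"] for c in candidates
               if all(c.get(k) == v for k, v in blueprint.items())]
    print(matches)  # ['School A']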
Name a topic you think is a valuable one for a case study. Identify three major questions
your study would try to answer. Try to rewrite these as propositions that are amenable
to being supported or refuted by evidence.
Interviews
These are one of the most important sources of information. Chapter 24 Unstructured
Interviewing and chapter 30 Structured Interview and Questionnaire Surveys cover
interviewing in considerable detail. Interviews are essential, as most case studies are about
people and their activities. These need to be reported and interpreted through the eyes
of interviewees who provide important insights and identify other sources of evidence.
Most commonly, case study interviewers use the unstructured or open-ended form of interview, so that the interviewee is more of an informant than a respondent. The case
study investigator needs to be cautious about becoming too dependent on one respondent
and must use other sources of evidence for confirmatory and contrary evidence.
Case study workers will also use the focused interview in which a respondent is
interviewed for about one hour on a specific topic, often to corroborate facts already
gleaned from other sources. The questions are usually open-ended with a conversational
tone.
However, at times a more structured interview may be held as part of a case study. For
example, as part of a case study of a neighbourhood creche some formal survey might
be taken of its use by different ethnic groups or family types. This could involve sampling
procedures and survey instruments. But it would form only one source of evidence,
rather than the only source of evidence as in a survey.
Observation
Participant and non-participant observations can range from the casual to the formal. In the formal mode the observer will measure the incidence of various types of
behaviour during certain time periods. This might involve observations in classrooms,
play areas, staff meetings or games arcades. This form of observing will give rise to
some quantitative data. The casual mode may be ad hoc observations made during a
visit when other evidence is being obtained. It is fairly easy to notice such items as how
staff greet each other, how staff divide into subgroups in a staff room while having a
tea break, the condition of equipment, pupil behaviour in corridors, etc. These
impressions and perceptions add to the flavour of the context and to the possible events
that may need further study by interview or more formal observation. Photographs are
a useful extra that can help convey site or behavioural characteristics to a report. The
use of more than one observer is recommended in order to increase reliability of the
observations.
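Where two observers code the same timed intervals, a simple percentage-agreement check gives a first indication of how reliable the observations are. A sketch with invented codings (a chance-corrected index such as Cohen's kappa would be more stringent):

    # Invented codings of the same ten intervals by two observers.
    observer_a = ["on", "off", "on", "on", "off", "on", "on", "off", "on", "on"]
    observer_b = ["on", "off", "on", "off", "off", "on", "on", "off", "on", "on"]

    # Proportion of intervals on which the two observers agree.
    agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / len(observer_a)
    print(f"Agreement: {agreement:.0%}")  # 90%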
Artefacts
An artefact may be a technological device, a tool, a work of art, etc. Computer printouts
can be used to assess the use of computers by students and the applications of computers
in the classroom.
Identify some incident that has occurred to you recently. How would you establish the
facts of the incident? Who would you interview? Are there any documents to rely on?
To what extent do you think a case study is distinguished from other methods of
educational research by the techniques it employs?
[Figure: a pattern-matching example from a case study, linking the proposition 'Boys are favoured for prefects, as football and cricket count more in the Headmaster's eyes' to the items of evidence that bear on it.]
Explanation building
This procedure is similar to pattern matching, in which the case study data are analysed
by building an explanation about the case. To explain a phenomenon implies stipulating
a set of causal links about it. In most case studies, explanation building has been
employed in those producing narrative data in which explanations reflect some
theoretical propositions. The explanation building process is often iterative; that is, as
the initial proposition is compared with some initial findings it is revised and compared
with further data. This process repeats itself as many times as is needed until the
explanation and theoretical proposition fit. Thus it will be the case that the original
proposition will have changed in some degree as evidence is examined and new
perspectives obtained. This gradual reciprocal building up of theory and proposition
allows for the testing of rival explanations and propositions. It may seem as though the
case study investigator is trying to make the proposition fit the evidence, but the
congruence between evidence and theory is the vital issue, and in the process alternative
theories are being tested and discarded.
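The iterative logic can be caricatured in a few lines of Python; the proposition and the batches of evidence below are invented for illustration only:

    # Invented illustration of the iterative fit between proposition and evidence.
    proposition = "Praise terms close down pupil discussion"
    evidence = [
        {"congruent": True},
        {"congruent": False, "note": "discussion continued when praise was specific"},
        {"congruent": True},
    ]

    for batch in evidence:
        if not batch["congruent"]:
            # Revise the proposition rather than discard the discrepant evidence.
            proposition += ", unless the praise is specific"
            print("Revised:", proposition)
    # Even now the proposition remains a hypothesis, not a conclusion.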
Generalisation
A second concern is that case studies provide very little evidence for scientific
generalisation. This objection arises where a case is studied to provide a basis for inference to points not directly demonstrated, with relevance to cases not studied. It has been
a common feature of literature critical of the case study method to assume that
generalising theory is the only worthwhile goal. A frequently heard complaint is, ‘How
can you generalise from a single case?’. Of course, the same question could be raised
about experiments. ‘How can you generalise from a single experiment?’ In fact, scientific
facts are rarely based on one experiment, but on replications that produce consistent
results. The short answer is that case studies, like experiments, are generalisable to
theoretical propositions, not to statistical populations, and the investigator’s goal is to
expand theories and not to undertake statistical generalisation.
While the case study has been criticised as a weak vehicle for generalisation, its purpose
has generally not been that. Case studies are focused on circumstantial uniqueness and
not on the obscurities of mass representation. Complicating interaction effects are not
thought of as hindering understanding. The case study worker appreciates the
complexity of the environment and expects that behaviour is a response to the Gestalt,
a response to the wholeness as perceived by the client, a response to interactions between
the subjective and objective of the situation. Every case is embedded in historical, social,
political, personal and other contexts and interpretations. Clean data sanitised by control
in experimental techniques are not true to life.
The generalisation issue is the one that raises most intellectual problems because what is
inferred is a general proposition from a sample of one. If the uniformity of nature is assumed,
then the objection disappears as any case will do to demonstrate what is true of all other cases
of the same class. This is a standard assumption in the natural sciences but perhaps non-
existent in the social sciences. Yet in psychology, for example, there are two notable examples.
Both Piaget and Freud erected general theories on the basis of unsystematically selected cases.
Piaget in many instances used his own children. He is so naive about it that in The Origins of Intelligence in the Child he has no discussion of method or even description of the cases
when they are first introduced; names are simply attached to the reported observations from
which general conclusions are reached. An example of Freud’s use of the case study was the
celebrated study of Little Hans, which he used to prove his theory that neuroses are caused
by repressed impulses that surface in a disguised form.
Piaget termed his method the ‘clinical method’. He is aware of his problems:
[I]t is so hard to find a middle course between systematisation due to preconceived ideas and
incoherence due to the absence of any directing hypothesis (1929, p. 69).
Reliability
It is impossible to establish reliability in the traditional sense. However, the notion of
reliability as applied to testing instruments can be applied to human observers. With
training and practice, the human becomes a more reliable observer. Rather than replicability, reliability in case studies is focused more on dependability: that the results make sense and are agreed on by all concerned. Ways of establishing reliability involve
triangulation, reporting of any possible personal bias by the investigator, the existence
of an audit trail to authenticate how the data were obtained and decisions made about
data and categories.
To improve reliability and enable others to replicate your work, the steps and
procedures must be clearly explicit and well documented in the final report.
Construct validity
Many case study investigators fail to develop a sufficiently operational set of measures,
and subjective judgement is used to collect the data. There are two ways of improving
construct validity. Firstly, use multiple sources of evidence to demonstrate convergence
of data from all sources. Secondly, establish a chain of evidence that links parts together.
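One way of operationalising the first of these checks is to tag every finding with the sources that support it and flag any resting on a single source type. A sketch with hypothetical findings and source labels:

    # Hypothetical findings, each tagged with the source types supporting it.
    findings = {
        "Staff morale fell after the merger": {"interviews", "staff survey", "minutes"},
        "Parents were not consulted": {"interviews"},
    }

    # Convergence from multiple sources strengthens construct validity;
    # flag any claim supported by fewer than two source types.
    for claim, sources in findings.items():
        if len(sources) < 2:
            print(f"Needs corroboration: {claim!r} (only {sources})")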
Internal validity
This deals with the question of how well the findings match reality. However, if the
major assumption underlying qualitative research is that reality is ever-changing,
subjective in interpretation and wholistic, and not a single fixed entity, then it is not
feasible to try and measure congruence between the data collected and some notion of
reality. In a case study what is being observed is a participant’s notion or construction
of reality, their understanding of the world. What seems true may be more important than what is true.
Internal validity has been assessed by a number of strategies, such as triangulation, re-
checking with participants as to observer interpretations made, peer judgement, and
long-term observation.
External validity
We need to know whether the study’s findings are generalisable beyond the immediate
case. The analogy being made is the sample-population one of quantitative methodology.
This analogy is incorrect because case studies attempt analytic generalisation in which
the investigator tries to generalise a particular set of data to some broader theory. Of
course, the theory can then be tested by replication. It is important not to confuse the
choice of a case study in which the characteristics of the person or community are the
issue with the selection of the arena. Every study has to be conducted in some setting.
The emphasis of the case study is on the characteristics of the particular case; therefore
external validity is not of great importance. A case study need not even be qualitative in
toto, as there may well be some data available on a person, although no inferential
statistics will be applied. There is, however, an implicit assumption that a case study is largely qualitative in character.
Rigour
Another objection is that, since methodological rigour appears slight, results are suspect and case study writing reveals more literary artistry than reliable and valid
explanation. Perhaps it takes longer for exponents of qualitative work to develop the
skills needed for a rigorous study. The routines and research activities are not as neat,
orderly and cookbook-like in fashion as quantitative methods.
So in summary there are critical issues and questions to answer. Case study accounts can
be decried as subjective, biased, impressionistic, and lacking in precision. There are dangers of all of these in any study that is not conducted and documented with care.
The report
The major components of the report are usually these:
• purpose of the study: the problem that gave rise to it, philosophical orientation;
• methodology, including the sampling decisions, rich description of site/subject, transaction and processes, and data collection techniques;
• presentation of the data, including the patterns, themes and interpretation;
• validation of findings/outcomes;
• conclusions.
Figures and other displays should only be used when really integral to the discussion. While verbatim quotations are often necessary as illustrative examples, long unedited extracts are rarely needed. At the other extreme, long conceptual and philosophical discussions are equally out of place.
The case study design is chosen when a rich descriptive real-life holistic account is required that
offers insights and illuminates meanings which may in turn become tentative hypotheses for
further research, possibly in a more quantitative mode. The unit of study must be a bounded
system, but can range in size from an individual to a whole program/system.
Many case studies are qualitative and involve ethnographic techniques, particularly participant
observation. Sampling is usually non-probability, with the case chosen on the basis of some
relevant criterion. Data analysis involves the devising of a coding system that permits higher order
categories and conceptual analysis to develop.
Reliability cannot be established in the traditional sense, and external validity with a single case
is also unavailable. Internal validity is assessed through triangulation, peer judgement and re-
checking with participants.
References
Bacharach, A. (1965), “The control of eating behaviour in an anorexic’, in Case Studies in Behaviour
Modification, eds P. Ullman & L. Krasner, Holt, New York.
Bernard, J. (1966), Academic Women, World Publishing Co, Cleveland.
Gorer, G. (1955), Exploring British Character, Criterion Press, New York.
Gruber, H. (1974), 'A psychological study of scientific creativity', in Darwin on Man, eds H. Gruber
& P. Barrett, Dutton, New York.
Helson, R. (1980), “The creative woman mathematician’, in Women and the Mathematical Mystique,
eds L. Fox, et al. Johns Hopkins University Press, Baltimore.
Jones, M.C. (1924), ‘A laboratory study of fear’, Journal of Genetic Psychology, 31, pp. 308-15.
King, R. (1978), All Things Bright and Beautiful, Wiley, Chichester.
Lacey, C. (1970), Hightown Grammar, Manchester University Press, Manchester.
Piaget, J. (1929), The Child’s Conception of the World, Adams and Co., New Jersey.
Srole, L. (1977), Mental Health in the Metropolis, Harper & Row, New York.
Szanton, P. (1981), Not Well Advised, Ford Foundation, New York.
Whyte, W. (1943), Street Corner Society, University of Chicago Press, Chicago.
Willis, P. (1977), Learning to Labour, Columbia University Press, New York.
Historical research differs greatly from much of the rest of the research methods
discussed in this text. While it has a great deal in common with qualitative
methods in its use of documents, interviews, biographies and events and their
interpretation, it may also make use of and analyse quantitative data, such as the
changing demographic origins of the teaching profession over the last century. There
is also a quest for objectivity in historical research, and it subscribes to the same
principles of validity and reliability that characterise all scientific endeavours.
History is a meaningful record, evaluation, systematic analysis and synthesis of
evidence concerning human achievement. It is not the list of chronological events we remember from school. It is an integrated account of the relationships between persons,
events, times and places. For example, it is impossible to discuss the development of
programmed learning without also discussing the research of B.F. Skinner and the
dominance of the philosophy of behaviourism at the time. There may be different
emphases, but it is impossible to separate people, events, time and location. History
enables us to understand the past and the present in the light of the past. It is an act of
reconstruction, undertaken in a spirit of critical inquiry, and prevents us from re-
inventing the wheel.
Historical education research is past-oriented research which seeks to illuminate a
question of current interest in education by an intensive study of the material that already
exists. Since history deals with the past, the purpose of historical research cannot be to control phenomena. The research is intended to help understand, explain or predict, through the systematic collection and objective evaluation of data relating to past occurrences, in order to explore research questions or test hypotheses concerning causes, effects or trends that may help to explain present events or anticipate future ones. The values
of historical research are:
• it enables solutions to contemporary educational problems to be found in the past;
• it allows re-evaluation of theories, hypotheses and generalisations held about the past, and of how and why educational theories and practices developed;
• it stresses the importance of complex interactions in the actions and situations that determine the past and present, particularly how our present educational system came about;
• it throws light on present and future trends, particularly the guises in which progressive ideas in education re-emerge; and
• it contributes to the understanding of the relationships between politics and education, between school and society, and between pupil and teacher.
Like other forms of qualitative research, historical research is concerned with natural
behaviour in a real situation, and the focus is on interpretation of what it means in the
context. Unlike other forms of educational research the historical researcher does not
create data, but attempts to discover data that already exist.
Several limitations are, however, commonly pointed out:
• The purpose of science is prediction, yet the historian cannot generalise on the basis of past events. Most past events were unplanned, or did not develop as planned; many uncontrolled variables were present and the influence of one or two individuals was crucial, so no replication is possible.
• The historian must depend on the reported observations of witnesses of doubtful competence and of doubtful validity, most of whom are no longer alive.
• The historian is trying to complete a jigsaw puzzle with parts missing, not knowing what the final picture is, and in fact creating that final picture by filling in the gaps with inferences.
• The historian cannot control the conditions of observation or manipulate variables when the events have already happened.
However, in defence it is argued that:
• the historian does delimit a problem, formulate hypotheses, raise questions to be answered, gather and analyse primary data, test hypotheses as consistent or inconsistent with that analysis and evidence, and finally formulate conclusions or generalisations;
• the historian may gather information from a variety of sources and vantage points, which can provide a form of triangulation; and
• although the historian cannot control variables, it is arguable whether other forms of educational and social science research do so effectively, particularly in non-laboratory studies in the classroom, playground or youth club.
Revisionist history
This is an attempt to reinterpret events that others have already studied.
As in all research, it is necessary to determine whether there is a sufficient and
accessible data base for your desired topic to permit a successful study. If there is a
surfeit of data you will need to narrow the topic down to a shorter time period or a
particular aspect. As with all types of qualitative research, the definition process should continue as you collect and analyse the data. In doing so you may uncover
new issues and insights, or redirect the focus.
Procedure
Historical research tends to be idiosyncratic, depending both on the individual doing the
research and the nature of the topic. In general however, there are six steps:
1 Identification of the topic and specification of the universe of data required to address
the problem adequately;
2 Initial determination that such data exist and are available;
3 Data collection through consideration of known data, the seeking of known data from primary and secondary sources, and the unearthing of new and previously unknown data;
4 Initial writing of the report;
5 Interaction of writing and additional data search as gaps become apparent;
6 Completion of the interpretative phase.
Sources of data
Four types of historical data sources are used: documents, oral records, artefacts and
quantitative records. Primary sources are documents written by a witness to the events,
whereas secondary sources are secondhand versions and therefore less accurate. Secondary
sources are used as back-up data and when primary data is not available.
Primary sources
Documents
These are records kept and written by actual participants in, or witnesses of, an event.
Examples are minutes and records of formal and informal organisations, autobiographies
and biographies, letters, diaries, census information, contracts, certificates, medical
records, community organisation/school newsletters, programs of sports/religious/
educational/social events, curriculum materials, books, films, recordings, reports,
newspapers, etc.
Artefacts/relics
These are remains of a person or group and for education research could be sites or
remains of old school buildings, hospitals, industrial sites; copies of textbooks, reading
books or industrial equipment no longer used; copies of disused procedures, say for school
principals, employee rules in industry, social workers’ guidelines; old examination papers
and student projects found in cupboards; old school furniture, old medical records or
disease incidence and outdated treatment procedures, etc. These relics often give valuable
clues as to how schooling, working and daily life were conducted in the past.
Oral testimony
This is the spoken account of a witness such as a teacher, pupil, parent, governing body
member, etc. This category also includes tales, myths, ballads, songs and rhyming games
that can be obtained in personal interviews as witnesses relate their experiences and
knowledge.
Secondary sources
The writer of the secondary source merely reports what the person who was actually
present said or wrote. It is secondhand material and does not have as much worth
or validity as a primary source. Errors often result when information is transmitted
from one person to another. A history textbook is obviously a secondary source. A
school textbook may be either type of source. It is a secondary source when used as
a textbook, but a primary source when a researcher is studying the changes in textbooks themselves over time.
Data collection
Those involved in historical education research cannot create new data and must work
with what already exists although some of it may be unknown at the start of the research
and only comes to light through the investigation. Much historical research is conducted
in detective-like fashion, whereby information is traced to a source, those knowledgeable
about the event or situation contacted and used as informants, and documents located.
In general, quality historical research depends on sufficient primary data rather than
secondhand data.
The particular research topic will suggest the types of data that must be sought. A
study of a local educationalist will require the location of biographical material, letters,
interviews with those who knew/were taught by/were colleagues of the person,
photographs, diaries, newspaper cuttings, etc. A study of attendance patterns in a region
would need access to school records. Most primary sources can be located in libraries,
museums, education department and university archives, and personal collections.
Information must be recorded as many records cannot be loaned. This means making
photocopies where possible or resorting to manual recording. Photographs of artefacts
are necessary.
Data analysis
Most of the data used in historical research have lives of their own, in that they were not
created in the first place for research purposes. The data were created for someone else’s
purpose or administrative function. Therefore the data may be biased, distorted and
somewhat invalid when used for other purposes. Thus the researcher must evaluate the data
in a critical way, establishing the authenticity of the source, including the date and author,
and evaluating the accuracy and worth of the statements. The central role of the historian is
the interpretation of data in the light of historical criticism. Each fact and supposition must be carefully weighed and added to the case, leading to the research conclusion. Most researchers organise either by date or by concept/issue.
External criticism
This establishes the genuineness or authenticity of the data. Is the document a forgery?
We may need to establish its age by examining language usage, spelling, handwriting
style. This may involve chemical tests on the ink and the paper, or on parchment, cloth,
wood and paint, depending on the sort of relic. We also need to check whether the
document or relic is consistent with the technology of the period.
Internal criticism
After authenticity has been established, we still need to evaluate the accuracy and validity
of the data. So although genuine, we need to ensure that they reveal a true picture. Were
the writers honest? Biased? Too antagonistic or too sympathetic? Were they sufficiently
acquainted with the topic? What motives did they have to write about or record the
event or person? How long after the event was the record made? Does the account agree with other accounts? What was its purpose and in what circumstances was it produced? Is it complete, edited, altered? Was the author an expert or a lay person? Is the account liable to memory distortion? Was the author partisan, a supporter of a particular course of action?
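A researcher who keeps such appraisal notes systematically might encode the questions in a simple record. A sketch in Python, with an invented document and an arbitrary ten-year threshold for worrying about memory distortion:

    from dataclasses import dataclass, field

    @dataclass
    class DocumentAppraisal:
        source: str
        author_was_witness: bool
        years_after_event: int
        suspected_bias: str = ""
        corroborated: bool = False
        notes: list = field(default_factory=list)

        def concerns(self):
            out = []
            if self.years_after_event > 10:   # arbitrary threshold, by assumption
                out.append("long delay: memory distortion possible")
            if self.suspected_bias:
                out.append(f"partisan: {self.suspected_bias}")
            if not self.corroborated:
                out.append("not yet corroborated by other accounts")
            return out

    memoir = DocumentAppraisal("head teacher's memoir", True, 25,
                               suspected_bias="defends own policy")
    print(memoir.concerns())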
The historian must carefully weigh the extent to which causality can be inferred and
generalisation justified. Historical evidence, like a one-shot case study, can never be repeated.
There is no control group, so one can never be sure that one event caused another. The best
that can be done is to establish a plausible connection between the presumed cause and the
effect. Similarly, the historian must assess the extent to which one educator or school was
reflective of the general pattern at that point in time. Even if bias is detected, it does not mean
that the document is useless. A prejudiced account can reveal the pressures and political
processes that were being brought to bear at the time. The principle in document analysis is
that everything should be questioned.
Conclusion
Educational historical research is difficult and demanding, lacking the standard
methodology of experimental approaches. It involves considerable time searching for and
reading documents, locating individuals and travelling distances on occasion to undertake
these tasks. For these reasons, historical research is not frequent in education faculties, as
students seldom have sufficient time or financial support to do these tasks.
However, despite these drawbacks, historical research has its own special rewards. It
is fun to discover things about the past that give shape to present ideas and patterns of thought in education, and to show the contribution of others, often long since dead, to the process and achievements of education. It is the sort of research that can be pursued alone, free of the rigid timetables and artificial constraints of experimental approaches. It
is a labour of love, limited only by energy and enthusiasm.
Historical research is an integrated account of people, places, events and times, involving both qualitative and quantitative methods and data. It encompasses a wide range of studies
from individual biographies and educational movements through to trend analysis, all undertaken
in idiosyncratic ways.
The researcher uses both primary and secondary sources of data. It is often difficult to assess
reliability and validity as the past event/person cannot be replicated, data are often fragmentary,
and authenticity may be difficult to assess. Internal and external criticism are used in an attempt
to overcome this.
Further reading
Best, J. (1984), Historical Enquiry in Education: A Research Agenda, American Educational Research
Association, Washington DC.
The presentation of findings is the culmination of the qualitative research process. After
all, the purpose of research is not only to increase your own understanding, but also to
share that knowledge with others. Your efforts are wasted if you cannot disseminate the
results. Writing a report also helps you to clarify your thoughts and arguments.
But when it comes time to write up your qualitative study you can feel completely out
of control, facing too many choices: what is the order of presentation? which evidence?
active or passive voice? how long or short? etc. The reporting phase is more difficult to
do in qualitative research than in quantitative research, where there is a conventional
linear sequence to a research report which deals in a quite short and precise way with the
study for publication in a journal. Unfortunately, qualitative reports do not have a
uniformly acceptable outline. Nor do ethnographic, action-research or case study reports
usually end up as journal articles.
Because of the uncertain nature of qualitative reporting, investigators find that this
compositional phase puts the greatest demand on them. Inexperience in composing should
not deter an investigator from utilising qualitative methodology. However, much more
practice is needed than for a journal article in the quantitative mode. One indicator of whether
a person will do well at writing a qualitative report is whether they are good at and enjoy
writing essays and detailed letters. Another pointer is whether the report is seen as a chore or
an opportunity to make a significant contribution to knowledge.
A report or article based on qualitative research is not an opportunity to 'fly a kite', to provide an 'off the cuff' view of an event. Rather, it should be a logical, descriptive
and analytic presentation of evidence that has been systematically collected and
interpreted. It seems formidable if viewed as a single task. It is therefore more
encouraging to break down the task into smaller sections, some of which are drafted
while the research goes on, and then place all the subtasks in sequence. No one can sit
down at the end of a qualitative investigation with blank paper and all the fieldnotes, and
start writing. The first step is to decide who the audience will be.
Getting started
Novice investigators are big procrastinators. The key to decreasing the composition
problem at the end of the research phase is to commence writing and preparing the
report as the study is progressing. Don’t leave it all to the end. Re-formatting
observational notes/interview details, analysing and coding recent observations and
interviews, reviews of literature, building up a bibliography/reference file are ongoing
activities while the research continues, but are part of the report writing process. This is
the splitting up of the task into subtasks referred to above.
Report writing should start early in the conduct of the study. Certain sections of the
report will always be draftable before data collection and analysis have begun. For
example, after literature has been reviewed, a first draft of the bibliography and
methodology section can be made. Additional citations can always be added to the
bibliography, and if some are incomplete these can be tracked down as the study
proceeds rather than become a chore holding up the final report at the end.
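A sketch of such a running reference file (the complete entry echoes the Lewin reference cited earlier in this book; the incomplete entry and the field names are an arbitrary illustration):

    # Track incomplete citations so they can be chased as the study proceeds.
    required = ("author", "year", "title", "publisher")
    references = [
        {"author": "Lewin, K.", "year": 1952,
         "title": "Group decision and social change", "publisher": "Holt"},
        {"author": "Smith, J.",
         "title": "Draft notes on assessment"},  # year and publisher still missing
    ]

    for ref in references:
        missing = [f for f in required if f not in ref]
        if missing:
            print(f"Track down {missing} for: {ref['title']}")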
The methodology section can be drafted early as the major procedures for data
collection and analysis should already be part of the design. The methodology section
should contain arguments and issues concerning the selection of the
cases/informants/techniques. The next section on qualitative and quantitative
information gleaned about the cases/issues can be started before analysis begins. While
the evidence should demonstrate the case convincingly, not all of the evidence and material collected can be included; the writer must be selective.
Linear—analytic structure
This could be considered the standard approach. The sequence follows the standard
journal report from statement of the problem, through review of literature/theory,
methodology, results and discussion. This structure is comfortable for those studies that
involve a single issue/problem/case in an explanatory, descriptive or exploratory study;
for example, the classic single case study, or an action-research narrative augmented where
appropriate with tables, charts, etc.
Comparative structure
A study that compares alternative descriptions or explanations of several
cases/problems/issues, or is iterative of the same case/issue from different points of view,
or involves more complex comparisons between various subprograms of one action-
research study, would employ this approach. Each case/problem/subprogram or section
will probably be presented initially as a separate chapter with later cross-case analysis and
results. For example, this approach could be used to compare different conceptions of
how the appointment of a school principal was made from the perspectives of the
candidates, the appointment committee, the staff, etc.
In a variant of this structure, the whole of the report may consist of cross-case analysis
only with no separate sections devoted to separate cases, issues, programs, etc.
Information on different cases would, therefore, be scattered through each chapter; each
chapter dealing with a theme or proposition across cases.
Chronological structure
For the study of an event over time—for example, the introduction of a new examination system or the integration of an immigrant pupil—a sequence of chapters covering the
early, middle and late phases of the event would be most appropriate. A major problem to avoid in this approach is that most investigators spend too much of the report on the introductory stages, detailing early history and background, and too little on the later stages.
Theory-building structure
The sequence of chapters or sections in this approach will follow some theory building
logic. Each chapter or section will unravel a further part of the argument with compelling
evidence. The entire sequence should be a linked argument following through to a well-
supported conclusion.
Suspense structure
This structure presents the study in reverse of the usual order. The outcome or conclusions are presented first, while the remainder of the report is employed to support the outcome, presenting alternative explanations as required.
Unsequenced structure
This structure is often used in descriptive case studies where the sequence of chapters is
of no great importance. A descriptive study of a school might have sections on staffing
policy, student discipline and rules, role of parents and friends, groups, etc., but the
order in which each is presented is not crucial.
Micro-ethnological structure
If you choose to do a micro-ethnography, your report must focus on intimate behaviours in a single setting, narrowing in on more specific aspects of interactions in order to break down the setting more and more. This continual breaking down and dissection of events
will form the organising sections of your report.
Macro-ethnological structure
In a macro-ethnography, you lay out the whole realm of a complex situation, covering
all aspects that are relevant to your theme. Each section may cover a different aspect but
demonstrate its relationship to the whole.
The introduction
This provides the general background and general statement of the problem needed to
understand the importance of the focus; in other words, what the research is attempting
to do. Placing the study in the context of current literature, theory or debate is a major
strategy here. Many investigators apply an existing theory to a study and try to extend
or refine it. For example, Hargreaves, Hester and Mellor (1975) used labelling theory,
which had been developed in the area of social deviance, as a way of explaining deviance
in the classroom.
The introduction will include, where necessary, a review of pertinent literature/theory.
The literature review is a stimulus for your thinking, not a way of summarising previous
work in the area so firmly in your own mind that it blinds you to considering only existing
concepts and conceptual schemes, as in the quantitative method. New findings cannot
always be fitted into existing categories and concepts, and the qualitative method, with its
more open-minded approach, encourages other ways of looking at the data. The
literature review should be a sounding board for ideas, as well as finding out what is
already known and what specific methodologies have been used. Often, research reports
identify additional questions that would be fruitful to pursue.
The core
This makes up the bulk of the manuscript, getting its direction from the introduction,
and must explain the processes by which you obtained your data and how you
interpreted them. It usually commences with a justification and account of the research
methods used to gain the evidence. The detail of the method enables readers to evaluate
the reliability and validity of your approach. The rest of the core is concerned with the
presentation of your evidence related to the focus/theme/topic, arguing and illuminating
as you go, and, if necessary, revising your initial proposition, enlarging or changing your
focus. At all times you must ask yourself: Does this relate directly to my
focus/argument/proposition? If it does not, leave it out; it may be the theme of another
later paper, but not this one. This will keep you on track and prevent the report from
becoming a receptacle for every observation, statement and document you obtained.
Other parts of the core depend on the sort of study you did, whether single case,
multi-thematic, action-research or ethnography. You may have to write comparative
sections and discuss patterns across cases. A variety of different forms of report was briefly
itemised above. But whatever the specific content of each section in the core, each should
also have an introduction, a middle and an end. The introduction will inform the reader
what that section contains and how it relates to other sections. The middle will provide
the evidence and argument, while the end will summarise and provide a link to the next
section.
What the qualitative researcher is doing is telling a story—‘Here is what I found, and
here are the details to support that view’. Use subheadings frequently, as these help to
structure the report and may often reflect the way respondents have structured their
world. Look for places where your general statements are too dense or lengthy, and see
if you can insert a brief but telling example to break up the prose. Readers are advised
to consult chapters 23 and 24, where detailed methods of analysing observations,
interviews and survey data can be found.
Deciding what evidence to use is like a balancing act between the particular and the
general. Your writing must clearly demonstrate that your abstract ideas (summaries of
what you saw) are firmly grounded in what you saw. A good qualitative report is well
documented, with descriptions taken from the data to illustrate and substantiate the
assertions made.
Confidentiality
Should informants, events, locations, etc. be accurately identified or remain anonymous?
The most desirable option would be to identify everyone and everything, as this
disclosure enables readers to recognise the reality of the study and locate it in their own
experiences. However, privacy laws and the confidentiality that has been promised before
informants would talk, or would allow observation of events, definitely preclude this.
If you have promised confidentiality, you must follow that through. It is possible to
maintain the anonymity of respondents even though the context, event or location is
identifiable, as individual behaviours, open-ended interview responses, etc. can be reported
but not attributed to a named individual.
Helpful tips
• Break the report down into manageable parts.
• Prepare parts of the report as your research proceeds, e.g. bibliography, literature
review.
• Establish the objective(s) or question(s) you wish to answer and write a summary
introduction linking the general question/focus with the proposition and previous
theory/literature.
• Go through the draft looking for words and sentences that can be left out without
changing the meaning, or whose elimination makes the meaning clearer.
• Write in the active rather than the passive voice.
• Use short sentences and avoid jargon.
• Ground your writing in specific examples.
• Have friends/colleagues read the draft and comment.
Remember in writing a qualitative report that there is no single conventional model—
diversity reigns. Your style of presentation should suit the topic, be comfortable to you
and, above all, present your study in a well-documented argument, providing the reader
with a rich flavour of what you investigated.
The following three research reports illustrate the way in which four particular
researchers have investigated and written up their topics. As you are aware by now,
qualitative research reports can be structured and reported on in a plethora of ways.
Even these three topics could have been written up in different ways by other
investigators. So do not feel constrained by these examples in the way in which you wish
to present your material. Remember, clarity, precision and a logical structure are
important to your readers.
The first report, ‘I feel sorry for supply teachers’ by Wood and Knight, is an example
of a mini-ethnographic study in the classroom. A lengthy literature review is avoided and
the paper simply lays down the theoretical context—pupil expectations—and the specific
issue: why pupils respond to supply teachers as they do. The methodology is outlined,
followed by selected evidence in the form of interview responses. The bulk of the paper
then relates the findings to theories and previous studies in expectation, labelling and self-
fulfilling prophecy areas. Finally, the paper makes suggestions for improving how relief
teachers are seen by pupils. It is a short and very readable piece, aimed at an audience of
teachers and education administrators.
The second report, ‘Nasr’s development as a writer in his second language: The first
six months’ by Elliott, illustrates the case study approach. The paper reports an
investigation of the development of writing behaviour in a second language of one child
over a six-month period.
The developmental changes are recorded, described and analysed chronologically
within categories of genre, language skills and strategies. The argument and discussion
References
Hargreaves, D.H., Hester, S. & Mellor, F. (1975), Deviance in Classrooms, Routledge, London.
Whyte, W. (1955), Street Corner Society, University of Chicago, Chicago.
Further reading
Clark-Carter, D. (1997), Doing Qualitative Psychological Research: From Design to Report, Psychology
Press, New York.
An ethnographic study
This paper examines pupils’ views and expectations of ‘supply’ teachers in case-study fashion. Given the limited
time in which supply teachers are in contact with any one class, teacher reputation and initial encounters are
seen to be critical in determining the success or failure of supply teachers in the classroom. Suggestions are
offered for improving the situation and effectiveness of supply teachers.
as a team. All six classified themselves as ‘normal’ in their schoolwork. With respect to their
responses to supply or relief teachers, Karen claimed to be one of three pupils in the class who
start the ‘fun’.4 The claim by Karen was backed up by the others, who also labelled her as a
‘loud-mouth’. Jenny said she was a ‘loud-mouth’ also. However, Michelle, Chris, Scott and Troy are
all labelled as ‘quiets’. All agreed they were to varying degrees ‘pests’ for their classroom teachers,
but were much worse for supply or relief teachers.
Two group and three individual interviews were held in a small withdrawal room next door
to the classroom. During the first interview five of the children (Jenny was absent) discussed
questions posed by the interviewer, who then jotted down notes in answer to each question. The
next session involved individual interviews with Karen, Chris and Scott. This was a question and
answer session, during which the interviewer was able to jot down answers verbatim. (Karen
was sent in first ‘to get her out of the way’ according to her classroom teacher.) Jenny’s only
interview was on the last visit. This final session was another group interview during which the
children helped to construct taxonomies on teachers and students.
The direction of questioning was shaped by the interviewer's own recent experiences as a
supply teacher, her conversations with other supply, relief and classroom teachers and confessions
from her thirteen-year-old son on the subject of supply teaching.
Initially, the study was to have been limited to pupils’ responses to supply teachers.
However, due to the pupils’ inability to differentiate between supply and relief teachers, relief
teachers were also included.
Karen: ‘He’s old and boring. Always talking about history. Always! So we have fun! We throw
chalk. He can’t see properly so we aim at his bald spot—but I do work for our teachers
because our teachers would send us to the office or give us an essay.’
Jenny: ‘If rubber bands are handy, depending on who the supply teachers are, and if they don’t
know what’s happening, we fling rubber bands about.’
Further information on the criteria by which pupils judge whether to cooperate or not came
from Michelle, Karen, Chris, Scott and Troy who agreed that supply teachers don’t have as
much control because ‘they don’t know us’ or ‘they don’t know us that well’ so ‘we have fun’.
Teacher behaviours that elicit more work also elicit cooperation and so the pupils do not usually
have ‘fun’. The reverse was also true.
Karen: ‘If they aren’t strict the kids tend to play up. Also they don’t do their work—like, if they
get away with one thing they will try something worse and more daring, like starting a
rubber band fight.’
Pupil behaviour for the supply/relief teachers varies from that for classroom teachers, as
Chris said: ‘Children who muck up for the supply teachers, usually muck up for our own
teachers, but not as bad. Goody goodies and stiffs don’t muck up.’ A sense of fair play, of right
and wrong and of how far to go with supply/relief teachers entered into their interactions with
each other and the supply/relief teachers. They weighed up the risks involved. For example:
Chris: ‘I feel sorry for supply teachers to have to put up with us. We shouldn’t really muck
around. If we had a supply teacher I’d probably do my work because the teacher would
report back, but if the supply teacher has no control over the class and gives us no work,
I would misbehave and so would everyone else’.
Troy: ‘I wouldn’t act up if the supply teacher was a neighbour or a family friend ’cause they
might tell Mum.’
or Karen: ‘Sometimes you get a supply teacher in the face (with a rubber band). That’s not fair.’
If the supply teacher has a reputation with other classes, pupils listen to the rumours and ‘We
believe what they say’ (Jenny). They did more work or less work depending on the reputation
held by the supply/relief teachers. For known teachers (usually relief teachers), previously built-up
reputations led the pupils to act as they previously had acted with this same teacher [sic].
Karen: ‘Fun. It’s not something that happens every day. Natural instinct for when Mr Smith
walks in! Hey fun!’
The ‘fun’ in this particular classroom is orchestrated by the three ‘starters’ and spreads rapidly
in wavelike motion as each group joins in, if the supply/relief teacher has no control. If there is
no control, immediately after the ‘starters’ have begun some form of ‘fun’ (be it paper plane or
rubber band throwing), the ‘loudmouths’ and ‘fools’ join in followed by the others with the
exception of the ‘stiffs’. If the teacher has some measure of control, the ‘quiets’, ‘goody goodies’
and ‘squares’ are not as likely to join in. In talking about who joins in, informants said:
Karen: ‘Stiffs don’t. They say they might dob on you. A group at the front might muck up. Stiffs
and squares at the back don’t.’
Scott: ‘Karen, Michael and Adam (the ‘starters’) start it off. . . spreads very fast. The stiff tells
everyone to be quiet. She’s trying to work. She won’t join in. We don’t plan it. Someone
just talks to someone else and it spreads.’
Chris: ‘I have dares with friends to have rubber band fights. If they’re the same standard as me
they probably would—not with goody goodies.’
Despite all this, the children genuinely wanted to be controlled by their teacher, no matter
whether it was a supply, relief or classroom teacher in charge. Boredom from too easy work
resulted in ‘fun’ sessions, but too much ‘fun’ in turn led to boredom. Pupil behaviour was shaped
(as noted previously) by available opportunities, teacher reputation and behaviour, and concerns
for fair play. Who is involved in classroom ‘fun’ and how and why this involvement spreads
quickly is critical. These responses are shaped by teacher control which in turn relates back to
teacher reputation, behaviour, personal characteristics and management skills.
Using strategies which are known or come to hand, both teacher and pupil who are ‘thrust
together in enforced intimacy’ negotiate to develop working relationships (Beynon and Atkinson
1984, pp. 256-7). This negotiation, which takes place in two phases, is referred to by Ball (1980)
as a ‘process of establishment’. He defines this as:
an exploratory interaction process involving teacher and pupils during their initial encounters
in the classroom through which a more or less permanent, repeated and highly predictable
pattern of relationships and interactions emerges. (Ball 1980, p. 144)
According to Delamont (1983, pp. 112-13), during phase one the pupils observe their new
teacher ‘to develop a series of hypotheses about the kind of teacher’ they now have. The second
phase is an active one during which the pupils are ‘real horrible’, as they ‘muck up’ to discover what
parameters of control the teacher is seeking to establish over their behaviour and to find out if ‘the
teacher has the tactical and managerial supply skills’ to defend these parameters.
In this interactionist perspective, the focus is on emergence and negotiation. The idea of ‘process
of establishment’ gives an explanation for what goes on in the informants’ room when a supply/relief
teacher arrives. As noted above, the results of such pupil testing periods are not ‘foregone
conclusions’ (Ball 1980, p. 150). Only known teachers who had in some way already proved
themselves were safe from further testing. For them the pupils were willing to work. All unknown
supply/relief teachers (except some ‘known’ by reputation) were tested. Here the conventional
wisdom (‘most experienced teachers insist that the teacher must, if he (sic) is to survive, define the
situation in his own terms at once’) seems justified (Hargreaves 1972, p. 232).
This strategy has particular relevance to supply teaching, as most encounters are initial encounters,
and ‘long term establishment’ of classroom rules and routine, which ‘takes weeks to establish’, is
not possible.
What the supply/relief teacher must bear in mind is that teacher reputations are being formed
and will be passed on to other classes, who will react positively or negatively depending on the
reputation. The Year 7 pupils used their ‘fun’ to see how far a teacher would let them go. They
expected to be controlled. Supply/relief teachers who did not control them were immediately given
‘bad’ reputations. Beynon and Atkinson (1984, p. 261) showed that such ‘mucking up’ could:
The supply/relief teachers associated with the Year 7 class have indeed been tested. Not only
have they developed reputations but so have the three ‘starters’ (Karen, Michael and Adam).
Groups willing to join in the ‘fun’ (depending on the risk of punishment involved) were
identified as ‘fools’, ‘loudmouths’ and even the ‘quiets’ and ‘squares’.
Informants calculate the risks involved from reputations of teachers, previous associations, or
‘cues and information’ given out by the teacher ‘to the pupils the moment he or she walks into
the classroom’ (Ball 1980, p. 146). While having ‘fun’, informants and their cohorts were able
to find out how much noise or lack of manners (e.g., calling out) individual supply/relief teachers
would tolerate and how much or how little work they would have to do. All these points were
identified by Ball as information gathered by pupils in his study. Where one teacher may accept
certain behaviour as normal, another may feel this type of room behaviour is bad indeed. Hence
pupils are faced with differing teacher expectations. In the same way some pupils may see certain
teacher behaviour as unreasonable while others perceive the same behaviour as ‘quite within
the limits of a teacher’s role’ (Education Department of Western Australia 1981, p. 151). This
study showed the types of teacher for whom these informants do more work or less work. This
correlates highly with behaving or having ‘fun’. It also coincides with research showing teachers
who are liked or disliked by pupils.
The issue and expectation of teacher control remain central. The observation of Marsh,
Rosser and Harré (1978, p. 38) on ‘softness’ was validated by this study’s informants:
Being a soft teacher was seen to be one of the worst categories of offence. The pupils are
insulted by weakness on the part of those in authority who they expect to be strong.
For a supply teacher who has lost control of his or her class it is obvious that the subsequent
relationship is one dictated by the pupils, not the teacher. This situation results in little or no
appropriate learning taking place.
Labelling
Reputations were built up by the labelling of supply/relief teachers by pupils, as well as by the
labelling of pupils by pupils and teachers (classroom, supply or relief). Here it is important to note
that for the imputation of ‘deviance’, the interaction of two parties, labeller and labelled, is
necessary (Hargreaves 1976, p. 201). That is to say, deviance arises:
not when persons commit certain kinds of acts: it arises when a person commits an act which
becomes known to some other person(s) who then defines (or labels) the act as deviant.
(Hargreaves, Hester and Mellor 1975, pp. 3-4)
This process of labelling appears to be taking place in the informants’ classroom between at
least three different groups of labellers and labelled. These are:
Labelled                    Labellers
‘starters’, ‘fools’         supply/relief teacher
supply/relief teacher       pupils
‘starters’, ‘fools’         pupils
However, not all who are labelled take notice of the label or respond to the labelling in any
way. To some the labelling does not appear valid and may be discounted (Hargreaves 1976).
Factors influencing the acceptance (or otherwise) of the label include how often the labeller
labels the labelled, the extent to which the labelled values the labeller’s opinion, the extent to
which others use the label on the labelled, as well as the ‘public nature of the labelling’. Being
labelled in front of a class of pupils is more degrading and severe a punishment than being
labelled in private. Being repeatedly publicly labelled by respected persons can be seen as possibly
leading to some change in the person labelled.
Consequences of labelling can be seen as social control or as leading to deviance. Under
conditions of what Edwin Lemert (in Hargreaves 1976, p. 203) sees as primary deviation, where
the person who is labelled is deterred from repeating or committing acts seen as deviant, or is able
to justify or deny the actions (and hence is able to neutralise or normalise the labelling), the
labelling does not appear to cause a loss of self-regard or result in changing social roles. Under
these conditions labelling appears as a form of social control and hence can be viewed as having
‘positive’ results. However, where the person who has been labelled discounts the label (such as
when the labeller’s opinion is not valued), no social control results but neither does further
deviance result from the labelling.
Further deviance resulting from the application of labelling is viewed by Lemert (in Hargreaves
1976, p. 203) as secondary deviation. He defines secondary deviation as:
deviant behaviour, or social roles based upon it, which becomes a means of defence, attack or
adaptation to the overt and covert problems created by the societal reaction to primary
deviation.
When labelling fails as a form of social control but instead angers the labelled person, this
person may react by committing further deviance. His/her coping strategies for dealing with the
labelling, which cannot be normalised, may create a cycle of deviant acts from which the labelled
person (now deviant) cannot escape. The deviant may become stigmatised and feel victimised.
An example of this cycle of behaviour may be interpreted from information about incidents between
a supply/relief teacher and the informants five years prior to the interviews. The teacher
concerned, according to informants, was a bad teacher who could not control the class, who
threw chalk and blackboard dusters and locked children in a cupboard.
Perhaps these teacher actions were forms of survival strategies that the teacher felt he needed
to protect his job, but the children in the class labelled this teacher as bad and did not cooperate.
Hence the labelling was communicated to the teacher through pupil behaviour including lack
of cooperation. In striving to control the class the teacher was forced into more deviant behaviour
and no doubt felt victimised by the actions of the pupils. He was stigmatised by the pupils, who
treated the teacher badly and were in turn badly treated by the teacher. This is an extreme
example of teacher deviance.
Petrie (1981, p. 138) holds a similar view. The ‘starters’ live up to the negative expectations
of the supply/relief teacher as well as the rest of the class. Unfortunately, since the teacher and
pupil expectations are negative, deviant behaviour (negative self-fulfilling prophecies) results
from the labelling.
Changes in expectations of supply/relief teachers and pupils may change the amount of ‘fun’
had while these teachers are present. Ramsay, Sneddon, Grenfell and Ford (1983, p. 279) found
in their study that in successful schools ‘The children were rarely held to blame. . . Thus the risk
of low expectations and subsequent lowered pupil performance was minimised’. Hence they
concluded that ‘the level of the teachers’ expectations, both academically and socially, marked
off the ‘successful’ schools from the less successful’. This points to a need for high positive
expectations in the informants’ classroom. High expectations of the ‘starters’ by the supply/relief
teacher, high expectations of the supply/relief teacher by the pupils as well as high expectations
of the ‘starters’ by the rest of the class are needed.
Clearly this type of teacher is unlike those supply teachers in this study who ‘do not care’.
Hence, rather than concentrating on surviving in the classroom, the supply/relief teacher must
be aware of not communicating negative feelings or low expectations to his/her pupils.
From the descriptions given by informants some supply/relief teachers are preoccupied with
survival. As Dale (1977, p. 49) notes, ‘discipline is not necessary only to facilitate teaching, but
also for teachers to survive the classroom’. Woods (1979, p. 258) suggests that by ‘increasing
resources and/or lessening demands’, the concern over survival would be lessened. Some teachers
possibly are so preoccupied with survival that concern for the pupils is minimised and the
importance of high expectations not considered. For successful teaching high expectations must
be held.
Effectiveness of supply teachers
As the use of supply/relief teachers is seen as necessary to prevent interruption to other classes,
these teachers contribute towards successful or unsuccessful schooling. For successful teaching,
Woods’ suggestions (given above) could be implemented by the schools. Rumours generated by
pupils and supply/relief teachers must have some basis. More weight could be given to such
rumours to help weed out the ‘rotten eggs’, such as the supply or relief teacher who would not
only throw chalk and dusters at the pupils but locked some in the cupboard. Others to be weeded
out are those who have no control over the class. This lack of control can lead to labelling and
perhaps secondary deviation. Schooling which is seen as possibly creating deviants cannot be
viewed as successful.
Although bureaucracies are usually resistant to change, they can produce radical change.
Hence it is feasible to envisage a Department of Education empowered to radically alter the
status, working conditions and job opportunities of those teachers currently employed as supply
and relief teachers. Specifically, the current system within the Queensland Education Department,
whereby a Local Relieving Teacher (LRT) is attached to a school and is on secondment to other
schools, means that unknown LRTs can be brought in from other schools while the LRT at
his/her home school can gain a reputation as a bad or good teacher. With each new initial
encounter and consequent ‘sussing out’ this reputation can be confirmed. A bad LRT then is
possibly stigmatised and victimised by the pupils and in return labels pupils and possibly sets in
motion the process of deviation. Until each school is free to choose its own supply teachers this
situation is likely to recur. For successful schooling the bureaucracies need to empower each
school to choose its own supply teachers.
Final comment
Teacher reputations and survival are important in all the initial encounters that supply/relief
teachers are involved in. Prevention or lessening of the problems this causes lessens the labelling
of pupils and prevents the construction of ‘deviant’ pupils. Aids for supply teachers in the form of
resources and/or fewer demands on them could encourage the formation of high expectations
being held by the teacher for the pupils and vice versa. Clarification of the teacher’s role and
acknowledgement by the school, parents and Department of Education of the difficulty and
importance (not just baby-sitting) of supply teaching might lead to higher pupil cooperation and
less wasted time. Enabling each school to select its own supply teachers should result in pupils
facing known, competent teachers, who hold high and yet valid expectations (in the pupils’ eyes
too) and who thus help with successful schooling through appropriate teaching.
Notes
1 For example, Catsoulis, 1981; Connell, Ashenden, Kessler and Dowsett, 1982; Corrigan, 1979;
Delamont, 1983; Denscombe, 1985; Hargreaves, 1972; Jackson, 1968; Macpherson, 1983; Walker,
1987; Werthman, 1977; Willis, 1977; Woods, 1979, 1980a, 1980b, 1983.
2 The literature on the problems of ‘substitute’ teachers is extensive; there is, however, a dearth of studies
on pupils’ views of such teachers.
3 The ethnographic approach seeks to ‘describe a culture in its own terms’. A ‘mini-ethnography’ presents
‘the information shared by two or more people that defines some aspect of their experience’ (Spradley
and McCurdy, 1972): here, pupils’ definitions of a grade seven classroom.
4 ‘Fun’ is this group’s word for ‘stirring’. Occasionally some informants used ‘muck up’, but Scott’s
answer, when asked to label what he and the others were doing, was: ‘Fun! We don’t really call it
anything, but that’s what it is!’
References
BALL, S.J. (1980) ‘Initial encounters in the classroom and the process of establishment’, in Woods, P.
(ed.) Pupil Strategies. Croom Helm, London, pp. 143-61.
BEYNON, J. and ATKINSON, P. (1984) ‘Pupils as data-gatherers: mucking and sussing’, in Delamont, S. (ed.)
Readings on Interaction in the Classroom. Methuen, London, pp. 255-72.
CATSOULIS, C. (1981) ‘Teachers and students at Johnholme College’, in D’Urso, S. and Smith, R. (eds)
Changes, Issues and Prospects in Australian Education, 2nd ed. University of Queensland Press, pp.
130-6.
CONNELL, R., ASHENDEN, D., KESSLER, S. and DOWSETT, G. (1982) Making the Difference: Schools,
Families and Social Division. Allen and Unwin, Sydney.
CORRIGAN, P. (1979) Schooling the Smash Street Kids. Macmillan, London.
DALE, R. (1977) ‘The hidden curriculum for the sociology of teaching’, in Gleeson, D. (ed.) Identity and
Structure. Issues in the Sociology of Education. Nafferton Books, Driffield, pp. 44-54.
DELAMONT, S. (1983) Interaction in the Classroom. 2nd ed. Methuen, London.
DENSCOMBE, M. (1985) Classroom Control: A Sociological Perspective. Allen and Unwin, London.
EDUCATION DEPARTMENT OF WESTERN AUSTRALIA (1981) ‘Nature and extent of the discipline problem’,
in D’Urso, S. and Smith, R. (eds) Changes, Issues and Prospects in Australian Education, 2nd ed.
University of Queensland Press, pp. 151-70.
FOSTER, L.E. (1987) Australian Education. A Sociological Perspective. 2nd ed. Prentice Hall, Sydney.
HARGREAVES, D.H. (1972) Interpersonal Relations and Education. Routledge and Kegan Paul, London.
HARGREAVES, D.H. (1976) ‘Reactions to labelling’, in Hammersley, M. and Woods, P. (eds) The Process
of Schooling: Sociological Reader. Open University Press, London, pp. 201-7.
HARGREAVES, D.H., HESTER, S. and MELLOR, F. (1975) Deviance in Classrooms. Routledge and Kegan
Paul, London.
JACKSON, P. (1968) Life In Classrooms. Holt, Rinehart and Winston, New York.
MACPHERSON, J. (1981) ‘Classroom “mucking around” and the Parsonian model of schooling’, in D’Urso,
S. and Smith, R. (eds) Changes, Issues and Prospects in Australian Education. 2nd ed. University of
Queensland Press, pp. 143-50.
MARSH, P., ROSSER, E. and HARRÉ, R. (1978) The Rules of Disorder. Routledge and Kegan Paul, London.
PETRIE, S. (1981) ‘School structures and delinquency’, in D’Urso, S. and Smith, R. (eds) Changes, Issues
and Prospects in Australian Education. 2nd ed. University of Queensland Press, pp. 137-42.
RAMSAY, P., SNEDDON, D., GRENFELL, J. and FORD, I. (1983) ‘Successful and unsuccessful schools: A
study in Southern Auckland’, Australian and New Zealand Journal of Sociology, 19(2), pp. 279-304.
SPRADLEY, J.P. and MCCURDY, D.W. (1972) The Cultural Experience: Ethnography in Complex Society.
S.R.A., Chicago.
WALKER, J. (1987) Louts and Legends: Male Youth Culture in an Inner-city School. Allen and Unwin, Sydney.
WERTHMAN, C. (1977) ‘Delinquents in schools: A test for the legitimacy of authority’, in Cosin, B. et al.
(eds) School and Society, 2nd ed. Routledge and Kegan Paul, London, pp. 34-43.
WILLIS, P. (1977) Learning to Labour. Saxon House, London.
WOODS, P. (1979) The Divided School. Routledge and Kegan Paul, London.
WOODS, P. (1980a) Pupil Strategies. Croom Helm, London.
WOODS, P. (1980b) Teacher Strategies. Croom Helm, London.
WOODS, P. (1983) Sociology and the School. Routledge and Kegan Paul, London.
Author
Elizabeth Wood is a supply teacher for the Queensland Department of Education. John Knight
lectures in the sociology of education at the University of Queensland.
Source: Wood, E. & Knight, J., Unicorn, Journal of the Australian College of Education, Canberra, vol. 15,
no. 1, Feb. 1989, pp. 36-43.
Marietta Elliott
La Trobe University
During the first 6 months of the school year of 1985, at Brunswick Language Centre, I observed Nasr as he
was learning to write in his second language.
The most significant change which occurred is that Nasr gained an appreciation of the way in which English
written language is different from spoken language. That is, rather than merely recording his spoken language,
Nasr became a writer in English.
The changes manifested themselves not only in the product, namely the texts themselves, but also in the
processes by which they were produced. These processes can be both directly observed, as recorded on
videotape or in the observational diary, which was kept once weekly, or inferred from the product.
The major ways in which the last piece is more ‘developed’ are that Nasr has chosen a more ‘advanced’ genre,
and the piece conforms more strictly to one genre, rather than also containing elements of other genres.
Nevertheless, the earlier pieces mark important, transitional stages and I have therefore chosen to call these
intermediate forms ‘intertext’.
Nasr gains mastery over linking mechanisms more characteristic of written than of spoken language; he moves
from co-ordination to subordination, and through the use of reference and ellipsis, he gradually eliminates the
various forms of redundancy. Acquisition of form and function of the past tense is regarded as essential for the
production of sustained narrative and, as such, can also be viewed as a form of cohesion.
In Nasr’s case the changes in the writing behaviour include an increase in pause length and a reduction in the
number of pauses, changes in the number and type of revisions made, and differences in the way in which
input from the teacher is generated.
Introduction
I last saw Nasr in July, 1986, one year after he had left the Language Centre. He had changed
from the playful boy I remembered to a serious young man. His voice had deepened, and he had
grown several inches.
He was happy at Brunswick High School, and had just graduated from E.S.L. to mainstream
classes, where he had no difficulty participating. He was also studying Arabic, but he found the
work too easy.
He showed me two pieces of his current work: a fictitious interview and a fictitious newspaper
report. Not only was his expression virtually error-free, but he was using colloquial expressions
like ‘take care of yourself, kids’, and complex sentences such as ‘He was wearing an overcoat
even though it was a mild day’.
How had Nasr achieved all this in 18 months?
The study
I wished to observe students’ writing development in a situation where they would have
considerable choice both in topic and language, so that their writing could be as independently
conceived as possible, both with regard to content and with regard to the language selected.
The teachers at Brunswick Language Centre were enthusiastic about putting a writing
program in place; therefore all the students I have observed have come from this Centre. As
these students stay for a period of six months before moving on to school, and have intensive
English tuition for this time, six months is the time frame of the study.
Brunswick Language Centre is in inner-suburban Melbourne. It caters for about 60 students,
both ‘New Arrival’ and ‘Intensive’ (students who have been in Australia for some time but are
experiencing problems).
Four cases are currently being analyzed, of which one will be discussed in this report. The
recordings were made during writing classes, so that the individual observed remained within the
normal classroom setting. Each student was observed on a weekly basis. A videocamera was
trained on his or her script and moving hand. A microphone recorded any speech. Other
observations were recorded in a diary which I kept. All scripts from the writing classes were
collected. In addition, the case study students and their families were interviewed with an
interpreter present.
Using all these different techniques of data collection, I hoped to get an accurate and sensitive
description of students’ writing development in their second language.
Nasr
Nasr was 14 when he came to Brunswick Language Centre. He had completed 10 years of
schooling in Lebanon, and had already been introduced to Roman script through French. He
was very proud of his knowledge. The opinion of Muhammad Alman (the Arabic-speaking aide
at Brunswick Language Centre) of Nasr’s first-language writing was that he had good ideas, but
he made grammatical and spelling errors.
Nasr was living in Brunswick with his mother, sister, brother-in-law and their baby, and his
brother, who was then about 18. His brother had already been in Australia about a year and spoke
reasonably fluent English. His father arrived in Australia from Lebanon a few months after the
study began.
Nasr is a Lebanese Christian; religion and patriotism are closely linked and of supreme
importance to him. Margaret, one of his teachers, reminisced about that mixture of seriousness
and playfulness which is so characteristic of him:
He was so enthusiastic and lively, and he really enjoyed the writing. In fact all that class did.
He did take the writing seriously, although he wasn’t academic. (Pers. comm. 15.12.86).
awareness of the differences between oral and written communication affects the ability to
write well...
It is the growth of this awareness with reference to his second language that is seen as the most
crucial factor in Nasr’s development as a writer. Whereas at the beginning his writing seems to
be little more than recorded speech, by the end of the six months during which I observed him,
Nasr showed both by the texts he was producing, and by his behaviour while composing, that
he was aware of the demands of producing written rather than oral English.
Though Nasr produced only a few pieces of writing in his first language, it is clear that from
the beginning he had a reasonably sophisticated idea about written language in general. Firstly,
as he had learnt some French, he already knew the Roman alphabet. He learnt very early to use
the English alphabet orally when enquiring about spelling. Even in the early pieces he was using
rhetorical devices in English (My Contry).
However, in spite of previous knowledge and aptitude, many language-specific aspects have to
be relearnt and the writer must to some extent forge a new identity which incorporates the
experiences s/he has undergone. By the end of the first six months Nasr could not be expected to
have completed his development. Idea units in the last piece are in fact shorter than in some earlier
pieces, owing to the fact that he had not yet gained a clear concept of idea unit boundaries at the
earlier time. He has not learnt to revise at the organizational level. He has not written any complete
expository text (Martin and Rothery, 1981). However, he has written observation/comment,
report, recount and narrative. By the end of 6 months, he has a good appreciation of what is
required of narrative. He has gained mastery over various kinds of cohesive links, in particular
subordination, ellipsis and reference and the form and function of the past tense, essential to the
production ofsustained narrative. From the change in his revision strategies we can conclude that
he has gained an appreciation of the way writing, as distinct from speech, can be polished.
to features such as the following being more characteristic of written, rather than spoken English
(see Chafe, 1985:110 for a more complete listing):
use of passives;
use of nominalization;
subordination rather than co-ordination.
Writing is regarded as permanent, as an artifact, whereas speech is considered dynamic and
evanescent. Thus with writing, greater accountability, concern with ‘evidentiality’ (Chafe,
1985:118) is required. At the same time, writing can be altered, restructured, polished.
Successful writers have also learnt which ‘register’ or ‘genre’, as defined by Martin and Rothery
(1981:2), is appropriate to their particular communication task. Whilst there can be similarities between
some spoken and some written genres, they generally call for different structures and language.
It is true that modern technology, in particular the tape-recorder and the computer, is now
closing the distance between speech and writing, and that, as Beaman (1982:51) and Tannen
(1982:14) have pointed out, many of the differences outlined are a function of genre rather than
of medium. Nevertheless, Nasr’s English writing does gradually acquire many of the features
which have been presented here as characteristic of written language, and by his behaviour, for
example his revision strategies, he shows that he has gained an appreciation of the differences
between the spoken and the written medium. Therefore these features are considered significant
when we examine Nasr’s development as a writer in his second language.
Development of genre
In the discussion of Nasr’s development of genre, I will use the taxonomy provided by Martin
and Rothery (1981). Though it was devised for young English-as-a-mother-tongue writers, it may
be useful to see whether the system could be applied to second language learner writers. There
are, however, several problems inherent in Martin and Rothery’s approach, which I will present
here as unanswered questions. After examining Nasr’s development I will suggest that certain
modifications, or perhaps further amplifications need to be made to their concepts.
Martin and Rothery (1981) have given a sequential account of genres required at various
stages of schooling, starting with ‘observation/comment’, which then splits into two strands: a
‘narrative’ strand, consisting of ‘recount’, ‘narrative’ and ‘thematic narrative’ (for definitions, see
Martin and Rothery, 1981:11-12) and an ‘expository’ strand, which consists of ‘report’,
‘exposition’ and ‘literary criticism’. They accuse the system of valuing the ‘narrative’ above the
‘expository’ and claim that many never master this latter genre, because it is never explicitly taught.
The following questions remain unanswered:
1) Given that report and ‘embryonic’ exposition appear as early as year 2, what is the
significance of the order as presented?
2) What is the place of input in the developmental sequence, or is the sequence an order of
difficulty?
3) How do students move from one genre to another, given that, as Martin and Rothery
state, any topic or field might call for a number of different responses?
4) Why are mixtures, or ‘melanges’, regarded as so inferior, given that the genre outlines are
‘guidelines’ rather than ‘straight jackets’ (Martin and Rothery, 1981:47)?
5) What importance would they assign to development within the genre?
(Term II)
(9) 11/6/85 The yellow and the red observation/comment
(10) 11/6/85 My house observation/comment
(11) 2/7/85 New Name for teachers observation/comment
(12) 2/7/85 My Memories observation/comment
(13) 9/7/85 I like dogs poem
(14) 9/7/85 I like stars poem
(15) 14/7/85 draft 3 of My Contry (My Country)
(16) 18/7/85 Boy in the Sea narrative/observation/comment
(17) 18/7/85 The names observation/comment
(18) 6/8/85 War in my Village narrative
Note: The spelling in the titles has been slightly amended to facilitate comprehension.
I will now briefly discuss the schematic structure of the more important L2 pieces, four of
which are reproduced in the appendix.
The first recorded English piece, I like Australia (21/2/85), consists of a series of
autobiographical statements according to two basic sentence models, which could be regarded
as the written equivalent of what has often been described as a ‘formulaic’ stage of oral language
development (Nicholas, 1985), where Nasr is making use of ‘prefabricated structures’ (Hakuta,
1974:287), some of which have been provided by the teacher, and some found in a children’s
book (e.g. ‘I like boats’).
In the second piece of significance, a ‘Lie’: My School B.L is on the Moon (3/4/85), Nasr does
what Martin and Rothery (1981:52) have considered impossible: he plays creatively with a text
before he has mastered its genre. The idea was presented to him by the teacher. He combined
his knowledge of school vocabulary with his knowledge of food names, making teacher, students
and classroom into a kind of food or drink. Although the fictitious description he has created is
not coherent, Nasr has fun manipulating the few structures (basically the copula and ‘have’
constructions) and the little vocabulary he has mastered so far.
After this time he became dissatisfied with playing and with writing simple pieces and did not
write for almost two weeks.
My Contry (15/4/85) is extremely difficult to categorize. It contains elements of three different
genres: of report, in the information that Lebanon is being attacked by surrounding nations;
of observation/comment, as the author’s feelings are included; and of exposition, as Nasr has
attempted to provide a reason for Lebanon’s tragedy. The reason is more symbolic than logical.
Lebanon is seen as both small and excellent, therefore a threat to surrounding nations, just as
Jesus Christ was gentle and good, and so was crucified. It has been suggested (Robert Paths,
pers. comm.) that this piece could be a form of written prayer, which fits in with the fact that
Nasr attended a Church service just prior to writing it. Though it would still be based on spoken
language, it is a more formal spoken genre, and the model would have been in L1.
My Contry is regarded as a transitional piece, where elements of several genres appear in
preliminary form. This becomes clear when we look at the next piece, which is predominantly
a recount, with elements of narrative beginning to emerge: The first day I came to School
(22/4/85). Whilst Nasr and his fellow students are the main protagonists, there is also a small
crisis: the fear which the Teacher in Charge has inspired and the strangeness of the new situation:
Nasr thought ‘him came to hit me’. His fear is resolved, we are not told how, except that Nasr
refers incidentally to the passing of time in the coda (‘and now . . .’). Now students and teachers
alike are remembered in his prayers.
When we look at Nasr’s activities and output early in Term 2, we are reminded that
development does not occur in a nice linear progression, but in fits and starts, with long fallow
periods. Nasr retreats into observation/comment, with short pieces about his house, his
favourite colours, using more ‘formulaic’ sentences about his teachers, his classmates and
‘memories’. Some of these constructions are suggested by the teacher, some by classmates or by
student publications. One genre which lends itself well to ‘prefabricated structures’ is poetry, and
Nasr takes to this with enthusiasm. He had previously written down some songs and poems that
he knew in Arabic and French, and now adapts children’s rhymes and follows his teacher
Margaret's suggestions to write little poems.
Boy in the Sea (18/7/85) marks a tremendous leap forward. It is predominantly a narrative,
as it has a crisis (a boy is drowning in the sea) and resolution (Nasr saves him). At the beginning
it sounds more like a recount: Nasr and his family go to the beach. The actual story is fictional,
but the excursion to the beach is probably true. This is perhaps why Nasr has trouble integrating
the two parts of the story. He realizes the need for some conclusion but, not sure what is
appropriate, reverts to observation/comment:
‘I was happy and I like to go their (there) and onther (another) time’. (D.1).
War in my Village (6/8/85), written only three weeks later, does not suffer from these
problems. It is a pure and unadulterated narrative. There is an introduction which sets the scene.
Nasr is contrasting the external conditions, the rain and the cold, with his cheerful mood (‘It was
a good morning but it was raining and very cold’). He establishes an atmosphere of cosiness
Language
The development of cohesion
The ability to create sustained written text independently is regarded here as the most vital skill
for any writer to acquire. One way of observing the acquisition of this skill is to trace the
appearance of various types of cohesion. Halliday and Hasan (1976) have provided a useful
framework. Though a later version of cohesion analysis has been published (Halliday, 1985a),
the 1976 framework has proved productive for the current description. Halliday and Hasan
(1976:13) define cohesion as follows:
The concept of cohesion accounts for the essential semantic relations whereby any passage of
speech or writing is enabled to function as text. We can systematize this concept by classifying
it into a small number of distinct categories—reference, substitution, ellipsis, conjunction,
and lexical cohesion . . . Each of these categories is represented in the text by particular
features—repetitions, omissions, occurrences of certain words and constructions—which have
in common the property of signalling that the interpretation of the passage depends on
something else. If that ‘something else’ is verbally explicit, then there is cohesion.
Of the various forms of cohesion, only reference and conjunction will be considered here.
Reference is a way of directing the listener or reader to an item which has either gone before
(anaphoric reference) or which is to come (cataphoric reference). The principal linguistic expressions
of reference are personal pronouns, possessive adjectives, deictics and the definite article: e.g.
Do you know John? He’s a friend of mine. Reference may be to something else in the text
(endophoric reference) or in the situation (exophoric reference) (Halliday and Hasan, 1976:145).
Conjunction is a way of ‘relating to each other linguistic elements that occur in succession but
are not related by other structural means’ (Halliday and Hasan, 1976:227). They can be paratactic
or hypotactic (Halliday and Hasan, 1976:322), or we could call them co-ordinating (e.g. ‘but’) or
subordinating (e.g. ‘although’).
The most marked changes in the development of cohesion in Nasr’s writing have occurred in
the areas of Reference and Conjunction. Though Halliday and Hasan are mainly concerned with
relations between sentences, at Nasr’s stage, the development of cohesive links takes place
predominantly, though not exclusively, within sentences.
Nasr has two tasks ahead of him, firstly to find the appropriate linguistic way to create links
in English, for example to acquire the standard form of the personal pronoun, which is clearly
an ESL function. Secondly he must discover the form of linking which is characteristic of written
rather than spoken English (see discussion, page 124). The second task is similar to that facing
young English writers (Beaman, 1984:50).
Reference
21/2/85 (school) chis name Bronsvik languiche cintre
25/2/85 I like boats because the boats its verry beautufoul
4/3/85 they body is lemons
the chair is 25 kls and she salad
the room have six window and she (changed to be) is beans
22/3/85 I like red because red...
I like yellow because the coulour.. .
I like cat and bird because the animals . .
I like soccer because it
I like teacher because—— teacher me
12/4/85 (translation) I love Jesus and evryone love Him His protect evryone
15/4/85 My contry was lovely contry it was...
My contry is... but his is...
22/4/85 Miss R. I like she because she...
Mr Bill I like him...
I like my school because she is a moon
11/6/85 I like this coulour because the red one it’s My books coulor its a happy coulor evry day and I
like evryone and happy and the yellow colur I like it (‘s’ crossed out) because it’s my best
clothes and it’s happy too.
(my house) . . . it’s a good cat...
the dog is an Alsatian I like him very much I like my house because I was born in it.
2/7/85 Mr bill he love all the studints
Miss Phon she like the camera and she love all the studints too
2/7/85 I remember...
and when I remember this things i cry and i wand from god to protect all Mi memories.
18/7/85 . . . because it was a very hot day . . . my father and my mother they make the food . . . saw a
boy shouting help help . . . I arrived to him and talled him give me your hand he gave me his
hand and I pull him to me. . . I got him and I went to the beach and I talled my parent what
happened.
6/8/85 War in my Village
Something was happened one of my village dead . . . my village yong they got the shooting
gun and they went to protect the village and we went to my grandfather’s house to stay their.
My cosins talled us that someome killed that mam (man). Suddnely we hear someone is
shooting so that was the yongs how had the shooting gun they found the killer and that’s all
my story and it’s true.
Note: Only endophoric references have been underlined because they are considered most important for writing development.
It is this form of reference which allows the reader to rely only on the text, rather than on outside information, as is the case
with exophoric reference.
Conjunction
22/2/85 (a) beacuse (because) I wath (watch) Saturday and Sunday
‘and’ as co-ordinator
22/2/85 (b) I like boats beacuse its verry beautufoul
4/3/85 ... and the hair is jelly
... and the cloose (class) is water
. and one with eye
. and she salad
. and verry smale
. and she is beans
22/3/85 I like red because red is very hot. (Model provided by teacher)
12/4/85 I write story of Jesus because...
I love Jesus and evryone love Him because His protect evryone and when we go and when we
mouve the Jesus protect we (us).
15/4/85 ... but now ...
... because Lebenon ...
... why? because ...
... but his is the good
and is the lovely ...
... an the end ...
22/4/85 but his the boss. . .
and when his open the door
and his look
and said
after I ask him what your name
and after 10 minits
and evryone came
and the first teacher
and her name is...
and she start the first lesson
and she asked...
and now I pray...
22/4/85 (b) because she is very happy . . .
when she teach...
but | like him...
because she is a moon (written in Arabic first)
11/6/85 (a) I like this coulour because...
and it’s a happy coulor...
and I like
I like it because
and it’s happy...
11/6/85 (b) ... cat and a dog ...
but the dog is Alsatian...
I like my house because I was born in it
2/7/85 and when I remember this things I cry.
I wand (want) (from) god to protect all mi memories.
9/7/85 A star that is shooting across the dark sky.
A dog that is dreaming very still.
14/7/85 Lebanon which I means my country. (My country D.2)
18/7/85 ... because it was a very hot day.
and my father and my mother they make the food.
suddnely...
...and I arrived to him
I talled him give me your hand
and I pull him to me
and I went to the beach...
and I talled my parents what happened.
and I like to go to their onther (another) time.
(the beach) which were my parents ther . . . (D.2)
6/8/85 when | waked up.
but it was raining and very cold.
Suddnely when we siting near the fire. . .
and they went to protect the village.
and we went to my grandfather’s house,
because my father was not home.
and my cosine talled us
that someone killed that mam (man)
. . So that was the yongs,
how (who) had the shooting gun.
and that’s all my story,
and it’s true.
In talking about Nasr’s development of conjunction, I shall be mainly concerned with actual
conjunctions and these mainly within the sentence, though sentence boundaries are not initially
clear. Nasr, in fact, is initially under the impression that conjunctions can start a sentence.
I have included in the discussion of ‘conjunction’ the use of relative pronouns, though strictly
speaking these are in a different category, as the links between clauses are structural. However,
the aim is to demonstrate Nasr’s growing ability to combine idea units. He progresses from using
conjunctions to using constructions with relative pronouns, which are more ‘integrated’.
In the first piece (21/2/85) no conjunctions are used at all. ‘And’ is used as a co-ordinator:
here it has an additive rather than a causative function. However, the next day (22/2/85) Nasr
was using it in the correct way, having found a model in a child’s book which he has slightly
adapted:
‘I like boats because it’s verry beautufoul’.
4/3/85—Many units are not joined by any conjunctions at all but ‘and’ is used six times.
22/3/85—The following pattern was provided:
‘I like . . . because . . .’
12/4/85—Nasr translated an L1 report of a church service. In this translation we find not only
‘because’ but ‘and’ as an additive conjunction together with ‘when’ (temporal). Firstly, in this
translated section, Nasr has used two new conjunctions. Secondly, he has integrated more clauses.
The second ‘because’ clause is dependent on the first one and there are two ‘when’ clauses
dependent on the main clause. Through translation, Nasr has been able to use several new
elements. These have featured in his subsequent writing.
By 14/5/85 Nasr is learning to combine clauses. Four is the highest number there:
It was the good contry of all contries. but now my contry is not good because Lebanon I
means my contry the israil soldier shooting the libanes soldier and the people in my contry
no just israil and syria, evryone shoot my contry.
He does not have control over the linking process and the effect here is somewhat meandering.
Nevertheless this is developmentally a very significant step, because he is already attempting to
create longer idea units.
22/4/85—While the predominant conjunction is still 'and' (7x), 'after' (temporal) appears for
the first time. At this stage, Nasr is using ‘and’ together with other conjunctions such as ‘when’
or ‘after’. It is not clear why he does this—though it is a very common characteristic of spoken
language (e.g. ‘and I said, and then he said and... ’).
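Frequency counts of this kind (e.g. 'and' used six times on 4/3/85) can be tabulated mechanically. The short sketch below is an editorial illustration rather than part of the original study; the inventory of linking words is an assumption drawn from the forms discussed in this section.

```python
import re
from collections import Counter

# Assumed inventory of the linking words discussed in this section.
LINKING_WORDS = {"and", "because", "but", "when", "after", "so", "which", "that"}

def conjunction_profile(text: str) -> Counter:
    """Tally how often each linking word occurs in a writing sample."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in LINKING_WORDS)

sample = ("It was the good contry of all contries. but now my contry is not "
          "good because Lebanon I means my contry the israil soldier shooting "
          "the libanes soldier and the people in my contry no just israil and "
          "syria, evryone shoot my contry.")
print(conjunction_profile(sample))
# Counter({'and': 2, 'but': 1, 'because': 1})
```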
The next 'integration' feature occurs on 9/7/85 in the little poems, the model for which was
a children's book:
'A star that is shooting across the dark sky. A dog that is dreaming very still.'
Once again, Nasr has used 'prefabricated structures' to experiment with new forms.
On 14/7/85 and 18/7/85 Nasr tries to link clauses using the relative pronoun:
14/7/85 ‘which I means my country’ (Draft 2)
18/7/85 '(the beach) which were my parents ther' (D.2)
'Where' is in fact a combination (even in form) of 'which' and 'there'. On 18/7/85 the first link
beyond the sentence ‘suddnely’ also appears. By 6/8/85 (the final piece) his use of the relative
pronoun is closer to target (except for spelling):
'. . . how (who) had the shooting gun'.
This piece is remarkable for the number of units which have been ‘integrated’.
My village yong they got the shooting gun and they went to protect the village and we went
to my grandfather's house to stay their because my father was not home
Nasr retains control of this very long sentence. Everything is happening at once, people are
scattering in all directions. In addition, in other parts of the piece, linking has occurred beyond
the sentence level as in ‘Suddnely, when...’ and ‘so that was...’.
Development of conjunction can be very broadly characterized as going from co-ordination
to subordination. However, within this continuum there are further refinements, in the degree
of ‘integration’ or connectedness between the units being joined together. The ‘additive’ is the
simplest one. Though ‘because’ is used, the relation expressed is an additive one (21/2/85).
Next is ‘because’, the ‘causative’ relation which seems out of place but may be explained by
the fact that teachers are very fond of asking ‘why’. The next is the ‘adversative’, ‘but’, then
comes the 'temporal', both 'co-ordinating' and 'subordinating' as in 'after' and 'when'. Finally
we see some linking at a more profound structural level than conjunction, such as ‘which’ and
'that' and links beyond the sentence such as 'So that . . .' and 'Suddnely . . .'.
Form and function of the past tense
As is the case with other features, Nasr faces two separate tasks simultaneously. It is this
simultaneity which distinguishes second language writing development from first language
writing development. One task is to maintain consistency in the use of the past tense. The other task is
to learn how the past tense is formed. Though Nasr has progressed in both these areas by the end
of six months, mastery is by no means complete.
In the whole of Nasr’s output, there are only 4 pieces featuring the past tense at all. By the
time he tried a recount he understood what the past tense was for and he had some idea of how
to form it. On 15/4/85, though he used only one past verb form, the copula, and he did not
attempt to sequence past events, he did contrast a past and a present state.
One week later, on 22/4/85, Nasr’s knowledge was clearly at a transitional stage. He knew
there are two different ways of forming the past tense, and he was sure of ‘asked’, which is used
correctly throughout. Other verbs he was unsure of, and he tried out different forms, which I
would like to claim is not a random phenomenon but is a sign that he had narrowed down his
hypotheses about the form and was awaiting confirmation. These include: ‘said’ where he has:
(Draft 1) ‘said’, ‘said’, ‘say’ and ‘is say’
There are several verbs that remain in an incorrect form; no variation is found in their forms.
It is reasonable to assume that Nasr has no basis on which to determine whether these forms
are correct or not. For example, both weak and strong verbs are represented.
By 18/7/85 Nasr has added several new forms to his repertoire. There are many more forms
that are secure, there is far less of what I have called experimentation. Common strong forms such
as ‘went’, ‘saw’, ‘swam’, ‘got’, and ‘ate’ (Draft 2) and the weak form ‘happened’ all remain correct.
‘Started’ gets the ‘ed’ added on at a later stage.
By 6/8/85, while he is not completely confident, and not all the verbs used are in their
conventional form (e.g. 'was happened', 'talled', 'waked') he has control over a variety of verbs
in the past tense (‘got’, ‘killed’, ‘went’, ‘had’ and ‘found’). He does revert to the present tense at
one point
Suddnely I hear someone is shooting, so that was the yongs . . .
However, a single example provides insufficient evidence for any conclusion as to intention.
Strategies
Thus far attention has been focussed mainly on the text. Even so, the aim of the analysis has been
to gain some understanding of the processes underlying Nasr’s writing development in his second
language. One strategy Nasr seems to be using is what I have called ‘the exploration of variation’,
where Nasr has experimented with different possible forms, whether it be the personal pronoun,
or the past tense.
Several other strategies may be inferred from the texts. One very important strategy which has
received great attention in relation to oral second language development (e.g. Kellerman,
1978:59; Pit Corder, 1978:90) is the use of L1. Another strategy which may be observed by
comparing different drafts of Nasr's pieces, is revision. The drafts must be looked at in conjunction
with videodata, as Nasr carries out many of his revisions on the run, much like repairs in
conversation.
When asked, ‘If you don’t know how to write something, how do you find out?’ he said the
dictionary was the best way.
He also uses translation, though not very often, to access language which is developmentally in
advance of the stage he has otherwise reached. The construction he found remains in his
interlanguage, to be used at first in a formulaic way, then gradually assimilated. There are two instances
where a translation strategy was used, both from My Contry (15/4/85). In the first instance, it is
inferred. In the second, there is direct evidence from videotape and from translation of an L1 text.
... and now Lebenon crucified on the cross.
Though this item, the only instance of the use of the passive voice, was produced through
interaction with the teacher (see page 148) it must have been conceived in L1, firstly because it was
a metaphorical rather than a literal use of the concept, secondly because he had written an L1
version of the Easter service so that he had been working on the whole concept in L1 (12/4/85).
I pray to Jesus to protect my contry.
This item is more significant, though less specialized, because it is gradually integrated into
Nasr’s interlanguage. Initially it is used as a formula, subsequently he is able to manipulate
smaller units and recycle them.
We can find the origin of this construction in the L1 piece previously mentioned: a report of
the Easter service (12/4/85). Nasr translated part of this piece himself. I was not present so do
not know if he used the dictionary.
The self-translated text reads as follows:
I write now story of Jesus because I love Jesus and evryone love Him because His protect
evryone. and when go and when we mouve the Jesus protect we.
He could not translate the rest, nor could several puzzled Moslem translators. Some months
later we finally discovered by accident that the text consisted of the Lord’s Prayer and the
Catechism, and Nasr was delighted. He spent some lessons typing out the English version
(29/7/85).
The section Nasr left in Arabic (subsequently translated by Muhammad Alman) reads in part:
I ask Jesus Christ to protect Lebanon for us from all those troubles.
As well as evidence from similarities in L1 text and self-translation, we have evidence recorded
on videotape (15/4/85) that Nasr actually looked back to the L1 piece during the composition
of My Contry not once, but several times.
Table 5 Nasr’s subsequent use of the construction: ‘I pray to Jesus to protect my contry’.
22/4/85 . . . and now I pray to Jesus to protect my school and my teachers and my freinds
(sic.: Draft 1)
22/7/85 I wand (from) god to protect all mi memories
(Note the time lag—3 months)
29/7/85 And at the end I pray for Jesus to protect my contry and my family and all the world too.
(‘Jesus story’—English version of Catechism)
6/8/85 they went to protect the village.
We can see from these examples that both the vocabulary item and the structure have
remained. Though Nasr has not made extensive use of translation, these two instances are most
productive for his development.
Revision
Revision is seen by Graves (e.g. 1983:151) as being at the heart of the writing process and
essential to success as a writer. However, the ability to revise involves an understanding of the
differences between formal written prose and speech. As speech, at least in terms of the students’
experience, is generally more ephemeral than writing, revision of the complete spoken text is not
possible (though repairs of course are). Initially, when Nasr is doing little more than recording
his oral language, he is not concerned with revision, nor can we expect him to be. Only gradually
does Nasr’s whole concept of revision change, as he becomes more confident at manipulating
English written forms.
There are two places where Nasr's revision strategies may be observed: (1) the videodata,
where he shows evidence of making running repairs, and (2) where multiple drafts of the same
piece exist.
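Where multiple drafts of a piece survive, between-draft changes can also be surfaced mechanically before being interpreted. A minimal sketch, assuming word-level comparison is adequate; the two fragments are abbreviated from the revision examples listed below and are illustrative only:

```python
import difflib

# Two hypothetical draft fragments, abbreviated from the D.1 revision
# examples listed below; each word is treated as one unit of comparison.
d1 = "and quickly I swim to him".split()
d2 = "and suddnely I swam to him".split()

# '-' marks words removed from D.1, '+' marks words added in D.2.
for line in difflib.unified_diff(d1, d2, fromfile="D.1", tofile="D.2", lineterm=""):
    print(line)
```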
It would be interesting to compare the phenomenon of running repairs in writing with that
of ‘repair’ in speech, although this is beyond the scope of this paper. One important difference
in speech is that repairs are often the result of interaction with another person, whereas in writing,
it is the author himself who provides the impetus. The increase in scope of his revisions, and the
(3) 22/4/85 The First day I came to school 2 drafts (no videodata)
D.1 to D.2: (see page 137)
18/7/85 Boy in the Sea 2 drafts
Revisions within D.1
we went—we arrived (see Input)
quickly (orally)—suddnely
I swim (orally)—I swam
I sa (crossed out)—I talled him
I—he gave me his hand
after (crossed out)—and I pull him to me
and (crossed out)—onther (another) time
And I talled my parents what happened—Which were my parents ther . . . We ate and we went home
To ther—To ther
After they (erased)—Because my father was not their (inserted before 'after my cosins . . .')
(Note: D.2 is the same date as D.1 unless otherwise stated)
The teacher, Rosemary, provided instructions about revision and drafting on 21/2/85.
However, Nasr did not start revising independently till 15/4/85. He may have remembered and
stored the instructions, but at this stage he was rewriting rather than revising, as he did not look
at his first draft and the only change, 'palastin'—'palestan', corresponds to a change in his
pronunciation.
22/4/85 The First Day I Came to School
While there are marginally more verbs in the past tense in D.2 (2 more, though D.1 has 'is
came') Nasr mainly seems to be experimenting with different forms rather than consistently maintaining the past tense.
14/7/85 The availability of typewriters in term two was strong motivation for revision. Nasr
returned to a much earlier piece (My Contry) and revised it, adding elements which improved
cohesion ('and'—'but'), correcting spelling or syntax (e.g. 'contry'—'country') and
adding detail (e.g. 'crusifid like Juses').
18/7/85 Boy in the Sea has the most intensive revisions, both within D.1 and in D.2. Here
Nasr is concerned with word choice, such as asking for 'arrived' rather than 'went' and 'suddenly'
rather than 'quickly'. At the same time we see the growth of 'within draft' revisions, as Nasr
develops the ability to stand back from his writing during the composing process itself and
anticipate the reader's needs.
For example on 6/8/85 (War in my Village) he changes 'they' to 'my cosins' and adds 'because
my father was not home' as an explanation of why the children did not go to their own home
during the shooting.
The past tense continues to feature prominently in revisions, an indication that whilst Nasr
is aware of the need to be consistent, he still has not mastered the form completely.
Nasr has developed in the following ways:
— Some of the processes of revision have become more automatic.
— Many of the revisions are now concerned with the needs of the reader and with adding
elements more characteristic of written English.
— He is now able to select from alternative wordings in his repertoire (or his teacher's
repertoire, see Input page 148).
Pauses
Through observation of Nasr's behaviour while composing, we can come just a little closer to
an understanding of his actual processing strategies. By comparing the number, length and reason
for the breaks he takes during the composing of an early and a late piece, we can also observe the
development which has taken place in his language processing capacities.
My Contry
was love (3)
ly contry (3)
before
100 years ago
It was the good contry of all contri (20) ies
(asks the teacher for plural of 'country')
but now (15)
my contry Is (3)
not good (26)
because (8) (writes ‘n’ and then erases it)
Lebenon (13)
I means my contry (14)
the Israil soldier shooting the libenes soldier and the people in my contry no just (14)
Israil (5)
and palistin (3)
and syria (22)
evryone (19)
shoot my contry (9)
(corrects spelling—adds ‘r’)
and now (4)
Lebenon (53)
(See Input, page 148—Teacher helps with ‘crucified’)
Why? because my contry is the smaller (4)
of all contries but his is the good and is the lovely of all the contries (3)
and the end (4)
I pr (5) ay for Je (3) sus (4) to protect my contry.
War in my Village
it was a good morning when | waked up (32)
but it was raining (21)
and very cold (32)
suddnely when we (5)
siting near the (3)
fire (69)
something was happened one of my village die (11)
dead (7)
my uncle talled us (38) (interruption)
my village
yong they got the shooting gun and they went to protect the village and we went (12)
to my grandfather’s house to stay their (17)
(writes 'after they', erases it) (5)
because my father was not home after (12)
my cosins talled us that someone killed that mam (man)
suddnely we hear (38)
someone is shooting so that was the yongs (11)
(adds ‘s’)
how had the shooting gun they found the killer (13)
(see Input—asks teacher about ‘killer’)
and that’s all my story and it’s true.
Although there are no videotapes in existence for pieces before 15/4/85, it seems reasonable
to infer from Nasr's early invented spelling and from the fact that he consistently subvocalizes
while writing, that he initially concentrated on merely recording his oral language. On 21/2/85
he wrote ‘Bronsvik languiche cintre’ for ‘Brunswick Language Centre’, although he had already
learnt some words as ‘gestalts’: e.g. ‘father’, ‘mother’.
By 15/4/85 he was already beginning to censor himself as he wrote. The most common reason
for a break is that he is unsure of spelling: e.g. 'lovely'—he said the word three times, and he
initially left out 'r' in 'contry' and went back to put it in.
During the composition of My Contry (15/4/85) Nasr stopped more frequently, but for less time,
than during War in my Village (6/8/85):
Table 8 Pauses in two texts [table data not legible in this reproduction]
The length of pauses is more inconsistent in the earlier pieces because Nasr stopped when he
had some difficulty. Otherwise he wrote steadily, keeping pace, word for word, with his
subvocalization.
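Pause counts and lengths of this kind can be tabulated directly from the annotated transcripts. A minimal sketch, assuming (as in the transcripts above) that the parenthesised numbers are pause lengths in seconds; the fragments and summary values are abbreviated and illustrative only:

```python
import re
import statistics

def pause_stats(transcript: str) -> dict:
    """Summarize the bracketed pause annotations, e.g. '(20)', in a
    composing transcript (assumed to be pause lengths in seconds)."""
    pauses = [int(n) for n in re.findall(r"\((\d+)\)", transcript)]
    return {"pauses": len(pauses),
            "mean length": round(statistics.mean(pauses), 1),
            "longest": max(pauses)}

early = "was love (3) ly contry (3) of all contri (20) ies but now (15)"
late = "when I waked up (32) but it was raining (21) and very cold (32)"
print(pause_stats(early))   # {'pauses': 4, 'mean length': 10.2, 'longest': 20}
print(pause_stats(late))    # {'pauses': 3, 'mean length': 28.3, 'longest': 32}
```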
In the later pieces there is more evidence of forward planning, as Nasr will say a whole phrase
or clause before writing, and he also reads aloud what he has written. He uses a rising
intonation rather like taking a running jump, to create a link between what he has written and
what is still to come.
Further evidence of more extensive planning at the later stage can be seen in the positioning
of pauses, which now coincide more frequently with idea unit boundaries. The first sentence of
each piece provides a clear illustration of this point (slashes mark pauses):
(My Contry)
My Contry/was love/ly/contry before/100 years ago/it was/the good contry of all contr/ies
(War in my Village)
It was a good morning when I waked up/but It was raining/and very cold.
Nasr initially used production strategies more characteristic of speakers than writers in that
he worked in very short 'message chunks' (Gumperz et al., 1984:8) and showed little evidence of
forward planning.
The video data have provided evidence of the development in Nasr’s language processing
capacities in English in that he became able to plan and revise increasingly large chunks. This
would seem to be an essential pre-condition for the production of written text.
By 18/7/85 he is using circumlocution in order to obtain the words ‘arrived’ and ‘suddenly’.
NASR: We went... we came, not we came, we drove in the car, and after we...
T: Arrived?
N: Yes, arrived.
N: Miss, what's this called . . . quickly! quickly!
T: Suddenly?
N: (as he writes) Suddnely ...
In D.1, Nasr wrote ‘sea-clothes’ but he knew this was not the conventional word, so that
when he was writing D.2 he asked:
Miss, how do you say ‘sea-clothes’ . . . ?
He also knew that ‘weared’ is not the right form of the past tense of ‘wear’, but he was quite
content to write it in D.1. Later he asked:
What is the past tense of ‘wear’?
By 6/8/85 he is able to describe quite precisely the context in which the item is required. He
is using the teacher to confirm a hypothesis:
NASR: Miss who killed the man... if I said one killed, and the people found . . . the killer?
Here Nasr is actually demonstrating the process of turning oral language into language more
appropriate to formal written text: the transformation of verbs into nouns, or nominalization.
There are possibly personality factors involved in the extent to which students are able to
make use of a relatively unstructured situation. Nasr seemed to benefit; he is an outgoing person,
who used 'message expansion strategies' (Pit Corder, 1983:17) from the beginning. Initially,
while his knowledge of English was limited, he used gesture, which is still a very effective method
in some cases. Then he went to circumlocution and to neologism, and finally he was able to
specify quite exactly what he needed, either by providing the exact context or by using
grammatical terminology for the required item.
Some input from teachers took a long time to percolate, for example the notion of revision.
This does not imply that the input should not be provided, but that immediate reaction should
not be the only criterion for something having been learnt. It might also suggest that, whilst the
relationship between developmental sequence and input is still not clear, it should be taken into
account when providing input. For example, students must learn to co-ordinate before they can
subordinate, and they cannot revise their writing if they are still struggling to record their speech.
Premature demands may well be detrimental, as they could engender a feeling of failure.
It is encouraging that Nasr seemed to be able to generate the input he needed at the
appropriate time, and that he was able to apply knowledge gained elsewhere without being
specifically instructed in the writing class.
Conclusion
At the beginning, I posed the question: How had Nasr achieved all this? Nasr’s achievements may
be summarized as follows:
1) In respect of genre, he followed Martin and Rothery's (1981) sequence part of the way.
He learnt gradually to differentiate the school-based genres of observation/comment,
recount and narrative.
2) He gradually built up a repertoire of cohesive links in English which enabled him to move
from writing a series of statements to creating written text. The ability to create more
‘integrated’ text was associated with an increase in his English language processing
capacity, which enabled him to combine larger chunks of written text.
3) He learnt to revise. Initially he did not revise, as he was occupied with recording his oral
language. Next he became concerned with matters of spelling and syntax, and finally came
to a limited understanding of style options. He also learnt to revise on the run, to stand
back during the composing process itself.
I suggest that these are necessary skills to master, especially for school-based writing.
Nasr also had something to say. Acquiring standard forms of expression may make the writing
more accessible, but not necessarily more interesting. I confess to a passing nostalgia for the non-
standard eloquence of Nasr’s early work (e.g. My Contry), with its repetitions, grammatical
parallelism of ‘no just israil and palistin and syria evryone shoot my contry’ and the religious
fervour of ‘I pray to Jesus to protect my contry’. Nasr learns to cut all that sort of thing out, he
learns to play the English game.
When we come to how Nasr did all this, we are left with two problems which, for the
moment, must remain unsolved:
1) It is not clear at this stage what the effect of the differences between the Arabic system and
the English system—both the school system and the text system—might be.
2) Though it is known that Nasr had basic literacy in his L1, it is not known precisely how
sophisticated his understanding of written language was in his L1, so that it is not entirely
clear which concepts involved transfer of L1 knowledge and skills, and which involved
cognitive advances. I did not specifically ask him to compose in L1 because I wanted to
investigate under what circumstances he would do so. There will be data available from
other cases in respect of this particular question.
The study has generated the following hypotheses:
1) Nasr learnt because he was not interfered with and was allowed to experiment. He was
not corrected unless he asked to be. The question of whether he would have been
successful in a more structured classroom is beyond the scope of this study and must
remain unanswered.
2) Factors which influence the choice of learning strategy include:
— personality;
— linguistic distance between L2 and L1;
— previous literacy skills.
Nasr chose to use his L2 oral language as a basis for his L2 written language, though he used
his L1 at times. Other students may choose a translation strategy. This may involve a different
developmental path. After observing Nasr and other students learning to write in their second
language for the past two years, I am convinced that the choices are best left in the hands of the
learners themselves.
Appendix
Typed version of four of Nasr's texts
I LIKE AUSTRAlla
I'me NASR NAbbaut I'me gaw scoule chis NAme Bronsvik languiche cintre I live 12 osborne
st Me House Not fraway of the scoule
I'me libanise. I'me live and my brother together and My father cisty in-laws and Mother
My schoul B.L is on the moon
My schaul bieng one from big schouls is on the moon. My school is building from chocolate,
the teacher is tomato the table is water melon, and the hair is jelly, the books is pineapple the
pens is cucumbers the bag is lions the rubber is soup and the cloose is water.
.the students have two very big head and one with the eye, the jeans is Nember 200 and 250
kls there are three hand, 50 fingers and 5 leg. they body is lemons, the chair is 25 kls and she
salad. the window is oranges and very smale. The Room have six window and he is beans. two
blak.board from the plum.
My COnTry
My conTry was lovely conTry before 100 years ago iT was The good contry of all contries. but
Now my contry in Not good because lebanon I means My contry The israil soldier shooting The
libanes. soldier and The people in My contry No Just israil and palistin and syria evryone shoot
My contry.
and Now Lebanon crucified on the cross. why? because My contry is the smaller of all The
contries but His is The good and is The lovely of all The contries. anThe end I pray to Jesus to
protect My contry.
war in My village
it was a good Morning when I waked up. buT itwas Raining and verycold. suddnely when we
siting near the Fire somthing was happend one of My village dead My uncle talled us.
My village young they got The shooTing gun and they wenT to proTect the village and we
wenT to My grandfaTher’s house To sTay Their.
Because My Father was NoT home. after My cosins talled us thaT someone killed that Mam.
suddnely we hear someone is shooting so That was The yongs how had the shooting gun They
found The Killer. and that’s all My story and its’ True.
Source: Reprinted from Australian Review of Applied Linguistics 9(2), pp. 120-53, 1986.
Joseph J. Blase
University of Georgia
The study reported in this paper examines teachers’ perspectives on effective school leadership. Formal
interviews, both unstructured and structured, and informal interviews were used to collect data from teachers
in one urban high school in the southeastern United States. Data were collected and analyzed according to
guidelines for grounded theory research. This article describes factors teachers identified with effective school
principals and the impact of these factors on the teachers and their relationships with other faculty, students,
and parents. The research data are discussed briefly in terms of their implications for leadership training and
research.
Research on principals has increased dramatically in recent years. Much of this research has
generated descriptions of what principals do (Dwyer, 1985; Martin & Willower, 1981; Metz,
1978; Peterson, 1977-1978). Other research has specifically investigated the instructional leadership
role of the principalship. Whereas researchers have argued the efficacy of the principal in this role,
some suggest that the primary effects of a principal’s leadership may in fact be indirect (Blumberg
& Greenfield, 1980; Hannay & Stevens, 1985; Lipham, 1981; Silver & Moyle, 1984). In addition,
the literature on school effectiveness has offered images of principals as ‘strong leaders’ and has
linked leadership to, for example, school climate, teacher morale, and organizational performance.
Despite a developing knowledge base regarding the effective school principalship, definitions of
effectiveness and ineffectiveness have relied primarily on test scores or peer nominations. Little
attention has been given to the relationship between leadership and school context variables. Even
here, for the most part, inquiry has focused on manifest outcomes and has provided scant data
regarding the process of leadership associated with such outcomes (Greenfield, 1984). And although
some studies provide detailed qualitative descriptions of school context (Becker, 1980; Cusick,
1983; Waller, 1932) and the principalship (Dwyer, 1985; Wolcott, 1973), negligible data exist that
describe meanings associated with principals’ actions specifically from the teachers’ perspective
(March, 1984; Sergiovanni & Corbally, 1984). Consequently, the ‘thick descriptions’ necessary for
understanding the complex nature of leadership in terms of its effect on teachers and the
sociocultural context of the school are noticeably lacking. These types of qualitative data are essential
to building descriptions and substantive theories of school-based leadership grounded in the
meanings, values, norms, beliefs, and symbolic structures characteristic of school cultures.
Although this article focuses specifically on dimensions of effective school leadership, the data
were drawn from a comprehensive case study of the working lives of teachers. By focusing on the
teachers’ perspective on effective and ineffective principals, this study was able to determine the
impact of high school principals on teachers and their relationships with others. Generally, it was
found that the leadership orientation of principals appears to have strong effects on the
sociocultural contexts of schools. For instance, the data demonstrate that effective principals
positively affected the specific meanings teachers attributed to core issues—for example,
participation, equitability, and autonomy—from which social and cultural patterns seemed to
evolve. This article describes findings associated with teacher perceptions of effective (and to
some extent ineffective) principals and briefly discusses these findings in terms of their
implications for leadership training and research.
In general, the data base used for this article was detailed and consistent, thereby making
possible the exploration of categories as well as emergent relationships. As a result, a high level
of theoretical integration (i.e., the strength of interrelationships between and among categories
and themes) was achieved. This was evident for relationships between major themes (e.g.,
principal’s effectiveness) and constituent categories (e.g., accessibility), between and among
categories associated with the effectiveness theme (e.g., accessibility and problem solving), and
for relationships between categories describing the leadership orientation of effective school
principals and their impact on teachers and others.
The researcher collected and analyzed all of the study data. Glaser and Strauss (1967) insist
on the necessity of this, given the requirement of deep involvement and the complex and cyclic
nature of data collection and analysis in grounded theory inquiry. A panel of four experts (two
professors and two doctoral students) was consulted when questions arose about coding or
interpretation of the data. Finally, 12 teachers were asked to critique the researcher’s description
of substantive categories and hypotheses derived from the raw data (see ‘Results’ section of this
article) and the presentation format (Guba, 1981). In five instances, teachers suggested that
additional data (brief quotations) be included for purposes of clarification.
In total, more than 400 hours were spent in the school setting during the 2½-year period. It
is believed that this amount of time and the rapport that developed between researcher and
teachers served to increase the validity of the data produced. Moreover, throughout the
interviewing process, teachers were asked to give detailed examples for all statements made
(Becker, 1980; Bogdan & Taylor, 1975). Data were systematically examined for consistency
within and between interviews (for the same individual) and between and among all formal and
informal interviews (Connidis, 1983; Denzin, 1978). Given the length of time spent at the
research site and the utilization of joint collection and coding procedures, the researcher was
able to explore with teachers important ambiguities and questions that surfaced in the data. Data
were also assessed in terms of whether they were solicited/unsolicited, stated/inferred, and subject
to researcher influence (Bogdan & Taylor, 1975; Denzin, 1978; Erickson, 1986; Glaser, 1978;
Glaser & Strauss, 1967).
The data sections of this article describe what teachers themselves perceived as the major
dimensions of effective school leadership. Because of space limitations, only abbreviated excerpts
(quotations) from the study data are included to illustrate selected ideas. A discussion of data
related to ineffective principals and their impact on the school context (Blase, in press) is beyond
the scope of this article. However, the concluding discussion reflects some of the implications of
both data sets.
Task Factors
Accessibility. Accessibility refers to availability and visibility. Principals who were available to
teachers ‘arrived at work early and stayed late,’ ‘worked hard and long hours,’ ‘circulated a lot,’
and were 'involved in everything'—'you see them everywhere.' Teachers explained that such
principals were prepared to deal with the large number of teacher- and student-related problems,
issues, and questions that typically ‘cropped up’ during the school day. Principals who were
visible spent significant amounts of time 'in the school . . . in the halls and cafeteria, where . . .
trouble with students was likely to occur.’ According to the study sample, such principals seemed
to help control and stabilize teacher and student behaviour. To illustrate, teachers disclosed that
the principal’s presence prevented certain problems altogether: ‘Kids are just better when the
principal is around, and teachers are more willing to get involved.’
Accessible principals were also viewed as ‘informed’ and ‘aware of what was going on in the
school.’ Thus, their decisions ‘made sense’ to teachers. Consequently, accessible principals were
perceived as using impromptu opportunities to discuss and support teacher goal attainment.
Generally, such principals made authoritative decisions, shared information, coached, advised,
and provided support to teachers and students. In addition, visibility in the classroom, at athletic
events, and at evening social activities seemed to carry important symbolic implications. The
willingness of the principal to ‘mix with . . . teacher and student’ was, in the teachers’ view,
related to ‘caring’, ‘guts,’ empathy, dedication, and generally the kind of leadership involvement
essential to individual and organizational improvement.
More broadly, attitudes and behaviours associated with accessible principals helped connect
teachers to their schools as a whole (‘I feel like a real part of the school’) and appeared to
strengthen teacher respect for and rapport with principals. Positive interactions initiated by these
principals were described as precipitating reciprocal actions from teachers. The accessibility of
principals and the positive interplay related to it seemed to enhance organizational cohesiveness
by reducing the social and psychological distance commonly present in superordinate-
subordinate relations.
Teachers associated accessibility with other categories—problem solving, time management,
goals/direction, knowledge ('principals with nothing to offer quickly retreat to the office') and
personal traits such as authenticity, compassion, friendliness, security, trust, open-mindedness,
and optimism.

[Tables 1 and 2, setting out the task-related and consideration-related leadership factors and their impact on teachers and relationships with others, are not legible in this reproduction. A surviving table note reads: 'Teachers frequently discussed "effects" as decreases in negative feeling states. This may be due to the fact that, even in schools administered by effective principals, considerable tension exists.']
Consistency. Consistency refers to the compatibility of principals’ behaviour and decisions
with existing policies, programs, rules, regulations, and norms. Consistency in the enforcement
of rules regarding student discipline appeared to be most salient in this vein. To teachers,
consistency was perceived to enhance their ability to control students in two ways: preventatively
(‘if [students] know what the teacher says goes . . . there will be fewer problems’) and remedially
('a student is sent to the office . . . something happens . . . he is dealt with'). As with support (to
be discussed), teachers indicated that consistency by principals also reduced the possibility that
students would personalize punishments, blame the teacher, 'build up defensiveness and
resentment', and exacerbate social-emotional tensions. Clearly, the data suggest that consistency
was inversely related to levels of classroom conflict and tension. Indirectly, consistency was
perceived to increase teachers’ abilities in curriculum planning and instruction (‘when discipline
is a problem, your lessons are geared for control . . . learning takes a back seat’). In general,
teachers reported that principal consistency helped to maintain a ‘rational’ (understandable and
fair) organization: ‘Inconsistency ruins everything . . . the whole system is ruined.’
Consistent principals were seen as being adept at withstanding pressure to make decisions for
‘political’ reasons at the expense of sound educational practice. (For example, teachers believed
that ineffective principals frequently ‘caved in’ to parental demands for preferential treatment of
students.) Teachers indicated that consistent principals helped reduce the tension and conflict
among individuals and groups, conflict that was viewed as salient under inconsistent principals.
According to the teachers, consistency was related to goals and direction, clear expectations,
problem-solving orientation, knowledge/expertise, fairness/equitability, and personal traits such
as honesty and security.
Knowledge/expertise. Although teachers associated effective administrators with a broad range
of competencies, they emphasised formal knowledge of curriculum and research in the content
areas. However, informal knowledge, 'awareness of teachers' problems and students' needs,' was
also indicated as important to effective administration. Knowledgeable principals were described
as 'intelligent,' 'worldly,' 'experienced,' 'perceptive,' 'prudent,' 'analytical,' 'having substance,'
‘well-rounded,’ and ‘well-educated.’ One teacher remarked, ‘He doesn’t have to be an Einstein,
but he can’t be a big dumb jock . . . many are.’ These sentiments were echoed by another, who
stated: ‘My suspicion is that many people in administration . . . their primary track has been
athletics. This does not equip a person to be an effective principal over an entire school
curriculum.’
Teachers pointed out that principals demonstrated knowledge by giving helpful advice,
showing awareness of the school, and maintaining 'real involvement' (e.g., attending activities)
in all aspects of the school. The latter point cannot be overstated. Broad and productive
involvement 'in the whole school' was stressed repeatedly. From the teachers' standpoint, such
involvement was seen to offset the forces of favouritism and limited what appeared to be a natural
tendency toward fragmentation in the schools described by teachers. The data suggest that
principals’ knowledge was linked to levels of commitment, communication, and cohesiveness.
Knowledge and expertise were strongly correlated by teachers with accessibility, decisiveness,
goals/direction, problem solving, participation, fairness, and personal factors such as compassion,
friendliness, security, intelligence, and working long hours.
Clear and reasonable expectations. This category refers to the school administrators’ success in
creating policies, rules, goals, and standards, based on a ‘realistic assessment’ of teachers, and the
ability to communicate the meaning of expectations to teachers. In the simplest sense, teachers
linked clear expectations to skills in verbal and written communication. Clarity of expectations
assuming responsibility for the initiation of programs and for the continued supervision (when
necessary) and material resources essential to maintain and enhance teacher work efforts.
(Ineffective principals were often viewed as failing to provide resources for long-term maintenance
of new programs.) Effective principals were considered ‘proactive,’ ‘involved,’ and ‘facilitative’
in terms of supplying resources (‘she would get the process going . . . you could count on her
help’). Teachers reported that their time and energy were utilized more efficiently as a result.
Follow-through means ‘that there’s a serious effort to [help] you . . . to achieve your goals . . .
no follow-through . . . it’s all wasted.’
Many teachers also discussed follow-through with regard to receiving timely supervision ('he
kept us informed’; ‘you get a definite response to a concern you shared’; ‘I knew how the situation
was progressing’). Teachers explained that follow-through by principals worked to reduce
uncertainty, assisted in the clarification and determination of real goals (‘I watch to see if they
are serious’), and allowed teachers opportunities to make plans and decisions consistent with
those of principals.
According to the teachers, follow-through was directly related to support, goals/direction,
clear expectations, decisiveness, accessibility, participation, and the personal factors of working
long hours, compassion, and authenticity.
Ability to manage time. Effective principals were defined as individuals who managed time
efficiently. This meant that much administrative and clerical work was completed on their own
time. Although effective principals were ‘constantly busy’, they did not overschedule or
overcommit themselves during the school day. Thus, according to teachers, such principals were
accessible at various times and in various locations in their schools to talk, advise, and make
decisions required by teachers and students.
In the most fundamental sense, the effective management of time on the principals’ part was
evidenced during faculty meetings. Effective principals prepared and followed agendas. They
were also able to facilitate discussion and coordinate input in an efficient manner (‘they solicit
input... control the process . . . some teachers will talk forever’). All in all, teachers explained
that effective principals respected teachers’ time, ‘don’t run overtime,’ or ‘don’t meet just because
it's Wednesday.’
Efficient time management was linked by teachers to working long hours, accessibility, follow-
through, goals/direction, clear expectations, and participation.
Problem-solving orientation. An effective problem-solving orientation was associated with the
ability of principals to interpret and conceptualize problems in ways that make sense to teachers
(‘they understand the whole picture’). Principals who demonstrated such an orientation were
perceived as ‘thoughtful’ and ‘prudent’ (‘he doesn’t shoot from the hip’). From the teachers’
standpoint, competence in problem solving was judged largely in terms of the principals’ ability
to confront and reduce tensions associated with interpersonal conflicts. (The ineffective
principals’ approach to conflict, in contrast, was frequently defined as exacerbating problems.)
In general, the data point out that effective principals had rational responses to problems. Such
principals usually employed an incremental approach ('there was a step-by-step approach . . . a
definite way of looking at things') which, according to the teachers, relied on skills in problem
definition ('together we were able to construct a picture . . .'; 'questions she raised helped clarify
. . .') and problem solution.
In approaching problems, effective principals were seen as 'positive' and sensitive to the feelings
of teachers ('I wasn't made to feel incompetent; there were no put-downs'; 'she knew how to treat
people'; 'I didn't feel useless . . . like I failed').
Clearly, the problem-solving orientation of effective principals was related to reducing barriers
to teacher performance, which, in turn, seemed to reduce levels of stress and conflict.
Consideration Factors
Support in confrontations/conflicts. Support refers to the willingness of principals to ‘stand behind’
teachers, especially in regard to confrontations with students and parents involving discipline and
academic performance. In matters of student misbehaviour, the actions of effective principals
were seen as reinforcing teacher decisions. Disciplinary referrals, for example, were followed by
timely reprimands/punishments: ‘students were not sent back 10 minutes later smiling... with
a note saying that the student regrets the incident.’ Although principals disagreed occasionally
with teachers, such disagreements were approached ‘positively,’ ‘discreetly,’ and ‘constructively.’
Teachers indicated that the support of principals was related to decreases in classroom
misbehaviour. Many problems were either 'siphoned off' or prevented altogether ('students know
they will be punished . . . are less likely to test . . . to take chances’). In effect, the classroom’s
vulnerability to disruption and the displacement of teacher work effort seemed to be reduced.
It should be explained that although effective principals tended to support teachers in conflicts
with students and their parents, this was by no means easy. Teachers explained that principals
frequently had to respond to conflicting expectations regarding student control problems,
educational considerations (e.g., program placement), and political factors (e.g., accommodating
parents). Within the political realm, for example, teachers frequently required support from
principals in confronting parents who demanded less homework and, occasionally, grade changes
and who objected to the placement of their children. In many cases, effective principals were
considered skilful at ‘clearing up the communication problems’ or ‘convincing a parent that the
school was trying to help their child.’
In essence, the data suggest that the actions of supportive principals increased the probability
of productive interaction among teachers, students, and parents. Teachers reported that under
effective principals, they were able to be more 'open' and 'straightforward' with students and
parents. Hence, meetings, conferences, and telephone conversations tended to produce ‘real’
communication, that is, ‘an understanding of a student’s problem . . . on both sides.’ Indeed,
teachers reported that there was less suspicion and defensiveness, a greater consensus regarding
decisions and goals, and an inclination in parents to ‘cooperate with the school . . . for the child’s
sake.’
Finally, some teachers discussed support in terms of teacher development. For instance,
teachers indicated that encouragement by principals to attend workshops and conferences and
to take university course work facilitated professional growth and self-esteem (‘she did everything
she could to build us up . . . that’s why I am what I am today’).
Teachers related support to accessibility, follow-through, consistency, clear expectations,
goal/direction, knowledge/expertise, praise/recognition, and personal factors such as compassion,
open-mindedness, and security.
Participation/consultation. This category describes the principals’ willingness to develop
meaningful channels (‘not for image building’) for teachers to express their expertise, opinions,
and feelings, especially with regard to programs and student discipline. The data point out that,
although effective principals were perceived as generally more knowledgeable than their
ineffective counterparts, teachers believed that even an effective principal’s understanding of
curriculum, the content areas, and the problems and needs of teachers and students was
insufficient: ‘The school is too complex for any one person’; ‘some principals have been out of
the classroom for 10 years.’ Teachers argued that ‘many problems could not be resolved without
their help.’ In this respect, teachers linked shared decisionmaking to the quality of decisions
made by principals. The data indicate that the teachers’ participation in decisionmaking was
usually consultative (‘the principal made decisions based on input’) or shared (decisions seemed
to evolve naturally from discussion with principals).
It was apparent that positive interactions between principals and teachers and among teachers
increased under administrators with a participatory perspective. Without exception, teachers
reported that effective principals encouraged teacher participation by developing open
relationships. It was implied that the social and psychological distance between teachers and
administrators, typically associated with ineffective principals, was reduced as a result.
A participatory orientation in principals was linked to trust and respect for teachers. This was
further related to teachers’ sense of professionalism (‘I’m recognized for my professional
knowledge’) and the development of collegiality (‘it's a partnership’; ‘I work with him, not
around him’) between teachers and principals. Participation was viewed as helping connect
teachers to the school processes, programs, and goals (‘I’m a part of the whole’; ‘you’re on a
team that’s going somewhere’; ‘you are important to others’). To be sure, teachers implied that
principals who encouraged participation positively affected both quantitative (e.g., time, energy)
and qualitative (e.g., caring) dimensions of their involvement in work.
Participation/consultation was linked by teachers to time management, problem-solving
orientation, knowledge/expertise, accessibility, support, praise/recognition, and delegation of
authority. Personal factors included authenticity, compassion, friendliness, trust, working long
hours, and open-mindedness.
Fairness/equitability. Although consistency and fairness were strongly interrelated, fairness was
more specifically associated with 'reasonable' recognition of the needs and problems of individual
teachers, students, parents, programs, and departments. Fairness extended to many dimensions
of teachers’ work. Generally, fairness in principals meant recognizing the personal and
professional rights of all teachers; principals who were fair showed no favouritism. In displaying
fairness, teachers explained that principals tended to maintain ‘a broad interest in the entire
program’: ‘Everyone was viewed in the same light.’ Reasonable decisions regarding resource
allocation (‘the debate team was just as important as the football team’), the assignment of
distasteful duties (‘he didn’t dump work on certain teachers . . . jobs were rotated’), the
distribution of rewards and punishments (‘the hotshot coach doesn’t have special status’), and
the handling of interpersonal conflict (‘he listened to both sides . . . used a problem-solving
approach’) were related to teachers’ perceptions of fairness.
According to the data, fairness by principals helped to develop positive personal and
professional identities of teachers ('You grow . . . you are not slighted or put down'). Fairness
seemed to contribute to reductions in ambiguity and unpredictability and to increases in faculty
solidarity. This trait in principals worked to reduce informal status differences and barriers to
communication and support among teachers (‘they made everyone feel they are part of the
team’). Fairness was further related to increased trust among faculties (‘there’s less jealousy here
now ... no reason to be suspicious’) and better faculty morale (‘we're a lot happier... as a group,
trust, respect, self-concept (‘you feel important working for a person like that’), and teacher job
involvement.
Teachers associated principals’ willingness to delegate authority with accessibility,
goals/direction, participation, and personal factors such as compassion and open-mindedness.
The real value of leadership rests more with the 'meanings' which actions import to others than in
the actions themselves. A complete rendering of leadership requires that we move beyond the
obvious to the subtle, beyond the immediate to the long range, beyond actions to meanings,
beyond viewing organizations and groups within social systems to cultural entities. (p. 106)
The study reported here has attempted to understand meanings teachers have attributed to
effective school leadership. This, of course, is only one perspective. Other qualitative studies
focusing on how students, parents, superintendents, and school board members perceive effective
principals would be helpful. Such studies would furnish data to help elaborate, interpret, and
undoubtedly contradict some of the findings presented in this article.
References
Bacharach, S.B., & Lawler, E.J. (1982). Power and politics in organizations: The social psychology of conflict,
coalitions, and bargaining. San Francisco: Jossey-Bass.
Barrett-Lennard, G.T. (1962). Dimensions of therapist response as causal factors in therapeutic change.
Psychological Monographs, 76(43), 1-36.
Bass, B.M. (1981). Stogdill’s handbook of leadership: A survey of theory and research. New York: Free Press.
Becker, H. (1980). Role and career problems of the Chicago public school teacher. New York: Arno.
Blase, J. (1985). The socialization of teachers: An ethnographic study of factors contributing to the
rationalization of the teacher’s perspective. Urban Education, 20(3), 235-56.
Blase, J. (1986). Socialization as humanization: One side of becoming a teacher. Sociology of Education,
59(2), 100-12.
Blase, J. (in press). Dimensions of ineffective school leadership: The teacher’s perspective. Journal of
Educational Administration.
Blum, F.H. (1970). Getting individuals to give information to the outsider. In W. J. Filsted (Ed.),
Qualitative methodology: Firsthand involvement in the social world (pp. 83-90). Chicago: Rand McNally.
Blumberg, A., & Greenfield, W. (1980). The effective principal: Perspectives on school leadership. Boston:
Allyn & Bacon.
Bogdan, R., & Biklen, S. (1982). Qualitative research for education: An introduction to theory and methods.
Boston: Allyn & Bacon.
Bogdan, R., & Taylor, S. (1975). Introduction to qualitative research methods: A phenomenological approach
to the social sciences. New York: Wiley.
Connidis, I. (1983). Integrating qualitative and quantitative methods in survey research on aging: An
assessment. Qualitative Sociology, 6(4), 334-52.
Cusick, P.A. (1983). The equalitarian ideal and the American high school: Studies of three schools. New York:
Longman.
Denzin, N.K. (1978). The logic of naturalistic enquiry. In N.K. Denzin (Ed.), Sociological methods: A
sourcebook (pp. 245-76). New York: McGraw-Hill.
Dwyer, D.C. (1985, April). Contextual antecedents ofinstructional leadership. Paper presented at the annual
meeting of the American Educational Research Association, Chicago.
Erickson, F. (1986). Qualitative research on teaching. In M.C. Wittrock (Ed.), Handbook of research on
teaching (3rd ed., pp. 119-61). New York: Macmillan.
Glaser, B. (1978). Theoretical sensitivity: Advances in the methodology of grounded theory. Mill Valley, CA:
The Sociology Press.
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research.
Chicago: Aldine.
Greenfield, W. (1984, April). Sociological perspectives for research in educational administration. Paper
presented at the annual meeting of the American Educational Research Association, New Orleans.
Guba, E. (1981). Criteria for assessing the trustworthiness of naturalistic inquiries. Educational
Communication and Technology Journal, 29(2), 75-91.
Hannay, L.M., & Stevens, K.W. (1985, April). The indirect instructional leadership role of a principal. Paper
presented at the annual meeting of the American Educational Research Association, Chicago.
LeCompte, M.D., & Goetz, J.P. (1982). Problems of reliability and validity in ethnographic research.
Review of Educational Research, 52(1), 31-60.
Lipham, J.A. (1981). Effective principal, effective school. Reston, VA: American Association of School
Principals.
Lofland, J. (1971). Analyzing social settings: A guide to qualitative observation and analysis. Belmont, CA:
Wadsworth.
March, J.G. (1984). How we talk and how we act: Administrative theory and administrative life. In T.J.
Sergiovanni & J.E. Corbally (Eds.), Leadership and organizational culture: New perspectives on
administrative theory and practice (pp. 18-35). Chicago: University of Illinois Press.
Martin, W.J., & Willower, D.J. (1981). The managerial behaviour of high school principals. Educational
Administration Quarterly, 17(1), 69-90.
Metz, M.I. (1978). Classrooms and corridors: The crisis of authority in desegregated schools. Berkeley:
University of California Press.
Miles, M.B., & Huberman, A.M. (1984). Qualitative data analysis: A sourcebook of new methods. Beverly
Hills, CA: Sage Publications.
Peterson, K.D. (1977-1978). The principal’s tasks. Administrator's Notebook, 26(8), 1-4.
Sergiovanni, T.J., & Corbally, J.E. (1984). Leadership and organizational culture: New perspectives on
administrative theory and practice. Chicago: University of Illinois Press.
Silver, P., & Moyle, C. (1984, April). School leadership in relation to school effectiveness. Paper presented at
the annual meeting of the American Educational Research Association, New Orleans.
Author
Joseph J. Blase, Associate Professor, Department of Educational Administration, College of
Education, University of Georgia, Athens, GA 30602. Specializations: organizational theory,
sociology of teaching.
Source: American Educational Research Journal, vol. 24, no. 4, pp. 589-610. Copyright 1987 by the
American Educational Research Association. Reprinted by permission of the publisher.
Attitude surveys, questionnaires and structured interviews are three methods of obtaining
research data used in both quantitative and qualitative research. In the latter mode, they
figure strongly in triangulation processes. This chapter will focus on attitude surveys,
while the next one will consider interviews and questionnaire surveys.
Attitude measurement
Introduction
Attitudes are evaluated beliefs which predispose the individual to respond in a
preferential way. That is, attitudes are predispositions to react positively or negatively to
some social object. Most definitions of attitudes suggest there are three major components:
the cognitive, affective and behavioural components. The cognitive component involves
what a person believes about the object, whether true or not; the affective component
comprises the feelings about the attitude object which influence its evaluation; the behavioural
component reflects the actual behaviour of the individual, though this is rather an
unreliable indication of an attitude.
Attitude scales involve the application of standardised questionnaires to enable
individuals to be placed on a dimension indicating degree of favourability towards the
object in question. The assignment to a position on the dimension is based on the
individual’s agreement or disagreement with a number of statements relevant to the
attitude object. Many hundreds of scales indexing attitudes to a wide range of objects
exist. A valuable collection of scales can be perused in Shaw and Wright (1968).
Despite the existence of many reliable and valid published attitude scales, the
researcher often finds that they wish to assess attitudes to a specific social object for
which no scales exist, or for which scales produced and validated in another culture are
not appropriate in our context. The construction of attitude scales is not difficult, but
there are a number of differing methods of construction, of response mode and of score
interpretation. These various approaches will be considered shortly.
The individual items or statements in an attitude scale are usually not of interest in
themselves; the interest is usually located in the total score or subscores. An attitude
scale usually consists of statements, i.e. the belief component of the theoretical attitude.
These statements could all be preceded by ‘I believe that...’ and are rated on a 3, 5, 7
(or even more) point scale. This rating provides an index of the emotive value of the
affective component of each statement. Of course, the third element of an attitude, the
behavioural act, is not assessed. This behavioural component may not be congruent
with the expressed attitude as measured on the questionnaire, since other factors such as
social convention, social constraints, expectation, etc., may prevent the act which should
follow from being performed. For example, a student who manifests negative attitudes
to ethnic groups other than their own may not display such prejudice in their overt
behaviour because of fear of the law, consideration of what others would think, etc.
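To make this scoring concrete, here is a minimal sketch in Python; the statements and ratings are entirely hypothetical, and a 5-point response scale is assumed. The total, not any single item, is what locates the respondent on the attitude dimension.

# Minimal sketch of totalling Likert-type ratings; items and ratings are hypothetical.
# Each statement is rated 1 (strongly disagree) to 5 (strongly agree).
ratings = {
    "I believe that teaching is a rewarding career": 5,
    "I believe that teachers serve the community": 4,
    "I believe that teaching offers good prospects": 2,
}

total_score = sum(ratings.values())
minimum, maximum = len(ratings) * 1, len(ratings) * 5
print(f"Total attitude score: {total_score} (possible range {minimum} to {maximum})")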
Write down now what you remember an interval scale to be, then check your answer
with the material in chapter 8.
Using the sample attitude to teaching items listed on the previous page, which have
already been screened to exclude ambiguous or irrelevant items, act as if you are one of
Thurstone’s judges. Assign to each item a number ranging anywhere between 0.0 and
11.0 ona scale in which a score of 0.0 is given to those items that are most unfavourable
towards teaching; 5.0 is given to neutral items; and 11.0 to the items most strongly
favourable. Do not assign numbers on the basis of how you feel about teaching, but on
the basis of how strongly each item indicates favourability or unfavourability to teaching
in general. When you have finished, compare your judgements with the average
judgements made by the original judges shown at the end of this chapter (p. 564). If
there are large discrepancies between your judgements and the averages, can you think
of reasons why?
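If you wish to check your judgements by machine, the following Python sketch shows how Thurstone scale values are commonly derived: the scale value of an item is usually taken as the median of the judges' ratings, with the interquartile range serving as an index of ambiguity. The items and ratings below are hypothetical.

import statistics

# Hypothetical ratings given by five judges on the 0.0 to 11.0 scale.
judge_ratings = {
    "Teaching is the noblest profession": [9.5, 10.0, 8.5, 9.0, 10.5],
    "Teaching is mere drudgery": [1.0, 2.0, 1.5, 0.5, 2.5],
}

for item, ratings in judge_ratings.items():
    scale_value = statistics.median(ratings)      # the item's scale value
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    print(f"{item}: scale value {scale_value}, ambiguity (IQR) {q3 - q1:.2f}")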
Which statistical test of difference would you use if you assumed the data were:
a interval
b ordinal
and you wished to test for differences in attitude between two groups?
Important
1 Place your check-mark in the middle of the spaces, not at the boundaries.
2 Be sure you check every scale for every concept; do not omit any.
3 Never put more than one check-mark on a single scale.
4 Do not look back and forth through the items. Do not try to remember how you checked
similar items earlier in the test. Make each item a separate and independent judgement.
5 Work fairly quickly through the items.
7 6 5 4 3 2 1
Good X bad
Rigid X flexible
Independent X submissive
Democratic X authoritarian
Disorganised X organised
Cooperative X uncooperative
Nonconforming X conforming
The particular scales included are those the investigator wishes to include—
usually on the grounds of relevance to the attitude under investigation.
To prevent the acquiescence response set, scale polarity is reversed for pairs in random
order, and for these the scoring on the 1 to 7 range is reversed. For individuals, a total score
reflecting level of self-evaluation can be obtained on the dubious assumption—as with
most other instruments—that all items are equal in their contribution and that the data
are of the interval type. With groups, such totals would be averaged or an average
response could be computed for each scale. The semantic differential technique appears
appropriate for use with children of 12 years of age and upwards.
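A minimal sketch of this scoring, assuming hypothetical scale pairs and ratings: for reversed-polarity pairs the 1 to 7 rating is flipped (8 minus the rating) before totalling or averaging.

# Semantic differential scoring sketch; scale pairs and ratings are hypothetical.
responses = {
    "good-bad": 6,
    "rigid-flexible": 2,             # reversed pair: favourable pole on the right
    "democratic-authoritarian": 5,
    "disorganised-organised": 3,     # reversed pair
}
reversed_pairs = {"rigid-flexible", "disorganised-organised"}

scores = [(8 - rating) if pair in reversed_pairs else rating
          for pair, rating in responses.items()]
print(sum(scores), sum(scores) / len(scores))   # total, and average per scale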
Advantages
1 In most uses of the Thurstone and Likert techniques, measurement of an attitude’s
affective aspects is stressed, though cognitive and conative qualities are often
intermingled with the affective judgment. The individual’s beliefs about an object’s
potency, activity, and at times other less important dimensions of meaning, may also
be crucial to their overall attitude, determining whether their behaviour toward an
object is similar to or very different from the behaviour of other individuals whose
evaluative ratings resemble their own. As an example, one subject might rate the
concept PRINCIPAL as unfavourable, strong and active; another subject might rate
PRINCIPAL as equally unfavourable, but also as weak and passive. The first subject
might actively avoid or seek to placate their principal; the second might ignore or
attempt to exploit their principal.
2 A semantic differential is relatively easy to construct.
Disadvantages
1 The assumption of equal interval data may not be sound; as with the Likert approach,
ordinal data is certainly a more valid assumption.
2 Scales weighted heavily on the evaluative dimension for one concept may not be
strongly evaluative when applied to another concept. It would seem necessary for a
factor analysis to be undertaken to ensure that presumed evaluative scales actually do
index the evaluative dimension when referring to a particular attitude object.
Validity
The validity of attitude scales is often checked by concurrent validity using known
criterion groups, i.e. sets of individuals who are known in advance to hold different
attitudes to the relevant object. For example, random samples of ALP and Liberal Party
members could act as criterion groups for the concurrent validation of an attitude scale
towards private schools. If the scale differentiated statistically significantly between these
two groups, then it could be said to have concurrent validity.
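A minimal sketch of such a criterion-groups check, assuming hypothetical scale totals for the two groups; a significant t value for the difference between group means would support concurrent validity.

from scipy import stats

# Hypothetical attitude-scale totals for two criterion groups known in advance
# to hold different attitudes to the object in question.
group_a = [62, 70, 65, 71, 68, 74, 66, 69]
group_b = [48, 55, 51, 44, 57, 50, 46, 53]

t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")   # p < .05 would indicate concurrent validity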
Predictive validity is also possible by selecting some criterion in the future such as
voting behaviour. Content validity can be gauged by requesting judges to indicate
whether the items are relevant to the assessment of that particular attitude. Finally, of
course, construct validity using factor analysis of the inter-correlations ofitem responses
will demonstrate homogeneity or heterogeneity of Likert and semantic differential scales.
Many attitude scales are multifactorial or multidimensional, in that they do not measure
one unitary attitude but groups of items, each measuring different dimensions of the
attitude. Face validity can cause problems. In order to ensure motivation, the statements
are often fairly obviously related to the attitude object in question.
What criterion would you select to check the concurrent validity of:
1 a scale measuring attitudes towards the wearing of seat-belts?
2 a scale measuring attitudes towards corporal punishment in schools?
General criticisms
The chief criticism that might be levelled at all attitude scales is concerned with the
indirectness of measurement, i.e. verbal statements are used as a basis for inferences
about ‘real’ attitudes. Moreover, attitude scales are easily faked. Although administering
the scales anonymously may increase the validity of results, anonymity makes it difficult
to correlate the findings with related data about the individuals, unless such data are
obtained at the same time. It seems that we must limit our inferences from attitude-scale
scores, recognising that such scores merely summarise the verbalised attitudes that the
subjects are willing to express in a specific test situation.
Attitude scales are self-report measures and they suffer from the same problems as all
other self-report techniques. What subjects are willing to reveal about themselves would
seem to depend on such factors as willingness to cooperate, social expectancy, feelings
of personal adequacy, feelings of freedom from threat, dishonesty, carelessness, ulterior
motivation, interpretation of verbal stimuli, etc. The study of human emotions, feelings
and values about objects in the environment is clouded by those very same variables.
Response sets too, such as acquiescence (the tendency to agree with items irrespective
of their content) and social desirability (the tendency to agree to statements which social
consensus would, it is believed, indicate are socially desirable and reject those that are
socially undesirable) fog the data derived from attitude scales. The best way of
eliminating acquiescence is to randomly order positive and negative items to prevent a
subject ticking madly away down the same column.
References
Edwards, A.L. (1957), Techniques of Attitude Scale Construction, Appleton-Century-Crofts, New York.
Hinckley, E.D. (1963), 'The influence of individual opinion on construction of an attitude scale',
Journal of Abnormal and Social Psychology 67, pp. 290-2.
Likert, R. (1932), 'A technique for the measurement of attitudes', Archives of Psychology, no. 140.
Osgood, C.E., Suci, G.J. & Tannenbaum, P.H. (1957), The Measurement of Meaning, University of
Illinois Press, Urbana.
Warr, P. & Knapper, C. (1968), Perception of People and Events, Wiley, London.
Further reading
Shaw, M. & Wright, J. (1968), Scales for the Measurement of Attitudes, McGraw-Hill, New York.
Thurstone, L.L. (1929), The Measurement of Attitude, University of Chicago Press, Chicago.
The survey is the most commonly used descriptive method in educational research, and
gathers data at a particular point in time.
The survey can be highly standardised with a schedule of questions which must be
responded to in the same order, with the same wording and even the same voice tone in
the interview to ensure each subject is responding to the same instrument. In less
standardised surveys, there is only enough direction given to stimulate a respondent to
cover the area of interest in depth while having freedom of expression. But the choice
of method is affected by the following considerations, among others:
• Nature of population; for example, age, reading or writing skills, wide or localised
geographical dispersal.
• Nature of information sought; for example, sensitive, emotive, boring.
• Complexity and length of questionnaires/interviews.
• Financial and other resources; for example, time.
The aim is to select an approach that will generate reliable and valid data from a high
proportion of the sample within a reasonable time period at minimum cost.
An interviewer-administered survey is more accurate and obtains more returns than
a postal self-completion survey.
Face-to-face interviewing is essential where:
• the population is inexperienced in filling in forms or poorly motivated to respond;
• the information required is complicated or highly sensitive; and
• the schedule is an open one, requiring individualised phrasing of questions in response
to the respondent's answers.
The least expensive method, self-administered postal questionnaires, would be the
obvious one to adopt if the opposite conditions held, especially if the population was
highly scattered geographically.
The advantage of the survey is that it can elicit information from a respondent that
covers a long period of time in a few minutes, and, with comparable information for a
number of respondents, can go beyond description to looking for patterns in data.
However, there are often difficulties in interpreting cross-sectional data. For one
thing, there may be changes from year to year in the variable being studied. For example,
if one were interested in using a cross-sectional approach to examine the development
of number skills between the ages of four and six, one might assess these skills in two
samples of 100 children at each of the two ages. It might then be found that the norms
showed advances in some skills, no difference in others, and decrements in the rest
between the two age groups. However, the actual sample of four-year-old children might,
if followed up after two years, turn out to be much better in all skills than the original
six-year-olds in the sample. The reason for this could be that environmental conditions
relevant to the development of those number skills had changed during this period,
though there are other equally likely explanations.
The cross-sectional method is most often used to produce developmental norms for
different ages, thus allowing one to assess whether a particular child is ahead of or behind
the norm, which is often an important diagnostic question. However, by concentrating
on averages, this approach tells us very little about individual patterns of development,
and may indeed give a false picture of growth. If some children develop very quickly
between the ages of four and five, and others very slowly, this will be obscured in the
cross-sectional data. The impression would be that all develop at a steady rate.
FIGURE 30.2 The longitudinal study: each subject observed (tested) five times over
the course of the study. [Diagram not reproducible from source; it plots the same
subjects, measured at ten-year intervals, against time of measurement.]
By collecting information in this way, one can interpret an individual’s status in terms
of their own past growth, and pick up many of the variations and individual differences
in developmental progress.
A good example of a longitudinal approach is the UK National Child Development
Study (Davie, Butler & Goldstein 1972) which followed up nearly 16 000 children born
in a single week in 1958.
1 Briefly outline the relative merits of the longitudinal and cross-sectional survey
methods.
2 What do you perceive to be the major factors which may lower reliability and validity
indices in: (a) an interview (b) a questionnaire?
Closed items
The closed items usually allow the respondent to choose from two or more fixed
alternatives. The most frequently used is the dichotomous item which offers two
alternatives only: yes/no or agree/disagree, for instance. Sometimes a third alternative
such as 'undecided' or 'don't know' is also offered. The alternatives offered must be
exhaustive, i.e. cover every possibility.
Open-ended items
Open-ended items simply supply a frame of reference for respondents’ answers, coupled
with a minimum of restraint on their expression. Other than the subject of the question,
there are no other restrictions on either the content or the manner of the respondent’s
reply, facilitating a richness and intensity of response. Open-ended items form the
essential ingredient of unstructured interviewing (see chapter 24).
EXAMPLE
What aspects of this course do you most enjoy? ..........
Open-ended questions are flexible. In interviews, they allow the interviewer to probe
so that they may go into more depth if they choose, or clear up any misunderstandings;
they enable the interviewer to test the limits of the respondent’s knowledge; they
encourage cooperation and help establish rapport; and they allow the interviewer to
make a truer assessment of what the respondent really believes. Open-ended situations
can also result in unexpected or unanticipated answers which may suggest hitherto
unthought-of relationships or hypotheses. The major problem is coding or content
analysing the responses.
A particular kind of open-ended question is the funnel. This starts with a broad
question or statement and then narrows down to more specific ones. An example would
run like this:
a Many school pupils smoke these days. Is this so at your school?
b Do any of your school friends smoke?
c Have you ever smoked?
What are the advantages and disadvantages of open-ended questions compared with
closed questions?
Rank the following school subjects in order of preference (1 = most preferred):
• History
• Maths
• English
• Science
• Geography
Ranked data can be analysed by adding up the rank of each response across the
respondents, thus resulting in an overall rank order of alternatives.
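A minimal sketch of this rank-summing, assuming three hypothetical respondents each rank the five subjects (1 = most preferred); the subject with the lowest rank total is the most preferred overall.

# Rank-sum analysis sketch; the rankings are hypothetical.
rankings = [
    {"History": 2, "Maths": 5, "English": 1, "Science": 4, "Geography": 3},
    {"History": 3, "Maths": 4, "English": 1, "Science": 5, "Geography": 2},
    {"History": 1, "Maths": 5, "English": 2, "Science": 3, "Geography": 4},
]

totals = {subject: sum(r[subject] for r in rankings) for subject in rankings[0]}
for subject, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(subject, total)    # lowest total = most preferred overall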
A checklist response requires that the respondent selects from among the presented
alternatives. In that they do not represent points on a continuum, they are nominal
categories.
EXAMPLE
Which languages do you speak? (Tick as many as apply.)
☐ English
☐ French
☐ German
☐ Japanese
☐ Other (please specify)
This kind of response tends to yield less information than the other kinds considered.
Finally, the categorical response mode is similar to the checklist but simpler in that
it offers respondents only two possibilities.
1 Make the questionnaire as 'appealing to the eye' and easy to complete as possible.
2 Include brief but clear instructions for completing the form. Construct questions so
they do not require extensive instructions or examples. Print all instructions in bold
type or italics.
3 If questions appear on both sides of a page, put the word 'over' on the bottom of the
front side of that page.
4 Avoid constructing sections of the form to be answered only by a subset of
respondents; such sections may lead respondents to believe the form is not
appropriate for them, or may cause frustration and result in fewer completed forms.
5 If you have sections that consist of long checklists, skip a line after every third item
to help the respondent place answers in the appropriate places.
6 Avoid the temptation to overcrowd the pages of your questionnaire with too many
questions. Many people squeeze every possible question onto a page, which can cause
respondents to mark answers in the wrong place. Leave plenty of 'white space'.
7 Arrange the questionnaire so that the place where respondents mark their answers is
close to the question. This encourages fewer mistakes.
8 Avoid using the words questionnaire or checklist on the form itself. Some people may
be prejudiced against these words after receiving many forms not designed with the
care of yours.
9 Put the name and address of the person to whom the form should be returned on the
questionnaire, even if you include a self-addressed return envelope, since
questionnaires are often separated from the cover letter and envelope.
The following are format considerations unique to interview schedules:
4 Print questions on only one side of each page of the questionnaire because it
is cumbersome for interviewers to turn to the reverse side of pages during the
interview.
5 Clearly distinguish between what the interviewer should read aloud and other things
printed on the questionnaire that should not be read. Different type styles can be
used to make this distinction unambiguous.
6 Do not end an interview with an open-ended question because the interviewer will
have a harder time controlling when the interview will end.
7 Leave enough space on each page so that interviewers can record any additional
important information obtained from the respondent.
8 Anticipate responses to open-ended questions and provide a list of these on the inter-
view form to help the interviewer mark responses. This will speed up the interview.
EXAMPLE
Is it desirable to have private schools?
VERSUS
Is it desirable or undesirable to have private schools?
The first suggests an affirmative answer. However, the second does not suggest an
answer and gives the respondent a freer choice. Bias resulting from loaded questions is
a particularly serious problem in interviews because respondents find it more difficult to
disagree with an interviewer than with a self-complete questionnaire.
Furthermore, avoid asking questions that assume that a certain state of affairs existed
in the past. For example, how would you answer the following questions?
a Have you stopped beating your children?
b Do you still design bad questionnaires?
Regardless of whether the respondent answers ‘yes’ or ‘no’, such questions imply
previous participation in the activities about which the person has been asked.
The following are additional suggestions that should be considered when writing
questionnaire items.
Be careful if you use abbreviations. Be certain the people you ask will know what
your abbreviations mean.
Avoid using 'if yes, then ...' questions on mail surveys. Usually these questions can
be asked in an alternative manner. Consider the following example:
Are you married?
☐ Yes
☐ No
Response options
The response options offered to respondents can affect their answers. Confusing options
lead to unreliable results and, usually, low response rates. The following suggestions will
help you design appropriate response options for questionnaire items and ease coding
into the computer:
1 Make certain one response category is listed for every conceivable answer. Omitting
an option forces people either to answer in a way that does not accurately reflect
reality or to answer not at all.
EXAMPLE
How many years have you taught at this school?
0-6 years
7-8 years
In this case, people who have over eight years’ service are unable to answer the
question.
2 Include a 'don't know' response option any time you ask a question to which people
may not have the answer. When surveys find that many people do not know about a
given issue, that information alone is often very valuable.
3 Balance all scales around a mid-point in the response answer:

Strongly agree              Strongly agree
Agree                       Agree
Disagree          VERSUS    Neutral
Strongly disagree           Disagree
                            Strongly disagree
EXAMPLE
Section F
Q1 Do you have any children?
(Circle ONE number)
Yes 1   Go to Q2
No 2    Go to Section G, Page 18
Q2 How many children do you have?
(Circle ONE number)
One 1
Two 2
Three 3
Four 4
Five or more 5
Notice that the respondent who answers 'No' to Q1 in Section F will move directly
to Section G, omitting all questions which are not of relevance to them. As well as
saving the respondent a lot of time, it will also enable you, at the time of the analysis,
to easily identify those respondents who have children, and then perform certain
analyses on this subset of the data (a minimal sketch of such subsetting follows this list).
5 Some writers believe that if the mid-point of an agree-disagree scale is labelled
‘undecided’, responses will differ from scales where the mid-point is labelled ‘neutral’.
Therefore, label the mid-point according to the ‘exact’ meaning the scale requires.
6 Arrange response options vertically:
☐ Yes
☐ No
rather than horizontally:
☐ Yes  ☐ No
This helps reduce errors that occur when people mark the blank after the intended
response rather than before it.
7 Make certain the respondent knows exactly what information should be put in
'fill-in-the-blank' items.
EXAMPLE
Incorrect What is your age? ..........
Correct  What is your age? ...... years ...... months
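As promised above, here is a minimal sketch of how skip-pattern data might be subset at analysis time, using pandas and hypothetical coded responses (1 = yes, 2 = no for Q1).

import pandas as pd

# Hypothetical coded data: respondents answering 'No' to Q1 were routed past Q2,
# so Q2 is blank (None) for them.
df = pd.DataFrame({
    "q1_has_children": [1, 2, 1, 1, 2],
    "q2_num_children": [2, None, 1, 3, None],
})

parents = df[df["q1_has_children"] == 1]    # only respondents routed to Q2
print(parents["q2_num_children"].mean())    # analyses run on this subset only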
Pre-testing
By the time a study has journeyed through the planning stage and reached the stage
when the questionnaire is constructed, much effort and money have already been
invested. A pre-test of the questionnaire at this stage is useful to reveal confusing and
other problematic questions that still exist in the questionnaire.
Your viewpoint will be assisting the advance of knowledge in the area if you complete the
questionnaire, fold it and place it in the included free postage-paid (or stamped) envelope
and return by ...... [date] or ‘as soon as possible’
Please answer the following questions honestly. Your responses will remain strictly
confidential.
Do not sign your name.
List some suggestions for questionnaire construction that might increase the rate of
returns for mailed questionnaires.
Disadvantages of a questionnaire
1 Difficulty of securing an adequate response. Response rates tend to be much lower than
when the interview method is used. While certain strategies, including follow-up
mailings and careful attention to questionnaire design, may result in a response rate
as high as 90-100 per cent, response rates to mail questionnaires seldom exceed
50 per cent, and rates between 15 and 50 per cent are common.
2 Sampling problems. Not all questionnaires are returned, so the likelihood of biased
sampling exists, as non-respondents may differ significantly from respondents. Usually,
the investigator is unable to learn the reason for non-responses.
3 Complex instruments, ambiguity or vagueness will cause poor responses.
4 The method is unsuitable when probing is desirable.
Advantages of interviews
1 Flexibility. One of the most important aspects of the interview is its flexibility. The
interviewer has the opportunity to observe the subject and the total situation in which
they are responding. Questions can be repeated or their meanings explained in case
of misunderstanding.
Disadvantages of interviews
1 The main disadvantage of interviews is that they are more expensive and time-
consuming than questionnaires.
2 Only a limited number of respondents may be interviewed due to time and financial
considerations. Scheduling of interviews may cause problems also.
3 Finding skilled and trained interviewers with appropriate interpersonal skills may be
difficult. High inter-rater reliability is difficult to achieve.
4 An interviewer effect may result from interaction between the interviewer and respondent.
Factors which may bias an interview include the personal characteristics of the
interviewer (such as age, sex, educational level, race and experience at interviewing);
the opinions and expectations of the interviewer; and a desire to be perceived as
socially acceptable by the respondent. Variations in the use of interview techniques,
including tone of voice and the inconsistent use of probes, also reduce standardisation.
Validity and reliability are seriously affected by all these factors.
5 Respondents may feel that they are being 'put on the spot'.
1 What are the major advantages of using mailed questionnaires instead of personal
interviews?
2 What are the major advantages of using personal interviews instead of mailed
questionnaires?
1 Left of the vertical line is the interview record as completed by the interviewer; right
of the line are the coding instructions as printed on the record form and as encircled,
or otherwise completed by the researcher. In some circumstances, it would be possible
to combine these two operations.
2 Each piece of information has its two reference numbers. Thus, the ‘Life’s ambition’
of item VI is entered in Column 8 row 3.
3 Some pieces of information (e.g. I and VII) occupy more than one column. In a case
such as I, where reference numbers up to 999 are provided for but only two of the three
spaces are used, a single-digit number should always be thought of as 003 (say) and a two-
digit number as 079. The latter is the 079 which has been entered at the top of the coding
column.
4 Provision has been made for:
a omissions: those items which the respondent or, in this case, the interviewer, has
failed to complete; and
b additional entries uncategorised by the research worker and thus coded as other.
You may care to try some content coding of survey data for yourself. Below you will find
some answers parents gave to a question in a pilot survey about the things parents talked
to the teachers about when they visited their children’s school. Construct a coding frame
in terms of which the answers can be grouped or coded and compare this frame with
the one actually produced for the main survey, which can be found on page 591.
When you have constructed the coding frame with numbers representing each of
your categories, try to assign each of the answers to the appropriate category, writing
the number of the appropriate category by the side of the verbatim answer. Then count
up the frequency of each code and compare this with Table 30.2 on page 592.
Questions
Have you or your husband had a talk with any of X’s class teachers or the head since
she/he started at (present school)?
If YES, who have you seen? What did you talk to them about?
Responses given by parents (to be coded):
1 Invitation to see her teacher. Told me how X was getting on and asked me to tell him
anything about X, whether she liked or disliked anything about school.
2 Future of child, what work he would have to do to catch up if he needed any
particular subjects for future job. Trouble with his legs. To see R staying on for
another year.
Progress, career.
Oo
Hr Abilities; form master said to go and see him if any problems about child. Wrote also
about his not getting on so well, and got reply.
5 Only a general talk to several parents about how school was run and to see head or
teacher if any problem arose. Form master about X’s work.
6 General progress (anything else discussed at PTA).
N M’s capabilities regarding getting job at Met Office. Progress and behaviour.
8 Progress.
OFFICE USE ONLY
Column
1. Please insert your Institution's code. ☐ 1
2. What is the nature of your current responsibilities?
Director/Dean of Institution/Faculty/School ☐
Assistant Director/Dean of Institution/Faculty/School ☐
Head of Department ☐ 2
Lecturer/Tutor, etc. ☐
Other (please specify) ☐
3. What is your current classification?
Professor/Principal Lecturer ☐
Senior Lecturer/Lecturer ☐
Senior Tutor/Tutor ☐ 3
Other (please specify) ☐
4. What is the nature of your current contract?
Tenure/Tenure Track ☐
Short-term contract ☐ 4
5. Full-time ☐
Part-time ☐ 5
6. What proportion (%) of your time is spent on each of the following?
Answer to STQ 160
Actual coding frame from pilot survey
Code: Coding instruction:
What sort of things have you discussed with the head or any of the teachers (at present
school)?
1 Educational progress at school—including teaching methods, examinations,
homework, reports, extra tuition, attitude to work, how parents can help,
concentration; staying on an extra year, when to leave, what courses/subjects to take,
what class to go into; laziness; anything about curriculum; streaming; leaving school,
quality of teaching staff.
Use this code only for specific discussion/complaints, etc. about the selected child,
not for general talks (5).
2 Further education after leaving school—including going to college (‘leaving school’
means leaving secondary school).
3 Further career—i.e. job, apprenticeship; in future.
4 Behaviour at school—including adjustment to school life, unhappiness, need for
understanding, nervousness, relations with teachers, or with other children (including
bullying), complaints about other children or about teachers; accusations of theft;
discipline, being late.
5 General talk, or nothing in particular—including general matters about the school
such as having a look around, the school facilities, class size in general, talk when first
started (for example to a group of parents, or when introducing child), talk at Open
Day (if not codeable elsewhere); school achievements and aims, school rules, uniform,
or similar initial explanation.
Note: Code 5 can be multi-coded with others. It deals with matters not directly
relevant to the child, but also with any general or private introductory talks to parents
(and child); if specifics are also mentioned, such as uniform, do not code these as
well if they formed part of an introductory talk, but do code them if it was something
the parent brought up/complained about later.
With the tiny sample of parents’ replies you have been coding, it is unlikely that the
frequency counts you have obtained for the different categories will correspond closely
to those above. Nevertheless, you should have found at least 'educational progress of
child’ and ‘child’s behaviour at school’ had relatively high frequencies.
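Counting code frequencies by hand becomes tedious with a real sample; here is a minimal sketch of automating it, assuming hypothetical code assignments (a single answer may carry several codes, as noted for code 5 above).

from collections import Counter

coding_frame = {1: "Educational progress", 2: "Further education",
                3: "Further career", 4: "Behaviour at school", 5: "General talk"}

# Hypothetical codes assigned to each of the eight verbatim answers.
coded_answers = [[1], [3, 1], [1, 3], [1], [5, 1], [1], [3, 1], [1]]

frequencies = Counter(code for answer in coded_answers for code in answer)
for code, freq in sorted(frequencies.items()):
    print(code, coding_frame[code], freq)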
Surveys are commonly used in educational and social science research. The descriptive survey
provides information, while the explanatory survey seeks to establish cause and effect. Surveys can
be face-to-face interviews, conducted by telephone, or mailed as a questionnaire. The interview
is more reliable and valid, producing more useable returns than a telephone or mail survey.
Longitudinal and cross-sectional surveys are undertaken to assess changes over time.
A variety of formats are used for survey questions and responses. These range from highly
structured to open ended. Data analysis is mainly by computer and is facilitated by precoding the
question items.
Reference
Davie, R., Butler, N. & Goldstein, H. (1972), From Birth to Seven, Longman, London.
Further reading
Beed, T.W. & Stimson, R.J. (eds) (1985), Survey Interviewing: Theory and Techniques, George Allen
& Unwin, Sydney.
De Vaus, P. (1985), Surveys for Social Planners, Holt, Rinehart & Winston, Sydney.
Fink, A. & Kosecoff, J. (1998), How to Conduct Surveys, Sage, London.
Gahan, C. & Hannibal, M. (1998), Doing Qualitative Research Using QSR NUD*IST, Sage, London.
Kvale, S. (1996), Interviews, Sage, London.
Sapsford, R. (1999), Survey Research, Sage, London.
Stewart, C.J. & Cash, W.B. (1988), Interviewing: Principles and Practices, Wm. C. Brown Publishers,
Dubuque, Iowa.
1 a M = 6; SD = 3.16
b M = 6; SD = 2.74
c M = 18; SD = 19.13
d Has a greater standard deviation because there is a greater dispersal of scores.
2 b is true. All scores are identical.
3 The standard deviation is 3.25.
a The standard deviation is still 3.25.
b SD = 6.50. The standard deviation has doubled.
4 The mean is a poor choice as a measure of central tendency when there are
extremely large or small values in a distribution. An extreme score can exert a
disproportionate effect on the mean.
5 If a distribution is substantially skewed, the median is usually preferred to the mean.
6 a SS = 40; variance = 10; standard deviation = 3.16
b SS = 36; variance = 6; standard deviation = 2.45
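The chain of calculation in answer 6a can be verified with a few lines of Python; the scores below are hypothetical, chosen so that the arithmetic reproduces SS = 40, variance = 10 and SD = 3.16 (using n as the divisor, as these answers do).

import numpy as np

scores = np.array([1, 3, 7, 9])              # hypothetical scores, mean = 5

ss = np.sum((scores - scores.mean()) ** 2)   # sum of squared deviations: 40.0
variance = ss / len(scores)                  # SS / n = 10.0
sd = np.sqrt(variance)                       # standard deviation: 3.16
print(ss, variance, round(float(sd), 2))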
[The answers to the next two question sets are largely illegible in the source. Recoverable fragments indicate a sketch showing that different distributions can have the same mean and standard deviation, a z-score transformation, and the values 14.95%, 67.20%, approximately 6%, 0.675, and 'scores between 54 and 66'.]
p < 0.01 means the probability of a chance occurrence of less than 1 in 100;
p < 0.001 indicates a probability of a chance occurrence of less than 1 in 1000.
a The statistic is the mean ego strength level in the Gas Works High School sample.
b The population consists of 12-year-old males in the city.
c The sample consists of 12-year-old males at the Gas Works High School.
d The parameter is the mean ego strength level of 12-year-old males in the whole city.
1 a Systematic sampling
b Opportunity sampling
c Stratified sampling
2 Obtain the registers and select students by some random technique, for example
by using a random number table.
3 A random sample is defined as one in which every member of the population has an
equal chance of being selected.
4 b
5 Because it ensures strata or groups will be represented.
6 Time and cost of interviewing are reduced when a population is highly dispersed.
7 No, as there is not a complete sampling frame.
8 400
IV teaching style
DV pupil preference
MV thinking style of pupils
IV feedback source
DV teacher behaviour change
MV years of teaching experience
CV age of pupils; subject area of teacher, etc.
IV behaviour modification treatment
DV physiological measure of test anxiety
CV initial level of test anxiety
IV micro-teaching experience
DV questioning techniques
CV sex
MV experience
IV perceptual motor training
DV coordination task performance
This is because with an extreme level of significance like .001, even if the research
hypothesis is true, the results have to be quite strong to be large enough to reject the
null hypothesis. Alternatively, by setting the significance level at, say, 10, even if the null
hypothesis is true, it is fairly easy to get a significant result just by accidentally getting a
sample that is higher or lower than the general population before the study.
1 a and d
2 [illegible in source]
3 18.467
1 a The proportion who agree, disagree or are indifferent to the council's proposals
will be equal.
b χ² = 6
c df = 2
d yes
e That the distribution of responses was not due to chance
2 No
3 Reject null hypothesis.
4 χ² = 4.79, df = 1, p < 0.05. Since this exceeds the tabled value of 3.84, the null
hypothesis is rejected.
5 χ² = 12.99, df = 3, p < 0.05. Since the tabled value is 7.82, the null hypothesis is
rejected.
6 b
7 Yes; χ² = 27.87; p < 0.05
8 Yes; χ² = 20.00
phi = +0.61; highly significant as chi square = 37.21 and tabled value is 6.63 at 1% level.
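A minimal sketch of a goodness-of-fit chi square like the one in answer 1, using scipy; the observed counts are hypothetical, chosen so that the result reproduces χ² = 6 with df = 2.

from scipy import stats

observed = [36, 18, 27]   # hypothetical agree/disagree/indifferent counts
expected = [27, 27, 27]   # equal proportions under the null hypothesis

chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi square = {chi2:.2f}, df = {len(observed) - 1}, p = {p:.4f}")
# chi square = 6.00, df = 2, p ≈ .0498: reject the null at the 5% level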
1 b = 0.344; a = 4.28
2 Ŷ = -5, -1, 3, 13
a = the intercept; b = the regression coefficient, so for every year autonomy increases by
0.0623; c = 18.17, i.e. a score of 18.0; d = 13.19, i.e. a score of 13
3 [illegible in source]
4 b  5 c  6 d  7 d
1 86, 20, 66
2 [illegible in source]
3 [illegible in source]
4 4.09; yes
1 Individual differences do not figure in the ratio of the repeated ANOVA, whereas it
contributes to both numerator and denominator in the independent measures ANOVA.
2 a_ Anxiety and meaningfulness
b 4.02 with df=1 and 56.
H = 4.625; df = 2; tabled value of chi square = 5.99 at 5% level. Cannot reject null
hypothesis.
a = iv; b = i; c = v; d = vi; f = iii; g = ii
a Negative
b For r = .50, you find z = .549 at the intersection of the row labelled '.5' and the column
labelled '.00'. For r = -.31, the Fisher z is found at the intersection of the row labelled
'.3' and the column labelled '.01' (where the value of z is shown as .321), which you
note as -.321 because your result is in the 'wrong' direction.
c Z = 2.22; p = .0132 one-tailed or .0264 two-tailed
d The p value is small enough to convince you that your result differs significantly
from the original one and therefore should not be combined with it without careful
thought and comment. For example, in describing the results of both studies
considered together we should report the differences between them and try to think
of an explanation for their differences.
of the size of the relation between variables X and Y. They can now be routinely
combined by means of a simple meta-analytic technique.
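The table look-up in answer b can be replaced by the arctanh function, which is the Fisher z transform. The sketch below assumes hypothetical sample sizes of 16 per study, chosen so that the result reproduces the Z = 2.22 given in answer c.

import numpy as np
from scipy import stats

r1, n1 = 0.50, 16     # r values from the answer above; the n values are hypothetical
r2, n2 = -0.31, 16

z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher z: .549 and -.321
se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of the difference
z = (z1 - z2) / se
print(f"Z = {z:.2f}, one-tailed p = {stats.norm.sf(z):.4f}")   # Z = 2.22, p ≈ .013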
observation 405—6, 408-12 Friedman two-way non-parametric
non-participant observation ANOVA 331-2
413-16 frequency distributions 67-74
pre-entry 401-3 funnelling 429
sampling 408
strategies 404—5 goodness-of-fit chi square 213-17
research 395-9 grounded theory 433
multiple perspectives 398—9
multiple techniques 399 Hawthorne effect 148—9
naturalism 397 historical research 471-89
process 397 criteria for evaluating 488
reliability 417-18 criticism 487
triangulation 419-20 data analysis 486-7
understanding and interpretation 397 data collection 486
validity 418-19 history and the scientific method
research report 420-1 482
ethics of research limitations and difficulties 488-9
debriefing 20 possible topics 489
deception 19 primary sources 485
experimenter obligations 21 procedure 484
informed consent 18 selecting a topic 484—5
intervention studies 22 sources of data 485—6
involuntary participation 18 types of 483-4
privacy and confidentiality 20 writing report 487
publication of findings 21 hypothesis
right to discontinue 21 formulation 106-8
role-playing 20 criteria for judging 109-10
stress 22 null 111-12
voluntary participation 18 one-tailed, two-tailed 114-16
experimental operational 108—9
design 153 research 108-9
error 134-7 testing 111-13
research report 378-85 Type I and Type II errors 116-17
sources of error 136—7 unconfirmed 110
experimenter
bias 149-50 ideographic approach 3
obligations 21 independent t test 175-86
external validity 358—9 independent variables 125-8, 134
inferential statistics 43, 132
F ratio 300 informed consent 18
face validity 356-7 internal validity 357-8
factor analysis 272 interval level of measurement 121
factorial designs 147 intervening variables 131
fieldnotes 430 intervention studies 22
interview combining studies by significance level
diary 438-40, 586 372-3
schedule 571-4 comparing studies by effect size
interviews 410-11, 423-9, 467 366-9
advantages of interviews 582-3 comparing studies by significance level
analysing data 430-41 371-2
coding 432 file drawer problem 363-4
content analysis 432-38 issues 374—6
closing 429 replication 366
comparison of interview methods 584 methods of knowing 5
conducting 426-9, 582-4 mode 467
disadvantages of interviews 583-4 moderator variables 128-9
fieldnotes 430 multi-case studies 463
structured 424 multiple correlation 271-2
semi-structured 424-5 multiple perspectives 398-9
unstructured 425—6 multivariate correlation analysis 266—9
participation preparation for computer analysis
involuntary 18 586-92
voluntary 18 reliability and validity 585
pattern matching 472-3
Pearson correlation coefficient 235—7 random sampling 85-8
phi coefficient 263-4 randomisation 138
point biserial correlation 260-1 rank order correlation 244—8
population and sample 84 ratio level of measurement 121
positively skewed distribution 68 regression
post-test comparison 145-7 line of best fit 274
power 159-66 multiple regression 290-3
calculating power 166 regression equations 278-81
factors affecting power 162-4 simple linear regression 273-87
importance 165 standard error 284—6
Type I and Type II errors 158-60 related pairs t test 198-202
predictive validity 352-3 effect size 203-4
preliminary sources, checking 32-6 reliability 336-47
primary sources 485 types
probability 74—9 alternate/parallel form 341-2
progressive focusing 404 internal consistency 343-4
psychological abstracts 37 split half 342-3
publication of findings 21 test-retest 340-1
action research 450-1
qualitative research case study 475
approach 10-14, 388-92 ethnography 471
research 10-14 factors affecting reliability 345-6
research report 491-500 questionnaires 585
computer analysis 435—9 standard error of the measurement
confidentiality 498 346—7
helpful tips 499-500 repeated measures ANOVA
limitations 13 replication 6
linear analytic structure 495-8 reports
strengths 13-14 case study 478-9
structures 493—5 ethnographic 420-1
quasi-experimental design 147-8 historical 487
questionnaires qualitative research 491—500
advantages 581 research 378-85
anonymity and confidentiality 584—5 research
coding 587-94 checklist 385
closed items 571-2 design 144-8
data analysis 586 ethics 18-22
design 574-80 hypothesis 108—9
disadvantages 581-82 report 378-85
open-ended items 572—4 topic, selecting 25-8
review matched/related pairs t test 204-5
educational research 36 partial correlation 269-70
literature 28—32 random samples
research literature 28—30 regression 287—90
scope of 30-2 Wilcoxon 209-11
right to discontinue 21 Z scores
role-playing 20 Spearman-Brown formula 342-3
stage sampling 92
sample standard deviation 49-53
non-response 93—4 standard error
population 84 of the mean 95-102
random 85-8, 99 of the difference 176-7, 179
size 93 of measurement 346-7
sampling 82-94, 389 standard scores 61—5
cluster 90-2 standardising 144—5
error 136 statistical significance 76-80
opportunity 92-3 statistical tests, choosing 151-6
purposive 465 stratified sampling 90
stage 92 structured interviewing
strategy 94—5 advantages 582-3
stratified 90 anonymity 584—5
systematic 89 asking questions 576-8
Scheffe test 303 coding 586—92
scientific approach 3-10 closed items 571—2
characteristics 6—9 conducting 582
limitations 10 data analysis 586
strengths 9 disadvantages 583-84
secondary sources 485-6 open-ended items 572-4
semantic differential scale 560—2 preparation for computer analysis
semi-structured interviewing 424—5 586-90
social reality 388 response options 578—9
software for qualitative analysis validity and reliability 585
436-9 subjective bias 473-4
SPSS 53
ANOVA 321-5 advantages of questionnaires 581
chi square 225-8 anonymity and confidentiality
correlation 252-55 584-5
Cronbach alpha 348-50 computer analysis of data 586-8
Friedman test 333 conducting the interview 582-4
graphs 56, 59-60 content analysis 589-92
independent t test 187-9
Kruskall-Wallis test 330-1 data analysis 586
labelling variables 54—5 disadvantages of questionnaires
Mann-Whitney 195-7 581
interview schedule 571-4 recording interview 429
longitudinal surveys 570-1 semi-structured 424—5
schedule design 574-80
steps in survey research 568 validity 350-60
reliability and validity 585
systematic sampling 89 concurrent 353-4
content 351-2
t distributions 176-7 construct 355-6
t test 175-86 external 358-9
assumptions 182 face 356-7
effect size 184-5 internal 357-8
power 185-6 predictive 352-3
test-retest method 340-1 action-research 450-1
tetrachoric correlation 261-2 case studies 476
triangulation 419-20, 457 ethnography 418-19
two-tailed hypothesis 114-16 questionnaires 585
Type 1 error 116-17 variables
Type 2 error 116-17 continuous 123
control 129-30
unstructured interviewing 423-41 dependent 125-8, 134
analysing data 430-40 discrete 122
closing 429 independent 125-8, 134
coding 432 intervening 131
content analysis 432-5 moderator 128-9
cover story 334 variance 48-9, 236
funnelling 429
listening skills 427 Wilcoxon signed-ranks test 205-8
non-verbal communication 427 within-subjects design 139-40
open-ended 425-26
questioning techniques 426-27 Z scores 62-4, 68, 72, 101-2
FEATURES INCLUDE: [feature list illegible in source]
ISBN 0-7619-6592-0
SAGE Publications
London • Thousand Oaks • New Delhi
www.sagepub.co.uk