III - Essentials of Test Score Interpretation
BY: SUSANA UBINA
RAW SCORES
A raw score is a number (X) that summarizes or captures
some aspect of a person's performance in the carefully
selected and observed behavior samples that make up
psychological tests.
FRAMES OF REFERENCE FOR TEST-SCORE
INTERPRETATION
These procedures, which date back to the 1960s and are also
known as latent trait models, are most often grouped under
the label of item response theory (IRT).
The term latent trait reflects the fact that these models seek
to estimate the levels of various unobservable abilities,
traits, or psychological constructs that underlie the
observable behavior of individuals, as demonstrated by their
responses to test items.
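As a rough illustration of what a latent trait model looks like, the sketch below uses the two-parameter logistic (2PL) item response function, one common IRT model. The text above does not name a specific model, so the choice of 2PL and the names `prob_correct`, `theta`, `a`, and `b` are assumptions made here for illustration only.

```python
import math


def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not taken from the source).

    theta: latent trait level of the test taker
    a:     item discrimination
    b:     item difficulty
    Returns the modeled probability of a correct response.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item of average difficulty (b = 0) and moderate discrimination.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(prob_correct(theta, a=1.2, b=0.0), 3))
```

When the trait level equals the item difficulty, the modeled probability of a correct response is 0.5, and it rises as the trait level increases.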
Computerized Adaptive Testing
One of the main advantages of IRT methodology is that it is
ideally suited for use in computerized adaptive testing (CAT).
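The basic idea of adaptive testing can be sketched as follows. This is a deliberately simplified heuristic, not an operational CAT algorithm: it picks the unused item whose difficulty is closest to the current ability estimate and then nudges the estimate up or down by a shrinking step. The function names, the step rule, and the item difficulties are all hypothetical.

```python
def next_item(theta_hat, difficulties, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta_hat))


def run_cat(difficulties, answer_fn, n_items=5, step=1.0):
    """Administer up to n_items adaptively and return the final ability estimate.

    answer_fn(difficulty) must return True for a correct response.
    The halving step is a stand-in for a real IRT-based estimation procedure.
    """
    theta_hat, used = 0.0, set()
    for _ in range(min(n_items, len(difficulties))):
        i = next_item(theta_hat, difficulties, used)
        used.add(i)
        correct = answer_fn(difficulties[i])
        theta_hat += step if correct else -step
        step /= 2
    return theta_hat


# Hypothetical test taker who answers correctly whenever difficulty <= 1.0.
estimate = run_cat([-2, -1, 0, 1, 2], lambda b: b <= 1.0, n_items=3)
print(estimate)
```

Real CAT systems typically select items and update estimates using the IRT model itself; the sketch only shows why each test taker can receive a different, targeted set of items.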
Longitudinal Changes in Test Norms
When a test is revised and standardized on a new sample
after a period of several years, even if revisions in its content
are minor, score norms tend to drift in one direction or
another due to changes in the population at different time
periods.
A puzzling longitudinal trend, known as the Flynn effect, is
the sustained rise in average intelligence test scores
observed across successive generations.
CRITERION-REFERENCED TEST INTERPRETATION
In the realm of educational and occupational assessment,
tests are often used to help ascertain whether a person has
reached a certain level of competence in a field of knowledge
or skill in performing a task.
VARIETIES OF CRITERION-REFERENCED
TEST INTERPRETATION
The term criterion-referenced testing, popularized by Glaser
(1963), is sometimes used as synonymous with domain-
referenced, content-referenced, objective-referenced, or
competency testing.
Criterion-referenced interpretations fall into two broad groups:
(a) those that are based on the amount of knowledge of a
content domain as demonstrated in standardized objective
tests, and
(b) those that are based on the level of competence in a skill
area as displayed in the quality of the performance itself or of
the product that results from exercising the skill.
The term criterion-referenced testing is also used to refer to
interpretations based on the pre-established relationship
between the scores on a test and expected levels of
performance on a criterion, such as a future endeavor or even
another test.
In this particular usage, the criterion is a specific outcome
and may or may not be related to the tasks sampled by the
test.
NORM- VERSUS CRITERION-REFERENCED
TEST INTERPRETATION
Norm-referenced tests seek to locate the performance of one
or more individuals, with regard to the construct the tests
assess, on a continuum created by the performance of a
reference group.
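To make the idea of locating a score on a continuum defined by a reference group concrete, here is a minimal sketch that computes a z score and a percentile rank. The reference scores are invented, and the percentile rank uses one common definition (percentage of the reference group scoring below the given score); other definitions exist.

```python
from statistics import mean, pstdev


def z_score(x, reference):
    """Distance of x from the reference-group mean, in standard deviation units."""
    return (x - mean(reference)) / pstdev(reference)


def percentile_rank(x, reference):
    """Percentage of the reference group scoring strictly below x (assumed definition)."""
    below = sum(1 for s in reference if s < x)
    return 100.0 * below / len(reference)


# Hypothetical reference-group raw scores.
reference_group = [10, 12, 14, 16, 18]
print(round(z_score(18, reference_group), 2))
print(percentile_rank(16, reference_group))
```

Both values describe a person's standing relative to other people, which is what distinguishes this frame of reference from a criterion-referenced one.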
Criterion-referenced tests seek to evaluate the performance
of individuals in relation to standards related to the construct
itself.
Whereas in norm-referenced test interpretation the frame of
reference is always people, in criterion-referenced test
interpretation the frame of reference may be
(a) knowledge of a content domain as demonstrated in
standardized, objective tests; or
(b) level of competence displayed in the quality of a
performance or of a product.
The term criterion-referenced testing is sometimes also
applied to describe test interpretations that use the
relationship between the scores and expected levels of
performance or standing on a criterion as a frame of reference.
When knowledge domains are the frame of reference for test
interpretation, the question to be answered is "How much of
the specified domain has the test taker mastered?" and
scores are most often presented in the form of percentages
of correct answers. This sort of criterion-referenced test
interpretation is often described as content- or domain-
referenced testing.
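A percentage-correct score of the kind described above can be computed as in the following minimal sketch. The function name, the answer key, and the responses are hypothetical.

```python
def percent_correct(responses, key):
    """Percentage of items answered correctly, given an answer key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    n_correct = sum(r == k for r, k in zip(responses, key))
    return 100.0 * n_correct / len(key)


# Hypothetical five-item test: four of five answers match the key.
print(percent_correct(list("ABCDA"), list("ABCDD")))
```

The resulting figure is interpreted against the content domain itself, with no reference to how other test takers performed.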
Planning for such tests requires the development of a table
of specifications with cells that specify the number of items
or tasks to be included in the test for each of the learning
objectives and content areas the test is designed to evaluate.
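One simple way to represent a table of specifications is as a mapping from (learning objective, content area) cells to planned item counts. The objectives, content areas, and counts below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical table of specifications: (objective, content area) -> item count.
table_of_specs = {
    ("recall", "fractions"): 4,
    ("recall", "decimals"): 3,
    ("application", "fractions"): 6,
    ("application", "decimals"): 7,
}


def totals_by(table, position):
    """Sum planned items by objective (position 0) or content area (position 1)."""
    totals = defaultdict(int)
    for cell, n_items in table.items():
        totals[cell[position]] += n_items
    return dict(totals)


print(totals_by(table_of_specs, 0))   # items per learning objective
print(totals_by(table_of_specs, 1))   # items per content area
print(sum(table_of_specs.values()))   # total test length
```

Checking the row and column totals against the intended emphasis of each objective and content area is a quick way to confirm the plan before items are written.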
The usual methods for evaluating qualitative criteria involve
rating scales or scoring rubrics (i.e., scoring guides) that
describe and illustrate the rules and principles to be applied
in scoring the quality of a performance or product.
Mastery testing. Procedures that evaluate test performance
on the basis of whether the individual test taker does or does
not demonstrate a pre-established level of mastery are
known as mastery tests.
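A mastery decision reduces to comparing a score with a pre-established cutoff, as in this minimal sketch. The 80 percent cutoff is an arbitrary value chosen for the example, not one given in the source.

```python
def mastery_decision(percent_score, cutoff=80.0):
    """Classify a percentage score against a pre-established mastery cutoff.

    The default cutoff of 80.0 is a hypothetical value for illustration.
    """
    return "mastery" if percent_score >= cutoff else "non-mastery"


for score in (65.0, 80.0, 92.5):
    print(score, mastery_decision(score))
```

In practice the cutoff is set before testing, and the outcome is an all-or-none classification in place of a position on a continuum.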
Predicting Performance
Sometimes the term criterion-referenced test interpretation
is used to describe the application of empirical data
concerning the link between test scores and levels of
performance on a criterion, such as job performance or
success in a program of study.
Expectancy tables show the distribution of test scores for
one or more groups of individuals, cross-tabulated against
their criterion performance.
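The cross-tabulation behind an expectancy table can be sketched as below. The records, the 10-point score bands, and the criterion labels are all hypothetical; the functions simply count cases per score band and criterion level and then convert each row to percentages.

```python
from collections import Counter, defaultdict


def expectancy_table(records, band_width=10):
    """Cross-tabulate score bands against criterion levels.

    records: iterable of (test_score, criterion_level) pairs.
    Returns {band_start: Counter({criterion_level: count})}.
    """
    counts = defaultdict(Counter)
    for score, level in records:
        band_start = (score // band_width) * band_width
        counts[band_start][level] += 1
    return counts


def row_percentages(counts):
    """Convert each score band's counts to percentages of that band."""
    table = {}
    for band, levels in counts.items():
        total = sum(levels.values())
        table[band] = {lvl: 100.0 * n / total for lvl, n in levels.items()}
    return table


# Hypothetical (test score, criterion rating) pairs.
records = [(42, "low"), (47, "high"), (55, "high"),
           (58, "high"), (51, "low"), (63, "high")]
table = row_percentages(expectancy_table(records))
for band in sorted(table):
    print(band, table[band])
```

Each row then shows, for test takers in that score band, what share reached each level of criterion performance.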
Expectancy charts are used when criterion performance in a
job, training program, or program of study can be classified
as either successful or unsuccessful.
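For a success/failure criterion, the data behind an expectancy chart are the percentages of successful cases in each score band. The sketch below computes those percentages and prints a crude text bar per band; the records and band width are invented for the example.

```python
def success_rates(records, band_width=10):
    """Percentage of successful cases in each score band.

    records: iterable of (test_score, was_successful) pairs.
    Returns {band_start: percent_successful}, ordered by band.
    """
    totals, successes = {}, {}
    for score, success in records:
        band = (score // band_width) * band_width
        totals[band] = totals.get(band, 0) + 1
        successes[band] = successes.get(band, 0) + (1 if success else 0)
    return {band: 100.0 * successes[band] / totals[band] for band in sorted(totals)}


# Hypothetical (test score, succeeded in training program) pairs.
records = [(42, False), (47, True), (55, True),
           (58, True), (51, False), (63, True)]
for band, pct in success_rates(records).items():
    # One '#' per 10 percentage points of successful cases.
    print(f"{band:>3} {'#' * round(pct / 10)} {pct:.0f}%")
```

With real data, each band would need enough cases for its percentage to be a stable estimate.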
In norm-referenced testing, the primary objective is to make
distinctions among individuals in terms of the ability or trait
assessed by a test.