VALIDITY

By far the most complex criterion of an effective test, and arguably the most important
principle, is validity: "the extent to which inferences made from assessment results are
appropriate, meaningful, and useful in terms of the purpose of the assessment".
How is the validity of a test established? There is no final, absolute measure of
validity, but several different kinds of evidence may be invoked in support. In some cases, it
may be appropriate to examine the extent to which a test calls for performance that matches
that of the course or unit of study being tested.
There are five types of validity:
1. Content-Related Evidence
If a test actually samples the subject matter about which conclusions are to be drawn,
and if it requires the test-taker to perform the behaviour that is being measured, it can claim
content-related evidence of validity, often popularly referred to as content validity. You can
usually identify content-related evidence observationally if you can clearly define the
achievement that you are measuring.
If you are trying to assess a person's ability to speak a second language in a
conversational setting, asking the learner to answer paper-and-pencil multiple-choice
questions requiring grammatical judgments does not achieve content validity. A test that
requires the learner actually to speak within some sort of authentic context does. And if a
course has perhaps ten objectives but only two are covered in a test, then content validity
suffers.
Consider, for example, a written quiz given after the students had had a unit on zoo
animals and had engaged in some open discussions and group work in which they had
practiced articles, all in listening and speaking modes of performance. In that the quiz uses
a familiar setting and focuses on previously practiced language forms, it is somewhat
content valid. The fact that it was administered in written form, however, and required
students to read a passage and write their responses makes it quite low in content validity
for a listening/speaking class.
Another way of understanding content validity is to consider the difference between
direct and indirect testing. Direct testing involves the test-taker in actually performing the
target task. In an indirect test, learners are not performing the task itself but rather a task that
is related in some way. For example, if you intend to test learners' oral production of syllable
stress and your test task is to have learners mark (with written accent marks) stressed
syllables in a list of written words, you could, with a stretch of logic, argue that you are
indirectly testing their oral production. A direct test of syllable production would have to
require that students actually produce target words orally.
2. Criterion-Related Evidence
A second form of evidence of the validity of a test may be found in what is called
criterion-related evidence, also referred to as criterion-related validity, or the extent to which
the "criterion" of the test has actually been reached.
In the case of teacher-made classroom assessments, criterion-related evidence is best
demonstrated through comparison of results of an assessment with results of some other
measure of the same criterion. For example, in a course unit whose objective is for students
to be able to orally produce voiced and voiceless stops in all possible phonetic environments,
the results of one teacher's unit test might be compared with an independent assessment.
Criterion-related evidence usually falls into one of two categories: concurrent and
predictive validity. A test has concurrent validity if its results are supported by other
concurrent performance beyond the assessment itself. For example, the validity of a high
score on the final exam of a foreign language course will be substantiated by actual
proficiency in the language. The predictive validity of an assessment becomes important in
the case of placement tests, admissions assessment batteries, language aptitude tests, and the
like. The assessment criterion in such cases is not to measure concurrent ability but to assess
(and predict) a test-taker's likelihood of future success.
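
To make the idea of a criterion-related comparison concrete, the following is a minimal sketch, in Python, of how a teacher might correlate unit-test scores with scores from an independent measure of the same criterion. The score lists and variable names are hypothetical, invented only for illustration; a correlation coefficient is just one of several ways such a comparison could be made, and a high coefficient does not by itself establish validity.

    from statistics import correlation  # Pearson r; available in Python 3.10+

    # Hypothetical scores on a teacher-made unit test
    # (e.g., oral production of voiced and voiceless stops)
    unit_test_scores = [78, 85, 62, 90, 71, 88, 55, 80]

    # Hypothetical scores on an independent assessment of the same criterion
    independent_scores = [75, 82, 60, 93, 68, 85, 58, 77]

    r = correlation(unit_test_scores, independent_scores)
    print(f"Pearson r between unit test and independent measure: {r:.2f}")
    # A strong positive r would count as concurrent, criterion-related evidence.
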
3. Construct-Related Evidence
A third kind of evidence that can support validity, but one that does not play as large a
role for classroom teachers, is construct-related validity, commonly referred to as construct
validity.
A construct is any theory, hypothesis, or model that attempts to explain observed
phenomena in our universe of perceptions. Constructs may or may not be directly or
empirically measured; their verification often requires inferential data. "Proficiency" and
"communicative competence" are linguistic constructs; "self-esteem" and "motivation" are
psychological constructs.
Construct validity is a major issue in validating large-scale standardized tests of
proficiency. Because such tests must, for economic reasons, adhere to the principle of
practicality, and because they must sample a limited number of domains of language, they
may not be able to contain all the content of a particular field or skill.
The TOEFL, for example, has until recently not attempted to sample oral production,
yet oral production is obviously an important part of academic success in a university
course of study. The TOEFL's omission of oral production content, however, is ostensibly
justified by research that has shown positive correlations between oral production and the
behaviours it does sample (listening, reading, grammaticality detection, and writing).
4. Consequential Validity
Consequential validity encompasses all the consequences of a test, including such
considerations as its accuracy in measuring intended criteria, its impact on the preparation
of test-takers, its effect on the learner, and the (intended and unintended) social
consequences of a test's interpretation and use.

5. Face Validity

An important facet of consequential validity is the extent to which "students view the
assessment as fair, relevant, and useful for improving learning" (Gronlund, 1998, p. 210),
or what is popularly known as face validity. "Face validity refers to the degree to which a
test looks right, and appears to measure the knowledge or abilities it claims to measure,
based on the subjective judgment of the examinees who take it, the administrative
personnel who decide on its use, and other psychometrically unsophisticated observers".

Face validity is not something that can be empirically tested by a teacher or even by a
testing expert. It is purely a factor of the "eye of the beholder": how the test-taker, or
possibly the test giver, intuitively perceives the instrument. For this reason, some
assessment experts (see Stevenson, 1985) view face validity as a superficial factor
that is dependent on the whim of the perceiver.

I once administered a dictation test and a cloze test as a placement test for a group of
learners of English as a second language. Some learners were upset because such tests,
on the face of it, did not appear to them to test their true abilities in English. They felt
that a multiple-choice grammar test would have been the appropriate format to use.

A few claimed they didn't perform well on the cloze and dictation because they were
not accustomed to these formats. As it turned out, the tests served as superior
instruments for placement, but the students would not have thought so.

Face validity was low, content validity was moderate, and construct validity was
very high.
