human and animal behaviour and mental processes with the aim of
Gray (2010) defined psychology as the science of mind and behaviour. The
of terms. Behaviour can be overt or covert; overt behaviours are evident and
can be studied through direct observation, while the covert behaviours are
behaviour. Crucial to this definition are the concepts of validity and
measure; on the other hand, a test is described as reliable when it yields the
different observers.
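One common way to quantify reliability in this sense is to correlate the scores obtained from two administrations of the same test, or from two observers rating the same testees. The following is a minimal sketch in Python, assuming a simple Pearson correlation is an acceptable index; the function name and the scores are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of five testees on two administrations of one test
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

retest_reliability = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability estimate: {retest_reliability:.2f}")
```

A coefficient close to 1 suggests that the test ranks testees in much the same order on both occasions; how high a value is acceptable depends on the purpose of the test.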
and are also used for selecting employees for training and other personnel
development programmes.
it relates to job aptitude and abilities, and the results are used as guides to
monitor the effect of treatments and interventions. They are also used for
difficulty and for vocational counselling of students. Furthermore,
behaviour to be measured and is mostly used to form the name of the test.
effective test items. The writing and selection of test items is the next step
Haladyna, Downing, and Rodriguez (2002), creating effective test items may
be more of an art than science, although there is a solid scientific basis for
kinds of items that will be good enough to discriminate among testees on the
desired behaviour, there is a need for item generation; this can be achieved
concise idea for item generation. There may be a need to carry out initial job
analysis before test item generation for tests being developed for selection
from this approach, which will subsequently be screened to arrive at the desired
number of items.
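Where pilot responses are available, one conventional quantitative aid to screening a pool of items is to compute a difficulty index and a discrimination index for each item. The sketch below assumes items scored 1 (correct) or 0 (incorrect) and uses the common convention of comparing the top and bottom 27% of testees by total score; the function name, the `group_fraction` parameter, and the data are hypothetical.

```python
def item_statistics(responses, group_fraction=0.27):
    """Return a (difficulty, discrimination) pair for each item.

    responses: one list of 0/1 item scores per testee.
    difficulty: proportion of all testees answering the item correctly.
    discrimination: proportion correct in the upper group minus the
    proportion correct in the lower group (groups formed by total score).
    """
    n_testees = len(responses)
    n_items = len(responses[0])
    # Rank testees from highest to lowest total score
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_testees * group_fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    stats = []
    for i in range(n_items):
        difficulty = sum(r[i] for r in responses) / n_testees
        p_upper = sum(r[i] for r in upper) / group_size
        p_lower = sum(r[i] for r in lower) / group_size
        stats.append((difficulty, p_upper - p_lower))
    return stats


# Hypothetical pilot data: six testees, three items
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

stats = item_statistics(pilot)
for number, (difficulty, discrimination) in enumerate(stats, start=1):
    print(f"Item {number}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```

Items that nearly everyone passes or fails, or whose discrimination index is near zero or negative, are the usual candidates for rewording or removal.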
most novice item writers tend to create poor quality, flawed, low cognitive
level test items that do not discriminate in any way. Being a subject matter
item writing skills must be learned and practiced. Haladyna (2004) posited
that for new item writers, it is often helpful and important to provide specific
writing skills. Therefore, feedback from experts and peers is required in
pilot study is required to help revise these items. The data collected from the
pilot study or pretesting are scored and analysed. Item or factor analysis
is carried out to help identify the items that are not suitable for
the test. According to Anastasi (1968), the pool of items initially generated
the test developer to identify the items that are good enough to be retained
in the final format. The substandard items are either reworded or completely
achieved. The selection of items could be made by the test developer as well
process of test construction is completed when the final format of the test is
prepared. At this stage all substandard items have been revised and the test
norms. The nature of the test whether speed, power, or personality test
Since the inception of test development, the test developer decides the
achievement tests. There is a need for an objectively scorable item format for
(2004), the multiple choice format (and its variants), with some ninety years
of effective use and an extensive research basis, is the item format of choice
and selected response item scores for measuring knowledge and many
cognitive skills (Rodriguez, 2003). This makes the multiple choice item
format acceptable for achievement tests. Downing (2002a) argued that the
extremely versatile test item form and can be used to test all levels of the
psychometric properties, and the norms of the test. Any standardized test
that is published for public use must have the manual which must be
written explicitly by the test developer. The standards for test development
specifications and blueprints, their rationale and the evidence the test
NCME, 1999).
i Personality Inventories
self-report. Personality inventories are not often called tests as there are no
right or wrong answers. The term, inventory, is considered more adequate as
and Dahlstrom (1972, 1975). The MMPI was developed by means of criterion
keying approach. The criterion groups for the development of most of the
Inventory (CPI) developed and revised by Harrison Gough (1987). What the
MMPI means for the assessment of pathological behaviour, the CPI is said to
people with serious personality disorders but it has presently been widely
appropriate psychiatric label for people whose behaviours are not perfectly
disordered. While drawing about half of its items from the MMPI, the CPI
was developed specifically for use with normal populations aged 13 and above. On
the whole, however, the CPI is one of the best personality inventories
items. First he obtained a large collection of items by listing the names of all
psychological and psychiatric literature. The item scores were correlated and
subjected to factor analysis; the result was the Sixteen Personality Factor
Questionnaire (16PF).
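The correlation of item scores that precedes a factor analysis can be illustrated with a small inter-item correlation matrix. The sketch below shows only that preliminary step and is not a factor analysis, nor a reconstruction of the procedure used for the 16PF; the trait names and ratings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def correlation_matrix(item_scores):
    """Map each pair of item names to the correlation of their scores.

    item_scores: dict of item name -> list of scores, one per testee,
    with testees in the same order for every item.
    """
    names = list(item_scores)
    return {
        a: {b: pearson_r(item_scores[a], item_scores[b]) for b in names}
        for a in names
    }


# Hypothetical ratings of five testees on three trait items
ratings = {
    "talkative": [5, 4, 2, 1, 3],
    "sociable": [4, 5, 1, 2, 3],
    "anxious": [1, 2, 5, 4, 3],
}

matrix = correlation_matrix(ratings)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Items that correlate highly with one another, as the first two do here, are the kind a factor analysis would tend to group under a common factor.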
ii Achievement Tests
performance and thus reveal how much a student has learnt. Course
iii Intelligence Tests
learn, and to deal with novel situations. The most widely used intelligence
tests include the Stanford-Binet Intelligence Scale and the Wechsler scales.
revised in 1937, 1960, and 1972, evaluates persons two years of age and
older and is designed for use primarily with children. It consists of an age-
have charged that intelligence tests favour groups from more affluent
These tests are no doubt useful and to a very large extent objective
Some of the items of these tests are designed for American and western
societies. Ethnic and cultural differences were not considered and this
makes generalizability somewhat difficult. Some items on the tests may not
differences, thereby making it difficult to respond to such items and
western tests to suit their culture. An example is the PHSF (Personal, Home,
students, and adults. This questionnaire was adapted and normed for high
used by the western societies are being used to develop local tests in Nigeria.
The strong point of these tests is the cultural consideration for test content
However, there still exists a need for local tests in Nigeria (Ekore, 2001). The
bulk of the tests available to psychologists are foreign and not suitable for
Haladyna (1997) argued that knowing the principles of effective item writing
is no guarantee of an item writer's ability to actually produce effective test
Test development involves a lot of financial resources which are not readily
are time consuming and require a lot of effort and commitment. This
c. Work overload
work overload. Lecturers often teach more courses than necessary and are
these lecturers write and publish papers and articles to remain relevant in
discharge of duties and the ability to multitask is crucial in this setting. The
hardly room for extra engagements. Test development is not given any
in Nigeria
mentors to coach them in the art and science of test construction. According
to Haladyna (2004), for new item writers it is often helpful and important to
provide specific instruction using an item writer’s guide, paired with hands
promote psychometrics and increase the number of psychometricians with
in the country.
sabbatical leave should also be given to psychologists for the purpose of test
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New Jersey:
Prentice-Hall, Inc.
Dahlstrom, W.G., Welsh, G.S., & Dahlstrom, L.E. (1975). An MMPI handbook vol.
2. Clinical interpretation. Minneapolis: Minnesota University Press.
Downing, S.M., & Haladyna, T.M. (1997). Test item development: Validity
evidence from quality assurance procedures. Applied Measurement in
Education, 10(1), 61-82.
Ekore, J.O. (2001). Total quality management implementation: employees'
needs, organizational type, and design as predictors. An unpublished
doctoral thesis, University of Ibadan, Nigeria.
Gray, P.O. (2010). Foundations for the study of psychology. Psychology (6th
ed.). New York, New York: Worth Publishers.
Haladyna, T.M. (2004). Developing and validating multiple choice test items
(3rd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.