General Steps of Test Construction in Psychological Testing
Abstract
This research paper describes the general steps of constructing an effective test. Effective test
development requires a systematic, efficient approach to ensure sufficient validity evidence to
support the proposed interpretations of the test scores. A variety of details and issues, both large
and small, comprise the enterprise usually associated with the terms test development and test
construction. All of these details must be well executed to produce a test that estimates examinee
achievement or ability fairly and consistently in the content domain the test purports to measure,
and to provide documented evidence in support of the proposed test score inferences. This
paper discusses a model of systematic test development.
Keywords: test development, test construction, test scoring, reliability, validity.
Introduction
Test construction is the set of activities involved in developing and evaluating a test of some
psychological function. The steps include specifying the construct of interest, deciding the
test’s function (diagnosis, description of skill level, prediction of recovery), choosing a method
(performance, behavioral observation, self-report), designing item content, evaluating the
reliability and validity of the test, and modifying the test to maximize its utility. In clinical
neuropsychology, the construct of interest is generally a cognitive function, although certain
classes of behavior, such as executive functioning, may also be the construct of interest.
Depending upon the construct of interest, different forms of reliability may be differentially
important. For the construct of intellectual functioning, temporal reliability may be a preeminent
concern; for the construct of visual-spatial construction, interscorer reliability may be an
important concern (Franzen & Kreutzer, 2011).
1. Planning: The first step in the construction of a test is careful planning. At this stage the test
constructor should consider the objectives of the subject matter, the purpose for which the
test is administered, the availability of facilities and equipment, the nature of the testee,
the provision for review, and the length of the test. The test constructor addresses the
following issues:
• What will be the appropriate age range, educational level, and cultural background of the
examinees who will take the test?
• What will be the content of the test? Does this content coverage differ from that of
existing tests developed for the same or similar purposes? Is it culturally specific?
• The author must decide on the nature of the items, that is, whether the test will
be multiple-choice, true-false, free-response, or in some other form.
• Will the test be administered individually or in groups? Will it be designed or modified
for computer administration? Detailed arrangements for both the preliminary and final
administrations should be considered.
• What special training or qualifications will be necessary for administering or interpreting
the test?
• The test constructor must decide on the probable length of the test and the time allowed
for its completion.
• What will be the method of sampling, i.e., random or selective?
• Is there any potential harm for the examinees resulting from the administration of this
test? Are there any safeguards built into the recommended testing procedure to prevent
any sort of harm to anyone involved in the use of this test?
• How will the scores be interpreted? Will the scores of an examinee be compared to those
of others in a criterion group, or will they be used to assess mastery of a specific content
area? To answer this question, the author must decide whether the proposed test will be
criterion-referenced or norm-referenced.
• Planning also includes deciding the total number of copies to be reproduced and the
preparation of the manual.
2. Writing items for the test: The process of writing good test items is not simple; it requires
time and effort. It also requires certain skills and proficiencies on the part of the writer.
A test writer must therefore master the subject matter he or she teaches, must understand
the testees, must be skillful in verbal expression, and, most of all, must be familiar with
various types of tests. Item writing is a matter of precision; it is perhaps more like
computer programming than writing prose. The task of the item writer is to focus the
attention of a large group of examinees, varying in background experience, environmental
exposure, and ability level, on a single idea. Such a situation requires extreme care in the
choice of words. The item writer must keep in view some general guidelines that are
essential for writing good items. These are listed below:
• Ensure clarity of the item.
• Avoid nonfunctional words.
• Avoid irrelevant clues.
3. Preliminary administration of the test: Once the test is prepared, its validity, reliability,
and usability must be confirmed. A tryout helps to identify defective and ambiguous
items, to determine the difficulty level of the test, and to determine the discriminating
power of the items. It also helps in pinpointing the difficulty level of each item and in
fixing a suitable time limit for the test, and the final administration shows how effective
the test will be when performed on the planned sample. Before proceeding to
administration, the test should be reviewed by at least three experts. When the test has
been written down and modified in the light of the suggestions and criticisms given by
the experts, it is said to be ready for experimental tryout.
THE EXPERIMENTAL TRYOUT / PRE-TRYOUT:
The first administration of the test is called the EXPERIMENTAL TRYOUT or PRE-TRYOUT.
The sample size for the experimental tryout should be about 100.
The purpose of the experimental tryout is manifold. According to Conrad (1951), the main
purposes of the experimental tryout of any psychological and educational test are as follows:
• Finding out the major weaknesses, omissions, ambiguities, and inadequacies of the
items.
• The experimental tryout helps in determining the difficulty level of each item, which in
turn helps in their proper distribution in the final form (a computational sketch of item
difficulty and discrimination follows this list).
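As a minimal illustration of how item difficulty and discrimination might be computed from tryout data, the following Python sketch assumes a small 0/1-scored response matrix; the data and the simple top-half versus bottom-half grouping are illustrative assumptions, not taken from this paper.

```python
# Minimal sketch: item difficulty and discrimination from tryout data.
# Assumes a 0/1-scored response matrix (rows = examinees, columns = items);
# the data below are invented for illustration.
import numpy as np

responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])

# Difficulty (p-value): proportion of examinees answering each item correctly.
difficulty = responses.mean(axis=0)

# Discrimination index: difference in item p-values between examinees with
# high and low total scores (here, a simple top half vs. bottom half split).
totals = responses.sum(axis=1)
order = np.argsort(totals)
half = len(order) // 2
low, high = responses[order[:half]], responses[order[-half:]]
discrimination = high.mean(axis=0) - low.mean(axis=0)

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty p = {p:.2f}, discrimination D = {d:.2f}")
```

Items answered correctly by nearly everyone or by no one (p near 1 or 0) and items with low or negative discrimination are the ones flagged for revision or removal at this stage.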
FINAL TRYOUT: The third preliminary administration is called the final tryout. The sample
for the final administration should be at least 100. At this stage, the items retained after
item analysis constitute the test in its final form. The final tryout is carried out to detect
the minor defects that may not have been caught by the first two preliminary administrations.
It indicates how effective the test will be when administered to the sample for which it is
really intended. Thus, the preliminary administration serves as a kind of “DRESS REHEARSAL,”
providing a final check on the procedure of administration of the test and its time limit.
4. Reliability of the final test: Reliability is the degree to which an assessment tool produces
stable and consistent results. It is important to be concerned with a test's reliability for
two reasons. First, reliability provides a measure of the extent to which an examinee's
score reflects random measurement error. Second, reliability sets an upper limit on
validity: a test that is not reliable cannot be valid. Reliability also refers to the
self-correlation of a test. A correlation coefficient can be used to assess the degree of
reliability; if a test is reliable, it should show a high positive correlation.
Types of Reliability
• Internal reliability: Internal reliability assesses the consistency of results across
items within a test (see the sketch after this list).
• External reliability: External reliability refers to the consistency of a measure
from one use to another.
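Internal reliability is often summarized with Cronbach's alpha. The following is a minimal sketch of the standard alpha formula, assuming a small matrix of invented item scores:

```python
# Sketch: Cronbach's alpha as an index of internal reliability.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of total scores)
import numpy as np

scores = np.array([  # rows = examinees, columns = items (invented data)
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```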
WAYS OF FINDING RELIABILITY:
The following methods are used to check reliability:
• Test-retest
• Alternate form
• Split-half method
TEST-RETEST METHOD: This is the oldest and most commonly used method of testing reliability.
The test-retest method assesses the external consistency of a test; it measures the stability of a
test over time. Examples of appropriate tests include questionnaires and psychometric tests. A
typical assessment involves giving participants the same test on two separate occasions;
everything, from start to finish, should be the same on both occasions. The results of the first
administration are then correlated with the results of the second. If the same or similar results
are obtained, external reliability is established. The timing of the test is important: if the interval
is too brief, participants may recall information from the first test, which could bias the results;
alternatively, if the interval is too long, it is feasible that participants could have changed in
some important way, which could also bias the results. The utility and worth of a psychological
test decrease with time, so the test should be revised and updated; when tests are not revised,
systematic error may arise. A computational sketch follows.
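As a minimal sketch, the test-retest coefficient can be computed as a Pearson correlation between the two sets of scores; the data below are invented for illustration:

```python
# Sketch: test-retest reliability as the Pearson correlation between
# scores from two administrations of the same test (invented data).
from scipy.stats import pearsonr

first_administration  = [12, 18, 15, 22, 9, 17, 20, 14]
second_administration = [13, 17, 16, 21, 10, 18, 19, 15]

r, p = pearsonr(first_administration, second_administration)
print(f"Test-retest reliability r = {r:.2f}")  # values near 1 indicate stability
```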
ALTERNATE FORM: In the alternate-form method, two equivalent forms of the test are
administered to the same group of examinees. An individual is given one form of the test and,
after a period, is given a different version of the same test. Scores on the two forms are then
correlated to yield a coefficient of equivalence.
• Positive point: there is no need to wait for a long interval between administrations.
• Negative point: it is a demanding and risky task to construct two tests of equivalent level.
SPLIT-HALF METHOD: The split-half method assesses the internal consistency of a test. It
measures the extent to which all parts of the test contribute equally to what is being measured.
The test is typically split into odd- and even-numbered items. The reason is that when
constructing a test, the items are usually arranged in order of increasing difficulty; if items
1 to 10 were placed in one half and items 11 to 20 in the other, all the easy items would go to
one group and all the difficult items to the second group.
When splitting the test, both halves should also share the same format or theme, e.g.,
multiple-choice with multiple-choice, or fill-in-the-blanks with fill-in-the-blanks. A
computational sketch follows.
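A minimal sketch of the split-half procedure is given below, together with the Spearman-Brown correction, which estimates the reliability of the full-length test from the half-test correlation; the response data are invented:

```python
# Sketch: split-half reliability with the Spearman-Brown correction.
# The test is split into odd- and even-numbered items (as described above)
# so easy and hard items are spread across both halves; data are invented.
import numpy as np
from scipy.stats import pearsonr

responses = np.array([  # rows = examinees, columns = items in difficulty order
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 0, 1],
])

odd_half  = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

r_half, _ = pearsonr(odd_half, even_half)
# Spearman-Brown: reliability of the full-length test from the half-test r.
r_full = (2 * r_half) / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```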
5. Validity of the final test: Validity is the quality of a test that measures what it is
supposed to measure. It is the degree to which evidence, common sense, or theory
supports the interpretations or conclusions drawn about a student on the basis of his or
her test performance. A reliable test is not necessarily valid, but a valid test must be
reliable.
TYPES OF VALIDITY
• Face validity
• Construct validity
• Criterion related validity
• FACE VALIDITY. Face validity is determined by a review of the items rather than by
statistical analysis; it is not investigated through formal procedures. Instead, anyone
who looks over the test, including examinees, may develop an informal opinion as to
whether the test is measuring what it is supposed to measure. While it is clearly of
some value to have the test appear valid, face validity alone is insufficient for
establishing that the test measures what it claims to measure.
• CONSTRUCT VALIDITY. Construct validity is the degree to which a test measures the
theoretical construct or trait it was designed to measure.
• CRITERION-RELATED VALIDITY. Criterion-related validity requires that the criterion be
clearly defined in advance. It demonstrates the accuracy of a measure or procedure by
comparing it with another measure or procedure that has already been demonstrated to be
valid. A computational sketch follows.
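As a minimal sketch, a criterion-related validity coefficient can be computed as the correlation between test scores and an accepted external criterion; the scores and ratings below are invented for illustration:

```python
# Sketch: criterion-related validity as the correlation between test scores
# and an external criterion already accepted as valid (invented data, e.g.
# a new selection test against supervisor-rated job performance).
from scipy.stats import pearsonr

test_scores = [55, 62, 48, 70, 66, 51, 59, 73]
criterion   = [3.1, 3.8, 2.9, 4.5, 4.0, 3.0, 3.5, 4.6]  # performance ratings

validity_coefficient, _ = pearsonr(test_scores, criterion)
print(f"Criterion-related validity r = {validity_coefficient:.2f}")
```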
When psychologists design a test to be used in a variety of settings, they usually set up a scale
for comparison by establishing norms. A norm is defined as the average performance or score of
a large sample representative of a specified population. Norms are prepared so that scores
obtained on the test can be meaningfully interpreted, because, as we know, raw scores by
themselves convey no meaning regarding the ability or trait being measured. But when they are
compared with the norms, a meaningful inference can immediately be drawn.
Types of norms:
• Age norms
• Grade norms
• Percentile norms
• Standard scores norms
Not all of these types of norms are suited to every type of test. Keeping in view the purpose and
type of the test, the test constructor develops suitable norms for it. A sketch of interpreting a raw
score against norms follows.
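As a minimal sketch of interpreting a raw score against norms, the following computes a z-score, a T-score (mean 50, SD 10), and a percentile rank; the normative sample is invented for illustration:

```python
# Sketch: interpreting a raw score against norms with standard scores and
# percentile ranks. The normative sample below is invented.
import numpy as np
from scipy.stats import percentileofscore

norm_sample = np.array([40, 45, 47, 50, 52, 53, 55, 58, 60, 65])
raw_score = 58

z = (raw_score - norm_sample.mean()) / norm_sample.std(ddof=1)
t_score = 50 + 10 * z                       # T-score: mean 50, SD 10
pct = percentileofscore(norm_sample, raw_score)

print(f"z = {z:.2f}, T = {t_score:.1f}, percentile rank = {pct:.0f}")
```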
Research questions
1. Define a psychological test and explain its importance.
2. What are the general steps of test construction?
3. How is psychological testing conducted?
4. What are the main points that one should keep in mind while planning a
psychological test?
5. Why are psychological tests constructed, and what do they measure?
Rationale of the study
The rationale of this study is as follows:
• To understand and determine the facts of testing in education and psychology.
• To construct and interpret the different stages of test construction.
• To identify the general steps of test construction.
Procedure
The data were collected from different sources such as Google Scholar, Wikipedia, journals,
past research, and various other websites. The whole study was based on qualitative papers.
Discussion
‘Test refers to formal situation(s) deliberately created by a tester (teacher) to make the
testee (student) respond to stimulus (stimuli) from which desired information could be
elicited’ (Bourdin, 1999). A test is an experience that the teacher creates to serve as a basis
for grading learners or grouping them according to a standard laid down by an institution.
The learner is referred to as the testee, and the various activities a tester (teacher)
undertakes to ensure that tests achieve educational goals are called test construction.
Measurement is the assignment of quantitative value to a characteristic or phenomenon,
while evaluation is a judgment exercise on certain observations or data collated by the
evaluator. Assessment involves ordering measurement data into interpretable patterns or
forms on several variables.
The qualities of a good test are an important consideration, and some are listed below.
A test, as an instrument, must possess certain qualities before it can be eligible and usable
as a test. A test should therefore possess the characteristics listed below, which are
interdependent and are what make a test what it should be. They include:
• Validity: when a test fulfils its purpose(s), that is, measures what it intended to measure
and to the extent desired, then it is valid. The characteristics of the testee can blur the
validity of a test; that is, the test can provide false results that do not truly represent what
it intends to measure in a student. If a learner has difficulty accessing the Internet for
course materials and participation, the test can give a wrong impression of the learner's
commitment to log in and ability in course work.
• Reliability: the consistency with which a test measures accurately what it is supposed to
measure is its strength in reliability. It is the ‘extent to which a particular measurement is
consistent and reproducible’.
• Objectivity: the fairness of a test to the testee. A biased test does not portray objectivity
and hence is not reliable. A test that is objective has high validity and reliability.
• Discrimination: a good test must be able to make a distinction between poor and good
learners; it should show the slight differences in learner attainment and achievement that
make it possible to distinguish between poor and good learners.
• Comprehensiveness: a test whose items cover much of the content of the course, that is,
the subject matter, is said to be comprehensive and hence capable of fulfilling its purpose.
• Ease of administration: a good test should not pose difficulties in administration.
• Practicality of scoring: assigning quantitative value to a test result should not be
difficult.
• Usability: a good test should be usable, unambiguous, and clearly stated, with one
meaning only.
The fundamental principles of test construction are: (a) validity, (b) reliability,
(c) standardization, and (d) evaluation.
(a) Validity:
Tests should have validity; that is, they should measure what they purport to measure. A
perfectly valid test would rank prospective employees in the same relationship to one another as
their actual performance on the job would. The validity of a test is determined by the
relationship between the test and some criterion of efficiency on the job. The coefficient of
correlation has become the most widely employed index of validity; it is a statistical index
expressing the degree of relationship between the test results and the criterion. There are two
procedures by which the validity of a test is determined, each supplementary to the other. In the
first instance, the test may be administered to employees of known ability already on the job.
Those employees who are known to be most efficient should score highest on the test, while the
least efficient should score lowest. The second procedure for checking the validity of a test, and
one that should be employed to supplement the foregoing, consists of follow-up studies of the
performance of those employees who have been selected through it. No test can be said to be
truly successful until both of these procedures have been applied.
(b) Reliability:
By the reliability of a test is meant the consistency with which it serves as a measuring
instrument.
If a test is reliable, a person taking it at two different times should make substantially the same
score each time. Even under ideal conditions, a test can never be more than a sample of the
ability being measured. No test is of value in personnel work unless it has a high degree of
reliability. Reliability is usually determined by one of three methods: (1) by giving the test to the
same group at two different times and correlating the two sets of scores; (2) by giving two or
more different (but equivalent) forms of the same test and correlating the results; or (3) by
dividing a single test into halves and correlating scores on the two halves.
When reliability is determined by the latter method, the test is given only once, but the items are
divided and scores on one half of the items are correlated with scores on the other half.
(c) Standardization:
An essential step in the standardization of tests is the scaling of test items in terms of difficulty.
To be of functional value in a test, each item must be of such difficulty as to be missed by a part
of the examinees but not by all. That is, no item is of discriminatory value if it is answered either
correctly or incorrectly by everyone. Secondly, since items of different degrees of difficulty will
discriminate between persons of different degrees of ability or achievement, it is essential that
the items be well distributed across the range of difficulty.
However, there is a difference of opinion among psychologists on this matter of scaling test
items. Norms can be established on the basis of performance on the test over a period. Perfectly
adequate norms for a given examination would embrace the entire population eligible for that
examination. For practical purposes, however, norms established on the basis of a representative
sample are sufficient. What constitutes adequacy, therefore, would vary from one examination to
another.
(d) Evaluation:
The evaluation of test results, involving as it does all the problems of scoring and weighting
items and assigning relative weights to the tests used in a battery, is surrounded by highly
technical considerations.
References
Anastasi, A. (n.d.).
Anderson. (n.d.).
SACMEQ training module 6: http://www.sacmeq.org/sites/default/files/sacmeq/training-module-6 at http://www.sacmeq.org/sites/default/files/sacmeq/training-modules/sacmeq-training-module-6.pdf
Test Construction Techniques and Principles: https://www.researchgate.net/publication/265085817_Test_Construction_Techniques_and_Principles
Google Scholar.
Wikipedia.