Types of Psychological Test: 1. Achievement and Aptitude Tests
Psychological tests can be grouped into several broad categories. Personality tests measure
personal qualities, sometimes referred to as traits. Achievement tests measure what a
person has learned. Aptitude tests are designed to predict future behavior, such as success
in school or job performance. Intelligence tests measure verbal and/or nonverbal skills
related to academic success. Interest inventories are used to help individuals make
effective career choices.
Of course, ethics is one thing, and the desire to make money is another. Therefore
you will often find individuals offering to do all kinds of psychological testing—often on the
Internet—even when they lack the training to administer and interpret such tests.
2. Intelligence tests attempt to measure your intelligence, or your basic ability to understand
the world around you, assimilate its functioning, and apply this knowledge to enhance the
quality of your life. Or, as Alfred North Whitehead said about intelligence, “it enables the
individual to profit by error without being slaughtered by it.”
3. Occupational tests attempt to match your interests with the interests of persons in
known careers. The logic here is that if the things that interest you in life match up with,
say, the things that interest most school teachers, then you might make a good school
teacher yourself.
4. Personality tests attempt to measure your basic personality style and are most often used in
research or forensic settings to help with clinical diagnoses. Two of the best-known
personality tests are the Minnesota Multiphasic Personality Inventory (MMPI), or the revised
MMPI-2, composed of several hundred “yes or no” questions, and the Rorschach (the
“inkblot test”), composed of several cards of inkblots; you simply describe the
images and feelings you experience when looking at the blots.
Objective Tests
Objective tests present specific questions or statements that are answered by selecting one
of a set of alternatives (e.g. true or false). Objective tests traditionally use a "paper-and-
pencil" format, which is simple to score reliably. Although many objective tests ask general
questions about preferences and behaviours, situational tests solicit responses to specific
scenarios.
The MMPI - The Minnesota Multiphasic Personality Inventory is the leading objective
personality test. Its hundreds of true-false items cover a broad range of behaviours. A major
advantage of the MMPI is the incorporation of validity scales designed to detect possible
response bias, such as trying to present oneself in a socially desirable way.
Projective Techniques
Projective personality tests use ambiguous stimuli into which the test taker presumably
projects meaning. This indirect type of assessment is believed by many to more effectively
identify a person's real or underlying personality.
Two leading projective tests are the Rorschach and the Thematic Apperception Test (TAT).
5. Specific clinical tests attempt to measure specific clinical matters, such as your current
level of anxiety or depression.
Reliability
Reliability is the extent to which a test is repeatable and yields consistent scores.
Note: In order to be valid, a test must be reliable; but reliability does not guarantee validity.
All measurement procedures have the potential for error, so the aim is to minimize it. An observed test
score is made up of the true score plus measurement error.
The goal of estimating reliability (consistency) is to determine how much of the variability in test
scores is due to measurement error and how much is due to variability in true scores.
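In classical test theory notation (the standard framework behind this idea, though the notes don't name it explicitly), the relationship is:

    X = T + E    (observed score = true score + measurement error)
    reliability = σ²(T) / σ²(X) = σ²(T) / (σ²(T) + σ²(E))

i.e. reliability is the proportion of observed-score variance that is true-score variance.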
Measurement errors are essentially random: a person’s test score might not reflect the true score
because they were sick, hungover, anxious, in a noisy room, etc.
Reliability can be improved by:
• getting repeated measurements using the same test and
• getting many different measures using slightly different techniques and methods.
- e.g. university assessment for grades involves several sources. You would not consider one
multiple-choice exam question to be a reliable basis for testing your knowledge of "individual
differences"; many questions are asked in many different formats (e.g. exam, essay, presentation) to
help provide a more reliable score.
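A quick way to see why repeated measurement helps is to simulate the "observed score = true score + error" model described above. The Python sketch below uses invented numbers (the true score of 100 and error SD of 5 are assumptions for illustration): the mean of many administrations falls much closer to the true score than a single administration does.

    import random

    random.seed(1)
    true_score = 100.0

    def observe():
        # One administration: observed score = true score + random error.
        # The error SD of 5 is an assumed figure for illustration.
        return true_score + random.gauss(0, 5)

    single = observe()
    averaged = sum(observe() for _ in range(20)) / 20  # repeated measurement

    print(f"single observation: {single:.1f}")
    print(f"mean of 20 observations: {averaged:.1f}")  # much closer to 100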
Types of reliability
There are several types of reliability, corresponding to different ways of estimating how
consistent a test is. I’ll mention a few of them now:
1. Test-retest reliability
The test-retest method of estimating a test's reliability involves administering the test to the same group
of people at least twice. Then the first set of scores is correlated with the second set of scores.
Correlations range between 0 (low reliability) and 1 (high reliability); it is highly unlikely they will be
negative!
Remember that change might be due to measurement error: if you use a tape measure to measure a
room on two different days, any difference in the result is likely due to measurement error rather than
a change in the room’s size. However, if you measure children’s reading ability in February and then
again in June, the change is likely due to real changes in the children’s reading ability. The actual
experience of taking the test can also have an impact (called reactivity): after a history quiz, you might
look up the answers and do better next time, or simply remember your original answers.
2. Alternate Forms
Administer Form A of a test to a group and then administer an equivalent Form B to the same group.
The correlation between the two sets of scores is the estimate of the test’s reliability.
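Both test-retest and alternate-forms reliability come down to the same calculation: correlate the two sets of scores. A minimal sketch in Python (the score lists are hypothetical; statistics.correlation requires Python 3.10+):

    import statistics

    # Hypothetical scores for the same five people on two administrations
    # (or, for alternate forms, on Form A and Form B).
    first = [12, 18, 25, 9, 21]
    second = [14, 17, 24, 11, 20]

    r = statistics.correlation(first, second)  # Pearson r
    print(f"reliability estimate: r = {r:.2f}")  # close to 1 = highly reliable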
3. Internal consistency
Internal consistency is commonly measured with Cronbach's alpha (based on inter-item correlations),
which ranges between 0 (low) and 1 (high). The greater the number of similar items, the greater the
internal consistency. That’s why you sometimes get very long scales asking a question in a myriad of
different ways: adding more items yields a higher Cronbach’s alpha. Generally, an alpha of .80 is
considered a reasonable benchmark.
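For the curious, Cronbach's alpha can be computed directly from the item-level data. A sketch using NumPy, with an invented 5-respondent, 4-item score matrix:

    import numpy as np

    def cronbach_alpha(items):
        # items: (n_respondents, k_items) matrix of item scores.
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical data: 5 respondents answering 4 similar items on a 1-5 scale.
    scores = np.array([
        [4, 5, 4, 5],
        [2, 3, 2, 2],
        [5, 5, 4, 4],
        [3, 3, 3, 4],
        [1, 2, 2, 1],
    ])
    print(f"alpha = {cronbach_alpha(scores):.2f}")  # compare against the .80 benchmark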
Validity
Validity is the extent to which a test measures what it is supposed to measure.
Validity is a subjective judgment made on the basis of experience and empirical indicators.
Validity asks "Is the test measuring what you think it’s measuring?"
For example, we might define "aggression" as an act intended to cause harm to another person (a
conceptual definition) but the operational definition might be seeing:
• how many times a child hits a doll
• how often a child pushes to the front of the queue
• how many physical scraps he/she gets into in the playground.
Are these valid measures of aggression? i.e., how well does the operational definition match the
conceptual definition?
Remember: In order to be valid, a test must be reliable; but reliability does not guarantee validity, i.e. it
is possible to have a highly reliable test which is meaningless (invalid).
Note that where validity coefficients are calculated, they will range between 0 (low) and 1 (high).
Types of Validity
Face validity
Face validity is the least important aspect of validity, because validity still needs to be directly checked
through other methods. All that face validity means is:
"Does the measure, on the face it, seem to measure what is intended?"
Sometimes researchers try to obscure a measure’s face validity - say, if it’s measuring a socially
undesirable characteristic (such as modern racism). But the more practical point is to be suspicious of
any measures that purport to measure one thing, but seem to measure something different. e.g.,
political polls - a politician's current popularity is not necessarily a valid indicator of who is going to
win an election.
Construct validity
Construct validity is the most important kind of validity.
If a measure has construct validity, it measures what it purports to measure.
Establishing construct validity is a long and complex process.
The various qualities that contribute to construct validity include:
• criterion validity (includes predictive and concurrent)
• convergent validity
• discriminant validity
To create a measure with construct validity, first define the domain of interest (i.e., what is to be
measured), then design measurement items which adequately measure that domain.
Then a scientific process of rigorously testing and modifying the measure is undertaken.
Note that in psychological testing there may be a bias towards selecting items which can be objectively
written down rather than other indicators of the domain of interest (i.e. a source of invalidity).
Criterion validity
Criterion validity consists of concurrent and predictive validity.
• Concurrent validity: "Does the measure relate to other manifestations of the construct the
device is supposed to be measuring?"
• Predictive validity: "Does the test predict an individual’s performance in specific abilities?"
Convergent validity
It is important to know whether a test returns similar results to other tests which purport to measure
the same or related constructs.
Does the measure match an external 'criterion', e.g. behaviour or another, well-established test?
Does it measure the construct concurrently, and can it predict this “behaviour”?
• Observations of dominant behaviour (criterion) can be compared with self-report dominance
scores (measure)
• Trained interviewer ratings (criterion) can be compared with self-report dominance scores
(measure)
Discriminant validity
Important to show that a measure doesn't measure what it isn't meant to measure - i.e. it discriminates.
For example, discriminant validity would be evidenced by a low correlation between a
quantitative reasoning test and scores on a reading comprehension test, since reading ability is an
irrelevant variable in a test designed to measure quantitative reasoning.
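Convergent and discriminant validity can be checked with the same correlational logic: a measure should correlate highly with measures of the same construct and weakly with measures of irrelevant constructs. A sketch in Python with invented scores for five people (statistics.correlation needs Python 3.10+):

    import statistics

    # Hypothetical data for five people.
    self_report_dominance = [30, 12, 25, 18, 8]
    observer_rated_dominance = [28, 10, 27, 15, 9]  # same construct, different method
    reading_comprehension = [55, 48, 60, 47, 58]    # irrelevant construct

    convergent = statistics.correlation(self_report_dominance, observer_rated_dominance)
    discriminant = statistics.correlation(self_report_dominance, reading_comprehension)

    print(f"convergent r = {convergent:.2f}")      # expect high
    print(f"discriminant r = {discriminant:.2f}")  # expect low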
Standardization
Standardized tests are:
• administered under uniform conditions. i.e. no matter where, when, by whom or to whom it is
given, the test is administered in a similar way.
• scored objectively, i.e. the procedures for scoring the test are specified in detail so that any
number of trained scorers will arrive at the same score for the same set of responses. So for
example, questions that need subjective evaluation (e.g. essay questions) are generally not
included in standardized tests.
• designed to measure relative performance. i.e. they are not designed to measure ABSOLUTE
ability on a task. In order to measure relative performance, standardized tests are interpreted
with reference to a comparable group of people: the standardization, or normative, sample. e.g.
the highest possible grade in a test is 100, and a child scores 60 on a standardized achievement
test. You may feel that the child has not demonstrated mastery of the material covered in the test
(absolute ability), BUT if the average of the standardization sample was 55, the child has done
quite well (RELATIVE performance).
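To make the relative-performance idea concrete, the example above can be expressed as a standard (z) score. A sketch in Python, using the figures from the example plus an assumed normative-sample SD of 10:

    from statistics import NormalDist

    raw_score = 60  # the child's score (from the example)
    norm_mean = 55  # mean of the standardization sample (from the example)
    norm_sd = 10    # SD of the standardization sample: an assumed figure

    z = (raw_score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100  # % of the norm group scoring below

    print(f"z = {z:.2f}, percentile = {percentile:.0f}")  # z = 0.50, ~69th percentile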
The normative sample should (for hopefully obvious reasons!) be representative of the target
population; however, this is not always the case, so the norms and the structure of the test would need
to be interpreted with appropriate caution.