Procedures for Assessing the Validities of Tests Using the "Known-Groups" Method
Article in Applied Psychological Measurement · July 1984
DOI: 10.1177/014662168400800306
Procedures for Assessing the Validities
of Tests Using the "Known-Groups" Method
John Hattie and Ray W. Cooksey
University of New England
If a test is "valid," one criterion could be that test scores must discriminate across groups that are theoretically known to differ. A procedure is outlined to assess the discrimination across groups that uses only information from means. The method can be applied to many published tests, it provides information that relates to the construct validity of the test, and it presents a way to identify how a new sample can be related to previous studies.

Applied Psychological Measurement, Vol. 8, No. 3, Summer 1984, pp. 295-305
Copyright © 1984 Applied Psychological Measurement Inc.
0146-6216/84/030295-11$1.80
Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227. May be reproduced with no cost by students and faculty for academic use. Non-academic reproduction requires payment of royalties through the Copyright Clearance Center, http://www.copyright.com/

Validity is an elusive concept, and statistical methods related to the validity of tests are noted by their paucity. Validity has been defined in a variety of ways, for example, in terms of truthfulness (Mehrens & Lehmann, 1975), in terms of methods of investigating interpretations (Cronbach, 1971), and in terms of the appropriateness of inferences from test scores (American Psychological Association, 1974). Validity refers not to a measuring instrument but to the purpose for which the instrument is used.

According to the APA Standards for Educational and Psychological Tests (APA, 1974), a test manual

    can provide evidence that will enable the user to evaluate the appropriateness of the item content, to determine whether the test is an acceptable measure of a specified construct, and to decide whether the test has provided useful predictive validities in situations similar to his own (p. 31).

Thus, there are three major purposes of an instrument: (1) to sample a domain of content, (2) to measure some psychological trait, and (3) to determine relationships with other variables. Related to these three purposes are three types of validity: content, construct, and criterion validity.

In an early and seminal paper on the validity of psychological tests, Cronbach and Meehl (1955) discussed various methods for experimentally investigating validity, particularly construct validity. One of their methods was the known-groups method. "If our understanding of a construct leads us to expect two groups to differ on the test, this expectation may be tested directly" (Cronbach & Meehl, 1955, p. 287). Thus if a test is "valid," one criterion could be that test scores should discriminate across groups that theoretically are expected to be different on the trait measured. For example, a test of self-actualization should be able to discriminate between groups of counselors and psychiatric patients, or between persons before and after they have been to encounter groups. If this is so, there is evidence of the usefulness of the test as a decision-making instrument and evidence that it can be generalized on a meaningful psychological trait across different samples of people.

The known-groups method has been used only by a relatively small number of researchers (Hogan, 1975a, 1975b; Pettegrew & Wolf, 1981; Rest, 1974, 1976, 1977; Rhoads & Landy, 1973; Smith & Apfeldorf, 1975). Perhaps the most comprehensive use of the method has been by Rest and his coworkers (Rest, 1974, 1976, 1977; Rest, Cooper, Coder, Masanz, & Anderson, 1974) in validating the Defining Issues Test (DIT). In the DIT a dilemma is presented such as the "Heinz and the Drug" dilemma used extensively by Kohlberg: Heinz's wife is dying of cancer and a chemist has a drug that might save her, but the chemist is charging an exorbitant price for the drug and Heinz cannot raise the money. Should he steal the drug in an attempt to save his wife? Following the dilemma, 12 statements are presented that express various considerations or questions in making a decision. The task is to decide which considerations or questions are crucially important and which are not.

Rest argued that more educated persons should generally have higher scores on the DIT than less educated subjects. From 66 studies, Rest found that there were indeed differences:

From this and other comparisons, Rest found that education and IQ had the most consistent relations to the DIT. The relations were in the expected directions; thus Rest concluded that this evidence proffered much support for the validity of the DIT.

This example (and other examples referenced) have all assumed a priori differences between groups. Clearly, if the theory on which the test is based enables in advance the prediction of which groups will be differentiated by the test, then that seems to offer more support for construct validity than showing that after the fact, groups can be found that have different scores on the test.

Typically, the procedure in analyzing differences between known groups has been to collect raw data from a number of groups. Given raw data from a number of groups there are few problems in assessing the discrimination across groups. Yet it is rare to have access to samples of sufficient size from differing groups. Recently, there has been a surge of interest in aggregating data across many samples (e.g., meta analysis, Glass, 1977). These methods, however, are most appropriate as a means of aggregating data from studies that use different measures where all the means and variances are arbitrary and diverse. This problem does not exist when collating data from a single dependent measure, that is, from one test.

It is possible a priori to form meaningful groups from a large collection of samples and ask whether the means differ between these groups. For example, many samples of summary statistics from a test of self-actualization could be meaningfully grouped into subsets such as counselors, self-actualizers, meditators, criminals, pre- and post-encounter groups, fakers, students, and disturbed individuals. Differences could be hypothesized between these groups and the significance of these differences could be assessed using ANOVA (and planned comparisons). From summary statistics such as sample size, means, and standard deviations, analysis of variance statistics can be computed (Burrill, 1971; Gordon, 1973; Huck & Malgady, 1978).

However, there are problems with the use of a priori groups. Scott (1968) claimed that it is misleading to infer the extent of validity from the significance level of a statistical test between two means. Such tests are substantially affected by the size of groups and by the way in which the samples are selected. Very large mean differences can be obtained for instruments that have little predictive value simply by an opportunistic selection of the known groups. Scott recommended that the known-groups method should be based on representative samples of the population to which the instrument will be applied.

The purpose of this paper is to describe a procedure based on the known-groups method that attempts to avoid many of the problems of the a priori method.
A Suggested Procedure for Establishing
Validity Using the Known-Groups Method

There are four steps in the suggested procedure:

1. From a review of literature, a large number of means are gathered on the test in question (or on the subtests within the test). These sets should come from a variety of studies, and the samples should reflect the kinds of groups to which the test intends to be generalized. With recent advances in computer searching of the literature (see Geahigan & Geahigan, 1982), it should not be too difficult to obtain a large set of means.

2. Distances between the group means are calculated using a Euclidean distance metric. Multidimensional scaling can then be used to represent the distances spatially. From this scaling, it is possible to assess the dimensionality of the samples and, if the test is discriminating among groups, it should be possible to identify the dimensions in a meaningful way. Certainly, it may be possible to look at the groups at the polar extremes of the dimension(s) to determine whether the dimensions are scaling the groups in the desired direction.

   Most likely, there will be much overlap between various "clearly" identified groups (simply because the groups cannot be uniquely classified). A box-plot (Tukey, 1977) of the coordinates of these groups can be used to assess whether they are reasonably homogeneous.

   The researcher could stop at this point having obtained substantial evidence as to the adequacy of the test to make meaningful distinctions between groups to which the test purports to relate. Yet there are two profitable further steps.

3. Independent of the above steps, meaningful and easily identifiable a priori group labels can be presented as pairwise stimuli to persons knowledgeable of the domain tapped by the test. The labeling of the groups may not always be easy. In such cases some consensus among various judges may be needed. The prime concern at this step, however, is to use meaningful groups that are expected a priori to fall at various points on the dimension purportedly measured by the test. All possible pairings of the groups can be presented and each person asked to first identify the group expected to score higher, on the average, on the test. Second, they are asked to indicate the degree of preference for the so-identified "higher" group. The indications of degree, with appropriate sign, then can be scaled using an individual differences model (cf. Coxon, 1982; Davison, 1983; Schiffman, Reynolds, & Young, 1981).

   From an inspection of the dimension(s), it is possible to determine, at a minimum, a rank ordering of the groups. This rank order can be correlated with the rank order from Step 2. Of course, there is much more information in the scaling solution and it is possible, for example, to correlate the mean coordinates from Step 2 with the coordinates from the scaling of this step. If this correlation is very high (and at minimum, significantly different from zero), this would suggest that the test can meaningfully scale groups in an expected manner.

4. If the scaling solution of Step 2 indicates that the test has dependable interpretations, it is possible to devise methods so that users of the test can determine where their new group should be placed in relation to other groups. If all the group means are available, it is possible to directly place a new group into the scaling solution described above. If the means are not available, it is possible to cluster analyze the coordinate values from the dimension(s) into a number of groups. For each cluster the means of the (sub)tests can be calculated. Then the squared distance between the means of each cluster (preferably standardized using total-group standard deviations so that all subtests are expressed in the same units) and the means of the new group can be determined. The new sample can then be located in that cluster where the squared distance is minimum.

Thus from the first three steps, the test user can determine how a test discriminates across many groups and whether the resulting pattern of discrimination is meaningful. From the fourth step,
the user can determine where a new sample can be located relative to other groups. These procedures directly relate to the validity of a test in terms of whether a test can make appropriate inferences across many groups, and the procedures allow indications of the underlying dimension(s) of the test.

An Example:
The Personal Orientation Inventory

The Personal Orientation Inventory (POI; Shostrom, 1974) purports to measure aspects of mental well-being or self-actualization. It consists of 150 pairs of alternative value judgments that are scored on 2 major and 10 subsidiary scales. These scales measure inner directedness, time competence, self-actualizing, existentiality, feeling reactivity, spontaneity, self-regard, self-acceptance, nature of man, synergy, acceptance of aggression, and capacity for intimate contact.

Step 1

From a review of the literature, 107 samples were collected from both published and unpublished studies (full references and all data are available upon request from the authors). These samples came from a cross section of samples and are based on 11,001 persons.

Step 2

The Euclidean distances between the means on the 12 POI scales across the 107 groups were input into the ALSCAL program (Young & Lewyckyj, 1979). A classical nonmetric scaling solution was used. There was one matrix (107 x 107), the measurement level was specified as ordinal, and a simple Euclidean model was used. (If there was only one and not multiple scales in the test, then a one-dimensional scaling solution would perfectly reconstruct the distances.)

There are various ways to assess dimensionality. The most commonly used methods are to report values for stress and R-squared. The stress value is the square root of the normalized residual sum of squares. The R-squared indicates the proportion of variance of the disparities that is accounted for by the multidimensional scaling model. While Kruskal prefers stress (see Kruskal & Wish, 1978), Young prefers R-squared (see Schiffman et al., 1981; Young & Lewyckyj, 1979). From these measures, a one-dimensional solution was chosen as providing the best fit.

The stress value was .11, which falls below Kruskal and Wish's (1978, p. 54) suggested criterion. The R-squared value was very large, .97. Thus, a large amount of variance was accounted for (in all the disparities) by one dimension.

Table 1 presents the 107 samples ranked according to the stimulus coordinates. It is reasonably clear that there is a pattern in the ordering of the groups. Those that would be expected to be more self-actualized are ranked higher than less self-actualized groups. The positive end is identified by groups of persons that have been classified as high genuine counselors and by those who have been to encounter groups, advanced therapy, or who have had appropriate training in the behaviors measured by the POI. At the other end are those asked to fake the test or to appear "self-actualized," and disturbed persons. In between these extremes are groups of students, adults, those disposed to go to encounter groups, and prisoners.

From the order of the samples, it seems that it is more justifiable to divide students into university (average rank = 55, N = 18), college (average rank = 70, N = 11), and secondary school students (average rank = 86, N = 7). That there is a progression from secondary school through college, university, and adults (average rank = 44, N = 17) suggests that age may be a factor in self-actualization. When a two-dimensional solution was plotted, there were indications that the second dimension related to age. The pattern, however, was far from clear.

There were some interesting placements for some samples. For example, a group of persons from Alcoholics Anonymous is placed very high. This is probably because the program used in the treatment very much aims at increasing self-respect, self-awareness, and inner directedness.

From the 107 samples, it is possible to identify 10 groups:
adults;
disturbed persons (hospitalized psychiatric patients, practicing alcoholics, and neurotic persons);
those predisposed to attend encounter groups (includes those who have indicated a willingness to go to encounter groups);
postencounter groups (those given the POI after they have attended an encounter group);
university students;
college students;
secondary school students;
criminals;
fake "good" (those asked to try to fake the POI to appear as "self-actualized" people); and
those who have received training (this includes groups that have been given training related to self-actualization, such as counselors, and those given training in POI-related activities).

Figure 1 presents box-plots of the coordinates for these 10 groups. The center verticals are means, the length of the box is two standard deviations, and the length from the end of the box to the vertical dash (|) is one standard deviation.

Figure 1
Box Plots of the Coordinates from the Ten Groups

Although there is much overlap between the groups, it is clear that many groups are dissimilar from each other. For example, those who had received training in self-actualized activities scored higher than criminals, college students, disturbed persons, and those who attempted to fake the test. College and secondary students are not too dissimilar, but both these groups are different from university students.

Most of the groups are reasonably homogeneous but some are not, such as those who had received training, postencounter groups, and disturbed persons. In the received-training group, one of the two groups that received training in POI-related skills had a mean coordinate much lower than the group mean (z = -1.2). The group of postencounter persons can be divided into two much more homogeneous groups depending on whether the participants were university (Mean = 1.54, SD = .52, N = 5) or college-secondary students (Mean = .06, SD = .36, N = 4). This is a good illustration of the problems of a priori grouping: different conclusions could have resulted depending on whether these groups were combined or separated.
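
Step 2 above (Euclidean distances between the samples' mean profiles, followed by scaling and a stress check) can be sketched in a few lines. This is an illustration under stated assumptions, not the ALSCAL analysis reported here: it substitutes classical (Torgerson) metric scaling for the nonmetric solution, extracts only one dimension by power iteration, uses the raw distances as disparities when computing Kruskal's stress-1, and runs on four hypothetical samples measured on two scales. All function names are illustrative.

```python
from math import sqrt


def euclidean_distances(profiles):
    """Pairwise Euclidean distances between rows of scale means."""
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
            for p in profiles]


def classical_mds_1d(dist, iterations=200):
    """First coordinate of classical (Torgerson) scaling via power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centre the squared distances to obtain the inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [x * sqrt(max(eigenvalue, 0.0)) for x in v]


def stress1(dist, coords):
    """Kruskal's stress-1, with the raw distances used as disparities."""
    num = den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            fitted = abs(coords[i] - coords[j])
            num += (dist[i][j] - fitted) ** 2
            den += fitted ** 2
    return sqrt(num / den)


# Hypothetical mean profiles (4 samples x 2 scales); not data from the paper.
profiles = [[10.0, 20.0], [13.0, 24.0], [16.0, 28.0], [22.0, 36.0]]
dist = euclidean_distances(profiles)
coords = classical_mds_1d(dist)
stress = stress1(dist, coords)
print([round(c, 2) for c in coords], round(stress, 4))
```

Because the four hypothetical profiles lie on a straight line, one dimension reproduces their distances and the stress is essentially zero. The sign of the recovered dimension is arbitrary, so the groups at the polar extremes still have to be inspected to decide which end is "high".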
Another example of the problems of a priori grouping relates to drug addicts. There was one sample of drug addicts and they were classified as disturbed persons, but the coordinate of the drug addicts was .27, which is much higher than the mean coordinate (z = +1.91) for disturbed persons. It seems that perhaps they may not be best classified as disturbed.

Figure 1 does illustrate that, overall, the POI can make distinctions between groups expected to differ in self-actualization. The above analyses of sample means indicate that a user of the POI can have much confidence in using it to make dependable decisions regarding the construct of self-actualization as it relates to specific groups of interest.

Step 3

Independently of the above steps, a group of 26 staff and advanced students in education were presented with all possible pairings of the 10 groups (adults, disturbed persons, predisposed to encounter groups, postencounter groups, university students, college students, secondary students, fake "good," criminals, and received training). They were asked to indicate their preference for one group over the other in terms of self-actualization, and also to indicate their view of the differences between the groups on a scale from 1 (hardly any difference) to 99 (completely different).

A nonmetric individual differences model was used to scale the 26 10 x 10 matrices. The scaled coordinates from the preferences of the 26 persons are presented in Table 2. The correlation between the rank ordering from this sample and the rank order of the groups' mean coordinates from Step 2 was .79 (r = .86 for the actual coordinates). Only two groups are misplaced in order by three positions. The 26 persons classified adults as higher and criminals lower than the test results. Generally, the correspondence is moderately strong and gives the user much confidence in the POI as a discriminator on the trait of self-actualization.

Step 4

The coordinate values from the one-dimension
solution from Step 2 were then clustered into four groups using a modification of the ISODATA procedure (Ball, 1970; Blashfield & Aldenderfer, 1978; Cooksey, 1982). ISODATA yields successive nonhierarchical partitions of a sample into from 1 to 10 mutually exclusive clusters. The modification entailed using increase in eta-squared in a scree-type test as a criterion for establishing the most likely number of clusters. Four clusters seemed to best represent the composition of the sample (eta-squared = .89; a five cluster solution would only have added a trivial 3.8% additional explanation of differences in the coordinate values).

The first cluster contained 12 samples with a cluster mean of 1.85 (Groups 1 to 12 in Table 1); the second cluster had 34 samples with a mean of .64 (Groups 13 to 46), the third had 46 samples and a mean of -.47 (Groups 47 to 92), and the fourth had 15 samples and a mean of -1.40 (Groups 93 to 107).

Table 2
MDS Coordinates from the Preferences of 25 Persons

Table 3
Means of the 12 POI Scales Clustered Into Four Groups
and the Total Sample Means and Standard Deviations

To further investigate these results from the ISODATA clustering, it is prudent to work back to the individual POI scales to see how well they differentiated the four clusters of samples. Thus, the means on the 12 POI scales for the four clusters were computed. These means are presented in Table 3. The total group is based on 11,001 persons. Clearly, the clusters differ in terms of their degree
of self-actualization. There were significant differences between the four means on each scale, but this should not be too surprising as the clustering procedure aims to partition samples into groups so as to maximize group differences.

To ascertain where a new sample can be placed along the dimension is easy if the means from all the groups are available. The user merely reruns the scaling procedure adding in the new group. If all the means are not available, then the procedure for including a new group is not so straightforward. In this latter case, the new sample can be placed along the self-actualizing dimension by calculating, for each cluster separately, the squared Euclidean distance between the means from the new sample and the means in the cluster (standardized using the information for each scale in the Total column of Table 3). The minimum squared distance over the four clusters indicates in which of the clusters the new sample is most likely to be located.

For example, Osborne and Steeves (1982) presented means for a group of counselors who had completed a counseling practicum. These means were (in the same order as in Table 3) 101.9, 20.0, 23.1, 21.0, 18.8, 15.2, 14.4, 19.3, 12.6, 7.9, 18.7, 22.9. The squared distances were 2.6 for Group 1, 8.8 for Group 2, 23.3 for Group 3, and 40.1 for Group 4. Since the minimum squared distance is 2.6, this sample is closest to Group 1 and can be classified as a very self-actualized group. At the other end is a sample of nonmethadone-treated addicts (Cryns, 1974). The means were 74.4, 12.6, 16.6, 16.0, 14.0, 10.5, 7.5, 10.4, 9.9, 5.7, 12.9, 16.1. The squared distance values were 41.69, 21.07, 7.80, and 4.00, for the four clusters respectively. The minimum squared distance is from the last cluster, thus this sample is very low on the dimension of self-actualization.

Another group consisted of 36 YMCA administrators before and then after going to an encounter group (Reddy, 1973). Before attending the encounter groups, the administrators were classified into Group 2 (means: 88.1, 17.8, 20.4, 20.5, 16.0, 12.9, 12.4, 17.4, 12.2, 7.6, 16.8, 18.9, and squared distances of 4.08, .27, 4.71, 13.96), whereas after attending the encounter group, they were in the most self-actualized group (means: 94.8, 18.5, 21.1, 18.0, 14.1, 12.6, 18.1, 12.5, 8.1, 18.3, 21.1, and squared distances of .27, 3.56, 14.08, 29.11). Interestingly, six months later the administrators remained in the top group.

Conclusions

One of the most important characteristics of a test, if not the most important, is its validity. The quality of interpreting meaningful group differences very much depends upon evidence of the validity of the test.

One of the problems has been that there are very few empirical methods that assess the validity of a test, or more correctly, the validities of a test. The two most commonly used empirical methods are factor analysis (cf. Hattie, 1981 for an example of factor analysis and the POI), or multitrait-multimethod analysis (Campbell & Fiske, 1959; Watkins & Hattie, 1981). Another method first suggested by Cronbach and Meehl (1955) and termed the known-groups method has not been used very extensively.

As initially conceived and used, the known-groups method involved comparing groups that were theoretically expected to be different on the construct measured. It has been pointed out that there are problems with a priori groupings, such as using samples based on a small number of persons or using unrepresentative samples, and there are difficulties classifying samples into meaningful and homogeneous groups.

This paper has suggested an alternative method for assessing the validity of a test. This involves locating a large sample of means and then scaling the distances between the sets of means. The resulting dimension(s) should be interpretable and should be meaningful in terms of the construct(s) the test purports to measure.

A group of experts can be asked to scale various groups, and the rank order of the scaled coordinates can then be compared to the order from the scaling of the actual test means. Provided that the groups presented to the "experts" are reasonably homogeneous and clearly distinct from each other, there should be a close correspondence between these
orderings if the test is to be considered valid. This procedure is very similar to the procedure for naming factors that was presented in Hattie (1981). Finally, it was suggested that the information from the above procedures can be used to assign new groups to some point on the dimension(s) of the test.

These methods were illustrated using the POI. The means from 107 samples were found to scale along one dimension. Inspection of how the groups were ordered along the dimension indicated that groups expected to be more self-actualized were at the higher end of the dimension and low self-actualized groups were at the lower end of the dimension. There appears to be much evidence that the POI can discriminate between groups in a meaningful way. The ordering of groups along the dimension was very similar to the ordering from a group of experts. Further, after clustering the coordinates into four clusters, it was demonstrated how new groups could be assigned to one of these four clusters.

Overall, the results suggest that the POI appears to be reasonably valid. The POI seems to reflect an underlying construct of self-actualization and can be used to meaningfully discriminate between various groups of persons.

The method has been used here to demonstrate only one aspect of validity, and other procedures such as multitrait-multimethod analyses and factor analysis can provide additional information. Given the paucity of methods to assess validity, the method of discrimination between groups using multidimensional scaling and clustering is offered as an additional procedure for the test user to marshall evidence in support of claims of validity.

References

American Psychological Association. (1974). Standards for Educational and Psychological Tests and Manuals. Washington DC: Author.
Ball, G. H. (1970). Classification Analysis. Menlo Park CA: Stanford Research Institute.
Blashfield, R. K., & Aldenderfer, M. S. (1978). Computer programs for performing iterative partitioning cluster analysis. Applied Psychological Measurement, 2, 533-541.
Burrill, D. F. (1971). Analysis of variance generalized: Parameters other than the mean. Paper presented to Canadian Educational Research Association, Newfoundland.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cooksey, R. (1982). A modified version of the ISODATA program. Unpublished manuscript, University of New England, Centre for Behavioural Studies in Education, Armidale, Australia.
Coxon, A. P. M. (1982). The user's guide to multidimensional scaling. Exeter NH: Heinemann.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement (2nd. ed., pp. 443-507). Washington DC: American Council on Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Cryns, A. G. (1974). Personality characteristics of heroin addicts in a methadone treatment program: An exploratory study. The International Journal of Addictions, 9, 255-266.
Davison, M. L. (1983). Multidimensional scaling. New York NY: Wiley.
Geahigan, C., & Geahigan, P. (1982). Using computers to search the educational literature: A primer. Contemporary Education Review, 1, 179-193.
Glass, G. V. (1977). Integrating findings: The meta-analysis of research. In L. S. Shulman (Ed.), Review of Research in Education (Vol. 5, pp. 351-379). Itasca IL: Peacock Publications.
Gordon, L. U. (1973). One-way analysis of variance using means and standard deviations. Educational and Psychological Measurement, 33, 77-88.
Hattie, J. A. (1981). A four-stage factor analysis approach for studying behavioral domains. Applied Psychological Measurement, 5, 77-88.
Hogan, H. W. (1975a). Test of the validity of the Wilson-Patterson conservatism scale. Perceptual and Motor Skills, 40, 795-801.
Hogan, H. W. (1975b). Validity of a symbolic measure of authoritarianism. Psychological Reports, 37, 539-543.
Huck, S. W., & Malgady, R. G. (1978). Two-way analysis of variance using means and standard deviations. Educational and Psychological Measurement, 38, 235-237.
Kruskal, J. B., & Wish, M. (1978). Multidimensional Scaling. Beverly Hills CA: Sage.
Mehrens, W. A., & Lehmann, I. J. (1975). Measurement and Evaluation in Educational Psychology (2nd Ed.). New York: Holt, Rinehart & Winston.
Osborne, J. W., & Steeves, L. (1982). Counseling practicum as a facilitator of self-actualization. The Alberta Journal of Educational Research, 28, 248-256.
Pettegrew, L. S., & Wolf, G. E. (1981). Validating Measures of Teacher Stress. Nashville TN: George Peabody College for Teachers. (ERIC Document Reproduction Service No. ED 213 743)
Reddy, W. B. (1973). The impact of sensitivity training on self-actualization: A one-year follow-up. Small Group Behavior, 4, 407-413.
Rest, J. (1974). Manual for the Defining Issues Test: An Objective Test of Moral Judgment Development. Minneapolis MN: University of Minnesota.
Rest, J. (1976). Moral Judgment Related to Sample Characteristics. (NIME Report No. 24988). Minneapolis MN: University of Minnesota, College of Education, Department of Educational Psychology.
Rest, J. (1977). Development in Judging Moral Issues: A Summary of Research Using the Defining Issues Test. (Technical Report No. 3). Minneapolis MN: University of Minnesota, College of Education, Department of Educational Psychology.
Rest, J. R., Cooper, D., Coder, R., Masanz, J., & Anderson, D. (1974). Judging the important issues in moral dilemmas: An objective measure of development. Developmental Psychology, 10, 491-501.
Rhoads, R. F., & Landy, F. J. (1973). Measurement of attitudes of industrial workgroups towards psychology and testing. Journal of Applied Psychology, 58, 197-201.
Scott, W. A. (1968). Attitude measurement. In G. Lindzey & E. Aronson (Eds.), The Handbook of Social Psychology (Vol. 2, pp. 204-273). Reading MA: Addison-Wesley.
Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to Multidimensional Scaling: Theory, Methods and Applications. New York: Academic Press.
Shostrom, E. L. (1974). Manual for the Personal Orientation Inventory. San Diego CA: Educational and Industrial Testing Service.
Smith, W. J., & Apfeldorf, M. (1975). Scales which measure behavioral reactions to illness during hospitalization and attitudes towards hospitals. Psychological Reports, 36, 719-724.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading MA: Addison-Wesley.
Watkins, D., & Hattie, J. A. (1981). An investigation of the construct validity of three recently developed personality instruments: An application of confirmatory multimethod factor analysis. Australian Journal of Psychology, 33, 227-284.
Young, F. W., & Lewyckyj, R. (1979). ALSCAL4 User's Guide. Chapel Hill NC: Data Analysis and Theory Associates.

Acknowledgment

Preparation of this article was supported by a grant to the first author from the Australian Research Grants Committee.

Author's Address

Send requests for reprints or further information to John Hattie, Centre for Behavioural Studies, University of New England, Armidale, N.S.W., Australia, 2351.


Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227.
