The Test of Performance Strategies (TOPS 2) - Development and Validation of TOPS 2 Short Form (8 June 2020)
The Test of Performance Strategies (TOPS 2): Development and validation of TOPS 2 short form
Abstract
Purpose: The Test of Performance Strategies 2 (TOPS 2; Hardy, Roberts, Thomas, & Murphy, 2010) may be perceived as too long, especially when used in conjunction with a battery of other instruments (Marsh, Martin, & Jackson, 2010). Therefore, the purpose of this study was to examine the psychometric properties of a refined TOPS 2 and to develop a robust, reliable, and valid short form.
Design and Method: The recommended criteria (see Marsh et al., 2010) were applied in the selection of items for the TOPS 2 short form (TOPS 2-S). A minimum of three items per factor was recommended by Kline (2005) and Marsh et al. (2010), which the present study adopted. Confirmatory factor analysis (CFA) and exploratory structural equation modelling (ESEM) were conducted with Mplus using maximum likelihood estimation to investigate the factor structure.
Results and conclusion: The TOPS 2 and TOPS 2-S reliability estimates were consistently high for all factors. The CFA and ESEM showed that the confirmatory fit indices met the cut-off criteria advocated by Hu and Bentler (1999). The multitrait-multimethod (MTMM) analysis showed that the TOPS 2-S met the criteria for convergent and discriminant validity. The Test of Performance Strategies Short Form (TOPS 2-S) is available for research use.
The present study critically examined the psychometric properties of the Test of Performance
Strategies (TOPS 2), a popular measure of the mental skills and strategies used by athletes in
competition and during practice. The main aim of this research was to develop a robust, reliable, and valid short form of the instrument. The TOPS 2 measures athletes' use of a range of psychological skills during practice and in competition (Hardy et al., 2010). It has 17 factors with items designed to measure eight practice and nine competition skills. The practice skills are goal setting, imagery, attentional control, self-talk, activation, emotional control, automaticity, and relaxation; the competition scale measures the same eight skills plus a negative thinking subscale. All subscales have four items each.
Lane, Harwood, Terry, and Karageorghis (2004) investigated the original TOPS and found it to be lacking good psychometric properties (see supplementary notes). This prompted Hardy et al. (2010) to develop the TOPS 2. Hardy and colleagues examined the factor structure of the questionnaire with 220 Australian, 120 North American, and 225 British athletes from 48 different sports across a wide range of ability levels. They found the initial fit for both the nine-factor competition and eight-factor practice subscales was acceptable (CFI, TLI ≥ .95; RMSEA ≤ .06). Further refinement of the TOPS 2 (version 3) showed that the fit for the competition model improved, while for the practice model the improvement was substantial (see Hardy et al., 2010). Cronbach's alphas for competition ranged from .62 (Automaticity) to .89 (Emotional Control), while for practice they ranged upward from .71.
The purpose of the present study was to develop a robust, reliable, and valid short form of the TOPS 2. This study followed recommendations in line with the construct validity approach suggested by Marsh et al. (2010). Four basic guidelines were adopted to evaluate the TOPS 2 and its short form (TOPS 2-S): (a) a strong original instrument is a requirement for developing a short form; (b) the short form should retain the content coverage of each factor; (c) each factor on the short form must be adequately reliable; and (d) the short form must retain the factor structure of the original form.
This study used confirmatory factor analysis (CFA) and exploratory structural equation modelling (ESEM) to test the structural integrity of the TOPS 2 and TOPS 2-S. The main focus was on the application of CFA; however, some comparison of the CFA results was made with ESEM, a newer and evolving statistical procedure. ESEM provides confirmatory tests of a priori factor structures, relations between latent factors, and multigroup/multioccasion tests of full measurement invariance (Marsh, Liem, Martin, Morin, & Nagengast, 2011; Marsh, Morin, Parker, & Kaur, 2014). ESEM integrates the best features of exploratory factor analysis (EFA), CFA, and structural equation modelling (SEM) (Marsh, Nagengast, Morin, Parada, Craven, & Hamilton, 2011). Marsh and colleagues (2014) emphasized that when there is a well-defined a priori factor structure, ESEM can be used as a confirmatory tool.
Method
Sample.
Existing data from the TOPS 2 instrument (version 3) were obtained from the original authors of the TOPS (Thomas, Murphy, & Hardy, 1999). The data came from a sample of 538 participants, composed of 286 males (53.2%) and 252 females (46.8%), from diverse sports groups.
Criteria applied in the selection of items for the TOPS 2-S short form.
Five goals were established for selecting items for the TOPS 2-S (see Marsh et al., 2010). The items selected for each subscale were based on the confirmatory factor analysis, with guidelines listed below (see supplementary material for further explanation). Items retained:
1. Had the highest factor loadings within each subscale in the CFA (these items also matched the highest item-total correlations);
2. Had low modification indices, which indicate how much fit would improve if an item were allowed to load onto a factor other than its target factor;
3. Had low correlated uniquenesses (CUs) with other items in the same subscale, as indicated by Mplus' modification indices. If more than one item had high correlated uniquenesses, the final selection was made by the researchers.
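Criterion 1 above amounts to ranking each subscale's items by their CFA loading and keeping the top three. A minimal sketch of that selection step (the subscale names, item labels, and loadings below are hypothetical, not the TOPS 2 estimates):

```python
# Illustrative sketch of criterion 1: keep the n_items highest-loading
# items per subscale. Loadings here are made up for demonstration.

def select_items(loadings, n_items=3):
    """Return the n_items with the highest CFA factor loadings per subscale."""
    selected = {}
    for subscale, item_loadings in loadings.items():
        ranked = sorted(item_loadings.items(), key=lambda kv: kv[1], reverse=True)
        selected[subscale] = [item for item, _ in ranked[:n_items]]
    return selected

example = {
    "Activation": {"item1": 0.71, "item2": 0.55, "item3": 0.80, "item4": 0.68},
}
print(select_items(example))  # keeps item3, item1, item4
```

In the actual study this ranking was only the first filter; criteria 2 and 3 (modification indices and correlated uniquenesses) could overrule a purely loading-based choice.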
Researchers (Hu & Bentler, 1999; Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Marsh et al., 2011) have suggested a number of fit statistics derived from the minimised discrepancy function. In this study, emphasis was placed on the RMSEA, the Tucker-Lewis index (TLI), and the comparative fit index (CFI) to evaluate goodness of fit. RMSEA values of .05 or less reflect a model of close fit, while values between .05 and .08 indicate reasonable fit (Marsh et al., 2011). The TLI and CFI indices lie between zero and one; values exceeding .95 are typically taken to reflect excellent fit, while values greater than .90 are taken to reflect acceptable fit to the data (Marsh et al., 2010). Marsh, Hau, and Wen (2004) suggested that these cut-offs should not be treated as golden rules but rather as the basis of preliminary interpretations that must be considered in relation to the specific details of the research.
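These three indices can be computed directly from the chi-square statistics that SEM software reports for the target and baseline (null) models, using the standard formulas. A sketch follows; the chi-square, degrees-of-freedom, and sample-size values are hypothetical, not those of the TOPS 2 analysis:

```python
import math

def fit_indices(chi2, df, chi2_b, df_b, n):
    """Compute RMSEA, CFI, and TLI from the target model (chi2, df) and
    the baseline/null model (chi2_b, df_b) chi-square statistics."""
    # RMSEA: per-degree-of-freedom misfit, adjusted for sample size n
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    # CFI: proportional improvement over the baseline model, bounded [0, 1]
    cfi = 1.0 - max(chi2 - df, 0.0) / max(chi2_b - df_b, chi2 - df, 0.0)
    # TLI (non-normed fit index): penalises model complexity via chi2/df ratios
    tli = ((chi2_b / df_b) - (chi2 / df)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Hypothetical values for illustration only (n matches the study's 538)
rmsea, cfi, tli = fit_indices(chi2=300.0, df=200, chi2_b=5000.0, df_b=250, n=538)
print(round(rmsea, 3), round(cfi, 3), round(tli, 3))  # 0.031 0.979 0.974
```

By the cut-offs above, this hypothetical solution would count as a close fit (RMSEA < .05) with excellent incremental fit (CFI, TLI > .95).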
The CFA and ESEM were conducted with Mplus (version 7.11) using maximum likelihood estimation to investigate the factor structure of the 51-item TOPS 2-S and to compare it with the factor structure of the 68-item TOPS 2. CFA was the primary tool used to develop the TOPS 2 short form. Marsh et al. (2014) stated that current CFA standards are too restrictive and hence many psychological instruments used in applied research do not meet the minimum criteria of acceptable fit. They recommend the use of ESEM in conjunction with CFA to obtain more definitive results.
Multiple sets of factors can be defined in ESEM as either ESEM or CFA factors (Marsh et al., 2014). Marsh and colleagues argue that a stronger a priori model is facilitated through target rotation in ESEM (see Marsh et al., 2014, for further explanation).
Multitrait-multimethod (MTMM) analysis has long been used in construct validation (see Marsh, Asci, & Tomas-Marco, 2002) and was one of the standard criteria for evaluating instruments such as the TOPS 2. Marsh, Morin, Parker, and Kaur (2014) established that, compared to the CFA model, which can produce inflated correlations among factors, ESEM is well suited to the construction of latent MTMM correlation matrices that can be evaluated with the Campbell-Fiske guidelines. Marsh and colleagues argued that several MTMM ESEM studies provide strong approaches to evaluating discriminant validity. This study therefore examined both the ESEM and CFA approaches to MTMM analysis for the competition versus practice scales.
Correlations among the sixteen latent constructs form the MTMM matrix: a 16 × 16 matrix representing the eight scales common to the practice and competition contexts. The correlation matrix relating the sixteen TOPS 2-S factors can be seen in the MTMM matrices (Tables 3 and 4).
The four validation processes advocated by Campbell and Fiske (1959) were: (a) entries in the validity diagonal (the monotrait-heteromethod values) are examined, and evidence of convergent validity is shown when the diagonal values are significantly different from zero and sufficiently large; (b) values in the validity diagonal that are higher than the values lying in their column and row in the heterotrait-heteromethod triangles suggest discriminant validity; (c) the third process involves comparing the values of a given variable in the validity diagonal with its values in the heterotrait-monomethod triangles; in this process the variable should correlate more highly with measures of the same trait than with measures of different traits that employ the same method; and (d) all of the heterotrait triangles, monomethod and heteromethod, should show the same pattern of trait interrelationships.
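The first two Campbell-Fiske processes can be expressed programmatically. The sketch below runs them on a tiny hypothetical 2-trait × 2-method correlation matrix (order PG, PI, CG, CI, echoing the TOPS 2-S labelling); the correlation values are invented for illustration, not the study's results:

```python
# Campbell-Fiske checks (a) and (b) on a hypothetical MTMM matrix.
# Row/column order: [PG, PI, CG, CI] -- two traits (G, I) by two methods (P, C).
R = [
    [1.00, 0.45, 0.70, 0.30],
    [0.45, 1.00, 0.28, 0.75],
    [0.70, 0.28, 1.00, 0.48],
    [0.30, 0.75, 0.48, 1.00],
]
n_traits = 2

# (a) Convergent validities: same trait, different method (validity diagonal).
validities = {t: R[t][n_traits + t] for t in range(n_traits)}

# (b) Discriminant check: each validity must exceed the heterotrait-
# heteromethod correlations lying in its own row and column.
def discriminant_ok(R, n_traits):
    for t in range(n_traits):
        v = R[t][n_traits + t]
        hthm = [R[t][n_traits + u] for u in range(n_traits) if u != t] + \
               [R[u][n_traits + t] for u in range(n_traits) if u != t]
        if any(v <= h for h in hthm):
            return False
    return True

print(validities)             # {0: 0.7, 1: 0.75}
print(discriminant_ok(R, 2))  # True
```

Processes (c) and (d) extend the same idea to the monomethod triangles and to comparing patterns of correlations across triangles.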
Results
Reliability
TOPS 2 reliability estimates were consistently high for each factor (Table 1); preliminary investigation showed good reliability across all 17 factors. The TOPS 2-S reliability estimates were also consistently high for each subscale (Table 1) and showed good reliability across all 17 factors. Cronbach's alpha values (calculated to show the internal consistencies of the TOPS 2 and TOPS 2-S questionnaires) obtained for both overall scales were above .80. All competition alphas were .80 and over except for automaticity (.77), and the practice scale alphas were .70 and over except for activation (.69). However, the Cronbach's alphas calculated for the TOPS 2-S subscales were lower than .80 but at or above .69. The lowest alpha, .69 for the 3-item Activation subscale (TOPS 2-S), is close to the alpha obtained for the 4-item (TOPS 2) Activation scale (α = .72). Given the low 4-item Activation alpha, it was anticipated that the alpha for the 3-item subscale would be lower but very close to .70 (see Table 1 for the internal consistency estimates for all competition and practice subscales).
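The internal consistencies reported above follow the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). A minimal sketch on hypothetical item scores (not the TOPS 2 data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var(item) for item in items)
    # Total score per respondent, summed across the k items
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

# Hypothetical 3-item subscale scored by four respondents
items = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
print(round(cronbach_alpha(items), 3))  # 0.724
```

Note that alpha depends on the number of items as well as their intercorrelations, which is why dropping an item from a 4-item subscale (as in the short form) typically lowers alpha somewhat even when the retained items are the strongest ones.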
The confirmatory fit indices met the traditional criteria advocated by Hu and Bentler (1999). The RMSEA, CFI, and TLI all showed adequate to good fit for the TOPS 2 and TOPS 2-S (see Table 2). Overall, ESEM showed a better model fit than CFA.
Competition subscales. The TOPS 2-S competition scale, containing 27 items, showed good model fit: RMSEA = .04; CFI = .97; TLI = .97. There was a noticeable improvement in the RMSEA, CFI, and TLI for the short-form version in comparison to the CFA indices obtained from the TOPS 2 analysis. The CFA for the TOPS 2 nine-factor competition scale, containing 36 items, also showed good support for the model (Table 2). The ESEM results for the TOPS 2-S competition scale were: RMSEA = .03; CFI = .99; TLI = .98, an improvement on all of the CFA fit indices. The ESEM for the 4-item TOPS 2 also showed substantially improved model fit compared to the CFA (see Table 2).
Practice subscales. The practice scale, with three items per subscale and a total of 24 items, showed adequate fit: RMSEA = .05; CFI = .95; TLI = .94. This result is similar to the TOPS 2 practice scale, which also had adequate model fit (see Table 2). There was a slight improvement in the fit indices (CFI and TLI) for the TOPS 2-S practice analysis when compared with the TOPS 2, providing support for three items per subscale as a more parsimonious model. Another source of support for the more parsimonious 3-item model was the increase in the incremental fit indices CFI and TLI. In comparison to CFA, the ESEM for the TOPS 2-S practice scale also showed some improvement in model fit (see Table 2): RMSEA = .05; CFI = .97; TLI = .98, although the improvement in the CFI index was only marginal. The ESEM for the 4-item TOPS 2 likewise showed substantially improved model fit compared to the CFA results (see Table 2). The above results indicate that ESEM provided a better model fit than CFA.
Factor Loading. The sixteen target factor loadings (see table) for the 3-item competition and practice scales both ranged from .68 to .90. The 3-item model had a slightly better factor-loading range for both the competition and practice subscales when compared with the 4-item model.
MTMM.
The MTMM results obtained for the TOPS 2-S (Table 3) show that the eight convergent validities (shaded in grey) were consistently high (mean = .75, range = .586-.915); only the coefficients for attentional control (.59), goal setting (.67), and activation (.67) were less than .70. A similar pattern of results was obtained when an ESEM MTMM analysis (Table 4) was carried out between the competition and practice scales (mean = .75, range = .564-.991). The results also show that the validity diagonal values for both the ESEM and CFA analyses were higher than the values in their columns and rows in the heterotrait-heteromethod (HTHM) triangles. The validity diagonal values indicated that the TOPS 2-S met the criteria for convergent validity.
The correlations between different traits measured by different methods in the heterotrait-heteromethod (HTHM) sub-matrix (mean = .37; range = .124-.628) are lower than the values in the validity diagonals (see Table 3). This can be taken as evidence of discriminant validity for the TOPS 2-S. Similar results were obtained for the ESEM analysis (Table 4).
The CFA correlations between different traits measured by the same method, the heterotrait-monomethod (HTMM) correlations in the diagonal sub-matrices (mean = .49, range = .120-.871), are only slightly larger than the HTHM sub-triangle correlations and also substantially lower than the convergent validities in most cases. All convergent validity values are higher than the HTMM correlations except for the following: for CA-PA (.667), ten out of 14 HTMM triangle values were higher; for CAC-PAC (.586), ten out of 14 HTMM triangle values were also higher; while for CG-PG (.667), only two out of 14 HTMM triangle values were higher.
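The "two out of 14" count for CG-PG can be verified directly from the Table 3 values: the 14 HTMM correlations for trait G are PG with the seven other practice factors and CG with the seven other competition factors (on the assumption that negative thinking is excluded, as it has no practice counterpart):

```python
# HTMM correlations for trait G, taken from the Table 3 CFA solution.
pg_htmm = [0.707, 0.530, 0.705, 0.653, 0.214, 0.223, 0.449]  # PG with PI..PR
cg_htmm = [0.550, 0.522, 0.556, 0.527, 0.295, 0.436, 0.301]  # CG with CI..CR
convergent_validity = 0.667  # CG-PG validity diagonal entry

htmm = pg_htmm + cg_htmm
exceed = [r for r in htmm if r > convergent_validity]
print(len(htmm), len(exceed))  # 14 2
```

Only PG-PI (.707) and PG-PST (.705) exceed the CG-PG convergent validity, matching the two exceptions reported in the text.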
The comparison of the validity diagonal values with the 56 HTHM and 56 HTMM correlations can serve as a validation process for discriminant validity. The results show that most variables met the criteria, and the diagonal correlations provided evidence for discriminant validity. The goal setting, imagery, attention control, self-talk, emotional control, automaticity, and relaxation correlations all met the requirements for discriminant validity.
The fourth aspect of validity is supported when the same pattern of trait interrelationships is shown in all of the heterotrait triangles (Campbell & Fiske, 1959). The profile similarity indexes were similar in all heterotrait triangles.
The ESEM MTMM comparison showed that the correlations among different factors were substantially smaller than the corresponding CFA factor correlations (Tables 3, 4, 5). ESEM and CFA both provided good fit to the TOPS 2-S. Marsh et al. (2014), however, state that ESEM routinely provides a more accurate estimation of the factor correlations and thus recommend that both ESEM and CFA be applied to the same data for a thorough analysis.
Discussion
The present study examined the appropriateness of a short form of the popular and widely used TOPS instrument. The psychometric properties of the TOPS 2 and TOPS 2-S, such as reliability, factor structure, correlated uniquenesses, and cross-loadings, were thoroughly examined. It was also demonstrated that ESEM indicated a better model fit and can be used in conjunction with CFA.
The evaluation guidelines proposed by Marsh et al. (2010) were adopted to develop the TOPS 2-S. The four basic guidelines used in the present study are discussed below.
(a) A strong original instrument is a requirement for developing a short form. The present study started with a strong TOPS 2 instrument. The conceptual problems of the original TOPS raised by Lane et al. (2004) were eradicated by Hardy et al. (2010) in their refinement of the TOPS. The CFAs reported in Hardy et al. (2010) and the present study provided strong support for the use of the TOPS 2 to measure the use of psychological skills in practice and competition.
(b) The short form should retain the content coverage of each factor. The MTMM analysis of the TOPS 2 and TOPS 2-S was used to test this assumption, and both demonstrate that the content coverage of the two instruments is invariant (see Marsh et al., 2010). The MTMM analysis, where multiple traits are assessed by multiple methods and the short and long forms are analysed in parallel, is a strong approach and provides strong support for the construct validity of the TOPS 2-S and its equivalence with the TOPS 2 (see Marsh et al., 2010).
(c) Each factor on the short form must be adequately reliable. Marsh et al. (2010) recommended that reliability estimates should ideally be .80 or higher, whereas Smith et al. (2000) accepted reliability coefficients of .70 as adequate. The alphas obtained for the TOPS 2 short-form subscales in the present study were close to the alphas of the TOPS 2 long form. The Cronbach's alphas indicated that the TOPS 2-S compared very well with the TOPS 2 and showed good reliability across all seventeen factors. There was only a very small loss of reliability (the mean reliabilities for the TOPS 2 and TOPS 2-S competition scales were .86 and .83, respectively; the mean reliabilities for the TOPS 2 and TOPS 2-S practice scales were .81 and .78, respectively). The TOPS 2-S results thus differ only marginally from the TOPS 2 long form. Slightly lower reliability estimates are acceptable for factors with fewer items, such as in this case where each factor has three items (see Loewenthal, 2001). Moreover, .69 is only marginally below .70 and is accepted as adequate reliability given the smaller number of items in the subscales (see Loewenthal, 2001).
(d) The short form must retain the factor structure of the original form. The CFA and ESEM results indicated a good to adequate fit for the seventeen-factor TOPS 2 and TOPS 2-S. Examining the competition and practice scales in separate confirmatory factor analyses is concurrent with previous TOPS studies (see Hardy et al., 2010; Lane et al., 2004). The analysis of the seventeen-factor structure of both the TOPS 2 and TOPS 2-S provided a good fit for the model. The fit for the nine-factor competition scales was good for both the TOPS 2 and the short form. The eight-factor practice scales for both the TOPS 2 and the short form did not show as good a fit as the competition scales, but the fit was adequate and acceptable.
Conclusion
A robust, reliable, and valid short form of the TOPS 2 was developed. CFA and ESEM results indicated that the TOPS 2-S maintained high reliability and construct validity. Moreover, the TOPS 2-S factors relate to the established and cognate TOPS 2 constructs. Finally, a strong and robust 51-item Test of Performance Strategies retaining all the qualities of its original form was developed. The items retained were able to maintain the breadth and depth of each factor. It was important to retain the same TOPS 2 structure, and thus all seventeen factors, to maintain its reliability and validity. It is anticipated that the refinement of the TOPS 2 will help reduce administration time (especially when it is supplemented with other questionnaires) and further encourage researchers to use this measure in their applied research. Moreover, applied researchers may find the TOPS 2-S a practical replacement for the TOPS 2, and the information obtained through this instrument can enable them to plan, build, and evaluate psychological skills training.
Note. 4 item comp = competition items from the TOPS 2; 3 item comp = competition items from the TOPS 2
short form; 4 item prac = practice items from TOPS 2; 3 item prac = practice items from the TOPS 2 short form.
Table 2. Confirmatory factor analysis of TOPS 2, TOPS 2-S and ESEM of TOPS 2-S
Note: df = Degrees of Freedom, RMSEA = Root Mean Square Error of Approximation, CFI = Comparative Fit Index, TLI = Tucker-Lewis Index, CFA = Confirmatory Factor Analysis.
    PG    PI    PAC   PST   PA    PEC   PAU   PR    CG    CI    CAC   CST   CA    CEC   CAU   CR    CNT
PG 1.000
PI .707 1.000
PAC .530 .366 1.000
PST .705 .670 .460 1.000
PA .653 .568 .704 .669 1.000
PEC .214 .167 .495 .354 .562 1.000
PAU .223 .280 .367 .273 .496 .350 1.000
PR .449 .600 .195 .625 .428 .175 .120 1.000
CG .667 .490 .364 .488 .485 .155 .171 .265 1.000
CI .517 .870 .191 .537 .507 .201 .167 .453 .550 1.000
CAC .423 .444 .586 .454 .540 .501 .328 .203 .522 .469 1.000
CST .539 .628 .339 .915 .549 .330 .199 .567 .556 .669 .592 1.000
CA .453 .471 .415 .536 .667 .474 .357 .279 .527 .583 .871 .720 1.000
CEC .216 .267 .336 .334 .358 .744 .220 .124 .295 .310 .705 .474 .729 1.000
CAU .315 .380 .383 .364 .540 .427 .729 .238 .436 .428 .746 .522 .848 .563 1.000
CR .362 .498 .162 .571 .422 .201 .129 .854 .301 .486 .303 .661 .408 .233 .347 1.000
CNT .391 .421 .431 .528 .500 .585 .266 .335 .436 .461 .739 .722 .773 .757 .661 .436 1.000
Note. Labels starting with P are practice subscales and those starting with C are competition subscales: G = goal setting, I = imagery, AC = attention control, ST = self-talk, A = activation, EC = emotional control, AU = automaticity, R = relaxation, NT = negative thinking.
Note: ESEM = exploratory structural equation modelling, HTMM = heterotrait monomethod, HTHM = heterotrait heteromethod.
References
Hardy, L., Roberts, R., Thomas, P. R., & Murphy, S. M. (2010). Test of Performance Strategies (TOPS): Instrument refinement using confirmatory factor analysis. Psychology of Sport and Exercise, 11, 27-35.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424-453.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford.
Marsh, H. W., Asci, F. H., & Tomas-Marco, I. (2002). Multitrait-multimethod analyses of two physical self-concept instruments: A cross-cultural perspective. Journal of Sport & Exercise Psychology, 24, 99-119.
Marsh, H. W., Ellis, L. A., Parada, R. H., Richards, G., & Heubeck, B. G. (2005). A short version of the Self Description Questionnaire II: Operationalizing criteria for short-form evaluation with new applications of confirmatory factor analyses. Psychological Assessment, 17, 81-102.
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.
Marsh, H. W., Liem, G. A. D., Martin, A. J., Morin, A. J. S., & Nagengast, B. (2011). Methodological measurement fruitfulness of exploratory structural equation modeling (ESEM): New approaches to key substantive issues in motivation and engagement. Journal of Psychoeducational Assessment, 29, 322-346.
Marsh, H. W., Martin, A. J., & Jackson, S. (2010). Introducing a short version of the Physical Self Description Questionnaire: New strategies, short-form evaluative criteria, and applications of factor analyses. Journal of Sport & Exercise Psychology, 32, 438-482.
Marsh, H. W., Morin, A. J. S., Parker, P. D., & Kaur, G. (2014). Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annual Review of Clinical Psychology, 10, 85-110.
Marsh, H. W., Nagengast, B., Morin, A. J. S., Parada, R. H., Craven, R. G., & Hamilton, L. R. (2011). Construct validity of the multidimensional structure of bullying and victimization: An application of exploratory structural equation modeling. Journal of Educational Psychology, 103, 701-732.
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson.
Thomas, P. R., Murphy, S. M., & Hardy, L. (1999). Test of performance strategies: Development and preliminary validation of a comprehensive measure of athletes' psychological skills. Journal of Sports Sciences, 17, 697-711.