Using The NBME Self Assessments To Project.17

G TESTING CONDITIONS Moderator: Ann Frye, PhD
Using the NBME Self-Assessments to Project Performance on USMLE Step 1 and Step 2:
Impact of Test Administration Conditions
AMY SAWHILL, AGGIE BUTLER, DOUGLAS RIPKEY, DAVID B. SWANSON, RAJA SUBHIYAH, JOHN THELMAN,
WILLIAM WALSH, KATHLEEN Z. HOLTZMAN, and KATHY ANGELUCCI
ABSTRACT
Problem Statement and Background. This study examined the
extent to which performance on the NBME Comprehensive
Basic Science Self-Assessment (CBSSA) and NBME Comprehensive
Clinical Science Self-Assessment (CCSSA) can be
used to project performance on USMLE Step 1 and Step 2
examinations, respectively.
Method. Subjects were 1,156 U.S./Canadian medical students
who took either (1) the CBSSA and Step 1, or (2) the CCSSA
and Step 2, between April 2003 and January 2004. Regression
analyses examined the relationship between each self-assess-
ment and corresponding USMLE Step as a function of test
administration conditions.
Results. The CBSSA explained 62% of the variation in Step 1
scores, while the CCSSA explained 56% of Step 2 score
variation. In both samples, Standard-Paced conditions pro-
duced better estimates of future Step performance than Self-
Paced ones.
Conclusions. Results indicate that self-assessment examina-
tions provide an accurate basis for predicting performance on
the associated Step with some variation in predictive accuracy
across test administration conditions.
Purpose of Study
All medical students who wish to practice (allopathic) medicine in
the United States must pass Step 1 and Step 2 of the United States
Medical Licensing Examination (USMLE). Step 1 and Step 2 are
computer-based tests administered under secure conditions at
Prometric Test Centers worldwide. Step 1 is a 350-item computer-based
test administered over eight hours, while Step 2 is a nine-hour
computer-based examination consisting of 368 items. Approximately
64% of U.S. medical schools require a passing score on
USMLE Step 1 for promotion to the third year. Additionally,
passing scores on Step 1 and/or Step 2 are required for graduation by
17% and 57% of medical schools, respectively [1]. Performance on
Step 1 and Step 2 is also a major factor considered by residency
programs in determining whom to interview and select for resident
positions [2,3]. Thus, from a student's perspective, Step 1 and Step 2
are clearly high-stakes examinations, making it useful for students as
well as medical schools to be able to project likely Step 1 and Step
2 performance prior to taking the tests.
Since the inception of USMLE, the National Board of Medical
Examiners (NBME) has provided paper-and-pencil-based
comprehensive examinations for the basic and clinical science
disciplines to interested medical schools. The Comprehensive Basic
Science Examination (CBSE) and the Comprehensive Clinical
Science Examination (CCSE) share the same content coverage and
item formats as their USMLE counterparts, but each contains fewer
items. Because of the similarity between comprehensive examinations
and their corresponding USMLE examination, students and
schools commonly find the comprehensive examinations to be
valuable tools for examinees preparing to take USMLE. The CBSE
and CCSE are securely administered by medical schools and are
commonly used to predict performance on USMLE and to identify
students at risk for failing Step 1 and Step 2 [4,5].
In 2003, the NBME introduced a new series of Web-based
self-assessment examinations. Like the medical school-administered
comprehensive examinations, the Web-based self-assessments were
designed to reflect the format and content of their analogous
USMLE examination. The Comprehensive Basic Science
Self-Assessment (CBSSA) consists of 200 items recently retired from
the Step 1 item pool, while the Comprehensive Clinical Science
Self-Assessment (CCSSA) contains 184 items recently retired from
the Step 2 pool. In contrast to the Step examinations, the
four-section CBSSA and CCSSA can be taken via the Web at any time
and from any location, provided that the examinee's computer is
Internet capable and meets system requirements.
Prior to official implementation of the CBSSA and CCSSA,
examinees were given the opportunity to take the self-assessments
free of charge for quality assurance purposes. Examinees were
provided with vouchers permitting them to take the examinations
during three-day field tests. Although the self-assessments did not
have to be completed in one sitting, the free self-assessments could
only be accessed and completed during the field test period.
Thereafter, users paid a fee to take the self-assessment examinations.
Examinees elect to take the self-assessment test forms under one of
two timing conditions: Standard-Paced, analogous to the
one-hour-per-section timing of the Step 1 and Step 2 examinations
(CBSSA: 50 items/section; CCSSA: 46 items/section), or Self-Paced,
in which examinees have up to four hours to complete each section.
Regardless of the timing condition elected, within an assessment section,
examinees are free to complete test items in any order, skip items,
review responses, and change answers. Examinees are also permitted
to exit and resume the assessment as frequently as they choose,
provided that the allotted time for the section has not expired.
Upon completion of the full self-assessment, examinees are given
immediate feedback in the form of a performance profile, which
includes a total score and a graphical profile (similar to those for
Step 1 and Step 2) indicating general content areas of relative
strength and weakness. The graphical profile defines a borderline
level of performance in each of the content areas addressed by the
individual self-assessment; the CBSSA includes information covered
during basic science education courses, while the CCSSA
includes information covered during the core clinical clerkships.
Examinees may choose to use the performance profile provided by
the self-assessment for further preparation for Step 1 and Step 2.
Total scores on each self-assessment range from 200 to 800 and are
scaled to have a (statistically projected) mean of 500 and a standard
deviation of 100 in reference groups of first-time takers from
U.S./Canadian schools (2001 Step 1 cohort, 2003 Step 2 cohort).
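As a rough illustration of the score scale just described, the sketch below places a raw proportion-correct score on a scale with mean 500 and SD 100 in a reference group, bounded at 200 and 800. The plain linear (z-score) transformation, the function name `to_scaled_score`, and the reference-group values in the example are all assumptions made for illustration; the text says only that the scale is statistically projected and does not describe the actual scaling procedure.

```python
def to_scaled_score(raw, ref_mean, ref_sd, lo=200, hi=800):
    """Map a raw score onto a mean-500, SD-100 scale, clamped to [lo, hi].

    Assumption: a simple linear z-score transformation against a reference
    group; the actual NBME scaling procedure is not described in the paper.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (raw - ref_mean) / ref_sd      # distance from the reference mean in SD units
    scaled = 500 + 100 * z             # rescale to mean 500, SD 100
    return int(round(min(hi, max(lo, scaled))))  # clamp to the reported range


# Hypothetical reference group: mean 70% correct, SD 8 percentage points.
print(to_scaled_score(0.70, ref_mean=0.70, ref_sd=0.08))  # at the reference mean
print(to_scaled_score(0.78, ref_mean=0.70, ref_sd=0.08))  # one SD above the mean
print(to_scaled_score(0.30, ref_mean=0.70, ref_sd=0.08))  # far below: clamped to lo
```

Under this assumed transformation, an examinee at the reference mean lands at 500, and each reference-group SD moves the score by 100 points until the bounds are reached.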
This study was designed to examine the extent to which performance
on the CBSSA and CCSSA can be used to predict performance
on USMLE Step 1 and Step 2, respectively. Since the
CBSSA and CCSSA are composed of retired items from the Step 1
and Step 2 item pools, it was expected that examinees performing
well on the Web-based self-assessments would also perform well on
USMLE Step 1 and Step 2. In addition, the performance of
examinees taking the self-assessments under Standard-Paced conditions
similar to Step 1 (50 items/hour) and Step 2 (46 items/hour)
was expected to provide a more accurate basis for predicting
performance on USMLE. We also hypothesized that paid assessments
would provide a better basis for projecting USMLE scores
than field tests provided free of charge, since examinees paying for
assessments would likely be more motivated to perform consistently
well across sections.
Academic Medicine, Vol. 79, No. 10 / October Supplement 2004, S55
Method
The first set of subjects included 848 U.S. medical school students
who took the Web-based version of the CBSSA between April and
December 2003 and subsequently took the USMLE Step 1. The
second set of subjects consisted of 308 U.S. medical students who
completed the Web-based version of the CCSSA between October
2003 and January 2004 and subsequently took the USMLE Step 2.
Subjects within each sample were eliminated if their pattern of
performance on the self-assessment indicated that they had not
made a serious attempt at completing all sections of the form,
defined as scoring below chance levels of performance (less than
20% correct) on one or more sections. In addition, subjects must have
completed their self-assessment prior to their first attempt on Step
1 or Step 2. Within each of the samples, the subjects were divided
into four subgroups based on their test administration conditions, a
combination of chosen timing condition and whether it was a paid
or free self-assessment. The four subgroups were (1) Standard-Paced
Paid; (2) Self-Paced Paid; (3) Standard-Paced Free; and (4)
Self-Paced Free.
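The exclusion rule and subgroup assignment described above can be sketched as follows. The record layout, the function names, and the example values are hypothetical and are not study data; only the 20% chance threshold and the four subgroup labels come from the text. The second inclusion criterion (self-assessment completed before the first Step attempt) is omitted here for brevity.

```python
from collections import Counter

CHANCE_LEVEL = 0.20  # "less than 20% correct" on any section (from the text)


def is_serious_attempt(section_props):
    """True if every section is at or above the chance level."""
    return all(p >= CHANCE_LEVEL for p in section_props)


def subgroup(timing, paid):
    """Label one of the four test administration conditions."""
    return f"{timing}-Paced {'Paid' if paid else 'Free'}"


# Hypothetical examinee records (illustrative values only, not study data).
examinees = [
    {"sections": [0.62, 0.70, 0.58, 0.66], "timing": "Standard", "paid": True},
    {"sections": [0.55, 0.12, 0.60, 0.57], "timing": "Self", "paid": False},  # dropped
    {"sections": [0.48, 0.51, 0.45, 0.50], "timing": "Self", "paid": True},
    {"sections": [0.71, 0.69, 0.74, 0.66], "timing": "Standard", "paid": False},
]

# Keep only examinees with no section below chance, then count per subgroup.
kept = [e for e in examinees if is_serious_attempt(e["sections"])]
counts = Counter(subgroup(e["timing"], e["paid"]) for e in kept)
print(len(kept), dict(counts))
```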
Multiple regression analyses investigated performance on the
associated USMLE Step as a function of (1) performance on the
corresponding self-assessment (CBSSA or CCSSA); (2) the
self-assessment timing condition elected by the examinee (Standard-Paced
or Self-Paced); and (3) the cost of the self-assessment (paid or free).
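One way to picture this kind of model is as a separate regression line for each test administration condition. The sketch below, which uses made-up data and hypothetical function names, fits one least-squares line per condition; this yields the same fitted values as a single regression with condition-specific intercepts and slopes. It is an assumed reading of the analysis, not the authors' code, and it does not reproduce the significance tests on intercept and slope differences.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_by_condition(rows):
    """rows: list of (condition, self_assessment_score, step_score).

    Fits one line per condition and returns the fitted lines together with
    the overall R^2 of the combined model (1 - SSE / SST).
    """
    groups = {}
    for cond, x, y in rows:
        xs, ys = groups.setdefault(cond, ([], []))
        xs.append(x)
        ys.append(y)
    lines = {c: fit_line(xs, ys) for c, (xs, ys) in groups.items()}

    ys_all = [y for _, _, y in rows]
    grand_mean = sum(ys_all) / len(ys_all)
    sst = sum((y - grand_mean) ** 2 for y in ys_all)
    sse = sum((y - (lines[c][0] + lines[c][1] * x)) ** 2 for c, x, y in rows)
    return lines, 1 - sse / sst


# Made-up scores for two conditions (illustrative only, not study data).
rows = [
    ("Standard-Paced Paid", 400, 205), ("Standard-Paced Paid", 500, 221),
    ("Standard-Paced Paid", 600, 238),
    ("Self-Paced Paid", 400, 204), ("Self-Paced Paid", 500, 217),
    ("Self-Paced Paid", 600, 232),
]
lines, r2 = fit_by_condition(rows)
for cond, (a, b) in lines.items():
    print(f"{cond}: Step = {a:.1f} + {b:.3f} * self-assessment")
print(f"overall R^2 = {r2:.3f}")
```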
Results
Table 1 provides descriptive statistical information concerning the
performance of study participants on the self-assessment examinations
and USMLE Steps 1 and 2 for each test administration
condition. Average Step 1 and Step 2 scores for study participants
indicate that the ability level of the samples is slightly higher than
that of the cohorts of first-time takers from U.S./Canadian schools in 2003.
The CBSSA sample had a mean Step 1 score of 224 and a
standard deviation (SD) of 22, compared with the latter group's
mean and SD of 217 and 20. The mean and SD of Step 2 scores for
the CCSSA sample were 222 and 23, compared with 217 and 23 for
first-time takers from U.S./Canadian schools.
Mean performance on USMLE Steps varied markedly across the
four test administration groups, as did mean performance on the
CBSSA and CCSSA. For both self-assessments, higher mean scores
were observed in the Paid groups. These differences among the Paid
and Free test-takers could be due to differences in incentive among
examinees paying for the assessment compared to those who took it
for Free, or to differences in length of time between the
self-assessment and the subsequent Step administration. Free examinations
were taken during the field trial prior to official implementation
of the self-assessments. Therefore, the average length of time
between the tests was generally longer for examinees in the Free
conditions than in Paid conditions (44 days for the CBSSA Free
test-takers compared to 11 days for the Paid test-takers, and 32 and
nine days, respectively, for the CCSSA examinees).
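As a quick consistency check on the descriptive statistics, the Total rows of Table 1 can be recovered as n-weighted averages of the subgroup means. The sketch below uses the subgroup sizes and mean self-assessment scores reported in Table 1; the function name `pooled_mean` is illustrative, and because the published means are rounded, agreement is only expected to the nearest point.

```python
def pooled_mean(groups):
    """n-weighted mean of subgroup means; groups is a list of (n, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (n, mean CBSSA score) for the four CBSSA subgroups in Table 1
cbssa = [(357, 499), (139, 554), (257, 407), (95, 397)]
# (n, mean CCSSA score) for the four CCSSA subgroups in Table 1
ccssa = [(34, 485), (16, 453), (189, 446), (69, 425)]

print(sum(n for n, _ in cbssa), round(pooled_mean(cbssa)))  # 848 469
print(sum(n for n, _ in ccssa), round(pooled_mean(ccssa)))  # 308 446
```

Both results match the sample sizes and overall means given in the Total rows.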
The multiple regression model predicting Step 1 performance
regressed Step 1 scores against CBSSA performance as a function of
each test administration condition. This model explained 62% of
the total variation in Step 1 scores and indicated statistically
significant differences (p < .01) among both the intercepts and
slopes for the lines defined by each of the test administration
conditions. The analogous model for predicting Step 2 scores from
CCSSA performance and test administration conditions explained
57% of the variance in Step 2 scores; statistically significant
differences (p < .01) were observed in the intercepts, but not the slopes,
for the test administration groups. The lack of statistical significance
for the latter may reflect the relatively small sample sizes in some
CCSSA groups and the resulting lack of power to detect such
differences.
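To give the explained-variance figures a more concrete meaning, the sketch below converts an R² value and the observed score SD into two quantities: the SD of the predicted scores and the approximate SD of the prediction errors. These are standard least-squares identities, applied here as a back-of-the-envelope reading rather than anything computed in the study; the function names are illustrative, the inputs are the rounded values reported in the text and Table 1, and the small degrees-of-freedom correction is ignored.

```python
import math


def predicted_sd(observed_sd, r_squared):
    """SD of fitted values in a least-squares regression: SD_y * sqrt(R^2)."""
    return observed_sd * math.sqrt(r_squared)


def residual_sd(observed_sd, r_squared):
    """Approximate SD of prediction errors: SD_y * sqrt(1 - R^2).

    Ignores the small degrees-of-freedom correction.
    """
    return observed_sd * math.sqrt(1.0 - r_squared)


# Step 1: observed SD of 22 (Table 1) and R^2 of .62 (reported model fit)
print(round(predicted_sd(22, 0.62), 1), round(residual_sd(22, 0.62), 1))
# Step 2: observed SD of 23 (Table 1) and R^2 of .57 (reported model fit)
print(round(predicted_sd(23, 0.57), 1), round(residual_sd(23, 0.57), 1))
```

Under these assumptions the Step 1 predicted-score SD comes out near 17, in line with the Total row of Table 1, and a typical Step 1 prediction error is on the order of 14 points. The Step 2 figure comes out slightly below the tabled predicted SD of 18, which is plausible given that the inputs are rounded.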
Figure 1 graphically depicts the results of the regression analyses
for each sample. In Figure 1 (top), the relationship between CBSSA
and Step 1 scores was strongest in the Standard-Paced Paid group
(R² = .69), while the weakest relationship was for the Self-Paced Free
group (R² = .49). Across most of the CBSSA score range, the
self-assessment score maps into a higher predicted Step 1 score if
the self-assessment was obtained under Standard-Paced conditions.
This suggests that the extra time allotted under the Self-Paced
conditions resulted in improved performance on the CBSSA; however,
such scores did not provide as accurate an estimate for future
Step 1 scores as the Standard-Paced conditions. Scanning vertically
across the four regression lines, an observed score of 500 on the
CBSSA was associated with predicted Step 1 scores of 218, 221,
237, and 241 for Self-Paced Paid, Standard-Paced Paid, Self-Paced
Free, and Standard-Paced Free conditions, respectively. Thus, for a
given CBSSA score in this range, a lower Step 1 score is expected for
examinees in the Self-Paced Paid group, probably reflecting a high
level of motivation to do well on the self-assessment, ample time for
CBSSA completion, and the opportunity to look up answers to
questions while taking the exam. In contrast, if an examinee in the
Standard-Paced Free condition received the same CBSSA score, it
resulted in a higher expected Step 1 score, reflecting the likelihood
that these examinees took the self-assessment earlier in the Step 1
preparation process under realistic Step 1-like pacing conditions.

TABLE 1. Descriptive Analyses of Observed Self-Assessment Scores with Observed and Predicted USMLE Scores by Test Administration Condition

Test Admin. Condition    Subjects   CBSSA         Step 1        Predicted Step 1
Form   Timing            n          Mean   SD     Mean   SD     Mean   SD
Paid   Standard-paced    357        499    103    224    22     224    19
Paid   Self-paced        139        554    115    229    21     229    15
Free   Standard-paced    257        407     92    225    21     225    16
Free   Self-paced         95        397     83    218    21     218    15
Total                    848        469    115    224    22     224    17

Test Admin. Condition    Subjects   CCSSA         Step 2        Predicted Step 2
Form   Timing            n          Mean   SD     Mean   SD     Mean   SD
Paid   Standard-paced     34        485    105    220    23     220    19
Paid   Self-paced         16        453    104    206    23     206    18
Free   Standard-paced    189        446    110    225    24     225    18
Free   Self-paced         69        425     91    219    21     219    13
Total                    308        446    106    222    23     222    18

In the second sample, similar results were obtained for predicting
Step 2 performance from CCSSA scores: the relationship was
strongest in the Standard-Paced Paid group (R² = .74) and weakest
in the Self-Paced Free group (R² = .40). Additionally, an observed
score of 500 on the CCSSA was associated with observed Step 2
scores of 204, 218, 224, and 230 for Self-Paced Paid, Standard-Paced
Paid, Self-Paced Free, and Standard-Paced Free conditions;
these results parallel those for the CBSSA/Step 1 sample.
Discussion
Across test administration conditions, performance on the new
CBSSA and CCSSA examinations provided accurate predictions of
performance in USMLE Step 1 and Step 2, with the best predictors
produced when self-assessments were taken under Standard-Paced
conditions. The difference in explained variance probably reflects
the greater similarity of Standard-Paced conditions to the test
administration conditions for Step 1 and Step 2, as well as the
opportunity that self-assessment examinees have to use reference
material under the Self-Paced conditions. Additionally, examinees
who paid for the assessments, rather than taking them free of
charge, tended to have self-assessment scores that provided more
accurate estimates of their future Step 1 and Step 2 performance.
These differences may be due to the length of time between the
self-assessments and subsequent Step 1 or Step 2 administrations.
Since the free administrations were available only in the initial
release phase of the self-assessments, most of these examinees may
have taken the self-assessment earlier in their preparation time
period for the associated Step examination.
Comparisons of the results of this study with previous research [6,7]
suggest that the self-assessments, taken under Standard-Paced Paid
conditions, provide a more accurate basis for predicting Step 1
(R² = .62) and Step 2 (R² = .56) performance than NBME subject
tests given by medical schools (R² for NBME basic science subject
tests ranges from .35 for histology to .50 for pathology; for clinical
science subject tests, it ranges from .28 for psychiatry to .55 for
internal medicine).
The predictive accuracy of the self-assessments could be due to: (1)
the relatively short time interval between the self-assessments
and the Step administrations in this study, (2) the greater test
length of the self-assessments, and/or (3) the greater similarity in
content coverage to the Step examinations. Assuming these results
are replicable in future studies, it appears that the performance
profiles provided by the self-assessments should furnish prospective
Step 1 and Step 2 examinees with an excellent basis for judging
their readiness to sit for USMLE.
Correspondence: Amy J. Sawhill, MA, National Board of Medical Examiners, 3750
Market St. Philadelphia, PA 19104; e-mail: [email protected].
References
1. Association of American Medical Colleges (AAMC) Curriculum Directory.
Institutional characteristics. http://services.aamc.org/currdir/section1/requirements1.cfm.
Accessed 28 July 2003. Washington, DC: AAMC.
2. Wagoner NE, Suriano JR, Stoner JA. Factors used by program directors to select
residents. J Med Educ. 1986;61:10-21.
3. Wagoner NE, Suriano JR. Program directors' responses to a survey on variables used
to select residents in a time of change. Acad Med. 1999;74:51-8.
4. Glew RH, Ripkey DR, Swanson DB. Relationship between students' performances
on the NBME Comprehensive Basic Science Examination and the USMLE Step 1:
a longitudinal investigation at one school. Acad Med. 1997;72:1097-101.
5. Werner LS, Bull BS. The effect of three commercial coaching courses on Step One
USMLE performance. Med Educ. 2003;37:527-31.
6. Holtman MC, Swanson DB, Ripkey DR, Case SM. Using basic science subject tests
to identify students at risk for failing Step 1. Acad Med. 2001;76(10 suppl):S48-51.
7. Ripkey DR, Case SM, Swanson DB. Identifying students at risk for poor performance
on the USMLE Step 2. Acad Med. 1999;74(10 suppl):S45-8.
Figure 1. (Top) Relationship between scores on CBSSA and Step 1 by test admin-
istration condition. (Bottom) Relationship between scores on CCSSA and Step 2
by test administration condition.
