Research Article: Norma J. MacIntyre, Lisa Bennett, Alison M. Bonnyman, and Paul W. Stratford

International Scholarly Research Network
ISRN Rheumatology
Volume 2011, Article ID 571698, 8 pages
doi:10.5402/2011/571698
Research Article
Optimizing Reliability of Digital Inclinometer and Flexicurve
Ruler Measures of Spine Curvatures in Postmenopausal Women
with Osteoporosis of the Spine: An Illustration of the Use of
Generalizability Theory
Norma J. MacIntyre, Lisa Bennett, Alison M. Bonnyman, and Paul W. Stratford

School of Rehabilitation Science, McMaster University, IAHS, Room 403, 1400 Main Street West, Hamilton, ON, Canada L8S 1C7
Correspondence should be addressed to Norma J. MacIntyre, [email protected]
Received 1 December 2010; Accepted 2 January 2011
Academic Editors: A. Adebajo and K. Uusi-Rasi
Copyright © 2011 Norma J. MacIntyre et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
The study illustrates the application of generalizability theory (G-theory) to identify measurement protocols that optimize
reliability of two clinical methods for assessing spine curvatures in women with osteoporosis. Triplicate measures of spine
curvatures were acquired for 9 postmenopausal women with spine osteoporosis by two raters during a single visit using a digital
inclinometer and a flexicurve ruler. G-coefficients were estimated using a G-study, and a measurement protocol that optimized
inter-rater and inter-trial reliability was identified using follow-up decision studies. The G-theory provides reliability estimates for
measurement devices which can be generalized to different clinical contexts and/or measurement designs.
1. Introduction

Measuring devices are used routinely in rheumatology clinical examinations and research. Reliability analysis quantifies the consistency of examinee performance [1, 2]. When differences arise among repeated measurements performed on a truly stable examinee, they are attributed to measurement error. Not surprisingly, the clinical literature has devoted considerable attention to reliability studies to ensure that the measurements obtained are reliable [3–5]. It is important to know the magnitude of the error variance for a given measurement in order to determine the confidence in a measured value and to assess change in the examinee over time. Applied in the context of reliability analysis, measurement error is an all-encompassing term that includes inherent variation in the examinee, inconsistencies within and between raters, and many other sources of potential variation excluding true differences among examinees under investigation. Typically, two coefficients, the reliability coefficient and the standard error of measurement (SEM), are used to characterize the reliability of a measure [1]. The reliability coefficient is a unitless quantity. As such, it comments on the relative reliability of a measure. The SEM is a measure of absolute reliability in that it expresses measurement error in the same units as the original measurement. For a measure to be clinically useful it must possess a sufficiently high reliability coefficient and a sufficiently low SEM. Despite the availability of reliability studies, it is challenging to select information applicable to a particular clinical context.

It is recommended that each clinical setting establish reliability for measurements obtained by their specific assessors (raters) on their particular patient population. This position is held, in part, because studies of measurement error in the clinical arena have predominantly adopted a classical test theory (CTT) framework [1, 2]. CTT states that the observed score (i.e., the measured value) is equal to the sum of the true score and measurement error. The true score is conceptualized as the average score that would be obtained from an infinite number of measurements performed on a truly stable examinee. Consistent with this conceptualization is that the distribution of measurement error values would represent a normal distribution with a
mean of zero. CTT also dictates that true scores and error scores are independent. CTT defines the reliability coefficient (R) as the ratio of true score variance to observed score variance (i.e., sum of true and error variances):

\[ R = \frac{\sigma_t^2}{\sigma_x^2} = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_e^2}, \tag{1} \]

where $\sigma_t^2$ is the true score variance, $\sigma_x^2$ is the observed score variance, and $\sigma_e^2$ is the error variance. The SEM is equal to the square root of the error variance. The variance terms are obtained from a one-way analysis of variance (ANOVA). Finally, because measurements take place in context, measurement properties comment on the inextricable link among measure, examinees, and measurement process: tests and measures do not have reliabilities, while the measures' scores do [2].

Despite the common use of CTT for characterizing reliability, there are several limitations. First, the term “true” score can be confusing on several counts. When applied in a reliability context, the true score does not comment on the extent to which a measure assesses what it is intended to measure (i.e., its meaning when applied in a validity context). Also, an examinee may have different true scores depending on the study design. For example, the apparent true score for an examinee may be different for an inter-rater study design compared to an inter-trial study design. A second limitation concerns the interpretation of the error term. Although in theory it represents random measurement error, there is no way of distinguishing whether this assumption is true. Furthermore, like the true scores, it is likely that the magnitude of measurement error will be different for different study designs. Finally, CTT does not provide a coherent method for optimizing a measurement process. For example, an investigator might be interested in determining whether a greater gain in reliability could be achieved by increasing the number of raters or by increasing the number of assessments by a single rater. Applying CTT, the investigator would conduct two studies. For the results of each study, the investigator could apply the Spearman-Brown prophecy formula to estimate the impact of altering the number of raters or the number of trials. However, there is no elegant method for combining the results from these studies to determine whether it is better to increase the number of raters or to increase the number of trials. Collectively, these shortcomings led to the development of generalizability theory (G-theory) [6].

G-theory differs from CTT as summarized in Table 1 and builds on it in the following ways [7]. Rather than focusing on a “true” score, the G-theory comments on a “universe” score. The universe score represents the mean score for an examinee over all conditions of interest to the clinician/investigator. These conditions define the universe of admissible observations. The term “facet” is used to describe the conditions of measurement. Thus, in the previous example, the universe of admissible observations includes raters and trials. The term “population” is used to describe the objects of measurement. Having identified the population and facets of interest, the next step is to conduct a study to estimate the variance components. Within the G-theory lexicon, this is referred to as a generalizability study (G-study). Numerous G-study designs exist [8] and it is beyond the scope of this monograph to provide a review of each. Accordingly, for illustrative purposes we will restrict our commentary to a fully crossed design that is frequently reported and of interest to clinicians and investigators.

Table 1: Comparison of differences between classical test theory and generalizability theory.
Classical test theory | Generalizability theory
True score | Universe of admissible observations' score
One identifiable source of “error” variance | Multiple sources of identifiable “error” variances
One-way ANOVA | Factorial ANOVA
“What if” optimizing assessment method: Spearman Brown | “What if” optimizing assessment method: design study

For a fully crossed design, all objects of measurement are assessed by all levels of all facets. Once again, suppose the universe of admissible observations consisted of raters and trials. An investigator conducted a study where two raters each performed three trials on all of the objects of measurement (patients). This fully crossed design is represented as “patients X raters X trials”. Seven sources of variance can be identified from this study design: patients ($\sigma_p^2$), raters ($\sigma_r^2$), trials ($\sigma_t^2$); the two-way interactions of patients and raters ($\sigma_{pr}^2$), patients and trials ($\sigma_{pt}^2$), raters and trials ($\sigma_{rt}^2$); and the three-way interaction of patients and raters and trials (error, $\sigma_{prt}^2$). These variance components can be used to calculate generalizability coefficients (G-coefficients) that are roughly equivalent to R. The equivalent G-coefficient for inter-rater reliability is

\[ G_{\text{inter-rater}} = \frac{\sigma_p^2 + \sigma_t^2 + \sigma_{pt}^2}{\sigma_p^2 + \sigma_r^2 + \sigma_t^2 + \sigma_{pr}^2 + \sigma_{pt}^2 + \sigma_{rt}^2 + \sigma_{prt}^2}, \tag{2} \]

and the equivalent G-coefficient for inter-trial reliability is

\[ G_{\text{inter-trial}} = \frac{\sigma_p^2 + \sigma_r^2 + \sigma_{pr}^2}{\sigma_p^2 + \sigma_r^2 + \sigma_t^2 + \sigma_{pr}^2 + \sigma_{pt}^2 + \sigma_{rt}^2 + \sigma_{prt}^2}. \tag{3} \]

Having identified the variance components from a single G-study, the investigator would then apply these results to guide decision making concerning the optimal measurement strategy. This type of study is referred to as a Decision study (D-study). A D-study is similar to applying the Spearman-Brown prophecy formula; however, with a D-study it is possible to examine the impact of varying the number of raters and number of trials simultaneously.

2. Exemplar Application of G-Theory

In our clinical research setting, we were interested in designing a study involving measurement of spine curvatures
in postmenopausal women with osteoporosis. Women with osteoporosis are susceptible to deformities in the axial skeleton including hyperkyphosis and flattened or accentuated lumbar lordosis [9]. Clinical practice guidelines for rehabilitation of women with spine osteoporosis include postural assessment and correction of abnormal spinal curvatures [10]. The American Physical Therapy Association Section on Geriatrics recommends measuring kypholordosis using a surveyor's flexicurve ruler [11]. Measuring change in kyphosis is important since hyperkyphosis is associated with increased spinal loads which increase the risk for subsequent fracture [12], and women with a kyphotic index ≥13 have reduced cardiovascular fitness, muscle strength, and physical function [13, 14]. Although less studied, assessment of lumbar lordosis is also important in this patient group given that prescription of certain orthoses (e.g., the PTS brace) is contraindicated in those with flattened lordotic curvatures due to the loads imparted to this region of the axial skeleton. Thus, reliable measurement of spine curvatures aids in the classification of women with postmenopausal osteoporosis at increased risk for fracture, prescription of appropriate bracing, and ongoing monitoring of progression and response to therapeutic interventions aimed to improve abnormal postures. To plan our future study, a pilot study was needed to evaluate and optimize the reliability of values obtained using two common clinical methods for assessing spine curvatures.

Therefore, our purpose was to illustrate the application of the tools of the G-theory to investigate the inter-trial and inter-rater reliability of spine curvature measures in postmenopausal women with osteoporosis of the spine using two common methods (the digital inclinometer and the flexicurve ruler) in order to establish an optimal measurement protocol. For comparison, the inter-trial and inter-rater reliability of these measures were also determined using CTT.

3. Methods

3.1. Participants. Nine women were recruited through a local osteoporosis clinic. Women were eligible for inclusion in the study if they were 60 years of age or older, were postmenopausal (self-reported absence of menses for more than 1 year), were clinically diagnosed with osteoporosis by a physician, and had a history of one or more vertebral fractures. Participants were excluded from the study if they were not community ambulators, had cognitive difficulties, were unable to understand written or spoken English, or had a vertebral fracture within three months prior to commencement of the study. The study protocol was approved by our institutional Research Ethics Review Board, and all participants provided written informed consent prior to the start of the study.

3.2. Spine Curvature Measurements. During a single visit, spine curvatures were measured by two raters using two different measurement devices. Clothing covering the back and footwear were removed to ensure accurate identification of bony landmarks and consistent standing posture. Participants were instructed to stand erect and maintain their best posture throughout the procedure. Each rater followed a standardized protocol to acquire triplicate measurements using the digital inclinometer and the flexicurve ruler.

3.2.1. Digital Inclinometer. A digital inclinometer (Saunder's digital inclinometer, Empi Therapy Solutions) was used according to the manufacturer's recommended procedure [15] to measure joint angle at the cervicothoracic, thoracolumbar, and lumbosacral junctions as described here in brief. The arch attachment was fixed to the inclinometer, and the rater held this portion of the inclinometer when zeroing the instrument and taking all measurements. The following three landmarks were palpated and marked with small, circular stickers: the C7-T1 interspace (CT), the T12-L1 interspace (TL), and the sacral midpoint from which the lumbosacral interspace (LS) was identified approximately 3.0 cm superiorly. After landmarking, the inclinometer was placed on a flat vertical surface and the digital reading was set to zero degrees. The inclinometer was first placed at CT, where the angle was read and recorded by a third person and the inclinometer was zeroed. It was then placed at TL, where the angle was again read and recorded by the third person and the inclinometer was zeroed. Finally, it was placed at LS, and the angle was read and recorded by the third person. The entire measurement procedure was repeated three times in a row by each of the two raters, who were blinded to the results.

3.2.2. Flexicurve Ruler. A 61-cm long flexicurve ruler (Arts Supply Store, Hamilton, ON) was used according to the instructional CD distributed by the American Physical Therapy Association Section on Geriatrics [11]. The spinous process of the seventh cervical vertebra (C7) and the LS interspace were palpated and marked with small, circular stickers. The flexicurve ruler was molded along the participant's spine, making sure the shape of the thoracic and lumbar curves was retained and that there were no spaces between the participant's skin and flexicurve ruler. Marks were placed on the flexicurve ruler to correspond with the C7 mark superiorly and the LS interspace mark inferiorly. The flexicurve ruler was carefully removed from the participant's spine and placed onto plain white graph paper. The participant's study identification number, date, and measurement number were recorded at the top of the graph paper. The C7 spinous process and LS interspace marks on the ruler were placed along the same vertical line. The side of the flexicurve ruler that was contacting the participant's skin was traced onto the paper. After tracing the spine curvature on the graph paper, the flexicurve ruler was straightened and the flexicurve ruler procedure was repeated three times in a row by each rater.

The traced curves were landmarked such that a vertical line was drawn to connect the C7 mark (most superior point) and the LS interspace mark (most inferior point), and a perpendicular line was drawn at the TL level. For each trial, the kyphotic index (KI) was calculated according to the following formula:

\[ \mathrm{KI} = \frac{\text{thoracic width} \times 100}{\text{thoracic length}}, \tag{4} \]
where thoracic width is the greatest width from the thoracic curve to the vertical line and thoracic length is the distance from the C7 mark to the junction of the thoracic and lumbar curves.

For each trial, the lordotic index (LI) was calculated according to the following formula:

\[ \mathrm{LI} = \frac{\text{lumbar width} \times 100}{\text{lumbar length}}, \tag{5} \]

where lumbar width is the greatest width from the lumbar curve to the vertical line joining C7 and the LS interspace, and lumbar length is the distance from the junction of the thoracic and lumbar curves to the LS interspace.

3.3. Raters. The raters, an undergraduate student with no prior experience using either method of measurement and a physiotherapist with minimal prior experience using a digital inclinometer and no prior experience with the flexicurve ruler, received brief training. The user's manual for the digital inclinometer [15] was studied, and an instructional CD on how to use a flexicurve ruler to measure spine curvatures [11] was viewed by each rater. Practical experience was gained by completing the measurement protocols during two mock trials prior to the start of the pilot study.

3.4. Statistical Analyses. Descriptive statistics were calculated using SPSS v18 (www.spss.com). G-theory was applied using G String III version 5.4.2 for Windows [16]. First, a G-study was completed to estimate G-coefficients for the overall variation that can be attributed to the sources of variation (called facets, which in this case are the patients, the trials, and the raters) and their interactions, and the proportions of variation attributed to trials and raters. Follow-up D-studies were performed to identify the optimal measurement protocol for obtaining reliable measures of spine curvatures by varying the number of raters and the number of trials. G-coefficients ≥0.80 were considered desirable. For comparison, CTT was also applied. Inter-trial reliability was determined for each rater based on variance components for between- and within-subject factors, and the average of the two values is reported. Inter-rater reliability was determined for each trial based on variance components for between- and within-subject factors, and the average of the three values is reported. Absolute reliability of each spine curvature measure was also determined as the standard error of the measurement (SEM), calculated as the square root of the mean square estimate for the error term determined using G-theory and CTT.

4. Results

The characteristics of the patients are summarized in Table 2. The average spine curvature measures acquired by each of the raters are shown in Table 3. Six of the nine women in our convenience sample exceeded the clinical cutpoint for hyperkyphosis (KI ≥ 13) according to measures acquired by at least one of the raters. All women were living independently in the community.

Table 2: Characteristics of 9 postmenopausal women with osteoporosis of the spine.
Variable | Mean (SD) | Minimum, maximum
Age (years) | 71.6 (8.9) | 63, 76
Height (cm) | 156.1 (8.7) | 147.2, 162
Weight (kg) | 71.2 (24.2) | 59.4, 94
Cervicothoracic angle (degrees)^a | 36.1 (9.99) | 17.5, 49.2
Thoracolumbar angle (degrees)^a | 51.4 (13.72) | 27.2, 72.0
Lumbosacral angle (degrees)^a | 31.9 (9.17) | 15.0, 50.2
Kyphotic Index^b | 13.2 (5.07) | 5.8, 19.5
Lordotic Index^b | 13.9 (3.22) | 9.0, 18.2
^a Calculated as mean of the average values acquired by each of the two raters for each subject using the digital inclinometer.
^b Segment width × 100/segment length; calculated as mean of the average values acquired by each of the two raters for each subject using the flexicurve ruler.

Table 4 compares and contrasts the estimates of variance components that are determined for measures of KI using G-theory and CTT. Both methods partition variance due to patients; however, the error variance in CTT includes other sources of variance depending upon the measurement design. When assessing inter-rater reliability, the error variance component includes variance due to trial. When assessing inter-trial reliability, the error variance component includes variance due to rater.

Table 5 shows that the estimates of inter-trial and inter-rater reliability of the spine curve measures are comparable whether using G-theory or the CTT. The inter-trial reliability was high for all measures and inter-rater reliability was greatest for KI.

Data from the G-study were used to establish a reliable measurement protocol through D-studies. Figures 1(a) and 1(b) illustrate how the inter-trial reliability changes with increasing numbers of trials when varying numbers of raters perform the measures. For a given rater, all measures are reliable. Minimal gains in reliability are achieved when performing more than 1 trial using the digital inclinometer (Figure 1(a)) and when performing more than 3 trials using the flexicurve ruler (Figure 1(b)). Figures 1(c) and 1(d) illustrate how the inter-rater reliability changes with increasing numbers of raters when different numbers of trials are performed. Measures of CT, TL, and LS angle have acceptable reliability when measured by 5, 2, and 3 raters, respectively, and there is minimal improvement in reliability when more than 1 trial is completed by each rater (Figure 1(c)). By comparison, measures of KI and LI acquired in duplicate by two raters have acceptable reliability (Figure 1(d)).
Table 3: Mean (SD) spine curvature values over 3 trials acquired by 2 raters in 9 women with spine osteoporosis.
Patient | CT angle^a, Rater 1 | CT angle^a, Rater 2 | TL angle^a, Rater 1 | TL angle^a, Rater 2 | LS angle^a, Rater 1 | LS angle^a, Rater 2 | KI^b, Rater 1 | KI^b, Rater 2 | LI^b, Rater 1 | LI^b, Rater 2
1 | 51.0 (2.6) | 41.7 (0.6) | 77.3 (1.5) | 66.7 (4.0) | 34.0 (2.6) | 28.3 (4.9) | 16.5 (0.5) | 16.8 (1.6) | 13.9 (0.2) | 11.4 (3.3)
2 | 48.7 (12.7) | 16.8 (1.0) | 36.0 (11.3) | 27.0 (2.0) | 44.3 (4.0) | 39.7 (2.1) | 6.6 (0.6) | 47.0 (1.7) | 19.6 (1.0) | 7.4 (0.7)
3 | 41.0 (0.0) | 22.7 (0.6) | 47.7 (1.2) | 47.3 (0.6) | 20.7 (1.5) | 38.7 (1.2) | 12.3 (1.1) | 12.9 (0.4) | 9.4 (1.2) | 10.2 (0.5)
4 | 18.7 (2.1) | 16.3 (1.2) | 28.7 (3.2) | 25.7 (2.1) | 30.0 (0.0) | 31.7 (1.2) | 5.7 (0.7) | 5.9 (1.1) | 11.7 (0.4) | 12.3 (1.2)
5 | 42.0 (3.5) | 42.3 (1.5) | 51.0 (5.0) | 65.7 (1.5) | 22.0 (3.6) | 36.0 (2.0) | 15.6 (1.7) | 18.4 (1.6) | 17.1 (2.3) | 17.0 (1.4)
6 | 42.7 (3.8) | 55.7 (1.5) | 55.7 (3.5) | 77.0 (2.6) | 31.0 (2.0) | 33.7 (1.5) | 18.6 (0.5) | 20.4 (1.1) | 16.3 (0.7) | 14.4 (0.7)
7 | 28.7 (1.5) | 34.7 (2.1) | 39.0 (1.0) | 44.7 (2.1) | 16.0 (1.7) | 14.0 (1.0) | 8.8 (1.4) | 7.6 (0.8) | 8.0 (2.3) | 9.9 (1.4)
8 | 28.3 (0.6) | 40.0 (1.0) | 45.3 (0.6) | 57.0 (1.7) | 33.0 (1.0) | 29.0 (2.0) | 13.4 (0.8) | 15.0 (0.6) | 13.5 (1.0) | 16.3 (0.7)
9 | 39.0 (3.6) | 47.0 (2.0) | 54.0 (3.6) | 59.3 (3.1) | 38.0 (2.6) | 38.0 (1.0) | 16.2 (0.6) | 19.3 (1.6) | 15.8 (0.9) | 16.5 (1.2)
CT: cervicothoracic; TL: thoracolumbar; LS: lumbosacral; KI: kyphotic index; LI: lordotic index.
^a Measured using digital inclinometer, degrees.
^b Measured using flexicurve ruler.
Table 4: Estimates of variance components^a ($\sigma^2$) for Kyphotic index using G-theory and classical test theory (CTT).
Variance component | G-theory | CTT, Rater 1 | CTT, Rater 2 | CTT, Trial 1 | CTT, Trial 2 | CTT, Trial 3
Patient | 25.263 | 21.227 | 30.303 | 23.593 | 25.733 | 25.233
Rater | 0.488 | — | — | — | — | —
Trial | 0.083 | — | — | — | — | —
Patient ∗ rater | 0.563 | — | — | — | — | —
Patient ∗ trial | 0 | — | — | — | — | —
Rater ∗ trial | 0.098 | — | — | — | — | —
Error | 1.023 | 0.919 | 1.256 | 1.901 | 2.974 | 1.641
^a Estimates having negative values are set to zero.
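As a worked check, the G-theory column of Table 4 can be substituted into the G-coefficient formulas (2) and (3). The sketch below does this for the kyphotic index; the function and dictionary names are illustrative only and do not come from the paper or from G String.

```python
# KI variance components from the G-theory column of Table 4.
KI_COMPONENTS = {
    "p": 25.263,   # patients
    "r": 0.488,    # raters
    "t": 0.083,    # trials
    "pr": 0.563,   # patient x rater
    "pt": 0.0,     # patient x trial (negative estimate set to zero)
    "rt": 0.098,   # rater x trial
    "prt": 1.023,  # three-way interaction (error)
}


def g_coefficient(components, numerator_terms):
    """Ratio of the selected components to the sum of all seven components."""
    total = sum(components.values())
    numerator = sum(components[name] for name in numerator_terms)
    return numerator / total


def g_inter_rater(components):
    # Equation (2): patient, trial, and patient x trial in the numerator.
    return g_coefficient(components, ("p", "t", "pt"))


def g_inter_trial(components):
    # Equation (3): patient, rater, and patient x rater in the numerator.
    return g_coefficient(components, ("p", "r", "pr"))


if __name__ == "__main__":
    print(f"G inter-rater (KI): {g_inter_rater(KI_COMPONENTS):.3f}")
    print(f"G inter-trial (KI): {g_inter_trial(KI_COMPONENTS):.3f}")
```

Rounded to two decimals, the two results are 0.92 and 0.96, which agrees with the KI inter-rater and inter-trial reliabilities quoted in the Discussion.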
Table 5: Reliability of spine curvature measures acquired in triplicate by 2 raters in 9 postmenopausal women with osteoporosis of the spine estimated using generalizability theory (G-theory) and classical test theory (CTT).
Measures of spine curvature | Inter-trial reliability, G-theory | Inter-trial reliability, CTT | Inter-rater reliability, G-theory | Inter-rater reliability, CTT
Cervicothoracic angle: reliability coefficient | 0.960 | 0.960 | 0.566 | 0.601
Cervicothoracic angle: SEM (degrees) | 2.281 | 2.040 | 7.505 | 7.091
Thoracolumbar angle: SEM (degrees) | 3.090 | 2.703 | 7.868 | 7.786
Lumbosacral angle: SEM (degrees) | 2.498 | 2.367 | 6.223 | 6.213
Kyphotic index: SEM | 1.097 | 1.040 | 1.474 | 1.461
Lordotic index: SEM | 1.427 | 1.390 | 1.794 | 1.701
SEM: standard error of the measurement provides an estimate of absolute reliability and is expressed in the same units as the measure.
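The SEM values for the kyphotic index can be traced back to the variance components in Table 4. The paper states only that the SEM is the square root of the error term, so the sketch below rests on two assumptions: for G-theory, the error term is taken to be the sum of the components that appear in the denominator but not the numerator of (2) or (3); for CTT, the SEM is taken to be the average of the per-rater (or per-trial) square roots of the error variance. Under those assumptions the KI row of Table 5 is reproduced to three decimals. All names are hypothetical.

```python
import math

# KI variance components from the G-theory column of Table 4.
G_COMPONENTS = {
    "p": 25.263, "r": 0.488, "t": 0.083,
    "pr": 0.563, "pt": 0.0, "rt": 0.098, "prt": 1.023,
}

# KI error variances from the CTT columns of Table 4.
CTT_ERROR_BY_RATER = [0.919, 1.256]         # one inter-trial analysis per rater
CTT_ERROR_BY_TRIAL = [1.901, 2.974, 1.641]  # one inter-rater analysis per trial


def sem_g(components, error_terms):
    """Square root of the summed variance components treated as error."""
    return math.sqrt(sum(components[name] for name in error_terms))


def sem_ctt(error_variances):
    """Average of the SEMs obtained from each one-way analysis."""
    sems = [math.sqrt(v) for v in error_variances]
    return sum(sems) / len(sems)


# Assumed error terms: everything involving the facet being generalized over.
sem_inter_trial_g = sem_g(G_COMPONENTS, ("t", "pt", "rt", "prt"))
sem_inter_rater_g = sem_g(G_COMPONENTS, ("r", "pr", "rt", "prt"))
sem_inter_trial_ctt = sem_ctt(CTT_ERROR_BY_RATER)
sem_inter_rater_ctt = sem_ctt(CTT_ERROR_BY_TRIAL)

if __name__ == "__main__":
    print(f"Inter-trial SEM, G-theory: {sem_inter_trial_g:.3f}")
    print(f"Inter-trial SEM, CTT:      {sem_inter_trial_ctt:.3f}")
    print(f"Inter-rater SEM, G-theory: {sem_inter_rater_g:.3f}")
    print(f"Inter-rater SEM, CTT:      {sem_inter_rater_ctt:.3f}")
```

The comparison also shows why the CTT and G-theory SEMs are close but not identical: each CTT error variance pools several of the components that G-theory estimates separately.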
[Figure 1, panels (a) to (d). Panels (a) and (b): G-coefficient (0 to 1) versus number of trials (1 to 10), with separate curves for 1, 2, and 3 raters; (a) cervicothoracic, thoracolumbar, and lumbosacral angles; (b) kyphotic index and lordotic index. Panels (c) and (d): G-coefficient (0 to 1) versus number of raters (1 to 5), with separate curves for 1 and 2 trials; (c) cervicothoracic, thoracolumbar, and lumbosacral angles; (d) kyphotic index and lordotic index.]
Figure 1: The results of the design study for optimizing inter-trial reliability are illustrated in which the influence of having different numbers
of raters is shown as a function of the number of trials for (a) spine curvature angles (degrees) measured using the digital inclinometer and
(b) kyphotic index and lordotic index measured using the flexicurve ruler. The results of the design study for optimizing inter-rater reliability
are illustrated in which the influence of performing different numbers of trials is shown as a function of raters for (c) spine curvature angles
(degrees) measured using the digital inclinometer, and (d) kyphotic index and lordotic index measured using the flexicurve ruler.
5. Discussion

This study aimed to illustrate the application of the tools of G-theory to establish a measurement protocol with optimal inter-trial and inter-rater reliability for assessing spine curvatures in postmenopausal women with osteoporosis of the spine. Estimates of inter-trial and inter-rater reliability of spine curvature measures acquired using the digital inclinometer and flexicurve ruler were similar whether using G-theory or CTT approaches. G-theory provides an advantage in utilizing even small datasets to explore the effect of changing aspects of the study design (e.g., number of raters and number of trials) in order to identify the optimal measurement protocol for a particular clinical or research setting.

Reliability of outcome measures needs to be established for each specific clinical environment or research laboratory. In our example, all measures of spine curvature had acceptable reliability (high reliability coefficients and low SEM) when performed by the same rater in triplicate (Table 5). No literature was found describing the reliability of measures of spine curvatures in postmenopausal women with osteoporosis of the spine using the digital inclinometer. However, KI inter-trial reliability (0.96) and inter-rater reliability (0.92) were comparable to or exceeded that reported by investigators using CTT (Lundon et al. [3]: 0.86 ≤ ICC ≤ 0.97; Arnold et al. [4]: 0.86 ≤ ICC ≤ 0.91). These findings suggest that brief training was adequate for acquiring reliable measures of KI. For all the measures, it would be preferable to have the same rater perform the measurements in women with osteoporosis of the spine whether being followed in the clinic or enrolled in a longitudinal research study.

Inter-rater reliability for LI measures was adequate given the G-coefficient of 0.75 in combination with a low SEM (1.72). However, inter-rater reliability of spine curvature measures acquired using the inclinometer was not adequate, with G-coefficients varying from 0.57 to 0.73 and SEM varying from 6.22 to 7.87 degrees. The use of D-studies provided an efficient way to optimize the measurement process. We determined that inter-rater reliability could be improved satisfactorily for the TL angle and LS angle by having 5 raters acquire the measures 4 times. Scenarios for optimizing inter-rater reliability of CT angle fell outside the realm of clinical feasibility. We did not have to conduct different studies to determine whether greater gain in reliability would be achieved by increasing the number of raters or increasing the number of assessments. We were able to acquire this information based on measures obtained in only 9 women representative of our target study population.

A limitation of this study may be the inclusion of assessors with varying levels of clinical experience. Neither assessor had used the flexicurve ruler before; however, the physiotherapist had over 20 years of experience performing physical assessments in general clinical practice. By building the different experience levels into the study design, we could illustrate nonzero sources of variance. However, the mean spine curvature measures acquired by each rater varied considerably, particularly when using the digital inclinometer, and this study was not designed to determine the accuracy of the measures. It would be interesting to determine the results following more extensive training of novice raters, inclusion of an expert rater, and verification of landmarks identified by each rater. Nonetheless, these results provide estimates of reliability that can be generalized to assessors with minimal levels of experience assessing posture and demonstrate that when the same rater measures spine curvatures, the measures are consistent.

6. Conclusions

We intend the results of this study to be used at the discretion of clinicians and investigators who are using measures of spine curvatures obtained using the flexicurve ruler or digital inclinometer in the clinical assessment of individuals with osteoporosis. Furthermore, this approach may be replicated to identify other measurement protocols that optimize reliability. Ultimately a suitable compromise between a feasible measurement protocol and acceptable reliability for each particular clinical or research setting must be identified. G-theory provides an alternative to CTT that enables efficient identification of an optimal measurement protocol based on data collected in a reliability study having a single study design.

Acknowledgments

The authors thank the participants who volunteered for this study and Leslie Beaumont for her assistance in recording the spine curvature measures for each rater. This study was funded in part by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (NJM).

References

[1] J. C. Nunnally, Psychometric Theory, McGraw-Hill, Toronto, Canada, 1978.
[2] S. Messick, “Validity,” in Educational Measurement, R. L. Linn, Ed., Oryx Press, Phoenix, Ariz, USA, 3rd edition, 1993.
[3] K. M. A. Lundon, A. M. W. Y. Li, and S. Bibershtein, “Interrater and intrarater reliability in the measurement of kyphosis in postmenopausal women with osteoporosis,” Spine, vol. 23, no. 18, pp. 1978–1985, 1998.
[4] C. M. Arnold, B. Beatty, E. L. Harrison, and W. Olszynski, “The reliability of five clinical postural alignment measures for women with osteoporosis,” Physiotherapy Canada, vol. 52, pp. 286–294, 2000.
[5] M. R. Hinman, “Interrater reliability of flexicurve postural measures among novice users,” Journal of Back and Musculoskeletal Rehabilitation, vol. 17, no. 1, pp. 33–36, 2003.
[6] L. J. Cronbach, N. Rajaratnam, and G. C. Gleser, “Theory of generalizability: a liberalization of reliability theory,” The British Journal of Statistical Psychology, vol. 16, pp. 137–163, 1963.
[7] R. L. Brennan, Statistics for Social Science and Public Policy: Generalizability Theory, Springer, New York, NY, USA, 2001.
[8] R. J. Shavelson, N. M. Webb, and G. L. Rowley, “Generalizability theory,” American Psychologist, vol. 44, no. 6, pp. 922–932, 1989.
[9] E. Itoi, “Roentgenographic analysis of posture in spinal osteoporotics,” Spine, vol. 16, no. 7, pp. 750–756, 1991.
[10] F. J. Bonner Jr., M. Sinaki, M. Grabois et al., “Health
professional’s guide to rehabilitation of the patient with
osteoporosis,” Osteoporosis International, vol. 14, supplement
2, pp. S1–S22, 2003.
[11] C. Lindsey and N. Bookstein, “Kypholordosis Measurement
Using a Flexible Curve (Instructional CD),” American Physical
Therapy Association Section on Geriatrics, 2007.
[12] A. M. Briggs, J. H. Van Dieën, T. V. Wrigley et al., “Thoracic
kyphosis affects spinal loads and trunk muscle force,” Physical
Therapy, vol. 87, no. 5, pp. 595–607, 2007.
[13] R. K. Chow and J. E. Harrison, “Relationship of kyphosis to
physical fitness and bone mass on post-menopausal women,”
American Journal of Physical Medicine, vol. 66, no. 5, pp. 219–
227, 1987.
[14] D. M. Kado, M. H. Huang, E. Barrett-Connor, and G.
A. Greendale, “Hyperkyphotic posture and poor physical
functional ability in older community-dwelling men and
women: the Rancho Bernardo Study,” Journals of Gerontology
A, vol. 60, no. 5, pp. 633–637, 2005.
[15] H. D. Saunders, “Saunder’s digital inclinometer: user’s guide,”
United States, Empi Therapy Solutions, 2008.
[16] R. Bloch, “G String III [computer program]. Version 5.4.2 for
Windows. [Hamilton, ON:] Accompanied by: 1 user manual
(pdf),” 2010, http://www.fhs.mcmaster.ca/perd/download/
g string