
American Journal of Epidemiology, Vol. 184, No. 10
DOI: 10.1093/aje/kww098
Advance Access publication: October 21, 2016
© The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected]

Practice of Epidemiology

Comparison of Standardization Methods for the Harmonization of Phenotype Data: An Application to Cognitive Measures

Lauren E. Griffith*, Edwin van den Heuvel, Parminder Raina, Isabel Fortier, Nazmul Sohel, Scott M. Hofer, Hélène Payette, Christina Wolfson, Sylvie Belleville, Meghan Kenny, and Dany Doiron

Downloaded from https://academic.oup.com/aje/article/184/10/770/2332844 by guest on 21 May 2024

* Correspondence to Dr. Lauren E. Griffith, Department of Clinical Epidemiology and Biostatistics, 1280 Main Street West,
MIP Suite 309A, Hamilton, ON, Canada L8S 4K1 (e-mail: griffi[email protected]).

Initially submitted June 3, 2015; accepted for publication September 7, 2016.

Standardization procedures are commonly used to combine phenotype data that were measured using different instruments, but there is little information on how the choice of standardization method influences pooled estimates and heterogeneity. Heterogeneity is of key importance in meta-analyses of observational studies because it affects the statistical models used and the decision of whether or not it is appropriate to calculate a pooled estimate of effect. Using 2-stage individual participant data analyses, we compared 2 common methods of standardization, T-scores and category-centered scores, to create combinable memory scores using cross-sectional data from 3 Canadian population-based studies (the Canadian Study on Health and Aging (1991–1992), the Canadian Community Health Survey on Healthy Aging (2008–2009), and the Quebec Longitudinal Study on Nutrition and Aging (2004–2005)). A simulation was then conducted to assess the influence of varying the following items across population-based studies: 1) effect size, 2) distribution of confounders, and 3) the relationship between confounders and the outcome. We found that pooled estimates based on the unadjusted category-centered scores tended to be larger than those based on the T-scores, although the differences were negligible when adjusted scores were used, and that most individual participant data meta-analyses identified significant heterogeneity. The results of the simulation suggested that in terms of heterogeneity, the method of standardization played a smaller role than did different effect sizes across populations and differential confounding of the outcome measure across studies. Although there was general consistency between the 2 types of standardization methods, the simulations identified a number of sources of heterogeneity, some of which are not the usual sources considered by researchers.

cognition; harmonization; individual participant data; meta-analysis; standardization

Abbreviations: CCHS, Canadian Community Health Survey; CSHA, Canadian Study on Health and Aging; IPD, individual
participant data; NuAge, Quebec Longitudinal Study on Nutrition and Aging.

To explore many important scientific questions (e.g., understanding the influence of lifestyle, psychological, social, nutritional, or genetic factors on disease or phenotypic outcomes), researchers need to link both genotype and phenotype data. Currently, investigators for large national and international cohorts, such as those from the Canadian Longitudinal Study on Aging (CLSA) (1), UK Biobank (2), and LifeLines (3) from the Netherlands, are collecting a wide range of information, including biological, social, psychological, lifestyle, and health status data, on hundreds of thousands of participants. Although many of these individual cohorts and data sources are large, multiple data sets are sometimes required when studying rare outcomes or gene-environment interactions or when exploring the influence of geographical and cultural variations in exposure-outcome relationships. To maximize the utility of publicly funded projects and increase the speed of scientific discovery, there has been a worldwide push to combine multiple

770 Am J Epidemiol. 2016;184(10):770–778


Standardization Approaches to Harmonization 771

data sources in order to explore important research questions (4).

The current gold standard for analyzing multiple data sources as part of a systematic review is individual participant data (IPD) meta-analysis, because it provides flexibility with regard to the types of analyses that can be done and thus provides reliable results (5). It also increases the power to explore differential treatment effects in randomized controlled trials and allows for adjustments of confounding factors in meta-analyses of observational studies; however, it is time consuming and costly to conduct (6). Combining IPD is also scientifically and technically very challenging. Ensuring data compatibility and content equivalence through harmonization allows integration of information from different studies/databases and can thereby permit pooling of data from a large number of studies to obtain valid results. It also allows one to properly explore the similarities and discrepancies across studies, jurisdictions, or countries and improve the validity and reliability of research results.

Deriving combinable phenotypic variables using algorithmic methods, for example, creating common categories of weight from continuous data, is fairly straightforward. There is less research, however, on how best to harmonize complex constructs such as cognition measures. We conducted an environmental scan of meta-analyses in the area of cognition to explore the current practices of harmonization and found that in most aggregate data meta-analyses, researchers used standardization to combine cognitive measures across studies (7). Standardization methods are often utilized because they are easily implemented and do not require complex modeling, such as latent variable analysis. Although many studies used these methods, we could not find general guidelines for the selection of which specific standardization method to use or find information on their performances in IPD meta-analyses. We therefore undertook a case study and simulation to explore the influence of commonly used standardization methods on harmonization of cognition measures in IPD analyses. We chose to focus our analysis on the relationship between physical activity and memory because of known associations (8, 9), and we used data from 3 large Canadian studies to examine how standardization methods influence the overall estimates of effect and measures of heterogeneity in a 2-stage IPD meta-analysis. We further explored the robustness of our results using a simulation study. In the present study, we provide evidence of the influence of using easily implemented procedures for harmonization of complex constructs.

METHODS

We included individual-level data from the following 3 Canadian studies: the Canadian Study on Health and Aging (CSHA) (10), the Canadian Community Health Survey on Healthy Aging (CCHS) (11), and the Quebec Longitudinal Study on Nutrition and Aging (NuAge) (12) (Web Table 1, available at http://aje.oxfordjournals.org/). Each study provided population-based data on adults who were 65 years of age or older, including results from neuropsychological tests and physical activity level.

The Rey Auditory Verbal Learning Test (13), a 15-item word-learning test, was used to measure short-term memory in the CSHA and the CCHS. The test is one of the most widely used neuropsychological tests (14) and generally has good test-retest reliability (0.51 ≤ r ≤ 0.86) (15). The Buschke Cued Recall Procedure tests memory under conditions of free recall (hereafter referred to as the Free Buschke test) and cued recall (hereafter referred to as the Total Buschke test). The CSHA used English and French versions of the 12-item Buschke memory test (16), and NuAge used a French version of the 16-item Free and Cued Selective Reminding Test adapted from Grober and Buschke (17). Free recall and cued recall have acceptable sensitivity (62%–100%) and specificity (94%–100%) when comparing individuals with Alzheimer disease to healthy controls (18). We also used the Health Utility Index as an indirect measure of memory. The Health Utility Index has been used in many settings and has been shown to have strong validity and reliability (19, 20). Furthermore, the cognition subscale of the Health Utility Index has been shown to be correlated with the Rey Auditory Verbal Learning Test and other neuropsychological tests (21).

Potential confounding variables were selected from each of the 3 data sets based on the demonstrated relationship with cognition and physical activity in the literature (22) and were endorsed by a technical expert panel (23). These variables included sociodemographic and lifestyle factors (age, sex, educational level, income, country of birth, smoking status, and alcohol consumption) and anthropometric and health conditions (height, weight, body mass index, hip circumference, heart rate, diastolic and systolic blood pressures, and self-reported diagnosis of high blood pressure, stroke, diabetes, or myocardial infarction and family history of high blood pressure, stroke, diabetes, or myocardial infarction).

An algorithmic approach using DataSchema and Harmonization Platform for Epidemiological Research was used to harmonize physical activity and potential confounding variables (24, 25). In short, a priori rules were used to determine whether the information collected in a given study could be used to generate a variable that would be common among all data sets. Selection and definition of variables, rule creation, and decisions about whether or not a variable could be harmonized were based on protocols involving iteration between domain experts and a validation panel. The compatibility of each study’s data was assessed on a 3-level scale of matching quality: complete, partial, or impossible match. Variables that were a complete or partial match in all 3 data sets were included. There were complete or partial matches across all studies for 14 targeted variables: physical activity, age, sex, income, educational level, country of birth, height, weight, body mass index, alcohol consumption, diabetes, high blood pressure, and myocardial infarction. The remaining variables could not be included because they were not recorded across all studies or because they did not represent the same information.

Statistical analyses

We studied 2 commonly used standardization methods, T-scores and category-centered scores, to create combinable


memory scores across studies in order to examine whether or not these approaches provided similar results in terms of overall effect estimates and measures of heterogeneity in a 2-stage IPD meta-analysis. T-scores are dependent on the full underlying distribution of cognitive measures in each study and have been used to create norms and compare different cognitive measures on a common scale (26). Category-centered scores use the mean and standard deviation for a common demographically determined group (within studies) that is presumed to be homogeneous with respect to the cognitive measures to standardize or “center” the individual cognitive measures. More details about the standardization methods are provided in Web Appendix 1. We applied the scores to our case study and also separately undertook a simulation study to examine the robustness of our case study findings.

Case study. T-scores were standardized with respect to selected covariates (age, sex, and educational level) using linear regression analysis. Category-centered scores were standardized with respect to a homogeneous subgroup with a sufficient sample size across all studies. We standardized to the subgroup of female participants with a high educational level and an age range of 70–74 years (Web Appendix 1).

We conducted a 2-stage IPD meta-analysis in which summary estimates were first created for each study and then combined across studies using traditional aggregate data meta-analysis methods (27). We restricted the meta-analyses to memory constructs that are presumed to be most similar. The Rey Auditory Verbal Learning Test and Free Buschke test both measure noncued recall, whereas the Total Buschke test is a cued test and more similar to the Health Utility Index because they are both susceptible to ceiling effects (28). Separate meta-analyses were conducted for each combination of compatible memory scores. Thus, we used only the following combinations of scores from the CCHS, CSHA, and NuAge studies, respectively: Rey, Rey, and Free Buschke; Rey, Free Buschke, and Free Buschke; and Health Utility Index, Total Buschke, and Total Buschke.

For each of the 3 combinations, we calculated effect sizes that were unadjusted, unadjusted and calculated using participants with complete data for all potential confounders, and adjusted. We used Hedges’ g on the weighted mean differences of the T-scores and category-centered scores between participants reporting no or low physical activity and those reporting moderate or high levels of physical activity (29). We applied the random effects model (30, 31) and assessed heterogeneity with the Q statistic (32) and the I² statistic (33). An I² greater than 50% was considered to indicate substantial heterogeneity (34). Meta-analyses were conducted using MetaAnalyst 3.0 (Tufts Evidence Based Practice Centers, Medford, Massachusetts) (35).

Simulation study. The covariates age, sex, and educational level were generated independently of each other, and separately for each of 3 cohort studies, by using the normal and Bernoulli distributions. In this way, we could generate populations that were homogeneous or heterogeneous with respect to these 3 potential confounders. A 3-level ordinal variable for physical activity level was generated through the continuous logistic distribution in which the mean level depended on the 3 covariates age, sex, and educational level. The relationships between the confounders and physical activity level were selected consistently across cohort studies (homogeneous associations). Memory scores were generated from latent variables, generated per cohort study, that represented the true memory ability of individuals. Conditionally on the latent construct, we applied a binomial distribution to simulate a sum score on memory. The latent variable was affected by age, sex, educational level, and physical activity level. We simulated homogeneous or heterogeneous associations between the latent variable memory and the 3 potential confounders (confounder association with memory = homogeneous or heterogeneous), and we simulated homogeneous or heterogeneous associations between physical activity and memory across cohort studies (effect size of physical activity = homogeneous or heterogeneous). Details of the simulation study are provided in Web Appendix 1 and Web Tables 2–4. We then applied an analysis approach similar to that of the case study. For each of the 8 possible scenarios, we generated an average effect size, the power, and the average I². The simulation analyses were conducted using SAS, version 9.2 (SAS Institute, Inc., Cary, North Carolina) (36).

RESULTS

The average ages of the CCHS participants (73.2 years) and NuAge participants (73.7 years) were lower than that of CSHA participants (79.7 years) (Table 1). The CSHA participants tended to have lower levels of education and income (adjusted to 1992) than did the CCHS and NuAge participants. In addition, fewer participants in CSHA reported being born in Canada. More CSHA participants reported a low level of physical activity compared with CCHS and NuAge participants. The CCHS participants reported high blood pressure and diabetes more often than did CSHA or NuAge participants.

Combined data set analysis: 2-stage IPD meta-analysis

Table 2 and Web Table 5 present the meta-analysis results for the combinations of cognitive measures. The overall estimated effect sizes were small, ranging from 0.07 to 0.18. None of the Health Utility Index/Total Buschke/Total Buschke summary estimates of association were statistically significant. The cognitive measure combination most likely to result in a statistically significant overall estimate was the Rey/Rey/Free Buschke (4 of 6 comparisons). In most analyses, significant heterogeneity was found, and none of the analyses had an I² value less than 50%. Six of the 18 analyses had a P value for heterogeneity greater than 0.05, and 1 had a P value greater than 0.10. The analyses with the least heterogeneity were also associated with the Rey/Rey/Free Buschke combinations (4 of 6 analyses with P > 0.05). Of the 6 analyses that did not indicate statistically significant heterogeneity at the P < 0.05 level, 5 included T-scores, as did the 1 analysis that indicated a lack of heterogeneity at the P < 0.10 level. In general, the results for the adjusted T-score and category-centered score analyses were more similar than were the unadjusted analyses.
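The two standardization methods described under Statistical analyses can be illustrated with a minimal Python sketch. It is not the authors' implementation (their details are in Web Appendix 1, which is not reproduced here). It assumes the conventional T-score scaling (mean 50, SD 10, computed over the full study sample) and assumes that category-centered scores are expressed in SD units of the reference subgroup; the regression adjustment for age, sex, and educational level used in the case study is omitted. The function names, raw scores, and subgroup flags are hypothetical.

```python
from statistics import mean, stdev

def t_scores(scores):
    """Scale raw scores to mean 50 and SD 10 using the whole study sample."""
    m, s = mean(scores), stdev(scores)
    return [50 + 10 * (x - m) / s for x in scores]

def category_centered_scores(scores, in_reference):
    """Center and scale raw scores by the mean and SD of a reference subgroup."""
    ref = [x for x, keep in zip(scores, in_reference) if keep]
    m, s = mean(ref), stdev(ref)
    return [(x - m) / s for x in scores]

# Hypothetical raw memory scores from one study (illustrative values only).
raw = [4, 6, 7, 9, 10, 12]
# Hypothetical flags for the reference subgroup, e.g., women aged 70-74 years
# with a high educational level.
is_ref = [False, True, False, True, True, False]

t = t_scores(raw)
c = category_centered_scores(raw, is_ref)
```

Because the T-score uses every participant in the study while the category-centered score uses only the reference subgroup, the two scales can differ whenever the reference subgroup is not typical of the whole sample.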


Table 1. Baseline Demographic and Health-Related Characteristics of Participants With Cognition Dataᵃ, Canadian Community Health Survey-Canadian Longitudinal Study on Aging (2008–2009), Canadian Study of Health and Aging (1991–1992), and Quebec Longitudinal Study on Nutrition and Aging (2004–2005)

Cells show No. (%) unless the row label says otherwise; a dash marks an empty cell.

Characteristic                  CCHS-CLSA (n = 7,107)     CSHA (n = 1,730)        NuAge (n = 432)

Age, years, mean (SD)           73.2 (5.9)                79.7 (7.0)              73.7 (4.0)
Age group, years
  65–74                         4,162 (58.6)              367 (21.2)              265 (61.3)
  75–85                         2,945 (41.4)              976 (56.4)              167 (38.7)
  >85                           -                         387 (22.4)              -
Female sex                      4,103 (57.7)              1,084 (62.7)            232 (53.7)
Highest level of education
  Low (0–8 years)               1,342 (19.0)              841 (48.9)              66 (15.3)
  Medium (9–12 years)           2,664 (37.7)              619 (35.8)              171 (39.6)
  High (≥13 years)              3,055 (43.3)              270 (15.6)              195 (45.1)
  Missing                       46                        -                       -
Household income, $
  <10,000                       75 (1.1)                  39 (6.9)                3 (0.7)
  10,000–14,999                 412 (5.8)                 108 (19.1)              20 (4.6)
  15,000–19,999                 727 (10.2)                35 (6.2)                19 (4.4)
  20,000–29,999                 1,287 (18.1)              66 (11.6)               66 (15.2)
  30,000–39,999                 975 (13.7)                31 (5.5)                90 (20.8)
  40,000–49,999                 694 (9.8)                 21 (3.7)                57 (13.2)
  50,000–59,999                 591 (8.3)                 15 (2.7)                50 (11.6)
  60,000–69,999                 379 (5.3)                 9 (1.6)                 19 (4.4)
  ≥70,000                       1,433 (20.2)              8 (1.4)                 54 (12.5)
  Preferred not to answer       981 (13.8)                81 (14.3)               -
  Do not know                   -                         34 (23.6)               -
  Missing                       -                         20 (3.5)                54
  Not asked                     -                         1,163                   -
Canadian                        5,781 (81.4)              1,166 (67.4)            387 (89.6)
Ever/current alcohol useᵇ       6,550 (92.2)              344 (22.8)              411 (95.1)
Level of physical activity
  None                          1,161 (16.3)              788 (45.5)              46 (10.6)
  Low                           777 (10.9)                203 (11.7)              45 (10.4)
  Moderate to high              5,169 (72.7)              493 (28.5)              341 (78.9)
  Missing                       -                         246                     -
Heightᵇ, No.; mean (SD)
  Male                          3,000; 174.7 (7.2)        607; 170.5 (7.7)        200; 168.5 (7.4)
  Female                        4,075; 160.4 (6.4)        993; 157.3 (7.4)        232; 155.4 (5.7)
  All                           7,075; 166.5 (9.7)        1,600; 162.3 (9.9)      432; 161.5 (9.2)
Weightᵇ, No.; mean (SD)
  Male                          2,990; 82.8 (14.3)        622; 72.6 (12.7)        200; 80.0 (12.9)
  Female                        4,011; 68.6 (13.8)        1,032; 60.3 (12.5)      232; 66.4 (12.8)
  All                           7,001; 74.6 (15.7)        1,554; 64.9 (13.9)      400; 72.7 (14.5)

Table continues


Table 1. Continued

Cells show No. (%).

Characteristic                  CCHS-CLSA (n = 7,107)     CSHA (n = 1,730)        NuAge (n = 432)

Chronic conditionsᵇ
  High blood pressure           3,993 (56.2)              614 (35.5)              206ᶜ (47.7)
  Stroke                        283 (4.0)                 226 (13.5)              0ᶜ
  Diabetes                      1,258 (17.7)              228 (13.2)              40ᶜ (9.3)
  Myocardial infarction         876 (12.4)                263 (15.2)              57ᶜ (13.4)

Abbreviation: SD, standard deviation.
ᵃ Shown are potential confounding variables for each of the 3 data sets. The compatibility of each study’s data was assessed on a 3-level scale of matching quality: complete, partial, or impossible match. Variables that were a “complete” or “partial” match in all 3 data sets were included.
ᵇ Total number may not add up to total sample because of missing values. Percentages were calculated excluding missing data.
ᶜ Partial match.

Simulation study

Adjustment for age, sex, and educational level increased the effect sizes on average by approximately 4%–13%, despite the fact that the category-centered scores and T-scores are essentially corrected for these 3 variables (Table 3). The increase was greater for the category-centered score than for the T-score; however, the 2 scores gave identical results when the effect sizes were adjusted for the covariates age, sex, and educational level.

The effect sizes for all adjusted and unadjusted scores were affected by the different simulation settings (i.e., homogeneous or heterogeneous: 1) effect size of the association between physical activity and memory, 2) population distribution of confounders, and 3) relationship between confounders and memory). Homogeneous and heterogeneous associations of physical activity and memory had the largest influence in terms of change in the overall effect size. The reason is that the pooled estimates were not identical for these 2 settings. However, the association between physical activity and memory for the different settings of age, sex, and educational level should have been identical, because that was consistent across all settings. Whether we pooled homogeneous or heterogeneous populations with respect to the distribution of confounders (age, sex, and educational level) had the least influence on the pooled estimates. Different relationships between the confounders and memory across studies also influenced the pooled results because the heterogeneous setting was 6%–17% larger than the homogeneous setting.

The I² clearly detects when the association between physical activity and memory is different across studies. However, a large I² is also observed when the association between physical activity and memory is consistent across studies. This occurs when the influences of the confounders on memory are different across studies. Populations that are heterogeneous with respect to the distribution of confounders have only a limited influence on the I² compared with homogeneous populations. Power was also most influenced when the influence of confounders on memory was heterogeneous across populations.

DISCUSSION

In the present study, we explored whether 2 standardization methods, T-scores and category-centered scores, can influence estimates of effect and heterogeneity when outcomes are measured using different scales or instruments. Researchers conducting meta-analyses often use measures of heterogeneity, which may be defined as the proportion of total variation in measured pooled risk estimates that is due to between-study heterogeneity rather than to chance, as an indication that findings across studies are consistent and thus can be pooled. In the case study, there is a suggestion that important heterogeneity may be masked by one’s choice of standardization procedure. When using a criterion of I² > 50%, all analyses indicated there was important heterogeneity. When the criterion of P < 0.05 for the Q statistic was used, however, 6 of the 18 analyses indicated there was not statistically significant heterogeneity; 5 of the 6 analyses involved the T-score. Because the T-scores are standardized to the same mean across studies, it was expected that the T-scores would reduce between-study heterogeneity when compared with the category-centered score, especially in the unadjusted analyses. In fact, in the adjusted analysis, the same results were found regardless of the method of standardization.

In the case study, we also found that the effect estimates of physical activity on memory based on the unadjusted T-score and category-centered score were similar, but the magnitudes of those using the category-centered scores tended to be larger. In the adjusted analysis, these effect estimates based on the category-centered scores and T-scores were nearly identical, and they were closer to the unadjusted T-scores than were the unadjusted category-centered scores. This is supported by the simulation analysis and implies that the method of standardization may be less important if standardized measures are adjusted for a common set of important confounders. If only unadjusted analyses are available, the T-scores may be preferable in terms of bias, because they are already adjusted for important confounders. It was interesting, however, that there was still residual confounding, because the effect estimates based on


Table 2. Summary Hedges’ g Values for the Weighted Mean Difference of Combinations of Memory Tests in People Who Reported No or Low Physical Activity Compared With People Who Reported Moderate or High Levels of Physical Activityᵃ, Canadian Community Health Survey-Canadian Longitudinal Study on Aging (2008–2009), Canadian Study of Health and Aging (1991–1992), and Quebec Longitudinal Study on Nutrition and Aging (2004–2005)

Study/Memory Test Given and Type of Outcome   Hedges’ g   95% CI         I²     Q Statistic for Heterogeneity   P for Heterogeneity

Unadjusted
  CCHS/RAVLT; CSHA/RAVLT; and NuAge/Free Buschke
    T-score                                   0.12        0.01, 0.23     0.64   5.5                             0.06
    Category-centered score                   0.16        0.01, 0.30     0.78   8.96                            0.01
  CCHS/RAVLT; CSHA/Free Buschke; and NuAge/Free Buschke
    T-score                                   0.14        −0.03, 0.31    0.85   12.92                           0.002
    Category-centered score                   0.18        −0.01, 0.36    0.88   16.46                           <0.001
  CCHS/HUI; CSHA/Total Buschke; and NuAge/Total Buschke
    T-score                                   0.07        −0.04, 0.19    0.65   5.72                            0.06
    Category-centered score                   0.10        −0.05, 0.24    0.80   9.76                            0.008

Unadjusted Using Participants With Complete Data for All Potential Confounders
  CCHS/RAVLT; CSHA/RAVLT; and NuAge/Free Buschke
    T-score                                   0.12        0.01, 0.22     0.55   4.47                            0.11
    Category-centered score                   0.16        0.01, 0.30     0.75   7.84                            0.02
  CCHS/RAVLT; CSHA/Free Buschke; and NuAge/Free Buschke
    T-score                                   0.14        −0.03, 0.32    0.82   11.35                           0.003
    Category-centered score                   0.18        0.0001, 0.36   0.85   13.37                           0.001
  CCHS/HUI; CSHA/Total Buschke; and NuAge/Total Buschke
    T-score                                   0.07        −0.05, 0.19    0.64   5.51                            0.06
    Category-centered score                   0.10        −0.05, 0.24    0.77   8.65                            0.01

Adjusted Effect Estimates
  CCHS/RAVLT; CSHA/RAVLT; and NuAge/Free Buschke
    T-score                                   0.11        −0.02, 0.23    0.66   5.81                            0.06
    Category-centered score                   0.11        −0.02, 0.23    0.66   5.81                            0.06
  CCHS/RAVLT; CSHA/Free Buschke; and NuAge/Free Buschke
    T-score                                   0.14        −0.05, 0.32    0.85   13.38                           0.001
    Category-centered score                   0.13        −0.05, 0.31    0.85   12.95                           0.002
  CCHS/HUI; CSHA/Total Buschke; and NuAge/Total Buschke
    T-score                                   0.08        −0.05, 0.21    0.69   6.39                            0.04
    Category-centered score                   0.08        −0.04, 0.20    0.67   6.10                            0.047

Abbreviations: CCHS, Canadian Community Health Survey; CI, confidence interval; CSHA, Canadian Study of Health and Aging; Free Buschke, Buschke Cued Recall Procedure under conditions of free recall; HUI, Health Utility Index; NuAge, Quebec Longitudinal Study on Nutrition and Aging; RAVLT, Rey Auditory Verbal Learning Test; Total Buschke, Buschke Cued Recall Procedure under conditions of cued recall.
ᵃ Shown are results from separate meta-analyses for combinations of compatible memory tests for each study. CCHS included the RAVLT and HUI; CSHA included the RAVLT, Free Buschke, and Total Buschke; and NuAge included the Free Buschke and Total Buschke.
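The second stage of a 2-stage IPD meta-analysis, and the relationship among the heterogeneity columns of Table 2, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MetaAnalyst procedure: it assumes the DerSimonian-Laird estimator for the random effects model, the usual small-sample correction for Hedges' g, and I² = (Q − df)/Q. With 3 studies, Q has 2 degrees of freedom, for which the chi-square upper-tail probability is exp(−Q/2). All function names are hypothetical.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with small-sample correction, and its variance."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # approximate correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def i_squared(q, df):
    """Proportion of total variation attributed to between-study heterogeneity."""
    return max(0.0, (q - df) / q) if q > 0 else 0.0

def p_heterogeneity_three_studies(q):
    """Chi-square upper-tail probability for Q with 2 degrees of freedom."""
    return math.exp(-q / 2)

def random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns pooled effect, its SE, Q, and I^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance estimate
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, q, i_squared(q, df)

# Check against the first row of Table 2 (Q = 5.5 from 3 studies).
first_row_i2 = round(i_squared(5.5, 2), 2)
first_row_p = round(p_heterogeneity_three_studies(5.5), 2)
```

Under these assumptions, the first row of Table 2 (Q = 5.5) gives an I² of 0.64 and a P value of 0.06, matching the tabulated values, which illustrates how an analysis can exceed the I² > 50% criterion without reaching P < 0.05 when only 3 studies are pooled.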

the T-scores in the simulation still increased by approximately 4%.

In the simulation study, we compared the 2 standardization methods across a number of scenarios to examine the types of heterogeneity that researchers generally explore in a meta-analysis. We found the method of standardization and the population characteristics had only a small influence on heterogeneity. As one would expect, heterogeneity was evident when we varied the effect size of physical activity on memory across populations. Interestingly, substantial heterogeneity was also evident when the relationship between the confounding variables and the outcome differed across the studies, even when the population distributions of the confounders and the effect sizes of physical activity on memory were consistent across cohorts and regardless of whether or not the effect estimates were adjusted. This implies that in terms of sources of heterogeneity, the method of standardization plays a much smaller


Table 3. Summary Hedges’ g for the Weighted Mean Difference for Simulated Memory Tests in People Who Reported No or Low Physical Activity Compared With People Who Reported Moderate or High Levels of Physical Activityᵃ, Canadian Community Health Survey-Canadian Longitudinal Study on Aging (2008–2009), Canadian Study of Health and Aging (1991–1992), and Quebec Longitudinal Study on Nutrition and Aging (2004–2005)

Effect of Physical Activity   Population      Confounder Effect on Memory   Type of Outcome   Effect Size   Power   Average I²

Unadjusted
Homogeneous                   Homogeneous     Homogeneous                   T-score           0.57          100     16.6
Homogeneous                   Homogeneous     Homogeneous                   C-score           0.53          100     17.1
Homogeneous                   Homogeneous     Heterogeneous                 T-score           0.62          100     87.6
Homogeneous                   Homogeneous     Heterogeneous                 C-score           0.57          100     82.7
Homogeneous                   Heterogeneous   Homogeneous                   T-score           0.58          100     27.7
Homogeneous                   Heterogeneous   Homogeneous                   C-score           0.54          100     30.0
Homogeneous                   Heterogeneous   Heterogeneous                 T-score           0.62          100     91.9
Homogeneous                   Heterogeneous   Heterogeneous                 C-score           0.57          100     89.8
Heterogeneous                 Homogeneous     Homogeneous                   T-score           0.39          73.2    95.4
Heterogeneous                 Homogeneous     Homogeneous                   C-score           0.36          56.2    95.6
Heterogeneous                 Homogeneous     Heterogeneous                 T-score           0.46          19.9    98.2
Heterogeneous                 Homogeneous     Heterogeneous                 C-score           0.41          13.5    98.0
Heterogeneous                 Heterogeneous   Homogeneous                   T-score           0.40          62.8    96.0
Heterogeneous                 Heterogeneous   Homogeneous                   C-score           0.37          46.9    96.0
Heterogeneous                 Heterogeneous   Heterogeneous                 T-score           0.47          10.2    98.4
Heterogeneous                 Heterogeneous   Heterogeneous                 C-score           0.42          7.0     98.3

Adjusted
Homogeneous                   Homogeneous     Homogeneous                   T-score           0.59          100     18.4
Homogeneous                   Homogeneous     Homogeneous                   C-score           0.59          100     18.4
Homogeneous                   Homogeneous     Heterogeneous                 T-score           0.65          100     88.5
Homogeneous                   Homogeneous     Heterogeneous                 C-score           0.65          100     88.5
Homogeneous                   Heterogeneous   Homogeneous                   T-score           0.60          100     29.9
Homogeneous                   Heterogeneous   Homogeneous                   C-score           0.60          100     29.9
Homogeneous                   Heterogeneous   Heterogeneous                 T-score           0.64          100     92.5
Homogeneous                   Heterogeneous   Heterogeneous                 C-score           0.64          100     92.5
Heterogeneous                 Homogeneous     Homogeneous                   T-score           0.41          72.7    95.8
Heterogeneous                 Homogeneous     Homogeneous                   C-score           0.41          72.7    95.8
Heterogeneous                 Homogeneous     Heterogeneous                 T-score           0.48          19.4    99.4
Heterogeneous                 Homogeneous     Heterogeneous                 C-score           0.48          19.4    99.4
Heterogeneous                 Heterogeneous   Homogeneous                   T-score           0.41          61.5    96.4
Heterogeneous                 Heterogeneous   Homogeneous                   C-score           0.41          61.5    96.4
Heterogeneous                 Heterogeneous   Heterogeneous                 T-score           0.48          9.2     98.6
Heterogeneous                 Heterogeneous   Heterogeneous                 C-score           0.48          9.2     98.6

Abbreviation: C-score, category-centered score.
ᵃ Shown are results from meta-analyses for 3 scenarios in which the following were either homogeneous or heterogeneous across population-based studies: 1) effect size, 2) distribution of confounders, and 3) relationship between confounders and the outcome.
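The data-generating steps behind Table 3 (covariates from normal and Bernoulli distributions, an ordinal physical activity level from a logistic latent variable, a latent memory ability, and a binomial sum score) can be sketched in Python. The authors ran their simulation in SAS with settings given in Web Appendix 1 and Web Tables 2–4, which are not reproduced here, so every coefficient, threshold, sample size, and the logistic link between ability and item success below is a hypothetical choice made only for illustration. For brevity, population heterogeneity is represented by mean age alone and confounder heterogeneity by the age coefficient alone.

```python
import math
import random
from itertools import product

N_ITEMS = 15  # hypothetical length of the simulated memory test

def simulate_cohort(n, pa_effect, age_effect, mean_age, rng):
    """Return one simulated cohort as a list of dicts (illustrative coefficients)."""
    people = []
    for _ in range(n):
        age = rng.gauss(mean_age, 6.0)          # normal covariate
        female = int(rng.random() < 0.55)       # Bernoulli covariate
        high_edu = int(rng.random() < 0.40)     # Bernoulli covariate
        # Ordinal physical activity: logistic latent variable cut at 2 thresholds.
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        pa_latent = (-0.05 * (age - 75) + 0.3 * high_edu - 0.1 * female
                     + math.log(u / (1 - u)))
        activity = 0 if pa_latent < -1.0 else 1 if pa_latent < 0.5 else 2
        # Latent memory ability depends on the covariates and physical activity.
        ability = (pa_effect * activity + age_effect * (age - 75)
                   + 0.4 * high_edu + 0.1 * female + rng.gauss(0.0, 1.0))
        p_item = 1.0 / (1.0 + math.exp(-ability))  # assumed logistic link
        score = sum(rng.random() < p_item for _ in range(N_ITEMS))  # binomial sum
        people.append({"age": age, "female": female, "high_edu": high_edu,
                       "activity": activity, "score": score})
    return people

def scenario_settings(pa_hetero, pop_hetero, conf_hetero):
    """Per-cohort parameters for one of the 2 x 2 x 2 scenarios."""
    pa_effects = [0.2, 0.5, 0.8] if pa_hetero else [0.5] * 3
    mean_ages = [70.0, 75.0, 80.0] if pop_hetero else [75.0] * 3
    age_effects = [-0.02, -0.05, -0.08] if conf_hetero else [-0.05] * 3
    return list(zip(pa_effects, age_effects, mean_ages))

rng = random.Random(2016)  # fixed seed so the sketch is reproducible
scenarios = {
    flags: [simulate_cohort(200, pa, ae, ma, rng)
            for pa, ae, ma in scenario_settings(*flags)]
    for flags in product([False, True], repeat=3)
}
```

Each of the 8 entries in `scenarios` holds 3 simulated cohorts, which could then be standardized and pooled in the same way as the case study data to obtain an effect size, power, and I² per scenario.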

role than does differential confounding of the outcome measure across studies and that a significant I² can be obtained even when the “standard” sources of heterogeneity are absent across studies.

This also has implications for conducting aggregate data meta-analyses. To fully explore the contribution of these factors to heterogeneity, one requires exploration of study-specific data and IPD meta-analysis. In our analyses, we conducted a 2-stage meta-analysis. Although the results are often similar, there are occasions when 1-stage and 2-stage meta-analyses can provide different parameter estimates and different conclusions (37); however, it is not clear whether the use of a 1-stage rather than a 2-stage model affects measures of heterogeneity. We expect that a 1-stage IPD analysis would be able to better address the heterogeneity in each of the effect sizes. Indeed, using random coefficient models


makes it possible to study heterogeneity for each effect size. Furthermore, exploring whether or not the outcome being measured is unidimensional and consistent across studies would also require more complex modeling. For example, latent variable modeling allows for simultaneous use of information on all measures of a construct, testing of the goodness of fit of the proposed model, and testing of whether or not there is consistency of the measures across data sets. When using the other methods of standardization, researchers implicitly assume that all instruments are measuring the same construct, and this assumption is generally not verified.

Data from observational studies are presented in this article; methods to retrospectively harmonize outcome, exposure, and covariate data were used. If one were applying harmonization methods to a meta-analysis of randomized controlled trials, using unadjusted measures of effect would generally be more appropriate, and thus the inclusion of covariate data would not be warranted. There are situations, however, in which one is interested in effect modification, for which combinable covariate data are required. As well, in the context of evaluating harms, one is often limited to nonexperimental data.

Overall, there was a general consistency between the 2 types of standardization methods, especially when an adjusted analysis was performed. In the case study, there were multiple examples in which important heterogeneity was not identified when the less complex standardization methods were used. This masking of heterogeneity happened most often when using a T-score to standardize the cognition scores compared with a category-centered score. The simulation study also identified a number of sources of heterogeneity that can affect the I², some of which are not the standard sources considered by researchers. One would not be able to explore these types of heterogeneity in an aggregate data meta-analysis because individual-level data are needed. Moreover, to fully explore these different sources of heterogeneity and the underlying structure of the construct, more complex models are required. Indeed, standardization by itself is not harmonization because putting variables on the same scale can be done with any 2 variables, which does not necessarily imply that the standardized variables carry the same information.

ACKNOWLEDGMENTS

Author affiliations: Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada (Lauren E. Griffith, Parminder Raina, Nazmul Sohel, Meghan Kenny); Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands (Edwin van den Heuvel); Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada (Isabel Fortier, Dany Doiron); Department of Psychology, University of Victoria, Victoria, British Columbia, Canada (Scott M. Hofer); Research Center on Aging, CIUSSS de l’Estrie-CHUS, and Faculty of Medicine and Health Sciences, University of Sherbrooke, Sherbrooke, Quebec, Canada (Hélène Payette); Research Institute of the McGill University Health Centre, McGill University, Montreal, Quebec, Canada (Christina Wolfson); Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada (Christina Wolfson); and Research Center, Institut Universitaire de Gériatrie de Montréal and Psychology Department, Université de Montréal, Montreal, Quebec, Canada (Sylvie Belleville).

The groundwork for this manuscript is based on the methods research report Harmonization of Cognitive Measures in Individual Participant Data and Aggregate Data Meta-Analysis, funded by the Agency for Healthcare Research and Quality, United States Department of Health and Human Services, under contract No. HHSA 290 2007 10060 I. L.E.G. is supported by a Canadian Institutes of Health Research New Investigators Award. P.R. holds a Tier 1 Canada Research Chair in Geroscience and the Raymond and Margaret Labarge Chair in Research and Knowledge Application for Optimal Aging.

The authors are solely responsible for the content of the review. The opinions expressed herein do not necessarily reflect the opinions of the Agency for Healthcare Research and Quality.

Conflict of interest: none declared.

REFERENCES

1. Raina PS, Wolfson C, Kirkland SA, et al. The Canadian longitudinal study on aging (CLSA). Can J Aging. 2009;28(3):221–229.
2. Ollier W, Sprosen T, Peakman T. UK Biobank: from concept to reality. Pharmacogenomics. 2005;6(6):639–646.
3. Stolk RP, Rosmalen JG, Postma DS, et al. Universal risk factors for multifactorial diseases: LifeLines: a three-generation population-based study. Eur J Epidemiol. 2008;23(1):67–74.
4. Thompson A. Thinking big: large-scale collaborative research in observational epidemiology. Eur J Epidemiol. 2009;24(12):727–731.
5. Stewart LA, Tierney JF. To IPD or not to IPD? Advantages and disadvantages of systematic reviews using individual patient data. Eval Health Prof. 2002;25(1):76–97.
6. Griffith LE, Shannon HS, Wells RP, et al. Individual participant data meta-analysis of mechanical workplace risk factors and low back pain. Am J Public Health. 2012;102(2):309–318.
7. Griffith LE, van den Heuvel E, Fortier I, et al. Statistical approaches to harmonize data on cognitive measures in systematic reviews are rarely reported. J Clin Epidemiol. 2015;68(2):154–162.
8. Carvalho A, Rea IM, Parimon T, et al. Physical activity and cognitive function in individuals over 60 years of age: a systematic review. Clin Interv Aging. 2014;9:661–682.
9. Roig M, Nordbrandt S, Geertsen SS, et al. The effects of cardiovascular exercise on human memory: a review with meta-analysis. Neurosci Biobehav Rev. 2013;37(8):1645–1666.
10. Canadian Study of Health and Aging Working Group. Canadian study of health and aging: study methods and prevalence of dementia. CMAJ. 1994;150(6):899–913.


11. Statistics Canada. Canadian Community Health Survey – Healthy aging (CCHS). http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=5146. Published March 26, 2008. Updated November 27, 2008. Accessed January 5, 2016.
12. Gaudreau P, Morais JA, Shatenstein B, et al. Nutrition as a determinant of successful aging: description of the Quebec longitudinal study Nuage and results from cross-sectional pilot studies. Rejuvenation Res. 2007;10(3):377–386.
13. Taylor EM. The Appraisal of Children With Cerebral Deficits. Cambridge, MA: Harvard University Press; 1959.
14. Butler M, Retzlaff P, Vanderploeg R. Neuropsychological test usage. Prof Psychol Res Pr. 1991;22(6):510–512.
15. Lezak MD, Howieson DB, Loring DW. Neuropsychological Assessment. 4th ed. New York, NY: Oxford University Press; 2004.
16. Buschke H. Cued recall in amnesia. J Clin Neuropsychol. 1984;6(4):433–440.
17. Grober E, Buschke H. Genuine memory deficits in dementia. Dev Neuropsychol. 1987;3(1):13–36.
18. Carlesimo GA, Perri R, Caltagirone C. Category cued recall following controlled encoding as a neuropsychological tool in the diagnosis of Alzheimer’s disease: a review of the evidence. Neuropsychol Rev. 2011;21(1):54–65.
19. Horsman J, Furlong W, Feeny D, et al. The Health Utilities Index (HUI): concepts, measurement properties and applications. Health Qual Life Outcomes. 2003;1:54.
20. Kavirajan H, Hays RD, Vassar S, et al. Responsiveness and construct validity of the health utilities index in patients with dementia. Med Care. 2009;47(6):651–661.
21. Findlay L, Bernier J, Tuokko H, et al. Validation of cognitive functioning categories in the Canadian Community Health Survey-Healthy Aging. Health Rep. 2010;21(4):85–100.
22. Coley N, Andrieu S, Gardette V, et al. Dementia prevention: methodological explanations for inconsistent results. Epidemiol Rev. 2008;30:35–66.
23. Griffith L, van den Heuvel E, Fortier I, et al. Harmonization of Cognitive Measures in Individual Participant Data and Aggregate Data Meta-Analysis. Rockville, MD: Agency for Healthcare Research and Quality; 2013. (AHRQ Publication No. 13-EHC040-EF).
24. Fortier I, Burton PR, Robson PJ, et al. Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies. Int J Epidemiol. 2010;39(5):1383–1393.
25. Fortier I, Doiron D, Little J, et al. Is rigorous retrospective harmonization possible? Application of the DataSHaPER approach across 53 large studies. Int J Epidemiol. 2011;40(5):1314–1328.
26. Tuokko H, Woodward TS. Development and validation of a demographic correction system for neuropsychological measures used in the Canadian Study of Health and Aging. J Clin Exp Neuropsychol. 1996;18(4):479–616.
27. Riley RD, Simmonds MC, Look MP. Evidence synthesis combining individual patient data and aggregate data: a systematic review identified current practice and possible methods. J Clin Epidemiol. 2007;60(5):431–439.
28. Dion M, Potvin O, Belleville S, et al. Normative data for the Rappel libre/Rappel indicé à 16 items (16-item Free and Cued Recall) in the elderly Quebec-French population. Clin Neuropsychol. 2015;28(suppl 1):S1–S19.
29. Horn JL. Organization of abilities and the development of intelligence. Psychol Rev. 1968;75(3):242–259.
30. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.
31. DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials. 2007;28(2):105–114.
32. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.
33. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–1558.
34. Deeks JJ, Higgins JPT, Altman DG. Chapter 9: Analysing data and undertaking meta-analyses. In: Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0. Chichester, UK: The Cochrane Collaboration; 2011. http://handbook.cochrane.org/. Updated January 19, 2016. Accessed January 19, 2016.
35. Wallace BC, Schmid CH, Lau J, et al. Meta-analyst: software for meta-analysis of binary, continuous and diagnostic data. BMC Med Res Methodol. 2009;9:80.
36. SAS/STAT User’s Guide, Version 8. Cary, NC: SAS Institute Inc.; 2003.
37. Debray TP, Moons KG, Abo-Zaid GM, et al. Individual participant data meta-analysis for a binary outcome: one-stage or two-stage? PLoS One. 2013;8(4):e60650.
