Confiabilidade Testes Paper Elsa
ABSTRACT. Cognitive function evaluation entails the use of neuropsychological tests, applied individually or in sequence.
The results of these tests may be influenced by factors related to the environment, the interviewer or the interviewee.
Objectives: We examined the test-retest reliability of selected tests from the Brazilian version of the Consortium to Establish
a Registry for Alzheimer’s Disease battery. Methods: The ELSA-Brasil is a multicentre study of civil servants (35-74 years of age)
from public institutions across six Brazilian States. The same tests were applied, in a different order, by the
same trained and certified interviewer, with an approximate 20-day interval, to 160 adults (51% men, mean age 52 years).
The Intraclass Correlation Coefficient (ICC) was used to assess the reliability of the measures, and a dispersion graph was
used to examine the patterns of agreement between them. Results: We observed higher retest scores in all tests as well
as a shorter test completion time for the Trail Making Test B. ICC values for each test were as follows: Word List Learning
Test (0.56), Word Recall (0.50), Word Recognition (0.35), Phonemic Verbal Fluency Test (VFT, 0.61), Semantic VFT (0.53)
and Trail B (0.91). The Bland-Altman plot showed better agreement for executive function tests (VFT and Trail B) than for memory
tests. Conclusions: Better performance in the retest may reflect a learning effect, and suggests that retests should be performed
using alternate forms or after longer periods. In this sample of adults with a high schooling level, reliability was only moderate
for memory tests, whereas the measurement of executive function proved more reliable.
Key words: cognitive assessment, reliability, cohort studies.
1Nurse, MSc in Health Sciences, Graduate Program in Health Sciences, Universidade Federal de Minas Gerais, Belo Horizonte MG,
Brazil. 2Physician, PhD in Epidemiology, Adjunct Professor, School of Nutrition, Universidade Federal de Ouro Preto, Ouro Preto MG, Brazil.
3Physician, PhD in Epidemiology, Full Professor, Department of Preventive and Social Medicine, School of Medicine, Universidade Federal
de Minas Gerais, Belo Horizonte MG, Brazil. Coordinator of ELSA-Brasil. 4Undergraduate research student, School of Nursing,
Universidade Federal de Minas Gerais, Belo Horizonte MG, Brazil. 5Physician, Specialist in Geriatrics, PhD in Medicine, Associate
Professor, Department of Internal Medicine, School of Medicine, Universidade Federal de Minas Gerais. Vice-Coordinator of the
Graduate Program in Health Sciences, Universidade Federal de Minas Gerais, Belo Horizonte MG, Brazil.
Valéria M.A. Passos. Centro de Investigação ELSA-MG / Hospital Borges da Costa – Av. Alfredo Balena, 110 – 30130-100 Belo Horizonte MG – Brasil. E-mail:
[email protected]; [email protected]
Received May 29, 2013. Accepted in final form August 15, 2013.
related words printed in large letters on cards, with the words shown every 2 seconds and presented in a different order on each of the three learning trials, with immediate recall. After a 5-minute delay, retention and recollection were tested by free recall and by the recognition of the ten previous words intermixed with ten distractor words. Verbal fluency tests (VFT) consisted of asking participants to say, in one minute, as many words as possible belonging to the category of animals (semantic test) or beginning with the letter F (phonemic test). The Trail Making Test B (Trail B), part a, was used to train for Trail B, part b, with the time taken to complete the task computed only for part b. The participant was instructed to draw lines connecting letters and numbers in an order that alternates between increasing numeric value and alphabetic order (1, A, 2, B, 3, C, etc.). The participant had to draw as quickly as possible, without lifting the pencil tip from the page. Supervisors were instructed to point out the errors. The test score was the total time to complete the condition, including the time necessary to correct errors.5
The same tests were applied, albeit in a different order, between 22/02/2010 and 03/12/2010 by the same previously-trained and certified interviewer, in a quiet environment, with good lighting and low levels of noise or other distracting stimuli. The order of the tests was arranged so that a distractor test (category/phonemic or phonemic/category VFT) always came between the word memory test and the recall and recognition tests. The Trail B was always the first or the last test to be performed. The tests were recorded and later reviewed. VFT scores were defined by previously-trained and certified supervisors from the ELSA-Brasil research centres. A high level of agreement was observed between each of the six centres and the reference standard.10

Statistical analysis. The Epiinfo® 3.5.3 Program, 10 was used for the double data entry, and the STATA® Program, 12 for the statistical analysis.
Descriptive analysis of the tests and retests was generated by means of the average and the range of variation in the first and second applications. As the variances were not homogeneous only for the Trail B data (Bartlett test, p<0.05), the Mann-Whitney test was used to compare the average time between test and retest.
The Intraclass Correlation Coefficient (ICC) was used as the main measure for estimating reliability, since it estimates the share of total variability that is attributable to differences between individuals. The ICC was also estimated according to the characteristics of the participants: sex, age range (35-59 and 60-74 years old) and educational level (uncompleted high school, completed high school, University).
Reliability, according to ICC values, was classified as poor when equal to zero; slight (0.01 to 0.2); fair (0.21 to 0.4); moderate (0.41 to 0.6); substantial (0.61 to 0.8); and almost perfect (0.81 to 1.0).13
In order to compare our results with other studies, Pearson Correlation Coefficients were also estimated for the memory tests and VFT. The Spearman Coefficient was estimated for the Trail B test. The Pearson coefficient measures the degree to which paired observations plotted in a diagram approach a straight line; the closer the points lie to the line, the stronger the linear relationship between the two observations. Dispersion graphs were used to evaluate the pattern and distribution of scores.

RESULTS
The study sample had the same sex and age distribution as the ELSA cohort and comprised 81 (50.6%) men and 79 (49.4%) women, 121 (75.6%) adults (35-59 years old) and 39 (24.4%) elderly (60-74 years old). The schooling level was higher than that of the cohort participants: 10.6% had uncompleted high school, 28.8% had completed high school and 60.6% had a University degree.4
In addition, higher retest scores on the word memory, recall, semantic and phonemic VFT tests and a shorter retest time to perform the Trail B (Table 1) were observed. The ICC varied from 0.35, for the recognition test, to 0.91, for the Trail B, which means that the capacity of the different tests to discriminate between individuals ranged from fair to almost perfect, respectively. All the tests presented a positive correlation, with statistically significant values, revealing that the retest scores tended to increase linearly in relation to the test scores (Table 2).
Figure 1 depicts the dispersion graphs corresponding to the Pearson coefficient values for the cognitive tests. The inclination of the line, deviating from 45º, shows that the memory tests and VFT retest scores were higher than the test scores, while the opposite occurred with the Trail B. The recall test graph shows a higher dispersion of values. In the recognition test, the presence of scores close to ten, the maximum limit in the test, is notable.
No influence of sex, age or schooling on reliability was found when the variability of all test scores was analysed according to these variables, using stratified ICC values and their confidence intervals (Table 3).
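The ICC and the classification bands described in the Statistical analysis paragraph can be illustrated with a short sketch. The paper does not state which ICC model was fitted in STATA, so the one-way random-effects form, ICC(1,1), used below is an assumption, and the paired scores are hypothetical values chosen only for illustration. Values above 0.8 are labelled "almost perfect", and non-positive estimates are labelled "poor".

```python
import numpy as np


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_measurements) array.

    Assumption: the paper does not say which ICC form was used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)
    # Mean squares from a one-way ANOVA with subjects as the grouping factor.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def classify_reliability(icc):
    """Label an ICC value with the bands quoted in the text (after Landis and Koch)."""
    if icc <= 0:
        return "poor"
    bands = [(0.2, "slight"), (0.4, "fair"), (0.6, "moderate"), (0.8, "substantial")]
    for upper, label in bands:
        if icc <= upper:
            return label
    return "almost perfect"


# Hypothetical test/retest pairs for five participants (not study data).
pairs = [[21, 23], [18, 19], [25, 27], [15, 18], [22, 22]]
example_icc = icc_oneway(pairs)
print(round(example_icc, 3), classify_reliability(example_icc))

# ICC values reported in the abstract, labelled with the same bands.
reported = {"Word Recognition": 0.35, "Word List Learning": 0.56, "Trail B": 0.91}
for test_name, value in reported.items():
    print(test_name, value, classify_reliability(value))
```

A one-way model treats the two applications as interchangeable, so a systematic gain at retest counts as disagreement; a two-way consistency model would not penalise such a shift in the same way.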
Table 1. Score distribution of cognitive tests and retests among 160 participants of ELSA-Brasil.
Measures          Word memory  Recall   Recognition  VFT* (animals)  VFT (letter F)  Trail B (seconds)
Range – test      11-28        1-10     8-10         6-35            3-27            29-858
Range – retest    12-30        2-10     7-10         10-34           2-26            31-526
Average – test    21           7        10           19              13              90.0
Average – retest  23.5         8        10           20              14              81.5**
Difference        p<0.001      p<0.001  p=0.42       p<0.001         p<0.001         p=0.009
*VFT: Verbal Fluency Tests; **Trail B mean execution time and Mann-Whitney test.
Table 3. Intraclass Correlation Coefficient of cognitive function tests in ELSA-Brasil, by sex, age and schooling.
Variable           Category                 Word memory       Recall            Recognition       VFT* (animals)    VFT (letter F)    Trail B
Sex                Female                   0.63 (0.39-0.86)  0.52 (0.16-0.88)  0.61 (0.00-1.22)  0.54 (0.30-0.78)  0.53 (0.28-0.78)  0.88 (0.80-0.96)
                   Male                     0.45 (0.16-0.74)  0.50 (0.15-0.85)  0.30 (0.00-0.89)  0.53 (0.29-0.77)  0.64 (0.42-0.86)  0.97 (0.96-0.99)
Age group (years)  35-59                    0.56 (0.31-0.80)  0.49 (0.15-0.84)  0.26 (0.01-0.78)  0.57 (0.36-0.78)  0.61 (0.41-0.82)  0.90 (0.85-0.95)
                   60-74                    0.56 (0.24-0.88)  0.47 (0.07-0.86)  0.50 (0.00-1.19)  0.23 (0.00-0.62)  0.78 (0.58-0.97)  0.76 (0.28-1.24)
Schooling          Uncompleted High School  0.27 (0.00-0.88)  0.07 (0.00-0.67)  0.64 (0.01-1.36)  0.07 (0.00-0.89)  0.93 (0.70-1.06)  0.72 (0.24-1.20)
                   Completed High School    0.71 (0.47-0.94)  0.56 (0.18-0.94)  0.30 (0.00-0.92)  0.41 (0.07-0.75)  0.49 (0.17-0.81)  0.89 (0.77-1.00)
                   University               0.49 (0.22-0.76)  0.48 (0.11-0.84)  0.32 (0.00-0.89)  0.45 (0.22-0.69)  0.57 (0.33-0.80)  0.72 (0.24-1.20)
*VFT: Verbal Fluency Tests; **Trail B: Trail Making Test B.
Figure 1. Test and retest dispersion graphs for battery of cognitive function tests among 160 participants of ELSA-Brasil. [Six panels; horizontal axis: Test; vertical axis: Retest.]
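A dispersion graph of retest against test scores is usually read against the 45º identity line: points above it are participants who scored higher at retest. The sketch below summarises such a graph numerically. It uses hypothetical scores rather than study data, and the function name `identity_line_summary` is introduced here only for illustration. It also shows why a high Pearson coefficient alone does not establish agreement: a uniform gain at retest leaves the coefficient at 1 while every point sits above the identity line.

```python
import numpy as np


def identity_line_summary(test, retest):
    """Summarise a test-retest scatter relative to the 45-degree identity line."""
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    change = retest - test
    return {
        "above": int(np.sum(change > 0)),   # scored higher at retest
        "on": int(np.sum(change == 0)),     # identical scores
        "below": int(np.sum(change < 0)),   # scored lower at retest
        "mean_change": float(change.mean()),
        "pearson_r": float(np.corrcoef(test, retest)[0, 1]),
    }


# Hypothetical word-memory scores for six participants (not study data).
test_scores = [11, 15, 18, 21, 24, 28]
retest_scores = [14, 17, 21, 21, 26, 27]
print(identity_line_summary(test_scores, retest_scores))

# A uniform 3-point gain: perfectly linear, yet no point lies on the identity line.
shifted = [score + 3 for score in test_scores]
print(identity_line_summary(test_scores, shifted))
```

For a timed task such as the Trail B, where a lower value means better performance, improvement at retest appears as points below the identity line rather than above it.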
the United States of America,5 which used samples of 20 and 278 people, respectively, aged under 50 years, and a one-month interval between test and retest. These same studies revealed substantial reliability for the recall test (Pearson coefficient=0.64). Lower reliability for the recognition test was also observed in the American study (Pearson=0.36), but not in the Korean investigation (Pearson=0.74).15 In our study, lower reliability for the recognition test may be explained by the ceiling effect, where the test values achieved by the sample are close to the maximum, reducing the variability between scores.
The better performance in the retests strongly suggests a learning effect, as observed in other studies.18 In the present study, we chose an interval of time similar to that adopted in other investigations, which ranged from two to four weeks. Longer periods between the tests could increase the probability of real changes in cognitive function, compromising the test reliability of the investigation, whilst shorter periods are more easily contaminated by the learning effect. As in other studies, in an attempt to avoid the learning effect, the retests were arranged in a different order. Despite these precautions, the learning effect may have contributed to decreasing the reliability of the tests.
Given that the Trail B showed almost perfect reliability, it may be useful when a short reapplication interval is necessary. High reliability for the Trail B test was also found in a German study, using a sample of 55 individuals, with a mean age of 46 years and, on average, 10 years of schooling.16
The studied tests presented the advantage of maintaining reliability regardless of sex, age and schooling. The tests’ capacity to remain precise even when applied to different people should not, however, lead to the conclusion that these variables have no effect when assessing the validity of these tests. There is evidence that these variables interfere with the capacity to distinguish between cognitive levels.9,19,20
One limitation of this study is that it was conducted in only one of the six ELSA research centres, a decision taken to reduce the variability introduced by using different interviewers.
In conclusion, we observed moderate reliability for cognitive tests applied in adults, after a short interval averaging twenty days. The slight improvement in performance across all the retests, compared to the initial tests, suggests a learning effect. To avoid this effect, the ELSA-Brasil cognitive evaluation should use alternate equivalent versions of the tests during study waves, estimated to occur every three to four years, in order to reduce the influence of learning on prospective comparisons of cognitive tests in this Brazilian adult population.

Funding. This work was supported by a grant from the Ministry of Health and Ministry of Science and Technology (FINEP, Financiadora de Estudos e Projetos) for ELSA (nº 01 06 0278.00 MG). Prof. Barreto has a grant from the National Research Council of Brazil (CNPq, nº 01 06 0278.00), Prof. Passos has a grant from the State of Minas Gerais Agency for Research and Technology (FAPEMIG, nº 17767) and Prof. Giatti has a fellowship from the Coordination for the Improvement of Higher Education Personnel (CAPES).

Acknowledgments. The authors thank the ELSA-Brasil participants and the research team involved in the baseline study for their contribution to this study.
REFERENCES
1. Lezak MD, Howieson DB, Loring DW. Neuropsychological assessment. 4 ed. New York, NY: Oxford University Press; 2004.
2. Szklo M, Nieto FJ. Quality Assurance and Control. In: Szklo M, Nieto FJ (eds). Epidemiology beyond the basics. 2nd edition. Sudbury, MA, USA: Jones and Bartlett Publishers; 2007:297-348.
3. Hulley SB, Martin JN, Cummings SR. Planning the measurements: accuracy and precision. In: Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB (eds). Designing clinical research. 3rd edition. Philadelphia, PA, USA: Lippincott Williams and Wilkins; 2006:55-67.
4. Aquino EM, Barreto SM, Benseñor IM, et al. Brazilian Longitudinal Study of Adult Health (ELSA-Brasil): objectives and design. Am J Epidemiol 2012;175:315-324.
5. Morris JC, Heyman A, Mohs RC, et al. The Consortium to Establish a Registry for Alzheimer’s Disease (CERAD): Part I. Clinical and neuropsychological assessment of Alzheimer’s disease. Neurology 1989;39:1159-1165.
6. Fillenbaum GG, Belle G, Morris JC, et al. CERAD (Consortium to Establish a Registry for Alzheimer’s disease): The first 20 years. Alzheimer’s Dement 2008;4:96-109.
7. Brucki SMD, Malheiros SMF, Okamoto IH, Bertolucci PHF. Dados normativos para o teste de fluência verbal categoria animais em nosso meio. Arq Neuropsiquiatr 1997;55:56-61.
8. Brucki SMD, Rocha MSG. Category fluency test: effects of age, gender and education on total scores, clustering and switching in Brazilian Portuguese-speaking subjects. Braz J Med Biol Res 2004;37:1771-1777.
9. Foss MP, Vale FAC, Speciali JG. Influência da escolaridade na avaliação neuropsicológica de idosos. Arq Neuropsiquiatr 2005;63:119-126.
10. Passos VMA, Giatti L, Barreto SM, et al. Verbal fluency tests reliability in a Brazilian multicentric study, ELSA-Brasil. Arq Neuropsiquiatr 2011;69:814-816.
11. Dean AG, Arner TG, Sunki GG, et al. Epi Info™, a database and statistics program for public health professionals. Centres for Disease Control and Prevention, Atlanta, Georgia, USA; 2007.
12. Stata Statistical Software: Release 10. College Station, Texas: Stata Corporation; 2007.
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
14. Harrison JE, Buxton P, Husain M, Wise R. Short test of semantic and phonological fluency: Normal performance, validity and test-retest reliability. Brit J Clin Psychol 2000;39:181-191.
15. Lee JH, Lee KU, Lee DY, et al. Development of the Korean Version of the Consortium to Establish a Registry for Alzheimer’s Disease Assessment Packet (CERAD-K): Clinical and Neuropsychological Assessment Batteries. J Gerontol 2002;57:47-53.
16. Wagner S, Helmreich I, Dahmen N, Lieb K, Tadic A. Reliability of Three Alternate Forms of the Trail Making Tests A and B. Arch Clin Neuropsychol 2011;26:314-321.
17. Bertolucci PHF, Okamoto IH, Neto JT, Ramos LR, Brucki SMD. Desempenho da população brasileira na bateria neuropsicológica do Consortium to Establish a Registry for Alzheimer’s disease (CERAD). Rev Psiq Clín 1998;25:80-83.
18. Salthouse TA. Selective review of cognitive aging. J Int Neuropsychol Soc 2010;16:754-760.
19. Charchat-Fichman H, Caramelli P, Sameshima K, Nitrini R. Declínio da capacidade cognitiva durante o envelhecimento. Rev Bras Psiq 2005;27:79-82.
20. Christofoletti G, Oliani MM, Stella F, Gobbi S, Gobbi LTB. The influence of schooling on cognitive screening test in the elderly. Dement Neuropsychol 2007;1:46-51.