Diagnostic Accuracy and Reliability of Ultrasound in Fatty Liver

Diagnostic Accuracy and Reliability of Ultrasonography for the Detection of Fatty Liver: A Meta-Analysis

Ruben Hernaez,1,2,3* Mariana Lazo,1* Susanne Bonekamp,4 Ihab Kamel,4 Frederick L. Brancati,1,3,5
Eliseo Guallar,3,5,6 and Jeanne M. Clark1,3,5

Ultrasonography is a widely accessible imaging technique for the detection of fatty liver, but the reported accuracy and reliability have been inconsistent across studies. We aimed to perform a systematic review and meta-analysis of the diagnostic accuracy and reliability of ultrasonography for the detection of fatty liver. We searched MEDLINE and Embase from October 1967 to March 2010. Studies that provided cross-tabulations of ultrasonography versus histology or standard imaging techniques, or that provided reliability data for ultrasonography, were included. Study variables were independently abstracted by three reviewers and double-checked by one reviewer. Forty-nine studies (4720 participants) were included in the meta-analysis of diagnostic accuracy. The overall sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio of ultrasound for the detection of moderate-severe fatty liver, compared to histology (gold standard), were 84.8% (95% confidence interval: 79.5-88.9), 93.6% (87.2-97.0), 13.3 (6.4-27.6), and 0.16 (0.12-0.22), respectively. The area under the summary receiver operating characteristic curve was 0.93 (0.91-0.95). Reliability of ultrasound for the detection of fatty liver showed kappa statistics ranging from 0.54 to 0.92 for intrarater reliability and from 0.44 to 1.00 for interrater reliability. Sensitivity and specificity of ultrasound were similar to those of other imaging techniques (i.e., computed tomography or magnetic resonance imaging). Statistical heterogeneity was present even after stratification for multiple clinically relevant characteristics. Conclusion: Ultrasonography allows for reliable and accurate detection of moderate-severe fatty liver, compared to histology. Because of its low cost, safety, and accessibility, ultrasound is likely the imaging technique of choice for screening for fatty liver in clinical and population settings. (HEPATOLOGY 2011;54:1082-1090)
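
As a rough consistency check on the pooled figures above, likelihood ratios can be approximated from a sensitivity and specificity using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The following is a minimal sketch only: the function name `likelihood_ratios` is hypothetical and not part of the study, and the arithmetic is an illustration, not the method by which the pooled estimates were obtained.

```python
def likelihood_ratios(sensitivity, specificity):
    """Approximate (LR+, LR-) from sensitivity and specificity given as fractions.

    Illustrative helper (hypothetical name); it applies the textbook
    definitions and does not reproduce any meta-analytic model.
    """
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg


# Pooled sensitivity (84.8%) and specificity (93.6%) as reported in the abstract.
lr_pos, lr_neg = likelihood_ratios(0.848, 0.936)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 13.25, LR- = 0.16
```

These approximations sit close to the reported values of 13.3 and 0.16; the small gap in LR+ may reflect rounding of the inputs or the way the pooled estimates were derived.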

Fatty liver is the accumulation of fat (i.e., macrovesicular steatosis) within the hepatic parenchyma. Nonalcoholic fatty liver disease (NAFLD), the presence of fat infiltration in the liver in the absence of excessive alcohol consumption and other causes of liver disease, is the most common cause of fatty liver, with a prevalence as high as 30% in many populations.1 NAFLD may lead to fibrosis,2 cirrhosis,3 liver cancer,4,5 liver failure requiring liver transplant,6 and mortality,7 and it is associated with type 2 diabetes, metabolic syndrome, and other cardiovascular risk factors.8,9 Although NAFLD represents a

HEPATOLOGY, Vol. 54, No. 3, 2011 HERNAEZ ET AL. 1083

major public health challenge, its natural history and determinants are incompletely understood because of limitations in diagnostic technologies and because this condition is often asymptomatic until very late, severe complications occur. In addition, because of the risk of progression to more advanced stages, early noninvasive detection of fatty liver disease is clinically important.

Conventional B-mode ultrasonography is the most common technique used to assess the presence of fatty liver in clinical settings and population studies. However, several limitations of ultrasonography, including operator dependency, subjective evaluation, and limited ability to quantify the amount of fatty infiltration, have raised concerns. Indeed, some qualitative reviews10,11 have questioned the ability of ultrasound to reliably identify fatty liver, although no systematic review has performed a quantitative summary of available data on the diagnostic ability and reliability of ultrasound to identify fatty liver, compared to histology, the gold standard.

The main aim of this meta-analysis was thus to systematically review and summarize the available literature on the diagnostic accuracy (i.e., sensitivity and specificity) and reliability of ultrasound to distinguish patients with and without fatty liver, defined as the presence of moderate to severe steatosis on liver biopsy (gold standard). As secondary aims, we sought to systematically review and summarize the diagnostic accuracy and reliability of different ultrasonographic parameters or criteria used to diagnose fatty liver (e.g., presence of liver-to-kidney contrast or scores summing a variety of parameters). And, finally, we planned to analyze the available literature on the diagnostic accuracy (i.e., sensitivity and specificity) of ultrasound to detect fatty liver, compared to other imaging techniques (i.e., magnetic resonance imaging [MRI] and computed tomography [CT]).

Patients and Methods

Data Sources and Search. Our search of PubMed and Embase included the term ultrasound and different combinations of fatty liver using free text and key words (Supporting Table 1). The period of the electronic search extended from October 1967 through March 17, 2010, with no language restrictions. We also searched the reference lists of identified reviews and abstracted articles.

Study Selection. We included all studies that presented the following: (1) estimates of diagnostic accuracy (such as sensitivity or specificity), cross-tabulations, or correlations of B-mode ultrasonography to identify fatty liver against histology as the gold standard; (2) estimates of intra- or interrater reliability (such as kappa statistics or intraclass correlation coefficients) of ultrasound to identify fatty liver; and (3) comparisons of ultrasound to other imaging modalities (i.e., CT or MRI) to identify fatty liver.

We excluded studies that did not use ultrasound for evaluating fatty liver, studies that used ultrasound but did not study fatty liver (e.g., cirrhosis exclusively), and studies that evaluated ultrasound techniques not commonly used (e.g., Doppler, transient elastography, contrast-enhanced ultrasound, artificial neural networks, or computer-aided readings, including histogram evaluation and fat quantification, using regions of interest). We also excluded studies using experimental conditions, studies performed in the operating room, studies performed in nonhumans, in vitro or in vivo, and articles that did not report original data (e.g., editorials, news, comments, guidelines, and reviews).

Data Extraction and Quality Assessment. Three investigators (R.H., M.L., and S.B.) independently reviewed the search results to determine article inclusion and perform data abstraction. Discrepancies were resolved by consensus. For each selected publication, we abstracted year of publication, country, inclusion criteria, histological definition of fatty liver (i.e., simple steatosis and steatohepatitis), number of participants undergoing ultrasound and comparison tests (if applicable), definitions of fatty liver used in the study, ultrasonographic parameters evaluated, and reported measures of accuracy and reliability. For articles with no reported measure of accuracy, we estimated the sensitivity and specificity from the available data. We evaluated the quality of each article by applying modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS)12 and STAndards for the Reporting of Diagnostic accuracy studies (STARD) criteria.13

Study outcome was the presence of fatty liver as a dichotomous variable, using the specific criteria and definitions used in each study. For ultrasound, a few studies reported four categories, and we combined the normal/mild categories as absence of fatty liver, and the moderate/severe categories as presence of fatty liver. For histology, we used the presence of greater than or equal to 20%-30% fat infiltration to define fatty liver, except for Nagata et al. (10%), Guajardo-Salinas (>0%), and Soresi (>5%). We conducted secondary analyses on the diagnostic accuracy using lower levels of fat infiltration on histology as diagnostic criteria (i.e., <5%, 10%, and 20%-30%).

Because a number of ultrasonographic parameters have been used alone or in combination to diagnose fatty liver,
1084 HERNAEZ ET AL. HEPATOLOGY, September 2011

if data were available, we evaluated the diagnostic accuracy of the following parameters: (1) parenchymal brightness, (2) liver-to-kidney contrast, (3) deep beam attenuation, (4) bright vessel walls, and (5) gallbladder wall definition. Given that some studies reported or combined different histological findings, such as inflammation and fibrosis, we performed secondary analyses to study how accurate ultrasound was in identifying fatty infiltration with or without inflammation or fibrosis.

Data Synthesis and Analysis. Sensitivity and specificity of each study were summarized using the hierarchical summary receiver operating characteristic (ROC) curve approach.14 In this method, the relationship between logit-transformed sensitivity and specificity in each study is quantified by the log diagnostic odds ratio (OR) and the results are used to estimate a summary ROC curve.15 This method provides summary estimates of sensitivity and specificity, 95% confidence and prediction regions, and summary ROC curves, and it allows for multivariate analysis of between-study heterogeneity. Between-study heterogeneity was assessed by plots of the standardized logarithm of the diagnostic OR versus the inverse of the standard error and by the I² statistic, a parameter that describes the percentage of total variation across studies attributable to heterogeneity, rather than chance.16 We used clinically important variables to assess between-study heterogeneity and fit metaregression models. Publication bias was assessed visually using the effective sample size funnel plot and associated regression test of asymmetry.17 Statistical analyses were performed using the Stata commands METANDI and MIDAS (StataCorp 2007, Stata Statistical Software, Release 10; StataCorp LP, College Station, TX).

Results

Our review included 49 studies of diagnostic accuracy comparing ultrasound to histology (Table 1; Supporting Fig. 1)18-66 and five studies comparing ultrasound to other radiological techniques (including three studies that reported three-way comparisons between ultrasonography, another imaging technique, and histology) (Table 2).67-71 Nine of the 49 studies comparing ultrasound to histology also included data comparing each ultrasonographic parameter (e.g., liver-to-kidney contrast, deep beam attenuation, etc.) and histology.25,26,31,34,38,40,49,61,62 Finally, 22 studies provided data on intra- or interrater reliability (Supporting Table 2).S1-S22

Meta-Analysis of Diagnostic Accuracy of Ultrasonography Versus Histology. Forty-nine studies, including 4720 participants, provided data on the diagnostic accuracy of ultrasound compared to histology as the gold standard. The weighted prevalence of histologically defined fatty liver across all studies was 31.8%, but the studies varied with respect to study population and location. Twenty-seven studies (55%) were conducted in a hospital setting or included a mixture of inpatients and outpatients. The indication for testing was suspicion of liver disease in 17 studies and known liver disease in 16 studies. The underlying liver disease was a combination of NAFLD and other pathologies in 36 studies and NAFLD only in eight studies. All studies included a representative spectrum of patients. Seventeen (35%) of the 49 studies did not report the method of ascertainment or used a different method of ascertainment in controls. Fewer than 50% of studies reported whether the interpretation of the ultrasound had been done without knowledge of the results of the biopsy.

Overall sensitivity of ultrasound to detect moderate to severe histologically defined fatty liver from the absence of steatosis (n = 34 studies, 2815 participants) was 84.8% (95% confidence interval [CI]: 79.5-88.9), specificity was 93.6% (87.2-97.0), the positive likelihood ratio was 13.3 (6.4-27.6), the negative likelihood ratio was 0.16 (0.12-0.22), and the summary area under the ROC curve was 0.93 (0.91-0.95) (Figs. 1 and 2A). We further examined the lower cutoffs for the detection of histologically defined fat, and found that ultrasound had a diagnostic accuracy for the detection of 10% of steatosis between 0.91 and 0.93 and specificity between 0.88 and 0.99 (Supporting Table 3).

Heterogeneity for the area under the summary ROC curve was substantial (I², 98%; 95% CI: 97-99). In subgroup analyses, clinically relevant categories explained only a minor proportion of between-study heterogeneity (Supporting Fig. 2). There was no indication of publication or related biases (data not shown).

When ultrasound was used to differentiate the presence of histologically based fatty liver alone versus other pathological findings, such as hepatitis or fibrosis or normal liver (n = 29 studies), overall sensitivity was similar (87.2%; 95% CI: 77.8-93.0), but specificity was substantially lower (79.2%; 95% CI: 72.8-84.4). Correspondingly, the positive likelihood ratio was lower (4.2; 95% CI: 3.3-5.4), but the negative likelihood ratio was unchanged (0.16; 95% CI: 0.09-0.28). Overall, the summary area under the ROC curve was the same as that for determining fatty liver versus not (0.93; 95% CI: 0.91-0.95) (Fig. 2B).

Meta-Analysis of Diagnostic Accuracy of Ultrasonography Components Versus Histology. There was a wide variation in ultrasound parameters evaluated for

Table 1. Characteristics of the 44 Studies of Diagnostic Accuracy Comparing Ultrasound to Histology, Sorted by Publication Year

Author, year (reference) | Country | Setting | Indication for Ultrasound/Standard | Liver Disease | N
Gosink, 1979 (18) | USA | Hospital | Suspicion liver disease | Mixed | 23
Foster, 1980 (19) | UK | Hospital | Suspicion liver disease | Mixed | 60
Youssef, 1980 (20) | Finland | Mixed (inpatient/outpatient) | Mixed | Mixed | 62
Debongnie, 1981 (21) | Belgium | N/R | Suspicion liver disease | Mixed | 44
Spuhler, 1981 (22) | Germany | Hospital | N/R | Mixed | 310
Pirovino, 1982 (23) | Switzerland | Hospital | Known liver disease | Mixed | 20
Pamilo, 1983 (24) | Finland | N/R | Suspicion liver disease | Mixed | 24
Yajima, 1983 (25)* | Japan | N/R | Known liver disease | Mixed | 28
Sanford, 1985 (26)* | Australia | Hospital | Known liver disease | Mixed | 125
Berrut, 1986 (27) | Switzerland | Hospital | Suspicion liver disease | N/R | 38
Cusumano, 1986 (28) | Italy | N/R | Suspicion liver disease | Mixed | 22
Needleman, 1986 (29) | USA | Hospital | Known liver disease | Mixed | 96
Saverymuttu, 1986 (30) | UK | Outpatient clinic | Suspicion liver disease | Mixed | 90
Tam, 1986 (31)* | Taiwan | Hospital | N/R | Mixed | 113
Coulson, 1987 (32) | UK | Outpatient clinic | Other | Mixed | 49
Forsberg, 1987 (33) | Sweden | N/R | Mixed | Mixed | 24
Sato, 1987 (34)* | Japan | Mixed (inpatient/outpatient) | Known liver disease | Mixed | 155
Savarino, 1987 (35) | Italy | Mixed (inpatient/outpatient) | Known liver disease | Mixed | 90
Celle, 1988 (36) | Italy | Hospital | Known liver disease | Mixed | 90
Lossner, 1988 (37) | Germany | Hospital | Suspicion liver disease | Mixed | 187
Saitoh, 1988 (38)* | Japan | Hospital | N/R | Mixed | 38
Yang, 1988 (39) | Taiwan | Hospital | Known liver disease | Mixed | 90
Ferrari, 1989 (40)* | Italy | N/R | Known liver disease | Mixed | 121
Nishimura, 1989 (41) | Japan | Mixed (inpatient/outpatient) | Known liver disease | Mixed | 32
Joseph, 1991 (42) | UK | Outpatient clinic | Suspicion liver disease | Mixed | 19
Bloom, 1992 (43) | UK | Mixed (inpatient/outpatient) | Known liver disease | Mixed | 59
Castellano, 1993 (44) | Spain | Mixed (inpatient/outpatient) | Mixed | Mixed | 46
Nagata, 1993 (45) | Japan | Hospital | N/R | Mixed | 38
Cardi, 1997 (46) | Italy | N/R | Suspicion liver disease | Mixed | 12
Kim, 2005 (52) | Korea | Hospital | Other | NAFLD | 94
Palmentieri, 2006 (53) | Italy | N/R | Suspicion liver disease | Mixed | 208
Riley, 2006 (54) | USA | Outpatient clinic | Known liver disease | Mixed | 115
Hamaguchi, 2007 (55) | Japan | Hospital | Suspicion liver disease | NAFLD | 94
Lee, 2007 (56) | Korea | Hospital | Other | NAFLD | 589
Perez, 2007 (57) | USA | Hospital | Known liver disease | Mixed | 92
Saluena, 2007 (58) | Spain | N/R | Suspicion liver disease | NAFLD | 87
Chen, 2008 (59) | Taiwan | N/R | Known liver disease | No NAFLD | 108
De Moura Almeida, 2008 (60) | Brazil | Hospital | Suspicion liver disease | NAFLD | 100
Ahmed, 2009 (61)* | Egypt | Outpatient clinic | Known liver disease | Mixed | 35
Dasarathy, 2009 (62)* | USA | Outpatient clinic | Suspicion liver disease | Mixed | 73
Guajardo-Salinas, 2009 (63) | USA | Outpatient clinic | Other | NAFLD | 102
Soresi, 2009 (64) | Italy | N/R | Mixed | Mixed | 150
Yamashiki, 2009 (65) | Japan | Hospital | Health screening | NAFLD | 78
Lee, 2010 (66) | Korea | Outpatient clinic | Health screening | NAFLD | 161

N/R, not reported.
*These studies provided data on the accuracy of individual ultrasound parameters compared to histology.

assessing fatty liver (data not shown). Of the 49 studies with histology as a gold standard, parenchymal brightness was used as an ultrasound diagnostic criterion in 43 (88%) studies, deep beam attenuation in 30 (61%), vessels in 28 (57%), liver-to-kidney contrast in 27 (55%), and gallbladder wall definition in 4 (8%) studies.

Table 2. Characteristics of the Five Studies of Diagnostic Accuracy Comparing Ultrasound to Another Imaging Technique, Sorted by Publication Year

Author, year (reference) | Country | Setting | Indication for Ultrasound | Liver Disease | Standard | N
Scatarige, 1984 (67) | USA | Mixed (inpatient/outpatient) | Known liver disease | Mixed | CT | 94
Pacifico, 2007 (68) | Italy | N/R | Suspicion liver disease | NAFLD | MRI | 100
Pozzato, 2008 (69) | Italy | Hospital | N/R | NAFLD | MRI | 60
Edens, 2009 (70) | Netherlands | General population | Other | NAFLD | MRS | 18
Mancini, 2009 (71) | Italy | Outpatient clinic | Other | NAFLD | MRS | 40

Fig. 1. Overall sensitivity and specificity of ultrasound to detect moderate-severe histologically defined fatty liver from the absence of steatosis.

In studies where the accuracy of ultrasonographic parameters of fatty liver definition was evaluated individually, sensitivities of liver-to-kidney contrast, vessel wall brightness, and deep beam attenuation were 98% (75%-100%), 81% (70%-89%), and 59% (45%-72%), respectively. Specificity was similar for all components (range, 93%-95%) (Supporting Table 4).

Systematic Review of the Reliability of Ultrasonography. Twenty-two studies reported the reliability of ultrasound findings: kappa statistics (17 studies), coefficients of variation (three studies), percent disagreement (one study), and intraclass correlation coefficient (one study).S1-S22 Among studies reporting kappa statistics, the number of readers ranged from 1 to 15. The range of kappa values for intrarater evaluation was 0.54-0.92 (six studies) and for the interrater evaluation was 0.44-1.00 (14 studies). Studies reporting reliability measures for individual components reported similar results across components (Supporting Table 5).S1-S22

Meta-Analysis of Diagnostic Accuracy of Ultrasonography Versus Other Imaging Techniques. We found five studies comparing ultrasound data to CT, MRI, or magnetic resonance spectroscopy (MRS)

Fig. 2. Summary receiver-operating characteristic (ROC) curve plots showing test accuracy of ultrasound compared to histology to distinguish between presence versus absence of steatosis (A), and presence of steatosis versus everything else (B).

without histology, including a total of 215 adults. Ultrasound had an overall sensitivity of 93.6% (60.5-99.3), specificity of 80.1% (53.3-93.4), positive likelihood ratio of 4.71 (1.89-11.71), and negative likelihood ratio of 0.08 (0.01-0.56). Only three studies had ultrasound,56,65,66 another imaging technique, and histology (Supporting Table 6),S23-S25 and ultrasound had slightly better overall accuracy for detecting fatty liver, compared to other techniques.

Discussion

Our meta-analysis shows that ultrasound is an accurate, reliable imaging technique for the detection of fatty liver, as compared with histology, with a pooled sensitivity of 84.8%, a pooled specificity of 93.6% for detecting 20%-30% steatosis, and a summary area under the ROC curve of 0.93. Because ultrasound is relatively inexpensive and accessible, compared to other diagnostic techniques, our results suggest that ultrasound may be the imaging technique of choice for screening for the presence of fatty liver in clinical settings and, especially, population studies. The widespread use of ultrasound to detect fatty liver may help better identify the determinants and natural history of fatty liver disease in the general population and may help target interventions directed to reducing the complications associated with fatty liver. Indeed, though no U.S. Food and Drug Administration–approved therapy exists for fatty liver, lifestyle changes,72 vitamin E, and pioglitazone73 have shown some efficacy.

We found a relatively large number of studies using ultrasound as the diagnostic method and liver biopsy as a gold standard, with a wide range of sensitivities (55%-100%) and specificities (26%-100%). These differences could be the result of a number of factors. First, technical quality and performance of the ultrasound varied across studies. We included studies conducted from 1979 to 2010; during this time, technological advances in ultrasound equipment have occurred and could potentially explain part of this variation. Second, the ultrasound criteria used to define fatty liver differed across studies. Third, although the majority of the studies included patients who underwent liver biopsy with some suspicion of liver disease, there was a wide range in severity of the underlying disease. Finally, the composition of the comparison group (i.e., normal liver or other liver disease, such as inflammation, fibrosis, or a combination of these) also differed across studies, adding to the heterogeneity. Despite these differences, our sensitivity analyses, stratified by publication year, setting, degree of steatosis, and diagnosis, among others, showed similar results and, therefore, allow the use of the pooled accuracy estimates. Similar factors may have also contributed to the variation of reliability estimates between studies, including prevalence of cases with steatosis in the study population, lack of a standard protocol to perform the evaluation, and the use of different criteria.

The potential role of ultrasound in clinical settings and in population research is very important. In the current obesity epidemic, the prevalence of fatty liver disease, in particular NAFLD, is likely to increase, making it necessary to use practical tools for measuring the burden of disease and tracking time trends. In the clinical context, the number of patients at risk for fatty liver disease is also increasing. There is thus a pressing need to have readily available, accurate methods to assess the presence of fatty liver, and ultrasound compares favorably to alternative noninvasive techniques. Liver enzymes, indirect markers of liver injury, have lower sensitivity (0.30-0.63) and specificity (0.38-0.63) than ultrasound.74 Indeed, compared to liver enzymes, the use of ultrasound as a triage test, applied early on to determine which patients should undergo further testing, would likely reduce the number of false-positive results and thus decrease the burden of subsequent testing. Other imaging techniques (i.e., CT or MRI/MRS) have similar operating characteristics, but are more expensive, and CT involves radiation; therefore, their widespread usefulness is limited.

Our systematic review had certain limitations. We did not include other ultrasound techniques (e.g., Doppler and histogram) that would have allowed a more objective quantification of fat. Also, we could not assess the accuracy of ultrasound for the whole range of fat accumulation and could not evaluate the performance of an ultrasound-based four-grade scale (i.e., normal, intermediate, moderate, and severe) in the detection of fatty liver. We did not have individual patient data, so we were not able to evaluate the performance of ultrasound in key patient subgroups (e.g., by body mass index or presence of subcutaneous fat thickness). Although we reported significant statistical heterogeneity, in multiple secondary analyses on the key clinical variables, our inferences remained unchanged.

Our review shows that though ultrasound is useful for identifying fatty liver, additional research is needed to better assess the performance of specific ultrasound criteria of individual parameters, in particular gallbladder and vessel wall definition, to accurately and reliably detect fatty liver. Some parameters may be more reliable and justify the use of a more focused ultrasound examination. In addition, future studies assessing the accuracy of ultrasound should aim to refine

