
License: arXiv.org perpetual non-exclusive license
arXiv:2403.00268v1 [cs.CV] 01 Mar 2024

Improving Acne Image Grading with Label Distribution Smoothing

Abstract

Acne, a prevalent skin condition, necessitates precise severity assessment for effective treatment. Acne severity grading typically involves lesion counting and global assessment. However, manual grading suffers from variability and inefficiency, highlighting the need for automated tools. Recently, label distribution learning (LDL) was proposed as an effective framework for acne image grading, but its effectiveness is hindered by severity scales that assign varying numbers of lesions to different severity grades. To address these limitations, we propose to incorporate severity scale information into lesion counting by combining LDL with label smoothing, and to decouple it from global assessment. A novel weighting scheme in our approach adjusts the degree of label smoothing based on the severity grading scale. This method helps to manage label uncertainty effectively without compromising class distinctiveness. Applied to the benchmark ACNE04 dataset, our model demonstrated improved performance in automated acne grading, showing its potential for enhancing acne diagnostics. The source code is publicly available at http://github.com/openface-io/acne-lds.

Index Terms—  acne grading, label smoothing, label distribution learning

1 Introduction

Acne vulgaris, commonly known as acne, is a widespread skin condition estimated to affect over 700 million people worldwide, significantly impacting interpersonal relationships, social functioning, and mental health [1]. Accurate acne severity assessment is important both for selecting the right treatment and as a clinical trial outcome [2]. However, manual severity grading by visual global assessment and lesion counting is time-consuming and susceptible to inter-observer variability [3]. Moreover, dermatologists are consistently in short supply, particularly in rural areas, and cases are often seen instead by general practitioners with lower diagnostic accuracy, while consultation costs are rising [4, 5]. Therefore, automated tools for computer-aided acne severity assessment may be a promising way to broaden the availability of dermatology expertise [5].

Over the last two decades, multiple approaches to automated acne severity assessment from facial photos have been proposed. Initially, these solutions relied on conventional image analysis [6], but the breakthrough performance improvements of deep learning in biomedical image analysis have shifted the focus to its use for acne lesion detection [7], classification [8], counting [9], and severity grading [10].

In acne image grading, each photo is assigned a severity level. Although over 20 different grading scales have been proposed over time, the medical community has yet to agree on standardized criteria [11]. Most grading scales rely on lesion counting as a quantifiable measure of severity. Recognizing the connection between lesion counting and global severity grading, Wu et al. [12] introduced a unified framework that tackles both tasks simultaneously and published the benchmark dataset ACNE04 with annotated lesions. Using label distribution learning (LDL) [13], their method assigns two label distributions to each image: one for quantifying lesion counts and another for classifying acne severity. The severity class labels are based on the Hayashi scale [14], which divides acne severity into four levels by lesion count range: 1–5 lesions is mild, 6–20 lesions is moderate, 21–50 lesions is severe, and more than 50 lesions is very severe.
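The Hayashi ranges above amount to a simple lookup from lesion count to grade. The following is a minimal Python sketch, not taken from the authors' code; the names `hayashi_grade`, `HAYASHI_UPPER`, and `HAYASHI_NAMES` are hypothetical, and lesion counts are assumed to be integers of at least 1, matching the annotation range used in Section 2.

```python
# Upper lesion-count limit of each Hayashi grade; the last grade is open-ended.
HAYASHI_UPPER = (5, 20, 50)
HAYASHI_NAMES = ("mild", "moderate", "severe", "very severe")


def hayashi_grade(count: int) -> int:
    """Return the 0-based Hayashi grade index for a lesion count >= 1."""
    if count < 1:
        raise ValueError("lesion count must be at least 1")
    for grade, upper in enumerate(HAYASHI_UPPER):
        if count <= upper:
            return grade
    return len(HAYASHI_UPPER)  # more than 50 lesions: very severe
```

With this mapping, `hayashi_grade(20)` returns 1 (moderate) while `hayashi_grade(21)` returns 2 (severe), illustrating how a single extra lesion can change the grade.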

The approach proposed by Wu et al. [12] generates the ground-truth label distribution for lesion counts independently of the severity scale, assuming the same level of grade uncertainty for all lesion counts. However, the predicted lesion count is then converted into a severity grade prediction and evaluated against labels generated according to the Hayashi severity scale [14]. Grade uncertainty therefore varies with the lesion count: for example, 12 and 13 lesions both confidently correspond to the moderate grade, whereas 20 and 21 lesions are assigned the moderate and severe grades, respectively.

At the same time, the global severity assessment branch directly predicts the severity grade distribution from an image. In contrast to the lesion counting task, accounting for the Hayashi scale is not beneficial here, because the scale delineates severity classes by uneven ranges of lesion counts (e.g., mild includes only 1–5 lesions, while moderate includes 6–20), which makes global prediction more challenging.

Fig. 1: Piecewise-linear weighting of the smoothing parameter $\varepsilon$, which controls how much of the label distribution is added to smooth the hard label. Near the class boundaries $\varepsilon = 1$, which corresponds to LDL, and near the mid-range $\varepsilon = \varepsilon_{min}$, which preserves the dominance of the original label value. The value $\varepsilon_{min} = 0.6$ is tuned using single-fold validation.
Fig. 2: Overview of the proposed approach. A smoothed label and a label distribution are generated from the ground-truth count label. The label distribution is then converted into distributions over 4 and 13 classes, which are used together with the smoothed distribution to calculate the KL losses. At the prediction stage, predictions from the severity prediction branch are combined with the results of the counting branch.

Here, we propose an approach that addresses these issues by incorporating severity scale information into the generation of label distributions for lesion counting, while removing it from direct severity grade classification.

Our first contribution can be viewed as a novel way to combine label smoothing [15] with LDL. We smooth a hard lesion count label with a Gaussian label distribution, such that the amount of smoothing depends on where this lesion count falls on the severity scale (Fig. 1). This is realized by introducing a parameter $\varepsilon$ that controls the amount of smoothing applied to each lesion count based on its proximity to the grade range boundary. For counts at grade range boundaries, we use $\varepsilon = 1$ to generate Gaussian label distributions, enabling a soft transition between classes. This corresponds to LDL and reflects high grade uncertainty. For counts towards the middle of a grade range, however, we reduce the weight $\varepsilon$ of the label distribution so that the original count label remains dominant over its neighbors. In such cases, the grade uncertainty is lower, which allows the model to calibrate its predictions accordingly. For instance, an image with a lesion count of 34, well within the range of the severe class, generates a distribution with less label smoothing to maintain a highly confident grade prediction (Fig. 1). This hybrid approach ensures that our model accounts for the inherent uncertainty in the counting task without diluting the distinctness of each class.
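One way to picture the schedule in Fig. 1 is a triangular weight inside each grade range. The sketch below assumes that $\varepsilon$ equals 1 at both borders of a Hayashi range and decreases linearly to $\varepsilon_{min} = 0.6$ at the range midpoint; the exact shape of the authors' piecewise-linear schedule may differ, and closing the last range at 65 lesions is likewise an assumption. The names `smoothing_weight` and `GRADE_RANGES` are hypothetical.

```python
# Hayashi grade ranges as inclusive (low, high) lesion counts. Closing the last
# range at 65 is an assumption (a maximum count Z chosen for illustration).
GRADE_RANGES = [(1, 5), (6, 20), (21, 50), (51, 65)]


def smoothing_weight(count: int, eps_min: float = 0.6,
                     ranges=GRADE_RANGES) -> float:
    """Hypothetical piecewise-linear epsilon(count): 1 at the borders of each
    grade range, falling linearly to eps_min at the range midpoint."""
    for low, high in ranges:
        if low <= count <= high:
            half_width = (high - low) / 2
            if half_width == 0:
                return 1.0  # a single-count range consists only of borders
            # 0 at the midpoint, 1 at either boundary of the range
            distance = abs(count - (low + high) / 2) / half_width
            return eps_min + (1.0 - eps_min) * distance
    raise ValueError(f"count {count} lies outside the grading scale")
```

Under these assumptions, a count of 34 receives a weight of about 0.64, close to $\varepsilon_{min}$, while counts 20 and 21 on either side of the moderate/severe border both receive 1.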

For the classification branch, we reduce the complexity of the task by breaking down the uneven Hayashi grade ranges into evenly-sized classes, each containing exactly five lesion counts. We demonstrate that our approach improves the results of automated acne grading on the benchmark dataset, indicating its potential to improve acne diagnostics.
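The evenly-sized classes can be computed directly from the lesion count. In this hypothetical sketch, the class width of five comes from the text, while the maximum count of 65 is an assumption chosen because it yields the 13 classes mentioned in Fig. 2; the function names are illustrative only.

```python
CLASS_WIDTH = 5   # lesion counts per evenly-sized class (from the text)
MAX_COUNT = 65    # assumed maximum lesion count Z


def even_class(count: int, width: int = CLASS_WIDTH) -> int:
    """0-based index of the evenly-sized class containing a lesion count >= 1."""
    if count < 1:
        raise ValueError("lesion count must be at least 1")
    return (count - 1) // width


def even_class_range(k: int, width: int = CLASS_WIDTH) -> tuple:
    """Inclusive (low, high) lesion counts covered by class k."""
    return (k * width + 1, (k + 1) * width)


NUM_CLASSES = even_class(MAX_COUNT) + 1  # 13 with the assumed values
```

Because the Hayashi borders (5, 20, and 50) are multiples of five, every evenly-sized class lies entirely inside a single Hayashi grade, so mapping the new classes back to the original grades is unambiguous.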

2 Method

Let $x_i$ be the $i$-th image in the training set of size $N$, with the corresponding ground-truth lesion count annotation $z_i \in \{1, 2, \dots, Z\}$, where $Z$ is the maximum lesion count, and the severity level $y_i \in \{1, 2, \dots, Y\}$, where $Y$ is the number of distinct severity grades. The overall architecture follows [12], except for the changes described below (see Fig. 2).

2.1 Gaussian label distribution generation

Wu et al. [12] used the Gaussian function to generate the lesion count label distribution. For a particular acne count label $c_j$ and image $x_i$, they defined the description degree as:

$$d_{x_i}^{c_j} = d(c_j \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}\,M}\exp\left(-\frac{(c_j - z_i)^2}{2\sigma^2}\right), \tag{1}$$

where $j \in \{1, 2, \dots, Z\}$ and $M$ is the normalization factor:

$$M = \frac{1}{\sqrt{2\pi\sigma^2}}\sum_{j=1}^{Z}\exp\left(-\frac{(c_j - z_i)^2}{2\sigma^2}\right), \tag{2}$$

such that $d_{x_i}^{c_j} \in [0, 1]$ and $\sum_{j=1}^{Z} d_{x_i}^{c_j} = 1$.
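Eqs. (1) and (2) reduce to normalizing Gaussian weights by their sum, since the factor $1/\sqrt{2\pi\sigma^2}$ cancels. A minimal sketch, assuming counts $c_j = j$ for $j = 1, \dots, Z$ with $Z = 65$ (an assumption) and the value $\sigma = 3.0$ reported in Section 3.2; the function name is hypothetical.

```python
import math


def gaussian_label_distribution(z, num_counts=65, sigma=3.0):
    """Description degrees d(c_j | x_i) for c_j = 1..num_counts, eqs. (1)-(2)."""
    # Unnormalized Gaussian weights centred on the ground-truth count z.
    weights = [math.exp(-((c - z) ** 2) / (2 * sigma ** 2))
               for c in range(1, num_counts + 1)]
    # The 1/sqrt(2*pi*sigma^2) factors in eq. (1) and in M cancel, so dividing
    # by the plain sum of the weights gives the same normalized distribution.
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `gaussian_label_distribution(34)` returns 65 values that sum to 1, peak at index 33 (count 34), and are symmetric around it away from the ends of the count range.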

2.2 Label smoothing

Label smoothing [15] was proposed to soften the hard label during training to prevent overconfidence and improve generalization. Consider a particular image $x$ with ground-truth label $y_{gt}$, one-hot encoded as $q(k \mid x) = \delta_{k, y_{gt}}$. The original label can then be replaced with the distribution:

$$q'(k \mid x) = (1 - \varepsilon)\,\delta_{k, y_{gt}} + \varepsilon\, u(k \mid x), \tag{3}$$

where $u(k \mid x)$ is usually the uniform distribution $u(k \mid x) = \frac{1}{K}$ and $K$ is the number of classes. As a result, the description degree of the true label is reduced, while all other classes obtain non-zero values.
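For comparison with the method below, standard label smoothing with the uniform distribution of eq. (3) can be sketched as follows. This is an illustrative sketch with a hypothetical function name; class indices are 0-based.

```python
def label_smoothing(target, num_classes, eps):
    """Eq. (3) with u(k|x) = 1/K: mix a one-hot label with a uniform distribution."""
    uniform = 1.0 / num_classes
    return [(1.0 - eps) * (1.0 if k == target else 0.0) + eps * uniform
            for k in range(num_classes)]
```

With four classes and $\varepsilon = 0.1$, the true class keeps 0.925 and every other class receives 0.025, so even classes far from the true one get non-zero mass.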

2.3 Scale-adaptive label distribution smoothing

To obtain more confident predictions for mid-range counts while maintaining higher grade uncertainty for counts near the grade border, we propose a method that combines Gaussian label distribution generation with confident labels via a label smoothing-like weighting scheme (see Fig. 1). We achieve this in two steps. First, we replace the uniform distribution in eq. (3) with the generated label distribution from eq. (1). This limits the redistribution of confidence from the hard label to its surrounding neighbors, unlike traditional label smoothing, which assigns small description degrees to all labels. Second, we introduce a piecewise-linear schedule for the smoothing parameter $\varepsilon$ to control the weight of the label distribution based on the location of the count label on the grading scale, as illustrated in Fig. 1. Eq. (3) then becomes:

$$q'(c_j \mid x_i) = \left[1 - \varepsilon(c_j)\right] q(c_j \mid x_i) + \varepsilon(c_j)\, d(c_j \mid x_i), \tag{4}$$

where $q(c_j \mid x_i)$ is the one-hot encoded ground-truth label and $q'(c_j \mid x_i)$ is the smoothed label distribution. Near the class border $\varepsilon = 1$, which corresponds to LDL, whereas for mid-range labels $\varepsilon_{min} \le \varepsilon(c_j) < 1$ ($\varepsilon_{min}$ is a hyperparameter), which is closer to traditional label smoothing.
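Eq. (4) can be sketched as a blend of a one-hot vector and a precomputed label distribution. This sketch makes an interpretive assumption: the weight is evaluated once at the ground-truth count, $\varepsilon(z_i)$, and applied to every bin, which keeps $q'$ normalized whenever $d$ is. The function name and argument layout are hypothetical.

```python
def scale_adaptive_smoothing(z, label_dist, eps):
    """Eq. (4): (1 - eps) * one_hot(z) + eps * d, for counts 1..len(label_dist).

    eps is assumed to be the schedule value at the ground-truth count z.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    smoothed = [eps * d for d in label_dist]
    smoothed[z - 1] += 1.0 - eps  # counts are 1-based, list indices 0-based
    return smoothed
```

With `eps=1.0` the function returns the label distribution unchanged (the LDL case), and with `eps=0.6` the ground-truth bin keeps an additional 0.4 of the probability mass.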

2.4 Lesion counting branch

We replace $d_{x_i}^{c_j}$ with $q'(c_j \mid x_i)$ from eq. (4) in the loss function, which is the Kullback–Leibler (KL) divergence between the generated and predicted distributions:

$$\mathcal{L}_{cnt}(x_i, z_i) = -\sum_{j=1}^{Z} q'(c_j \mid x_i)\ln\frac{p_{cnt}(c_j \mid x_i, \boldsymbol{\theta})}{q'(c_j \mid x_i)}, \tag{5}$$

where the probability of image $x_i$ belonging to class $c_j$ is:

$$p_{cnt}(c_j \mid x_i, \boldsymbol{\theta}) = \exp(\theta_{c_j}) \Big/ \sum_{l}\exp(\theta_l). \tag{6}$$
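The loss in eqs. (5) and (6) is the KL divergence from the target $q'$ to the softmax of the network outputs. A plain-Python sketch, assuming the outputs $\theta$ are given as a list of logits; a deep learning framework would typically use its built-in log-softmax and KL loss instead. Computing in log space avoids taking the logarithm of underflowed probabilities. Function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Log of eq. (6), computed stably by subtracting the largest logit."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]


def kl_loss(target, logits):
    """Eq. (5): -sum_j q'_j * ln(p_j / q'_j) = sum_j q'_j * (ln q'_j - ln p_j).

    Terms with q'_j = 0 contribute nothing (0 * ln 0 is taken as 0).
    """
    log_p = log_softmax(logits)
    return sum(q * (math.log(q) - lp) for q, lp in zip(target, log_p) if q > 0)
```

The loss is zero when the predicted distribution matches the target exactly, and, for example, a one-hot target against two equal logits gives $\ln 2$.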

Following [12], we also convert count label distributions and their predictions into severity labels and predictions by summing the corresponding probabilities according to the Hayashi scale.
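The conversion by summation can be sketched generically: each count's probability is added to the class it maps to. The names are hypothetical; `bisect.bisect_left` on the upper borders (5, 20, 50) returns the 0-based Hayashi grade for any count of at least 1.

```python
import bisect

HAYASHI_UPPER = (5, 20, 50)  # upper counts of mild, moderate and severe


def hayashi_grade(count):
    # bisect_left gives 0 for 1..5, 1 for 6..20, 2 for 21..50, 3 above 50
    return bisect.bisect_left(HAYASHI_UPPER, count)


def aggregate(count_dist, count_to_class, num_classes):
    """Sum the probabilities of counts 1..Z that map to each class."""
    out = [0.0] * num_classes
    for count, prob in enumerate(count_dist, start=1):
        out[count_to_class(count)] += prob
    return out
```

Calling `aggregate(dist, hayashi_grade, 4)` yields the four-grade distribution. The same call with `lambda c: (c - 1) // 5` and 13 classes would give a distribution over the evenly-sized classes of Section 2.5, assuming the 13-class distribution in Fig. 2 is obtained in the same way.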

| Metric | Wu et al. [12] | LD smoothing | New class ranges | Both |
|---|---|---|---|---|
| Accuracy | 83.70 ± 1.53 | 83.90 ± 1.48 | 83.63 ± 1.32 | 84.11 ± 1.94 |
| Precision | 82.97 ± 1.27 | 83.38 ± 3.02 | 82.63 ± 2.27 | 83.11 ± 2.56 |
| Specificity | 93.76 ± 0.63 | 93.81 ± 0.473 | 93.75 ± 0.42 | 93.99 ± 0.68 |
| Sensitivity | 81.06 ± 3.46 | 81.21 ± 2.29 | 81.47 ± 2.88 | 81.53 ± 2.95 |
| Youden Index | 74.83 ± 4.06 | 75.02 ± 2.75 | 75.22 ± 3.28 | 75.52 ± 3.61 |
| MCC | 75.41 ± 2.35 | 75.69 ± 2.18 | 75.32 ± 1.98 | 76.16 ± 2.82 |
Table 1: Evaluation results on the ACNE04 dataset [12]
Fig. 3: We convert Hayashi scale-based grade ranges into evenly-spaced ones to simplify direct image severity grading.

2.5 Severity prediction branch

Since the severity grading branch is independent of lesion counting, we can convert the Hayashi-based severity grade labels into evenly-spaced ones (see Fig. 3). The severity label distribution is generated according to the new classes instead of the Hayashi scale. The severity prediction loss function is then:

$$\mathcal{L}_{cls}(x_i, y_i) = -\sum_{k=1}^{Y'} d(s'_k \mid x_i)\ln\frac{p_{cls}(s'_k \mid x_i, \boldsymbol{\theta})}{d(s'_k \mid x_i)}, \tag{7}$$

where the probability of image $x_i$ belonging to class $s'_k$ is:

$$p_{cls}(s'_k \mid x_i, \boldsymbol{\theta}) = \exp(\theta_{s'_k}) \Big/ \sum_{l}\exp(\theta_l), \tag{8}$$

and $d(s'_k \mid x_i)$ is the new severity description degree over the $Y'$ evenly-sized classes.

2.6 Combined loss function

To combine severity grade assessment from the counting branch with direct global grading using the severity prediction branch, we train the model using a multi-task loss function defined as:

$$\begin{split}\mathcal{L}_i(x_i, y_i, z_i) = {} & (1 - \lambda)\,\mathcal{L}_{cnt}(x_i, z_i) \\ & + \frac{\lambda}{2}\left(\mathcal{L}_{cls}(x_i, y_i) + \mathcal{L}_{cnt2cls}(x_i, y_i)\right),\end{split} \tag{9}$$

where $\lambda$ is the trade-off hyperparameter.
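Eq. (9) is a weighted sum of three KL terms. A small sketch with the reported $\lambda = 0.3$ as the default; the function name is hypothetical and the inputs are assumed to be already-computed scalar losses for one image.

```python
def combined_loss(loss_cnt, loss_cls, loss_cnt2cls, lam=0.3):
    """Eq. (9): (1 - lam) * L_cnt + lam / 2 * (L_cls + L_cnt2cls)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * loss_cnt + lam / 2.0 * (loss_cls + loss_cnt2cls)
```

With $\lambda = 0.3$, the counting loss receives weight 0.7 and each of the two grading losses receives 0.15, so counting dominates the training signal.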

At the prediction stage, the class probabilities $p_{cls}(s'_k \mid x_i, \boldsymbol{\theta})$ for the new set of classes are converted back to the original Hayashi class probabilities $p_{cls}(s_k \mid x_i, \boldsymbol{\theta})$ using the reverse mapping (see Fig. 2). The final predicted distribution is then obtained by averaging this classification-branch distribution with the severity distribution derived from the counting branch, denoted $\tilde{p}_{cls}$:

$$p_{tot}(\mathbf{s} \mid x_i, \boldsymbol{\theta}) = \frac{1}{2}\left(\tilde{p}_{cls}(\mathbf{s} \mid x_i, \boldsymbol{\theta}) + p_{cls}(\mathbf{s} \mid x_i, \boldsymbol{\theta})\right). \tag{10}$$
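The reverse mapping and the averaging in eq. (10) can be sketched as follows. The sketch assumes evenly-sized classes of five counts, as described in Section 2.5, and takes $\tilde{p}_{cls}$ to be the four-grade distribution already obtained from the counting branch; all names are hypothetical.

```python
HAYASHI_UPPER = (5, 20, 50)  # upper counts of mild, moderate and severe
CLASS_WIDTH = 5              # counts per evenly-sized class


def new_class_to_hayashi(k):
    """Hayashi grade of evenly-sized class k (counts 5k+1 .. 5k+5)."""
    top = (k + 1) * CLASS_WIDTH
    for grade, upper in enumerate(HAYASHI_UPPER):
        if top <= upper:
            return grade
    return len(HAYASHI_UPPER)


def to_hayashi(p_new):
    """Reverse mapping: sum new-class probabilities into the 4 Hayashi grades."""
    p = [0.0] * (len(HAYASHI_UPPER) + 1)
    for k, prob in enumerate(p_new):
        p[new_class_to_hayashi(k)] += prob
    return p


def final_prediction(p_cnt_hayashi, p_cls_new):
    """Eq. (10): average count-derived and classification-branch distributions."""
    p_cls = to_hayashi(p_cls_new)
    return [(a + b) / 2 for a, b in zip(p_cnt_hayashi, p_cls)]
```

The predicted grade would then be the index of the largest entry of the averaged distribution.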

3 Experiments and results

3.1 Dataset and evaluation details

We evaluate the proposed approach on the ACNE04 benchmark dataset [12]. It contains 1,457 images with 18,983 lesion bounding boxes. For evaluation, the dataset is split into an 80% training set and a 20% test set, containing 1,165 and 292 images, respectively.

Considering accurate acne severity grading as the ultimate goal, we focus on classification metrics to evaluate model performance. In addition to the accuracy, precision, specificity, sensitivity, and Youden Index reported by Wu et al. [12], we also report the Matthews correlation coefficient (MCC) [16], which has recently been reported to have advantages over other classification metrics [17]. During training, we use the maximum validation MCC to select the best epoch, whose model state is saved for further evaluation.
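With four severity grades, MCC must be computed in its multi-class form. A common choice is the confusion-matrix generalization (Gorodkin's $R_K$), which reduces to the usual binary formula for two classes and is also what scikit-learn's `matthews_corrcoef` computes. Whether the authors used this exact variant is an assumption; the plain-Python sketch below uses hypothetical function names.

```python
import math


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def multiclass_mcc(cm):
    """Multi-class MCC; equals the binary MCC when there are two classes."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[k][k] for k in range(n))                      # correct predictions
    t = [sum(cm[k]) for k in range(n)]                       # true samples per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predictions per class
    numerator = c * s - sum(tk * pk for tk, pk in zip(t, p))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p))
                            * (s * s - sum(tk * tk for tk in t)))
    # Degenerate case (e.g. only one class predicted): return 0 by convention.
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, this score stays low when a model simply favors the majority grade, which is one reason MCC is preferred for imbalanced grading tasks [17].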

3.2 Implementation details

We were unable to exactly reproduce the results from the original paper by Wu et al. [12]. Therefore, we re-trained their LDL model from scratch using the provided source code to ensure a fair comparison. We use exactly the same ResNet-50 [18] architecture and training schedule, including the pre-defined 5-fold cross-validation. We start calculating evaluation metrics after the first learning rate decay event. We tuned several hyperparameters using single-fold validation, including the standard deviation $\sigma = 3.0$ in eq. (1), $\varepsilon_{min} = 0.6$ in eq. (4), and the trade-off parameter $\lambda = 0.3$ balancing the counting and grading tasks in eq. (9).

3.3 Results and ablations

As shown in Table 1, we compared the performance of the baseline approach with each of the proposed contributions and with their combination. Smoothing labels with generated lesion count label distributions in the scale-adaptive fashion ('LD smoothing' column) improved all six metrics over the baseline. While the use of evenly-sized class ranges in the severity grading branch showed no clear improvement when applied independently ('New class ranges' column), combining both techniques gave the highest accuracy, specificity, sensitivity, Youden Index, and MCC, with precision slightly below that of LD smoothing alone. This indicates that the two components are complementary. The label distribution smoothing method effectively handles the uncertainty at the class boundaries and provides a more nuanced way of learning the relationship between lesion counts and severity grades, while the simplified class definitions make direct image grading more straightforward for the model. Together, they balance detail-oriented and global approaches, enhancing overall performance.

4 Conclusion

In this work, we introduced an automated acne image grading method that combines two strategies: smoothing lesion count labels with label distributions weighted according to the severity grading scale, and simplifying severity class definitions to ease global acne grading. Our results demonstrate the synergy of these strategies, which improve grading accuracy and represent a promising step forward in automated acne diagnostics. The novel technique of smoothing hard labels with label distributions instead of the uniform distribution is general and potentially applicable beyond acne grading, for example, to grading tumor malignancy.

5 Compliance with Ethical Standards

This research study was conducted retrospectively using human subject data made available in open access [12].

6 Acknowledgments

The authors thank Natalia Martynova for valuable discussions and other support in development of this project.

References

  • [1] AM Layton, D Thiboutot, and J Tan, “Reviewing the global burden of acne: how could we improve care to reduce the burden?,” British Journal of Dermatology, vol. 184, no. 2, pp. 219–225, 2021.
  • [2] DM Thiboutot, AM Layton, M-M Chren, EA Eady, and J Tan, “Assessing effectiveness in acne clinical trials: steps towards a core outcome measure set,” British Journal of Dermatology, vol. 181, no. 4, pp. 700–706, 2019.
  • [3] Anne W Lucky, Beth L Barber, Cynthia J Girman, Jody Williams, Joan Ratterman, and Joanne Waldstreicher, “A multirater validation study to assess the reliability of acne lesion counting,” Journal of the American Academy of Dermatology, vol. 35, no. 4, pp. 559–565, 1996.
  • [4] Jack Resneck Jr and Alexa B Kimball, “The dermatology workforce shortage,” Journal of the American Academy of Dermatology, vol. 50, no. 1, pp. 50–54, 2004.
  • [5] Yuan Liu, Ayush Jain, Clara Eng, David H. Way, Kang Lee, Peggy Bui, Kimberly Kanada, Guilherme de Oliveira Marinho, Jessica Gallegos, Sara Gabriele, Vishakha Gupta, Nalini Singh, Vivek Natarajan, Rainer Hofmann-Wellenhof, Greg S. Corrado, Lily H. Peng, Dale R. Webster, Dennis Ai, Susan J. Huang, Yun Liu, R. Carter Dunn, and David Coz, “A deep learning system for differential diagnosis of skin diseases,” Nature Medicine, vol. 26, no. 6, pp. 900–908, 2020.
  • [6] Roshaslinie Ramli, Aamir Saeed Malik, Ahmad Fadzil Mohamad Hani, and Adawiyah Jamil, “Acne analysis, grading and computational assessment methods: An overview,” Skin Research and Technology, vol. 18, no. 1, pp. 1–14, 2012.
  • [7] Thanapha Chantharaphaichi, Bunyarit Uyyanonvara, Chanjira Sinthanayothin, and Akinori Nishihara, “Automatic acne detection for medical treatment,” in IC-ICTES. 2015, IEEE.
  • [8] Nasim Alamdari, Kouhyar Tavakolian, Minhal Alhashim, and Reza Fazel-Rezai, “Detection and classification of acne lesions in acne patients: A mobile application,” in EIT. 2016, IEEE.
  • [9] Gabriele Maroni, Michele Ermidoro, Fabio Previdi, and Glauco Bigini, “Automated detection, extraction and counting of acne lesions for automatic evaluation and tracking of acne severity,” in SSCI. 2017, IEEE.
  • [10] Sophie Seité, Amir Khammari, Michael Benzaquen, Dominique Moyal, and Brigitte Dréno, “Development and accuracy of an artificial intelligence algorithm for acne grading from smartphone photographs,” Experimental Dermatology, vol. 28, no. 11, pp. 1252–1257, 2019.
  • [11] Tamara Agnew, Gareth Furber, Matthew Leach, and Leonie Segal, “A comprehensive critique and review of published measures of acne severity,” The Journal of Clinical and Aesthetic Dermatology, vol. 9, no. 7, pp. 40–52, 2016.
  • [12] Xiaoping Wu, Ni Wen, Jie Liang, Yu Kun Lai, Dongyu She, Ming Ming Cheng, and Jufeng Yang, “Joint acne image grading and counting via label distribution learning,” in ICCV. 2019, pp. 10641–10650, IEEE/CVF.
  • [13] Xin Geng, “Label distribution learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1734–1748, 2016.
  • [14] Nobukazu Hayashi, Hirohiko Akamatsu, Makoto Kawashima, and Acne Study Group, “Establishment of grading criteria for acne severity,” The Journal of Dermatology, vol. 35, no. 5, pp. 255–260, 2008.
  • [15] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna, “Rethinking the Inception architecture for computer vision,” in CVPR. 2016, pp. 2818–2826, IEEE/CVF.
  • [16] Brian W Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA)-Protein Structure, vol. 405, no. 2, pp. 442–451, 1975.
  • [17] Davide Chicco and Giuseppe Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol. 21, no. 1, pp. 1–13, 2020.
  • [18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in CVPR. 2016, pp. 770–778, IEEE/CVF.