
Received December 11, 2018, accepted January 1, 2019, date of publication January 16, 2019, date of current version August 14, 2019.


Digital Object Identifier 10.1109/ACCESS.2019.2892795

Breast Cancer Detection Using Extreme Learning Machine Based on Feature Fusion With CNN Deep Features
ZHIQIONG WANG1,2 , MO LI 3 , HUAXIA WANG4 , HANYU JIANG 4 ,
YUDONG YAO 4 , (Fellow, IEEE), HAO ZHANG5 , AND JUNCHANG XIN 3
1 Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110169, China
2 Neusoft Research of Intelligent Healthcare Technology, Co. Ltd., Shenyang 110179, China
3 Key Laboratory of Big Data Management and Analytics (Liaoning), School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China


4 Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
5 Department of Breast Surgery, Shengjing Hospital of China Medical University, Shenyang 110004, China

Corresponding author: Junchang Xin ([email protected])


This work was supported in part by the National Natural Science Foundation of China under Grant 61472069, Grant 61402089, and
Grant U1401256, in part by the China Postdoctoral Science Foundation under Grant 2018M641705, in part by the Fundamental Research
Funds for the Central Universities under Grant N161602003, Grant N161904001, and Grant N160601001, and in part by the Open
Program of Neusoft Research of Intelligent Healthcare Technology, Co. Ltd. under Grant NRIHTOP1802.

ABSTRACT A computer-aided diagnosis (CAD) system based on mammograms enables early breast
cancer detection, diagnosis, and treatment. However, the accuracy of the existing CAD systems remains
unsatisfactory. This paper explores a breast CAD method based on feature fusion with convolutional neural
network (CNN) deep features. First, we propose a mass detection method based on CNN deep features
and unsupervised extreme learning machine (ELM) clustering. Second, we build a feature set fusing
deep features, morphological features, texture features, and density features. Third, an ELM classifier is
developed using the fused feature set to classify benign and malignant breast masses. Extensive experiments
demonstrate the accuracy and efficiency of our proposed mass detection and breast cancer classification
method.

INDEX TERMS Mass detection, computer-aided diagnosis, deep learning, fusion feature, extreme learning
machine.

I. INTRODUCTION

Breast cancer is a serious threat to women's life and health, and the morbidity and mortality of breast cancer rank first and second, respectively, among all female diseases [1]. Early detection of lumps can effectively reduce the mortality rate of breast cancer [2]. The mammogram is widely used in early screening of breast cancer due to its relatively low expense and high sensitivity to minor lesions [3]. In the actual diagnosis process, however, the accuracy can be negatively affected by many factors, such as radiologist fatigue and distraction, the complexity of the breast structure, and the subtle characteristics of early-stage disease [4], [5]. Computer-aided diagnosis (CAD) for breast cancer can help address this issue.

The classical CAD pipeline for breast cancer contains three steps: (a) finding the region of interest (ROI) in the preprocessed mammogram, and hence locating the region of the tumor; (b) extracting features of the tumor based on expert knowledge, such as shape, texture, and density, to manually generate feature vectors; and (c) diagnosing benign and malignant tumors by classifying these feature vectors [6], [7].

Although this classical diagnosis method has been commonly used, its accuracy still needs to be improved [8]. The quality of the handcrafted feature set directly affects the diagnostic accuracy, so an experienced doctor plays a very important role in the process of manual feature extraction. The commonly used features, including morphology, texture, and density, are manually designed based on doctors' experience; we refer to them as subjective features. In recent years, deep learning methods such as the convolutional neural network (CNN), which extract hierarchical features from image data without manual selection (objective features), have been successfully applied with great improvements in accuracy in many applications, such as image recognition, speech recognition, and natural language processing [9], [10].
There are shortcomings in both subjective and objective features: subjective features ignore the essential attributes of the images, while objective features ignore expert experience. Therefore, the subjective and objective features are fused so that the resulting feature set reflects the essential properties of the image as well as the expert experience. Meanwhile, based on our previous research, the extreme learning machine (ELM) achieves better classification performance on multi-dimensional features than other classifiers such as SVM and decision trees, so we use ELM to classify the extracted breast mass features. In this paper, we therefore propose a novel diagnosis method that merges deep features with several handcrafted features. The main contributions are as follows.
• In the detection phase, we propose a method that utilizes CNN and US-ELM for feature extraction and clustering, respectively. First, a mammogram is segmented into several sub-regions. Then, CNN is used to extract features from each sub-region, and US-ELM is used to cluster the sub-region features, which eventually locates the region of the breast tumor.
• In the feature integration phase, we design an 8-layer CNN architecture and obtain 20 deep features. In addition, we integrate 5 extra shape features, 5 texture features, and 7 density features of the tumor with those deep features to form a fused deep feature set.
• In the diagnosis phase, we use the fused deep feature set of each mammogram as the input of an ELM for classification. The output directly indicates whether the patient has a benign or a malignant breast tumor.
• Finally, the experimental results demonstrate that our proposed methods, the sub-regional US-ELM clustering and the ELM classification with fused deep feature sets, achieve the best performance in the diagnosis of breast cancer. The experimental dataset contains 400 female mammograms.
II. RELATED WORK

The research efforts related to breast cancer CAD mainly focus on the detection and the diagnosis of breast tumors. This section briefly summarizes existing works related to these two aspects.

In the aspect of breast tumor detection, Sun et al. [11] proposed a mass detection method in which an adaptive fuzzy C-means algorithm is employed to segment each mammogram of the same breast, and a supervised artificial neural network is used as a classifier to judge whether the segmented area is a tumor. Saidin et al. [12] employed pixels as an alternative feature and used a region growing method to segment breast tumors in the mammogram. Xu et al. [13] proposed an improved watershed algorithm: they first make a coarse segmentation of the breast tumor, followed by image edge detection via combining regions that have similar gray-scale mean values. Hu et al. [14] proposed a novel algorithm to detect suspicious masses in the mammogram, in which they utilized an adaptive global and local thresholding segmentation method on the original mammogram. Yap et al. [15] used three different deep learning methods to detect lesions in breast ultrasound images, based on a patch-based LeNet, a U-Net, and a transfer learning approach with a pretrained FCN-AlexNet, respectively.

In the aspect of differentiating benign and malignant breast tumors, Kahn, Jr., et al. [16] created a Bayesian network that utilizes 2 physical features and 15 manually marked probabilistic characteristics to conduct computer-aided diagnosis for breast cancer. Wang et al. [17] utilized ELM to classify features of breast tumors and compared the results with an SVM classifier. Qiu et al. [18] applied CNN to the risk prediction of breast cancer by training the CNN with a large amount of time series data. Sun et al. [19] also used deep neural networks to predict the near-term risk of breast cancer based on 420 time series records of mammography. Jiao et al. [20] proposed a deep feature based framework for breast mass classification, in which a CNN and a decision tree process are utilized. Arevalo et al. [21] used CNN to learn representations of breast tumors and then classified each tumor as either benign or malignant. Carneiro et al. [22] proposed an automated mammogram analysis method based on deep learning to estimate the risk of patients developing breast cancer. Kumar et al. [23] presented an image retrieval system using Zernike moments (ZMs) for extracting features, since the features can affect the effectiveness and efficiency of a breast CAD system. Aličković and Subasi [24] proposed a breast CAD method in which genetic algorithms are used for the extraction of informative and significant features, and a rotation forest is used to make the decision between subjects with and without breast cancer.

III. METHODS

In this paper, we consider the following five steps in breast cancer detection: breast image preprocessing, mass detection, feature extraction, training data generation, and classifier training. In breast image preprocessing, denoising and contrast enhancement are applied to the original mammogram to increase the contrast between the masses and the surrounding tissues. Mass detection is then performed to localize the ROI. After that, features, including deep features, morphological features, texture features, and density features, are extracted from the ROI. During the training process, the classifiers are trained with every image from the breast image dataset using the extracted features and the corresponding labels. The mammogram under diagnosis can then be identified using the well-trained classifiers. Fig. 1 presents the flowchart of the entire diagnosis process.

FIGURE 1. Flowchart of the entire diagnosis process.

A. BREAST IMAGE PREPROCESSING

Several preprocessing methods are available [25]–[29]. The adaptive mean filter algorithm [25] is selected to eliminate noise on the original mammograms, in order to avoid the impact of noise on the subsequent auxiliary diagnosis. The main idea is to slide a fixed-size window along the rows of the image and calculate the mean, variance, and spatial correlation values of each window to determine whether the window contains noise. If noise is detected, the pixel values of the selected window are replaced with the mean value.

A contrast enhancement algorithm [30] is then used to increase the contrast between the suspected masses and the surrounding tissues. The main idea is to transform the histogram of the original image into a uniform distribution. After this process, the gray-scale range of the image is enlarged, so the contrast is enhanced and the image details become clearer.
Fig. 2 shows the mammogram before and after preprocessing. Fig. 2(a) is the original image, and Figs. 2(b) and 2(c) are the images after the denoising and contrast enhancement processes, respectively. By comparing these three images, we can see that the boundaries between the mammary region and the background area in the original image (Fig. 2(a)) are ambiguous and irregular. After preprocessing, the contrast between the mammary region and the background area is significantly enhanced, which also reduces considerable computational burden in image post-processing.

FIGURE 2. Images after denoising and enhancement. (a) Initial mammogram. (b) Denoising after (a). (c) Enhancement after (b).
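A minimal sketch of these two preprocessing steps in Python/NumPy is given below, assuming an 8-bit grayscale mammogram. The window size and the variance-based noise test are illustrative stand-ins for the adaptive rule of [25], and plain global histogram equalization is used in place of the specific enhancement algorithm of [30].

```python
import numpy as np

def mean_filter_denoise(img, win=9, var_thresh=400.0):
    """Stand-in for the adaptive mean filter of [25]: slide a fixed-size
    window over the image and, when the local variance suggests noise,
    replace the window's pixels by their mean.  Window size and threshold
    are illustrative choices, not values from the paper."""
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = out[i:i + win, j:j + win]
            if patch.var() > var_thresh:            # crude noise test
                out[i:i + win, j:j + win] = patch.mean()
    return out.astype(np.uint8)

def equalize_histogram(img):
    """Plain global histogram equalization: map grey levels so the cumulative
    histogram becomes approximately uniform, enlarging the grey-scale range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]

# preprocessed = equalize_histogram(mean_filter_denoise(mammogram))
```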
B. MASS DETECTION

The purpose of mass detection is to separate the mass region from the normal tissues. The more precise the mass segmentation, the more accurate the extracted features. In this paper, we propose a mass detection method based on sub-domain CNN deep features and US-ELM clustering. The processing flowchart is shown in Fig. 3. The first step is to extract the ROI from the preprocessed images. The ROI is then divided into several non-overlapping sub-regions using a sliding window, and deep features are extracted from each sub-region until all sub-regions have been traversed. Finally, the deep features of the sub-regions are clustered to obtain the mass area boundary and complete the mass detection process.

FIGURE 3. Flowchart of mass detection process.

1) EXTRACT ROI

In a mammogram, there are a large number of areas with gray value 0, which have no impact on the breast CAD. In order to improve the processing efficiency of the mammary image and ensure the accuracy of the follow-up diagnosis, it is necessary to separate the mammary area from the whole mammogram as the ROI.

In this paper, an adaptive region detection algorithm is utilized to extract the breast area as the ROI. Specifically, in a mammogram, all rows are scanned sequentially to find the first nonzero pixel (with abscissa denoted as xs) and the last nonzero pixel (with abscissa denoted as xd), and all columns are then scanned sequentially to find the first nonzero pixel (with ordinate denoted as ys) and the last nonzero pixel (with ordinate denoted as yd). Algorithm 1 presents the details; the size of the mammogram I is m × n.

Algorithm 1 Self-Adaptive Mass Region Detection Algorithm
1 Input: Mammography I
2 Output: Mass area M
3 for i = 1 to m do
4     find the first and the last nonzero pixels xs, xd.
5 end for
6 for j = 1 to n do
7     find the first and the last nonzero pixels ys, yd.
8 end for
9 Cut off a rectangle M by the coordinates (xs, xd) and (ys, yd).
10 Return M.
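A minimal NumPy rendering of Algorithm 1 might look as follows; it simply crops the bounding rectangle of the nonzero pixels and returns it together with the coordinates (xs, ys, xd, yd).

```python
import numpy as np

def extract_breast_roi(img):
    """Algorithm 1 in array form: locate the first and last nonzero rows and
    columns of the mammogram and crop the enclosing rectangle as the ROI."""
    rows = np.flatnonzero(img.any(axis=1))      # rows containing breast tissue
    cols = np.flatnonzero(img.any(axis=0))      # columns containing breast tissue
    ys, yd = rows[0], rows[-1]                  # first / last nonzero ordinates
    xs, xd = cols[0], cols[-1]                  # first / last nonzero abscissas
    M = img[ys:yd + 1, xs:xd + 1]               # rectangle cut by (xs, xd), (ys, yd)
    return M, (xs, ys, xd, yd)
```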

2) PARTITION SUB-REGION

In this section, a method to divide the ROI into several non-overlapping sub-regions is proposed. The searching area used to locate the masses in the ROI is fixed to the rectangular area [xs, ys, xd, yd], where the length of the searching rectangle is W = xd − xs and the width is H = yd − ys. The rectangular searching area is segmented using a sliding window with length w and width h (W ≥ w, H ≥ h). The segmentation procedure is performed as follows. First, we generate a rectangular searching area (as shown in Fig. 4). In the rectangular searching area (W × H), the sliding window (w × h) is moved with a certain step size, traversing the searching area without crossing the ROI boundary. Thus, the ROI is divided into several equal-size (w × h), non-overlapping sub-regions, and these sub-regions serve as the basis for subsequent feature extraction. In this paper, the size of the sliding window is fixed to 48 × 48 and the searching step size is equal to 48. Finally, the ROI is divided into N non-overlapping sub-regions (s1, s2, ..., sN).

FIGURE 4. Using sliding window to divide the ROI.
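A sketch of this partition step, using the 48 × 48 window and step size adopted in the paper:

```python
def partition_subregions(roi, win=48, step=48):
    """Divide the ROI into equal-size, non-overlapping win x win sub-regions,
    discarding the border remainder so the window never crosses the ROI
    boundary (window size and step of 48 as used in the paper)."""
    h, w = roi.shape[:2]
    return [roi[y:y + win, x:x + win]
            for y in range(0, h - win + 1, step)
            for x in range(0, w - win + 1, step)]     # (s1, s2, ..., sN)
```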

FIGURE 5. CNN architecture for mass detection.

3) EXTRACT DEEP FEATURES USING CNN

In this paper, CNN is used to extract deep features from the ROI sub-regions. Fig. 5 presents a 7-layer CNN architecture, which contains 3 convolution layers, 3 max-pooling layers, and one fully-connected layer. The input of the CNN is a 48 × 48 sub-region image captured in the previous steps. The first convolution layer filters the 48 × 48 × 3 input images with 12 kernels of size 9 × 9 × 3 and obtains an output of size 40 × 40 × 12:

Conv_k(i, j) = Σ_{u,v} W^{k,l}(u, v) · input_l(i − u, j − v) + b^{k,l}    (1)

where W^{k,l} represents the k-th kernel and b^{k,l} denotes the bias of the k-th layer. The activation value is constrained to the range [−1, 1] using tanh as the activation function:

Output_k(i, j) = tanh(Conv_k(i, j))    (2)

The output of the first convolution layer is connected to a max-pooling layer. The second and third convolution/max-pooling layers are then connected one after another until the output reaches size 2 × 2 × 6. The fully-connected layer has 2 × 2 × 6 = 24 neurons, which are the features used in the following clustering analysis.
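As a concrete illustration, a PyTorch sketch of this sub-region CNN is given below. The paper specifies the input size, the first convolution (12 kernels of 9 × 9 × 3) and the final 2 × 2 × 6 map with 24 fully-connected neurons; the middle kernel sizes and channel counts used here are assumptions chosen only to reproduce those shapes.

```python
import torch
import torch.nn as nn

class SubRegionCNN(nn.Module):
    """Sketch of the 7-layer sub-region CNN (3 conv + 3 max-pool + 1 fully
    connected layer).  The first convolution (12 kernels, 9x9x3 -> 40x40x12)
    and the final 2x2x6 map with 24 output neurons follow the paper; the
    middle kernel sizes and channel counts are assumptions."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 12, kernel_size=9), nn.Tanh(),   # Eq. (1)-(2): 48x48x3 -> 40x40x12
            nn.MaxPool2d(2),                              # -> 20x20x12
            nn.Conv2d(12, 8, kernel_size=5), nn.Tanh(),   # -> 16x16x8 (assumed)
            nn.MaxPool2d(2),                              # -> 8x8x8
            nn.Conv2d(8, 6, kernel_size=5), nn.Tanh(),    # -> 4x4x6 (assumed)
            nn.MaxPool2d(2),                              # -> 2x2x6
        )
        self.fc = nn.Linear(2 * 2 * 6, 24)                # 24 deep features per sub-region

    def forward(self, x):                                 # x: (N, 3, 48, 48)
        x = self.features(x)
        return self.fc(torch.flatten(x, start_dim=1))     # deep feature matrix X, (N, 24)
```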
4) CLUSTERING DEEP FEATURES USING US-ELM

In this paper, we use the US-ELM algorithm to cluster the deep features extracted by the CNN described above. The cluster number is set to 2, so the sub-region features are clustered into two categories: (a) suspicious mass areas and (b) non-suspicious mass areas.

When the amount of training data is small, a model obtained by supervised learning cannot satisfy the demand. Semi-supervised learning is therefore used to enhance the effect, and it can also perform clustering tasks [31], [32]. The US-ELM algorithm is one such algorithm, and it can find the internal structural relationships that exist within an unlabeled dataset [33]. The details are shown in Algorithm 2. The input of the algorithm is the deep feature matrix X, and the output is the feature clustering result. Specifically, the Laplacian operator L is first constructed from the training set X, and a hidden-layer neuron output matrix H is randomly generated. If the number of hidden neurons is not larger than the number of training samples, we use the minimization min_{β∈R^{nh×no}} ||β||² + λ·Tr(β^T H^T L H β) to calculate the output weights, where β represents the weights between the hidden layer and the output layer. Otherwise, we use the generalized eigenvalue equation (I + λ·H^T L H) v = γ·H^T H v to calculate the output weights. After that, we calculate the embedding matrix E = Hβ and use the k-means algorithm to cluster the N points into K categories.

Algorithm 2 US-ELM Algorithm
1 Input: Deep feature matrix X ∈ R^{N×n0}
2 Output: Embedding matrix E ∈ R^{N×no}; clustering index vector y ∈ R^{N×1}
3 Construct the Laplacian operator L from the training set X
4 Randomly generate the hidden layer neuron output matrix H ∈ R^{N×nh}
5 if nh ≤ N then
6     Use min_{β∈R^{nh×no}} ||β||² + λ·Tr(β^T H^T L H β) to calculate the output weights
7 else
8     Use (I + λ·H^T L H) v = γ·H^T H v to calculate the output weights
9 end if
10 Calculate the embedding matrix: E = Hβ
11 Use the k-means algorithm to cluster the N points into K categories
12 Denote y as the vector of all point classification indexes
13 return y
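The sketch below follows Algorithm 2 for the nh ≤ N branch, under simplifying assumptions following [33]: a binary k-nearest-neighbour graph for the Laplacian, a sigmoid hidden layer, and illustrative values for the number of hidden neurons, the trade-off parameter λ, and the embedding dimension (none of these settings are reported in the paper).

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def us_elm_cluster(X, n_hidden=200, n_clusters=2, n_embed=2,
                   k_nn=10, lam=0.1, seed=0):
    """Sketch of Algorithm 2 (US-ELM, following [33]); parameter values are
    illustrative, not those used in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Graph Laplacian L from a symmetric kNN adjacency over the deep features.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, 1:k_nn + 1]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], idx] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W

    # Random sigmoid hidden layer H (N x nh).
    A = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))

    # Output weights from the generalized eigenproblem
    # (I + lam * H^T L H) v = gamma * (H^T H) v, smallest eigenvectors.
    left = np.eye(n_hidden) + lam * H.T @ L @ H
    right = H.T @ H + 1e-6 * np.eye(n_hidden)
    vals, vecs = eigh(left, right)
    beta = vecs[:, 1:n_embed + 1]                  # skip the first (trivial) solution
    beta /= np.linalg.norm(H @ beta, axis=0)       # normalise the embedded columns

    E = H @ beta                                   # embedding matrix E = H * beta
    y = KMeans(n_clusters=n_clusters, n_init=10,
               random_state=seed).fit_predict(E)
    return E, y                                    # cluster index per sub-region
```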

C. CLASSIFY BENIGN AND MALIGNANT MASSES BASED ON FEATURES FUSED WITH CNN FEATURES

In this subsection, a diagnosis method using an ELM classifier with fused deep features is proposed. The main idea is to extract the deep features using CNN and to also extract the morphological, texture, and density characteristics from the breast mass area. The ELM classifier is then used to classify the fused features and obtain the benign/malignant diagnosis result.

1) FEATURE MODELING

Clinically, breast masses are common signs of early breast disease. According to their pathological characteristics, masses are divided into two categories: malignant and benign. On one hand, CNN extracts the deep features of the masses, which can represent their essential properties. On the other hand, based on doctors' experience, malignant masses in mammography often have the following characteristics: an irregular shape with a burr-like edge, an unsmooth surface with hard nodules, and densities that differ significantly from the surrounding tissues. Benign masses, in contrast, often have a regular shape with a clear edge and a smooth surface, are rarely accompanied by small nodules, and have uniformly distributed densities. The types of the extracted features used in this paper are listed in Table 1.

TABLE 1. Types of extracted features.

The fused features can be modeled as

F = [F1, F2, F3, F4]    (3)

where F1 denotes deep features, F2 denotes morphological features, F3 represents texture features, and F4 represents density features.
a: DEEP FEATURES

CNN has great advantages in feature extraction due to its convolution-pooling operations; it can extract the essential image characteristics without human participation [34]. The 10-layer CNN architecture used here for feature extraction is shown in Fig. 6. The CNN input is the suspicious mass area. After a series of convolution/max-pooling layers, the last fully-connected layer has 2 × 2 × 5 = 20 neurons, which are the deep features denoted as F1 = [c1, c2, ..., c20].

FIGURE 6. CNN architecture for mass diagnosis.
b: MORPHOLOGICAL FEATURES

According to experienced doctors, malignant breast masses often have irregular shapes and blurred boundaries with the surrounding tissues [35]. Morphological features are therefore important indicators for distinguishing between benign and malignant masses. In this paper, we extract mass roundness, normalized radius entropy, normalized radius variance, acreage ratio, and roughness as the morphological features, and the corresponding model can be expressed as F2 = [g1, g2, g3, g4, g5]. Table 2 gives the detailed equations used to calculate each morphological feature.

c: TEXTURE FEATURES

Texture features are effective parameters that reflect the benign and malignant characteristics of breast masses and contribute to the early diagnosis of breast cancer. The gray-level co-occurrence matrix (GLCM), first introduced by Haralick, is a classic gray-scale texture descriptor; it describes the gray distribution over all image pixels and is based on the second-order joint conditional probability [36]. In this paper, we extract the inverse moment, entropy, energy, correlation, and contrast coefficient as the texture features, and the corresponding model can be expressed as F3 = [t1, t2, t3, t4, t5]. Table 3 gives the equations used to calculate each texture feature; a small illustrative implementation is sketched below.
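The following sketch computes a single-offset GLCM over a mass patch and the five statistics named above. The quantization level, pixel offset, and exact normalization are illustrative assumptions; the precise formulas are those in Table 3 of the paper.

```python
import numpy as np

def glcm_texture_features(patch, levels=16, dx=1, dy=0):
    """Sketch of the F3 texture model: a single-offset grey-level
    co-occurrence matrix and five statistics (inverse moment, entropy,
    energy, correlation, contrast).  Quantisation and offset are
    illustrative choices."""
    q = (patch.astype(np.float64) / 256.0 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            glcm[q[i, j], q[i + dy, j + dx]] += 1
    p = glcm / glcm.sum()                              # joint probability p(i, j)

    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())

    inverse_moment = (p / (1.0 + (i - j) ** 2)).sum()  # a.k.a. homogeneity
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    energy = (p ** 2).sum()
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j + 1e-12)
    contrast = ((i - j) ** 2 * p).sum()
    return np.array([inverse_moment, entropy, energy, correlation, contrast])
```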
d: DENSITY FEATURES

Recent studies have shown that the density features of breast masses have a significant correlation with their benign or malignant character, and using density features to predict breast cancer is also a common method [37]. In this paper, we extract seven density features, and the corresponding model can be expressed as F4 = [d1, d2, d3, d4, d5, d6, d7]. Table 4 presents the equations used to calculate each density feature.

2) CLASSIFIER

The ELM algorithm is a single-hidden-layer feed-forward neural network proposed by Huang et al. [38]; it has good generalization performance, a fast learning speed, and is insensitive to manual parameter setup. In this paper, we use ELM as the classifier to obtain the benign/malignant diagnosis result for breast cancer.

The ELM algorithm consists of training and testing processes, and the detailed training steps are shown in Algorithm 3. First, the weights wi and biases bi of the hidden layer are randomly generated. Then the single hidden layer output matrix H is calculated based on the parameters wi, bi and the fused features F. After that, the output weight vector β is obtained based on the training data labels.

Algorithm 3 ELM Training Algorithm
1 Input: N = {(xj, tj) | xj ∈ R^n, tj ∈ R^m, j = 1, 2, ..., N}; L: number of hidden neurons; N: labeled dataset size
2 Output: Three parameters of the ELM: w, b, β
3 for i = 1 to L do
4     randomly generate the weight wi and bias bi of the hidden layer
5 end for
6 Calculate the single hidden layer output matrix H
7 Calculate the output weight vector β = H†T, where H† denotes the Moore–Penrose generalized inverse of H
8 Return w, b, β

Algorithm 4 shows the testing steps of the ELM algorithm. First, the single hidden layer output matrix H is calculated based on the parameters wi, bi, β. Then the fused features of the testing image are extracted as the algorithm input, and the breast cancer diagnosis result R is obtained.

Algorithm 4 ELM Testing Algorithm
1 Input: F, N, L, w, b, β
2 Output: R: diagnosis results
3 Calculate the single hidden layer output matrix H
4 Calculate the diagnosis results R = f(x)
5 Return R
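Algorithms 3 and 4 admit a very compact implementation; the sketch below uses a sigmoid hidden layer and obtains β through the Moore–Penrose pseudo-inverse as in [38]. The 37-dimensional fused input (20 + 5 + 5 + 7) follows the feature model above, while the label convention (0 = benign, 1 = malignant) is an assumption.

```python
import numpy as np

class ELMClassifier:
    """Sketch of Algorithms 3 and 4 (ELM [38]): random input weights w and
    biases b, a sigmoid hidden layer, and output weights obtained in closed
    form as beta = pinv(H) @ T."""
    def __init__(self, n_hidden=1000, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, F):
        return 1.0 / (1.0 + np.exp(-(F @ self.w + self.b)))   # H, (N x L)

    def fit(self, F, t):
        """F: fused feature matrix (N x 37), t: integer labels in {0, 1}."""
        T = np.eye(2)[t]                                       # one-hot targets
        self.w = self.rng.uniform(-1, 1, size=(F.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(F)) @ T        # closed-form training
        return self

    def predict(self, F):
        """Algorithm 4: recompute H for the test features and apply beta."""
        return np.argmax(self._hidden(F) @ self.beta, axis=1)  # 0 = benign, 1 = malignant (assumed)

# clf = ELMClassifier(n_hidden=1000).fit(F_train, y_train); R = clf.predict(F_test)
```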


TABLE 2. Morphological features.

TABLE 3. Texture features.

TABLE 4. Density features.

IV. EXPERIMENT

In this section, the effectiveness of the mass detection method based on the CNN and US-ELM algorithms and of the breast cancer diagnosis method based on the fused features is investigated. First, the parameter setup and experiment procedures are introduced. Then, the evaluation metrics of each experiment are described. Finally, the experimental results are listed and analyzed.

A. EXPERIMENT DATA

The image dataset used in this paper contains 400 mammograms, of which 200 are malignant mass images and 200 are benign mass images. These images were acquired with the Senographe 2000D all-digital mammography camera from female patients aged 32 to 74. The locations of the masses in all images have been marked by professional doctors, and the diagnosis results have been confirmed by pathologists.

B. EXPERIMENT PROCEDURES AND PARAMETERS

In the following experiments, the effectiveness of the mass detection method based on the CNN and US-ELM algorithms and of the breast cancer diagnosis method based on the fused features is investigated on the above experimental dataset. The experimental procedures and parameters used in the verification process are as follows.
1) MASS DETECTION EXPERIMENT

In the mass detection process, the feature extraction algorithms used in the experiment are CNN, DBN, and SAE, and the clustering algorithms are US-ELM and k-means. The names of the experiments are simplified, and the combination of feature extraction method and clustering method used in each experiment is shown in Table 5.

TABLE 5. Mass detection experiment procedures.

The marker-controlled watershed algorithm (MCWA) [13] and the adaptive thresholding algorithm (ATA) [14] are used for comparison, to further verify the accuracy of the proposed method.
In the detection of breast masses, the CNN architecture parameters are shown in Fig. 5. Other parameters include the activation function f and the learning rate η; in our experiments, the learning rate is 0.003 and the activation function is tanh. When using DBN and SAE for feature extraction, we use the same activation function and learning rate. When using ELM for clustering, the parameters involved are the activation function and the number of hidden layer neurons; the selected activation function f is ``sigmoid'' and the number of hidden layer neurons is set to L = 1000. Since the features are binary-clustered using US-ELM, the k value is set to 2. When clustering is performed using k-means, the parameter k is the same as for the US-ELM algorithm, i.e., equal to 2.
2) BREAST CANCER DIAGNOSIS EXPERIMENT

In the breast-assisted diagnosis process, the deep, morphological, texture, and density features of the suspected masses are extracted and used for fused feature modeling. In the experiment, the deep feature models are divided into single-feature, double-feature, and multi-feature models. Single deep feature models (SF) contain only the features extracted by the CNN, DBN, or SAE algorithms. Double-feature models (DF) contain deep features plus one of the morphological, texture, or density feature groups, e.g., ``deep feature + morphological feature (CNN-G)''. Multi-feature models (MF) contain deep features plus two or more other feature groups, e.g., ``deep feature + morphological feature + texture feature (CNN-GTD)''. To further evaluate our proposed method, we also select the state-of-the-art algorithm of [24] as the baseline, abbreviated as ``GARF''. The detailed feature combinations and selections are shown in Table 6.

In the diagnosis of breast masses, the CNN architecture parameters are shown in Fig. 6. Other parameters include the activation function f and the learning rate η; in our experiments, the learning rate is 0.003 and the activation function is tanh. When using DBN and SAE for feature extraction, we use the same activation function and learning rate. When using ELM for classification, the parameters involved are the activation function and the number of hidden layer neurons; the selected activation function f is ``sigmoid'' and the number of hidden layer neurons is set to L = 1000. When considering the SVM classification technique, the parameters involved are the kernel function R, the penalty coefficient c, and the kernel function parameter g. RBF is selected as the kernel function R, with c = 0.5 and g = 0.0206.
C. EVALUATION METRICS

For the above experimental schemes, the mass detection method based on sub-domain CNN deep features with US-ELM clustering and the mass diagnosis method based on deep feature fusion are evaluated separately. The quantitative evaluation metrics of the experimental results are described as follows.

1) DETECTION METRICS

In this paper, the misclassification error (ME), area overlap metric (AOM), area over-segmentation metric (AVM), area under-segmentation metric (AUM), and comprehensive measure (CM) are used to evaluate the accuracy of mass segmentation in the breast mass detection step. The detailed calculation formula of each evaluation metric is given in Table 7. Smaller values of ME, AVM, and AUM and larger values of AOM and CM correspond to a better segmentation result. In practical applications, however, AVM and AUM often cannot be optimal at the same time, so ME and CM, as comprehensive measures, are usually more important.

2) DIAGNOSIS METRICS

Accuracy, sensitivity, specificity, TP ratio, TN ratio, and the area under the ROC curve (AUC) are used to compare and analyze the diagnosis results for the masses. The formulas for these metrics are shown in Table 8, and the meanings of the parameters appearing in them are explained in Table 9. For all six metrics, higher values indicate a more accurate diagnosis. The k-fold cross-validation method [39] is used to make the evaluation more general; in this paper, the above evaluation indices are derived from 5-fold cross-validation.
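A minimal sketch of this 5-fold evaluation protocol is given below; it reuses the ELMClassifier sketch above and reports accuracy, sensitivity, and specificity averaged over stratified folds (AUC would additionally require the classifier's continuous output). Treating label 1 as the malignant/positive class is an assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def evaluate_5fold(F, y, make_classifier):
    """Average accuracy, sensitivity and specificity over stratified 5-fold
    cross-validation (label 1 assumed to be the positive, malignant class)."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(F, y):
        clf = make_classifier().fit(F[train_idx], y[train_idx])
        pred = clf.predict(F[test_idx])
        truth = y[test_idx]
        tp = np.sum((pred == 1) & (truth == 1))
        tn = np.sum((pred == 0) & (truth == 0))
        fp = np.sum((pred == 1) & (truth == 0))
        fn = np.sum((pred == 0) & (truth == 1))
        scores.append([(tp + tn) / len(truth),        # accuracy
                       tp / (tp + fn),                # sensitivity
                       tn / (tn + fp)])               # specificity
    return np.mean(scores, axis=0)

# acc, sens, spec = evaluate_5fold(F, labels, lambda: ELMClassifier(n_hidden=1000))
```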

TABLE 6. Breast cancer diagnosis experiment feature selection.

TABLE 7. Detection indicators.

TABLE 8. Evaluation indices.

D. EXPERIMENT RESULTS AND ANALYSIS

In this section, we verify that the mass detection method based on sub-domain CNN deep features with US-ELM clustering and the mass diagnosis method based on deep feature fusion are superior to other detection and diagnosis methods under the above evaluation metrics. The experimental results and the corresponding analysis are presented as follows.

1) DETECTION RESULTS ANALYSIS

Based on Table 5, ME, AOM, AVM, AUM, and CM are compared using three feature models and two clustering algorithms. The comparison results are shown in Figs. 7 and 8. From these figures, we can see that the mass detection method based on CNN sub-region deep feature clustering gives the lowest ME regardless of which feature model is used, and the mass detection method based on US-ELM clustering gives the lowest ME and the best detection result. In summary, the mass detection method based on sub-domain CNN deep features with US-ELM clustering achieves the best mass detection performance.

FIGURE 7. Collation map of evaluation indices of experiments.

FIGURE 8. Collation map of evaluation indices of experiments.

Fig. 9 shows the performance differences among the proposed method and the MCWA and ATA algorithms on the five metrics (ME, AOM, AUM, AVM, and CM). From Fig. 9 we can see that, although the proposed method is slightly worse than the other algorithms in the AVM evaluation, its ME and CM performance has significant advantages. Therefore, the proposed method based on sub-domain CNN deep features with US-ELM clustering is better than the other segmentation methods.

FIGURE 9. Collation map of evaluation indices of experiments.

2) MASS DIAGNOSIS BASED ON FUSION FEATURES

TABLE 9. Parameters meaning in Table 12.

TABLE 10. Breast cancer diagnosis experiment feature selection.

FIGURE 10. ROC Curve of single feature with ELM and SVM classifier.

According to the experimental scheme described in Section III.C.2, the accuracy, sensitivity, specificity, TP ratio, TN ratio, and AUC of the benign and malignant breast mass classification are analyzed based on the three types of deep feature models (10 feature sets) and the two classifiers. We also show the results of the method of [24] on our dataset. The evaluation results are shown in Table 10, and the ROC curves are shown in Figs. 10–12.

When the classifier is ELM, the accuracy, sensitivity, and specificity of the CNN feature model are the best among the single deep feature models, which shows that the CNN model chosen in this paper is the most suitable. Among the double feature models, the CNN deep feature model combined with texture features gives the best diagnosis accuracy, sensitivity, and specificity.

When the classifier is SVM, the accuracy, sensitivity, and specificity of the CNN feature model are again the best among the single deep feature models, which confirms that the CNN model chosen in this paper is the most suitable. Among the double feature models, the CNN deep feature model combined with texture features likewise gives the best diagnosis accuracy, sensitivity, and specificity.

FIGURE 11. ROC Curve of double feature with ELM and SVM classifier.

FIGURE 12. ROC Curve of multiple feature with ELM and SVM classifier.

For the deep fusion models, comparing the evaluation metrics obtained with the ELM and SVM classifiers for benign and malignant tumor classification shows that the ELM classifier gives better diagnostic accuracy, sensitivity, and specificity. Meanwhile, the mass diagnosis methods based on deep fused features are clearly better than GARF [24] in diagnosis accuracy, sensitivity, and specificity. It is therefore desirable to combine the deep features with ELM in the diagnosis of breast cancer.

V. CONCLUSION

This paper proposes a breast CAD method based on fused deep features. Its main idea is to apply deep features extracted by CNN to the two stages of mass detection and mass diagnosis. In the stage of mass detection, a method based on sub-domain CNN deep features and US-ELM clustering is developed. In the stage of mass diagnosis, an ELM classifier is utilized to classify benign and malignant breast masses using a fused feature set combining deep features, morphological features, texture features, and density features. In the process of breast CAD, the choice of features is the key factor determining the accuracy of diagnosis. In previous studies, either traditional subjective features or objective features were used, where traditional subjective features include morphology, texture, density, etc., and objective features include features extracted by CNN or DBN; both are flawed to some extent. In this paper we combine subjective and objective features, taking the doctor's experience and the essential attributes of the mammogram into account at the same time. After feature extraction, a classifier is used to classify the breast mass as benign or malignant; ELM, which performs better on multi-dimensional feature classification, is selected as the classifier. Experiments on a breast CAD dataset of 400 female mammograms from northeastern China demonstrate that, in both mass detection and mass diagnosis, our proposed methods outperform other existing methods.

REFERENCES

[1] R. L. Siegel et al., "Colorectal cancer statistics," CA, Cancer J. Clinicians, vol. 64, no. 2, pp. 104–117, Mar. 2014.
[2] J. B. Harford, "Breast-cancer early detection in low-income and middle-income countries: Do what you can versus one size fits all," Lancet Oncol., vol. 12, no. 3, pp. 306–312, Mar. 2011.
[3] C. Lerman et al., "Mammography adherence and psychological distress among women at risk for breast cancer," J. Nat. Cancer Inst., vol. 85, no. 13, pp. 1074–1080, Jul. 1993.
[4] P. T. Huynh, A. M. Jarolimek, and S. Daye, "The false-negative mammogram," Radiographics, vol. 18, no. 5, pp. 1137–1154, Sep. 1998.
[5] M. G. Ertosun and D. L. Rubin, "Probabilistic visual search for masses within mammography images using deep learning," in Proc. IEEE Int. Conf. Bioinform. Biomed. (BIBM), Nov. 2015, pp. 1310–1315.
[6] S. D. Tzikopoulos, M. E. Mavroforakis, H. V. Georgiou, N. Dimitropoulos, and S. Theodoridis, "A fully automated scheme for mammographic segmentation and classification based on breast density and asymmetry," Comput. Methods Programs Biomed., vol. 102, no. 1, pp. 47–63, 2011.
[7] D. C. Pereira, R. P. Ramos, and M. Z. do Nascimento, "Segmentation and detection of breast cancer in mammograms combining wavelet analysis and genetic algorithm," Comput. Methods Programs Biomed., vol. 114, no. 1, pp. 88–101, Apr. 2014.
[8] S. A. Taghanaki, J. Kawahara, B. Miles, and G. Hamarneh, "Pareto-optimal multi-objective dimensionality reduction deep auto-encoder for mammography classification," Comput. Methods Programs Biomed., vol. 145, pp. 85–93, Jul. 2017.
[9] X.-W. Chen and X. Lin, "Big data deep learning: Challenges and perspectives," IEEE Access, vol. 2, pp. 514–525, 2014.
[10] K. Ganesan, U. R. Acharya, C. K. Chua, L. C. Min, K. T. Abraham, and K. H. Ng, "Computer-aided breast cancer detection using mammograms: A review," IEEE Rev. Biomed. Eng., vol. 6, pp. 77–98, 2012.
[11] X. Sun, W. Qian, and D. Song, "Ipsilateral-mammogram computer-aided detection of breast cancer," Comput. Med. Imag. Graph., vol. 28, no. 3, pp. 151–158, Apr. 2004.
[12] N. Saidin, U. K. Ngah, H. A. M. Sakim, N. S. Ding, M. K. Hoe, and I. L. Shuaib, "Density based breast segmentation for mammograms using graph cut and seed based region growing techniques," in Proc. 2nd Int. Conf. Comput. Res. Develop., 2010.
[13] S. Xu, H. Liu, and E. Song, "Marker-controlled watershed for lesion segmentation in mammograms," J. Digit. Imag., vol. 24, no. 5, pp. 754–763, Oct. 2011.
[14] K. Hu, X. Gao, and F. Li, "Detection of suspicious lesions by adaptive thresholding based on multiresolution analysis in mammograms," IEEE Trans. Instrum. Meas., vol. 60, no. 2, pp. 462–472, Feb. 2011.
[15] M. H. Yap et al., "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE J. Biomed. Health Inform., vol. 22, no. 4, pp. 1218–1226, Jul. 2017.
[16] C. E. Kahn, Jr., L. M. Roberts, K. A. Shaffer, and P. Haddawy, "Construction of a Bayesian network for mammographic diagnosis of breast cancer," Comput. Biol. Med., vol. 27, no. 1, pp. 19–29, Jan. 1997.
[17] Z. Wang, G. Yu, Y. Kang, Y. Zhao, and Q. Qu, "Breast tumor detection in digital mammography based on extreme learning machine," Neurocomputing, vol. 128, no. 5, pp. 175–184, Mar. 2014.

[18] Y. Qiu et al., ‘‘An initial investigation on developing a new method to ZHIQIONG WANG received the M.Sc. and Ph.D.
predict short-term breast cancer risk based on deep learning technology,’’ degrees in computer science and technology from
in Proc. SPIE, vol. 9785, p. 978521, Mar. 2016. Northeastern University, China, in 2008 and 2014,
[19] W. Sun, T. L. Tseng, B. Zheng, and W. Qian, ‘‘A preliminary study respectively. She visited the National University of
on breast cancer risk analysis using deep neural network,’’ in Proc. Singapore, in 2010, and The Chinese University
Int. Workshop Breast Imag. Malmö, Sweden: Springer, Jun. 2016, of Hong Kong, in 2013, as an Academic Visi-
pp. 385–391. tor. She is currently an Associate Professor with
[20] Z. Jiao, X. Gao, Y. Wang, and J. Li, ‘‘A deep feature based framework
the Sino-Dutch Biomedical and Information Engi-
for breast masses classification,’’ Neurocomputing, vol. 197, pp. 221–231,
neering School, Northeastern University. She has
Jul. 2016.
[21] J. Arevalo, F. A. González, R. Ramos-Pollán, J. L. Oliveira, and published more than 50 papers. Her main research
M. A. G. Lopez, ‘‘Representation learning for mammography mass lesion interests include biomedical, biological data processing, cloud computing,
classification with convolutional neural networks,’’ Comput. Methods Pro- and machine learning.
gram. Biomed., vol. 127, pp. 248–257, Apr. 2016.
[22] G. Carneiro, J. Nascimento, and A. P. Bradley, ‘‘Automated
analysis of unregistered multi-view mammograms with deep
learning,’’ IEEE Trans. Med. Imag., vol. 36, no. 11, pp. 2355–2365,
Nov. 2017.
[23] Y. Kumar, A. Aggarwal, S. Tiwari, and K. Singh, ‘‘An efficient
and robust approach for biomedical image retrieval using Zernike
moments,’’ Biomed. Signal Process. Control, vol. 39, pp. 459–473,
Jan. 2018.
MO LI received the B.E. degree from the
[24] E. Aličković and A. Subasi, ‘‘Breast cancer diagnosis using GA feature
selection and rotation forest,’’ Neural Comput. Appl., vol. 28, no. 4,
College of Information and Computer Engineer-
pp. 753–763, Apr. 2017. ing, Northeast Forestry University, in 2014, and
[25] H. D. Cheng, J. Shan, W. Ju, Y. Guo, and L. Zhang, ‘‘Auto- the M.E. degree from the Sino-Dutch Biomedical
mated breast cancer detection and classification using ultrasound and Information Engineering School, Northeast-
images: A survey,’’ Pattern Recognit., vol. 43, no. 1, pp. 299–317, ern University, in 2017. She is currently pursuing
Oct. 2010. the Ph.D. degree with the School of Computer
[26] Y. Pathak, K. V. Arya, and S. Tiwari, ‘‘Low-dose ct image reconstruction Science and Engineering, Northeastern University.
using gain intervention-based dictionary learning,’’ Modern Phys. Lett. B, Her main research interests include bioinformat-
vol. 32, no. 14, p. 1850148, May 2018. ics, machine learning, and big data management.
[27] S. Tiwari, ‘‘A variational framework for low-dose sinogram restora-
tion,’’ Int. J. Biomed. Eng. Technol., vol. 24, no. 4, pp. 356–367,
2017.
[28] Z. Gao et al., ‘‘Motion tracking of the carotid artery wall from ultrasound
image sequences: A nonlinear state-space approach,’’ IEEE Trans. Med.
Imag., vol. 37, no. 1, pp. 273–283, Jan. 2018.
[29] Z. Gao et al., ‘‘Robust estimation of carotid artery wall motion using
the elasticity-based state-space approach,’’ Med. Image Anal., vol. 37,
pp. 1–21, Apr. 2017.
[30] H. Ibrahim and N. S. P. Kong, ‘‘Brightness preserving dynamic histogram HUAXIA WANG received the B.Eng. degree in
equalization for image contrast enhancement,’’ IEEE Trans. Consum. Elec- information engineering from Southeast Univer-
tron., vol. 53, no. 4, pp. 1752–1758, Nov. 2007. sity, Nanjing, China, in 2012. He is currently pur-
[31] Q. Miao, R. Liu, P. Zhao, Y. Li, and E. Sun, ‘‘A semi-supervised image suing the Ph.D. degree with the Electrical and
classification model based on improved ensemble projection algorithm,’’ Computer Engineering Department, Stevens Insti-
IEEE Access, vol. 6, pp. 1372–1379, 2018. tute of Technology, NJ, USA. In 2016, he was a
[32] H. Gan, Z. Li, Y. Fan, and Z. Luo, ‘‘Dual learning-based safe Research Intern with the Mathematics of Networks
semi-supervised learning,’’ IEEE Access, vol. 6, pp. 2615–2621, and Systems Research Department, Nokia Bell
2017. Labs, Murray Hill, NJ, USA. His current research
[33] G. Huang, S. Song, J. N. D. Gupta, and C. Wu, ‘‘Semi-supervised and
interests include wireless communications, cogni-
unsupervised extreme learning machines,’’ IEEE Trans. Cybern., vol. 44,
tive radio networks, and machine learning.
no. 12, pp. 2405–2417, Dec. 2014.
[34] R. K. Samala, H.-P. Chan, L. Hadjiiski, M. A. Helvie, J. Wei, and
K. Cha, ‘‘Mass detection in digital breast tomosynthesis: Deep convolu-
tional neural network with transfer learning from mammography,’’ Med.
Phys., vol. 43, no. 12, pp. 6654–6666, Dec. 2016.
[35] J. Tang, R. M. Rangayyan, J. Xu, I. E. Naqa, and Y. Yang, ‘‘Computer-
aided detection and diagnosis of breast cancer with mammography: Recent
advances,’’ IEEE Trans. Inf. Technol. Biomed., vol. 13, no. 2, pp. 236–251,
Mar. 2009.
[36] G. D. Tourassi, B. Harrawood, S. Singh, J. Y. Lo, and C. E. Floyd, ‘‘Eval- HANYU JIANG received the B.S. degree in con-
uation of information-theoretic similarity measures for content-based trol science and engineering from the Harbin Insti-
retrieval and detection of masses in mammograms,’’ Med. Phys., vol. 34, tute of Technology, Harbin, China, in 2012, and
no. 1, pp. 140–150, Jan. 2007.
the M.Eng. degree in computer engineering from
[37] L. Liu, J. Wang, and K. He, ‘‘Breast density classification using his-
the Stevens Institute of Technology, Hoboken,
togram moments of multiple resolution mammograms,’’ in Proc. Int. Conf.
Biomed. Eng. Informat., Oct. 2010, pp. 146–149. NJ, USA, in 2014, where he is currently pursu-
[38] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, ‘‘Extreme learning machine: ing the Ph.D. degree. He has been a Research
Theory and applications,’’ Neurocomputing, vol. 70, nos. 1–3, Co-Op with Nokia Bell Labs, Murray Hill, NJ,
pp. 489–501, 2006. USA, since 2017. His current research inter-
[39] Q. Dai, ‘‘A competitive ensemble pruning approach based on ests include heterogeneous and parallel comput-
cross-validation technique,’’ Knowl.-Based Syst., vol. 37, no. 2, ing, multi-/many-core processor architecture, bioinformatics, and artificial
pp. 394–414, Jan. 2013. intelligence.

YUDONG YAO (S'88–M'88–SM'94–F'11) received the B.Eng. and M.Eng. degrees in electrical engineering from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from Southeast University, Nanjing, in 1988. From 1989 to 1990, he was a Research Associate with Carleton University, Ottawa, Canada, focusing on mobile radio communications. From 1990 to 1994, he was with Spar Aerospace Ltd., Montreal, Canada, where he was involved in research on satellite communications. From 1994 to 2000, he was with Qualcomm Inc., San Diego, CA, USA, where he participated in the research and development of wireless code-division multiple-access (CDMA) systems. Since 2000, he has been with the Stevens Institute of Technology, Hoboken, NJ, USA, where he is currently a Professor and the Department Director of electrical and computer engineering. He is also a Professor with the Sino-Dutch Biomedical and Information Engineering School, Northeastern University, and the Director of the Stevens' Wireless Information Systems Engineering Laboratory. He holds one Chinese patent and 12 U.S. patents. His research interests include wireless communications and networks, spread spectrum and CDMA, antenna arrays and beamforming, cognitive and software-defined radio, and digital signal processing for wireless systems. He was an Associate Editor of IEEE COMMUNICATIONS LETTERS and the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, and an Editor of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS.

HAO ZHANG received the B.Sc., M.Sc., and Ph.D. degrees in medicine from China Medical University, Shenyang, China, in 2008, 2010, and 2018, respectively. He is currently an Assistant Director Physician with the Department of Breast Surgery, Shengjing Hospital of China Medical University, Shenyang, China. He has published more than ten research papers as the First Author, including in PNAS. His research interests include the combined treatment, the mechanism of recurrence and metastasis, and the bioinformatics of breast cancer.

JUNCHANG XIN received the B.Sc., M.Sc., and Ph.D. degrees in computer science and technology from Northeastern University, China, in 2002, 2005, and 2008, respectively. From 2010 to 2011, he visited the National University of Singapore as a Postdoctoral Visitor. He is currently a Professor with the School of Computer Science and Engineering, Northeastern University. He has published more than 60 research papers. His research interests include big data, uncertain data, and bioinformatics.
