Abstract— Convolutional neural networks (CNNs) are of great interest in machine learning and have demonstrated excellent performance in hyperspectral image classification. In this paper, we propose a classification framework, called diverse region-based CNN, which can encode a semantic context-aware representation to obtain promising features. By merging a diverse set of discriminative appearance factors, the resulting CNN-based representation exhibits the spatial-spectral context sensitivity that is essential for accurate pixel classification. The proposed method, which exploits diverse region-based inputs to learn contextual interaction features, is expected to have more discriminative power. The joint representation containing rich spectral and spatial information is then fed to a fully-connected network, and the label of each pixel vector is predicted by a softmax layer. Experimental results on widely-used hyperspectral image data sets demonstrate that the proposed method can surpass conventional deep learning based classifiers and other state-of-the-art classifiers.

Index Terms— Hyperspectral image, convolutional neural network, deep learning, pattern recognition

This work was supported by the National Natural Science Foundation of China (91638201, 61571033), Beijing Natural Science Foundation (4172043), Beijing Nova Program (Z171100001117050), and partly by the Fundamental Research Funds for the Central Universities (BUCTRC201615) (corresponding author: Wei Li).
M. Zhang and W. Li are with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: [email protected]).
Q. Du is with the Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS 39762 USA (e-mail: [email protected]).

I. INTRODUCTION

Hyperspectral remote sensing has received considerable interest in recent years for a variety of applications in Earth observation [1–6]. Hyperspectral imagery (HSI) provides hundreds of contiguous narrow spectral bands [7–10], which enables more accurate discrimination of different materials than traditional panchromatic and multispectral remote sensing images. With its high spectral resolution, HSI has unique advantages for finer classification [11, 12], because it can uncover and reveal subtle spectral characteristics that traditional imagery cannot resolve.

In the early stage of HSI classification, numerous machine learning-based methods were used, such as nearest neighbor, decision trees, and linear functions. Among these methods, k-nearest neighbor (k-NN) [13] can be viewed as the simplest classifier; it employs the Euclidean distance to measure the similarity between a testing sample and the available training samples. The support vector machine (SVM) [14], which determines the decision boundary in a high-dimensional space using the kernel method, was introduced to deal with the Hughes phenomenon, and the SVM classifier has become a benchmark. Meanwhile, sparse representation-based classification (SRC) [15, 16], the extreme learning machine (ELM) [17], active learning [18], the relevance vector machine (RVM) [19], and other classifiers have been developed to produce superior performance. Nevertheless, because the same material may present spectral discrepancies and different materials may have similar spectral signatures, it is difficult to precisely distinguish different classes via spectral information alone [20]. Accordingly, to utilize the spatial information in HSI, many spectral-spatial techniques have been investigated. For example, Markov random field (MRF)-based models were employed to combine spatial and spectral features [21, 22], and a generalized composite kernel machine for spectral-spatial HSI classification was presented by Li et al. [12], which can balance the use of spectral and spatial information without any weighting parameter.

The aforementioned methods adopt a series of manually-extracted features, which involve substantial expert experience and parameter setting. Deep learning methods [10, 23–29], which provide features more automatically, have been extensively employed for remote sensing image feature extraction and classification. A general approach to constructing deep networks for remote sensing images was systematically analyzed by Zhang et al. [30]. To extract high-level features, deep learning architectures with multilayer stacked autoencoders were constructed in an unsupervised manner [31, 32]. In particular, the convolutional neural network (CNN), a class of neural networks with fewer parameters than fully-connected networks with the same number of hidden units, has drawn great attention. Hu et al. employed a CNN to extract spectral features for HSI classification, and its performance was superior to that of the SVM [33]. Again, exploiting the spatial information of HSI is of great importance for the classification task, and many CNN-based studies have explored this aspect. Slavkovikj et al. [34] presented a CNN framework for HSI classification in which spectral features were extracted from a small neighbourhood. Makantasis et al. introduced randomized principal component analysis (PCA) in their work, followed by a CNN (named R-PCA CNN), to encode spectral and spatial information [35], and a similar strategy of a CNN with spatial-spectral features, named SS-CNN, was discussed by Mei et al. [36]. In the research proposed by Yue et al. [37], hyperspectral data were projected onto several principal components, and a CNN model was adopted to extract spatial features. Recently, Li
et al. [38] presented a novel deep network to learn pixel-pair features and fuse the classification results of the pixel pairs drawn from a pixel's neighborhood. In such a strategy, CNN with pixel-pair features (denoted as CNN-PPF) can use pixel pairs within a fixed window during classification, but the convolution operation is mainly executed in the spectral domain and neglects spatial details. Furthermore, Lee et al. [39] proposed an interesting contextual deep CNN (denoted as CD-CNN), which can optimally explore contextual interactions by jointly exploiting the local spatial-spectral relationships of neighboring pixel vectors within a square window. Specifically, the joint exploitation of spatial-spectral information is achieved by a multi-scale convolutional filter bank used as the initial component of the CNN pipeline.

Although existing CNN-based methods have employed spatial-information extraction strategies to obtain spatial-spectral features, how to utilize the information within HSI (abundant spectral information and detailed spatial information) more fully remains a great challenge. Different from commonly-used CNN models that apply a sliding window with a single, specific scale to extract features [36, 39, 40], we present a diverse region-based deep CNN model (denoted as DR-CNN) in this paper. In the proposed framework, different input patterns and topologies of the CNN model are designed to ensure complete information transfer. The input patterns, namely diverse local and global regions (e.g., the central region, the original region, and four direction-oriented regions), support a joint representation of each pixel, ensuring an architecture of greater width. The proposed DR-CNN model employs these six regions as inputs to extract the spectral-spatial features of HSI via a well-designed network with a "multi-scale summation" module. A softmax classifier is used to classify each pixel. Moreover, to alleviate the problem of limited available training samples, we investigate hyperspectral data augmentation in the learning process.

Limited access to diverse input nodes can prevent a well-designed network from leveraging deeper and wider architectures that take full advantage of the very rich spectral-spatial information. In the work of Lee et al. [39], the single-scale input is a square region of fixed size, which is not universally applicable to data sets with various object distributions. Hence, due to its single input style of feature extraction, the CD-CNN proposed by Lee et al. [39] cannot fully exploit the abundant semantic-contextual properties around a specific pixel, causing a great loss of information. In fact, we have found that most state-of-the-art CNN-based approaches are designed with single-input architectures (e.g., spatial feature extraction using a fixed-size window) to conduct classification. In our opinion, the single input style restricts performance; moreover, He et al. [41] pointed out that segregation and subsequent aggregation at a deeper layer of a CNN model are more physiologically sound and better suited to modeling the hierarchical information processing in human brains. It may therefore be more reasonable to generate representations from flexibly sized windows during training. Accordingly, we propose to construct a CNN framework with a diverse but rich pixel representation, which plays a critical role in classification tasks.

The main contributions of this paper are as follows. (1) The joint representation using diverse regions in the proposed CNN framework can simultaneously take advantage of the spectral information, the spatial structure information, and the semantic context-aware information of each pixel. (2) An important module, "multi-scale summation", is designed for deep feature extraction; it combines multiple scales and different-level features from unequal layers. The features are extracted in a manner of information supplement and propagation, thus keeping the information as complete as possible.

The remainder of this paper is organized as follows. The proposed classification framework is described in Section II. The experiments and analysis are discussed in Section III. The conclusion is drawn in Section IV.

II. DIVERSE REGION-BASED CNN

A. DR-CNN Architecture

The proposed deep network consists of several CNN branches, each branch representing a different region; we call it the diverse region-based CNN (DR-CNN) model. The architecture of the proposed DR-CNN model is illustrated in Fig. 1. It is based on the assumption that adjacent pixels often consist of similar materials and, with high probability, tend to belong to the same class as the central testing pixel. In other words, it is suboptimal to consider only the central pixel without any neighboring spatial information. The key is how to select the surrounding areas. Different from a traditional square window, six regions with flexible shapes are built in the form of diverse rectangles, followed by six blocks of CNN model in the feature extractor.

For each CNN model, a "multi-scale summation" module is employed to avoid the overfitting that is usually caused by limited training data. The detailed framework of the module is illustrated in Fig. 4; it allows a certain increase in the depth and width of the network, leading to enhanced learning capability and, ultimately, improved generalization performance. In a typical CNN model, early convolutional layers with high spatial resolution often capture more local details, while those with low spatial resolution capture more structure information with high-level semantics [42]. Inspired by DenseNet [43] and ResNet [44] for image classification, the "multi-scale summation" module is designed to combine local fine details and high-level structure information by cross-layer aggregation of shallow and deep layers. In particular, as shown in Fig. 4, the two cross-layer aggregations with shortcut connections contain convolution operations of different scales, which adjust the size of the feature maps to match that of the high-level hidden layers. The shortcut connections of "multi-scale summation" are selectively bridged, and the selection principle depends on the optimal sparseness of the shortcut connections. The strategy then passes earlier features to a subsequent layer by simple concatenation.

After that, all the features derived from the different regions are fused together and fed into the last fully-connected network, as shown in part III of Fig. 1. This fully-connected network is described in more detail in Fig. 6. Consequently, the number of layers of the entire
[Fig. 1 residue: the current pixel vector of interest and a hypothesis set around the pixel are split into the Global, Right, Left, Top, Bottom, and Local Regions; each region feeds its own CNN branch, and the branch outputs enter part III ("Fully Connect") to produce the classification result.]
Fig. 1. The overall flowchart of the proposed DR-CNN.
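As a reading aid, the Fig. 1 topology can be summarized with a short Keras sketch; this is our rendering, not the authors' released code. The make_branch callable, the layer names, and treating each region as a separate Keras input are assumptions, while the region sizes follow Table IV later in the paper.

# Hedged sketch of the Fig. 1 data flow: six region inputs, one CNN
# branch per region, and concatenation into the joint representation f.
from tensorflow.keras import Input, Model, layers

# Region sizes (rows, cols) as later summarized in Table IV.
REGION_SIZES = {"G": (11, 11), "C": (3, 3), "L": (11, 7),
                "R": (11, 7), "T": (7, 11), "B": (7, 11)}

def dr_cnn_features(n_bands, make_branch):
    inputs, feats = [], []
    for q, (h, w) in REGION_SIZES.items():
        inp = Input(shape=(h, w, n_bands), name="R_" + q)
        inputs.append(inp)
        feats.append(make_branch((h, w, n_bands))(inp))  # f_Rq = F(Rq, theta)
    f = layers.Concatenate(name="joint_representation")(feats)
    # f is then passed to the Part III fully-connected classifier (Fig. 6).
    return Model(inputs, f)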
[Figs. 4–5 residue: "Feature Extractor for R_C". Legible layer labels: Input, Conv [5×5]:64, Conv [3×3]:64, Conv [3×3]:128, Conv [3×3]:128, Conv [1×1]:128, Batch Norm, ReLU, "Multi-Scale Summation", Full Connect; a second, smaller extractor shows Input, Conv [1×1]:128, Conv [3×3]:128, Batch Norm, ReLU, Full Connect.]
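Given the layer labels that survive in the figure, one plausible Keras rendering of a branch is sketched below. The exact shortcut wiring is an assumption: the paper states only that the shortcuts are selectively bridged and that 1×1/3×3 convolutions adjust earlier feature maps before concatenation.

# Hedged sketch of one "multi-scale summation" branch (not the authors'
# code); filter counts follow the figure labels, the wiring is assumed.
from tensorflow.keras import Input, Model, layers

def conv_bn_relu(x, filters, kernel):
    x = layers.Conv2D(filters, kernel, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def feature_extractor(input_shape, out_dim=128):
    inp = Input(shape=input_shape)            # e.g. (11, 11, n_bands)
    x1 = conv_bn_relu(inp, 64, (5, 5))        # early layer: local detail
    x2 = conv_bn_relu(x1, 64, (3, 3))
    x3 = conv_bn_relu(x2, 128, (3, 3))
    # Cross-layer aggregation: bring earlier maps in line with the
    # high-level layer, then fuse by simple concatenation.
    s1 = layers.Conv2D(128, (1, 1), padding="same")(x1)
    s2 = layers.Conv2D(128, (3, 3), padding="same")(x2)
    x = layers.Concatenate()([s1, s2, x3])    # "multi-scale summation"
    x = conv_bn_relu(x, 128, (3, 3))
    x = layers.Flatten()(x)
    f = layers.Dense(out_dim, activation="relu")(x)  # f_Rq, of length l
    return Model(inp, f)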
[Fig. 6 residue: "Classifier of DR-CNN (Part III of Fig. 1)", assembled from an input layer, three fully-connected layers, batch normalization, and ReLU activations.]
Fig. 6. The detailed framework of the fully-connected network in Part III of Fig. 1.

In general, the chain of the feature extractor ends in the fully-connected layer, and the entire feature-extraction operation for a specific region is defined as

    f_{R_q} = F(R_q, θ),  q ∈ {C, G, L, R, T, B},    (4)

where the function F consists of the convolution process and the fully-connected process, R_q represents the specific region, f_{R_q} ∈ ℝ^{1×l} is the feature vector extracted from R_q, and θ consists of W, b, γ, and β.

After obtaining the diverse features of all the regions by the aforementioned feature-extraction operations, these representative features are efficiently fused. First, the features of the different CNN pipelines are concatenated to obtain a feature vector f = {f_{R_C}, f_{R_G}, f_{R_L}, f_{R_R}, f_{R_T}, f_{R_B}}. Then, as shown in part III of Fig. 1, fully-connected layers are established to combine these features in depth by taking f as input. Finally, the softmax layer is applied to predict the classification label of the testing pixel.
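Tying Eq. (4), the fusion step, and Fig. 6 together, a minimal sketch of the Part III classifier is given below. The hidden widths are assumptions (the figure residue shows only an input, three fully-connected layers, batch normalization, and ReLU activations); with six branches of length l, the input dimension is feat_dim = 6l.

# Hedged sketch of the Part III classifier; layer widths are assumed.
from tensorflow.keras import Input, Model, layers

def classifier_head(feat_dim, n_classes, hidden=256):
    f = Input(shape=(feat_dim,))  # f = {f_RC, f_RG, f_RL, f_RR, f_RT, f_RB}
    x = layers.Dense(hidden)(f)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dense(hidden, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)  # pixel label
    return Model(f, out)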
III. EXPERIMENTS AND ANALYSIS

For the proposed DR-CNN, all the programs are implemented in Python, and the network is constructed using the Keras¹ and TensorFlow² deep learning frameworks. Ten-

¹ https://github.com/fchollet/keras
² http://tensorflow.org/

A. Experimental Data

The performance of the proposed DR-CNN is evaluated on three datasets, i.e., the Indian Pines dataset, the Salinas dataset, and the University of Pavia dataset, as illustrated in Fig. 7. For each data set, we randomly select 200 labeled pixels per class for training and use all the other pixels in the ground-truth map for testing. The Indian Pines data set, which consists of 145 × 145 pixels, was gathered by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over northwestern Indiana. There are 220 spectral channels covering the range from 0.4 to 2.5 µm, with a spatial resolution of 20 m. The Indian Pines dataset originally has 16 different land-cover classes; however, from the statistical viewpoint, we discard the small classes and select 8 large classes [14, 47]. The numbers of training and testing samples are listed in Table I.

TABLE I
THE NUMBERS OF TRAINING AND TESTING SAMPLES FOR THE INDIAN PINES DATASET.
# Class Training Test
1 Corn-notill 200 1228
2 Corn-mintill 200 630
3 Grass-pasture 200 283
4 Hay-windrowed 200 278
5 Soybean-notill 200 772
6 Soybean-mintill 200 2255
7 Soybean-clean 200 393
8 Woods 200 1065
- Total 1600 6904

[Fig. 7 residue: false-color images and ground-truth maps, with legends listing the class names of Tables I–III.]
Fig. 7. For three experimental datasets: (a) False-color image of the Indian Pines data, (b) ground truth of the Indian Pines data, (c) false-color image of the Salinas data, (d) ground truth of the Salinas data, (e) false-color image of the University of Pavia data, and (f) ground truth of the University of Pavia data.

The second data set, which consists of 512 × 217 pixels, was also collected by the AVIRIS sensor over Salinas Valley,
California. The image comprises 224 spectral bands with a spatial resolution of 3.7 m. There are 16 classes in total, and the numbers of training and testing samples are listed in Table II.

TABLE II
THE NUMBERS OF TRAINING AND TESTING SAMPLES FOR THE SALINAS DATASET.
# Class Training Test
1 Broccoli green weeds 1 200 1809
2 Broccoli green weeds 2 200 3526
3 Fallow 200 1776
4 Fallow rough plow 200 1194
5 Fallow smooth 200 2478
6 Stubble 200 3759
7 Celery 200 3379
8 Grapes untrained 200 11071
9 Soil vineyard develop 200 6003
10 Corn senesced green weeds 200 3078
11 Lettuce romaines, 4 wk 200 868
12 Lettuce romaines, 5 wk 200 1727
13 Lettuce romaines, 6 wk 200 716
14 Lettuce romaines, 7 wk 200 870
15 Vineyard untrained 200 7068
16 Vineyard vertical trellis 200 1607
- Total 3200 50929

The University of Pavia data set, which contains 610 × 340 pixels, was collected by the Reflective Optics System Imaging Spectrometer sensor covering the city of Pavia, Italy. The image scene comprises 103 spectral bands covering the range from 0.43 to 0.86 µm, with a spatial resolution of 1.3 m. Approximately 42776 labeled pixels in nine classes are taken from the ground-truth map, and the numbers of training and testing samples are listed in Table III.

TABLE III
THE NUMBERS OF TRAINING AND TESTING SAMPLES FOR THE UNIVERSITY OF PAVIA DATASET.
# Class Training Test
1 Asphalt 200 6431
2 Meadows 200 18449
3 Gravel 200 1899
4 Trees 200 2864
5 Sheets 200 1145
6 Bare soil 200 4829
7 Bitumen 200 1130
8 Bricks 200 3482
9 Shadows 200 747
- Total 1800 40976

B. Learning the Proposed DR-CNN

A deep network usually requires abundant training data to learn a model with a large number of parameters. However, in HSI classification tasks, only a few labeled samples may be available in practice. To address this issue, we utilize a simple but effective data augmentation method, as shown in Fig. 8. For each training sample, two steps of data augmentation are executed to generate additional data without introducing extra labeling costs. The first step is a flip, implemented by flipping the original samples horizontally or vertically. The second step is to add small Gaussian noise to the original samples. In doing so, the number of training samples can be increased by a factor of two, ensuring more accurate estimation of the parameters.

[Fig. 8 residue: training set → two-fold data augmentation via flip (fliplr) and Gaussian noise.]
Fig. 8. The process of data augmentation.
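The following NumPy sketch illustrates the two-step augmentation of Fig. 8. The noise level sigma, and composing the two steps (flip first, then noise, so that the training set exactly doubles), are our assumptions, since the text does not pin either down.

# Hedged NumPy sketch of Fig. 8: flip (horizontal or vertical) plus small
# zero-mean Gaussian noise, doubling the training set. sigma is assumed.
import numpy as np

def augment(patches, sigma=0.01, seed=0):
    """patches: (N, H, W, B) array of training patches."""
    rng = np.random.default_rng(seed)
    out = []
    for p in patches:
        # Step 1: flip the sample horizontally or vertically.
        flipped = p[:, ::-1, :] if rng.random() < 0.5 else p[::-1, :, :]
        # Step 2: add small zero-mean Gaussian noise.
        out.append(flipped + rng.normal(0.0, sigma, size=p.shape))
    return np.concatenate([patches, np.stack(out)], axis=0)  # two-fold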
The University of Pavia data set, which contains 610 × 340 C. Analysis on the Diverse Regions
pixels, was collected by the Reflective Optics System Imaging In order to validate the diverse-region strategy described
Spectrometer sensor covering the city of Pavia, Italy. The in Section II-B, we compare the classification results using
image scene comprises 103 spectral bands covering the range different kinds of input block, such as square-shaped regions
from 0.43 to 0.86µm with a spatial resolution of 1.3 m. of different sizes and direction-based half regions. The fully
Approximately 42776 labeled pixels with nine classes are from connected layer with softmax loss, as shown in Fig. 6, acts as
the ground-truth map, and the numbers of training and testing the classifier for obtaining classification result.
samples are listed in Table III. Fig. 9 illustrates the classification performance of square-
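Under the hyper-parameters just listed, the schedule and initialization can be expressed in Keras as below. Note that the paper folds the weight decay D into the learning-rate decay, and that Keras' GlorotNormal draws from a truncated normal with the stated standard deviation, a close but not identical match to an untruncated Gaussian.

# Hedged sketch of the training configuration (batch size 450, momentum
# 0.99, D = 1e-4, base rate L = 1e-3 decayed as L_hat = L / (1 + D * I)).
import tensorflow as tf

BASE_LR, DECAY = 1e-3, 1e-4

# InverseTimeDecay with decay_steps=1 reproduces L / (1 + D * I).
schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=BASE_LR, decay_steps=1, decay_rate=DECAY)
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.99)

# Zero-mean Gaussian weights, std = sqrt(2 / (fan_in + fan_out)) [48];
# biases start at zero.
kernel_init = tf.keras.initializers.GlorotNormal()
bias_init = tf.keras.initializers.Zeros()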
shaped region versus different window sizes, i.e., from 3 × 3
to 15 × 15. The classification result for each square-shaped
B. Learning the Proposed DR-CNN region is obtained by using the feature extraction structure
A deep network usually requires many training data to learn shown in Fig. 4 or 5 connected with softmax classifier. It is
the model with a large number of parameters. However, in apparent when the window size is as large as 11 × 11, the
HSI classification tasks, only a few labeled samples may be performance tends to be satisfied. The size 11 × 11 may not
available in practice. To solve this issue, we utilize a simple be the best window size for all the experimental data set. For
but effective data augmentation method as shown in Fig. 8. example, the red curve shows that the best window size of
For each training sample, two steps of data augmentation are square-shaped region is 9 × 9 for the University of Pavia data
executed to generate additional data without introducing extra set, and the blue curve indicates the best window size is 9 × 9
labeling costs. The first one is flip, for which the process for the Indian Pines data set, while the best window size for
is implemented by flipping the original samples horizontally, the Salinas data set is 11 × 11. Hence, we choose a relatively
or flipping them vertically. The second step is to add small large size (e.g., 11 × 11) within allowable hardware resources
Gaussian noise to the original samples. In doing so, the number for better analysis of interactions between different categories.
TABLE IV
THE WINDOW SIZES OF THE DIVERSE REGIONS ADOPTED IN THE PROPOSED DR-CNN.
Region  Global  Central  Left   Right  Top   Bottom
Size    11×11   3×3      11×7   11×7   7×11  7×11
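To make Table IV concrete, the six regions can be sliced out of the 11 × 11 neighborhood as in the sketch below; whether each half region includes the central row or column is our assumption, not something the table specifies.

# Hedged sketch: slicing the Table IV regions from an (11, 11, B) patch
# centered on the pixel of interest.
import numpy as np

def diverse_regions(patch):
    assert patch.shape[:2] == (11, 11)
    return {
        "G": patch,            # global region, 11 x 11
        "C": patch[4:7, 4:7],  # central region, 3 x 3 around the center
        "L": patch[:, :7],     # left region, 11 x 7
        "R": patch[:, 4:],     # right region, 11 x 7
        "T": patch[:7, :],     # top region, 7 x 11
        "B": patch[4:, :],     # bottom region, 7 x 11
    }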
TABLE V
CLASSIFICATION PERFORMANCE OF DIVERSE REGIONS FOR THE INDIAN PINES DATA SET.
[Table body not recoverable from the extraction.]
[Fig. 9 residue: y-axis "Classification Accuracy %"; the curves are not recoverable.]
TABLE XI
COMPARISON OF THE CLASSIFICATION ACCURACY (%) AMONG THE PROPOSED METHOD AND THE BASELINES USING THE INDIAN PINES DATA.
Class  SVM-RBF  SVM-RFS  SVM-MRF  CNN[33]  R-PCA CNN[35]  CNN-PPF[38]  CD-CNN[39]  SS-CNN[36]  DR-CNN
1 76.14 88.73 93.55 78.58 82.39 92.99 90.1 96.28 98.20±0.012
2 85.40 91.20 90.41 85.24 85.41 96.66 97.1 92.26 99.79±0.003
3 97.88 97.52 95.80 96.10 95.24 98.58 100 99.3 100±0
4 99.28 100 100 99.64 100 100 100 100 100±0
5 83.94 91.67 91.12 89.64 82.76 96.24 95.9 92.84 99.78±0.003
6 73.48 78.79 97.72 81.55 96.2 87.8 87.1 98.21 96.69 ± 0.011
7 92.11 93.76 91.71 95.42 82.14 98.98 96.4 92.45 99.86 ± 0.001
8 97.28 98.74 99.84 98.59 99.81 99.81 99.4 98.98 99.99±0
AA 88.19 92.55 95.02 90.60 90.49 96.38 95.75 96.29 99.29±0.001
OA 82.98 88.68 95.34 87.01 91.13 93.9 94.24 96.63 98.54±0.257
TABLE XII
COMPARISON OF THE CLASSIFICATION ACCURACY (%) AMONG THE PROPOSED METHOD AND THE BASELINES USING THE SALINAS DATA.
Class  SVM-RBF  SVM-RFS  SVM-MRF  CNN[33]  R-PCA CNN[35]  CNN-PPF[38]  CD-CNN[39]  SS-CNN[36]  DR-CNN
1 96.81 99.55 100 97.34 98.84 100 100 100 100±0
2 94.67 99.92 99.70 99.29 99.61 99.88 100 99.89 100±0
3 90.27 99.44 98.94 96.51 99.75 99.60 100 99.89 99.98±0
4 98.61 99.86 98.44 99.66 98.79 99.49 99.3 99.25 99.89±0.001
5 94.82 98.02 99.47 96.97 99.84 98.34 98.5 99.39 99.83±0.002
6 97.61 99.7 99.95 99.60 99.7 99.97 100 100 100±0
7 99.24 99.69 100 99.49 79.05 100 99.8 99.82 99.96±0
8 54.69 84.85 87.64 72.25 99.17 88.68 83.4 91.45 94.14±0.018
9 98.32 99.58 99.45 97.53 96.88 98.33 99.6 99.95 99.99±0
10 81.91 96.49 94.41 91.29 99.31 98.6 94.6 98.51 99.20±0.003
11 90.57 98.78 99.91 97.58 100 99.54 99.3 99.31 99.99±0
12 92.43 100 99.64 100 100 100 100 100 100±0
13 98.07 99.13 100 99.02 98.97 99.44 100 99.72 100±0
14 90.39 98.97 98.79 95.05 82.24 98.96 100 100 100±0
15 60.06 76.38 83.37 76.83 97.57 83.53 100 96.24 95.52±0.029
16 90.87 99.56 97.34 98.94 99.61 99.31 98 99.63 99.72±0.002
AA 89.33 96.87 97.32 94.84 96.83 97.73 98.28 98.94 99.26±0
OA 81.55 93.15 94.59 89.28 92.39 94.8 95.42 97.42 98.33±0.171
TABLE XIII
COMPARISON OF THE CLASSIFICATION ACCURACY (%) AMONG THE PROPOSED METHOD AND THE BASELINES USING THE UNIVERSITY OF PAVIA DATA.
Class  SVM-RBF  SVM-RFS  SVM-MRF  CNN[33]  R-PCA CNN[35]  CNN-PPF[38]  CD-CNN[39]  SS-CNN[36]  DR-CNN
1 84.01 87.95 98.22 88.38 92.43 97.42 94.6 97.4 98.43±0.005
2 88.9 91.17 98.90 91.27 94.84 95.76 96 99.4 99.45±0.006
3 87.57 86.99 88.97 85.88 90.89 94.05 95.5 94.84 99.14±0.003
4 96.09 95.5 93.64 97.24 93.99 97.52 95.9 99.16 99.50±0.003
5 99.91 99.85 99.11 99.91 100 100 100 100 100±0
6 93.33 94.31 80.13 96.41 92.86 99.13 94.1 98.7 100±0
7 93.98 94.74 82.79 93.62 93.89 96.19 97.5 100 99.70±0.003
8 82.94 85.89 91.88 87.45 91.18 93.62 88.8 94.57 99.55±0.002
9 99.6 99.89 100 99.57 99.33 99.6 99.5 99.87 100±0
AA 91.82 92.92 94.04 93.36 94.38 97.03 95.77 98.22 99.53±0.001
OA 89.24 91.1 92.63 92.27 93.87 96.48 96.73 98.41 99.56±0.253
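For reference, the two summary rows of Tables XI–XIII can be computed as follows; this is the standard definition of overall accuracy (OA) and average accuracy (AA) in the HSI literature, not code from the paper.

# OA pools all test pixels; AA is the unweighted mean of the per-class
# accuracies.
import numpy as np

def oa_aa(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    oa = np.mean(y_true == y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return 100.0 * oa, 100.0 * float(np.mean(per_class))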
TABLE VII
CLASSIFICATION PERFORMANCE OF DIVERSE REGIONS FOR THE …
Region  With data augmentation  Without data augmentation
R_L     96.12  94.87
R_R     95.44  93.03
R_T     94.38  95.12
R_B     95.52  93.77
R_C     90.43  88.20
R_G     96.99  95.28
DR-CNN  98.54  98.07

TABLE VIII
CLASSIFICATION PERFORMANCE (%) OF DIFFERENT INITIAL LEARNING RATES FOR DR-CNN ON THE INDIAN PINES DATA.
[Table body not recoverable from the extraction.]

[Caption lost in the extraction; the 50/100/150/200 columns match overall accuracy (%) for different numbers of training samples per class.]
Dataset  Method  50  100  150  200
Indian Pines  CNN[33]  80.43  84.32  85.3  87.01
Indian Pines  CNN-PPF[38]  88.34  91.72  93.14  93.9
Indian Pines  CD-CNN[39]  84.43  88.27  -  94.24
Indian Pines  DR-CNN  88.74  94.94  97.49  98.54
Salinas  CNN[33]  89.2  89.58  89.6  89.72
Salinas  CNN-PPF[38]  92.15  93.88  93.84  94.8
Salinas  CD-CNN[39]  82.74  98.58  -  95.42
Salinas  DR-CNN  93.46  95.54  97.36  98.33
University of Pavia  CNN[33]  86.39  88.53  90.89  92.27
University of Pavia  CNN-PPF[38]  88.14  93.35  94.97  96.48
University of Pavia  CD-CNN[39]  92.19  93.55  -  96.73
University of Pavia  DR-CNN  96.91  98.67  99.21  99.56

TABLE X
OVERALL ACCURACY (%) WITH SHORTCUT CONNECTIONS ON THE INDIAN PINES DATA.
[Table body not recoverable from the extraction.]
Fig. 10. Classification maps from the proposed DR-CNN and the baselines on the Indian Pines data: (a) CNN: 87.01%, (b) CNN-PPF: 93.9%, (c) CD-CNN: 94.24%, (d) DR-CNN: 98.54%.
Fig. 11. Classification maps from the proposed DR-CNN and the baselines on the Salinas data: (a) CNN: 89.28%, (b) CNN-PPF: 94.8%, (c) CD-CNN: 95.42%, (d) DR-CNN: 98.33%.
[5] S. Vivek, D. Ali, T. Tinne, and V. G. Luc, "Hyperspectral CNN for image classification & band selection, with application to face recognition," Technical Report KUL/ESAT/PSI/1604, KU Leuven, ESAT, Leuven, Belgium, Dec. 2016.
[6] K. Makantasis, A. Doulamis, N. Doulamis, and A. Nikitakis, "Tensor-based classifiers for hyperspectral data analysis," arXiv preprint arXiv:1709.08164, 2017.
[7] L. David, "Hyperspectral image data analysis," IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 17–28, Aug. 2002.
[8] H. Li, Y. Song, and C. L. Philip Chen, "Hyperspectral image classification based on multiscale spatial information fusion," IEEE Transactions on Geoscience and Remote Sensing, 2017, in print.
[9] X. Zheng, Y. Yuan, and X. Lu, "Dimensionality reduction by spatial-spectral preservation in selected bands," IEEE Transactions on Geoscience and Remote Sensing, 2017, in print.
[10] L. Lin and X. Song, "Using CNN to classify hyperspectral data based on spatial-spectral information," in International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kaohsiung, Taiwan, Nov. 2016, pp. 61–68.
[11] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, "Advances in spectral-spatial classification of hyperspectral images," Proceedings of the IEEE, vol. 101, no. 3, pp. 652–675, Sept. 2013.
[12] J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias, and J. A. Benediktsson, "Generalized composite kernel framework for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 9, pp. 4816–4829, Sept. 2013.
[13] E. Blanzieri and F. Melgani, "Nearest neighbor classification of remote sensing images with the maximal margin principle," IEEE Transactions on Geoscience and Remote Sensing, vol. 46, no. 6, pp. 1804–1811, June 2008.
[14] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[15] J. Liu, Z. Wu, Z. Wei, L. Xiao, and L. Sun, "Spatial-spectral kernel sparse representation for hyperspectral image classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 6, pp. 2462–2471, 2013.
[16] W. Li and Q. Du, "A survey on representation-based classification and
Fig. 12. Classification maps from the proposed DR-CNN and the baselines on the University of Pavia data: (a) CNN: 92.27%, (b) CNN-PPF: 96.48%, (c) CD-CNN: 96.73%, (d) DR-CNN: 99.56%.
detection in hyperspectral remote sensing imagery," Pattern Recognition Letters, vol. 83, pp. 115–123, 2016.
[17] W. Li, C. Chen, H. Su, and Q. Du, "Local binary patterns and extreme learning machine for hyperspectral imagery classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 7, pp. 3681–3693, 2015.
[18] S. Sun, P. Zhong, H. Xiao, and R. Wang, "Active learning with Gaussian process classifier for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 1746–1760, Aug. 2015.
[19] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2100–2112, June 2011.
[20] B. Liu, X. Yu, P. Zhang, A. Yu, Q. Fu, and X. Wei, "Supervised deep feature extraction for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, no. 99, pp. 1–13, Nov. 2017.
[21] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, "SVM- and MRF-based method for accurate classification of hyperspectral images," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 4, pp. 736–740, May 2010.
[22] J. Li, J. M. Bioucas-Dias, and A. Plaza, "Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 3, pp. 809–823, Aug. 2012.
[23] F. Zhang, B. Du, and L. Zhang, "Saliency-guided unsupervised feature learning for scene classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2175–2184, Apr. 2015.
[24] K. Makantasis, K. Karantzalos, A. Doulamis, and M. Loupos, "Deep learning-based man-made object detection from hyperspectral data," in International Symposium on Visual Computing, Las Vegas, Nevada, Dec. 2016, pp. 717–727.
[25] M. Lichao, G. Pedram, and Z. XiaoXiang, "Deep recurrent neural networks for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3639–3655, 2017.
[26] F. Zhang, B. Du, and L. Zhang, "Scene classification via a gradient boosting random convolutional network framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 3, pp. 1793–1802, 2016.
[27] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, "Training deep convolutional neural networks for land cover classification of high-resolution imagery," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 4, pp. 549–553, 2017.
[28] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.
[29] R. Kemker and C. Kanan, "Deep neural networks for semantic segmentation of multispectral remote sensing imagery," arXiv preprint arXiv:1703.06452, 2017.
[30] L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22–40, June 2016.
[31] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, "Deep learning-based classification of hyperspectral data," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2094–2107, June 2014.
[32] X. Ma, H. Wang, and J. Geng, "Spectral-spatial classification of hyperspectral image based on deep auto-encoder," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 9, pp. 4073–4085, 2016.
[33] W. Hu, Y. Huang, W. Li, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," Journal of Sensors, vol. 2015, no. 258619, pp. 1–12, 2015.
[34] V. Slavkovikj, S. Verstockt, W. D. Neve, S. V. Hoecke, and R. V. Walle, "Hyperspectral image classification with convolutional neural networks," in ACM International Conference on Multimedia (ACMMM), Brisbane, Australia, Oct. 2015, pp. 26–30.
[35] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, "Deep supervised learning for hyperspectral data classification through convolutional neural networks," in IGARSS, Milan, Italy, July 2015, pp. 4959–4962.
[36] S. Mei, J. Ji, J. Hou, X. Li, and Q. Du, "Learning sensor-specific spatial-spectral features of hyperspectral images via convolutional neural networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 8, pp. 4520–4533, Aug. 2017.
[37] J. Yue, W. Zhao, S. Mao, and H. Liu, "Spectral-spatial classification of hyperspectral images using deep convolutional neural networks," Remote Sensing Letters, vol. 6, no. 6, pp. 468–477, May 2015.
[38] W. Li, G. Wu, F. Zhang, and Q. Du, "Hyperspectral image classification using deep pixel-pair features," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 844–853, Feb. 2017.
[39] H. Lee and H. Kwon, "Going deeper with contextual CNN for hyperspectral image classification," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4843–4855, July 2017.
[40] Y. Chen, X. Zhao, and X. Jia, "Spectral-spatial classification of hyperspectral data based on deep belief network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2381–2392, June 2015.
[41] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, Sept. 2015.
[42] X. Liang, C. Xu, X. Shen, J. Yang, S. Liu, J. Tang, L. Lin, and S. Yan, "Human parsing with contextualized convolutional neural network," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 1, pp. 115–127, Mar. 2016.
[Figure residue, panel (b): training error rate (%) versus iterations (0–500); the plotted curves are not recoverable.]

[43] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," arXiv preprint arXiv:1608.06993, 2016.
[44] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Computer Vision and Pattern Recognition, Las Vegas, NV, USA, June 2016, pp. 770–778.
[45] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[46] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning, Haifa, Israel, June 2010, pp. 21–24.
[47] W. Li, E. W. Tramel, S. Prasad, and J. E. Fowler, "Nearest regularized subspace for hyperspectral classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 1, pp. 477–489, Jan. 2014.
[48] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in AISTATS, Sardinia, Italy, May 2010, pp. 249–256.
[49] Z. Zhong, J. Li, L. Ma, H. Jiang, and H. Zhao, "Deep residual networks for hyperspectral image classification," in IGARSS, Texas, USA, July 2017, pp. 23–28.
[50] B. Waske, S. Linden, J. Benediktsson, and P. Hostert, "Sensitivity of support vector machines to random feature selection in classification of hyperspectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 7, pp. 2880–2889, July 2010.

Wei Li (S'11–M'13–SM'16) received the B.E. degree in telecommunications engineering from Xidian University, Xi'an, China, in 2007, the M.S. degree in information science and technology from Sun Yat-Sen University, Guangzhou, China, in 2009, and the Ph.D. degree in electrical and computer engineering from Mississippi State University, Starkville, MS, USA, in 2012.

Qian Du (S'98–M'00–SM'05–F'18) received the Ph.D. degree in electrical engineering from the University of Maryland, Baltimore, MD, USA, in 2000. She is currently the Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote sensing image analysis and applications, pattern classification, data compression, and neural networks.
Dr. Du is a fellow of the SPIE-International Society for Optics and Photonics. She received the 2010 Best Reviewer Award from the IEEE Geoscience and Remote Sensing Society. She was a Co-Chair of the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society from 2009 to 2013, and the Chair of the Remote Sensing and Mapping Technical Committee of the International Association for Pattern Recognition from 2010 to 2014. She has served as an Associate Editor for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, the Journal of Applied Remote Sensing, and the IEEE Signal Processing Letters. Since 2016, she has been the Editor-in-Chief of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. She was the General Chair of the 4th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, Shanghai, in 2012.