Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery
Abstract
1. Introduction
- We simply compute the CNN activations over the entire image scene and regard the activation vectors of the fully-connected layer as the global feature representations for scenes;
- We first generate dense CNN activations from the last convolutional layer at multiple scales of the original input image scenes, and then we aggregate the dense convolutional features into a global representation via a conventional feature coding scheme, e.g., BOW or Fisher encoding. These dense CNN activations describe multi-scale spatial information.
- We thoroughly investigate how to effectively use CNN activations from not only the fully-connected layers but also the convolutional layers as the image scene features.
- We conduct a comparative evaluation of various pre-trained CNN models utilized for computing generic image features.
- A novel multi-scale feature extraction approach with the pre-trained CNN is presented, where we encode the dense CNN activations from the convolutional layer to generate image scene representations via feature coding methods. Moreover, four commonly used feature coding methods are evaluated based on the proposed approach.
- The two proposed scenarios achieve a significant performance enhancement over existing methods on two public HRRS scene classification benchmarks and provide a referable baseline for HRRS scene classification with deep learning methods.
Abbreviation | Definition
---|---
HRRS | high-resolution remote sensing
CNN | convolutional neural network |
UFL | unsupervised feature learning |
SIFT | scale invariant feature transformation |
FC layer | fully-connected layer |
ReLU | rectified linear units |
ILSVRC | ImageNet Large Scale Visual Recognition Challenge |
AlexNet | a CNN architecture developed by Alex Krizhevsky [25] |
Caffe | Convolutional Architecture for Fast Feature Embedding [28] |
CaffeNet | a CNN architecture provided by Caffe [28] |
VGG-F | a fast CNN architecture developed by Chatfield [33] |
VGG-M | a medium CNN architecture developed by Chatfield [33] |
VGG-S | a slow CNN architecture developed by Chatfield [33] |
VGG-VD16 | a very deep CNN architecture (16 layers) developed by Simonyan [27] |
VGG-VD19 | a very deep CNN architecture (19 layers) developed by Simonyan [27] |
BOW | bag of visual words |
IFK | improved Fisher kernel |
LLC | locality-constrained linear coding |
VLAD | vector of locally aggregated descriptors |
SVM | support vector machine |
2. Related Work
3. Deep Convolutional Neural Networks (CNNs)
3.1. AlexNet
3.2. CaffeNet
3.3. VGGNet
- (1) VGG-F: The fast CNN architecture is similar to AlexNet. The primary differences from AlexNet are the smaller number of filters and the smaller stride in some convolutional layers.
- (2) VGG-M: The medium CNN architecture is similar to the one presented by Zeiler et al. [32]. It is constructed with a smaller stride and pooling size in the 1st convolutional layer, and a smaller number of filters in the 4th convolutional layer to balance computational speed.
- (3) VGG-S: The slow CNN architecture is a simplified version of the accurate model in the OverFeat framework [26]; it retains the first five of the six convolutional layers in the original accurate OverFeat model and has fewer filters in the 5th layer. Compared to VGG-M, the main differences are the smaller stride in the 2nd convolutional layer and the larger pooling size in the 1st and 5th convolutional layers.
3.4. VGG-VD Networks
3.5. PlacesNet
4. Methodology of Transferring Deep CNN Features for Scene Classification
4.1. Scenario (I): Utilize Features from FC Layers
- Because all of the pre-trained CNNs require a fixed-size input image (e.g., 224 × 224 pixels), we must resize each image scene to this fixed size before feeding it into the network. This size constraint causes an inevitable degradation in spatial resolution when the original image is larger than the pre-defined input size of the CNN.
- Although data augmentation is an effective technique for reducing overfitting in the training stage, recent works show that data augmentation in the testing stage, performed by sampling multiple sub-image windows and averaging the activations of these sub-images, also helps to improve classification performance. In this paper, we apply the prevalent "center + corners with horizontal flips" augmentation strategy [25,32,33] to increase accuracy: we extract five sub-image windows (at the required input size of the CNN) corresponding to the center and the four corners, as well as their horizontal flips, and then construct the global feature for each image by averaging the activation vectors over the ten sub-image windows (see the code sketch after this list).
- As a common practice, the 4096-dimensional output features go through the ReLU transformation so that all elements of the features are non-negative. We also evaluated the features without ReLU but achieved slightly worse performance.
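For concreteness, the following is a minimal sketch of the scenario (I) pipeline: ten-crop augmentation, first-FC-layer activations, ReLU, and averaging. It assumes a Python environment with torchvision's pre-trained AlexNet standing in for the pre-trained models actually evaluated in the paper; the function name `fc_feature` and the crop sizes are illustrative, not the paper's exact implementation.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# A pre-trained AlexNet from torchvision stands in for the models used
# in the paper (an assumption made for illustration only).
model = models.alexnet(pretrained=True).eval()

# Truncate the network at the first fully-connected layer (fc6), whose
# 4096-D activations serve as the global scene feature in scenario (I).
fc6 = torch.nn.Sequential(
    model.features,
    model.avgpool,
    torch.nn.Flatten(),
    model.classifier[1],  # the first Linear layer (9216 -> 4096)
)

normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
preprocess = T.Compose([
    T.Resize((256, 256)),
    T.TenCrop(224),  # center + four corners, each with its horizontal flip
    T.Lambda(lambda crops: torch.stack([normalize(T.ToTensor()(c)) for c in crops])),
])

def fc_feature(image_path):
    """Scenario (I) feature: average ReLU(fc6) activations over ten crops."""
    crops = preprocess(Image.open(image_path).convert("RGB"))  # (10, 3, 224, 224)
    with torch.no_grad():
        acts = torch.relu(fc6(crops))  # ReLU keeps all elements non-negative
    return acts.mean(dim=0)            # average over the ten sub-windows
```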
4.2. Scenario (II): Utilize Features from Convolutional Layers
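As summarized in the Introduction, scenario (II) treats each spatial position of the last convolutional feature map, computed at several scales of the input image, as a dense local descriptor, and aggregates the descriptors with a feature coding method. Below is a minimal sketch under the same torchvision assumption as above; the BOW coder with a k-means codebook is shown as the simplest of the four evaluated methods, and the names `dense_conv_descriptors` and `bow_encode` are illustrative.

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.cluster import KMeans

# Convolutional body only (through the last conv layer and its pooling).
conv_body = models.alexnet(pretrained=True).features.eval()

def dense_conv_descriptors(image_tensor, scales=(128, 256)):
    """Each spatial position of the last convolutional feature map, computed
    at several input scales, yields one dense local descriptor.
    `image_tensor` is assumed to be a normalized (3, H, W) float tensor."""
    descriptors = []
    for s in scales:
        x = F.interpolate(image_tensor.unsqueeze(0), size=(s, s),
                          mode="bilinear", align_corners=False)
        with torch.no_grad():
            fmap = conv_body(x)                               # (1, C, H', W')
        c = fmap.shape[1]
        descriptors.append(fmap.squeeze(0).reshape(c, -1).T)  # (H'*W', C)
    return torch.cat(descriptors).numpy()

def bow_encode(descriptors, kmeans):
    """BOW coding: quantize each descriptor to its nearest codebook word and
    pool the assignments into an L2-normalized histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return hist / (np.linalg.norm(hist) + 1e-12)

# Codebook learning (illustrative size): fit k-means on descriptors sampled
# from the training images, then encode every image with bow_encode.
# kmeans = KMeans(n_clusters=1000).fit(sampled_training_descriptors)
```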
5. Experiments and Analysis
5.1. Experimental Setup
- UC Merced Land Use Dataset. The UC Merced (UCM) dataset [5], manually collected from large aerial orthoimagery, contains 21 distinctive scene categories. Each class consists of 100 images of 256 × 256 pixels, each with a pixel resolution of one foot. Figure 4 shows two examples of each category included in this dataset. Note that this dataset shows very small inter-class diversity among some categories that share a few similar objects or textural patterns (e.g., dense residential and medium residential), which makes the UCM dataset a challenging one.
- WHU-RS Dataset. The WHU-RS dataset [6], collected from Google Earth (Google Inc.), is a new publicly available dataset that consists of 950 images of 600 × 600 pixels uniformly distributed over 19 scene classes. Some example images are shown in Figure 5. The variation in illumination, scale, resolution and viewpoint-dependent appearance in some categories makes it more complicated than the UCM dataset.
5.2. Experimental Results of Scenario (I)
Pre-Trained CNN | UCM (%) | WHU-RS (%)
---|---|---
AlexNet | 94.37 | 93.81 |
CaffeNet | 94.43 | 94.54 |
VGG-F | 94.35 | 95.11 |
VGG-M | 94.48 | 94.98 |
VGG-S | 94.60 | 95.46 |
VGG-VD16 | 94.07 | 94.35 |
VGG-VD19 | 93.15 | 94.36 |
PlacesNet | 91.44 | 91.73 |
Pre-Trained CNN | UCM: 1st-FC | UCM: 1st-FC+Aug | UCM: 2nd-FC+Aug | WHU-RS: 1st-FC | WHU-RS: 1st-FC+Aug | WHU-RS: 2nd-FC+Aug
---|---|---|---|---|---|---
AlexNet | 95.08 | 95.57 | 95.20 | 94.29 | 95.69 | 95.49 | |
CaffeNet | 95.09 | 95.88 | 95.17 | 95.52 | 96.23 | 95.58 | |
VGG-F | 95.19 | 96.24 | 95.54 | 95.69 | 95.94 | 95.50 | |
VGG-M | 95.64 | 96.47 | 95.68 | 95.89 | 96.34 | 95.43 | |
VGG-S | 95.66 | 96.69 | 96.01 | 96.28 | 96.71 | 95.85 | |
VGG-VD16 | 95.43 | 96.88 | 95.42 | 95.21 | 95.75 | 95.22 | |
VGG-VD19 | 94.60 | 96.58 | 95.40 | 95.36 | 96.16 | 95.37 | |
PlacesNet | 93.33 | 94.90 | 92.61 | 92.68 | 94.89 | 93.23 |
5.3. Experimental Results of Scenario (II)
5.3.1. Comparison of Feature Coding Methods
Feature Coding Method | UCM: CaffeNet | UCM: VGG-M | UCM: VGG-VD16 | WHU-RS: CaffeNet | WHU-RS: VGG-M | WHU-RS: VGG-VD16
---|---|---|---|---|---|---
BOW | 95.16 | 96.11 | 96.51 | 96.36 | 98.02 | 98.10 | |
VLAD | 95.39 | 96.04 | 96.46 | 96.55 | 97.88 | 98.64 | |
IFK | 95.71 | 96.90 | 96.52 | 97.43 | 98.28 | 97.79 | |
LLC | 94.50 | 95.24 | 95.64 | 96.06 | 96.97 | 97.57 |
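As a companion to the BOW sketch in Section 4.2, here is a minimal sketch of VLAD encoding over the same dense descriptors. The power- and L2-normalization steps shown are a common convention and an assumption here, not necessarily the paper's exact recipe; `kmeans` is a fitted scikit-learn `KMeans` codebook as in the earlier sketch.

```python
import numpy as np

def vlad_encode(descriptors, kmeans):
    """VLAD coding: sum the residuals between descriptors and their nearest
    codebook centroid, one sub-vector per centroid, then normalize."""
    centers = kmeans.cluster_centers_           # (K, D)
    assign = kmeans.predict(descriptors)        # nearest word per descriptor
    v = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))         # power normalization
    return v / (np.linalg.norm(v) + 1e-12)      # L2 normalization
```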
5.3.2. Effect of Image Scales
Scale Setting | CaffeNet | VGG-M | VGG-VD16
---|---|---|---
One scale (256×256) | 95.08 | 96.28 | 93.97 |
Two scales (128×128,256×256) | 96.13 | 96.59 | 95.00 |
Four scales (128×128,256×256, 512×512,1024×1024) | 95.07 | 95.64 | 96.38 |
5.3.3. Effect of Different Convolutional Layers
5.3.4. Comparison with Low-Level Features
Feature Coding Method | SIFT Features | Dense CNN Features
---|---|---
BOW | 75.11 | 96.51 |
VLAD | 74.50 | 96.46 |
IFK | 84.40 | 96.90 |
LLC | 77.64 | 95.64 |
5.4. Comparison with State-of-the-Art Methods

Methods | Accuracy (%)
---|---|
SPM [10] | 74 |
SCK [5] | 72.52 |
SPCK++ [12] | 77.38 |
SC+Pooling [4] | 81.67 ± 1.23 |
SG+UFL [39] | 82.72 ± 1.18 |
CCM-BOVW [11] | 86.64 ± 0.81 |
PSR [13] | 89.1 |
UFL-SC [40] | 90.26 ± 1.51 |
MSIFT [60] | 90.97 ± 1.81 |
COPD [38] | 91.33 ± 1.11 |
Dirichlet [61] | 92.8 ± 0.9 |
VLAT [14] | 94.3 |
CaffeNet [36] | 93.42 ± 1.00 |
OverFeat [36] | 90.91 ± 1.19 |
GoogLeNet+Fine-tune [37] | 97.10 |
Scenario (I) | 96.88 ± 0.72 |
Scenario (II) | 96.90 ± 0.77 |
5.5. Combining Features from Two Scenarios
Scenario (I) | Scenario (II) | UCM (%) | WHU-RS (%)
---|---|---|---
VGG-S | BOW(VGG-M) | 97.30 | 98.72 |
VGG-S | VLAD(VGG-M) | 97.92 | 98.79 |
VGG-S | IFK(VGG-M) | 98.27 | 98.70 |
VGG-S | BOW(VGG-VD16) | 98.05 | 98.82 |
VGG-S | VLAD(VGG-VD16) | 97.99 | 98.63 |
VGG-S | IFK(VGG-VD16) | 98.49 | 98.52 |
CaffeNet | BOW(CaffeNet) | 96.90 | 98.15 |
CaffeNet | VLAD(CaffeNet) | 97.31 | 98.21 |
CaffeNet | IFK(CaffeNet) | 97.40 | 98.03 |
CaffeNet | BOW(VGG-VD16) | 98.07 | 98.89 |
CaffeNet | VLAD(VGG-VD16) | 97.91 | 98.65 |
CaffeNet | IFK(VGG-VD16) | 98.16 | 98.80 |
OverFeat + CaffeNet [36] | | 99.43 | –
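A minimal sketch of the combination step, under the assumption that each scenario's feature vector is L2-normalized before concatenation: the concatenated vector is fed to a linear SVM (the paper's classifier is LIBLINEAR [55]; scikit-learn's `LinearSVC` is used here as a stand-in, and `combine` is an illustrative name).

```python
import numpy as np
from sklearn.svm import LinearSVC

def combine(fc_feat, conv_feat):
    """Concatenate the L2-normalized features from scenarios (I) and (II)."""
    fc_feat = fc_feat / (np.linalg.norm(fc_feat) + 1e-12)
    conv_feat = conv_feat / (np.linalg.norm(conv_feat) + 1e-12)
    return np.concatenate([fc_feat, conv_feat])

# Train a linear SVM on the combined features:
# X = np.stack([combine(f1, f2) for f1, f2 in zip(fc_feats, conv_feats)])
# clf = LinearSVC(C=1.0).fit(X, labels)
```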
6. Discussion
- The features extracted from both the FC layers and the convolutional layers have a stronger representative ability than low-level hand-crafted features. Combined with a simple linear classifier, these CNN features achieve remarkable performance, which also reveals that deep CNNs trained on large natural image datasets generalize well to HRRS datasets.
- As shown in Figure 7, CNN features from very deep layers (e.g., the fifth convolutional layer) can be inverted into recognizable reconstructed images. This result indicates that even at a very deep layer, the extracted CNN features still preserve rich useful information, sufficient for describing images.
- For HRRS scene datasets, owing to their more generic nature, CNN features from the first FC layer consistently work better than those from the second FC layer that are widely used in many works. Moreover, the data augmentation technique is verified to be beneficial for the final classification performance.
- The choice of pre-trained CNN model also influences the final performance. Overall, the VGG-S model performs best among the eight evaluated CNN models when extracting FC features, and the VGG-M model, which balances accuracy and computational cost, is a better choice when extracting dense convolutional features. Moreover, PlacesNet, which is specially trained on a large natural scene dataset and has achieved impressive results on many natural scene benchmarks, performs considerably worse than the other CNN models on the HRRS scene datasets. This result indicates that considerable differences exist in the structural and textural patterns between natural scenes and HRRS scenes.
- Even a very elementary feature coding approach, e.g., the BOW, can achieve performance competitive with the best when the features are good enough, e.g., the proposed multi-scale dense convolutional features. The IFK generally outperforms the other three feature coding methods, particularly with features extracted from lower-level convolutional layers.
- In scenario (I), when extracting features from the FC layers, we must first resize the image scenes to the required input size of the pre-trained model, whereas in scenario (II) we can directly extract dense convolutional features from images of any size. Scenario (I) suffers considerable information loss from resizing when the input image is much larger than the required size. Hence, the dense convolutional features are more suitable than the FC-layer features for HRRS scene datasets composed of large images.
- Features extracted under the two scenarios are complementary to some extent, and thus we can further improve classification performance by combining the feature representations of the two scenarios.
7. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Rogan, J.; Chen, D. Remote sensing technology for mapping and monitoring land-cover and land-use change. Prog. Plan. 2004, 61, 301–325. [Google Scholar] [CrossRef]
- Jaiswal, R.K.; Saxena, R.; Mukherjee, S. Application of remote sensing technology for land use/land cover change analysis. J. Indian Soc. Remote Sens. 1999, 27, 123–128. [Google Scholar] [CrossRef]
- Shao, W.; Yang, W.; Xia, G.S. Extreme value theory-based calibration for multiple feature fusion in high-resolution satellite scene classification. Int. J. Remote Sens. 2013, 34, 8588–8602. [Google Scholar] [CrossRef]
- Cheriyadat, A. Unsupervised Feature Learning for Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 439–451. [Google Scholar] [CrossRef]
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
- Xia, G.S.; Yang, W.; Delon, J.; Gousseau, Y.; Sun, H.; Maitre, H. Structural High-Resolution Satellite Image Indexing. In Proceedings of the ISPRS TC VII Symposium Part A: 100 Years ISPRS—Advancing Remote Sensing Science, Vienna, Austria, 5–7 July 2010.
- Xu, Y.; Huang, B. Spatial and temporal classification of synthetic satellite imagery: Land cover mapping and accuracy validation. Geo-spat. Inf. Sci. 2014, 17, 1–7. [Google Scholar] [CrossRef]
- Yang, W.; Yin, X.; Xia, G.-S. Learning High-level Features for Satellite Image Classification With Limited Labeled Samples. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4472–4482. [Google Scholar] [CrossRef]
- Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 1470–1477.
- Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2169–2178.
- Zhao, L.J.; Tang, P.; Huo, L.Z. Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4620–4631. [Google Scholar] [CrossRef]
- Yang, Y.; Newsam, S. Spatial pyramid co-occurrence for image classification. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1465–1472.
- Chen, S.; Tian, Y. Pyramid of Spatial Relatons for Scene-Level Land Use Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1947–1957. [Google Scholar] [CrossRef]
- Negrel, R.; Picard, D.; Gosselin, P.H. Evaluation of second-order visual features for land-use classification. In Proceedings of the International Workshop on Content-Based Multimedia Indexing, Klagenfurt, Austria, 18–20 June 2014; pp. 1–5.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Xia, G.-S.; Delon, J.; Gousseau, Y. Accurate junction detection and characterization in natural images. Int. J. Comput. Vis. 2014, 106, 31–56. [Google Scholar] [CrossRef]
- Xia, G.-S.; Delon, J.; Gousseau, Y. Shape-based Invariant Texture Indexing. Int. J. Comput. Vis. 2010, 88, 382–403. [Google Scholar] [CrossRef]
- Liu, G.; Xia, G.-S.; Yang, W.; Zhang, L. Texture analysis with shape co-occurrence patterns. In Proceedings of the International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 1627–1632.
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Coates, A.; Ng, A.Y.; Lee, H. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 215–223.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef] [PubMed]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Twenty-Sixth Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. In Proceedings of the International Conference on Learning Representations, CBLS, Banff, AB, Canada, 14–16 April 2014.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014.
- Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 512–519.
- Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1717–1724.
- Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 647–655.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
- Chatfield, K.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Return of the Devil in the Details: Delving Deep into Convolutional Nets. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Penatti, O.A.; Nogueira, K.; dos Santos, J.A. Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 12 June 2015; pp. 44–51.
- Castelluccio, M.; Poggi, G.; Sansone, C.; Verdoliva, L. Land Use Classification in Remote Sensing Images by Convolutional Neural Networks. Available online: http://arxiv.org/abs/1508.00092 (accessed on 14 August 2015).
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
- Zhang, F.; Du, B.; Zhang, L. Saliency-Guided Unsupervised Feature Learning for Scene Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2175–2184. [Google Scholar] [CrossRef]
- Hu, F.; Xia, G.; Wang, Z.; Huang, X.; Zhang, L.; Sun, H. Unsupervised Feature Learning via Spectral Clustering of Multidimensional Patches for Remotely Sensed Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2015–2030. [Google Scholar] [CrossRef]
- Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202. [Google Scholar] [CrossRef] [PubMed]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper With Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Gong, Y.; Wang, L.; Guo, R.; Lazebnik, S. Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 392–407.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 346–361.
- Cimpoi, M.; Maji, S.; Vedaldi, A. Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3828–3836.
- Rumelhart, D.E.; Hintont, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015. [Google Scholar] [CrossRef]
- Zhou, B.; Lapedriza, A.; Xiao, J.; Torralba, A.; Oliva, A. Learning deep features for scene recognition using places database. In Proceedings of the Twenty-eighth Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 487–495.
- Chatfield, K.; Lempitsky, V.S.; Vedaldi, A.; Zisserman, A. The devil is in the details: An evaluation of recent feature encoding methods. In Proceedings of the British Machine Vision Conference, Dundee, UK, 29 August–2 September 2011; pp. 1–12.
- Huang, Y.; Wu, Z.; Wang, L.; Tan, T. Feature coding in image classification: A comprehensive study. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 493–506. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Yang, J.; Yu, K.; Lv, F.; Huang, T.; Gong, Y. Locality-constrained linear coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3360–3367.
- Perronnin, F.; Sánchez, J.; Mensink, T. Improving the fisher kernel for large-scale image classification. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 143–156.
- Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 3304–3311.
- Sheng, G.; Yang, W.; Xu, T.; Sun, H. High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int. J. Remote Sens. 2012, 33, 2395–2412. [Google Scholar] [CrossRef]
- Fan, R.E.; Chang, K.W.; Hsieh, C.J.; Wang, X.R.; Lin, C.J. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 2008, 9, 1871–1874. [Google Scholar]
- Vedaldi, A.; Fulkerson, B. VLFeat: An Open and Portable Library of Computer Vision Algorithms. Available online: http://www.vlfeat.org/ (accessed on 14 August 2015).
- Caffe Model Zoo. Available online: https://github.com/BVLC/caffe/wiki/Model-Zoo (accessed on 14 August 2015).
- Mahendran, A.; Vedaldi, A. Understanding Deep Image Representations by Inverting Them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Avramović, A.; Risojević, V. Block-based semantic classification of high-resolution multispectral aerial images. Signal Image Video Process. 2014, published online, 1–10. [Google Scholar] [CrossRef]
- Kobayashi, T. Dirichlet-based histogram feature transform for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3278–3285.
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).