1. Introduction
The emergence of the satellite Internet of Things (IoT), in which diverse sensing devices are interconnected into one large network through satellite communication, has had a profound impact on remote sensing data processing. Today, with the emergence of new acquisition platforms, smaller and more efficient sensors, and edge computing [1], remote sensing technology is once again on the verge of major technological innovation. Traditionally, remote sensing was a subject of aerial surveying and mapping, geographic information systems, and earth observation, but recent developments have shifted it toward the satellite Internet of Things. Ideally, the continuous streaming data from the interconnected devices on the aggregation platform paints a vivid picture of the world people live in. However, the real world is ever-changing with an enormous amount of detail, whereas the capacity of the remote sensing system is limited. Facing the large volume of hyperspectral data and time-consuming data transmission, computing or caching the data at the edge can effectively reduce the amount of transmission [2], and the satellite IoT can mitigate the latency and bandwidth issues in the data transmission process. The satellite IoT is shown in Figure 1. First, hyperspectral data is collected by satellites and processed through multi-access edge computing, after which the results are sent to the ground. Finally, the data is analyzed through statistics and post-processing to realize data monitoring. As one application of the research in this paper, the restrictions of on-orbit satellite hyperspectral applications can be resolved to a certain extent, laying a foundation for subsequent research on the satellite IoT as well as other hyperspectral image applications.
With a prominent role in hyperspectral image classification, which is the core part of the edge computing process, the attribute profile (AP) [3] uses attributes that can be computed on regions to realize multi-scale analysis of images. The AP is a multi-scale analysis tool that filters the connected components of a gray-level image rather than individual pixels, as morphological attribute filters operate on regions. In addition, when the number of samples is limited, high-quality samples for classifiers can be generated by AP-based algorithms. Due to the high dimensionality of hyperspectral image data, dimension reduction is commonly applied before attribute filtering, which often leads to the loss of spectral information. Stacks of filtered images are called extended attribute profiles (EAPs). As shown in [3,4,5], the spatial information of connected regions at different scales can be modeled by APs. Therefore, multi-level spatial features of images can be created by applying APs in sequence, which makes APs effective spatial features of hyperspectral data. As [6,7] show, when EAPs combined with the original spectral data are used as input samples of a network, the features extracted by the network are better suited for classification, reflecting the great potential of combining EAPs and deep learning. Also, since images can be processed based on different attributes and thresholds computed on the connected components, the AP is a flexible tool. Traditional thresholds are set arbitrarily, and the tuning of the attribute-filter parameters has rarely been studied. In [8], an automatic feature selection method is proposed to tune the thresholds of attribute filters. Dalla Mura et al. [9] show that, for two attributes (area and standard deviation), the automatically selected thresholds differ from the manually selected ones. Using the algorithm in this paper, it is simpler to obtain the threshold when the area attribute is considered.
There are many architectures for classification-related tasks, among which the autoencoder (AE) [10], as an unsupervised learning model, holds one of the most dominant positions. Chen et al. [11] introduced autoencoders into HSI classification. Traditional research on AEs in HSI classification tends to select raw spectral data combined with image patches as the input of the AE network to learn spatial-spectral features. The quality of the extracted deep features has a great impact on classification accuracy [12]; effective feature representation can improve the efficiency of the classifier [13]. Lauzon [14] and Lin [15] proposed that, in those image patches, the spatial information of the center pixel is represented by all the pixels in the region. Before extracting the image patches, since the dimensionality of raw HSI data is high, dimensionality-reduction techniques can be beneficial [16]. However, traditional dimension-reduction methods such as Principal Component Analysis (PCA) [17] and Independent Component Analysis (ICA) [18] tend to cause the loss of spectral information, which further degrades classification accuracy. Cavallaro et al. [6] demonstrated that, after encoding the raw spectral data, features can be classified more effectively.
The classification accuracy can be improved by the pre-trained network obtained with the AE. In addition, its inherent dimension-reducing encoding helps reduce the dimensionality of hyperspectral images, which can further improve classification performance. Besides, the selection of the attribute-filter parameters is a major issue when the profiles are generated. Related studies on this issue can be found in [6,19], but they are time-consuming and difficult to handle. This paper proposes a strategy for selecting the thresholds of attribute filters to construct area attribute profiles, and then encodes the APs with autoencoders for HSI classification. In this method, we focus on the parameter selection used to generate the APs and on the encoding process of the autoencoder. The spatial-spectral features extracted by EAPs, combined with the deep features learned by the autoencoder, yield more effective features for classification. The framework proposed in this paper can also be applied to other domains, such as the Internet of Vehicles [20].
Compared with the state-of-the-art, the main contributions of this study can be summarized as follows:
- (1)
Spatial-spectral feature extraction. Joint spectral and spatial information is used to alleviate the problems of "same object with different spectra" and "different objects with the same spectrum" in hyperspectral data. The spatial information of hyperspectral data is extracted based on EMAPs in this paper, leading to fuller and more comprehensive extraction of the spatial features of hyperspectral images.
- (2)
Multi-feature fusion. A multi-feature hyperspectral image classification algorithm based on the fusion of deep features and spatial-spectral features is proposed. A stacked autoencoder is used to extract deep features from the training samples.
2. Related Work
The introduction of the AP aims to make full use of the spatial information in hyperspectral images, but spatial features alone have a limited ability to represent hyperspectral images. It is therefore necessary to fuse multiple features to improve classification accuracy, which motivates the AE-based structure adopted in this paper.
2.1. Attribute Profile
In order to alleviate the problems of "same object with different spectra" and "different objects with the same spectrum" in hyperspectral image classification, and to reduce the probability of misclassifying edge pixels, spatial features are introduced into the classification features. To make full use of the spatial information in hyperspectral images, the AP is used in this paper to extract spatial information at multiple scales. The concept of the AP is based on morphological profiles (MPs), which are constructed by the repeated use of openings and closings by reconstruction with a structuring element (SE) [21]. The MP has some limitations due to the properties of the SE; to overcome them, the morphological AP has been proposed. The AP can analyze many geometric attributes, such as area, standard deviation, and the diagonal of the box bounding the regions, so that various kinds of spatial information can be obtained according to different attributes.
More specifically, APs rely on morphological attribute filters (AFs), since an AP is obtained by applying AFs with a set of thresholds [3]. AFs process connected components by either keeping or merging them. The decision on each region is given by a threshold test that evaluates whether a given attribute computed on the connected component is greater or lower than the reference value [22]. If the criterion is not satisfied, the region is merged with the adjacent region having the closest gray-level value (either greater than or equal to that of the evaluated region). In general, features of the connected component on which the AFs are applied are compared with the given threshold.
The set of thresholds can be set manually or predicted by an algorithm. Manual thresholds are calculated based on statistics and selected in a trial-and-error way [3,19], while automatic predicates are computed from the attribute values themselves [8]. The classification accuracy obtained from automatic prediction may be slightly lower, but the automatic method was chosen for its universality in satellite applications.
In this paper, the predicates represent a set of thresholds predicated on the values of an image attribute. More formally, given an ordered set of $L$ predicates $\{P_1, P_2, \ldots, P_L\}$ (with $P_1 \le P_2 \le \cdots \le P_L$), let $\gamma^{P_i}$ and $\phi^{P_i}$ denote the attribute thinning and thickening operations, respectively. An AP of a gray-level image is defined as in (1),
$$AP(f) = \{\phi^{P_L}(f), \phi^{P_{L-1}}(f), \ldots, \phi^{P_1}(f), f, \gamma^{P_1}(f), \ldots, \gamma^{P_L}(f)\}, \quad (1)$$
where $f$ represents the original gray-level image, $P_i$ represents the different predicates, $\phi^{P_i}(f)$ represents the image after the thickening operation with predicate $P_i$, and $\gamma^{P_i}(f)$ represents the image after the thinning operation with predicate $P_i$, respectively. Note that the sequence of thinning transformations is taken with the predicates in increasing order, while the thickening transformations refer to decreasing order; that is, progressively stricter criteria lead to progressively coarser images. When $i = 0$, $\phi^{P_0}(f) = \gamma^{P_0}(f) = f$.
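As an illustration of how such a profile might be computed in practice, the following sketch builds an area AP with scikit-image, using area openings and closings as stand-ins for attribute thinning and thickening with the area attribute; the input image and the threshold values are placeholders, not those used in this paper.

```python
import numpy as np
from skimage.morphology import area_closing, area_opening

def area_attribute_profile(f, thresholds):
    """Stack of thickenings (decreasing thresholds), the original image, and
    thinnings (increasing thresholds), i.e. 2L + 1 images for L thresholds."""
    thresholds = sorted(thresholds)
    thickenings = [area_closing(f, area_threshold=t) for t in reversed(thresholds)]
    thinnings = [area_opening(f, area_threshold=t) for t in thresholds]
    return np.stack(thickenings + [f] + thinnings, axis=0)

# Placeholder principal component; real PCs would come from PCA on the HSI cube.
pc1 = (np.random.rand(64, 64) * 255).astype(np.uint8)
ap = area_attribute_profile(pc1, thresholds=[50, 100, 500])
print(ap.shape)  # (7, 64, 64) for L = 3 thresholds
```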
Figure 2 shows an example of an AP formed by attribute filtering on one of the principal components (PCs) obtained by applying PCA to the hyperspectral data. Different images can be obtained by using different predicates on the original PC. Therefore, an AP is a stack of thickening and thinning profiles, and the original image $f$ can be regarded as level zero of both the thickening and thinning profiles. Given the original image $f$ as input, after attribute filtering the AP consists of $2L + 1$ output images (including $f$ itself). In order to extend the AP to spatial-information extraction from hyperspectral images, the concept of the EAP was proposed. The EAP is extracted on the first $m$ principal components transformed from the HSI data; the attribute-filtered PCs constitute the extended AP (EAP). More formally, let $g = \{PC_1, PC_2, \ldots, PC_m\}$ denote the $m$ PCs; the process of generating an EAP can be formalized as in (2):
$$EAP(g) = \{AP(PC_1), AP(PC_2), \ldots, AP(PC_m)\}. \quad (2)$$
When two or more attributes are used, we obtain EMAPs. Assuming that $k$ attributes $a_1, a_2, \ldots, a_k$ are selected, the EMAP can be expressed as in (3):
$$EMAP(g) = \{EAP_{a_1}(g), EAP'_{a_2}(g), \ldots, EAP'_{a_k}(g)\}, \quad (3)$$
where $EAP_{a_i}$ is an EAP built with a set of predicates evaluating the attribute $a_i$, and $EAP'_{a_i}$ denotes the same EAP with the original components (the PCs) removed in order to avoid redundant information.
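A minimal sketch of how an EAP (and, by extension, an EMAP) could be assembled is given below, reusing the `area_attribute_profile` helper from the previous sketch; the number of PCs and the thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def extended_attribute_profile(hsi_cube, m=4, thresholds=(50, 100, 500)):
    """EAP: concatenation of the APs built on the first m principal components."""
    H, W, B = hsi_cube.shape
    pcs = PCA(n_components=m).fit_transform(hsi_cube.reshape(-1, B)).reshape(H, W, m)
    aps = [area_attribute_profile(pcs[..., i], list(thresholds)) for i in range(m)]
    return np.concatenate(aps, axis=0)  # shape: (m * (2L + 1), H, W)

# An EMAP would concatenate the EAPs of k attributes, dropping the duplicated
# original PCs from all but the first EAP, as in Equation (3).
```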
2.2. Autoencoder and Classifier
Many methods have been proposed for remote sensing image classification, but given the lack of labeled samples, supervised and semi-supervised methods are not well suited to hyperspectral image classification. Therefore, unsupervised methods are adopted in this paper, among which the stacked autoencoder (SAE), a deep learning structure in common use for hyperspectral image classification, performs well. The most commonly reported paradigm for classification with autoencoders consists of unsupervised pre-training, followed by supervised fine-tuning, and ends with classification, often by a logistic or softmax classifier. The typical autoencoder is a three-layer network consisting of an input layer, a hidden layer, and an output layer; it aims to minimize the reconstruction error and thereby learn a network that captures deep features of the input data. To this end, it encodes the input data to obtain feature data, decodes the feature data to obtain reconstructed data, and then defines a loss function that is optimized until training finishes.
The encoding process from the input layer to the hidden layer is a linear combination followed by a nonlinear activation function. Similarly, the decoding process from the hidden layer to the output layer is also a linear combination followed by a nonlinear activation function. Let $x$, $h$, and $z$ represent the input data, the output of the encoder, and the output of the decoder, respectively; these processes can be formalized as shown in (4) and (5) below:
$$h = f(Wx + b), \quad (4)$$
$$z = f(W'h + b'), \quad (5)$$
where $W$ and $b$ are the encoding weight matrix and bias, $W'$ and $b'$ are the decoding weight matrix and bias, and $f$ indicates the nonlinear activation function. To expand the unsaturated region of the sigmoid activation function, we use the parametric sigmoid, which allows some flexibility in network training. The parametric sigmoid function is defined as in (6) [23], where $x$ is the input and the remaining quantities are parameters and/or hyper-parameters that can be kept either trainable or fixed under different settings; in this paper they are treated as hyper-parameters, with one of them fixed to 1. As an improved version of the sigmoid, the parametric sigmoid makes it easier for the model to learn the training dataset irrespective of easy or hard examples. Besides, in order to simplify the training of the autoencoder, a tied-weights strategy is employed.
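For concreteness, the sketch below shows the encode/decode pass of Equations (4) and (5) with tied weights in PyTorch; a plain logistic sigmoid stands in for the parametric sigmoid of [23], whose exact form is not reproduced here, and the initialization is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAE(nn.Module):
    """One AE layer: h = f(W x + b_enc), z = f(W^T h + b_dec) with tied weights."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hid_dim, in_dim) * 0.01)  # encoder weights
        self.b_enc = nn.Parameter(torch.zeros(hid_dim))
        self.b_dec = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x):
        h = torch.sigmoid(F.linear(x, self.W, self.b_enc))      # Eq. (4)
        z = torch.sigmoid(F.linear(h, self.W.t(), self.b_dec))  # Eq. (5), W' = W^T
        return h, z
```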
There are many distance metrics to evaluate the quality of the reconstruction $z$ of the input $x$, such as the mean squared error (MSE) and the cross-entropy. In this paper, the MSE is chosen as the cost function. Our goal is to minimize the cost function defined as
$$J(W, b, W', b') = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i - x_i \rVert^2, \quad (7)$$
where $N$ indicates the number of training samples. Equation (7) can be solved by the minibatch stochastic gradient descent (MSGD) method.
After pre-training, the output layer of the autoencoder is replaced by a logistic regression (LR) layer. Since LR works in a supervised manner, the network takes the input data together with its label information, and the label is the output of the network. In more detail, the sigmoid function is still the activation function in the LR layer; $h$ is the encoding result and the input of the LR layer, and the probability that $h$ belongs to the $l$th class can be defined as
$$P(y = l \mid h) = \sigma(w_l^{T} h + b_l). \quad (8)$$
The output of the LR layer lies in $[0, 1]$, and the cost function is
$$J = -\frac{1}{N}\sum_{i=1}^{N} \log P(y_i = l_i \mid h_i), \quad (9)$$
where $N$ is the number of input samples and $l_i$ is the true label of the $i$th sample. Equation (9) can also be solved by the minibatch stochastic gradient descent (MSGD) method.
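A possible implementation of this supervised head is sketched below, with `BCEWithLogitsLoss` playing the role of the sigmoid activation plus cross-entropy cost of Equations (8) and (9); the layer sizes and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

hid_dim, n_classes = 64, 9                      # illustrative sizes
lr_layer = nn.Linear(hid_dim, n_classes)        # one sigmoid unit per class
optimizer = torch.optim.SGD(lr_layer.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()                # sigmoid + cross-entropy, cf. Eqs. (8)-(9)

def lr_train_step(h_batch, y_onehot):
    """One MSGD step on a minibatch of codes h and one-hot float labels."""
    optimizer.zero_grad()
    loss = loss_fn(lr_layer(h_batch), y_onehot)
    loss.backward()
    optimizer.step()
    return loss.item()
```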
3. Proposed Method for Spectral-Spatial Features Encoding
Our proposed framework is shown in
Figure 3. It contains two learning stages, which are optimized step by step for different objectives: the former is the training of feature extractors, and the latter is the joint training of the hyperspectral image classifier. At the first stage, a similarity regularization is imposed on each hidden layer of the SAE to learn a discriminative feature space in which homogeneous pixels are mapped close together and inhomogeneous pixels are mapped far apart. At the second stage, an effective classifier is obtained by replacing the reconstruction layer with a softmax layer. The output consists of the class labels of the pixels in the HSI.
There is only one hidden layer in an AE, while the hyperspectral data in this paper contains many bands. If an AE is used to map such high-dimensional input directly through its neurons, training becomes more difficult, the network may fail to converge, and the accuracy of feature learning decreases. The stacked autoencoder (SAE) increases the number of hidden layers on the basis of the AE; its effect is equivalent to stacking several AEs. The SAE can fit the nonlinear relationships in the spectral information of hyperspectral images well, achieving an efficient representation of the image, and its parameters adapt by learning from the image information. It is a deep learning network structure commonly used in hyperspectral image classification.
The outline of the proposed classification strategy is shown in
Figure 4.
The principle of the proposed AP-SAE is shown in Figure 5. Suppose that the proposed AP-SAE consists of $L$ stacked AEs, and that the hidden-layer dimension of the $l$th AE is $d_l$, where $l = 1, 2, \ldots, L$. Let $X = \{x_i\}_{i=1}^{N}$ denote the training set, where $x_i$ is the spectral-spatial feature of the $i$th training sample and $N$ is the total number of training samples. The $l$th AE of the AP-SAE has two parts: one part is an encoder that learns the feature mapping matrix, and the other is a decoder that restores the input of the AP-SAE under the SAM constraint. For the $l$th AE, let $W^{(l)}$ be the weights of the hidden layer, $x^{(l)}$ be the data fed into the $l$th AE, which is the output $h^{(l-1)}$ of the previous layer ($x^{(1)} = x$ when $l = 1$), and $z^{(l)}$ be the reconstruction of the input $x^{(l)}$. The process is formulated as
$$h^{(l)} = f(W^{(l)} x^{(l)} + b^{(l)}), \quad (10)$$
$$z^{(l)} = f(W'^{(l)} h^{(l)} + b'^{(l)}), \quad (11)$$
where $W^{(l)}$ is the weight matrix and $b^{(l)}$ is the bias vector of the encoder to be learned in the $l$th AE, and $W'^{(l)}$ is the weight matrix and $b'^{(l)}$ is the bias vector of the decoder to be learned in the $l$th AE. $f$ is the activation function, for which the parametric sigmoid is used in this method. Besides, given the difficulty of training the stacked autoencoder, a tied-weights strategy is employed.
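The greedy layer-wise pre-training implied by Equations (10) and (11) might look like the following sketch, which reuses the `TiedAE` module defined earlier; hidden sizes, epochs, and learning rate are illustrative, and the discriminant regularizer of Equation (12) is omitted at this point.

```python
import torch

def pretrain_stack(X, hidden_dims, epochs=50, lr=0.01):
    """Greedy layer-wise pre-training: each TiedAE learns to reconstruct the
    codes of the previous layer (Eqs. (10)-(11), reconstruction term only)."""
    layers, data = [], X
    for hid in hidden_dims:
        ae = TiedAE(data.shape[1], hid)
        opt = torch.optim.SGD(ae.parameters(), lr=lr)
        for _ in range(epochs):
            _, z = ae(data)
            loss = torch.mean((z - data) ** 2)   # Eq. (13) for this layer
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            data, _ = ae(data)                   # codes feed the next layer
        layers.append(ae)
    return layers
```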
In various image classification and annotation applications, there are many indices or criteria to evaluate the quality of the approximation from the encoder input to the decoder output, such as the mean squared error (MSE) or the cross-entropy. To achieve fast convergence, each sub-network (the $l$th AE) is trained using the following objective function:
$$J^{(l)} = J_r^{(l)} + \lambda J_d^{(l)}, \quad (12)$$
where $\lambda$ is a trade-off parameter, $J_r^{(l)}$ represents the reconstruction error term, and $J_d^{(l)}$ represents the discriminant regularization term. The first term in (12) is the reconstruction cost between the input data and its corresponding reconstruction, which is calculated by
$$J_r^{(l)} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i^{(l)} - x_i^{(l)} \rVert^2. \quad (13)$$
The second term in (12) is the discriminant regularization term, which encourages the codes of homogeneous pixels to stay close and those of inhomogeneous pixels to stay apart; it is calculated by (14). We integrate (13) and (14) into (12) to obtain the final objective function (15) of the AP-SAE. By optimizing the objective function in (15), a compact and distinctive low-dimensional feature space is obtained that captures similar spatial contexts in the HSI. The stochastic gradient descent method is used to solve Equation (15).
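Since Equation (14) is not reproduced above, the sketch below only illustrates the general shape of the per-layer objective in Equation (12), a reconstruction term plus a weighted regularizer; the pairwise same-class similarity term used here is an assumed stand-in, not the paper's exact formulation.

```python
import torch

def ap_sae_layer_loss(x, h, z, labels, lam=0.1):
    """Reconstruction term plus lam * (assumed) similarity regularizer."""
    recon = torch.mean((z - x) ** 2)                        # Eq. (13)
    same = (labels[:, None] == labels[None, :]).float()     # same-class pair mask
    dists = torch.cdist(h, h) ** 2                          # pairwise squared distances
    reg = (same * dists).sum() / same.sum().clamp(min=1.0)  # assumed stand-in for Eq. (14)
    return recon + lam * reg                                # overall form of Eq. (12)
```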
After pre-training, the reconstruction (output) layer of the autoencoder is replaced by a classification layer. Once all hidden layers of the AP-SAE are pre-trained, the network moves to the second stage, multi-class classifier training. The method first attaches a C-way softmax classification layer on top of the AP-SAE network, where C is the number of land-cover classes, and then trains the network by minimizing the classification error. The softmax classifier is characterized by its weight matrix $W_c$ and bias vector $b_c$.
For a training sample $x_i$, let $\hat{y}_i$ be the output of the softmax classifier and $h_i^{(L)}$ be its input, where $h_i^{(L)}$ is the $L$th (deepest) hidden-layer code of the AP-SAE. The softmax classifier is formulated as
$$\hat{y}_i = \mathrm{softmax}(W_c h_i^{(L)} + b_c), \quad (16)$$
where $\mathrm{softmax}(\cdot)$ is the softmax activation function. The objective function is the softmax cross-entropy loss, which is formulated as
$$J_c = -\frac{1}{N}\sum_{i=1}^{N} y_i^{T} \log(\hat{y}_i), \quad (17)$$
where $Y = \{y_i\}_{i=1}^{N}$ is the label set of the training set $X$, and $y_i$ is the one-hot label vector of the $i$th training sample $x_i$, in which only one element is 1 and the others are zeros.
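A sketch of this second, supervised stage is given below, assuming the `layers` list produced by the earlier pre-training sketch; `CrossEntropyLoss` combines the softmax of Equation (16) with the loss of Equation (17), and the class count, learning rate, and epoch budget are illustrative.

```python
import torch
import torch.nn as nn

def fine_tune(layers, X, y, n_classes, epochs=100, lr=0.01):
    """Attach a C-way softmax layer and fine-tune the whole stack (Eqs. (16)-(17)).
    `y` holds integer class indices; `layers` is the list from pretrain_stack."""
    clf = nn.Linear(layers[-1].W.shape[0], n_classes)
    params = list(clf.parameters()) + [p for ae in layers for p in ae.parameters()]
    opt = torch.optim.SGD(params, lr=lr)
    ce = nn.CrossEntropyLoss()                  # softmax + cross-entropy loss
    for _ in range(epochs):
        h = X
        for ae in layers:
            h, _ = ae(h)                        # forward through the encoders only
        loss = ce(clf(h), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return clf
```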
5. Conclusions
Hyperspectral image classification is of significant value in remote sensing analysis, including the latest trend of the satellite IoT, and can be applied in various scenarios such as crop supervision, forest management, urban development, and risk management. At the same time, continuity of the data, as well as extrapolation across temporal, spatial, and spectral scales, are key components of hyperspectral image classification [29].
Unfortunately, the traditional satellite system faces latency and efficiency issues caused by the gigantic amount of data collected by remote sensors. In a traditional satellite system, the remote sensing data are transmitted back to the ground for processing; the data are forwarded transparently without any processing on the satellite. The latency caused by transmission and ground processing can be decreased greatly if on-board computing is introduced. Besides, with the development of spacecraft, issues related to performing on-board and automatic data computing and analysis, as well as decision planning and scheduling, will figure among the most important requirements. The method proposed in this paper can be adapted to other hyperspectral data with a similar wavelength range and number of spectral channels, so it can be extended to satellite IoT applications. At the same time, due to the spectral similarity of vegetation and the loss of spectral information during dimension reduction, high classification accuracy for some geomorphic types is hard to obtain. This paper proposed an effective HSI classification model named AP-SAE for the edge of the satellite IoT, and the classification accuracy can be significantly improved by our method without obvious efficiency degradation.
Experiments in this paper demonstrate the superiority of the proposed method, but there are also some deficiencies. For example, the determination of the number of middle-layer neurons in the AE lacks generalization ability: at present it is obtained through manual experiments, and proposing an algorithmic framework that determines the neuron configuration at the mathematical level remains an open problem. In future research, it is worthwhile to try this framework in various settings, such as intelligent transportation networks [30], to test its applicability. Datasets with wide variations in volume, velocity, variety, and veracity may lead to different performance of this framework. Moreover, with the upgrade of sensors, processors, and transmitters on satellites, the division of work between edge processing and ground processing should be adjusted intelligently to reach optimal whole-system performance.