Survey On Change Detection Techniques For Multitemporal Aerial Video Sequences
http://doi.org/10.22214/ijraset.2020.5086
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 8 Issue V May 2020- Available at www.ijraset.com
I. INTRODUCTION
Aerial images are captured through various platforms such as Unmanned Aerial Vehicles (UAVs). In remote sensing and surveillance, automating the change detection process for these aerial images is essential, as performing the same task manually is tedious and time-consuming. The main focus of this paper is a discussion of techniques for detecting and analysing changes between two aerial video feeds taken under significantly different time, seasonal and lighting conditions. To obtain the change between two video feeds, we begin by converting the videos into panoramic images using the image stitching techniques described in this paper. This can be performed in two ways: identifying and comparing the frames with maximum overlap, or creating mosaics and comparing the mosaics. This is followed by comparison of the obtained panoramic image pairs to detect changes, if any. Popular change detection techniques such as the conditional multilayer mixed Markov model, intensity-based correlation methods, feature-based comparison methods and machine learning based methods are discussed in this paper. The conditional multilayer mixed Markov model is a probabilistic model that has been employed to detect relevant changes in pairs of aerial images.
The model integrates global intensity, locally computed correlation and contrast features. For optical photographs, pixel-level differences alone are not sufficient to segregate the background from the modified region, as observed in methods such as MLP and Parzen window estimation. The PCA (Principal Component Analysis) approach uses a scalar feature for the same purpose. The k-Nearest Neighbor approach enables updating the changed or unchanged pixel labels produced by a binary change classifier, such as an FFNN pixel-based binary classifier. The DrLIM technique learns a globally coherent non-linear function that maps the data to the output. The Siamese CNN has also been implemented because of its relevant advantages: it produces feature vectors for each corresponding pixel of a pair of images rather than processing one image at a time, which is clearly more efficient than methods that must input one image at a time, and it extracts features directly from image pairs rather than relying on hand-crafted feature vectors. Its efficiency can be increased further by using kNN updating methods. In Section II we discuss the image stitching techniques used to convert a video into an image. The subsequent sections, Section III to Section X, discuss various techniques, such as PCANetwork, Markov models, deep learning and MCRF models, that can be employed to perform change detection on the image pairs. This is followed in Section XI by a discussion of techniques for detecting the objects identified as the source of a change.
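The kNN label-updating step mentioned above can be sketched in a few lines. The feature space, the labels and the choice of k below are illustrative assumptions for the sketch, not the exact formulation used in [11]:

```python
from collections import Counter
import math

def knn_label(pixel_feat, labelled, k=3):
    """Relabel a pixel as changed/unchanged by majority vote among its k
    nearest labelled neighbours in feature space. `labelled` is a list of
    (feature_vector, label) pairs produced by an initial binary classifier."""
    dists = sorted((math.dist(pixel_feat, f), lab) for f, lab in labelled)
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

labelled = [((0.0, 0.0), "unchanged"), ((0.1, 0.0), "unchanged"),
            ((5.0, 5.0), "changed"), ((5.0, 6.0), "changed")]
print(knn_label((0.2, 0.0), labelled))  # majority of 3 nearest -> "unchanged"
```

A noisy single-pixel decision can thus be smoothed by the labels of its nearest neighbours, which is the sense in which kNN "updates" a binary classifier's output.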
II. IMAGE STITCHING
In image registration, an image is transformed into the coordinate system of a template image, and the coinciding parts of the two images are aligned so that the images can be fused. This is the main aspect of image stitching, and it involves three steps: feature extraction, feature matching and determination of the transformation matrix. To improve the speed and accuracy of the matching process, only similar feature points are compared, with the Euclidean distance serving as the similarity measure.
One image serves as the template; for each of its feature points, the closest and second-closest feature points in the other image are found, and the ratio of their similarity measures is computed. If this ratio falls within a specific threshold, the feature points are considered correctly matched. Although the SURF algorithm can extract feature points from the entire image, feature points are extracted only from the overlapping parts of the images to reduce computation.
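The ratio test just described can be sketched as follows, using toy descriptor tuples in place of real SURF descriptors (the descriptors, the 0.8 threshold and the function names are illustrative assumptions):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two descriptor vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_match(template_desc, target_desc, ratio=0.8):
    """Match each template descriptor to its nearest neighbour in the target
    image, keeping the match only if the nearest/second-nearest distance
    ratio is below the threshold. Assumes target_desc has >= 2 entries."""
    matches = []
    for i, d in enumerate(template_desc):
        dists = sorted((euclidean(d, t), j) for j, t in enumerate(target_desc))
        nearest, second = dists[0], dists[1]
        if second[0] > 0 and nearest[0] / second[0] < ratio:
            matches.append((i, nearest[1]))   # (template index, target index)
    return matches

template = [(0.0, 0.0), (5.0, 5.0)]
target = [(0.1, 0.0), (4.0, 4.0), (10.0, 10.0)]
print(ratio_match(template, target))  # [(0, 0), (1, 1)]
```

An ambiguous feature whose two best candidates are nearly equidistant fails the ratio test and is discarded, which is what keeps the matching both fast and accurate.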
To compute the change between two images, a cylindrical projection transformation is essential before the images can be fused. To determine the shift between the images, the source image is moved towards the target image using the set of matching feature points. After the transformation has been calculated, the images can be fused: taking one image as the target and the other as the base, a good fusion between source and target images can be obtained by applying the transformation matrix.
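The cylindrical projection mentioned above can be sketched with the textbook warp formulas; the focal length and principal point below are assumed parameters, not values from the stitching method surveyed here:

```python
import math

def cylindrical_point(x, y, f, cx, cy):
    """Map an image point (x, y) onto a cylinder of focal length f, with
    (cx, cy) the principal point. Standard cylindrical projection:
    x' = f * atan((x - cx) / f), y' = f * (y - cy) / sqrt((x - cx)^2 + f^2),
    both shifted back by the principal point."""
    theta = math.atan((x - cx) / f)
    h = (y - cy) / math.sqrt((x - cx) ** 2 + f ** 2)
    return f * theta + cx, f * h + cy

# The principal point is a fixed point of the projection:
print(cylindrical_point(10.0, 20.0, 500.0, 10.0, 20.0))  # (10.0, 20.0)
```

Points far from the image centre are pulled inward on the cylinder, which is why frames warped this way line up under a simple horizontal shift when the camera pans.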
III. PCANETWORK
This method classifies images based on deep learning. It extracts representative neighbourhood features around every pixel using PCANetwork filters, which serve as convolutional filters. The method suppresses a great amount of speckle noise and is able to produce change maps with less noise.
Fuzzy c-means (FCM) and Gabor wavelets (GW) are used to select pixels from the multitemporal images that have a high probability of being changed or unchanged. Image patches centered at these selected pixels are then used to train the PCANetwork model. Finally, the change map is generated by combining the PCANetwork classification result with the pre-classification result.
The research work described in [8] mainly focused on thresholding, clustering, active contours and Gabor features to identify the pixels of interest from the difference image (DI).
Gabor wavelet representations are used to capture the changed information. When the fuzzy c-means method is combined with a nearest neighbor (NN) approach, the model gives better performance. Every stage of recent deep approaches comprises a convolutional filter bank, a feature-pooling layer and a non-linear processing layer. Restricted Boltzmann machines (RBMs) are quite involved to train when learning the filter bank.
PCANetwork, proposed by Chan, is a deep learning network whose filter banks are taken from PCA filters, and it has shown good performance. The PCANetwork change detection method for SAR images makes the following contributions:
A. PCANetwork is taken as the classification model, the PCA filters are used as the convolutional filters, and the traditional CNN architecture is retained.
B. A pre-classification scheme, inspired by Li's work, generates accurately labelled samples for PCANetwork using GW and FCM.
The proposed method is mainly composed of three steps:
1) Step 1: Log-ratio images are obtained, and Gabor wavelets and the FCM algorithm are used to select pixels that have a high probability of being changed or unchanged.
2) Step 2: Image patches centered at the selected pixels are generated and used to train the PCANetwork model.
3) Step 3: The trained PCANetwork model is used to classify the remaining pixels as either changed or unchanged. The final change map is formed by combining the PCANetwork classification results with the pre-classification results.
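The log-ratio image of Step 1 can be sketched directly; the small epsilon guarding against division by zero is an implementation assumption:

```python
import math

def log_ratio(img1, img2, eps=1e-6):
    """Compute the log-ratio difference image used in SAR change detection:
    a large |log(I2 / I1)| at a pixel marks it as likely changed. Images
    are same-size 2-D lists of non-negative intensities."""
    return [[abs(math.log((b + eps) / (a + eps)))
             for a, b in zip(r1, r2)]
            for r1, r2 in zip(img1, img2)]

di = log_ratio([[1.0, 1.0]], [[1.0, math.e]])
# unchanged pixel ~ 0, changed pixel ~ 1
```

Log-ratio rather than plain subtraction is the usual choice for SAR because speckle is multiplicative, so the ratio cancels much of it.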
a) Experimental Setup: The proposed method is applied to three datasets with different characteristics: the Ottawa, San Francisco and Yellow River datasets.
b) Result and Analysis: On the first dataset, D_MRFFCM produced a PCC value of 98.3%, while the method proposed here gave 98.2%. On the second dataset, the proposed method gave the best performance, with a PCC of 98.94%. The results on the Yellow River dataset were influenced by speckle noise, which prevented good performance and made it difficult to determine the efficiency of the proposed method.
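Taking PCC to mean Percentage of Correct Classification (the usual reading in this literature, assumed here), the quoted figures can be computed from a predicted change map and the ground truth as:

```python
def pcc(pred, truth):
    """Percentage of Correct Classification: the share of pixels whose
    predicted changed/unchanged label (1/0) matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(p == t for rp, rt in zip(pred, truth)
                  for p, t in zip(rp, rt))
    return 100.0 * correct / total

print(pcc([[1, 0], [1, 1]], [[1, 0], [0, 1]]))  # 75.0
```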
Fig 1. Qualitative comparison of the change detection results with the different test methods [5]
On the same dataset, this method provided better results than the CXM method and other state-of-the-art methods. The CXM method, which is also supervised, is compared with this method both qualitatively and quantitatively. The segmentation results of the CXM model were validated against the ground truth: the number of missed changes (changed pixels ignored during segmentation), falsely detected changes (unchanged pixels detected as changed) and the total error (the sum of missed and falsely detected changes) were computed. This method produces a more homogeneous and smoother result than the CXM method, and it requires 10 to 15 seconds per image pair, so in comparison to the CXM method it is more computationally efficient.
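The three error counts used for this validation can be sketched directly from the two binary maps (1 = changed, 0 = unchanged):

```python
def error_counts(pred, truth):
    """Return (missed, falsely_detected, total_error) for a predicted
    change map validated against ground truth: missed = changed pixels
    labelled unchanged; falsely detected = unchanged pixels labelled
    changed; total error = their sum."""
    missed = sum(t == 1 and p == 0 for rp, rt in zip(pred, truth)
                 for p, t in zip(rp, rt))
    false_det = sum(t == 0 and p == 1 for rp, rt in zip(pred, truth)
                    for p, t in zip(rp, rt))
    return missed, false_det, missed + false_det

print(error_counts([[1, 0]], [[0, 1]]))  # (1, 1, 2)
```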
Another method for segmenting images is clustering. The fuzzy c-means (FCM) algorithm preserves information better than hard clustering.
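The advantage over hard clustering is that every pixel keeps a graded membership in each class rather than a single hard label. A one-dimensional sketch of the FCM membership computation (the fuzzifier m = 2 and scalar features are illustrative assumptions):

```python
def fcm_memberships(x, centers, m=2.0):
    """Soft membership of a scalar sample x in each cluster centre,
    using the standard FCM formula u_i = 1 / sum_j (d_i / d_j)^(2/(m-1))."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:                        # sample sits exactly on a centre
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]

# A pixel midway between the 'unchanged' and 'changed' centres keeps
# half its membership in each class instead of a forced hard label:
print(fcm_memberships(0.5, [0.0, 1.0]))  # [0.5, 0.5]
```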
The proposed model is compared with two other models:
1) The Gaussian KI-based method (GKI), which applies the generalized Kittler and Illingworth minimum-error thresholding algorithm to the test statistic images to determine the changes.
2) RFLICM, a modified and reformulated version of FLICM (the fuzzy local information c-means clustering algorithm).
a) Proposed Model: A deep architecture for change detection. First, pre-classification is performed to obtain some labelled examples. Second, RBMs are used to pre-train a deep neural network, which is then fine-tuned. Third, the trained deep neural network is used to classify the DI: the neighborhood features of each pixel in the DI are taken as input, and a logistic output indicates whether the pixel has changed.
b) Experimental Study: Three datasets were adopted to test the proposed algorithm: the Ottawa, Bern and Yellow River datasets.
c) Results: On the Ottawa dataset, the GKI method produced noise due to the need to search for an optimal threshold; RFLICM gave a better final map, but with missed detections. The proposed DNN algorithm gave good performance, makes no assumptions about the model or the data distribution, and has a strong ability to learn complicated functions. Although the DNN does not achieve the lowest FP and FN, it balances them better. On the Bern dataset, the final map generated by GKI was noisy; the RFLICM model was less sensitive to noise, but many changed areas were detected as unchanged. The proposed DNN performed well on both the final map and the tabulated metrics. On the Yellow River dataset, where the changed areas are relatively small, GKI did not perform well and the final map obtained by RFLICM gave false alarms because of noise, but the proposed DNN completed the detection task best, yielding 76.51%, much higher than the 59.98% of RFLICM and the 30.26% of GKI.
d) Conclusion: The deep learning network has the ability to learn features and represent them in an abstract way. It is not necessary to model the data, and the results are robust to noise. Compared with the clustering method RFLICM and the thresholding method GKI, the proposed method exhibits good performance.
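The per-pixel input and output of the proposed model above can be sketched as follows: a flattened neighbourhood of the DI feeds a logistic output unit. The neighbourhood size, zero padding and the single linear layer standing in for the trained DNN are illustrative simplifications:

```python
import math

def neighborhood(img, i, j, k=1):
    """Flatten the (2k+1) x (2k+1) neighbourhood of pixel (i, j) of a 2-D
    list, padding out-of-range positions with 0 — this is the input vector."""
    h, w = len(img), len(img[0])
    return [img[r][c] if 0 <= r < h and 0 <= c < w else 0.0
            for r in range(i - k, i + k + 1)
            for c in range(j - k, j + k + 1)]

def logistic_change(features, weights, bias):
    """Logistic output unit: probability that the centre pixel has changed."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

di = [[1, 2], [3, 4]]
feats = neighborhood(di, 0, 0)          # 9 values, corner zero-padded
prob = logistic_change(feats, [0.0] * 9, 0.0)  # untrained weights -> 0.5
```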
For training, the parameters are estimated by the standard maximum-likelihood approach: given the labelled images, the parameters are chosen so that the log-likelihood is maximized. Since labelled images are provided, this is a supervised learning technique. Feature extraction is one of the main steps in the whole process. It is done using existing feature operators, first dividing the input images into many non-overlapping 16x16-pixel blocks. To capture the general statistics of the blocks, five different kinds of multiscale texture features are applied. Instead of considering the entire image, the data in multiple local windows is used to compute an attainable feature vector for each block.
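The block division step can be sketched as follows; discarding partial border blocks is an assumption of this sketch, not necessarily the paper's handling:

```python
def blocks(img, size=16):
    """Split a 2-D list image into non-overlapping size x size blocks,
    returned as a grid indexed [block_row][block_col], discarding any
    partial blocks at the right and bottom borders."""
    h, w = len(img), len(img[0])
    return [[[row[c:c + size] for row in img[r:r + size]]
             for c in range(0, w - size + 1, size)]
            for r in range(0, h - size + 1, size)]

img = [[0] * 32 for _ in range(32)]
grid = blocks(img)   # a 2 x 2 grid of 16 x 16 blocks
```

Each block then yields one texture feature vector, so classification is per block rather than per pixel.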
Aerial images and satellite images were used as the test dataset. Ground truth was obtained by manually labelling every block as urban or non-urban. The performance of the algorithm degraded with poor image quality, but adequate results were obtained on normal-quality images. To sum up, a single CRF model yields one feature, which is insufficient to distinguish between non-urban and urban areas, whereas the MCRF ensemble model integrates several and gives very favorable results.
(a) Initial input image (b) Input image with change (c) Change detected output [7]
XI. CNN
CNNs are mainly used to analyze data and have a wide range of applications in image processing. Convolution is a mathematical operation that is essentially linear; it replaces matrix multiplication in one or more of the layers, hence the name convolutional neural network. In [9] the main focus is on comparing two image patches. This helps in solving many Computer Vision and image analysis problems, ranging from low-level tasks such as motion estimation and panorama construction to high-level tasks such as object recognition and image retrieval. The approach builds a patch similarity function without using any manually designed features; such functions are represented using a deep CNN. The paper describes three basic models: i) Pseudo-Siamese, ii) Siamese and iii) 2-Channel. In the Siamese model, one patch is fed as input to each of the two branches, and a series of convolutional, ReLU and max-pooling layers is applied; the concatenated outputs of the two branches are given to a top network. The branches of the Siamese network can be considered descriptor computation modules, and the top network represents the similarity function. The 2-channel model takes the patches directly for comparison rather than computing a descriptor; it is more flexible and faster to train. For evaluating the models, the authors use an evaluation protocol that produces ROC curves by thresholding the distance between feature pairs. According to the experiments, the 2-channel model performed better than the rest and adequately outperformed the state-of-the-art. Compared to existing state-of-the-art systems, the Siamese-based architecture also performed better. To be precise, the two-stream network had the best performance, which shows the significance of multi-resolution information. The Siamese architecture is shown in Fig. 2a and Fig. 2b.
Fig 2b: Four branches resulted from processing of each stream that are fed as input to top decision layer [9]
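The weight-sharing idea behind the Siamese branches can be sketched minimally: the *same* weights process both patches, and the distance between the two resulting descriptors measures similarity. A single linear layer stands in for the convolutional branch here, which is an illustrative simplification, not the architecture of [9]:

```python
import math

def branch(patch, weights):
    """Shared descriptor branch: both patches pass through these same
    weights — this weight sharing is what makes the network Siamese.
    Here the branch is one linear layer (rows of `weights`) for brevity."""
    return [sum(p * w for p, w in zip(patch, row)) for row in weights]

def siamese_distance(patch_a, patch_b, weights):
    # Small distance between the two descriptors => similar patches
    fa, fb = branch(patch_a, weights), branch(patch_b, weights)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

w = [[1.0, 0.0], [0.0, 1.0]]            # toy 2x2 shared weights
print(siamese_distance([1.0, 2.0], [1.0, 2.0], w))  # 0.0 (identical patches)
```

A 2-channel model would instead stack the two patches into one input and learn the similarity end to end, which is why it trains faster but cannot precompute per-image descriptors.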
The model is trained for 25 epochs using 60,000 images. This process takes about 722 to 760 seconds on a TensorFlow CPU system. Training for 25 epochs yielded an accuracy of 96 percent.
XIII. CONCLUSION
Having analyzed various methods of implementing change detection, we conclude that the most efficient and applicable approach is to feed panoramas built from multitemporal video feeds of the same location into a Siamese CNN, combined with kNN methods to improve the performance measures. We intend to use this approach in an aerial surveillance application.
REFERENCES
[1] P. Zhong and R. Wang, “A multiple conditional random fields ensemble model for urban area detection in remote sensing optical images,” IEEE Trans. Geosci.
Remote Sens., vol. 45, no. 12, pp. 3978–3988, Dec. 2007.
[2] T. Celik, “Unsupervised change detection in satellite images using principal component analysis and k-means clustering,” IEEE Geosci.Remote Sens. Lett., vol.
6, no. 4, pp. 772–776, Oct. 2009.
[3] J. Zhao, M. Gong, J. Liu, and L. Jiao, “Deep learning to classify difference image for image change detection,” in Proc. IEEE Int. Joint Conf. Neural Netw., Jul.
2014, pp. 411–417.
[4] Sudharshan Duth P and Swathi Raj, “Object Recognition in Images Using Convolutional Neural Network.”
[5] C. Benedek and T. Szirányi, “Change detection in optical aerial images by a multilayer conditional mixed Markov model,” IEEE Trans. Geosci. Remote Sens.,
vol. 47, no. 10, pp. 3416–3430, Oct. 2009.
[6] P. Singh, Z. Kato, and J. Zerubia, “A multilayer Markovian model for change detection in aerial image pairs with large time differences,” in Proc. IEEE 22nd
Int. Conf. Pattern Recognit., Aug. 2014, pp. 924–929.
[7] J. Liu, M. Gong, K. Qin, and P. Zhang, “A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,” IEEE
Trans. Neural Netw. Learn. Syst., to be published.
[8] F. Gao, J. Dong, B. Li, and Q. Xu, “Automatic change detection in synthetic aperture radar images based on PCANet,” IEEE Geosci. Remote Sens. Lett., vol.
13, no. 12, pp. 1792–1796, Dec. 2016.
[9] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., Jun. 2015, pp. 4353–4361.
[10] R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality reduction by learning an invariant mapping,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Recognit., vol. 2. Sep. 2006, pp. 1735–1742.
[11] Touazi and D. Bouchaffra, “A k-nearest neighbor approach to improve change detection from remote sensing: Application to optical aerial images,” in Proc.
IEEE 15th Int. Conf. Intell. Syst. Design Appl., Dec. 2015, pp. 98–103.
[12] C. Benedek and T. Szirányi, “A mixed Markov model for change detection in aerial photos with large time differences,” in Proc. IEEE 19th Int. Conf. Pattern
Recognit., Sep. 2008, pp. 1–4.
[13] Y. Tan, S. Das, and A. Chaudhry, “An aerial change detection system using multiple detector fusion and AdaBoost classification,” SRI International, 201 Washington Road, Princeton, New Jersey, USA.
[14] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” in Proc. 22nd Int. Conf Multimedia (ACM), 2014, pp. 675–678.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE Int. Conf.
Comput. Vis., Feb. 2015, pp. 1026–1034.
[16] J. Chen and X. Yu, “Research on cylindrical panoramic video stitching and AR perspective observation algorithm.”