1. Introduction
Synthetic Aperture Radar (SAR) offers an all-weather, day-and-night, cloud-penetrating imaging mechanism and has been explored extensively in both academic and applied settings. The development of high-resolution satellite technology has stimulated considerable research on ship target detection in SAR images. Some traditional methods approach ship target recognition via image-processing techniques, while the rapid development of artificial intelligence has triggered a great amount of research on deep-learning algorithms for ship detection in SAR images. Furthermore, objects in aerial images appear with arbitrary orientations, which has stimulated academic research on oriented object detection. Therefore, an increasing amount of SAR ship detection research has adopted deep-learning-based oriented-object-detection algorithms for their reliable and accurate detection performance.
Even though the number of studies of object detection in SAR images has grown rapidly in recent years, obvious drawbacks remain compared with the detection performance achieved on optical aerial images, owing to SAR's special imaging mechanism and image characteristics. In strong-interference, high-sea-state environments within a specific space–time range, numerous high-frequency, high-power, and complex noise interferences from radio equipment lead to difficult communication conditions, signal blocking, and poor SAR image quality. Objects in low-quality SAR images are fuzzy and obscured by multiple interferences and are thus difficult to recognize for both deep-learning models and human perception. Several comparisons between high-quality and low-quality SAR images are presented in Figure 1, including random background noise, strip noise, and azimuth defocus.
Considering the relative scarcity of low-quality SAR images and the difficulty of labeling objects manually, training a fully supervised deep object detection network using only low-quality SAR images is inefficient and unrealistic in practical applications. Therefore, most SAR ship detection tasks adopting deep neural networks are implemented on public SAR ship datasets that contain adequate high-quality images with annotations, an approach that has achieved tremendous success. Nevertheless, when confronted with complicated imaging conditions and environments, detectors struggle with the considerable disparity in feature distributions between normal SAR images and low-quality SAR images acquired in strong interference environments, given that objects in low-quality SAR images are indistinct, with low contrast, random noise, and blurred boundaries. Deep-learning models trained on high-quality SAR images have difficulty identifying these ambiguous targets in low-quality SAR images, so a high proportion of objects is missed by detectors, hurting model performance.
To tackle the dilemma of strong interference in the SAR object detection task, two opposite solutions have been deeply explored and confirmed to be reasonable by extensive research. One approach is SAR image denoising and despeckling, based on both traditional and deep-learning methods, as a data-preprocessing technique. Traditional SAR denoising and despeckling methods focus on frequency filtering and geometric transformation, such as Lee filtering [1], Kuan filtering [2], PPB [3], and SAR-BM3D [4]. In addition, numerous deep-learning strategies have achieved excellent performance in image denoising: Jain [5] utilizes a CNN, while SDA [6] and a sparse denoising autoencoder [7] also realize natural image denoising, and much research [8,9,10] has achieved SAR image despeckling via CNNs. However, these preprocessing methods remain risky for the downstream task since they change the objects' appearance, and they are also time-consuming during inference. The alternative approach is to generate artificial SAR images with the corresponding noise style as a data augmentation strategy to improve the model's robustness and performance. By analyzing the imaging mechanism and environmental characteristics, diverse traditional noise-simulation and image-processing methods have been employed, including image saliency enhancement [11], saturation adjustment [12], and gamma noise simulation [13]. However, these methods are only suitable for a single, specific type of interference and tend to be ineffective in the typical low-quality SAR object detection task, considering its complicated interferences and cluttered backgrounds.
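For reference, gamma noise simulation of the kind mentioned above typically injects multiplicative, speckle-like noise; a minimal NumPy sketch follows. This is a generic illustration with an assumed number of looks, not the exact procedure of [13].

```python
import numpy as np

def add_gamma_speckle(img, looks=4):
    """Multiplicative gamma-distributed speckle: each pixel is scaled by a
    unit-mean Gamma(looks, 1/looks) variate, mimicking L-look SAR speckle."""
    img = img.astype(np.float32)
    noise = np.random.gamma(shape=looks, scale=1.0 / looks, size=img.shape)
    return np.clip(img * noise, 0, 255).astype(np.uint8)
```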
Training with annotated high-quality SAR images while encountering low-quality SAR images with strong interference at inference time can be regarded as a specific domain shift problem, so exploring a domain adaptation method is a reasonable and promising solution. Therefore, an unsupervised domain adaptation method is proposed in this paper to overcome the detection performance deterioration in the low-quality SAR ship detection task. First, an image-to-image translation algorithm based on Generative Adversarial Networks (GANs) is explored to convert high-quality SAR images from the source domain to low-quality SAR images in the target domain. The cycle-consistency loss in CycleGAN [14] keeps the localization and appearance of ship targets invariant, which makes annotation inheritance possible for the downstream object detection task. Second, the generated artificial low-quality SAR images, carrying the original labels of the high-quality images, are added to the training dataset, enriching the samples in the model training process and enhancing the capacity of detectors. Treating the proposed method as an efficacious data augmentation strategy, artificial large-scale nearshore SAR images are generated rigorously and accurately as training images for object detection models, yielding significant progress in detection recall and recognition precision. The main contributions of this paper can be summarized as follows:
A low-quality SAR image generation module is designed for generating SAR images with typical interference by utilizing unpaired image-to-image translation via CycleGAN, which is employed as a data augmentation strategy for the ship target detection task to enrich the training samples of detectors;
A framework for ship detection using SAR images is proposed to effectively learn target features in low-quality SAR images, thereby enhancing the detection performance of ship targets under the condition of strong interference;
Extensive experiments on public SAR datasets show that our method can generate effective SAR images for ship detection tasks while improving the recall and precision of ship targets in low-quality SAR images under strong interference and decreasing classification errors and missing detection.
The remainder of this paper is organized as follows: Section 2.1 reviews previous work related to the proposed method in object detection and image generation; Section 2.2 details the implementation of our method; Section 3 presents comprehensive experiments evaluating the validity of this unsupervised domain adaptation approach for ship target detection in SAR images; Section 4 discusses this work and future research; and Section 5 concludes the paper.
2. Materials and Methods
2.1. Related Works
2.1.1. SAR Ship Detection
Ship target detection has been a fundamental and momentous task in SAR perceptual interpretation for many years. In the early stage, most traditional detection methods consisted of three parts: preprocessing, detection, and discrimination. The CFAR [15] detector triggered a considerable amount of research on the SAR target detection task. The constant false alarm rate detection algorithm leverages an adaptive threshold, determined by the given false alarm rate and the background data distribution, to identify the target region at the pixel level. Furthermore, multiple optimizing strategies [16,17,18,19] improve the statistical model and threshold determination of the original CFAR algorithm for the ship target detection task.
On the other hand, with the rapid development of satellite technologies and artificial intelligence, readily available aerial image resources have stimulated the wide utilization of deep-learning algorithms for remote sensing, including SAR ship detection. Girshick proposes several productive and landmark methods for the object detection task, such as R-CNN [20], Fast R-CNN [21], and Faster R-CNN [22]. Feature extraction, region proposal strategies, and classification and regression benchmarks establish the foundations of detection network architectures. The Region Proposal Network (RPN) provides flexible bounding boxes with multiple aspect ratios, and the Feature Pyramid Network (FPN) [23] maintains multi-scale, high-level features to enhance feature extraction efficiency. The You Only Look Once (YOLO) [24] series has developed into the most influential set of one-stage detectors in recent years. YOLOv3 [25] adopts the Darknet53 backbone, and later versions introduce more effective modules to improve model performance, including Cross-Stage Partial connections (CSP), the SPP block, and PANet.
Turning now to interdisciplinary approaches for Synthetic Aperture Radar, typical feature extraction and fusion modules have been proposed to optimize model performance on SAR images. The attention mechanism has been deeply explored in SAR target detection via elaborate networks: the Convolutional Block Attention Module (CBAM) on feature maps is shown to be beneficial by Cui in the Dense Attention Pyramid Network (DAPN) [26], and Fu [27] balances the feature pyramid under the guidance of attention to better detect small ship targets in complicated backgrounds. Several data augmentation strategies have been proposed to improve model performance and significantly reduce annotation cost in SAR image classification [28] and semantic segmentation [29]. Since ship objects often appear with large aspect ratios and arbitrary orientations, traditional object detection methods, which can only provide horizontal bounding boxes, still exhibit obvious shortcomings. Oriented-object-detection algorithms flourish in remote sensing, given sufficient aerial images and annotations with oriented bounding boxes.
2.1.2. Oriented Object Detection
As a popular topic in computer vision, oriented object detection has been comprehensively researched in terms of both algorithms and applications. Considering the dense distribution and arbitrary orientations of objects in remote sensing, an increasing number of algorithms have been explored to locate rotated bounding boxes that tightly enclose object edges on public aerial and scene-text datasets, including DOTA [30], HRSC2016 [31], DIOR-R [32], ICDAR2015 [33], and so on. The RoI Transformer [34] equips an RRoI learner and Rotated Position-Sensitive RoI Align, with the former learning the transformation from Horizontal RoIs to Rotated RoIs and the latter extracting rotation-invariant features from RRoIs to fulfill the oriented-object-detection task. Subsequently, the Rotation-equivariant Detector (ReDet) [35] incorporates rotation-equivariant networks into the backbone and uses Rotation-invariant RoI Align (RiRoI Align) to extract rotation-invariant features, facilitating an outstanding improvement in detection performance for aerial images with oriented bounding boxes. The gliding vertex method [36] presents an effective representation of oriented bounding boxes to alleviate detection error and confusion problems. The Oriented R-CNN [37] leverages an oriented Region Proposal Network (oriented RPN) and an Oriented R-CNN head to achieve state-of-the-art detection accuracy on two public datasets, obtaining high-quality oriented proposals and refining the oriented Regions of Interest. Tackling three challenges of object detection in remote sensing, namely small objects, cluttered arrangements, and arbitrary orientations, SCRDet [38] offers a feature fusion framework, a supervised multi-dimensional attention method, and an improved smooth L1 loss.
An efficient and fast single-stage detector named R3Det [39] first overcomes the feature misalignment problem for large-aspect-ratio object detection and realizes a SkewIoU loss to estimate object orientations more accurately. Circular Smooth Label (CSL) [40] regards the angle estimation of oriented bounding boxes as a classification task instead of regression. Densely Coded Labels (DCLs) [41] utilize a novel coding mechanism to speed up the training of CSL-based methods and also introduce Angle Distance and Aspect Ratio-Sensitive Weighting (ADARSW) to enhance square-like object detection. GWD [42] tackles boundary discontinuity and inconsistency using an innovative regression loss derived from the Gaussian Wasserstein distance, which converts rotated bounding boxes to 2D Gaussian distributions. Similarly, KLD [43] approximates the SkewIoU loss with a specified distribution distance based on Gaussian modeling, and KFIoU [44] is an easy-to-implement Gaussian-based loss that uses the Kalman filter to achieve full differentiability while mimicking SkewIoU's computing mechanism. In our work, the Oriented R-CNN was chosen as the baseline for its distinguished detection performance on a public SAR dataset, as reported in the literature and confirmed by our own experience.
2.1.3. Unsupervised Domain Adaptation Based on GANs
Multiple studies have investigated the potential of Generative Adversarial Networks (GANs) for domain adaptation problems. GANs have a strong capability for modeling and approximating the data distributions of images and for manufacturing artificial image samples from a specific data distribution. The image-to-image translation task is widely employed in domain transfer for generating images of the target domain. Requiring strictly paired data, Pix2Pix [45] illustrates a conditional generative adversarial network for mapping input to output images at the pixel level and plays a significant role in many classic tasks, such as style transfer. CycleGAN [14] releases the restriction of expensive paired data and dramatically expands the applications of image-to-image translation by introducing a cycle-consistency loss and two mapping frameworks, realizing image-to-image translation between two domains with unpaired samples. Consequently, CycleGAN has stimulated multitudinous derivations across extensive research areas. Arruda [46] adopts CycleGAN for cross-domain car detection by transferring images from light to dark. Pasqualino [47] explores an unsupervised domain adaptation scheme, including CycleGAN, for cultural artwork recognition. A data augmentation method based on CycleGAN is employed for multi-organ detection in CT images by Hammami [48]. Liu [49] also introduces CycleGAN to transfer simulated samples for SAR target classification. More in-depth works upgrade the architecture of CycleGAN according to the characteristics of their downstream tasks. FteGanOd [50] contains a feature translation-enhancement module based on CycleGAN and achieves multi-scale feature fusion for night-time vehicle detection. DE-CycleGAN [51] enhances the color information and sharpness of whole satellite images as well as target features at the object level for weak vehicle detection in aerial images. A physical-model-based CycleGAN [52] combines the physical model of the satellite sensor signal with CycleGAN and removes thin clouds from remote sensing datasets by synthesizing a cloud-free image, a thin cloud thickness map, and its thickness coefficient. In our work, CycleGAN is responsible for learning the mapping of typical interference in SAR images and generating SAR images as training samples for downstream tasks.
2.2. The Proposed Method
This section illustrates the proposed data augmentation strategy for the object detection task based on unsupervised domain adaptation in the following order: unpaired image-to-image translation, low-quality SAR image generation with annotations, and oriented object detection. At the initial stage, the background characteristics of low-quality SAR images are imitated by the generators of CycleGAN via unpaired image-to-image translation. The cycle-consistency loss is adopted to achieve cycle mappings between the data distributions of two domains: high-quality SAR as the source domain and low-quality SAR with strong interference as the target domain. Therefore, in the second step, the translator keeps the geographic and semantic information of scenes invariant and only simulates the appearance of strong interference when converting high-quality SAR images to low-quality SAR images, so that the inheritance of the original annotations (rotated bounding boxes) from the source domain to the target domain is feasible and sensible. In the last step, a state-of-the-art deep-learning-based oriented-object-detection algorithm, the Oriented R-CNN, is explored for low-quality SAR ship detection with multiple traditional data augmentation strategies and the proposed method of generating artificial low-quality SAR images as training samples, which is verified to be beneficial and meaningful for detector performance improvement.
Figure 2 shows the whole structure of the proposed method.
2.2.1. Unpaired Image-to-Image Translation
Unpaired image-to-image translation, realized via CycleGAN with a cycle-consistency loss based on Generative Adversarial Networks, expands the potential of image generation for both academic and industrial applications, since it releases the restriction of requiring paired data. In particular, low-quality SAR images acquired in strong interference environments are relatively unavailable and sparse, and access to paired high-quality and low-quality SAR images of the identical geographic scene is almost impossible. Relying on CycleGAN's capability of cycle-consistent mappings between two data distributions, it becomes feasible to emulate the interference of low-quality SAR images and to generate artificial low-quality SAR training samples with their corresponding annotations for downstream tasks.
The proposed method attempts to simulate the complicated noise in low-quality SAR images and inject it into high-quality SAR images. Consequently, the SAR images are divided into two domains according to the imaging conditions: we treat high-quality SAR images as the source domain $X$ and low-quality SAR images as the target domain $Y$. A data distribution mapping $G: X \to Y$ is learned by CycleGAN, with $G(X) \approx Y$. Furthermore, by iterating on the final objective, which combines the adversarial losses with a cycle-consistency loss, the generator of CycleGAN enforces an inverse mapping $F: Y \to X$, so that this image-to-image translation process is capable of maintaining the semantic information of SAR images at the object level.
The CycleGAN framework, depicted in Figure 2, receives two unpaired datasets from separate domains: low-quality SAR and high-quality SAR. Two generators, $G$ and $F$, and two discriminators, $D_Y$ and $D_X$, are involved in the training process simultaneously. The generator $G$ produces artificial low-quality SAR images with strong interference from high-quality SAR images, and the discriminator $D_Y$ distinguishes whether an image is a real sample or a fake one. By the same token, the generator $F$ and discriminator $D_X$ are responsible for filtering the complex noise in low-quality SAR images and determining the facticity of the input samples.
The whole training objective consists of three parts: the adversarial loss of the mapping $G$ and its discriminator $D_Y$, the adversarial loss of the inverse mapping $F$ and its discriminator $D_X$, and a cycle-consistency loss between the two mappings.
The adversarial loss of the low-quality SAR generation process can be described as follows:

$$\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]$$

Here, $G$ tries to minimize this objective against an adversary $D_Y$ that prefers to maximize it: $\min_G \max_{D_Y} \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)$. Similarly, the adversarial loss of the inverse mapping, $\mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)$, follows the same formulation.
Besides the adversarial losses from the two pairs of generators and discriminators mentioned above, a cycle-consistency loss plays an essential role in the generation framework to guarantee that the learned mappings of the domain shift are cycle-consistent. More specifically, the cycle-consistency condition contains forward and backward cycle consistency, which means that the two image translation architectures should bring every image $x$ or $y$ from its own domain $X$ or $Y$ back to the original image from the intermediate artificial image in the opposite domain, as $x \to G(x) \to F(G(x)) \approx x$ and $y \to F(y) \to G(F(y)) \approx y$ demonstrate. The cycle-consistency loss is expressed as follows:

$$\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\left\| F(G(x)) - x \right\|_1\right] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\left\| G(F(y)) - y \right\|_1\right]$$
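To make the combined objective concrete, the following is a minimal PyTorch sketch of the generator-side loss (adversarial plus cycle-consistency terms). The tiny convolutional stacks stand in for CycleGAN's ResNet generators and PatchGAN discriminators, and the weight λ = 10 is the common CycleGAN default; none of these implementation details are specified in this paper.

```python
import torch
import torch.nn as nn

def make_generator():
    # Toy stand-in for CycleGAN's ResNet generator (illustration only).
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1))

def make_discriminator():
    # Toy stand-in for the PatchGAN discriminator (illustration only).
    return nn.Sequential(nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                         nn.Conv2d(16, 1, 4, stride=2, padding=1))

G, F = make_generator(), make_generator()          # G: X -> Y, F: Y -> X
D_Y, D_X = make_discriminator(), make_discriminator()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def generator_loss(x, y, lam=10.0):
    """Adversarial + cycle-consistency terms for one generator update
    (the discriminator updates, omitted here, use the same GAN terms)."""
    fake_y, fake_x = G(x), F(y)
    # Adversarial terms: generators try to make the discriminators say "real".
    adv = bce(D_Y(fake_y), torch.ones_like(D_Y(fake_y))) + \
          bce(D_X(fake_x), torch.ones_like(D_X(fake_x)))
    # Cycle consistency: F(G(x)) ~ x and G(F(y)) ~ y, both in L1.
    cyc = l1(F(fake_y), x) + l1(G(fake_x), y)
    return adv + lam * cyc

x = torch.randn(1, 1, 256, 256)  # unpaired high-quality SAR patch
y = torch.randn(1, 1, 256, 256)  # unpaired low-quality SAR patch
generator_loss(x, y).backward()
```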
The property of cycle consistency enables CycleGAN to realize style transfer for images in which important semantic details of objects, such as object positions, categories, and the terrestrial background, remain unchanged. For SAR image generation, the trained CycleGAN distinguishes the main difference between the two domains and learns to imitate the strong interference for high-quality SAR images. The invariant geographical scenes and ship targets make the inheritance of bounding boxes from the high-quality SAR domain to the low-quality SAR domain practicable.
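As an illustration of this inheritance step, the sketch below translates each high-quality slice with the trained generator and reuses its rotated-bounding-box label file verbatim; the file layout and helper names are hypothetical, not taken from the paper.

```python
import shutil
from pathlib import Path

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

def generate_augmented_set(generator, src_images, src_labels, dst_dir):
    """Translate each high-quality SAR slice with G: X -> Y and copy its
    rotated-bounding-box annotation unchanged (target geometry is preserved)."""
    dst = Path(dst_dir)
    (dst / "images").mkdir(parents=True, exist_ok=True)
    (dst / "labels").mkdir(parents=True, exist_ok=True)
    generator.eval()
    for img_path in sorted(Path(src_images).glob("*.png")):
        x = to_tensor(Image.open(img_path).convert("L")).unsqueeze(0)
        with torch.no_grad():
            fake_y = generator(x).clamp(0, 1).squeeze(0)
        to_pil_image(fake_y).save(dst / "images" / img_path.name)
        # Positions and orientations are unchanged, so labels transfer verbatim.
        shutil.copy(Path(src_labels) / f"{img_path.stem}.txt",
                    dst / "labels" / f"{img_path.stem}.txt")
```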
2.2.2. Oriented Object Detection
Oriented-object-detection algorithms have been extensively explored in SAR object detection tasks, especially for SAR ship target detection, considering the arbitrary orientations and extreme aspect ratios of densely distributed ship targets in SAR images. The Oriented R-CNN [37], chosen as the baseline of our proposed method, is a state-of-the-art two-stage rotated-object-detection algorithm that is widely applied to aerial images for its promising accuracy and efficiency. Equipped with the oriented Region Proposal Network (oriented RPN), the first stage of the Oriented R-CNN produces accurately oriented proposals simply and efficiently. Built from lightweight fully convolutional networks, the oriented RPN extracts multi-scale features using a five-level FPN. Three types of horizontal anchors with various aspect ratios deliver oriented proposals under a novel representation of oriented bounding boxes, $O = (x, y, w, h, \Delta\alpha, \Delta\beta)$, named the midpoint offset representation (shown at the top of Figure 3), where $(x, y, w, h)$ is the external horizontal rectangle and $\Delta\alpha$, $\Delta\beta$ are the offsets of the top and right vertices from the midpoints of the corresponding sides, so that the coordinates $(v_1, v_2, v_3, v_4)$ of the four vertices of the oriented bounding box can be calculated as follows:

$$v_1 = (x + \Delta\alpha,\, y - h/2), \quad v_2 = (x + w/2,\, y + \Delta\beta), \quad v_3 = (x - \Delta\alpha,\, y + h/2), \quad v_4 = (x - w/2,\, y - \Delta\beta)$$
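A small sketch of that conversion, assuming image coordinates with y increasing downward as in the Oriented R-CNN paper:

```python
def midpoint_offset_to_vertices(x, y, w, h, da, db):
    """Convert the midpoint-offset representation O = (x, y, w, h, da, db)
    of an oriented box to its four vertices. (x, y, w, h) is the external
    horizontal rectangle; da/db offset the top/right side midpoints."""
    v1 = (x + da, y - h / 2)   # top side
    v2 = (x + w / 2, y + db)   # right side
    v3 = (x - da, y + h / 2)   # bottom side, symmetric to v1
    v4 = (x - w / 2, y - db)   # left side, symmetric to v2
    return [v1, v2, v3, v4]

# Example: a 20 x 10 external box centred at (50, 50) with small offsets.
print(midpoint_offset_to_vertices(50, 50, 20, 10, da=3.0, db=2.0))
```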
In the second stage, an Oriented R-CNN head is responsible for the refinement and recognition of oriented Regions of Interest (oriented RoIs). Fixed-size feature vectors are derived using rotated RoI alignment and fed into two fully connected layers, followed by two parallel fully connected layers for the classification and regression tasks, respectively. Figure 3 illustrates the pipeline of the Oriented R-CNN.
For SAR ship detection, dense target distributions and near-shore scenes increase the difficulty of detecting objects precisely. Although the Oriented R-CNN achieves promising results in oriented object detection thanks to its capability of extracting image features and sketching object contours with oriented bounding boxes, when encountering a domain shift problem, i.e., distinct data distributions between training and testing images, its performance degrades sharply, as with other standard oriented-object-detection algorithms. For instance, missed detections and category confusion occur frequently and damage model performance. Various data augmentation strategies have been proposed to improve detector accuracy, but such domain shift problems persist if only simple data augmentation methods such as random crop, random flip, and unsophisticated noise simulations are adopted, leading to limited performance improvements in our experiments on low-quality SAR ship detection. As a result, the proposed method is elaborately designed as a data augmentation strategy for the object detection task to improve the robustness of detectors in strong SAR interference environments, based on CycleGAN, by simulating global interference and imaging characteristics in the generated low-quality SAR images and expanding the training samples for detection models. Exhaustive experiments demonstrate that the proposed method is efficient and credible: the detection performance increases significantly while the missing rate decreases in the SAR ship detection task.
3. Results
This section describes the methodology and datasets involved in the experiments on our proposed method, together with comprehensive experiments analyzing the model's efficiency and potential. Both the generation model and the object detection algorithm are evaluated using standard metrics [53,54], such as the Inception Score (IS), Fréchet Inception Distance (FID), recall, precision, and mean Average Precision (mAP). Various data conditions for training the image-to-image translation model and multiple data augmentation strategies are investigated in our experiments, demonstrating the necessity and significance of the proposed method.
3.1. Datasets
3.1.1. GaoFen-3 Ship Dataset
Thirty scenes of estuaries and rivers from the GaoFen-3 (GF-3) satellite are involved in this dataset, with an azimuth resolution of 1.124 m and a slant-range resolution of 1.700–1.754 m. The original SAR images are preprocessed into uniform 1024 × 1024 slices that contain ship targets. The training set contains a total of 2555 images and 7747 ship targets from 20 scenes, and the remaining 10 scenes are sliced to form the test dataset of 1037 images with 2680 ship targets. Both near-shore and off-shore scenes are explored in the experiments on SAR image generation and oriented object detection. In addition, the low-quality SAR images in the test dataset exhibit strong interference, while the training dataset is derived from a normal, high-quality imaging environment.
3.1.2. SRSDD-V1.0
SRSDD-V1.0 [55] is a high-resolution SAR rotated-ship-detection dataset. All data in this dataset are from the GF-3 Spotlight (SL) mode with a 1 m resolution, and each image has 1024 × 1024 pixels. Inshore scenes occupy 63.1% of the data, with complex backgrounds and much interference, making detection more challenging. The dataset contains 2884 ships in six categories, all annotated with rotated bounding boxes, across a total of 666 images, including 532 training images and 134 test images. Similarly, we introduce typical noise and interference into the testing samples to explore the efficiency of the proposed method for the domain adaptation problem shown in Figure 4.
3.2. Evaluation Criteria
3.2.1. Evaluation Metrics for GANs
An Inception model is employed to evaluate the generated images' conditional label distribution $p(y|x)$, and a higher Inception Score (IS) indicates better quality and diversity of the samples. Images with reasonable semantic information are assumed to yield a conditional label distribution $p(y|x)$ with low entropy, while rich diversity is accompanied by high entropy of the marginal distribution, which is expressed as follows:

$$p(y) = \int_x p(y|x)\, p_g(x)\, \mathrm{d}x$$

The IS for estimating the generation quality and diversity is set as follows:

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\left(p(y|x)\,\|\,p(y)\right)\right]\right)$$
However, the IS demonstrates some disadvantages in the presence of mode collapse, overfitting, and task-specific generation objectives. For instance, in our task of generating low-quality SAR images with strong interference, the ideal samples tend to have a lower IS because of the complicated imaging background. As a result, another standard evaluation metric, the Fréchet Inception Distance (FID), is adopted as the main evaluation criterion for GANs in this paper.
The FID represents the similarity between the real and generated data distributions using the Fréchet distance between two Gaussian distributions and is computed by the following equation:

$$\mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2 + \mathrm{Tr}\left(C_r + C_g - 2\left(C_r C_g\right)^{1/2}\right)$$

Here, $\varphi$ is the Inception network's convolutional feature function, and $\varphi(x_r)$ and $\varphi(x_g)$ are treated as Gaussian random variables with empirical means $\mu_r$, $\mu_g$ and empirical covariances $C_r$, $C_g$.
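Both metrics are straightforward to compute once Inception outputs are available; the sketch below assumes `probs` holds Inception-v3 softmax outputs for generated images and the feature arrays hold pooled Inception activations (these input sources are assumptions, not details given in the paper).

```python
import numpy as np
from scipy import linalg

def inception_score(probs, eps=1e-12):
    """IS from class probabilities p(y|x): exp(mean KL(p(y|x) || p(y)))."""
    p_y = probs.mean(axis=0, keepdims=True)             # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

def frechet_distance(feats_real, feats_gen):
    """FID between two sets of Inception features under a Gaussian fit."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g).real              # matrix square root
    return float(((mu_r - mu_g) ** 2).sum() + np.trace(c_r + c_g - 2 * covmean))
```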
3.2.2. Evaluation Metrics for Oriented Object Detection
The precision $P$ and recall $R$ are defined by the following equations:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

These are the basic metrics of an object detection model, with $TP$, $FP$, and $FN$ referring to True Positives, False Positives, and False Negatives, respectively, and they measure the accuracy of the detectors. As a supplement, the oriented Intersection over Union (IoU) measures the area of overlap between the predicted bounding box $B_p$ and the ground-truth bounding box $B_{gt}$ divided by the area of their union, as shown by the following equation:

$$\mathrm{IoU} = \frac{\left| B_p \cap B_{gt} \right|}{\left| B_p \cup B_{gt} \right|}$$

When the IoU is larger than a given threshold $t$, the predicted bounding box is treated as a correct detection result.
As the most common evaluation metric in object detection, the average precision ($AP$) and mean AP ($mAP$) are measured over all object classes following the PASCAL VOC challenge. $AP$ represents the comprehensive performance of a detector, since a trade-off exists between precision and recall as the detection threshold varies; it can be visualized via the precision–recall curve $P(R)$ and is calculated as follows:

$$AP = \int_0^1 P(R)\, \mathrm{d}R$$

Hence, $mAP$ is the overall average of $AP$ over all $N$ object classes, as shown in the following equation:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
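For reference, a minimal all-point (VOC 2010+ style) AP computation is sketched below; detections are sorted by confidence, the precision envelope is made monotonic, and precision is integrated over recall. The mAP is then simply the mean of the per-class APs. The helper name and example numbers are ours.

```python
import numpy as np

def voc_ap(scores, is_tp, n_gt):
    """All-point AP: sort detections by confidence, sweep the threshold,
    and integrate precision over recall."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (np.arange(len(tp)) + 1)
    # Make the precision envelope monotonically decreasing, then integrate.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    r = np.concatenate([[0.0], recall])
    p = np.concatenate([[precision[0]], precision])
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

# Example: 4 detections, 3 matched to ground truth at IoU > t, 4 GT boxes.
print(voc_ap([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], n_gt=4))  # 0.6875
```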
3.3. Image Generation Experiments
In the first stage of our experiments, the image-to-image translation task aims to imitate the characteristics of SAR interference and trains the generators only with images from two different domains, so no annotations are required. Selecting from the whole original dataset, we build two domain datasets for CycleGAN: high-quality SAR as the source domain and low-quality SAR with strong interference as the target domain. For more elaborate analyses, near-shore SAR images that contain land and off-shore SAR images with only ship targets in the ocean are separated in subsequent experiments to determine the data distribution modeling ability of CycleGAN on SAR images of different scenes.
Due to the high computing resource consumption of Generative Adversarial Networks, the SAR images are resized to 800 × 800 when training, while the size is maintained at 1024 × 1024 in the test phases in order to retain the original semantic information as much as possible for the downstream object detection task.
Inference is performed with the trained CycleGAN generators, and the artificial low-quality SAR images inherit the bounding-box annotations from the high-quality SAR dataset. The quality of the image generation model is evaluated using GAN metrics such as the FID and IS. From our observations and trials, SAR images containing both land and ocean carry richer geographic semantic information, whereas SAR images displaying only a few ship targets against a pure ocean background lack sufficient interference information for the CycleGAN networks. Training with both near-shore and off-shore SAR images makes it easier to develop a cycle-consistent generation model, which leads to competitive generation performance. As a supplement, the unpaired SAR datasets consist of approximately 342, 522, and 864 SAR images per domain for the near-shore, off-shore, and combined scenes, respectively.
As the copy and crop strategies were discarded in order to study the global features of the SAR images, CycleGAN was trained for a total of 80,000 iterations with a batch size of 1 on an NVIDIA GeForce RTX 3090 GPU. Adam optimizers were set for both the generators and discriminators, with a learning rate of 0.0002. The FID and IS metrics were evaluated every 2000 iterations, with the best (highest) IS being 3.4309 at the 8000th iteration for near-shore scenes, and the best (lowest) FID being 49.7250 at the 40,000th iteration when training with both near-shore and off-shore SAR images.
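A sketch of the corresponding optimizer wiring, reusing the G, F, D_X, and D_Y modules from the sketch in Section 2.2.1; the Adam betas are the common CycleGAN defaults and are assumed here, as the paper states only the learning rate, batch size, and iteration count.

```python
import itertools
import torch

opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

for it in range(1, 80_001):          # batch size 1, 80,000 iterations
    ...                              # alternate generator/discriminator steps
    if it % 2000 == 0:
        ...                          # evaluate FID/IS on generated samples
```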
Table 1 shows all the quantified evaluation results of our experiments on the separate subdatasets. Consequently, the generation model with the best FID is chosen to produce low-quality SAR images as extended training samples from the high-quality SAR dataset, considering that the IS metric is comparatively inappropriate for evaluating the quality of generated low-quality SAR images in this task.
Table 1 shows the evaluation results of the SAR image generation task.
The training process of the SAR image generation task on all near-shore and off-shore images is visualized in Figure 5 every 12,000 iterations, from the 4000th to the 40,000th iteration, and the samples shown for visualization are randomly selected.
3.4. Oriented-Object-Detection Experiments
All these experiments were conducted on two NVIDIA GeForce RTX 3090 GPUs for a total of 36 epochs on the GaoFen-3 ship dataset and 120 epochs on the SRSDD-V1.0 dataset, with a batch size of 4. The learning rate was 0.005, and the momentum was 0.9 with a weight decay of 0.0001 on an SGD optimizer. The Oriented R-CNN served as the baseline for our experiments on SAR ship detection with rotated bounding boxes. In addition, random flipping and random cropping were regarded as standard data augmentation strategies. More training epochs were applied to the SRSDD-V1.0 dataset to ensure enough iterations with fewer training samples while keeping the batch size constant.
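For reference, these hyperparameters would look roughly as follows in an MMDetection/MMRotate-style configuration; the paper does not name the training framework, so this layout and the pipeline module names are an assumption.

```python
# Assumed MMDetection/MMRotate-style config fragment matching the stated
# hyperparameters; illustrative only, not reproduced from the paper.
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
runner = dict(type='EpochBasedRunner', max_epochs=36)   # 120 for SRSDD-V1.0
data = dict(samples_per_gpu=2, workers_per_gpu=2)       # batch size 4 on 2 GPUs
train_pipeline = [
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(type='RRandomFlip', flip_ratio=0.5),           # standard random flip
]
```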
3.4.1. Experiments on GaoFen-3 Ship Dataset
When training on the original SAR dataset, the high-quality SAR images dominate the direction of convergence during model training, and most ship targets in low-quality SAR images are ignored by the detectors in the inference phase. To determine the efficiency and significance of our proposed method, multiple data augmentation strategies and noise simulation methods, based on both the normal distribution and GANs, were explored in comprehensive experiments.
On the one hand, several traditional data augmentation methods, including random flipping, random rotation, and Hue Saturation Value (HSV) augmentation, were utilized to improve the detection performance and robustness of the model. The results reveal that common data augmentation strategies can improve detection accuracy to some extent but tend to be ineffective against the domain shift problem. However, after adding the artificial low-quality SAR image samples obtained from CycleGAN to the detector training process, the detection performance improves significantly due to the shrinking gap between the two domains. On the other hand, a common noise simulation method, gamma-distributed noise, was compared with the proposed method, and the evaluation results demonstrate that our method performs distinctly better, owing to the comprehensive interference generation abilities of CycleGAN. Furthermore, multiple oriented-object-detection algorithms were investigated in the expanded experiments to confirm the practicality and robustness of the proposed method as an unsupervised domain adaptation method for the ship target detection task in SAR images. The Oriented R-CNN was selected as the final detection algorithm in this downstream task, establishing a trade-off between model performance, the time and memory consumption of detector training, and model stability. The evaluation results on the recall, precision, and mAP metrics illustrate that other general augmentation strategies achieve only feeble improvements in low-quality SAR ship detection, while our proposed method emerges as an effective approach specially designed for such typical imaging characteristics.
Figure 6 presents some cases of the reduction in missed detections, and Table 2 shows the detection results of our experiments.
Considering that there is a trade-off between precision and recall, especially for detection tasks with only one category, the quantitative evaluation results in Table 2 are not reliable enough on their own to evaluate the detection performance of the proposed method, as a higher recall may spontaneously bring lower precision. Therefore, the precision–recall curves are displayed in Figure 7, which reveals that our methods (R + F + H + C and R + F + H + G + C) outperform the baseline and the other data augmentation strategies.
To ascertain the broad applicability of our proposed approach as an efficacious data augmentation strategy, we performed supplementary experiments utilizing horizontal object detection algorithms, including the YOLO series as one-stage networks and Mask R-CNN as a two-stage network. The details of the evaluation are displayed in Table 3.
Additionally, to avoid any possible bias caused by the particular split of the training and test datasets, cross-validation experiments were conducted on the GaoFen-3 ship dataset using the Oriented R-CNN baseline. The evaluation results shown in Table 4 verify that the proposed method improves model performance across all data divisions and also outperforms the traditional data augmentation strategies, leading to conclusions consistent with those drawn from Table 2.
3.4.2. Experiments on SRSDD-V1.0 Dataset
To authenticate the validity of our proposed method for the cross-domain problem encountered with SAR images, we generated strong interference for a public SAR dataset named SRSDD-V1.0, which contains six categories. It was assumed that a detector well trained on normal SAR images would significantly struggle with testing images exhibiting strong interference, and that artificial low-quality SAR training samples would be beneficial for mitigating such detector deterioration and improving the model's robustness. The evaluation results for each category based on mAP, recall, and precision are shown in Table 5 and exhaustively prove our speculations. During these experiments, the random flip and random rotate data augmentation strategies were abandoned due to their negative effects on the SRSDD-V1.0 dataset. In addition, we present some visual examples of the classification and detection improvements achieved with the proposed method in Figure 8.
4. Discussion
Ship target detection in low-quality SAR images remains challenging due to complicated imaging conditions with strong interference and the scarcity of such images. To improve detection performance under this typical condition, we regard it as a domain shift problem and adopt unsupervised domain adaptation to shrink the gap between the data domains. Generative Adversarial Networks are capable of simulating the image style of low-quality SAR images and generating artificial SAR images with these special characteristics to expand the training samples of detection models. Extensive experiments demonstrate that our proposed method is an efficient data augmentation strategy for improving the performance and robustness of ship target detection in SAR images. We will further discuss how our work compares with other research in the field of computer vision and explore the influence of the unique characteristics of SAR images on our research.
Different from traditional methods that inject noise with a simple statistical distribution into SAR images, our approach demonstrated superior performance in the experiments for several reasons. Firstly, traditional image-processing approaches utilize a limited variety of noises without rigorous consideration of the statistical distributions of real-world noise in specific applications, so data augmentation strategies relying on traditional noise simulation yield only feeble improvements on our task. In contrast, the proposed method, leveraging style transfer for image generation, can more authentically simulate the sophisticated characteristics of the interference in low-quality SAR images. This not only provides our model with greater robustness when faced with low-quality SAR images but also bestows a more potent generalization ability. Secondly, compared to other studies that attempt to improve model performance using popular data augmentation methods from computer vision, our method delves deeper into the characteristics of SAR images. SAR possesses a unique imaging mechanism that results in typical noise and low image quality. Our method not only takes these characteristics into account but also strives to enhance model performance through them. For instance, we harness the features of the complex noise in low-quality SAR images and generate low-quality SAR images via CycleGAN, which has a strong ability to learn the latent data distribution comprehensively. This allows the model to adapt to special conditions during training, improving its applicability to actual scenarios.
We also analyze the generalization ability of our model. The factors influencing a model's generalization ability include the volume of data and the target morphology, among others.
On the one hand, the proposed method uses image translation techniques to create a greater volume of samples, improving the model's generalization ability, particularly on low-quality SAR images with complicated interference. Introducing more data and samples typically implies an increase in annotation volume, whereas the proposed method achieves image style transfer while keeping the target positions consistent, enabling unsupervised, instance-level generation of augmented samples without additional annotation costs. This factor is vital, as annotation, especially for SAR images, can be a tedious and time-consuming task, often requiring specialized knowledge about the target and environment. By adopting our method, more training data are generated without the cost of increased manual labor.
On the other hand, our approach also takes into consideration the various target morphologies that may exist within SAR images. The unsupervised domain adaptation method based on image-to-image translation helps to simulate different levels of quality and noise within the images, providing a more comprehensive set of training data that covers a broader range of target appearances. This further enhances the robustness of our model, enabling it to better generalize and detect targets under varying circumstances.
In summary, our approach considers both the generalization ability of the model and the label cost in the practical application process, providing an efficient solution for target detection and identification in low-quality SAR images.
In the first stage of this work, CycleGAN is responsible for modeling a cycle mapping between two SAR image domains. Both near-shore and off-shore SAR images are involved in the experiments, and we discover that the rich semantic information of various scenes, covering both near-shore and off-shore data, is beneficial for improving SAR image generation performance and training stability. Afterward, multiple training processes for the low-quality SAR image generation task were analyzed and explored, and the shortcomings of Generative Adversarial Networks emerged, such as training instability, overfitting, and mode collapse. Especially for the SAR image generation task, the special physical imaging mechanism, diverse target characteristics, and multiple interferences increase the difficulty of GAN-based generator training. We assume that well-designed modules combining physical imaging theory with deep learning, which are not deeply investigated in our work, would achieve outstanding improvements in the SAR image generation task under designated scenes.
Regarding the final object detection stage, our method is treated as a data augmentation strategy that simulates the typical image style and interference characteristics and enriches the training samples to narrow the domain gap. Comprehensive experiments verify that the proposed method outperforms other traditional data augmentation strategies and works effectively with multiple oriented-object-detection algorithms. On the other hand, a SAR image generation model based on CycleGAN can also convert a low-quality SAR image to a high-quality one, providing denoising and despeckling abilities, so it could be implemented as another data preprocessing method for detection tasks in the future. Considering the inference time consumption of the detection process, we decided not to expand the research and experiments on such preprocessing in this study.
5. Conclusions
Regarding ship target detection in low-quality SAR images with strong interference as a domain shift problem, the proposed method improves the robustness and detection performance of the SAR ship detection model by implementing an unsupervised domain adaptation image-to-image translation task based on Generative Adversarial Networks with a cycle-consistency loss. Artificial low-quality SAR training samples whose imaging style is compatible with a strongly interfering environment are produced via CycleGAN, which is capable of modeling the data distributions of different domains. The mAP of the oriented-object-detection models significantly improved, from 79.9% to 90.9% on the GaoFen-3 ship dataset and from 49.3% to 52.5% on the SRSDD-V1.0 dataset; other evaluation metrics, such as recall and precision, also demonstrate the efficiency of the proposed method through comprehensive experiments with multiple oriented- and horizontal-object-detection algorithms using one-stage and two-stage detectors. Moreover, the pronounced problems of a high missing rate and classification errors under the domain shift were dramatically ameliorated, given that the artificial training samples provide meaningful semantic information for the detection algorithm. In addition, this work investigated the capacity and potential of SAR image generation and style transfer based on Generative Adversarial Networks for near-shore and off-shore scenes.