1. Introduction
Adverse weather corrupts the signal quality of autonomous driving system (ADS) sensors and causes perception failures [1]. During the past decade, with the expansion of weather-related datasets [2,3] and the development of sophisticated machine learning techniques, models designed to address adverse weather problems in autonomous driving, such as precipitation, have been widely studied [4]. Progress has been made in weather models for both LiDAR and camera images, with weather conditions commonly treated as uniformly or normally distributed noise that can be represented by linear or monotone functions [5,6]. Snow poses a tangible threat to these sensors yet remains under-studied. It is further complicated by the irregular snow swirls raised by wind, the motion of the ego vehicle itself, or passing vehicles [7], which cause not only randomly scattered (salt-and-pepper) noise but also swirl clusters in LiDAR point clouds, as shown by the disordered clusters near the center in the red boxes of Figure 1a. Recent work on the snow problem in point clouds has focused on k-d-tree-based neighbor-searching outlier filters [8], and their de-noising performance has almost reached saturation [9]. Nonetheless, few attempts have been made to apply deep-learning-based models to snow conditions. Unlike filters, whose explainability is limited, learning-based models have the potential to grasp both the surface and the hidden features of snow clusters in a given driving scene, and to perform snow synthesis in addition to snow de-noising.
The development of robust weather models benefits from training on paired data, i.e., a pair of weather-corrupted data and clear data with all other elements identical, which is commonly obtained by artificially synthesizing realistic weather effects in previously clear driving scene images [10,11,12]. Such an approach has proven highly effective for rain [13,14], fog [15,16], and snow [17] in camera images, as well as for contamination on the camera lens [18]. However, due to the relatively low data density of point clouds, realizing weather effects in them still largely depends on collections in weather chambers [19,20], as weather data augmentation for point clouds has yet to mature. Although models have been successfully built for point clouds under rain and fog with additional road data [21], the low domain similarity between chambers and real roads still largely limits generality. Moreover, common experimental facilities with controllable precipitation rates can hardly simulate complicated weather conditions such as dynamic snowfall [4]. Therefore, it is necessary to develop methods that work with few paired data, or with unpaired data.
In terms of disentangled data processing, CycleGAN [22] demonstrates a high capability in style conversion and object generation [23] on datasets with different backgrounds and from different domains, and its implementation in weather models has been proven feasible [17,24]. In this research, we propose 'L-DIG' (LiDAR depth images GAN), a GAN-based method using depth image priors for LiDAR point cloud processing under various snow conditions. The native spatial format of a LiDAR point cloud does not align with the planar format of camera images; a depth image, however, stores the third dimension of each point's spatial coordinates in its pixel value and can therefore serve as a 2D representation of the LiDAR point cloud, opening the opportunity to employ GAN-based methods [25]. Our unpaired training datasets are derived from the Canadian Adverse Driving Conditions (CADC) dataset [2], which is known for its authentic winter driving scenarios and snow diversity. The proposed model aims to perform snow de-noising with a deep-level understanding of the snow features related to the driving scene and, inversely, to generate fake snow in clear point clouds, exploring the possibility of creating paired synthetic datasets for point clouds under snow conditions.
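To make the depth image representation concrete, the following is a minimal sketch of the kind of spherical projection that maps a point cloud to a depth image, where each pixel stores the range of the point falling into it. The resolution, FOV values, and function name are illustrative assumptions, not the exact configuration used in L-DIG.

```python
import numpy as np

def pointcloud_to_depth_image(points, h=32, w=1024,
                              fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) point cloud to an (h, w) depth image.

    Each pixel stores the range (distance) of the point that falls into
    it, so the 2D image preserves the third spatial dimension. The FOV
    values here are illustrative (a 40-degree vertical FOV).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8      # range per point
    yaw = np.arctan2(y, x)                         # azimuth, [-pi, pi]
    pitch = np.arcsin(z / r)                       # elevation angle

    fov_up_rad = np.radians(fov_up)
    fov_rad = np.radians(fov_up - fov_down)

    # Normalize angles to [0, 1] and scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w              # column index
    v = (fov_up_rad - pitch) / fov_rad * h         # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    depth = np.zeros((h, w), dtype=np.float32)
    # Write farther points first so the closest point wins per pixel.
    order = np.argsort(r)[::-1]
    depth[v[order], u[order]] = r[order]
    return depth
```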
Furthermore, the quantitative evaluation of LiDAR point cloud processing has always been a tricky task. Researchers typically select a certain number of samples, e.g., 100 frames, and manually determine whether each point is a snow point, in order to calculate the precision and recall of snow noise removal [8,26]. Although straightforward, this approach has two downsides. First, manually annotating a whole dataset consumes an enormous amount of time and manpower, while annotating only a small portion of samples carries the risk of bias. Second, the accuracy of human annotation on point clouds with over 20,000 points per frame can be as low as 85%, which is not sufficient to support the subsequent precision and recall calculations [27]. Given that a LiDAR point cloud fundamentally represents a distribution of points within a three-dimensional space, 3D point clustering algorithms can be effectively used to reflect its physical properties and to assess the performance of point cloud processing specifically on snow points [28].
Among various algorithms, the ordering points to identify the clustering structure (OPTICS) algorithm [29] emerges as a preferred choice due to its exceptional capacity for handling clusters of varying densities [30]. Derived from the DBSCAN (density-based spatial clustering of applications with noise) methodology, OPTICS identifies distinct cluster groups by densely associating data points within a predetermined radius [31]. This principle aligns closely with adaptive filters, which are predicated on neighbor searching and have been widely employed in LiDAR point cloud de-noising in recent years.
An example of the OPTICS clustering result is shown in Figure 1b. Despite the density variation, every point has been classified into separate cluster groups, each distinguished by a unique color, and the environmental structures are also well segmented. The conglomeration of snow swirl points at the lower left of the center is collectively assigned to large blue and purple clusters. Minor snow clusters, such as those immediately to the right of the center, along with the individual scattered snow points spread across the scene, are categorized into smaller, uniquely colored clusters.
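As an illustration of this evaluation idea, the sketch below clusters a point cloud with scikit-learn's OPTICS implementation and computes the kinds of indexes reported later (noise count, cluster count, reachability distances, DBI, and silhouette score). The parameter values are placeholders; the adaptive settings used in our experiments are not shown here.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.metrics import davies_bouldin_score, silhouette_score

def cluster_metrics(points, min_samples=10, max_eps=2.0):
    """Cluster an (N, 3) point cloud with OPTICS and report evaluation
    indexes. min_samples and max_eps are illustrative placeholders."""
    optics = OPTICS(min_samples=min_samples, max_eps=max_eps).fit(points)
    labels = optics.labels_                        # -1 marks noise points

    noise = int(np.sum(labels == -1))
    cluster_ids = [c for c in np.unique(labels) if c != -1]
    sizes = [int(np.sum(labels == c)) for c in cluster_ids]
    finite = np.isfinite(optics.reachability_)
    mean_reach = float(optics.reachability_[finite].mean())

    clustered = labels != -1                       # drop noise for scores
    dbi = davies_bouldin_score(points[clustered], labels[clustered])
    sil = silhouette_score(points[clustered], labels[clustered])
    return dict(noise=noise, n_clusters=len(cluster_ids),
                mean_size=float(np.mean(sizes)),
                mean_reachability=mean_reach, dbi=dbi, silhouette=sil)
```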
The main contributions of this research are described as follows:
We build a deep-learning-based LiDAR point cloud translation model with unpaired data and depth image priors. A new discriminator structure to better remove snow noise, and new loss functions, including depth and SSIM (structural similarity index measure) losses to maintain driving scene integrity, have been designed for the proposed model.
The proposed model performs LiDAR point cloud translation between snow and clear conditions in driving scenes. It demonstrates a certain level of understanding of snow features and achieves both effective snow de-noising and artificial snow point generation, which could help create paired datasets for training or simulation in autonomous driving applications.
We employ the OPTICS clustering algorithm as the quantitative evaluation metric for snow conditions in LiDAR point clouds. We set adaptive parameters to cluster different forms and levels of snow masses, calculate multiple indexes, and present distribution changes in violin plots to comprehensively reflect the snow conditions across the whole dataset.
4. Experiments and Results
4.1. Experiments
We conducted experiments with the trained models under two different conditions: (1) mild snow conditions, i.e., snowfall only, without snow swirls; (2) fierce snow conditions, i.e., both snowfall and snow swirls. We first conducted the experiment on snowfall-only conditions to examine the performance of scattered noise point capture. This less-occluded condition also provides a better opportunity to check how well the original environmental structures are maintained, confirming the model's ability to capture snow accurately. The same experiment was then conducted under conditions with both snowfall and snow swirls, to comprehensively present the model's ability to handle highly adverse conditions.
To guarantee the realism of the snow effect, we utilized the well-known dataset specializing in snow conditions in an autonomous driving context: the CADC dataset [2]. This dataset encompasses over 7000 frames of LiDAR point cloud data gathered during the winter season in Waterloo, Ontario. The driving scenarios span urban and suburban settings, incorporate high- and low-speed conditions, and cover a range of snowfall intensities and heavy snow accumulation situations. The LiDAR system used in the CADC dataset has a vertical FOV of 40°, ranging from +10° to −30°, and a horizontal FOV of 360°.
Training, testing, and data processing were conducted using the PyTorch framework. We initially examined all possible combinations of the number of ResNet residual blocks (ranging from 4 to 9) and the number of downsampling convolutional layers (ranging from 1 to 4), and identified the optimal combination for our model. With other variables kept constant, a combination of four ResNet residual blocks and two downsampling convolutional layers produces the best translation result.
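For reference, a minimal sketch of a CycleGAN-style generator with configurable residual block and downsampling layer counts is shown below. It assumes the standard ResNet generator layout; the channel widths and single-channel depth image input are illustrative, and the default counts simply match the best-performing combination reported above.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard two-convolution residual block with instance norm."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.block(x)

def build_generator(n_res=4, n_down=2, base=64, in_ch=1):
    """CycleGAN-style generator; the counts n_res=4 and n_down=2 match
    the combination found best above (depth images are single-channel)."""
    layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
              nn.InstanceNorm2d(base), nn.ReLU(inplace=True)]
    ch = base
    for _ in range(n_down):                        # downsampling stage
        layers += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                   nn.InstanceNorm2d(ch * 2), nn.ReLU(inplace=True)]
        ch *= 2
    layers += [ResidualBlock(ch) for _ in range(n_res)]
    for _ in range(n_down):                        # upsampling stage
        layers += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2,
                                      padding=1, output_padding=1),
                   nn.InstanceNorm2d(ch // 2), nn.ReLU(inplace=True)]
        ch //= 2
    layers += [nn.ReflectionPad2d(3), nn.Conv2d(ch, in_ch, 7), nn.Tanh()]
    return nn.Sequential(*layers)
```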
Square-shaped samples randomly cropped from the depth images are fed to two NVIDIA RTX 3090Ti graphics cards with a batch size of eight for training. In the second half of the N-layer discriminator training stage, we follow a linearly declining learning rate schedule until the process converges.
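Such a linearly declining schedule can be sketched with PyTorch's built-in LambdaLR; the epoch counts below are hypothetical.

```python
import torch

def linear_decay_scheduler(optimizer, total_epochs=200, decay_start=100):
    """Hold the learning rate constant until decay_start, then decay it
    linearly to zero at total_epochs (epoch counts are hypothetical)."""
    def lr_lambda(epoch):
        if epoch < decay_start:
            return 1.0
        return max(0.0, 1.0 - (epoch - decay_start)
                   / float(total_epochs - decay_start))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```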
The quantitative analysis is conducted on 500 samples under mild snowfall conditions and another 500 samples under fierce snow swirl conditions from the testing dataset. All metrics reported in the following results are averages.
4.2. Results
4.2.1. Mild Snow Conditions
Figure 6 shows the translation results of our model under mild snow conditions, i.e., where the majority of the snow consists of scattered noise points without the snow swirl phenomenon. Sets (a), (b), and (c) show three typical scenarios, with the top row being the original snow scene from CADC, the middle row the de-snowed results, and the bottom row the fake snow results. Each scenario features an overall bird's-eye view (BEV) on the left, while the right shows a magnified third-person view of the point cloud's central region, where the ego vehicle is situated.
As indicated by the red arrows and red boxes, the 'salt-and-pepper' noise points have been largely erased, with key environmental features left unaltered. Essential components, such as vehicles (outlined in green) and pedestrians (highlighted in orange), are not only well preserved but also exhibit a level of point enhancement, as demonstrated in the de-snowed (a) set. Moreover, the road sign enclosed in the red box of (a), which was partially obscured by snow points in the original image, appears better defined, a testament to the deep scene comprehension facilitated by our model.
The quantitative analyses are presented in Table 3. The noticeable reduction in the average noise number, cluster count, and overall reachability distances in the de-snowed results strongly suggests the effectiveness of the de-snowing process. As the majority of the remaining clusters comprise object points and environmental features that are more densely and uniformly packed, the average inter-cluster distances and average cluster sizes naturally increase. This shift in cluster characteristics is a byproduct of fewer, but more meaningful, clusters primarily representing substantive elements of the environment rather than scattered snow points. Similarly, the declines in the DBI (Davies-Bouldin index) and silhouette score are in line with our expectations for the de-snowing process.
In the violin plots of Figure 7, the colored data on the left represent de-snowed data, while the gray data on the right serve as a comparison from the CADC dataset; this arrangement is consistent across all subsequent violin plots. The better evenness of the cluster distributions on the left half of each violin plot reveals the improvement from the de-snowing process, compared to the slightly skewed distributions on the right. This observation is further substantiated by the lower skewness of the de-snowed distributions: for the reachability distances, inter-cluster distances, and cluster sizes, the de-snowed data are consistently less skewed than the CADC data, whose skewness values are 9.64, 0.30, and 21.49, respectively. Note that the median reachability distance of the de-snowed data is slightly higher than that of the snowy data. This small anomaly originates from a few detached clusters at a remote distance after de-snowing, visible as the very few sample points exceeding the upper limit of the y-axis.
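The comparison above can be sketched as follows, computing the skewness of one cluster statistic and drawing the paired violin plot; the variable names are hypothetical, and the plotting details are illustrative rather than those used for the figures.

```python
import matplotlib.pyplot as plt
from scipy.stats import skew

def compare_distributions(desnow, cadc, name):
    """Side-by-side violin plot of one cluster statistic (e.g. cluster
    sizes) for de-snowed vs. CADC frames, with skewness printed."""
    fig, ax = plt.subplots()
    ax.violinplot([desnow, cadc], showmedians=True,
                  quantiles=[[0.25, 0.75], [0.25, 0.75]])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(['de-snowed', 'CADC'])
    ax.set_ylabel(name)
    print(f'{name} skewness: de-snowed={skew(desnow):.2f}, '
          f'CADC={skew(cadc):.2f}')
    return fig
```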
For fake snow generation, as seen in the bottom row of Figure 6, the scattered snow features are noticeably reproduced, with an apparent enhancement, as the number of noise points is even higher than in the original. This is in line with the noticeable increase in cluster number and DBI, as well as the reduction in cluster sizes, presented in Table 3. The artificially generated snow demonstrates a remarkable replication capacity, as evidenced by the highly similar violin plots (left and right) in Figure 8, including the quartile lines. The skewness values (8.87, 0.33, and 22.43) are remarkably close to the aforementioned CADC snow skewness (9.64, 0.30, and 21.49), further attesting to the model's ability to accurately reproduce snow effects.
4.2.2. Fierce Snow Conditions
Figure 9 demonstrates the translation outcomes of our model under intense snow conditions, characterized by the presence of snow swirls around the ego vehicle. Three distinctive scenarios are presented in the same format as in Figure 6. In these harsh conditions, where the snowfall has dramatically increased, it is easy to observe that the vibrantly colored airborne snowdrifts (highlighted in shades of red, green, yellow, and cyan) have been substantially mitigated, as indicated by the red arrows.
Under these severe snow circumstances featuring dense snow swirl clusters, our attention focuses more on noise reduction near the ego vehicle, as indicated by the red boxes, than on entirely eradicating the snow swirls, as the latter could lead to a loss of important environmental elements. We strive for a balance between significant snow removal and effective preservation of objects such as the vehicles shown in the green boxes. Simultaneously, a certain degree of point cloud restoration can also be observed near the central ground rings, which can be credited to the translation model's profound comprehension of the scene.
Table 4 provides a quantitative view of the translation model's performance under extreme snow conditions. Given that the translation affects more clusters spanning the entire scene during heavy snowfall, all metrics shift markedly toward less noise and tidier clustering in the de-snowing task. From Figure 10, we can tell that the shifts in the quartile lines are less prominent, which can be attributed to the fact that snow swirls typically have sizes similar to those of object clusters. Nevertheless, the efficacy of the de-snowing process is evidenced by the smoother and more consolidated distributions in the violin plots. This is further validated by the slightly improved skewness values of the de-snowed data, which stand at 8.87, 0.38, and 28.04, respectively, compared with 10.45, 0.42, and 32.76 for the CADC data.
As can be observed from the bottom row of Figure 9, our model effectively replicates airborne snowdrifts and the snow swirls surrounding the ego vehicle. However, the model exhibits a slight restraint in the point cloud's central region. This results from our strong focus on comprehending the driving scene, which avoids breaking the integrity of a scene obstructed by extremely dense snow swirl clusters. It leads to a somewhat compromised snow imitation, as corroborated by the close statistical outcomes in Table 4, which do not exhibit any significant jumps. Still, the near-symmetrical violin plots in Figure 11 further substantiate the successful emulation of snow effects. The smoother edges of the reachability distance distributions and the more concentrated cluster size distribution of the imitation snow hint at reduced noise at the center. The skewness values of the artificially generated heavy snow are 9.95, 0.42, and 31.75, closely mirroring those of the actual data (10.45, 0.42, and 32.76). This statistical similarity provides additional evidence of our model's effectiveness in synthetic snow generation.
4.3. Ablation Study
To affirm the significance of our model's key components, we conduct an ablation study using the de-snow model under mild snow conditions. This study investigates the impact of removing the pixel-attention discriminator, the SSIM loss, and the depth loss, as well as the performance of the basic CycleGAN. Additionally, we examine a training pair with a considerable domain gap. For this purpose, we select 6000 frames from the LIBRE dataset [1], which was collected under clear conditions in the urban area of Nagoya, Japan. This choice is representative due to the substantial domain disparity between Canada and Japan in terms of scenario layouts and traffic patterns; the CADC dataset contains a large portion of suburban scenarios with fewer buildings and more vegetation, which hardly appear in the LIBRE dataset.
Table 5 presents the results, using our proposed model as a reference.
The absence of the pixel-attention discriminator results in an immediate degradation in performance, as evidenced by the increased noise number and reachability distance. The failure to remove a certain amount of solitary noise points substantiates the importance of the pixel-attention discriminator in de-snowing.
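As a rough illustration of the general idea, the sketch below pairs a PatchGAN-style convolutional branch with a per-pixel (1x1 convolution) branch, so that isolated snow pixels directly influence the real/fake decision. This is an assumption-laden sketch of one plausible design, not the exact discriminator architecture of our model.

```python
import torch.nn as nn

class PixelAttentionDiscriminator(nn.Module):
    """Illustrative only: a patch-level branch plus a pixel-level
    (1x1 convolution) branch; both outputs enter the GAN loss so that
    solitary noise pixels are penalized. Details are assumptions."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.patch = nn.Sequential(               # N-layer (patch) branch
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, padding=1))
        self.pixel = nn.Sequential(               # per-pixel branch
            nn.Conv2d(in_ch, base, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, 1, 1))

    def forward(self, x):
        # Two real/fake maps at different granularities.
        return self.patch(x), self.pixel(x)
```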
More noise points are observed in the scenario without SSIM loss. Apart from the slightly reduced cluster number, the other metrics, especially the elevated reachability distance, indicate a breakdown of structural integrity during the translation process. Since a primary objective of our model is to preserve crucial objects and environmental elements as effectively as possible, this affirms the critical role of the SSIM loss in our model.
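For reference, a simplified differentiable SSIM loss can be written as follows, using a uniform averaging window in place of the usual Gaussian window; the exact SSIM formulation used in our model may differ.

```python
import torch.nn.functional as F

def ssim_loss(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - mean SSIM between two single-channel depth images in [0, 1],
    shaped (N, 1, H, W). Uniform window: a simplification for brevity."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    sigma_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return 1.0 - ssim.mean()
```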
The scenario without depth loss indicates a complete failure in de-snowing, as evidenced by the sharp deterioration of all metrics toward noisy, poor clustering. The cause of this failure lies in the unique properties of depth images, which are highly sensitive to non-linear scale changes during the conversion back to point clouds. Consequently, the depth loss forms the cornerstone of our depth-image-based translation model.
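A minimal sketch of a depth-consistency term of this kind is shown below, written here as an L1 penalty between two depth images (for instance, an input and its cycle reconstruction) so that per-pixel ranges, and hence the recovered point coordinates, keep their metric scale. The actual formulation in L-DIG may differ.

```python
import torch

def depth_loss(pred_depth, ref_depth):
    """L1 penalty on per-pixel range differences between two depth
    images; preserving these values keeps the back-projected point
    cloud at the correct scale. A sketch, not the exact L-DIG term."""
    return torch.mean(torch.abs(pred_depth - ref_depth))
```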
For the basic CycleGAN model, the mediocre statistics indicate a near-total ineffectiveness in point cloud translation, and the model fails to preserve the original scene either. This result underscores the necessity of all the components in our model for achieving successful translation outcomes.
Finally, when trained on datasets with a substantial domain gap, the model does not yield satisfactory de-snowing performance, as suggested by the exceedingly high noise number and reachability distances and the low cluster sizes, at least under the same parameter settings as before. The unjustifiably high noise and cluster numbers are the result of poor clustering, which is corroborated by the exceedingly high DBI. This result, obtained under extreme conditions, confirms our decision to generate unpaired clear data with filters, and it does not necessarily suggest that our model lacks generality. Nevertheless, the model's sensitivity to domain gaps stands as a major limitation of the current translation model.
5. Conclusions
In this research, we introduced a GAN-driven model capable of translating LiDAR point cloud data, encompassing both snow removal and artificial snow generation. Utilizing depth image priors of point clouds, our model was trained on unpaired datasets and supplemented with depth and SSIM loss functions to ensure scale and structural consistency. Furthermore, we crafted a novel discriminator structure with an emphasis on pixels, integrated alongside the original convolution layers, thereby enhancing the snow removal capability in the vicinity of the ego vehicle. Experiments carried out on authentic snow conditions from the CADC dataset revealed a profound comprehension of snow characteristics and driving scenes, as well as strong performance in snow removal and snow reproduction. The 3D clustering results from the OPTICS algorithm and the corresponding violin plots clearly demonstrate the successful translation between snow and clear conditions.
LiDAR point cloud processing under snowy conditions has consistently faced the challenge of lacking reliable snow-affected data. Given the difficulty of acquiring paired or quasi-paired data under both snowy and clear conditions, our current model must strike a balance between the strength of the translation and model stability, which leads to domain sensitivity. Moreover, the limited size of the CADC dataset adds difficulty to training and testing. To address these limitations, our future goal is to develop the capability to generate high-quality paired data under snowy conditions, i.e., to augment LiDAR point clouds with snow based on a deep understanding of the driving scene, while preserving the original state of the scene to the greatest extent possible.