1. Introduction
Lightning is one of the most common discharge phenomena in the atmosphere; statistics indicate that approximately 30–100 lightning events occur globally every second [1]. By spatial location, lightning events are usually classified into two types: intracloud (IC) flashes, also called cloud flashes, and cloud-to-ground (CG) flashes, also called ground flashes. Recorded data show that cloud flashes account for about three-quarters of all lightning events [2]. CG flashes, which discharge from the cloud to the ground, pose a serious threat to human life and can kill people and animals. The pulses generated by IC flashes differ from those generated by CG flashes, so discharge events can be classified from the waveforms of the extremely low-frequency electromagnetic fields they radiate. CG flashes within several tens of kilometers of the observation point usually have similar time-domain characteristics: the initial peak of the waveform has a steep rising edge and a slow falling edge [3,4]. By contrast, most IC pulses are narrower. Many lightning detection networks therefore use the differences between IC and CG electric field waveforms as a basis for distinguishing lightning types. Because discharge events are highly diverse, however, the resulting pulse waveforms often differ significantly from one event to another.
Most lightning detection networks use multi-parameter methods to distinguish CG flashes from IC flashes. These methods typically characterize the electromagnetic field waveform by time-domain features such as the amplitude ratio, rise time, fall time, and zero-crossing time [5,6]. Field validation results, however, show that multi-parameter methods can have low classification accuracy. For example, Fleenor et al. [7] showed that, in a 2005 validation study, approximately 54% of the ICs recorded by the National Lightning Detection Network (NLDN) were incorrectly classified as CG. Leal et al. [8] found that IC flashes with peak currents greater than 50 kA were misclassified as CG by both the NLDN and the Earth Networks Total Lightning Network (ENTLN), and Paul et al. [9] found that 30% of the detected ground flashes were actually cloud flashes. Most lightning detection networks report classification results only for cloud and ground flashes and rarely mention results for special intracloud events such as narrow bipolar events (NBEs). The few studies that do report NBE classification accuracy generally find it low, mainly because of the limitations of multi-parameter classification. For example, for NBEs with peak currents above 20 kA, more than 97% are misclassified by the NLDN and more than 63% by the ENTLN [8]. More precise classification of lightning discharges with multi-parameter methods therefore remains a challenge.
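The time-domain parameters named above have no single standard definition across networks. The sketch below computes one plausible version of each for a single sampled pulse, assuming a 10%–90% rise time, a peak-to-50% fall time, a peak-to-first-zero-crossing time, and an amplitude ratio defined as the largest opposite-polarity overshoot divided by the initial peak. These thresholds and the function name `time_domain_features` are illustrative assumptions, not the definitions used by the NLDN, the ENTLN, or references [5,6].

```python
def time_domain_features(samples, dt_us):
    """Hypothetical multi-parameter feature extractor for one pulse.

    samples : list of field values (arbitrary units); the initial peak is assumed positive
    dt_us   : sampling interval in microseconds
    The threshold definitions below are illustrative assumptions, not a network standard.
    """
    n = len(samples)
    peak_idx = max(range(n), key=lambda i: samples[i])
    peak = samples[peak_idx]

    # Rise time: first sample >= 10% of peak -> first sample >= 90% of peak (leading edge)
    i10 = next(i for i in range(peak_idx + 1) if samples[i] >= 0.1 * peak)
    i90 = next(i for i in range(peak_idx + 1) if samples[i] >= 0.9 * peak)

    # Fall time: peak -> first sample at or below 50% of peak (end of record if never reached)
    i50 = next((i for i in range(peak_idx, n) if samples[i] <= 0.5 * peak), n - 1)

    # Zero-crossing time: peak -> first sample at or below zero (None if it never crosses)
    iz = next((i for i in range(peak_idx, n) if samples[i] <= 0.0), None)

    # Amplitude ratio: largest opposite-polarity overshoot after the peak / initial peak
    overshoot = min(min(samples[peak_idx:]), 0.0)

    return {
        "rise_time_us": (i90 - i10) * dt_us,
        "fall_time_us": (i50 - peak_idx) * dt_us,
        "zero_crossing_time_us": None if iz is None else (iz - peak_idx) * dt_us,
        "amplitude_ratio": abs(overshoot) / peak,
    }


# Toy pulse sampled every 1 µs: fast rise, slower decay, small negative overshoot
pulse = [0, 3, 10, 9, 7, 5, 3, 1, 0, -2, -3, -1, 0]
print(time_domain_features(pulse, dt_us=1.0))
# {'rise_time_us': 1.0, 'fall_time_us': 3.0, 'zero_crossing_time_us': 6.0, 'amplitude_ratio': 0.3}
```

A multi-parameter classifier would then compare values like these against hand-set thresholds for each lightning type.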
Deep learning has become an increasingly important branch of machine learning and has made breakthrough progress in recent years in fields such as video recognition, audio analysis, and medical diagnosis. Machine learning has also been applied to lightning waveforms: one study used a support vector machine (SVM) to classify extremely low-frequency waveforms of cloud and ground flashes with an accuracy of 97% [10]. Neural networks have since significantly improved lightning waveform classification. Wang et al. [11] applied one-dimensional convolutional neural networks (CNNs) to lightning signal classification and classified 10 types of lightning signals with an overall accuracy of 98%; however, recognition was slow and required high-end hardware, making it difficult to meet the needs of real-time classification. Multi-parameter methods, on the other hand, recognize signals faster than CNNs and similar methods but struggle to achieve high accuracy. Operational applications therefore need a method that balances classification accuracy and speed. Autoencoders were developed as early as 1980 [12]. An autoencoder converts complex high-dimensional data into a low-dimensional encoding, and autoencoders have been applied in many fields: Mak et al. applied variational autoencoders to game design [13], Kapoor et al. improved image detection accuracy by merging multi-layer features through bottleneck structures [14], Guo et al. achieved efficient compression of lightning signals with stacked autoencoders [15], and Ling et al. improved the accuracy of lightning prediction by exploiting encoder-decoder structures [16]. This article proposes a classification method based on convolutional encoding features. It combines the strong feature extraction ability of CNNs with the structure of an encoder, which compresses its input into low-dimensional features, and uses these features to classify extremely low-frequency lightning waveforms.
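To make the idea of convolutional encoding features concrete, the following is a minimal sketch of an encoder forward pass in plain Python: a 1-D convolution layer with ReLU and max pooling, followed by a linear bottleneck that maps a sampled waveform to a short code. The layer sizes, the 600-sample input, the code length of 8 (chosen to echo the 8-bit encoding features reported in the Conclusions), and all function names are assumptions for illustration, and the weights are random placeholders. In practice the encoder would be trained, for example as the encoder half of a convolutional autoencoder, and this sketch does not reproduce the architecture used in this article.

```python
import random

def conv1d(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in most deep-learning libraries)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

def max_pool(v, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

def make_encoder(n_in, n_kernels=4, k=5, pool=4, code_len=8, seed=0):
    """Hypothetical encoder with random placeholder weights (a real one would be trained)."""
    rng = random.Random(seed)
    kernels = [[rng.gauss(0.0, 0.3) for _ in range(k)] for _ in range(n_kernels)]
    flat_len = n_kernels * ((n_in - k + 1) // pool)   # length after conv + pooling
    proj = [[rng.gauss(0.0, 0.1) for _ in range(flat_len)] for _ in range(code_len)]
    return kernels, proj, pool

def encode(x, encoder):
    """Waveform -> conv -> ReLU -> max-pool -> linear bottleneck -> low-dimensional code."""
    kernels, proj, pool = encoder
    flat = []
    for kern in kernels:
        flat.extend(max_pool(relu(conv1d(x, kern)), pool))
    return [sum(w * f for w, f in zip(row, flat)) for row in proj]

# Usage: a synthetic 600-sample pulse mapped to an 8-value code
waveform = [0.0] * 600
waveform[150] = 1.0
enc = make_encoder(n_in=len(waveform))
code = encode(waveform, enc)
print(len(code))  # 8
```

A downstream classifier then works on the short code rather than on the raw waveform, which is why classification at this stage can be much cheaper than running a full CNN classifier.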
5. Conclusions
This article proposes a lightning classification method based on convolutional encoding features, which can recognize and classify various types of lightning such as IC, CG, and NBEs. The main results are as follows:
(1) This paper proposes a multi-type lightning discharge recognition method based on encoding features and a random forest classifier. The method exploits the strong feature extraction ability of convolutional neural networks to extract convolutional encoding features from lightning waveform signals effectively. In comparative tests on the same dataset, the random forest classifier outperformed SVM in both accuracy and recognition speed, so we chose it as the final classifier. We also compared two factors that may affect the classification results: the length of the lightning pulse signal and the number of bits of the extracted convolutional encoding features. The results show that with a pulse length of 120 µs and 8-bit convolutional encoding features, the method achieves a recognition accuracy of 97% for the IC, CG, and NBE lightning types;
(2) Compared with multi-parameter classification methods, the new method can accurately distinguish similar intracloud discharge signals (such as NBEs and ordinary cloud flash pulses), which multi-parameter methods cannot, and it also improves IC and CG classification accuracy, reducing the complexity of the lightning signal classification task to some extent. On the same dataset, the new method classified the three lightning types more accurately than the multi-parameter method, with an accuracy of about 97%. In addition, the new method does not require manual feature engineering of pulse time-domain features, such as rise time, fall time, pulse width, and the peak-to-peak ratio before and after the pulse, which reduces the risk of low accuracy caused by human error;
(3) Compared with waveform classification methods based on convolutional neural networks, our method identifies lightning signals quickly. Both methods achieve high recognition accuracy, differing by only about 2%, whereas the new method needs only about one-tenth of the time that the CNN-based waveform classification method requires to recognize a single pulse. The method therefore classifies lightning signals quickly while maintaining high classification accuracy, providing important support for real-time positioning in operational applications.
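The 120 µs pulse length in item (1) implies that every pulse is cut to a fixed-length window before encoding. A minimal sketch of that step follows. The sampling rate, the 25% pre-peak share of the window, zero-padding at record edges, and normalization by the peak magnitude are all assumptions, since the text does not specify them, and `pulse_window` is a hypothetical name.

```python
def pulse_window(samples, fs_hz, length_us=120.0, pre_fraction=0.25):
    """Cut a fixed-length window around the largest-magnitude peak and normalize it.

    fs_hz        : sampling rate in Hz (an assumption; the text does not give one)
    length_us    : window length in microseconds (120 µs per the conclusions)
    pre_fraction : share of the window placed before the peak (assumed 25%)
    Samples outside the record are zero-padded so every window has the same length.
    """
    n_win = round(length_us * 1e-6 * fs_hz)
    peak_idx = max(range(len(samples)), key=lambda i: abs(samples[i]))
    start = peak_idx - round(pre_fraction * n_win)
    window = [samples[i] if 0 <= i < len(samples) else 0.0
              for i in range(start, start + n_win)]
    scale = abs(samples[peak_idx]) or 1.0   # avoid division by zero on a flat record
    return [v / scale for v in window]

# Usage: at an assumed 5 MHz sampling rate, 120 µs corresponds to 600 samples
record = [0.0] * 2000
record[700] = 4.0
record[701] = -2.0
w = pulse_window(record, fs_hz=5e6)
print(len(w), w[150], w[151])  # 600 1.0 -0.5
```

Fixing both the window length and the peak position gives the encoder inputs of identical shape, so varying only `length_us` is one way to reproduce the pulse-length comparison described in item (1).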
The type of lightning discharge is a key parameter for lightning monitoring and research. High-precision lightning classification information can broaden the applicability of lightning location data and can also be used in the location process itself, reducing location errors to some extent. In addition, the method is not limited to CG, IC, and NBEs; it can be extended to identify more types of lightning discharges. Various total lightning location technologies are already used in operational location systems. However, because cloud flash signals are complex, identifying discharge types from traditional signal features is error-prone. The method proposed in this article can be applied to operational location systems in two ways. In the first, the feature extraction model runs at each substation of the lightning location system, extracts the encoded features in real time, and sends them to the central station, where a random forest classification model produces the lightning type. In the second, both the feature extraction model and the random forest classification model proposed in this article run at the substations, which then send only the resulting lightning type to the central station.
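The two deployment options differ only in where the random forest runs and therefore in what each substation transmits. The sketch below models both as JSON messages. The message fields, `substation_message`, `central_station_type`, and the stub encoder and classifier in the usage example are hypothetical placeholders, not part of any existing location system's protocol.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List, Optional

@dataclass
class FeatureReport:          # first way: substation sends encoded features
    station_id: str
    time_ns: int
    code: List[float]

@dataclass
class TypeReport:             # second way: substation sends only the lightning type
    station_id: str
    time_ns: int
    lightning_type: str

def substation_message(station_id: str, time_ns: int, window: List[float],
                       encoder: Callable[[List[float]], List[float]],
                       classifier: Optional[Callable[[List[float]], str]] = None) -> str:
    """Build the JSON message a substation would send to the central station.

    Without a local classifier (first way) the encoded features are sent;
    with one (second way) only the predicted type is sent.
    """
    code = encoder(window)
    if classifier is None:
        report = FeatureReport(station_id, time_ns, code)
    else:
        report = TypeReport(station_id, time_ns, classifier(code))
    return json.dumps(asdict(report))

def central_station_type(message: str, classifier: Callable[[List[float]], str]) -> str:
    """Central station: classify feature messages, pass type messages through."""
    payload = json.loads(message)
    if "lightning_type" in payload:
        return payload["lightning_type"]
    return classifier(payload["code"])

# Usage with stand-in models (placeholders, not trained models)
def stub_encoder(window: List[float]) -> List[float]:
    return [max(window), min(window)] + [0.0] * 6   # 8-value code

def stub_classifier(code: List[float]) -> str:
    return "CG"

mode1 = substation_message("S01", 1_000, [0.1, 1.0, -0.8], stub_encoder)
mode2 = substation_message("S01", 1_000, [0.1, 1.0, -0.8], stub_encoder, stub_classifier)
print(central_station_type(mode1, stub_classifier), central_station_type(mode2, stub_classifier))  # CG CG
```

In the first way the central station holds the only copy of the classifier, so it can be retrained or replaced in one place; in the second, each message is smaller, but every substation must host both models.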