1. Introduction
In a fully mechanized working face, the shearer is one of the most important pieces of coal mining equipment, and monitoring its cutting condition is an indispensable part of automatic shearer control. However, owing to the harsh mining environment and the complex structure of a shearer, operators cannot identify the shearer's cutting conditions promptly and accurately by visual inspection alone. Under these circumstances, the shearer drum may cut rock, which damages the machine and leads to poor coal quality and low mining efficiency. Another concern is that many casualties occur in collieries. Therefore, identifying shearer cutting conditions efficiently and accurately is necessary, and it has become a challenging and significant research subject [1].
Over the past few decades, some scholars have focused on coal-rock interface recognition to roughly estimate the cutting state of the shearer. In [2], Yu et al. used the sonic wave reflection method to identify the coal-rock interface. In [3], an image processing technique for visible-light and infrared images was applied to the recognition of coal-rock interfaces. In [4], features were extracted from the vibration signals of a hydraulic support beam by means of the wavelet packet energy spectrum, and the coal-rock interface was identified. In [5], a method based on natural γ-rays was utilized to identify the coal-rock interface. Sahoo et al. used an opto-tactile sensor to recognize rock surfaces in underground coal mining [6]. In [7], radar technology was used to identify the coal-rock interface and obtain the cutting patterns of the shearer. However, coal-rock interface recognition technology imposes overly strict requirements on the geological conditions of the coal seam, and its recognition precision is insufficient for automatic shearer control.
Meanwhile, intensive research has been conducted on the fault diagnosis of traditional equipment. Sensors are used extensively in pattern recognition and fault diagnosis systems because they can provide information about the internal state of the machine. Collecting state information through vibration signals has proven effective [8]. For this reason, vibration-based analysis has become the most commonly used method and has proven efficient in various real applications. For a shearer, the rocker arm is the critical component, and its vibration comprehensively reflects the cutting condition of the shearer, which can be diagnosed correctly through appropriate sensor measurement and description. Therefore, methods for analyzing the measured signals are essential.
In recent years, the commonly used data analysis methods for vibration signals have been the wavelet transform (WT), Fourier transform (FT), Hilbert-Huang transform (HHT), empirical mode decomposition (EMD), and so on. In [9], a method based on the wavelet transform was proposed to analyze the vibration response of discrete piecewise linear oscillators. In [10], the authors attempted to identify the vibration sources, analyze the law of vibration propagation, and establish the relationship between the vibration sources and ground vibration using the time-wavelet power spectrum and cross wavelet transform techniques. In [11], the chaotic vibrations of flexible plates of infinite length were studied and analyzed using fast Fourier transforms and wavelets. In [12], an HHT algorithm was presented for flywheel vibration analysis to lay the foundation for detection and diagnosis in a reactor main coolant pump. In [13], the authors employed nonlinear rotor dynamics with a vibration signal processing scheme based on EMD in order to understand the vibration mechanism.
Nevertheless, data analysis alone can hardly identify the working status or fault type of a machine directly. The trend in recent years has been to automate the analysis of the measured signals by combining data analysis methods with machine learning algorithms, such as neural networks (NNs) and the support vector machine (SVM) [14,15,16]. NNs have gained popularity over other techniques because they are efficient at discovering similarities among large bodies of data. They can simulate human thinking and offer powerful capabilities for establishing nonlinear, experience-based simulation models. The common training method for NNs is the standard back-propagation algorithm, which is known to suffer from local optima, a low convergence rate, obvious overfitting, and especially poor generalization when the number of fault samples is limited [17,18]. Watrous [19] tested the application of the quasi-Newton method proposed by Broyden, Fletcher, Goldfarb, and Shanno (the BFGS method) to neural network training. It was shown that the BFGS method converges much faster than the standard back-propagation method, which relies on gradient descent.
However, the quasi-Newton method has an obvious drawback: it consumes considerable time and memory to store the Hessian matrix, which limits its application to complex problems. Considering the processing-speed advantage of parallelism, many parallel methods have arisen to improve the convergence speed of neural networks [20,21,22]. In this paper, computing parallelism is coupled with the quasi-Newton algorithm to produce the parallel quasi-Newton (PQN) algorithm, which is used to train the neural networks.
However, the signal data collected from a single sensor may be invalid or inaccurate. A single sensor has limited capability for resolving ambiguities and cannot always provide consistent descriptions of the measurement, which can make the classification results of NNs spurious and incorrect. Multi-sensor data fusion has therefore emerged, which can potentially improve detection capabilities and the probability that any damage is detected. Many multi-sensor data fusion methods have been applied to fault diagnosis and pattern recognition. In [23], the authors proposed an intelligent multi-sensor data fusion method using the relevance vector machine based on an ant colony optimization algorithm for gearbox fault detection. In [24], a method was presented that used multi-sensor data technology and the k-Nearest Neighbor algorithm to diagnose the fault pattern of rolling element bearings. In [25], the authors used the federated Kalman filter to fuse the sensor signals of a high-accuracy navigation system for high-speed trains. After many years of development of information fusion technology, Dempster-Shafer (DS) theory is now commonly known and used. In [26], a novel and easily implemented method was presented to fuse multisource data in wireless sensor networks through DS evidence theory. In [27], a novel information fusion approach using DS evidence theory and neural networks was proposed to forecast the distribution of coal seam terrain. In [28], an intelligent detection method integrating multi-sensor data fusion and a classifier ensemble was proposed to detect the location and extent of damage based on posterior-probability support vector machines and DS evidence theory. In [29], a multi-sensor fusion methodology was proposed to identify indoor activity based on the DS theory framework with an incremental conflict resolution strategy. According to the literature, DS theory does not need prior knowledge of the probability distribution, and it can assign probability values to sets of possibilities rather than to single events only.
Bearing the above observations in mind, we provide a cutting condition identification scheme for the shearer based on the vibration signals of the rocker arm and the current signal of the cutting motor. The PQN-NN algorithm and DS theory are used to improve the performance and accuracy of the condition diagnosis system. Firstly, feature extraction is conducted with signal processing techniques. Secondly, the extracted data are used as inputs of the neural network to obtain several classifiers, and the outputs are assessed quantitatively. Lastly, the estimates from the different classifiers are combined by DS theory to enhance the identification accuracy.
The rest of this paper is organized as follows. In Section 2, we briefly present the basic theory of the advanced neural network and DS theory. Section 3 describes the key techniques of the proposed method and provides some experimental analysis. Section 4 presents the application of the proposed method at a coal mining face. Our conclusions and future work are summarized in Section 5.
2. Theoretical Background
2.1. Parallel Quasi-Newton Neural Network (PQN-NN)
A feedforward network model with multiple inputs and outputs is taken as an example to illustrate the basic principle. The input and output vectors are set as X = (x1, x2, …, xn) and Z = (z1, z2, …, zm), and the output of the hidden layer is H = (h1, h2, …, hs). The activation functions of the hidden layer and output layer are chosen as the sigmoid function. The connection weights of the network are defined as w = [w1, w2], where w1 is the connection weight matrix between the input and hidden layers and w2 is the connection weight matrix between the hidden and output layers. The desired output is Zd and the number of training samples is P. The error between the desired output and the network output is selected as the sum-of-squares error E = (1/2)∑_{p=1}^{P} ‖Zd(p) − Z(p)‖². The weights of the network should be updated to minimize the error E.
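As a concrete illustration, the forward pass and the error E above can be sketched as follows. This is a minimal sketch, not the authors' implementation; the dimensions and the random data are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, m, P = 4, 6, 3, 10   # inputs, hidden units, outputs, training samples (all assumed)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w1 = rng.normal(size=(n, s))   # input-to-hidden connection weights
w2 = rng.normal(size=(s, m))   # hidden-to-output connection weights

def forward(X):
    """Forward pass: hidden output H, then network output Z (sigmoid at both layers)."""
    H = sigmoid(X @ w1)
    return sigmoid(H @ w2)

def error(Z, Zd):
    """Sum-of-squares error E = (1/2) * sum over samples and outputs of (Zd - Z)^2."""
    return 0.5 * np.sum((Zd - Z) ** 2)

X = rng.normal(size=(P, n))    # P training inputs (synthetic)
Zd = rng.uniform(size=(P, m))  # desired outputs (synthetic)
E = error(forward(X), Zd)
```

Training then amounts to adjusting w1 and w2 so that E decreases, which is where the quasi-Newton machinery below comes in.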
According to the principle of the quasi-Newton (QN) algorithm, the weight update formula can be written as:

wk+1 = wk + λk·dk

where k is the iteration number, dk = −Hk·gk is the search direction, λk is the step-size at iteration k, gk = ∂E/∂wk is the gradient of the error, and Hk is the current approximation of the inverse Hessian matrix [30]. In the QN method, the choice of Hk directly affects the performance of the algorithm. To obtain better optimization results, some scholars have put forward different methods for determining Hk, but these strategies increase the computational cost and degrade efficiency. In this paper, a parallel quasi-Newton (PQN) optimization algorithm is used to train the neural network. The PQN algorithm uses the following updating formula with three parameters:
where
. The three parameters can be defined by:
The learning process of the PQN neural network is as follows:
Step 1: Initialize variables. These mainly include the initial random weight values (w0), the initial approximate inverse Hessian (set to the identity matrix I), the convergence threshold (ε), and the maximum number of iterations (Kmax); k is initially set to 0.
Step 2: Compute the parallel search directions. In this paper, two search directions are chosen as follows:
where the scaling parameter can be adjusted as follows:
where the arguments ε1, ε2 satisfy 0 < ε1 < ε2 ≤ 1. In our experiment, we set ε1 = 0.5 and ε2 = 1 [31].
Step 3: Perform the parallel line searches. Along each search direction, an inexact line search is performed in parallel to determine the step-size λjk according to the following Wolfe conditions proposed by Shanno and Phua:

E(wk + λ·dj) ≤ E(wk) + σ1·λ·gkᵀdj,  |g(wk + λ·dj)ᵀdj| ≤ σ2·|gkᵀdj|

where σ1 and σ2 are the regulatory factors and 0 < σ1 < σ2 < 1.
The search terminates once a step-size has been found along both search directions.
Step 4: Choose the minimum point. Let dk* denote the search direction that attains the minimum function value and λk* be the step-size corresponding to dk*. Then dk* and λk* are used to update the weights through the following formula:

wk+1 = wk + λk*·dk*
Step 5: Test for convergence. If the convergence criterion ‖gk+1‖ ≤ ε is satisfied or the maximum number of iterations Kmax is reached, then stop; otherwise, compute Hk+1 according to Equation (2).
Step 6: Repeat the process. Set k = k + 1 and return to Step 2.
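Steps 1-6 can be sketched as a compact training loop. The sketch below is an illustrative simplification, not the paper's implementation: it uses a simple Armijo backtracking search in place of the Shanno-Phua Wolfe search, a standard BFGS inverse-Hessian update in place of the three-parameter formula, a sequential loop over the two directions in place of true parallel execution, and a toy convex objective standing in for the network error E(w).

```python
import numpy as np

def backtracking(f, w, d, g, lam=1.0, c=1e-4, shrink=0.5):
    """Inexact (Armijo) line search; a simple stand-in for the Wolfe search of Step 3."""
    while f(w + lam * d) > f(w) + c * lam * (g @ d) and lam > 1e-12:
        lam *= shrink
    return lam

def pqn_sketch(f, grad, w0, eps=1e-6, k_max=500):
    """PQN-style loop: two search directions per iteration, a line search along
    each, keep the best resulting point, then update the inverse Hessian."""
    w, H = w0.astype(float), np.eye(len(w0))          # Step 1: H0 = I
    for _ in range(k_max):
        g = grad(w)
        if np.linalg.norm(g) <= eps:                  # Step 5: convergence test
            break
        directions = [-g, -H @ g]                     # Step 2: two parallel directions
        trials = []
        for d in directions:                          # Step 3: line search along each
            lam = backtracking(f, w, d, g)
            trials.append((f(w + lam * d), lam, d))
        _, lam, d = min(trials, key=lambda t: t[0])   # Step 4: keep the minimum point
        s = lam * d
        w_next = w + s
        y = grad(w_next) - g
        if s @ y > 1e-12:                             # BFGS inverse-Hessian update
            rho = 1.0 / (s @ y)
            I = np.eye(len(w))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        w = w_next                                    # Step 6: next iteration
    return w

# Toy quadratic objective standing in for the network error E(w).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b
w_star = pqn_sketch(f, grad, np.zeros(2))   # converges to the minimizer A⁻¹b
```

In a real implementation the two line searches would run on separate processors, which is where the speed-up of the PQN scheme comes from.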
2.2. Dempster-Shafer Theory
The Dempster-Shafer theory, known as evidence theory, was initially proposed by Dempster [32,33] and Shafer [34], was elaborated by Smets [35,36], and was further developed by Denoeux [37,38]. The basic concepts and mechanisms of Dempster-Shafer (DS) theory are introduced in this subsection.
In DS theory, a finite non-empty set Θ of hypotheses is assumed, which contains N mutually exclusive elements; Θ = {A1, A2, …, AN} is called the frame of discernment. A function m: 2^Θ → [0, 1] is first defined, where m(A) denotes the basic belief assignment (BBA). Given a piece of evidence, every possible hypothesis or combination of hypotheses is assigned a belief level in the range [0, 1]. The empty set is assigned a belief level of zero, and the sum of all BBAs must equal 1.
The BBA m(A) describes the belief level that the evidence supports A. Each subset A ⊆ Θ with m(A) > 0 is called a focal element of m. Two concepts are defined as follows:

Bel(A) = ∑_{B⊆A} m(B),  Pl(A) = ∑_{B∩A≠∅} m(B)

where Bel(A) denotes the belief function and Pl(A) denotes the plausibility function.
The belief function Bel(A) provides the support for the hypothesis that A is true and is interpreted as a lower limit function. The plausibility function Pl(A) represents the support for the hypothesis that A is not false and is known as an upper limit function.
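The belief and plausibility functions are straightforward to compute once a BBA is given. The sketch below uses a hypothetical frame of cutting states (not taken from the paper) purely to make the definitions concrete.

```python
# Hypothetical frame of discernment with three cutting states.
FRAME = frozenset({"coal", "rock", "mixed"})

# Example BBA: masses over subsets of the frame (must sum to 1, m(empty set) = 0).
m = {
    frozenset({"coal"}): 0.5,
    frozenset({"rock"}): 0.2,
    frozenset({"coal", "rock"}): 0.2,
    FRAME: 0.1,
}

def bel(m, A):
    """Belief: total mass committed to subsets of A, Bel(A) = sum over B ⊆ A of m(B)."""
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    """Plausibility: mass not contradicting A, Pl(A) = sum over B with B ∩ A ≠ ∅ of m(B)."""
    return sum(v for B, v in m.items() if B & A)
```

For the hypothesis {coal}, Bel counts only the mass on {coal} itself, while Pl also counts the mass on {coal, rock} and on the whole frame, so Bel({coal}) ≤ Pl({coal}), matching the lower/upper limit interpretation.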
Through the basic belief assignments, information from different sources can be combined by the fusion rule proposed by Dempster. Assume that m1 and m2 are two BBAs induced by independent pieces of evidence. Dempster's rule combines the two masses into a new mass function, which can be calculated as follows:

m(A) = (1/(1 − K)) ∑_{B∩C=A} m1(B)·m2(C) for A ≠ ∅, with K = ∑_{B∩C=∅} m1(B)·m2(C).
In Dempster's rule, the value of K reflects the degree of conflict between m1 and m2. The coefficient 1/(1 − K) is the normalization factor; its role is to avoid assigning non-zero probability to the empty set during combination. As K increases, the conflict becomes more pronounced, and the combination results may become inconsistent with the actual situation.
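Dempster's rule maps directly onto a short routine. The sketch below is a generic implementation of the combination rule; the two example BBAs are hypothetical sensor outputs, not data from the paper.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination:
    m(A) = (1/(1-K)) * sum over B ∩ C = A of m1(B)·m2(C), for A ≠ ∅,
    K    = sum over B ∩ C = ∅ of m1(B)·m2(C)   (the conflicting mass)."""
    fused, K = {}, 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            fused[inter] = fused.get(inter, 0.0) + v1 * v2
        else:
            K += v1 * v2                      # conflict: intersections are empty
    if K >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - K) for A, v in fused.items()}   # normalize by 1/(1-K)

# Two hypothetical BBAs from different sensors over {coal, rock}.
m1 = {frozenset({"coal"}): 0.6, frozenset({"coal", "rock"}): 0.4}
m2 = {frozenset({"coal"}): 0.7, frozenset({"coal", "rock"}): 0.3}
fused = combine(m1, m2)   # mass on {coal} rises to 0.88 after fusion
```

Because both sources lean toward {coal}, the combined BBA concentrates more mass on that hypothesis than either source alone, which is exactly the accuracy-enhancing effect the proposed scheme exploits when fusing the classifier outputs.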