1. Introduction
The rapid development of advanced manufacturing is closely related to a country's economy [1]. Milling is an indispensable part of modern manufacturing industries, and optimizing its machining process is crucial to achieving substantial economic benefits and improving product quality. Manufacturing cost is mainly affected by the cutting tool, power, machining efficiency, and machined surface quality [2]. Cutting tool wear condition is an important factor in the manufacturing process, as it directly affects the quality of products and the operation of equipment. Moreover, monitoring the tool wear condition is necessary to ensure timely tool change: replacing a tool too early increases cost, while replacing it too late reduces workpiece quality and can even damage the machine tool. In the machining process, cutting tools and their replacement account for 3% to 12% of the total processing cost [3]; at the same time, downtime caused by cutting tools can represent up to 20% of total machining downtime [4]. Tool wear is a normal phenomenon because the tool is in contact with the chip and the workpiece. Moreover, the tool undergoes various failure mechanisms, such as adhesion, abrasion, chipping, diffusion, and plastic deformation. Due to the highly complex physics behind tool wear, it is challenging to avoid. Therefore, tool wear monitoring is essential in advanced manufacturing to optimize the machining process and maintain the quality of the manufactured product. For these purposes, it is crucial to develop an automated and accurate tool wear monitoring system that generates warnings of tool wear, which has been the focus of various research studies in tool wear monitoring.
Recently, cutting tool condition monitoring has focused chiefly on extracting degradation information from various physical properties, and the research methodologies can be divided into the following categories [5]: direct approaches, in which a machine vision system, as a common method [6,7], is used to directly measure the areas of tool wear for evaluating tool conditions; and indirect approaches, in which the wear state is evaluated by analyzing signals from various sensors, such as machining force [8], vibration [9], temperature [10], acoustic emission (AE) [11], and machined surface roughness [12,13,14,15,16,17]. Surface texture images, as relatively easily obtained monitoring data, can be acquired by cost-effective devices and provide rich information for diagnosis and monitoring applications. The texture of the machined surface not only directly enables the evaluation of machined surface quality but can also be used indirectly to reflect various types of tool wear, such as flank wear, crater wear, nose wear, fracture, and breakage. Compared with the abovementioned methods, the most notable advantages of analyzing machined surface images for monitoring tool wear states are that the technique is non-invasive, low-cost, and flexible.
Machined surface images can provide rich geometrical characteristics for diagnosing the state of cutting tools [18]. The massive and unstructured raw image data brings new opportunities and challenges to vision-based tool condition monitoring. Because machine vision-based tool condition monitoring methods can learn the texture characteristics of an image from massive image data and automatically build the corresponding monitoring models, they have attracted increasing attention in recent vision-based tool condition monitoring studies. The machine vision approach relies on handcrafted feature design to extract sensitive degradation features from the acquired monitoring data using prior knowledge and expertise. The extracted features are then fed into machine learning models to estimate the target values. At present, there is a rich research literature on machine learning-based surface texture inspection for tool wear assessment. Bhat et al. [12] presented a support vector machine (SVM) model to predict the state of tools using features extracted from the grey level co-occurrence matrix (GLCM) of machined surface images. Dutta et al. [13,14] proposed a support vector machine-based regression model to assess tool flank wear using features extracted from turned surface images. Kassim et al. [15] developed a run-length statistical method to monitor the tool condition based on machined surface images using a machine vision technique. Riego et al. [16] designed an extremely randomized trees algorithm to monitor the wear state by means of supervised classification. Li et al. [17] studied a micro-vision system to monitor the wear of a cutting tool, which combines insert images with workpiece texture. These methods aim to develop an intelligent way to monitor the level of tool wear through the machined surface. Although such automated monitoring systems achieve decent tool wear monitoring results with handcrafted features extracted by machine vision, the feature design requires significant computational effort and domain knowledge, making these systems less intelligent.
To help enhance the intelligence of tool monitoring systems, a new branch of artificial intelligence called deep learning [19] has been widely adopted in tool condition detection and has provided excellent results. Deep learning is a particular kind of machine learning with more powerful feature learning abilities, including deep belief networks (DBNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). It achieves advanced accuracy in various tasks, such as computer vision [20], speech recognition [21], and natural language processing [22]. Compared with traditional handcrafted feature extraction, deep learning-based tool wear monitoring studies [23,24,25,26,27,28] capture the complex correlation between the input (signal data) and the target output values (tool wear). These papers observed that, with deep learning technology, latent features are automatically learned from raw data, which is advantageous over manually designed features. Although deep learning-based tool condition monitoring has developed in recent years, very few studies have focused on deep learning-based machined surface monitoring methods. In 2021, Kumar et al. [29] proposed a deep CNN architecture for intelligent wear monitoring of a cutting tool using machined surface images in turning. The model applied preprocessed images as input to overcome inhomogeneous illumination. Although the model achieved promising monitoring results, the preprocessing operation is not intelligent enough. In this paper, an end-to-end intelligent method is proposed for tool condition monitoring, which creates a direct mapping from raw data (machined surface images) to the needed results (the level of cutting tool wear). Workpiece textures produced under different tool wear states share a visually similar global texture structure, and the wear category must be recognized from local regions with small visual differences. Fine-grained image classification, a challenging task in computer vision, can distinguish different local details under the same global structure. Discriminative local regions play a critical role in fine-grained classification and are identified to classify the target categories. Therefore, fine-grained classification is introduced here to recognize similar workpiece textures and thereby identify the tool wear state.
Fine-grained image classification methods have been widely applied in various classification tasks where categories share the same global structure (such as animals, product brands, and vehicles) [30,31,32,33,34,35]. Destruction and construction learning (DCL) [36] is a recently proposed fine-grained classification model; unlike traditional classification methods, which rely on significant differences between categories, it aims to extract discriminative features from similar global structures. The DCL architecture includes both destruction and construction streams, where the former is used to enhance recognition robustness while the latter models the semantic correlations between image regions. As a result, DCL, as an end-to-end approach, can be learned readily without manual intervention. In particular, it is lightweight, fast at inference, and practical. In this study, an intelligent method is proposed for cutting tool wear monitoring from machined surface images based on a CNN architecture with fine-grained classification.
An attention mechanism is used to ensure that the most salient information is noticed, which has been proven effective in many previous studies [37,38,39]. Hu et al. [40] developed the squeeze-and-excitation (SE) module to learn channel features by using inter-channel relationships. However, describing the learned features through dimensionality reduction is suboptimal. Wang et al. [41] presented the efficient channel attention (ECA) module as an improvement of the squeeze-and-excitation module.
In this article, efficient channel attention destruction-construction learning (ECADCL) is proposed to monitor the tool wear state (sharp, normal, or dull) from machined surface images. The proposed ECADCL, comprising a feature extraction module, a destruction-construction module, and a decision module, combines a conventional CNN with the idea of DCL to identify local details within a similar global structure. To improve feature representation learning, the ECA module is introduced into the feature extraction module, which is named ECACNN. Ultimately, the experimental results confirm that ECADCL offers better tool wear monitoring performance than existing methods.
The main contributions of this study are as follows.
- (1)
The architecture of ECACNN is constructed by combining a typical CNN with an attention mechanism. This architecture can enhance the channel relationships of features and extract efficient information from texture images.
- (2)
Based on fine-grained image recognition, ECADCL is designed to enhance the representation of local details.
- (3)
An automatic monitoring system, combining the designed algorithms with an image acquisition system, is developed to detect tool conditions based on machined surface images.
- (4)
The experiments verify that the proposed method is accurate and effective. The proposed ECADCL achieves competitive performance compared with handcrafted feature extraction methods and solves the local feature recognition problem that limits conventional CNNs.
The rest of this article is organized as follows. In Section 2, the experimental system is described. In Section 3, the proposed ECADCL technique is elaborated. Section 4 presents the experimental results. Section 5 concludes the article.
3. Methodology
In this work, cutting tool wear monitoring based on the machined surface is regarded as an image classification problem. We aim to design an end-to-end intelligent model for tool wear monitoring that directly maps the input data to the target output. The model is represented as $y = f(x; \theta)$, where $x$ represents a set of input images, $y$ denotes the label of $x$, and $\theta$ denotes all the learnable parameters. It should be noted that machined surface images at different levels of wear have the same global structure and differ only in specific local details. In order to find the discriminating areas among different tool wear states, a fine-grained method is introduced to realize tool condition monitoring. The procedure of this method is shown in Figure 3. Basically, it includes three stages: acquisition, training, and inference. The first part consists of data acquisition and data processing. In the second part, the proposed method applies the channel attention mechanism and the destruction-construction structure, and the training process mainly consists of feature extraction with the attention mechanism, destruction learning (DL), and construction learning (CL). In the destruction process, the region confusion mechanism (RCM) disrupts the global structure into local regions, and the feature extraction module finds the discriminative areas. In addition, adversarial learning is introduced into the DL part to prevent the noise introduced by the RCM from negatively affecting network learning. CL induces the feature extraction module to learn the semantic correlations among local regions in order to reconstitute the original image. In the inference process, test samples are fed into the trained feature extraction network to obtain the classification results.
In this paper, ECADCL is designed to learn the local feature space based on the disruption of the global structure. The overall architecture of ECADCL is illustrated in Figure 4, and in the following subsections, we present our proposed tool wear monitoring method based on machined surface images in detail.
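To make the overall data flow concrete, the following PyTorch-style sketch shows one possible way to combine the modules described in the subsections below (the ECA-augmented feature extractor, the fully connected decision module, and the adversarial and region-alignment branches of the destruction-construction module). The class names, the 512-dimensional feature size (from a ResNet-18 backbone), and the training-time outputs are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn as nn

class ECADCL(nn.Module):
    """High-level sketch of the proposed architecture (assumed layout).

    The sub-modules ECAResNet18, Discriminator, and RegionAlignment are
    sketched in Sections 3.1-3.3 below.
    """
    def __init__(self, num_classes=3, feat_dim=512):
        super().__init__()
        self.backbone = ECAResNet18()                       # feature extraction (Sec. 3.1/3.2)
        self.classifier = nn.Linear(feat_dim, num_classes)  # decision module (FC layer)
        self.discriminator = Discriminator(feat_dim)        # destruction stream (Sec. 3.3)
        self.alignment = RegionAlignment(feat_dim)          # construction stream (Sec. 3.3)

    def forward(self, x):
        feat = self.backbone(x)                 # feature map, e.g. (B, 512, h, w)
        pooled = feat.mean(dim=(2, 3))          # global average pooling
        cls = self.classifier(pooled)           # sharp / normal / dull logits
        if self.training:
            # auxiliary outputs used only for the training losses
            return cls, self.discriminator(feat), self.alignment(feat)
        return cls                              # inference: classification only
```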
3.1. Feature Extraction Module
Our suggested framework is a typical CNN, which can be seen in Figure 3. The design of a network structure is a significant challenge in practical applications; however, transfer learning has been found to be a well-performing way to overcome this issue. In this study, the ECADCL architecture is developed based on three well-known networks commonly used in transfer learning, namely, AlexNet [42], VGG-16 [43], and ResNet-18 [44], which exhibit excellent performance in image recognition. Comparing the results of these three networks, ResNet-18 is selected as the basic framework, and the details of the proposed model architecture are listed in Table 2.
As illustrated in Table 2, the feature extraction module, mainly constituted by one input layer and five convolutional blocks, has an input layer of the same size as the input image. The five convolutional blocks contain one max pooling layer, 8 ECA layers, and 17 convolutional layers. Following the first convolutional block, the max pooling layer replaces the output at each location with an aggregate of the nearby neighborhood in order to complete the downsampling procedure. In this study, the rectified linear unit (ReLU) activation function is used for nonlinearity. Note that the original image $I$, its destroyed version $\phi(I)$, and its ground-truth one-vs.-all label $l$ are combined for training. Therefore, the extracted features may be described by the equation:

$$F = f(I; \theta_{f}) \in \mathbb{R}^{W \times H \times C},$$

where $F$ indicates the learned features, $F_{i}^{c}$ represents the $i$th feature of the original image in the $c$th channel, $f(\cdot)$ denotes the feature extraction procedure, $\theta_{f}$ represents all the learnable parameters, and the three dimensions $W$, $H$, and $C$ indicate the width, height, and number of channels, respectively, of the learned features.
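As a rough sketch of how the layout in Table 2 (17 convolutional layers and 8 ECA layers across the convolutional blocks) could be realized with transfer learning, the code below appends an ECA layer (sketched in Section 3.2 below) after each of ResNet-18's eight basic blocks. The exact ECA insertion points, the `torchvision` loading call, and the class name are assumptions for illustration.

```python
import torch.nn as nn
from torchvision.models import resnet18

class ECAResNet18(nn.Module):
    """Feature extraction module sketch: ResNet-18 backbone with an ECA
    layer appended to each of its 8 basic blocks (assumed placement)."""
    def __init__(self, weights="IMAGENET1K_V1"):
        super().__init__()
        base = resnet18(weights=weights)        # transfer learning: pretrained weights
        self.stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        stages = [base.layer1, base.layer2, base.layer3, base.layer4]
        widths = [64, 128, 256, 512]            # output channels of each stage
        blocks = []
        for stage, c in zip(stages, widths):
            for block in stage:                 # 2 basic blocks per stage
                blocks += [block, ECALayer(c)]  # ECA after every block (8 in total)
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):                       # x: (B, 3, H, W) surface image
        return self.blocks(self.stem(x))        # feature map F: (B, 512, H/32, W/32)
```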
3.2. Attention-Based Network
The underlying attention mechanisms are inspired by the human visual system, which tends to process the most critical information in a scene rather than attempting to analyze the whole scene simultaneously. Accordingly, an attention mechanism can be introduced into a CNN to concentrate on more representative sections while suppressing less significant information. For this purpose, the ECA module is introduced into our feature extraction module. Referring to Figure 5, ECA is a local cross-channel interaction strategy that can be efficiently implemented without dimensionality reduction and adaptively determines the coverage of local cross-channel interaction.
In the ECA block, the intermediate feature map $X \in \mathbb{R}^{W \times H \times C}$ is subjected to the global average pooling (GAP) operation, which generates an aggregated feature vector $g(X) \in \mathbb{R}^{C}$. This vector is then input into a fast 1D convolution of size $k$, which produces the attention vector. Finally, to acquire the channel attention vector $\omega$, the attention vector is passed through the sigmoid activation function. Thus, $\omega$ can be generated as follows:

$$\omega = \sigma\!\left(\mathrm{C1D}_{k}\!\left(g(X)\right)\right),$$

where $g(\cdot)$ denotes the GAP operator, $\mathrm{C1D}_{k}$ represents a 1D convolutional layer with kernel size $k$, and $\sigma(\cdot)$ is the sigmoid function.
In the ECA block, determining the value of $k$ is another critical issue. Following a principle similar to group convolutions, different channel dimensions $C$ correspond to different convolution ranges $k$ in the feature maps, giving rise to a mapping relationship between $k$ and $C$. This relationship is characterized by a nonlinear function mapping from kernel size $k$ to channel dimension $C$, which can be expressed adaptively as follows:

$$k = \psi(C) = \left| \frac{\log_{2}(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}},$$

where $|t|_{\mathrm{odd}}$ is the odd integer that is closest to $t$. As verified in [35], we set $\gamma$ and $b$ to 2 and 1, respectively. $\psi(\cdot)$ represents a nonlinear mapping that relates feature maps with various channel dimensions and varying interaction ranges, driving the model to adaptively learn the interdependencies between feature channels.
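A minimal PyTorch sketch of this block is given below, implementing the GAP, 1D convolution, and sigmoid gating described above, together with the adaptive kernel-size rule with $\gamma = 2$ and $b = 1$. It follows the published ECA design; details such as padding and tensor layout are our own assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient channel attention: GAP -> 1D conv of size k -> sigmoid gate.
    k is chosen adaptively as the odd integer nearest |log2(C)/gamma + b/gamma|."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 == 1 else t + 1                       # nearest odd integer
        self.avg_pool = nn.AdaptiveAvgPool2d(1)              # global average pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                    # x: (B, C, H, W)
        y = self.avg_pool(x)                                 # (B, C, 1, 1) aggregated vector
        y = y.squeeze(-1).transpose(-1, -2)                  # (B, 1, C) for 1D convolution
        y = self.conv(y)                                     # local cross-channel interaction
        y = self.sigmoid(y).transpose(-1, -2).unsqueeze(-1)  # (B, C, 1, 1) attention weights
        return x * y.expand_as(x)                            # reweight each channel
```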
3.3. Destruction-Construction Module
As mentioned above, machined surface images are visually similar in terms of their global information. Hence, the proposed algorithm needs to learn to discriminate local details within the same global structure. The destruction-construction strategy is introduced into the ECACNN architecture to address this issue. The destruction-construction module consists of three branches: the region confusion mechanism (RCM), adversarial learning, and construction learning. RCM is the key part of the destruction-construction module, since it destroys the global structure by causing localized disruption. To avoid the noise caused by the RCM operation, adversarial learning is introduced during the destruction learning process. Construction learning is proposed for reorganizing the previously learned local information in accordance with the semantic relevance among regions. The original image $I$ and its RCM-destroyed version $\phi(I)$ are sent into the feature extraction module to obtain the feature map, which is passed to the construction module, while the adversarial network receives the feature map after average pooling.
For the RCM, the spatial distribution of the whole structure is disrupted while local areas are only allowed to move within a specific range. In the confusion method, any original image $I$ is divided into $N \times N$ subregions, each denoted by $R_{i,j}$, where $i$ and $j$ are the horizontal and vertical coordinates, respectively. The subregions are rearranged to form a destroyed image $\phi(I)$, which assigns each region a new coordinate:

$$\sigma(i, j) = \left( \sigma_{row}^{j}(i),\ \sigma_{col}^{i}(j) \right),$$

where $R_{i,j}$ represents the subregion at location $(i, j)$ in the original image $I$; $\sigma_{row}^{j}(i)$ represents the new coordinate of the $i$th region in the $j$th row of $\phi(I)$, obtained from a random vector $q_{j}$ of size $N$ whose $i$th element is $q_{j,i} = i + r$, with $r$ being a uniformly distributed random variable within the range $(-k, k)$ and $k$ being a tunable parameter ($1 < k < N$) that defines the neighborhood range; $\sigma_{col}^{i}(j)$ similarly represents the new coordinate of the $j$th region in the $i$th column of $\phi(I)$.
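The sketch below illustrates one way such a region confusion could be implemented: the image is split into an N x N grid, and the regions in each row and then each column are reordered by sorting jittered indices, so every region stays within a small neighborhood of its original position. The grid size N = 7 and jitter k = 2 are illustrative assumptions, and the sequential row-then-column shuffle is our own simplification of the mechanism.

```python
import torch

def region_confusion(img, N=7, k=2):
    """Sketch of the region confusion mechanism on a (C, H, W) image tensor."""
    C, H, W = img.shape
    h, w = H // N, W // N
    # split the image into an N x N grid of subregions R[j][i]
    regions = [[img[:, j*h:(j+1)*h, i*w:(i+1)*w] for i in range(N)] for j in range(N)]

    def jittered_order(n):
        # q_i = i + r with r ~ U(-k, k); sorting q yields a permutation that
        # keeps every element within a small neighborhood of its original index
        q = torch.arange(n, dtype=torch.float) + torch.empty(n).uniform_(-k, k)
        return torch.argsort(q).tolist()

    # shuffle regions within each row, then within each column
    regions = [[row[i] for i in jittered_order(N)] for row in regions]
    for i in range(N):
        order = jittered_order(N)
        col = [regions[j][i] for j in range(N)]
        for j in range(N):
            regions[j][i] = col[order[j]]

    # reassemble the destroyed image phi(I)
    return torch.cat([torch.cat(row, dim=2) for row in regions], dim=1)
```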
For adversarial learning, a discriminator is designed as a new stream to judge whether an input image $I$ has been destroyed or not. It prevents the RCM-induced noise patterns caused by shuffling the local regions from entering the feature map used for the classification task, and can be written as follows:

$$D\!\left(I, \theta_{adv}\right) = \mathrm{softmax}\!\left( \theta_{adv}\, C\!\left(I, \theta_{cls}^{[1,m]}\right) \right),$$

where $C(I, \theta_{cls}^{[1,m]})$ is the feature vector extracted from the output of the $m$th layer of the feature extraction module, $\theta_{cls}^{[1,m]}$ represents the learnable parameters from the 1st layer to the $m$th layer of the feature extraction module, and $\theta_{adv}$ is a linear mapping.
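A minimal sketch of this discriminator head is shown below: a single linear mapping applied to the globally pooled feature vector, producing original-vs.-destroyed logits. The feature dimension of 512 assumes the ResNet-18 backbone described above; the class name and pooling placement are assumptions.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Adversarial branch sketch: linear mapping on the pooled feature vector
    that predicts whether the input was original or RCM-destroyed."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)    # theta_adv: a single linear mapping

    def forward(self, feat_map):            # feat_map: (B, feat_dim, h, w)
        v = feat_map.mean(dim=(2, 3))       # global average pooling -> (B, feat_dim)
        return self.fc(v)                   # logits: original vs. destroyed
```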
For construction learning, a region alignment network is designed to measure the positional accuracy of the different regions in the images and to model the correlations between local regions in an end-to-end fashion. This can be expressed as follows:

$$M(I) = h\!\left( C\!\left(I, \theta_{cls}^{[1,n]}\right), \theta_{loc} \right),$$

where $M(I)$ contains the two channels of row and column coordinates, $h(\cdot)$ represents the proposed region alignment network, $C(I, \theta_{cls}^{[1,n]})$ is the feature vector extracted from the output of the $n$th layer of the feature extraction module, $\theta_{cls}^{[1,n]}$ represents the learnable parameters from the 1st layer to the $n$th layer of the feature extraction module, and $\theta_{loc}$ represents the learnable parameters of the region alignment network.
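One simple realization of such a region alignment network, sketched below, is a 1x1 convolution that maps the feature map to two channels (predicted row and column coordinates) followed by pooling to one prediction per subregion; its output can then be compared with the known original coordinates. The 1x1-conv head, the grid size N = 7, and the class name are assumptions for illustration.

```python
import torch.nn as nn

class RegionAlignment(nn.Module):
    """Construction branch sketch: predicts, for each of the N x N regions,
    its row and column coordinates in the original image."""
    def __init__(self, feat_dim=512, N=7):
        super().__init__()
        self.conv = nn.Conv2d(feat_dim, 2, kernel_size=1)   # 2 channels: row, col
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(N)                  # one prediction per region

    def forward(self, feat_map):                             # feat_map: (B, feat_dim, h, w)
        return self.pool(self.relu(self.conv(feat_map)))     # M(I): (B, 2, N, N)
```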
The final decision module, following the typical classification head design of the network, includes a fully connected layer to predict the output results.
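For completeness, the sketch below shows one plausible way the three training signals (classification, adversarial, and region alignment) could be combined into a single objective. The section does not specify the loss terms or their weights, so the use of cross-entropy, L1 loss, and the weights alpha and beta here are assumptions.

```python
import torch.nn.functional as F

def ecadcl_loss(cls_logits, labels, adv_logits, destroyed_flags,
                pred_loc, true_loc, alpha=1.0, beta=1.0):
    """Assumed combined objective: classification + adversarial + alignment terms."""
    l_cls = F.cross_entropy(cls_logits, labels)          # tool wear state (sharp/normal/dull)
    l_adv = F.cross_entropy(adv_logits, destroyed_flags) # original vs. destroyed image
    l_loc = F.l1_loss(pred_loc, true_loc)                # predicted vs. true region coordinates
    return l_cls + alpha * l_adv + beta * l_loc          # alpha, beta: assumed weights
```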
5. Conclusions
In this article, we propose a new tool wear monitoring framework with an emphasis on fine-grained classification and deep learning technology. Three classical deep learning architectures, AlexNet, VGG-16, and ResNet-18, were investigated and compared as base networks to develop an effective network. Introducing the ECA module into the base network enables the network to focus on more representative parts of the input image while suppressing less critical information. The destruction-construction module is also employed to discriminate local features, given the similar global structures of the surface texture images. In experiments, our proposed model proves more effective than other methods for monitoring the level of tool wear.
However, to train the proposed model for machined surface-based tool wear state monitoring, a deep learning model requires a large amount of training data to achieve outstanding monitoring performance. As a result, our future work will focus on suitable unsupervised models to overcome this challenge. Meanwhile, in the image acquisition process, contamination from chips and oils pollutes the machined surface, which poses a great analytical challenge for an intelligent tool wear monitoring system. Furthermore, variable factors, such as different tools and cutting parameters, will be explored to further enhance the suggested method's universality.