1. Introduction
Due to recent advances in the Internet of Things (IoT), the connectivity between machines and the amount of data generated by sensors have increased significantly. Given the need to leverage these data properly, transparent and interpretable machine learning (ML) techniques are drawing particular attention amid growing interest in more reliable systems for digital twins [1]. In particular, in fault detection for manufacturing processes (e.g., semiconductor manufacturing), interpretable ML models can provide insights into which attributes are the root causes of faults on the shop floor, so that human operators can improve product quality [2].
However, even though various ML techniques with black-box structures (e.g., neural networks) have been studied and developed for more accurate fault detection, many manufacturing companies suffer from the opaqueness of these models and the costly human effort required to make their detections interpretable [3]. In this context, the interpretability of ML models in manufacturing environments is growing in importance for two reasons. First, human operators want a set of understandable rules to control parameters in the manufacturing process. Second, interpretable ML models enable practitioners to explain the factors that have affected the ups and downs of process quality based on past production data [4].
Despite this significant need for interpretability, decision-tree-based algorithms, among the most widely used interpretable models, still present some issues. The first issue is the trade-off between accuracy and interpretability. To improve accuracy on training data, the maximum tree height should be increased. However, as the tree becomes deeper, the interpretability of the model decreases because of its more complicated structure. Interpretable models should therefore be small in size as well as sufficiently accurate; to keep explanation complexity low, shrinkage methods for ML models are needed [5]. For example, a decision tree of depth 5 is easier to understand than one of depth 50.
However, when the tree height is limited for interpretability, many decision-tree-based algorithms, such as classification and regression trees (CART) and C4.5, allow only binary splits for continuous attributes, which limits the performance achievable within the restricted tree depth.
To deal with these issues, discretization techniques that enable multi-point splits in decision tree algorithms have been proposed [6,7]. Discretization makes the information obtained from datasets more concise, easier to understand, and easier to use [8]. Nonetheless, efficient discretization techniques that consider dependencies among attributes while maintaining interpretability have been far less studied, even though these dependencies are important for performance [9]. In particular, for a decision tree algorithm, discretizing all continuous attributes without considering those dependencies has been shown to decrease accuracy [10].
To construct interpretable and effective models for fault detection, efficient discretization algorithms are needed that are designed for decision trees and that account for dependencies between continuous attributes. Therefore, this paper proposes a novel approach for retrieving an improved decision tree for fault detection in manufacturing. The proposed approach utilizes an evolutionary process with k-means clustering to efficiently find good solutions for global discretization. In addition, to maintain high interpretability, the proposed approach improves the decision tree under a limit on the maximum tree depth.
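To make the discretization idea concrete, the following is a minimal sketch, assuming scikit-learn's KMeans, of how k-means clustering can derive cut points that split a continuous attribute into a bounded number of intervals. The function name kmeans_cut_points and all parameter values are illustrative assumptions, not the exact DIMPLED operator detailed in Section 3.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cut_points(values, n_intervals=4, random_state=0):
    """Derive cut points for one continuous attribute (illustrative).

    The 1-D attribute values are clustered with k-means, and a cut
    point is placed at the midpoint between adjacent cluster centers,
    yielding at most n_intervals intervals.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_intervals, n_init=10,
                random_state=random_state).fit(x)
    centers = np.sort(km.cluster_centers_.ravel())
    # Midpoints between adjacent centers become interval boundaries.
    return (centers[:-1] + centers[1:]) / 2.0

# Example: discretize a synthetic temperature-like attribute.
rng = np.random.default_rng(42)
temps = rng.normal(loc=[40.0, 48.0, 55.0, 62.0],
                   scale=1.5, size=(200, 4)).ravel()
print(kmeans_cut_points(temps, n_intervals=4))  # three cut points
```

One can think of such cut-point sets as the candidate discretization strategies that the evolutionary process refines, using the accuracy of the depth-limited tree as the fitness signal; the precise encoding is given in Section 3.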
This paper is organized as follows. In Section 2, the previous research related to interpretable ML and discretization techniques is reviewed. Section 3 proposes a new evolutionary algorithm for discretization of continuous attributes based on k-means clustering. Section 4 describes the two datasets for fault detection in manufacturing, and Section 5 summarizes the results of experiments verifying the algorithm. Finally, Section 6 draws conclusions and discusses possible areas for further research.
5. Results and Discussion
This section presents the experimental results and compares the performance of the proposed DIMPLED algorithm with that of other tree-based algorithms, including ensemble learning methods such as Random Forest, AdaBoost, and Gradient Boosting. The maximum tree height affects whether a tree-based algorithm attains an interpretable structure, in terms of both its logic and its accuracy. Thus, to maintain an appropriate level of interpretability in the tree-based models, the maximum tree height was set to 3, based on the previous literature on tree-based algorithms [48,49]. Note that the level of interpretability of DIMPLED can also be tuned by changing the maximum tree height. In a similar vein, the maximum number of intervals was limited to 4 to maintain interpretability. The detailed parameters for the algorithms are listed in Table 5. For the other tree-based algorithms, scikit-learn packages were used with the default settings, except for the maximum tree depth. The experiments were run on an Intel i9-10900 3.7 GHz processor with 32 GB of RAM and a GeForce RTX 2080 Ti.
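For concreteness, the baseline configuration implied above can be sketched as follows; apart from the stated maximum depth of 3, all values are scikit-learn defaults, any parameters beyond those in Table 5 are assumptions, and C4.5 is omitted because scikit-learn does not ship an implementation of it. Note that the estimator keyword of AdaBoostClassifier requires scikit-learn 1.2 or later (earlier versions call it base_estimator).

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)

MAX_DEPTH = 3  # maximum tree height shared by all tree-based baselines

baselines = {
    # scikit-learn's DecisionTreeClassifier implements an optimized
    # CART variant, used here as the CART baseline.
    "CART": DecisionTreeClassifier(max_depth=MAX_DEPTH),
    "Random Forest": RandomForestClassifier(max_depth=MAX_DEPTH),
    # AdaBoost over depth-limited trees as the base estimator.
    "AdaBoost": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=MAX_DEPTH)),
    "Gradient Boosting": GradientBoostingClassifier(max_depth=MAX_DEPTH),
}
```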
The performances of the five algorithms were compared in terms of average accuracy and interpretability using the CNC and Pasteurizer datasets described in Section 4. First, the average accuracy was calculated by the well-known tenfold cross-validation, which divides the dataset into 10 mutually exclusive and exhaustive partitions; in this paper, the two datasets were partitioned using stratified tenfold cross-validation. Second, the interpretability of the resulting trees was evaluated by splitting each dataset into two sets, with 70% of the data used for training and 30% for testing.
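A sketch of this evaluation protocol, reusing the hypothetical baselines dictionary from the previous sketch, could look as follows. The synthetic data stand in for one of the manufacturing datasets, and stratifying the 70/30 split is our assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

# Placeholder feature matrix and fault labels standing in for one dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for name, model in baselines.items():
    # Average accuracy via stratified tenfold cross-validation.
    scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Separate 70/30 split used for the interpretability comparison.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)
```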
5.1. Comparison Between Algorithms for Average Performance
The average classification accuracies obtained by DIMPLED and the other algorithms are compared in Table 6, which also reports the standard deviations for further comparison. Among the tree-based algorithms, the gradient boosting algorithm outperformed the others in terms of average accuracy and standard deviation on the two datasets. In the case of the Pasteurizer dataset, the performances of some ensemble methods (Random Forest and AdaBoost) degraded significantly compared to their results on the CNC dataset, due to the limited tree depth. However, even though the ensemble methods produced better performances than the single-tree-based algorithms (C4.5, CART, and DIMPLED), their level of interpretability may not be appropriate for practitioners, because they consist of many trees with different structures. Moreover, the decision trees generated by DIMPLED showed competitive performance compared to the ensemble methods widely used in practice.
Furthermore, DIMPLED produced significantly better performance than C4.5 and CART without any loss of interpretability. One possible explanation for this result is that the performance and generalizability of a single decision tree can be improved by allowing multiple splits through discretization.
5.2. Comparison Between Algorithms for Interpretability
To compare the interpretability of DIMPLED and the other tree-based algorithms, each dataset was split into two sets: 70% of the data was used for training and 30% for testing. Based on these training and test sets, tree-based models were generated, and their performances were compared in terms of training and test accuracies, which are summarized in Table 7. To compare the interpretability of the models in detail, the decision trees are depicted in Figure 6, Figure 7 and Figure 8.
In terms of training and test accuracies, the gradient boosting algorithm outperformed the other tree-based algorithms. Similar to the cross-validation results, the benefit of combining multiple classifiers in Random Forest and AdaBoost was not significant on the Pasteurizer dataset. However, in spite of its high performance, the interpretability of the gradient boosting algorithm was significantly lower than that of the models based on a single decision tree, as shown in Figure 6, Figure 7 and Figure 8. Among the single decision trees, those of CART and DIMPLED are much easier to understand than those of the other algorithms, owing to their simple and compact structures.
Even though CART and DIMPLED have similar interpretability, DIMPLED significantly outperformed CART in terms of both training and test accuracies, as shown in Table 7. Furthermore, DIMPLED could identify the root causes and their interdependencies, as shown in Figure 6. For example, in the case of the Pasteurizer dataset, when Pasteurizer B's temperature was low-to-medium (between 41.7 and 55.2 °C) and Pasteurizer A's temperature was over 40.6 °C, faulty products were observed in the training data. Similarly, in the case of the CNC dataset, when the current feed rate was low-to-medium (between 4.8 and 13 mm/s) and the X-output current was low-to-medium (between 325.2 and 326.5 A), faulty products were observed in the training data.
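Expressed as operator-readable IF-THEN rules, the two examples above take roughly the following form; the function names are illustrative, and treating the reported "between" ranges as inclusive bounds is our assumption.

```python
def pasteurizer_fault_rule(temp_b, temp_a):
    """Fault rule read off the tree for the Pasteurizer dataset:
    Pasteurizer B low-to-medium and Pasteurizer A above 40.6 degrees C."""
    return 41.7 <= temp_b <= 55.2 and temp_a > 40.6

def cnc_fault_rule(feed_rate, x_output_current):
    """Fault rule read off the tree for the CNC dataset:
    feed rate and X-output current both in low-to-medium ranges."""
    return 4.8 <= feed_rate <= 13.0 and 325.2 <= x_output_current <= 326.5

# Example: a reading inside both Pasteurizer conditions is flagged faulty.
print(pasteurizer_fault_rule(temp_b=50.1, temp_a=42.3))  # True
```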
In summary, the results demonstrate that DIMPLED offers good interpretability compared with the other tree-based algorithms. In addition, DIMPLED appears to find an improved decision tree through its evolutionary process for global discretization, because the resulting tree provides significantly better performance than C4.5 and CART do. Moreover, the model and its discretized attributes are completely transparent and interpretable, which can make manufacturing systems more understandable and thus more reliable to human operators.
6. Conclusions and Future Work
This paper addressed classification models for fault detection in manufacturing. To identify root causes on the shop floor, interpretable ML models that provide insights in an understandable form are crucial to improving product quality. However, with the recent growth of IoT, the number of sensors is exploding, and generating ML models with high performance and an appropriate level of interpretability is becoming more complicated. To deal with the fault detection problem effectively, a new approach for evolutionary discretization, called DIMPLED, was proposed. The DIMPLED algorithm improves the structure of a single decision tree by evolving discretization strategies that enable the tree to have multiple splits. The experimental results on two manufacturing datasets show that the decision tree improved by DIMPLED outperformed C4.5 and CART, which are widely used in practice, and was competitive with the ensemble methods, which require multiple decision trees. Even though the ensemble methods could produce slightly better performances, DIMPLED has a more interpretable structure while maintaining an appropriate performance level.
The major contribution of this paper is the development of a new approach for capturing insights with an appropriate level of interpretability. To improve accuracy within a limited tree height, DIMPLED enables a tree to have multiple splits through an automated process for discovering the best discretization strategy. Also, owing to the reduced set of rules from a simple decision tree, the models generated by DIMPLED are capable of fault prediction in real time. Moreover, based on the improved tree, human operators can improve product quality by identifying root causes as a set of IF-THEN rules, and thus DIMPLED is expected to be applicable to various fault detection problems without the black-box issue.
Future work can proceed in several directions. First, other interpretable algorithms are worthy of investigation. Additionally, applications of DIMPLED to other types of classification problems can be studied. Finally, feature extraction techniques can be integrated into DIMPLED to examine their potential effects.