1. Introduction
In the industrial age, the demand for strip steel across various industries is increasing. However, due to the influence of temperature and manufacturing processes, surface defects such as water spots, creases, and patches frequently occur during production. These defects can seriously affect both the quality and safety of the final product [1,2]. If such defects are not identified in a timely manner, they can lead to significant losses in subsequent production stages. Therefore, it is crucial to quickly and accurately classify surface defects in strip steel production [3,4].
With the rise of deep learning, industrial production has shifted from traditional to intelligent manufacturing, giving artificial intelligence new momentum [5,6,7]. Many researchers have applied deep learning techniques to classify strip steel surface defects. A fusion matrix based on Fisher’s criterion and correlation analysis was introduced in [8] to integrate global and local dimensions. Multi-label techniques were used in [9] to improve classification performance while reducing model complexity and latency on small datasets. The correlation between pixel-level segmentation masks, object-level bounding boxes, and global image-level classification labels was considered in [10], where the features of the related tasks were learned jointly to improve performance. A scheme based on ResNet50 with FcaNet and the Convolutional Block Attention Module (CBAM) was proposed in [11] for strip defect classification. The CutPaste-Mix data augmentation strategy and Gaussian Density Estimation were used in [12] to classify abnormal regions.
However, in actual industrial production, surface defects on strip steel are rare and challenging to acquire. Therefore, directly applying traditional deep learning methods to classify these defects often leads to overfitting issues [13,14]. Moreover, industrial images contain significant redundant information, further complicating the task of classification [15,16,17].
Inspired by the human ability to learn quickly from a small number of examples, few-shot learning emerged [18]. Its goal is to train a classifier that can then recognize new defects efficiently from only a small number of samples [19,20]. This requires high precision and robust generalization from the model, which aligns more closely with the practical demands of industrial defect classification [21].
Few-shot strip steel surface defect classification methods fall into three main categories: data augmentation-based, optimization-based, and metric-based [22,23].
Data augmentation-based. This is the most direct approach to the few-shot strip steel surface defect classification problem. The dataset can be extended through geometric transformations such as rotation and cropping, or through online enhancements such as Generative Adversarial Networks (GAN) [24] and CutMix [25,26,27]. Data augmentation methods such as those proposed in [28] accumulate richly featured data incorporating expert knowledge of abnormalities, including diverse features, positions, sizes, and backgrounds. A residual discriminator network structure within a dual-discriminator GAN framework was introduced in [29] to enhance generation diversity while preserving image features. A meta-augmentation method proposed in [30] improves recognition generalization across meta-tasks through joint parameter updating from the original and augmented domains.
Optimization-based. Gradient-based optimization enables rapid adaptation to new tasks [31,32]. MAML [33] is recognized as one of the most influential methods: the model is updated iteratively by aggregating gradients across tasks, an idea that has influenced numerous subsequent methods. A hyperparametric adaptive strategy based on gradient descent (HASGD) was introduced in [34] to enhance the stability and scalability of the training process. The framework and neural network models were refined in [35] based on MAML [33].
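The inner and outer gradient updates behind MAML-style methods can be illustrated on a deliberately tiny problem. The sketch below is not the implementation from [33]; it assumes a scalar linear model, a mean squared error loss, and toy regression tasks, and the names `mse_grad` and `maml_step` are hypothetical. For brevity each task reuses the same points as support and query set, which a real setup would keep separate.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return 2.0 * np.mean(x * (w * x - y))

def maml_step(w, tasks, inner_lr, outer_lr):
    """One meta-update: adapt per task, then average the gradients of the adapted losses."""
    meta_grad = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        # Inner loop: one gradient step on the task's support set.
        w_task = w - inner_lr * mse_grad(w, x_s, y_s)
        # Derivative of the adapted weight w.r.t. the shared weight
        # (exact for this quadratic loss, so the second-order term is kept).
        d_adapt = 1.0 - inner_lr * 2.0 * np.mean(x_s ** 2)
        # Outer loop contribution: query-set gradient pushed back through the inner step.
        meta_grad += mse_grad(w_task, x_q, y_q) * d_adapt
    return w - outer_lr * meta_grad / len(tasks)

# Toy tasks (assumed): noise-free lines y = a * x with slopes 1, 2, and 3.
x = np.array([-1.0, -0.5, 0.5, 1.0])
tasks = [(x, a * x, x, a * x) for a in (1.0, 2.0, 3.0)]

w = 0.0
for _ in range(200):
    w = maml_step(w, tasks, inner_lr=0.1, outer_lr=0.1)
```

In this symmetric toy setting the shared weight settles at the mean slope, the initialization from which a single inner step gets closest to every task on average.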
Metric-based. This is one of the most common solutions for few-shot strip steel surface defect classification. Such methods primarily comprise a feature extractor and a classifier, and they categorize samples by comparing their nonlinear mappings in an embedding space [36,37,38]. A novel dual-stream neural network was proposed in [39], in which numerous defect samples are generated for classifier pretraining and real steel strip surface defects are then classified using transfer learning. A transductive learning algorithm was presented in [40], where a new classifier is trained during the test phase to accommodate unknown samples. A depth metric-based classification method was proposed in [41] to identify a sample-matching feature space with effective similarity measures using cosine distance. A transductive few-shot surface defect classification method was introduced in [42], leveraging both instance-level and distribution-level relations within each few-shot learning task. ResMSNet, a novel backbone network presented in [43], draws on multi-scale feature extraction for small discriminative regions in defect samples and classifies by linking prototype distances and nonlinear relation scores. CPANet, proposed in [44], aggregates long-range relationships of discrete defects and introduces a space squeeze attention module to aggregate multiscale context information of defect features. An attention-guided recognition network was presented in [45], featuring channel and position attention modules and a dual-metric function that learns classification boundaries by controlling intraclass and interclass sample distances in the feature space. Because metric-based methods are simple, efficient, and highly designable, the model proposed in this work also falls into this category.
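To make the metric-based idea concrete, the following minimal sketch classifies query embeddings by their Euclidean distance to class prototypes, where a prototype is taken to be the mean of a class's support embeddings. This is one common pattern, not the method of any specific reference above; the function name `classify_by_prototype` and the toy two-dimensional embeddings are assumptions made for illustration, and a real system would obtain the embeddings from a trained feature extractor.

```python
import numpy as np

def classify_by_prototype(support_feats, support_labels, query_feats):
    """Assign each query to the class whose mean support embedding is nearest."""
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = np.stack(
        [support_feats[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distances, shape (n_query, n_classes).
    diffs = query_feats[:, None, :] - prototypes[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy 2-way 2-shot episode with hand-made embeddings (assumed values).
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.1, 0.1], [4.8, 5.1]])

predicted = classify_by_prototype(support, labels, queries)
```

The classifier itself has no trainable parameters; all learning happens in the feature extractor that produces the embeddings.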
As deep learning advances, mainstream models are becoming increasingly complex. Growing complexity degrades real-time performance, which then fails to meet the demands of industrial production. Conversely, simpler models struggle to extract intricate discriminative features, which compromises classification performance. Moreover, strip surface defects typically occupy a small portion of the overall image, and most of the remainder is redundant information. In the few-shot setting, where the model is already limited by insufficient sample data, interference from redundant information must be minimized so that the model makes better use of the pertinent data.
Before the widespread adoption of deep learning, earlier studies utilized Singular Value Decomposition (SVD) to address redundancy in the classification of strip surface defects. Ref. [46] presents a technique for the detection of local defects in cold rolled strips. In their approach, principal component analysis is employed with SVD to reduce the dimensionality of the extracted feature vector. Subsequently, the defects in the steel strips are detected using a feed-forward neural network. An approach is proposed in [47], where the gray level matrix of a digital image is projected onto its singular vectors obtained through SVD. Defects are identified by abrupt changes in these projections, allowing for the determination and rough localization of the defects. The effectiveness of traditional machine learning also provides inspiration for this work. The combination of traditional methods with deep learning can yield improved results.
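The projection idea can be sketched loosely as follows. This is an illustration only, not the algorithm of [47]: the synthetic image, the use of just the leading singular vectors, the relative threshold, and the name `projection_jumps` are all assumptions. The image is projected onto its first right and left singular vectors, and large steps in the resulting row and column profiles mark where the defect begins and ends.

```python
import numpy as np

def projection_jumps(image, rel_threshold=0.5):
    """Find abrupt changes in the projections of an image onto its leading singular vectors."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    row_profile = image @ vt[0]      # one projection value per image row
    col_profile = u[:, 0] @ image    # one projection value per image column

    def jumps(profile):
        step = np.abs(np.diff(profile))
        # Index k marks a jump between positions k and k + 1.
        return np.where(step > rel_threshold * step.max())[0]

    return jumps(row_profile), jumps(col_profile)

# Synthetic gray-level image (assumed): flat background with one bright patch
# covering rows 10..13 and columns 5..8.
image = np.full((32, 32), 100.0)
image[10:14, 5:9] = 200.0

row_changes, col_changes = projection_jumps(image)
```

Here the row profile steps between rows 9 and 10 and again between rows 13 and 14, and the column profile does the same around columns 5 to 8, which gives a rough bounding box for the patch. Real strip images contain texture and noise, so the threshold would need tuning.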
To tackle the above challenges, this study introduces ODNet, a network with high real-time performance that uses orthogonal methods to mitigate the influence of redundant information on the model and make the most of the limited available data. ODNet removes redundancy through the orthogonal decomposition of fully connected layer parameters, which ensures orthogonal feature projection. The model incorporates a skip connection to guard against the loss of useful information during the orthogonal decomposition. The orthogonal embedding of features also makes them better suited as inputs to the Euclidean distance. Experiments were conducted on the FSC-20 benchmark, which was specifically designed to validate few-shot strip steel surface defect classification models. ODNet demonstrates superior classification accuracy, high real-time performance, and strong generalization compared to other methods. Additionally, extensive ablation experiments were conducted to assess the influence of the model parameters and modules on performance.
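As a rough illustration of what an orthogonal projection with a skip path could look like, consider the sketch below. It is not ODNet's actual layer, which is specified in Section 2; it assumes that the orthogonal decomposition can be stood in for by a reduced QR factorization of a fully connected weight matrix, that the skip path is a second fully connected layer, and that the two outputs are summed. The names `orthonormal_columns` and `embed`, the layer sizes, and the random weights are hypothetical.

```python
import numpy as np

def orthonormal_columns(weight):
    """Return a matrix whose orthonormal columns span the column space of `weight`."""
    q, _ = np.linalg.qr(weight)  # reduced QR: q has the same shape as a tall `weight`
    return q

def embed(features, weight, skip_weight):
    """Project features onto orthonormal directions and add a linear skip path."""
    q = orthonormal_columns(weight)
    projected = features @ q           # coordinates along mutually orthogonal directions
    skipped = features @ skip_weight   # skip path that bypasses the orthogonalization
    return projected + skipped

# Assumed sizes: 5 samples, 16 backbone features, 4 embedding dimensions.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16))
weight = rng.normal(size=(16, 4))
skip_weight = rng.normal(size=(16, 4))

q = orthonormal_columns(weight)
embedded = embed(features, weight, skip_weight)
```

Because the columns of `q` are orthonormal, no projection direction duplicates another, which is one way to read "de-redundancy" at the level of the layer parameters. Whether the summed output keeps that property depends on the skip path, so this sketch should not be taken as evidence for the paper's results.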
Accordingly, this paper makes the following four major contributions:
(1) A network with high real-time performance is proposed for few-shot strip steel surface defect classification.
(2) ODNet employs orthogonal decomposition to derive orthogonal features, thereby minimizing the impact of redundant information on the model. The inclusion of a skip connection ensures that the valuable correlation information remains intact, especially after orthogonal decomposition.
(3) The features extracted by the model with orthogonality also adhere more closely to the orthogonality requirement of the Euclidean distance on input, thereby enhancing the classifier performance.
(4) Compared to alternative methods, ODNet exhibits superior real-time performance, precision, and generalization, aligning more closely with the specific demands of industrial production.
The proposed method is described in detail in Section 2. Section 3 presents a series of experiments that verify the performance of the model. Finally, Section 4 and Section 5 discuss and summarize the proposed method.
4. Discussion
We verified the performance of ODNet for few-shot strip steel surface defect classification. As depicted in Figure 9, the proposed method exhibits high precision and real-time performance.
In industrial defect samples, redundant information is often prevalent. Valuable information in few-shot learning is limited and precious, and an excess of redundant information can misdirect the model’s training. This interference hinders the model’s ability to separate useful information from redundancy and to amplify the importance of the pertinent features. ODNet addresses this issue by subjecting features containing redundant information to orthogonal decomposition. This operation rapidly mitigates the impact of redundant information on the model’s training direction, consequently enhancing the classification performance. Moreover, the skip connection prevents the orthogonal decomposition from removing useful information and strengthens the model’s capacity to distinguish between helpful and redundant information. The efficacy of the skip connection is also evident in the experimental results presented in Table 7. ODNet is a metric-based method, and its orthogonal features partly fulfill the input requirements of Euclidean distance. This improves the alignment between the feature extractor and the classifier, thus contributing to the performance improvement of the model.
To enhance the real-time performance of the model, this study intentionally simplified its architecture. The orthogonal decomposition and skip connection essentially added two fully connected layers. Compared to feature extractors in other mainstream models, this approach significantly reduced the complexity. Additionally, the model employed Euclidean distance as a classifier, which offers stable classification performance without additional parameters. Experimental results, as shown in Table 9, validate the effectiveness of this design in enhancing the real-time capability of the model.
However, as depicted in Figure 9, the cross-domain performance of the model was slightly lower than its intra-domain performance in the 5-shot scenario. We posit that cross-domain tasks require knowledge transfer, supplemented by the introduction of prior knowledge. ODNet’s orthogonal operation, however, only handles knowledge from the current task and does not provide prior knowledge to aid learning. Consequently, the proposed model is constrained in its performance on cross-domain tasks.
ODNet theoretically fulfills the requirements of industrial production. In the future, it holds the potential to enable swift and precise detection and classification of surface defects in strip steel on the production line, ensuring that product quality meets standards, reducing defect rates, and enhancing production efficiency. Nonetheless, its real-world industrial application may encounter challenges, particularly regarding the model’s generalization across diverse industrial environments. To address this, techniques such as model pre-training to expedite convergence or data augmentation to broaden the training dataset could enhance the model’s classification performance in practical industrial settings.