Informative knowledge distillation for image anomaly segmentation
Introduction
Image anomaly segmentation is a critical task in product quality control [1], [2], [3], [4]. It can be described as segmenting regions that deviate significantly from normal data. With image anomaly segmentation, defects in industrial images can be segmented precisely, and experts can quickly locate faulty parts in production equipment.
Image anomaly segmentation is often formulated as an unsupervised task [5], [6]. In real-world applications, abnormal samples are often scarce, or it is unclear which types of anomalies may appear [3]. Moreover, anomalies vary in color, scale, and shape, making the distribution of abnormal data difficult to characterize. This study proposes a novel unsupervised image anomaly segmentation method that assigns an anomaly score to each pixel. Examples from the MVTec anomaly detection (MVTec AD) dataset [3] and the corresponding anomaly scores produced by the proposed method are shown in Fig. 1.
Several researchers have studied reconstruction-based methods [7], [8], [9], [10], [11], [12], [13], [14], [15]. Per-pixel reconstruction errors, or densities obtained from the model's probability distribution, can be leveraged to score anomalies. Nevertheless, the quality of the reconstructed images significantly affects the segmentation results [9], [10].
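As a concrete illustration of the reconstruction-based scoring described above, a minimal sketch might compute a per-pixel anomaly map from the squared reconstruction error (the function name and the channel-last layout are assumptions for illustration, not taken from the cited works):

```python
import numpy as np

def reconstruction_anomaly_map(image: np.ndarray, recon: np.ndarray) -> np.ndarray:
    """Per-pixel anomaly score: squared reconstruction error averaged
    over the channel axis. image, recon: (H, W, C) -> (H, W)."""
    return np.mean((image.astype(float) - recon.astype(float)) ** 2, axis=-1)
```

Pixels that the model fails to reconstruct faithfully receive high scores, which is exactly why reconstruction quality dominates segmentation performance in these methods.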
Recently, networks pretrained on large natural image datasets, such as ImageNet [16], have proven effective in image anomaly segmentation [6], [17], [18], [19], [20], [21], [22], [23], [24]. Pretrained networks automatically build comprehensive representations at multiple hierarchical levels. In the feature spaces produced by pretrained networks, the feature manifolds of normal data are compact, whereas abnormal features lie sparsely, far from the normal manifolds. Clustering models are leveraged to describe normal data manifolds, and anomaly scores can be derived from the probability density estimation of the chosen clustering model [17], [18], [19], [20]. However, it is difficult to select a suitable clustering model without prior knowledge of the properties of normal data manifolds.
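A minimal sketch of the clustering-model approach, assuming a single multivariate Gaussian fitted to pretrained features (the function names and the regularization constant are illustrative assumptions, not from the cited works), scores a feature vector by its Mahalanobis distance to the normal manifold:

```python
import numpy as np

def fit_gaussian(feats: np.ndarray):
    """Fit a multivariate Gaussian to normal-data features of shape (N, D)."""
    mu = feats.mean(axis=0)
    # Small diagonal term regularizes the covariance before inversion.
    cov = np.cov(feats, rowvar=False) + 1e-3 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(f: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> float:
    """Anomaly score of one feature vector: distance to the normal manifold."""
    d = f - mu
    return float(np.sqrt(d @ cov_inv @ d))
```

The difficulty noted above is visible even in this sketch: a single Gaussian is only appropriate if the normal manifold is unimodal, a property that is unknown without prior knowledge.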
To avoid manually selecting clustering models, researchers have leveraged knowledge distillation to implicitly model normal data manifolds [6], [21], [22], [23]. Knowledge distillation was initially proposed to build light models [25], [26], [27]. In unsupervised image anomaly segmentation, knowledge distillation is leveraged to learn a student network that mimics the normal data features extracted by a teacher network pretrained on a large image dataset. The differences between the features extracted by the student and teacher networks indicate anomaly scores. Knowledge distillation-based methods achieve significant improvements over clustering model-based methods because student networks have many more parameters than traditional clustering models. However, continuously increasing the number of parameters does not always improve performance [6], [21], [22], [23], which indicates an overfitting problem. Intuitively, the knowledge distilled into the student network is far less than the capacity of the student network.
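The teacher-student scoring idea can be sketched as follows, assuming l2-normalized feature maps and a channel-first layout (both are assumptions for illustration; the paper's exact formulation is not shown in this excerpt):

```python
import numpy as np

def kd_anomaly_map(f_teacher: np.ndarray, f_student: np.ndarray) -> np.ndarray:
    """Per-pixel anomaly map from teacher/student feature maps of shape (C, H, W):
    l2-normalize along the channel axis, then sum squared channel-wise differences."""
    def l2norm(f):
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    return np.sum((l2norm(f_teacher) - l2norm(f_student)) ** 2, axis=0)  # (H, W)
```

On normal regions the student has learned to match the teacher, so the map stays near zero; on anomalous regions the mismatch, and hence the score, grows.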
Generally, overfitting problems can be mitigated in two ways: by decreasing the number of parameters in the student network or by increasing the amount of knowledge distilled to the student network. A simpler cloner student network of the teacher network was leveraged in [21] to mitigate overfitting. However, critical information can be lost if the capacity of the student network is smaller than that of the teacher network [22].
This study proposes mitigating the overfitting problem by distilling informative knowledge, that is, by increasing the amount of knowledge distilled. Informative knowledge is defined as knowledge that offers a strong supervisory signal and alleviates overfitting. Hence, this study presents a novel method called informative knowledge distillation (IKD). Technically, a novel loss function called context similarity loss (CSL) is proposed to constrain the context similarities between the features extracted by the teacher and student networks; CSL makes the student network aware of the structure of the normal data manifold. In addition, a novel adaptive hard sample mining (AHSM) scheme is introduced to focus attention on information-rich hard samples and avoid continually optimizing easy samples. During optimization, the optimizing direction of easy samples may oppose that of hard samples, so the effect of hard samples can be counteracted by easy samples; such suppression of hard samples by easy samples is regarded as unexpected noise. AHSM inhibits this noise and improves anomaly segmentation performance. Benefiting from these innovations, informative knowledge offering a strong supervisory signal is distilled into the student network, so the overfitting risk declines and the generalization ability of the student network improves. Consequently, networks with a larger number of parameters can be leveraged to improve segmentation performance. IKD comprises two stages, training and inference, and two modules: multi-hierarchical knowledge distillation (MHKD) and multi-hierarchical anomaly score fusion (MHASF). CSL and AHSM are leveraged in MHKD to extract informative knowledge. The main novelties and contributions of this study are as follows:
- This study improves the performance of knowledge distillation-based image anomaly segmentation by extracting informative knowledge and mitigating the overfitting problem.
- This study proposes a novel loss function called CSL, which encourages the student network to be aware of the structure of the normal data manifold so that low regression errors are obtained on all normal samples in the manifold. Abnormal samples therefore yield larger regression errors and are easier to detect.
- This study proposes a novel AHSM that adaptively mines hard samples based on the statistical information of feature distances, which helps extract informative knowledge from hard samples and avoids the unexpected noise from easy samples.
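One plausible reading of CSL (the paper's exact formulation is not reproduced in this excerpt; the cosine Gram matrix and the mean-squared matching are assumptions) is to match the pairwise similarity structure of teacher and student features, so the student learns the manifold's structure rather than individual feature vectors alone:

```python
import numpy as np

def context_similarity_loss(f_t: np.ndarray, f_s: np.ndarray) -> float:
    """Match the pairwise cosine-similarity (context) matrices of teacher
    and student features. f_t, f_s: (N, D), one row per spatial position."""
    def cosine_gram(f):
        f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
        return f @ f.T  # (N, N): similarity structure of the manifold
    return float(np.mean((cosine_gram(f_t) - cosine_gram(f_s)) ** 2))
```

Because the Gram matrices are built from cosine similarities, this sketch constrains relations between positions and is invariant to feature scale, which is one way a loss can convey manifold structure rather than pointwise targets.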
The remainder of this paper is organized as follows. Related studies on unsupervised image anomaly segmentation are reviewed in Section 2. The IKD is introduced in detail in Section 3. Extensive experiments are conducted and analyzed in Section 4. The conclusions and scope for future studies are discussed in Section 5.
Related studies
Anomaly segmentation methods can be categorized into two types according to whether they leverage pre-trained features: learning features from scratch and leveraging pre-trained features.
The proposed IKD method
This section describes the core principles of the IKD, which is illustrated in Fig. 2. The IKD comprises two stages (training and inference) and two modules (MHKD and MHASF). A descriptive teacher network pretrained on a large natural image dataset and a randomly initialized student network play key roles. During the training stage, normal images are delivered to the MHKD, which extracts and normalizes the features at each hierarchy of the teacher and student networks. Thereafter, AHSM mines hard samples from these features based on the statistical information of feature distances.
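The statistical mining step can be sketched with a hypothetical threshold rule (selecting samples above the mean plus k standard deviations is an assumption for illustration; the paper's exact criterion is not shown in this snippet):

```python
import numpy as np

def mine_hard_samples(distances: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Adaptive hard-sample mask: keep positions whose teacher-student
    feature distance exceeds mean + k * std of the current batch."""
    threshold = distances.mean() + k * distances.std()
    return distances > threshold
```

Restricting the loss to the masked positions concentrates gradients on information-rich hard samples, so the many near-zero easy samples cannot drown them out.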
Experiments
This section compares the proposed IKD with other state-of-the-art anomaly segmentation methods. Various ablation studies are conducted to demonstrate the influence of individual components.
Conclusion
This study proposes a novel method called IKD for unsupervised image anomaly segmentation. Previous anomaly segmentation methods based on knowledge distillation suffer from the overfitting problem caused by the inconsistency between the capacity of the neural network and the amount of knowledge. The IKD aims to alleviate the overfitting problem in two ways. One is a loss function called CSL, and the other is an AHSM module. CSL can help the student network learn the structure of the normal data manifold, and AHSM focuses optimization on information-rich hard samples.
CRediT authorship contribution statement
Yunkang Cao: Conceptualization, Methodology, Writing – original draft. Qian Wan: Validation, Investigation, Writing – review & editing. Weiming Shen: Supervision, Writing – review & editing, Funding acquisition, Project administration. Liang Gao: Writing – review & editing, Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work presented in this paper was partially supported by the Fundamental Research Funds for the Central Universities of China (2021GCRC058).
References (40)
- et al., A review of novelty detection, Signal Process. (2014)
- et al., F-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks, Med. Image Anal. (2019)
- et al., adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection, Knowl.-Based Syst. (2020)
- et al., Deep learning for anomaly detection: A review, ACM Comput. Surv. (2021)
- et al., The MVTec anomaly detection dataset: A comprehensive real-world dataset for unsupervised anomaly detection, Int. J. Comput. Vis. (2021)
- et al., Surface defect detection methods for industrial products: A review, Appl. Sci. (2021)
- Deep one-class classification
- et al., Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings
- et al., Learning discriminative reconstructions for unsupervised outlier removal
- et al., Iterative energy-based projection on a normal data manifold for anomaly localization (2020)
- Improving unsupervised defect segmentation by applying structural similarity to autoencoders
- Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection
- Reconstruction by inpainting for visual anomaly detection, Pattern Recognit.
- Unsupervised anomaly detection with generative adversarial networks to guide marker discovery
- Unsupervised anomaly segmentation via multilevel image reconstruction and adaptive attention-level transition, IEEE Trans. Instrum. Meas.
- ImageNet: A large-scale hierarchical image database
- Gaussian anomaly detection by modeling the distribution of normal data in pretrained deep features, IEEE Trans. Instrum. Meas.
- PANDA: Adapting pretrained features for anomaly detection and segmentation
- PaDiM: A patch distribution modeling framework for anomaly detection and localization
- Industrial image anomaly localization based on Gaussian clustering of pre-trained feature, IEEE Trans. Ind. Electron.