Knowledge-Based Systems (Elsevier), Volume 248, 19 July 2022, 108846

Informative knowledge distillation for image anomaly segmentation

https://doi.org/10.1016/j.knosys.2022.108846

Abstract

Unsupervised anomaly segmentation methods based on knowledge distillation have recently been developed and have shown superior segmentation performance. However, little attention has been paid to the overfitting problem caused by the mismatch between the capacity of the neural network and the amount of knowledge in this scheme. This study proposes a novel method called informative knowledge distillation (IKD) to address the overfitting problem by distilling informative knowledge and offering a strong supervisory signal. Technically, a novel context similarity loss is proposed to capture context information from normal data manifolds. In addition, a novel adaptive hard sample mining method is proposed to encourage more attention on hard samples with valuable information. With IKD, informative knowledge can be distilled so that the overfitting problem is effectively mitigated and performance is further increased. The proposed method achieved better results than state-of-the-art methods on several categories of the well-known MVTec AD dataset in terms of AU-ROC, reaching an average of 97.81% over 15 categories. Extensive ablation experiments have also been conducted to demonstrate the effectiveness of IKD in alleviating the overfitting problem.

Introduction

Image anomaly segmentation is a critical task in product quality control [1], [2], [3], [4]. It can be defined as segmenting regions that deviate significantly from normal data. With image anomaly segmentation, defects in industrial images can be well segmented, and experts can quickly locate faulty parts in production equipment.

Image anomaly segmentation is often formulated as an unsupervised task [5], [6]. In real-world applications, abnormal samples are often scarce, or it is unclear which types of anomalies may appear [3]. Meanwhile, anomalies can vary in color, scale, and shape, making the distribution of abnormal data difficult to model. This study proposes a novel unsupervised image anomaly segmentation method that assigns an anomaly score to each pixel. Examples from the MVTec anomaly detection (MVTec AD) dataset [3] and the corresponding anomaly scores produced by the proposed method are shown in Fig. 1.

Several researchers have studied reconstruction-based methods [7], [8], [9], [10], [11], [12], [13], [14], [15]. The per-pixel reconstruction errors or densities obtained from the probability distribution of the model can be leveraged to score the anomalies. Nevertheless, the quality of the reconstructed images significantly affects the segmentation results [9], [10].

Recently, networks pretrained on large natural image datasets, such as ImageNet [16], have been proven to be effective in image anomaly segmentation [6], [17], [18], [19], [20], [21], [22], [23], [24]. Pre-trained networks automatically build comprehensive representations at multiple hierarchical levels. In the feature spaces produced by pretrained networks, normal data feature manifolds are compact, and abnormal data features are located sparsely and far away from normal data manifolds. Clustering models are leveraged to describe normal data manifolds, and anomaly scores can be derived from the probability density estimation of the chosen clustering model [17], [18], [19], [20]. However, it is difficult to select a suitable clustering model without prior knowledge of the properties of normal data manifolds.
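As a minimal sketch of the clustering-model approach described above, one can fit a single Gaussian to the pretrained features of normal patches and use the Mahalanobis distance as the anomaly score (a PaDiM-style simplification; all names and dimensions below are our own illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are D-dim pretrained features of N normal patches.
normal_feats = rng.normal(0.0, 1.0, size=(500, 8))

# Fit a Gaussian to the normal-data manifold (regularize the covariance).
mu = normal_feats.mean(axis=0)
cov = np.cov(normal_feats, rowvar=False) + 1e-3 * np.eye(8)
cov_inv = np.linalg.inv(cov)

def mahalanobis_score(f):
    """Anomaly score = Mahalanobis distance to the normal-feature Gaussian."""
    d = f - mu
    return float(np.sqrt(d @ cov_inv @ d))

normal_score = mahalanobis_score(rng.normal(0.0, 1.0, size=8))
abnormal_score = mahalanobis_score(np.full(8, 4.0))  # far from the manifold
print(normal_score < abnormal_score)
```

The drawback the paragraph points out shows up directly here: the choice of a Gaussian (rather than, say, a mixture or k-NN density) is a manual modeling decision that requires prior knowledge of the manifold's shape.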

To avoid manually selecting clustering models, researchers have leveraged knowledge distillation to implicitly model normal data manifolds [6], [21], [22], [23]. Knowledge distillation was initially proposed to build lightweight models [25], [26], [27]. In unsupervised image anomaly segmentation, knowledge distillation is leveraged to learn a student network that mimics the normal data features extracted by a teacher network pretrained on a large image dataset. The differences between the features extracted by the student and teacher networks indicate anomaly scores. Knowledge-distillation-based methods achieve significant improvements over clustering-model-based methods because student networks have many more parameters than traditional clustering models. However, continuously increasing the number of parameters does not always increase performance [6], [21], [22], [23], which indicates that an overfitting problem exists. Intuitively, the knowledge distilled into the student network is far less than the capacity of the student network.
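The scoring rule just described — per-location feature differences between teacher and student — can be sketched on toy feature maps (the functions and shapes below are our own illustration, not the paper's networks):

```python
import numpy as np

def anomaly_map(teacher_feat, student_feat):
    """Per-location squared L2 distance between (C, H, W) feature maps."""
    diff = teacher_feat - student_feat
    return (diff ** 2).sum(axis=0)  # shape (H, W)

rng = np.random.default_rng(1)
t = rng.normal(size=(16, 4, 4))
s = t + rng.normal(scale=0.01, size=(16, 4, 4))  # student mimics well...
s[:, 2, 2] += 1.0                                # ...except one "anomalous" cell

amap = anomaly_map(t, s)
r, c = np.unravel_index(amap.argmax(), amap.shape)
print(int(r), int(c))  # → 2 2
```

On normal regions the trained student tracks the teacher closely, so distances stay near zero; where the student has never seen the pattern, the mismatch — and hence the score — grows.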

Generally, overfitting can be mitigated in two ways: by decreasing the number of parameters of the student network or by increasing the amount of knowledge distilled to it. A simpler cloner student network of the teacher was leveraged in [21] to mitigate overfitting. However, critical information can be lost if the capacity of the student network is smaller than that of the teacher network [22].

This study proposes to mitigate the overfitting problem via the second way, that is, by increasing the amount of knowledge through distilling informative knowledge. Informative knowledge is defined here as knowledge that offers a strong supervisory signal and alleviates overfitting. Hence, this study presents a novel method called informative knowledge distillation (IKD). Technically, a novel loss function called context similarity loss (CSL) is proposed to constrain the context similarities between the features extracted by the teacher and student networks. CSL helps the student network become aware of the structure of the normal data manifold. In addition, a novel adaptive hard sample mining (AHSM) method is introduced to encourage more attention on information-rich hard samples and to avoid continuously optimizing easy samples. During optimization, the optimizing direction of easy samples may be opposite to that of hard samples, so the effect of hard samples can be counteracted by easy samples. Such suppression of hard samples by easy samples is considered unexpected noise. AHSM inhibits this noise and improves anomaly segmentation performance. Benefiting from these innovations, informative knowledge that offers a strong supervisory signal is distilled to the student network, so that the overfitting risk declines and the generalization ability of the student network improves. Consequently, networks with a larger number of parameters can be leveraged to improve segmentation performance. IKD comprises two stages, training and inference, and two modules: multi-hierarchical knowledge distillation (MHKD) and multi-hierarchical anomaly score fusion (MHASF). CSL and AHSM are leveraged in MHKD to extract informative knowledge. The main novelties and contributions of this study are as follows:

  • This study improves the performance of knowledge distillation-based image anomaly segmentation by extracting informative knowledge and mitigating the overfitting problem.

  • This study proposes a novel loss function called CSL. It encourages the student network to be aware of the structure of the normal data manifold, so that low regression errors are obtained on all normal samples in the manifold. Abnormal samples therefore yield larger regression errors and are easier to detect.

  • This study proposes a novel AHSM. It can adaptively mine hard samples based on the statistical information of feature distances, which is beneficial for extracting informative knowledge from hard samples and avoiding unexpected noise in easy samples.
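The two ingredients above can be sketched in a deliberately simplified form (the exact loss formulations are given in Section 3; the functions below are our own toy reading of the idea): CSL asks the student's pairwise feature-similarity structure to match the teacher's, and AHSM keeps only samples whose distance exceeds an adaptive batch statistic.

```python
import numpy as np

def context_similarity_loss(t_feats, s_feats):
    """MSE between teacher and student cosine-similarity matrices (N, D) -> scalar."""
    def sim(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T  # (N, N) pairwise cosine similarities
    return float(((sim(t_feats) - sim(s_feats)) ** 2).mean())

def mine_hard_samples(distances):
    """Adaptive threshold: keep samples harder than mean + one std."""
    return distances > distances.mean() + distances.std()

rng = np.random.default_rng(2)
t = rng.normal(size=(8, 16))
print(context_similarity_loss(t, t))     # identical structure → 0.0

d = np.array([0.1, 0.1, 0.2, 0.1, 2.0])  # one clearly hard sample
print(mine_hard_samples(d))              # only the last one is kept
```

Because the threshold is computed from the batch's own statistics rather than fixed a priori, the mining step adapts as the distance distribution shrinks during training — the "adaptive" part of AHSM.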

The remainder of this paper is organized as follows. Related studies on unsupervised image anomaly segmentation are reviewed in Section 2. The IKD is introduced in detail in Section 3. Extensive experiments are conducted and analyzed in Section 4. The conclusions and scope for future studies are discussed in Section 5.

Section snippets

Related studies

Anomaly segmentation methods can be categorized into two types according to whether they leverage pre-trained features: learning features from scratch and leveraging pre-trained features.

The proposed IKD method

This section describes the core principles of the IKD, illustrated in Fig. 2. The IKD comprises two stages and two modules. A descriptive teacher network T pretrained on a large natural image dataset and a randomly initialized student network S play key roles. During the training stage, normal images are delivered to the MHKD. MHKD extracts and normalizes the features in each hierarchy of T and S. Thereafter, AHSM mines hard samples from these features based on the statistical
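The multi-hierarchical flow above can be sketched schematically (our simplification; the upsampling and fusion details follow Section 3, and the function names are ours): each hierarchy of T and S yields a feature-distance map, and an MHASF-style step upsamples and averages them into one full-resolution score map.

```python
import numpy as np

def hierarchy_map(t_feat, s_feat):
    """Per-location squared distance for one hierarchy, (C, H, W) -> (H, W)."""
    return ((t_feat - s_feat) ** 2).sum(axis=0)

def fuse_maps(maps, out_size):
    """MHASF-style fusion: nearest-neighbour upsample each map, then average."""
    fused = np.zeros((out_size, out_size))
    for m in maps:
        rep = out_size // m.shape[0]
        fused += np.kron(m, np.ones((rep, rep)))  # nearest-neighbour upsample
    return fused / len(maps)

rng = np.random.default_rng(3)
# Two hierarchies with different spatial resolutions (4x4 and 8x8).
maps = [hierarchy_map(rng.normal(size=(8, s, s)), rng.normal(size=(8, s, s)))
        for s in (4, 8)]
score_map = fuse_maps(maps, out_size=16)
print(score_map.shape)  # → (16, 16)
```

Fusing across hierarchies lets coarse levels contribute context-level deviations while fine levels localize small defects in the final per-pixel score.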

Experiments

This section compares the proposed IKD with other state-of-the-art anomaly segmentation methods. Various ablation studies are conducted to demonstrate the influence of individual components.

Conclusion

This study proposes a novel method called IKD for unsupervised image anomaly segmentation. Previous anomaly segmentation methods based on knowledge distillation suffer from the overfitting problem caused by the inconsistency between the capacity of the neural network and amount of knowledge. The IKD aims to alleviate the overfitting problem in two ways. One is a loss function Lcs called CSL, and the other is an AHSM module. CSL can help the student network learn the structure of a data manifold

CRediT authorship contribution statement

Yunkang Cao: Conceptualization, Methodology, Writing – original draft. Qian Wan: Validation, Investigation, Writing – review & editing. Weiming Shen: Supervision, Writing – review & editing, Funding acquisition, Project administration. Liang Gao: Writing – review & editing, Funding acquisition, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work presented in this paper has been partially supported by the Fundamental Research Funds for the Central Universities of China (2021GCRC058).

References (40)

  • Bergmann, P., et al. Improving unsupervised defect segmentation by applying structural similarity to autoencoders.

  • Gong, D., et al. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection.

  • Zavrtanik, V., et al. Reconstruction by inpainting for visual anomaly detection. Pattern Recognit. (2021).

  • Schlegl, T., et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery.

  • Yan, Y., et al. Unsupervised anomaly segmentation via multilevel image reconstruction and adaptive attention-level transition. IEEE Trans. Instrum. Meas. (2021).

  • Deng, Jia, et al. ImageNet: A large-scale hierarchical image database.

  • Rippel, O., et al. Gaussian anomaly detection by modeling the distribution of normal data in pretrained deep features. IEEE Trans. Instrum. Meas. (2021).

  • Reiss, T., et al. PANDA: Adapting pretrained features for anomaly detection and segmentation.

  • Defard, T., et al. PaDiM: A patch distribution modeling framework for anomaly detection and localization.

  • Wan, Q., et al. Industrial image anomaly localization based on Gaussian clustering of pre-trained feature. IEEE Trans. Ind. Electron. (2021).