GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Yao, Hang; Liu, Ming; Yin, Zhicun; Yan, Zifei; Hong, Xiaopeng; Zuo, Wangmeng

doi:10.1007/978-3-031-73209-6_1


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15129)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Because they are trained on normal data only, diffusion models tend to reconstruct the normal counterparts of test images to which certain noise has been added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, images with different anomalies are not equally difficult to reconstruct. For example, adding back a missing element is harder than repairing a scratch and therefore requires a larger number of denoising steps. Instead of using the same setting for all samples, we therefore propose to predict a particular denoising step for each sample by evaluating the difference between the image content and the priors extracted from the diffusion model. From the local perspective, reconstructing abnormal regions differs from reconstructing normal regions, even within the same image. In theory, the noise predicted by the diffusion model at each step typically follows a standard Gaussian distribution. However, because an anomaly differs from its potential normal counterpart, the predicted noise in abnormal regions inevitably deviates from the standard Gaussian distribution. To this end, we introduce synthetic abnormal samples during training to encourage the diffusion model to break through the limitation of the standard Gaussian distribution, and we apply a spatially adaptive feature fusion scheme during inference. Combining these modifications, we propose a global and local adaptive diffusion model (GLAD) for unsupervised anomaly detection, which introduces appealing flexibility and achieves anomaly-free reconstruction while retaining as much normal information as possible. Extensive experiments on three commonly used anomaly detection datasets (MVTec-AD, MPDD, and VisA) and a printed circuit board dataset we integrated (PCB-Bank) demonstrate the effectiveness of the proposed method.
The source code and pre-trained models are publicly available at https://github.com/hyao1/GLAD.
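The abstract describes reconstruction-based detection: a noised test image is denoised back toward a normal counterpart, and spatially adaptive fusion is meant to keep normal content wherever possible. The sketch below illustrates only that blending idea, in pixel space, assuming a reconstruction is already available. The names `anomaly_map` and `spatially_adaptive_blend` are hypothetical, the min-max weighting is an assumption for illustration, and GLAD's actual scheme fuses features inside the diffusion model rather than pixels.

```python
import numpy as np


def anomaly_map(x0: np.ndarray, x_rec: np.ndarray) -> np.ndarray:
    """Per-pixel discrepancy between an image and its reconstruction.

    Both inputs are assumed to be H x W x C float arrays on the same scale;
    the absolute residual is averaged over the channel axis.
    """
    return np.abs(x0 - x_rec).mean(axis=-1)


def spatially_adaptive_blend(x0: np.ndarray, x_rec: np.ndarray, eps: float = 1e-8):
    """Hypothetical pixel-space stand-in for spatially adaptive fusion.

    High-residual pixels take more of the (assumed anomaly-free) reconstruction;
    low-residual pixels keep their original values, so normal detail survives.
    Returns the blended image and the raw anomaly map.
    """
    a = anomaly_map(x0, x_rec)
    # Min-max normalise the residual to [0, 1] so it can serve as a blending weight.
    w = (a - a.min()) / (a.max() - a.min() + eps)
    w = w[..., None]  # add a channel axis so the weight broadcasts over C
    return w * x_rec + (1.0 - w) * x0, a


# Usage: a synthetic 4x4 bright defect on an otherwise normal image.
rng = np.random.default_rng(0)
clean = rng.uniform(0.4, 0.6, size=(32, 32, 3))
defective = clean.copy()
defective[8:12, 8:12] = 1.0
# Pretend the diffusion model returned a perfect normal counterpart.
fused, a = spatially_adaptive_blend(defective, clean)
print("peak residual at", np.unravel_index(a.argmax(), a.shape))
```

Normal pixels have zero residual here, so they pass through unchanged, while the defect region is pulled toward the reconstruction in proportion to its residual.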



Notes

1.
In the setting of diffusion models, the degree of randomness corresponds to the weight of the random noise, which is determined by the denoising step. In other words, a larger denoising step means a higher noise weight and stronger randomness.
2.
PCB-Bank is a printed circuit board dataset that we integrated from existing datasets; please refer to https://github.com/SSRheart/industrial-anomaly-detection-dataset for more details.
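Note 1 ties randomness to the denoising step. Under the standard DDPM forward process (Ho et al., 2020, listed in the references), a noised sample is x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε with ε ~ N(0, I), so the noise weight sqrt(1 − ᾱ_t) grows with t. The sketch below checks this numerically, assuming the linear β schedule from the DDPM paper (1e-4 to 0.02 over 1,000 steps). GLAD's own schedule and sampler may differ, and the helper names are hypothetical.

```python
import numpy as np


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear variance schedule from the original DDPM paper (assumed here)."""
    return np.linspace(beta_start, beta_end, T)


def step_weights(betas: np.ndarray):
    """Signal and noise weights of q(x_t | x_0) for every step.

    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the
    cumulative product of (1 - beta) up to step t and eps ~ N(0, I).
    """
    alpha_bar = np.cumprod(1.0 - betas)
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)


def noise_to_step(x0: np.ndarray, t: int, betas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw x_t from q(x_t | x_0) for a 0-indexed step t."""
    signal_w, noise_w = step_weights(betas)
    eps = rng.standard_normal(x0.shape)
    return signal_w[t] * x0 + noise_w[t] * eps


betas = linear_beta_schedule()
signal_w, noise_w = step_weights(betas)
for t in (0, 249, 499, 999):
    print(f"step {t + 1:4d}: noise weight {noise_w[t]:.3f}")

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))
x_early = noise_to_step(x0, 0, betas, rng)   # barely perturbed
x_late = noise_to_step(x0, 999, betas, rng)  # almost pure noise
```

Because every β is positive, ᾱ_t strictly decreases, so a larger step always injects more noise; this is the knob a per-sample step choice would be turning.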

References

Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: MVTec AD–a comprehensive real-world dataset for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9592–9600 (2019)
Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: Uninformed students: student-teacher anomaly detection with discriminative latent embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4183–4192 (2020)
Bergmann, P., Löwe, S., Fauser, M., Sattlegger, D., Steger, C.: Improving unsupervised defect segmentation by applying structural similarity to autoencoders. arXiv preprint arXiv:1807.02011 (2018)
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660 (2021)
Chen, X., Han, Y., Zhang, J.: A zero-/few-shot anomaly classification and segmentation method for CVPR 2023 VAND workshop challenge tracks 1 & 2: 1st place on zero-shot AD and 4th place on few-shot AD. arXiv preprint arXiv:2305.17382 (2023)
Defard, T., Setkov, A., Loesch, A., Audigier, R.: PaDiM: a patch distribution modeling framework for anomaly detection and localization. In: Del Bimbo, A., et al. (eds.) ICPR 2021. LNCS, vol. 12664, pp. 475–489. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68799-1_35
Deng, H., Li, X.: Anomaly detection via reverse distillation from one-class embedding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9737–9746 (2022)
Gal, R., et al.: An image is worth one word: personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022)
He, H., et al.: DiAD: a diffusion-based framework for multi-class anomaly detection. arXiv preprint arXiv:2312.06607 (2023)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851 (2020)
Jezek, S., Jonak, M., Burget, R., Dvorak, P., Skotak, M.: Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. In: 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 66–71. IEEE (2021)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
Liang, Y., Zhang, J., Zhao, S., Wu, R., Liu, Y., Pan, S.: Omni-frequency channel-selection representations for unsupervised anomaly detection. IEEE Trans. Image Process. (2023)
Liu, W., et al.: Towards visually explaining variational autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8642–8651 (2020)
Liu, X., et al.: SmartControl: enhancing ControlNet for handling rough visual conditions. arXiv preprint arXiv:2404.06451 (2024)
Liu, Z., Zhou, Y., Xu, Y., Wang, Z.: SimpleNet: a simple network for image anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20402–20411 (2023)
Lu, F., Yao, X., Fu, C.W., Jia, J.: Removing anomalies as noises for industrial defect localization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16166–16175 (2023)
Mousakhan, A., Brox, T., Tayyub, J.: Anomaly detection with conditioned denoising diffusion models. arXiv preprint arXiv:2305.15956 (2023)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)
Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328 (2022)
Rudolph, M., Wehrbein, T., Rosenhahn, B., Wandt, B.: Asymmetric student-teacher networks for industrial anomaly detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2592–2602 (2023)
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22500–22510 (2023)
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265. PMLR (2015)
Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: International Conference on Learning Representations (2020)
Wei, Y., Zhang, Y., Ji, Z., Bai, J., Zhang, L., Zuo, W.: ELITE: encoding visual concepts into textual embeddings for customized text-to-image generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15943–15953 (2023)
Yang, M., Wu, P., Feng, H.: MemSeg: a semi-supervised method for image surface defect detection using differences and commonalities. Eng. Appl. Artif. Intell. 119, 105835 (2023)
Yin, H., Jiao, G., Wu, Q., Karlsson, B.F., Huang, B., Lin, C.Y.: LafitE: latent diffusion model with feature editing for unsupervised multi-class anomaly detection. arXiv preprint arXiv:2307.08059 (2023)
Zavrtanik, V., Kristan, M., Skočaj, D.: DRAEM: a discriminatively trained reconstruction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8330–8339 (2021)
Zhang, X., Li, N., Li, J., Dai, T., Jiang, Y., Xia, S.T.: Unsupervised surface anomaly detection with diffusion probabilistic model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6782–6791 (2023)
Zhang, Y., Wei, Y., Jiang, D., Zhang, X., Zuo, W., Tian, Q.: ControlVideo: training-free controllable text-to-video generation. In: The Twelfth International Conference on Learning Representations (2023)
Zhang, Y., et al.: VideoElevator: elevating video generation quality with versatile text-to-image diffusion models. arXiv preprint arXiv:2403.05438 (2024)
Zhao, Y., Deng, B., Shen, C., Liu, Y., Lu, H., Hua, X.S.: Spatio-temporal autoencoder for video anomaly detection. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1933–1941 (2017)
Zou, Y., Jeong, J., Pemula, L., Zhang, D., Dabeer, O.: Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13690, pp. 392–408. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20056-4_23


Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant No. 2023YFA1008500.

Author information

Authors and Affiliations

Harbin Institute of Technology, Harbin, China
Hang Yao, Ming Liu, Zhicun Yin, Zifei Yan, Xiaopeng Hong & Wangmeng Zuo
Pazhou Lab Huangpu, Guangzhou, China
Ming Liu, Zifei Yan & Wangmeng Zuo


Corresponding author

Correspondence to Ming Liu.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7574 KB)


About this paper

Cite this paper

Yao, H., Liu, M., Yin, Z., Yan, Z., Hong, X., Zuo, W. (2025). GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15129. Springer, Cham. https://doi.org/10.1007/978-3-031-73209-6_1


DOI: https://doi.org/10.1007/978-3-031-73209-6_1
Published: 01 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73208-9
Online ISBN: 978-3-031-73209-6
eBook Packages: Computer Science, Computer Science (R0)
