research-article

Pixel-Level Anomaly Detection via Uncertainty-aware Prototypical Transformer

Authors:

Chengliang Liu,

Yong Xu

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 521 - 530

https://doi.org/10.1145/3503161.3548082

Published: 10 October 2022

Abstract

Pixel-level visual anomaly detection, which aims to localize abnormal regions in images, plays an important role in industrial fault detection and medical diagnosis. However, it is a challenging task for two reasons: (i) the large variation among anomalies; and (ii) the ambiguous boundary between anomalies and their normal surroundings. In this work, we present an uncertainty-aware prototypical transformer (UPformer), which takes into account both the diversity and the uncertainty of anomalies to achieve accurate pixel-level visual anomaly detection. To this end, we first design a memory-guided prototype learning transformer encoder that learns and memorizes prototypical representations of anomalies, enabling the model to capture their diversity. Additionally, an anomaly detection uncertainty quantizer is designed to learn the distributions of anomaly detection, from which the detection uncertainty is measured. Furthermore, an uncertainty-aware transformer decoder is proposed that uses the detection uncertainties to guide the model to focus on uncertain areas and generate the final detection results. As a result, our method achieves more accurate anomaly detection by combining the benefits of prototype learning and uncertainty estimation. Experimental results on five datasets indicate that our method achieves state-of-the-art anomaly detection performance.
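The two ideas named in the abstract, reading from a memory of prototypes and scoring per-pixel uncertainty, can be illustrated with a small sketch. This is not the UPformer implementation: the abstract gives no equations, so the dot-product attention read and the binary-entropy uncertainty below are assumed stand-ins (the paper's quantizer learns distributions rather than using a fixed entropy formula), and the names `memory_read`, `pixel_uncertainty`, and `uncertainty_map` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, prototypes):
    """Assumed attention-style read: weight each stored prototype by its
    dot-product similarity to the query and return the weighted average."""
    scores = [sum(q * p for q, p in zip(query, proto)) for proto in prototypes]
    weights = softmax(scores)
    read = [
        sum(w * proto[i] for w, proto in zip(weights, prototypes))
        for i in range(len(query))
    ]
    return read, weights


def pixel_uncertainty(prob):
    """Binary entropy (in bits) of an anomaly probability; highest at 0.5.
    A stand-in for a learned uncertainty estimate."""
    if prob <= 0.0 or prob >= 1.0:
        return 0.0
    return -(prob * math.log2(prob) + (1.0 - prob) * math.log2(1.0 - prob))


def uncertainty_map(prob_map):
    """Apply pixel_uncertainty to every entry of a 2D probability map."""
    return [[pixel_uncertainty(p) for p in row] for row in prob_map]


# Hypothetical toy inputs: two 2-D prototypes and a 2x2 probability map.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
read, weights = memory_read([2.0, 0.0], prototypes)
umap = uncertainty_map([[0.5, 0.99], [0.01, 0.5]])
```

In this toy setting the query is closest to the first prototype, so that prototype receives the larger weight, and the pixels with probability 0.5 receive the highest uncertainty. A decoder in the spirit of the abstract could then give such pixels more attention, though how UPformer does this is not specified here.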

Supplementary Material

MP4 File (MM22-fp1379.mp4)

Presentation video of MM22-fp1379

Index Terms

Pixel-Level Anomaly Detection via Uncertainty-aware Prototypical Transformer
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene anomaly detection
    2. Knowledge representation and reasoning
      1. Probabilistic reasoning

Information & Contributors

Information

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Establishment of Key Laboratory of Shenzhen Science and Technology Innovation Committee
Shenzhen Fundamental Research Fund

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Citations

Cited By

Cheng K, Pan Y, Liu Y, Zeng X, Feng R, Larson K (2024). Denoising diffusion-augmented hybrid video anomaly detection via reconstructing noised frames. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 695-703. DOI: 10.24963/ijcai.2024/77. Online publication date: 3-Aug-2024.
https://dl.acm.org/doi/10.24963/ijcai.2024/77
Wu P, Zhou X, Pang G, Yang Z, Yan Q, Wang P, Zhang Y, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. Proceedings of the 32nd ACM International Conference on Multimedia, 9301-9310. DOI: 10.1145/3664647.3681442. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681442
Lin J, Tao Z, Tong X, Mai X, Wang H, Wang B, Wang Y, Zhao Q, Yu J, Lin Y, Yan S, Gao S, Zhang W, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution. Proceedings of the 32nd ACM International Conference on Multimedia, 6374-6383. DOI: 10.1145/3664647.3681439. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681439
Liu Y, Liu J, Yang K, Ju B, Liu S, Wang Y, Yang D, Sun P, Song L (2024). AMP-Net: Appearance-Motion Prototype Network Assisted Automatic Video Anomaly Detection System. IEEE Transactions on Industrial Informatics 20:2, 2843-2855. DOI: 10.1109/TII.2023.3298476. Online publication date: Feb-2024.
https://doi.org/10.1109/TII.2023.3298476
Huang C, Liu C, Wen J, Wu L, Xu Y, Jiang Q, Wang Y (2024). Weakly Supervised Video Anomaly Detection via Self-Guided Temporal Discriminative Transformer. IEEE Transactions on Cybernetics 54:5, 3197-3210. DOI: 10.1109/TCYB.2022.3227044. Online publication date: May-2024.
https://doi.org/10.1109/TCYB.2022.3227044
Popescu R, Anantrasirichai N, Biggs J (2024). Anomaly Detection for the Identification of Volcanic Unrest in Satellite Imagery. 2024 IEEE International Conference on Image Processing (ICIP), 2327-2333. DOI: 10.1109/ICIP51287.2024.10647957. Online publication date: 27-Oct-2024.
https://doi.org/10.1109/ICIP51287.2024.10647957
Huang C, Shi Y, Zhang B, Lyu K (2024). Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Networks 175, 106284. DOI: 10.1016/j.neunet.2024.106284. Online publication date: Jul-2024.
https://doi.org/10.1016/j.neunet.2024.106284
Duan K, Cui S, Shinnou H, Bao S (2024). View-Channel Mixer Network for Double Incomplete Multi-View Multi-Label learning. Neurocomputing, 129013. DOI: 10.1016/j.neucom.2024.129013. Online publication date: Nov-2024.
https://doi.org/10.1016/j.neucom.2024.129013
Qu X, Zhou J, Jiang J, Wang W, Wang H, Wang S, Tang W, Lin X (2024). EH-former. Information Fusion 109:C. DOI: 10.1016/j.inffus.2024.102430. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1016/j.inffus.2024.102430
Yang C, Chen M, Wang Y, Wang Y, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings. Proceedings of the 31st ACM International Conference on Multimedia, 4031-4041. DOI: 10.1145/3581783.3612424. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612424
