research-article

Hierarchical Graph Embedded Pose Regularity Learning via Spatio-Temporal Transformer for Abnormal Behavior Detection

Authors: Chengliang Liu, Yaowei Wang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 307 - 315

https://doi.org/10.1145/3503161.3548369

Published: 10 October 2022

Abstract

Abnormal behavior detection in surveillance video is a fundamental task in modern public security. Unlike typical pixel-based solutions, pose-based approaches leverage low-dimensional, strongly structured skeleton features, which makes the anomaly detector immune to complex background noise and more efficient. However, existing pose-based methods use the pose of each individual independently and ignore the important interactions between individuals. In this paper, we present a hierarchical graph embedded pose regularity learning framework based on a spatio-temporal transformer, which leverages the strength of graph representations in encoding strongly structured skeleton features. Specifically, skeleton features are encoded as a hierarchical graph representation that jointly models the interactions among multiple individuals and the correlations among body joints within the same individual. Furthermore, a novel task-specific spatio-temporal graph transformer is designed to encode the hierarchical spatio-temporal graph embeddings of human skeletons and learn the regular patterns within normal training videos. Experimental results indicate that our method outperforms state-of-the-art methods on several challenging datasets.
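
To make the idea of a hierarchical graph over skeletons concrete, the following is a minimal sketch of one possible adjacency construction for a single frame. It is an illustration only and not the authors' implementation: the function name `hierarchical_adjacency`, the `bones` edge list, and the choice to link individuals through a single `root_joint` are all assumptions introduced here, since the abstract does not specify how the two levels are wired.

```python
from typing import List, Tuple


def hierarchical_adjacency(
    num_people: int,
    num_joints: int,
    bones: List[Tuple[int, int]],
    root_joint: int = 0,
) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency over num_people * num_joints nodes.

    Node index = person * num_joints + joint.
    All wiring choices below are hypothetical illustrations.
    """
    n = num_people * num_joints
    adj = [[0] * n for _ in range(n)]

    # Self-loops, a common convention for graph layers.
    for i in range(n):
        adj[i][i] = 1

    # Lower level: correlations among body joints within one individual.
    for p in range(num_people):
        base = p * num_joints
        for a, b in bones:
            adj[base + a][base + b] = 1
            adj[base + b][base + a] = 1

    # Upper level: interactions among individuals.
    # Assumption: every pair of people is linked through their root joints.
    for p in range(num_people):
        for q in range(p + 1, num_people):
            u = p * num_joints + root_joint
            v = q * num_joints + root_joint
            adj[u][v] = 1
            adj[v][u] = 1

    return adj


# Toy example: two people, three joints each, joints chained 0-1-2.
toy_bones = [(0, 1), (1, 2)]
adj = hierarchical_adjacency(num_people=2, num_joints=3, bones=toy_bones)
```

In this toy layout, nodes 0 to 2 belong to the first person and nodes 3 to 5 to the second; the only cross-person edge is between nodes 0 and 3. A matrix of this kind could serve as a structural prior or attention mask for a graph-based model, but how the paper actually uses the hierarchy is not stated in the abstract.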

Supplementary Material

MP4 File (MM22-2845.mp4)

Presentation video




Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Scene anomaly detection


Published In


MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Establishment of Key Laboratory of Shenzhen Science and Technology Innovation Committee
Shenzhen Fundamental Research Fund

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
