Few-shot fine-tuning with auxiliary tasks for video anomaly detection

Lv, Jing; Liu, Zhi; Li, Gongyang

doi:10.1007/s00530-025-01706-8

Regular Paper
Published: 24 February 2025

Multimedia Systems, Volume 31, article number 127 (2025)

Jing Lv^1, Zhi Liu^1,2 & Gongyang Li^1,2,3

Abstract

Anomaly detection in surveillance videos aims to identify video frames that exhibit unexpected behavior. Most existing methods follow an unsupervised setup, training on normal videos and testing on videos from the same scene. In real-world deployments, however, the performance of existing models degrades significantly on unseen scenes. To address this issue, we introduce the auxiliary tasks of segmentation and optical flow estimation into the fine-tuning process and propose a novel Segmentation and Optical Flow Fine-tuning (SOFF) framework, which enables existing models to adapt to new scenes using only a few samples for fine-tuning. To integrate these auxiliary tasks, we design a Segmentation and Flow Output Network (SFO-Net). By performing the auxiliary tasks during fine-tuning, SFO-Net extracts rich shape and motion information, which improves fine-tuning performance in unseen scenes. SFO-Net can also be flexibly cascaded with existing models that output images to form the SOFF framework. Experiments on multiple datasets demonstrate that our framework improves the performance of existing models on unseen scenes through few-shot fine-tuning and achieves competitive performance.
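
As a rough illustration of the fine-tuning idea described in the abstract, the sketch below combines a main frame-level error with two auxiliary errors (segmentation and optical flow) into a single objective. The abstract does not state the actual loss, so the weighted sum, the use of mean squared error for every term, and all names (`mse`, `auxiliary_finetune_loss`, `w_seg`, `w_flow`) are assumptions made for illustration only, not the authors' formulation.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length flat sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def auxiliary_finetune_loss(
    frame_pred: Sequence[float],
    frame_gt: Sequence[float],
    seg_pred: Sequence[float],
    seg_gt: Sequence[float],
    flow_pred: Sequence[float],
    flow_gt: Sequence[float],
    w_seg: float = 1.0,   # hypothetical weight for the segmentation task
    w_flow: float = 1.0,  # hypothetical weight for the optical flow task
) -> float:
    """Hypothetical objective: main frame error plus weighted auxiliary errors.

    Every term is a plain MSE here; a real system could use different
    losses per task. This is an assumed form, not taken from the paper.
    """
    main_term = mse(frame_pred, frame_gt)
    seg_term = mse(seg_pred, seg_gt)
    flow_term = mse(flow_pred, flow_gt)
    return main_term + w_seg * seg_term + w_flow * flow_term


if __name__ == "__main__":
    # Toy flattened outputs standing in for images, masks, and flow fields.
    loss = auxiliary_finetune_loss(
        frame_pred=[0.0, 0.0], frame_gt=[0.0, 0.0],
        seg_pred=[1.0, 1.0], seg_gt=[0.0, 0.0],
        flow_pred=[0.0, 0.0], flow_gt=[2.0, 2.0],
        w_seg=0.5, w_flow=0.25,
    )
    print(loss)
```

In this toy setting, raising `w_seg` or `w_flow` makes the auxiliary shape and motion signals count for more relative to the main frame error. How the paper actually balances these terms is not stated in the abstract.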

Data availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62471285 and 62401350, in part by the Shanghai Sailing Program under Grant 24YF2713000, and in part by the Foundation of Yunnan Key Laboratory of Service Computing (No. YNSC24109).

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, China
Jing Lv, Zhi Liu & Gongyang Li
Wenzhou Institute of Shanghai University, Wenzhou, China
Zhi Liu & Gongyang Li
Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, China
Gongyang Li

Contributions

J. L. wrote the main manuscript text and prepared all the tables as well as Figs. 3, 4, 5 and 6. G. L. prepared Figs. 1 and 2, and reviewed and revised the manuscript. Z. L. reviewed and revised the manuscript.

Corresponding authors

Correspondence to Zhi Liu or Gongyang Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Teng Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lv, J., Liu, Z. & Li, G. Few-shot fine-tuning with auxiliary tasks for video anomaly detection. Multimedia Systems 31, 127 (2025). https://doi.org/10.1007/s00530-025-01706-8

Received: 16 August 2024
Accepted: 31 January 2025
Published: 24 February 2025
DOI: https://doi.org/10.1007/s00530-025-01706-8
