Few-shot fine-tuning with auxiliary tasks for video anomaly detection

  • Regular Paper
  • Multimedia Systems

Abstract

Anomaly detection in surveillance videos aims to identify video frames that exhibit unexpected behavior. Most existing methods follow an unsupervised setup, training on normal videos and testing on videos from the same scene. However, in real-world deployments, the performance of existing models degrades significantly when they face unseen scenes. To address this issue, we introduce the auxiliary tasks of segmentation and optical flow estimation into the fine-tuning process and propose a novel Segmentation and Optical Flow Fine-tuning (SOFF) framework. This framework enables existing models to adapt to new scenes using only a few samples for fine-tuning. To integrate these auxiliary tasks, we design a Segmentation and Flow Output Network (SFO-Net). By performing the auxiliary tasks during fine-tuning, SFO-Net extracts rich shape and motion information, which improves fine-tuning performance in unseen scenes. Additionally, SFO-Net can be flexibly cascaded with existing models that output images to form the SOFF framework. Experiments on multiple datasets demonstrate that, with few-shot fine-tuning, our framework improves the performance of existing models on unseen scenes and achieves competitive performance.
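The cascade described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss, the network layers, or how the segmentation and flow targets are obtained. The sketch assumes a weighted sum of mean squared errors as the fine-tuning objective, and every name in it (`cascade`, `fine_tune_loss`, `w_seg`, `w_flow`, the stand-in models) is hypothetical. It only shows the structural idea: an existing model outputs an image, an auxiliary head consumes that image and emits a segmentation map and a flow map, and all three outputs contribute to one objective.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized toy images."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def cascade(base_model: Callable[[Image], Image],
            aux_head: Callable[[Image], Tuple[Image, Image]]):
    """Chain an image-output model with a head that emits (segmentation, flow)."""
    def forward(frame: Image) -> Tuple[Image, Image, Image]:
        predicted = base_model(frame)
        seg, flow = aux_head(predicted)
        return predicted, seg, flow
    return forward


def fine_tune_loss(outputs, targets, w_seg: float = 1.0, w_flow: float = 1.0) -> float:
    """Assumed objective: image error plus weighted auxiliary-task errors."""
    predicted, seg, flow = outputs
    target_frame, target_seg, target_flow = targets
    return (mse(predicted, target_frame)
            + w_seg * mse(seg, target_seg)
            + w_flow * mse(flow, target_flow))


# Hypothetical stand-ins for a pretrained model and the auxiliary head.
def identity_model(frame: Image) -> Image:
    return [row[:] for row in frame]


def threshold_head(img: Image) -> Tuple[Image, Image]:
    seg = [[1.0 if p > 0.5 else 0.0 for p in row] for row in img]  # fake mask
    flow = [[0.0 for _ in row] for row in img]                      # fake flow magnitude
    return seg, flow


model = cascade(identity_model, threshold_head)
frame = [[0.2, 0.8], [0.6, 0.4]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
outputs = model(frame)

# Targets that match the toy outputs exactly give zero loss.
loss_match = fine_tune_loss(outputs, (frame, [[0.0, 1.0], [1.0, 0.0]], zeros))
# One wrong mask pixel out of four, with the segmentation term weighted by 2.
loss_mismatch = fine_tune_loss(outputs, (frame, [[1.0, 1.0], [1.0, 0.0]], zeros), w_seg=2.0)
```

In a real setting the stand-ins would be trainable networks and the loss would be minimized over the few available frames from the new scene; the weights on the auxiliary terms are a design choice assumed here, not taken from the paper.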

Data availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62471285 and 62401350, in part by the Shanghai Sailing Program under Grant 24YF2713000, and in part by the Foundation of Yunnan Key Laboratory of Service Computing (No. YNSC24109).

Author information

Contributions

J. L. wrote the main manuscript text and prepared all the tables as well as Figs. 3, 4, 5 and 6. G. L. prepared Figs. 1 and 2, and reviewed and revised the manuscript. Z. L. reviewed and revised the manuscript.

Corresponding authors

Correspondence to Zhi Liu or Gongyang Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Teng Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lv, J., Liu, Z. & Li, G. Few-shot fine-tuning with auxiliary tasks for video anomaly detection. Multimedia Systems 31, 127 (2025). https://doi.org/10.1007/s00530-025-01706-8

  • DOI: https://doi.org/10.1007/s00530-025-01706-8
