Abstract
Anomaly detection in surveillance videos aims to identify video frames that exhibit unexpected behavior. Most existing methods follow an unsupervised setup, training on normal videos and testing on videos from the same scene. In real-world deployments, however, the performance of existing models degrades significantly on unseen scenes. To address this issue, we introduce the auxiliary tasks of segmentation and optical flow estimation into the fine-tuning process and propose a novel Segmentation and Optical Flow Fine-tuning (SOFF) framework. This framework enables existing models to adapt to new scenes using only a few samples for fine-tuning. To integrate the auxiliary tasks, we design a Segmentation and Flow Output Network (SFO-Net). By performing the auxiliary tasks during fine-tuning, SFO-Net extracts rich shape and motion information and thereby improves fine-tuning performance in unseen scenes. SFO-Net can also be flexibly cascaded with existing models that output images to form the SOFF framework. Experiments on multiple datasets demonstrate that, with few-shot fine-tuning, our framework improves the performance of existing models on unseen scenes and achieves competitive performance.
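As an illustration only, the fine-tuning objective implied by the abstract can be sketched as a weighted sum of a frame loss and two auxiliary losses. The abstract does not specify the loss functions, their weights, or how the reference masks and flow are obtained, so the names `LossWeights`, `mse`, `binary_cross_entropy`, and `fine_tune_loss`, the choice of mean squared error and binary cross-entropy, and the default weights are all assumptions made for this sketch, not the authors' formulation.

```python
from dataclasses import dataclass
from math import log
from typing import Sequence


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights: the abstract does not say how the terms are balanced.
    frame: float = 1.0
    seg: float = 1.0
    flow: float = 1.0


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equal-length, flattened sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(
    prob: Sequence[float], mask: Sequence[int], eps: float = 1e-7
) -> float:
    """Mean per-pixel binary cross-entropy against a 0/1 reference mask."""
    if len(prob) != len(mask) or not prob:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, m in zip(prob, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= m * log(p) + (1 - m) * log(1.0 - p)
    return total / len(prob)


def fine_tune_loss(
    frame_pred: Sequence[float],
    frame_gt: Sequence[float],
    seg_prob: Sequence[float],
    seg_mask: Sequence[int],
    flow_pred: Sequence[float],
    flow_ref: Sequence[float],
    w: LossWeights = LossWeights(),
) -> float:
    """Assumed objective: frame loss plus weighted segmentation and flow losses."""
    return (
        w.frame * mse(frame_pred, frame_gt)
        + w.seg * binary_cross_entropy(seg_prob, seg_mask)
        + w.flow * mse(flow_pred, flow_ref)
    )
```

In this sketch, setting `seg` and `flow` to zero reduces the objective to the frame loss alone, which corresponds to fine-tuning the existing model without the auxiliary tasks.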
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 62471285 and 62401350, in part by the Shanghai Sailing Program under Grant 24YF2713000, and in part by the Foundation of Yunnan Key Laboratory of Service Computing (No. YNSC24109).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Teng Li.
About this article
Cite this article
Lv, J., Liu, Z. & Li, G. Few-shot fine-tuning with auxiliary tasks for video anomaly detection. Multimedia Systems 31, 127 (2025). https://doi.org/10.1007/s00530-025-01706-8
DOI: https://doi.org/10.1007/s00530-025-01706-8