Attention-guided Boundary Refinement on Anchor-free Temporal Action Detection

Shi, Henglin; Chen, Haoyu; Zhao, Guoying

doi:10.1007/978-3-031-31435-3_9

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13885))

Included in the following conference series:

Scandinavian Conference on Image Analysis

520 Accesses

Abstract

Modelling temporal dependencies is important for accurate action detection. In this work, we develop a temporal attention unit to mine the global dependencies among features from different temporal locations. Additionally, based on the developed temporal attention unit, we propose an attention-guided boundary refinement module for revising action prediction results. Besides, we integrate the proposed module into a contemporary anchor-free detector for performing temporal action detection. To evaluate the proposed method, experiments are carried out on two large-scale temporal action detection datasets, namely THUMOS14 and ActivityNet1.3 datasets. Experimental results show that the action detection performance is significantly boosted by the proposed temporal attention module which outperforms several state-of-the-art methods.

This work was supported by the Academy of Finland for Academy Professor project EmotionAI (grants 336116, 345122), project MiGA (grant 316765), the University of Oulu & The Academy of Finland Profi 7 (grant 352788), and Ministry of Education and Culture of Finland for AI forum project. As well, the authors wish to acknowledge CSC - IT Center for Science, Finland, for computational resources.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 69.99; Price excludes VAT (USA)

Softcover Book: USD 89.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Bai, Y., Wang, Y., Tong, Y., Yang, Y., Liu, Q., Liu, J.: Boundary content graph neural network for temporal action proposal generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 121–137. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_8
Chapter Google Scholar
Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-NMS-improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5561–5569 (2017)
Google Scholar
Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: Activitynet: a large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970 (2015)
Google Scholar
Carreira, J., Zisserman, A.: Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)
Google Scholar
Chao, Y.W., Vijayanarasimhan, S., Seybold, B., Ross, D.A., Deng, J., Sukthankar, R.: Rethinking the faster R-CNN architecture for temporal action localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1130–1139 (2018)
Google Scholar
Gao, J., Yang, Z., Chen, K., Sun, C., Nevatia, R.: Turn tap: temporal unit regression network for temporal action proposals. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3628–3636 (2017)
Google Scholar
Jiang, Y.G., et al.: Thumos challenge: action recognition with a large number of classes (2014)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lin, C., et al.: Learning salient boundary feature for anchor-free temporal action localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3320–3329 (2021)
Google Scholar
Lin, T., Liu, X., Li, X., Ding, E., Wen, S.: BMN: boundary-matching network for temporal action proposal generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3889–3898 (2019)
Google Scholar
Lin, T., Zhao, X., Shou, Z.: Single shot temporal action detection. In: Proceedings of the 25th ACM international conference on Multimedia, pp. 988–996 (2017)
Google Scholar
Lin, T., Zhao, X., Su, H., Wang, C., Yang, M.: BSN: boundary sensitive network for temporal action proposal generation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 3–21. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_1
Chapter Google Scholar
Long, F., Yao, T., Qiu, Z., Tian, X., Luo, J., Mei, T.: Gaussian temporal awareness networks for action localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 344–353 (2019)
Google Scholar
Qiu, H., Ma, Y., Li, Z., Liu, S., Sun, J.: BorderDet: border feature for dense object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 549–564. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_32
Chapter Google Scholar
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199 (2014)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Google Scholar
Xu, H., Das, A., Saenko, K.: R-c3d: Region Convolutional 3D Network for Temporal Activity detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5783–5792 (2017)
Google Scholar
Yang, L., Peng, H., Zhang, D., Fu, J., Han, J.: Revisiting anchor mechanisms for temporal action localization. IEEE Trans. Image Process. 29, 8535–8548 (2020)
Article MATH Google Scholar
Zhao, Y., Xiong, Y., Wang, L., Wu, Z., Tang, X., Lin, D.: Temporal action detection with structured segment networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2914–2923 (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
Henglin Shi, Haoyu Chen & Guoying Zhao

Authors

Henglin Shi
View author publications
You can also search for this author in PubMed Google Scholar
Haoyu Chen
View author publications
You can also search for this author in PubMed Google Scholar
Guoying Zhao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guoying Zhao .

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Rikke Gade
Linköping University, Linköping, Sweden
Michael Felsberg
Tampere University, Tampere, Finland
Joni-Kristian Kämäräinen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shi, H., Chen, H., Zhao, G. (2023). Attention-guided Boundary Refinement on Anchor-free Temporal Action Detection. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13885. Springer, Cham. https://doi.org/10.1007/978-3-031-31435-3_9

Download citation

DOI: https://doi.org/10.1007/978-3-031-31435-3_9
Published: 27 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31434-6
Online ISBN: 978-3-031-31435-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)

Attention-guided Boundary Refinement on Anchor-free Temporal Action Detection