
Towards Efficient Human Action Retrieval Based on Triplet-Loss Metric Learning

  • Conference paper
  • Database and Expert Systems Applications (DEXA 2022)

Abstract

Recent pose-estimation methods enable the digitization of human motion by extracting 3D skeleton sequences from ordinary video recordings. Such a spatio-temporal skeleton representation offers attractive possibilities for a wide range of applications but, at the same time, requires effective and efficient content-based access to make the extracted data reusable. In this paper, we focus on content-based retrieval of pre-segmented skeleton sequences of human actions, with the goal of identifying those most similar to a query action. We mainly deal with the extraction of content-preserving action features, which are learned with a triplet-loss approach in an unsupervised way. Such features are (1) effective, as they achieve retrieval quality similar to that of features learned in a supervised way, and (2) of a fixed size, which enables the application of indexing structures for efficient retrieval.
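
The central ingredient of the abstract, learning fixed-size action embeddings with a triplet loss, can be illustrated with a short sketch. The code below is not the authors' implementation: the GRU encoder, the embedding and input sizes, and the augmentation-based choice of positives and negatives are hypothetical stand-ins, combined with PyTorch's standard TripletMarginLoss.

```python
# Minimal sketch of triplet-loss metric learning for fixed-size action
# features. Illustrative only: the encoder architecture, dimensions, and
# triplet sampling are assumptions, not the method from the paper.
import torch
import torch.nn as nn

class ActionEncoder(nn.Module):
    """Maps a skeleton sequence (frames x joint coordinates) to a
    fixed-size embedding, so standard index structures can be applied."""
    def __init__(self, in_dim=93, emb_dim=256):  # hypothetical sizes
        super().__init__()
        self.gru = nn.GRU(in_dim, emb_dim, batch_first=True)

    def forward(self, seq):              # seq: (batch, frames, in_dim)
        _, h = self.gru(seq)             # h: (1, batch, emb_dim)
        return nn.functional.normalize(h[-1], dim=1)  # unit-length embedding

encoder = ActionEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.2)  # pulls anchor toward positive,
                                            # pushes it from negative

# In an unsupervised setting, the positive can be an augmented view of the
# anchor sequence and the negative another sequence in the batch (assumption).
anchor   = encoder(torch.randn(8, 120, 93))
positive = encoder(torch.randn(8, 120, 93))  # e.g., augmented anchor
negative = encoder(torch.randn(8, 120, 93))  # e.g., a different action
loss = loss_fn(anchor, positive, negative)
loss.backward()
```

After training, each action is represented by a single fixed-length vector, so nearest-neighbor retrieval reduces to a distance search in the embedding space.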

Supported by ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822).



Author information

Correspondence to Iris Kico.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kico, I., Sedmidubsky, J., Zezula, P. (2022). Towards Efficient Human Action Retrieval Based on Triplet-Loss Metric Learning. In: Strauss, C., Cuzzocrea, A., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2022. Lecture Notes in Computer Science, vol 13426. Springer, Cham. https://doi.org/10.1007/978-3-031-12423-5_18


  • DOI: https://doi.org/10.1007/978-3-031-12423-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12422-8

  • Online ISBN: 978-3-031-12423-5

  • eBook Packages: Computer Science, Computer Science (R0)
