
Partial Alignment of Time Series for Action and Activity Prediction

  • Conference paper
  • In: Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)

Abstract

The temporal alignment of two complete action/activity sequences has attracted considerable research interest. However, the problem of partially aligning an incomplete sequence to a complete one has not been sufficiently explored. Highly effective alignment algorithms such as Dynamic Time Warping (DTW) and Soft Dynamic Time Warping (S-DTW) cannot handle incomplete sequences. To overcome this limitation, the Open-End DTW (OE-DTW) and the Open-Begin-End DTW (OBE-DTW) algorithms were introduced. OE-DTW can align sequences that share a common start point but have unknown end points, while OBE-DTW can align unsegmented sequences. We focus on two new alignment algorithms, the Open-End Soft DTW (OE-S-DTW) and the Open-Begin-End Soft DTW (OBE-S-DTW), which combine the partial-alignment capabilities of OE-DTW and OBE-DTW with those of Soft DTW (S-DTW). Specifically, these algorithms pair the segregational capabilities of DTW with the soft-minimum operator of S-DTW, yielding improved, differentiable alignment of continuous, unsegmented actions/activities. The developed algorithms are well-suited tools for the problem of action prediction: by matching and aligning an ongoing, incomplete action/activity sequence to complete prototype sequences, we may gain insight into what comes next in the ongoing action/activity. The proposed algorithms are evaluated on the MHAD, MHAD101-v/-s, MSR Daily Activities and CAD-120 datasets and are shown to outperform relevant state-of-the-art approaches.
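The open-end idea described in the abstract can be sketched in a few lines. The following is an illustrative reimplementation, not the authors' code: it uses a squared Euclidean cost on 1-D series, and the function names and the `gamma` value are assumptions. The only change relative to plain Soft DTW is the read-out step: instead of taking the cost at the full-alignment cell R[n, m], we take a soft minimum over the last row of the accumulated-cost matrix, so the incomplete query may terminate at any prefix of the complete reference.

```python
import numpy as np

def softmin(values, gamma):
    """Differentiable soft minimum: -gamma * log(sum(exp(-v / gamma)))."""
    v = np.asarray(values, dtype=float)
    m = v.min()  # shift by the min for numerical stability
    return m - gamma * np.log(np.exp(-(v - m) / gamma).sum())

def oe_soft_dtw(query, reference, gamma=0.1):
    """Open-End Soft DTW sketch: align the full (incomplete) query
    to an arbitrary prefix of the complete reference sequence."""
    n, m = len(query), len(reference)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0  # both sequences are anchored at their start points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (query[i - 1] - reference[j - 1]) ** 2
            R[i, j] = cost + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    # Open end: soft minimum over the last row instead of reading R[n, m],
    # so the query may match any prefix of the reference.
    return softmin(R[n, 1:], gamma)
```

An open-begin-end variant would additionally set the whole first row of R to zero, so the match may also start anywhere in the reference; small `gamma` approaches hard DTW, larger `gamma` gives a smoother, more differentiable objective.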



Acknowledgements

This research was co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the Act “Enhancing Human Resources Research Potential by undertaking a Doctoral Research” Sub-action 2: IKY Scholarship Programme for PhD candidates in the Greek Universities. The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the HFRI PhD Fellowship grant (Fellowship Number: 1592) and by HFRI under the “1st Call for H.F.R.I Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment”, project I.C.Humans, number 91.

Author information


Correspondence to Victoria Manousaki.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Manousaki, V., Argyros, A. (2023). Partial Alignment of Time Series for Action and Activity Prediction. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_5

  • DOI: https://doi.org/10.1007/978-3-031-45725-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45724-1

  • Online ISBN: 978-3-031-45725-8

  • eBook Packages: Computer Science (R0)
