Abstract
The temporal alignment of two complete action/activity sequences has been the focus of many research works. However, the problem of partially aligning an incomplete sequence to a complete one has not been sufficiently explored. Highly effective alignment algorithms such as Dynamic Time Warping (DTW) and Soft Dynamic Time Warping (S-DTW) cannot handle incomplete sequences. To overcome this limitation, the Open-End DTW (OE-DTW) and Open-Begin-End DTW (OBE-DTW) algorithms were introduced. OE-DTW can align sequences that share a common begin point but have unknown end points, while OBE-DTW can align unsegmented sequences. We focus on two new alignment algorithms, the Open-End Soft DTW (OE-S-DTW) and the Open-Begin-End Soft DTW (OBE-S-DTW), which combine the partial-alignment capabilities of OE-DTW and OBE-DTW with those of S-DTW. Specifically, these algorithms couple the segregational capabilities of DTW with the soft-minimum operator of S-DTW, resulting in improved, differentiable alignment of continuous, unsegmented actions/activities. The developed algorithms are well-suited tools for action prediction: by properly matching and aligning an ongoing, incomplete action/activity sequence to complete prototype sequences, we may gain insight into what comes next in the ongoing action/activity. The proposed algorithms are evaluated on the MHAD, MHAD101-v/-s, MSR Daily Activities and CAD-120 datasets and are shown to outperform relevant state-of-the-art approaches.
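The open-end and open-begin-end relaxations described in the abstract can be sketched with the standard S-DTW recursion, replacing the hard minimum with a soft-minimum and relaxing the boundary conditions. The following is a minimal NumPy illustration of the idea only, not the authors' implementation: the function names, the default gamma value, and the use of a precomputed pairwise cost matrix are assumptions made for this sketch.

```python
import numpy as np

def softmin(values, gamma):
    # Differentiable soft-minimum: -gamma * log(sum(exp(-v / gamma))),
    # computed stably by shifting by the hard minimum first.
    v = np.asarray(values, dtype=float)
    m = v.min()
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))

def oe_soft_dtw(D, gamma=0.1):
    """Open-End Soft-DTW over a pairwise cost matrix D (n x m).

    The (incomplete) query indexes the rows and must be consumed fully,
    starting at the begin point of the reference (columns); the alignment
    may end at any column, so the result is a soft-minimum over all
    candidate reference end points.
    """
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0  # alignment must start at the common begin point
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]], gamma)
    # Open end: soft-minimum over all possible reference end points.
    return softmin(R[n, 1:], gamma)

def obe_soft_dtw(D, gamma=0.1):
    """Open-Begin-End Soft-DTW: the begin point is also relaxed."""
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, :] = 0.0  # free begin point anywhere along the reference
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]], gamma)
    return softmin(R[n, 1:], gamma)
```

With a small gamma the soft-minimum approaches the hard minimum and the recursion reduces to OE-DTW/OBE-DTW; larger gamma values smooth the alignment cost, which is what makes it differentiable and usable as a loss.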
Acknowledgements
This research was co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the Act “Enhancing Human Resources Research Potential by undertaking a Doctoral Research” Sub-action 2: IKY Scholarship Programme for PhD candidates in the Greek Universities. The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the HFRI PhD Fellowship grant (Fellowship Number: 1592) and by HFRI under the “1st Call for H.F.R.I Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment”, project I.C.Humans, number 91.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Manousaki, V., Argyros, A. (2023). Partial Alignment of Time Series for Action and Activity Prediction. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_5
DOI: https://doi.org/10.1007/978-3-031-45725-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45724-1
Online ISBN: 978-3-031-45725-8
eBook Packages: Computer Science (R0)