
Stacked Marginal Time Warping for Temporal Alignment

Neural Processing Letters

Abstract

Time warping is a popular technique for temporally aligning two sequences and has been successfully applied to temporal alignment tasks such as activity recognition. However, existing time warping methods have limited representation ability because the alignment is performed either on the raw sequences or on projected lower-dimensional features. In this paper, we propose a stacked time warping (STW) framework that learns layer-wise representations for temporal alignment in a stacked structure. This structure gives STW greater flexibility than existing methods while unifying them within a deep architecture. Based on the STW framework, we develop a stacked marginal time warping (SMTW) method that uses the marginalized stacked denoising autoencoder (mSDA) as a regularization term, which enables SMTW to marginalize out noise and learn layer-wise non-linear representations with an effective closed-form solution. Benefiting from mSDA, SMTW achieves better alignment performance while keeping time efficiency comparable to that of conventional time warping methods. Experiments on both synthetic data and real-world human activity recognition datasets demonstrate that SMTW quantitatively outperforms state-of-the-art time warping methods.
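The abstract describes SMTW only at a high level. As a rough illustration of its two ingredients, the following Python/NumPy sketch computes stacked mSDA features with the closed-form layer of Chen et al. and then aligns two sequences with classical dynamic time warping (DTW) in that feature space. It is a simplified two-stage pipeline built on our own assumptions (both sequences share the same feature dimension, one feature mapping is learned on the pooled frames of both sequences, frames are compared by squared Euclidean distance); it is not the authors' SMTW formulation, in which mSDA enters the alignment objective as a regularization term. The names `msda_layer`, `stacked_msda`, and `dtw` and the toy data are hypothetical.

```python
import numpy as np


def msda_layer(X, p=0.5, reg=1e-5):
    """One marginalized denoising layer in closed form (after Chen et al.).

    X: (d, n) array, one column per frame. p: feature-dropout probability.
    Returns the hidden representation tanh(W [X; 1]) and the mapping W.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])               # append a constant bias row
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])   # survival probabilities; bias never dropped
    S = Xb @ Xb.T                                      # scatter matrix, (d+1, d+1)
    Q = S * np.outer(q, q)                             # expected corrupted scatter (off-diagonal)
    np.fill_diagonal(Q, q * np.diag(S))                # diagonal entries survive with prob. q_i
    P = S[:d, :] * q[np.newaxis, :]                    # expected clean/corrupted cross-scatter, (d, d+1)
    # W = P (Q + reg*I)^{-1}, computed with a linear solve instead of an explicit inverse
    W = np.linalg.solve((Q + reg * np.eye(d + 1)).T, P.T).T
    return np.tanh(W @ Xb), W


def stacked_msda(X, layers=2, p=0.5):
    """Greedily stack mSDA layers; return the input concatenated with every hidden layer."""
    reps, H = [X], X
    for _ in range(layers):
        H, _ = msda_layer(H, p)
        reps.append(H)
    return np.vstack(reps)


def dtw(A, B):
    """Classical DTW between column-sequences A (k, na) and B (k, nb).

    Steps allowed: (1, 0), (0, 1), (1, 1). Returns total cost and the warping path.
    """
    na, nb = A.shape[1], B.shape[1]
    D = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)  # pairwise frame distances
    C = np.full((na + 1, nb + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    # Backtrack from the end cell to (1, 1) along the cheapest predecessors
    i, j = na, nb
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda cell: C[cell])
        path.append((i - 1, j - 1))
    return C[na, nb], path[::-1]


# Toy example (hypothetical data): a 3-D trajectory and a noisy, non-uniformly re-timed copy.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2 * np.pi, 60)
s = 2 * np.pi * np.linspace(0.0, 1.0, 80) ** 1.5
X = np.vstack([np.sin(t), np.cos(t), np.sin(2 * t)])
Y = np.vstack([np.sin(s), np.cos(s), np.sin(2 * s)]) + 0.05 * rng.standard_normal((3, 80))

# Assumption: one shared mapping learned on the frames of both sequences, then columns split back.
F = stacked_msda(np.hstack([X, Y]), layers=2, p=0.5)
cost, path = dtw(F[:, :X.shape[1]], F[:, X.shape[1]:])
print(f"DTW cost {cost:.3f}, path length {len(path)}")
```

Passing the raw frames `X` and `Y` to `dtw` instead of the columns of `F` gives plain DTW on raw sequences, the kind of baseline the abstract contrasts with learned layer-wise representations.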




Acknowledgements

This work was supported by the National Key Research and Development Program of China [2016YFB0200401] and the National Natural Science Foundation of China [U1435222].

Author information

Correspondence to Zhigang Luo.


About this article


Cite this article

Zhang, X., Nie, L., Lan, L. et al. Stacked Marginal Time Warping for Temporal Alignment. Neural Process Lett 49, 711–735 (2019). https://doi.org/10.1007/s11063-018-9834-4

