Human action segmentation and classification based on the Isomap algorithm

Multimedia Tools and Applications

Abstract

Visual analysis of human behavior has attracted a great deal of attention in the field of computer vision because of its wide variety of potential applications. Human behavior can be segmented into atomic actions, each of which indicates a single, basic movement. To reduce human intervention in the analysis of human behavior, unsupervised learning may be more suitable than supervised learning. However, the complex nature of human behavior analysis makes unsupervised learning a challenging task. In this paper, we propose a framework for the unsupervised analysis of human behavior based on manifold learning. First, a pairwise human posture distance matrix is derived from a training action sequence. Then, the isometric feature mapping (Isomap) algorithm is applied to construct a low-dimensional structure from the distance matrix. As a result, the training action sequence is mapped into a manifold trajectory in the Isomap space. To identify the break points between the trajectories of any two successive atomic actions, we represent the manifold trajectory in the Isomap space as a time series of low-dimensional points. A temporal segmentation technique is then applied to segment the time series into subseries, each of which corresponds to an atomic action. Next, the dynamic time warping (DTW) approach is used to cluster the atomic action sequences. Finally, we use the clustering results to learn and classify atomic actions according to the nearest neighbor rule. If the distance between an input sequence and the nearest mean sequence is greater than a given threshold, the input sequence is regarded as an unknown atomic action. Experiments conducted on real data demonstrate the effectiveness of the proposed method.
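The last two steps of the abstract (DTW comparison and nearest neighbor classification with rejection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_distance` and `classify`, the Euclidean local cost, the textbook DTW recurrence without any warping-window constraint, and the example labels and trajectories are all assumptions made for illustration. The abstract does not say how the mean sequences are computed, so the sketch simply takes them as given.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of low-dimensional points.

    Assumption: Euclidean distance is the local cost; the paper may use
    a different cost or path constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(sequence, mean_sequences, threshold):
    """Return the label of the nearest mean sequence under DTW, or
    "unknown" when that distance is greater than the threshold."""
    label, best = min(
        ((name, dtw_distance(sequence, ref))
         for name, ref in mean_sequences.items()),
        key=lambda pair: pair[1],
    )
    return label if best <= threshold else "unknown"


# Hypothetical mean sequences: 2-D trajectories in an embedding space.
mean_sequences = {
    "raise_arm": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "squat": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
}

# A slower version of "raise_arm" (one posture repeated) still matches,
# because DTW absorbs differences in timing.
slow = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(classify(slow, mean_sequences, threshold=1.0))

# A trajectory far from every mean sequence is rejected.
print(classify([(5.0, 5.0), (6.0, 6.0)], mean_sequences, threshold=1.0))
```

The threshold trades off false rejections against false acceptances; a value such as 1.0 here is arbitrary and would in practice depend on the scale of the embedding.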

Acknowledgment

The authors would like to thank the National Science Council, Taiwan, for supporting this work under Contract NSC 99-2632-H-156-001-MY3.

Corresponding author

Correspondence to Yu-Ming Liang.

Cite this article

Liang, YM., Shih, SW. & Shih, A.CC. Human action segmentation and classification based on the Isomap algorithm. Multimed Tools Appl 62, 561–580 (2013). https://doi.org/10.1007/s11042-011-0858-2

  • DOI: https://doi.org/10.1007/s11042-011-0858-2
