Abstract
Visual analysis of human behavior has attracted a great deal of attention in the field of computer vision because of its wide variety of potential applications. Human behavior can be segmented into atomic actions, each of which indicates a single, basic movement. To reduce human intervention in the analysis of human behavior, unsupervised learning may be more suitable than supervised learning. However, the complex nature of human behavior makes unsupervised learning a challenging task. In this paper, we propose a framework for the unsupervised analysis of human behavior based on manifold learning. First, a pairwise human posture distance matrix is derived from a training action sequence. Then, the isometric feature mapping (Isomap) algorithm is applied to construct a low-dimensional structure from the distance matrix. Consequently, the training action sequence is mapped into a manifold trajectory in the Isomap space. To identify the break points between the trajectories of any two successive atomic actions, we represent the manifold trajectory in the Isomap space as a time series of low-dimensional points. A temporal segmentation technique is then applied to segment the time series into subseries, each of which corresponds to an atomic action. Next, the dynamic time warping (DTW) approach is used to cluster the atomic action sequences. Finally, we use the clustering results to learn and classify atomic actions according to the nearest neighbor rule. If the distance between an input sequence and the nearest mean sequence is greater than a given threshold, the input sequence is regarded as an unknown atomic action. Experiments conducted on real data demonstrate the effectiveness of the proposed method.
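The final classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the textbook DTW recurrence with Euclidean point distances, and the names `dtw_distance`, `classify`, `mean_sequences`, and the threshold value are hypothetical. The short 2-D sequences stand in for trajectories in the Isomap space.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of low-dimensional points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def classify(sequence, mean_sequences, threshold):
    """Nearest neighbor rule with rejection.

    Returns (label, distance); label is None when the nearest mean
    sequence is farther away than the threshold (unknown atomic action).
    """
    best_label, best_dist = None, float("inf")
    for label, mean_seq in mean_sequences.items():
        d = dtw_distance(sequence, mean_seq)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > threshold:
        return None, best_dist
    return best_label, best_dist


# Hypothetical mean sequences, one per cluster of atomic actions.
mean_sequences = {
    "raise": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "lower": [(2.0, 2.0), (1.0, 1.0), (0.0, 0.0)],
}

# A time-stretched copy of "raise": DTW absorbs the repeated first point.
query = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(classify(query, mean_sequences, threshold=1.0))

# A sequence far from every mean sequence is rejected as unknown.
print(classify([(10.0, 10.0), (11.0, 11.0)], mean_sequences, threshold=1.0))
```

Because DTW allows one point to align with several points in the other sequence, two performances of the same atomic action at different speeds can still yield a small distance, which is why it suits variable-length action segments.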
Acknowledgment
The authors would like to thank the National Science Council, Taiwan, for its support under Contract NSC 99-2632-H-156-001-MY3.
Cite this article
Liang, YM., Shih, SW. & Shih, A.CC. Human action segmentation and classification based on the Isomap algorithm. Multimed Tools Appl 62, 561–580 (2013). https://doi.org/10.1007/s11042-011-0858-2
DOI: https://doi.org/10.1007/s11042-011-0858-2