Abstract
This paper proposes a novel method for real-time hand gesture recognition (HGR). To improve both the effectiveness and the accuracy of HGR, a spatial pyramid is applied to segment each gesture sequence into linguistic units, and a temporal pyramid is proposed to obtain a time-related histogram for each individual gesture. Together, these two pyramids extract more comprehensive information about human gestures from RGB and depth video. A two-layered HGR framework is further employed to reduce computational complexity. The proposed method achieves high accuracy with low computational cost on the ChaLearn Gesture Dataset, which comprises more than 50,000 recorded gesture sequences.
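To make the temporal-pyramid idea concrete, the sketch below shows one common way such a descriptor can be built: per-frame features are quantized against a codebook, and bag-of-words histograms are pooled over progressively finer temporal segments and concatenated. This is an illustrative assumption about the general technique, not the authors' actual implementation; the function name, codebook, and level scheme are hypothetical.

```python
import numpy as np

def temporal_pyramid_histogram(frame_features, codebook, levels=3):
    """Illustrative temporal pyramid (not the paper's exact method):
    quantize per-frame features to their nearest codebook word, then pool
    word histograms over 1, 2, 4, ... temporal segments and concatenate."""
    n_words = codebook.shape[0]
    # Assign each frame's feature vector to its nearest codebook word.
    dists = np.linalg.norm(frame_features[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)

    pieces = []
    n_frames = len(words)
    for level in range(levels):
        n_segments = 2 ** level
        # Split the sequence into equal-length temporal segments.
        bounds = np.linspace(0, n_frames, n_segments + 1).astype(int)
        for s in range(n_segments):
            seg = words[bounds[s]:bounds[s + 1]]
            hist = np.bincount(seg, minlength=n_words).astype(float)
            if hist.sum() > 0:
                hist /= hist.sum()  # L1-normalize each segment histogram
            pieces.append(hist)
    # Final descriptor length: n_words * (1 + 2 + ... + 2**(levels-1))
    return np.concatenate(pieces)
```

With 3 levels the descriptor concatenates 1 + 2 + 4 = 7 segment histograms, so coarse levels capture the overall word distribution while finer levels encode when in the gesture each word occurs.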










Acknowledgments
We would like to acknowledge the editors and reviewers, whose valuable comments greatly improved the manuscript. This work was supported in part by the Major State Basic Research Development Program of China (973 Program 2015CB351804) and the National Natural Science Foundation of China under Grant Nos. 61572155 and 61272386.
Cite this article
Jiang, F., Ren, J., Lee, C. et al. Spatial and temporal pyramid-based real-time gesture recognition. J Real-Time Image Proc 13, 599–611 (2017). https://doi.org/10.1007/s11554-016-0620-0