Abstract
This paper addresses the estimation of constantly changing human poses in videos. Traditional methods often fail to locate wrists accurately, which remains a highly challenging task. We propose a three-stage framework for human pose estimation that emphasizes improving wrist localization accuracy. The first stage applies the pictorial structure model to localize all joints in each frame and to calculate the posterior edge distribution probability of the wrists. In the second stage, a visual-tracking-based method is fused with the posterior edge distribution probability of the wrists to obtain the wrist location. Instead of directly predicting the wrist location, the third stage introduces a novel cascade error-correction mechanism (CECM) that corrects the predicted results. CECM also incorporates a skin-based proposal and multiple reinitialization modes. Experiments on two public datasets demonstrate the superiority of the proposed algorithm over state-of-the-art methods.
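The fusion and correction stages described above can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or the correction criteria, so everything here is an assumption made for illustration: the function names `fuse_wrist_maps` and `correct_with_skin`, the mixing weight `alpha`, the weighted-sum fusion, and the fallback to the best-scoring skin pixel are hypothetical and are not the authors' method.

```python
import numpy as np


def fuse_wrist_maps(posterior, tracker_response, alpha=0.5):
    """Blend two per-pixel wrist score maps and return the fused map and its peak.

    posterior        : 2-D array, wrist probability map from a pose model
    tracker_response : 2-D array, confidence map from a visual tracker
    alpha            : hypothetical mixing weight (an assumption, not from the paper)
    """
    # Normalize each map so that both sum to 1 before mixing.
    p = posterior / posterior.sum()
    t = tracker_response / tracker_response.sum()
    fused = alpha * p + (1.0 - alpha) * t
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (int(row), int(col))


def correct_with_skin(fused, location, skin_mask):
    """Illustrative correction step (an assumption, not the paper's CECM).

    If the predicted wrist does not lie on a skin pixel, fall back to the
    best-scoring skin pixel; if no skin pixel exists, keep the prediction.
    """
    if skin_mask[location] or not skin_mask.any():
        return location
    masked = np.where(skin_mask, fused, -np.inf)
    row, col = np.unravel_index(np.argmax(masked), masked.shape)
    return (int(row), int(col))


if __name__ == "__main__":
    # Toy 5x5 maps: the pose model favors (1, 1), the tracker favors (3, 3).
    posterior = np.full((5, 5), 0.01)
    posterior[1, 1] = 1.0
    tracker = np.full((5, 5), 0.01)
    tracker[3, 3] = 0.5
    skin = np.zeros((5, 5), dtype=bool)
    skin[3, 3] = True

    fused, loc = fuse_wrist_maps(posterior, tracker)
    print("fused peak:", loc)
    print("after skin check:", correct_with_skin(fused, loc, skin))
```

In this toy setup the skin check overrides a fused peak that falls on a non-skin pixel, which is one simple way a correction stage could veto an implausible wrist prediction; the paper's cascade and reinitialization modes are likely more elaborate.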
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61432014, 61501349, and U1605252, in part by the National Key Research and Development Program of China under Grant 2016QY01W0204, in part by the Key Industrial Innovation Chain in Industrial Domain under Grant 2016KTZDGY-02, in part by the Fundamental Research Funds for the Central Universities under Grants XJS17074 and JBX170218, in part by the National High-Level Talents Special Support Program of China under Grant CS31117200001, and in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2017JM6050.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dai, H., He, L., Gao, X., Guo, Z., Lu, W. (2017). Cascade Error-Correction Mechanism for Human Pose Estimation in Videos. In: Sun, Y., Lu, H., Zhang, L., Yang, J., Huang, H. (eds) Intelligence Science and Big Data Engineering. IScIDE 2017. Lecture Notes in Computer Science, vol 10559. Springer, Cham. https://doi.org/10.1007/978-3-319-67777-4_25
DOI: https://doi.org/10.1007/978-3-319-67777-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67776-7
Online ISBN: 978-3-319-67777-4
eBook Packages: Computer Science (R0)