Abstract
In this paper, we address the problem of automatic pedestrian parsing in surveillance video when only a small number of training samples is available. Although human parsing has achieved great success with high-capacity models, parsing pedestrians under practical surveillance conditions remains challenging, because fitting the complicated environmental interference requires more pixel-level training samples. Creating large datasets with pixel-level labels, however, is extremely costly because of the vast amount of human effort required. Our method captures pedestrian information from unlabeled datasets and uses it to update the trained model through reinforcement learning, achieving strong performance with far fewer pixel-level labeled samples. Both quantitative and qualitative experiments on practical surveillance datasets demonstrate the effectiveness of the proposed method.
Acknowledgments
The research was supported by the National Natural Science Foundation of China under Grants U1611461, 61231015, 61303114, 61671332 and 61671336, by the EU FP7 QUICK project under Grant Agreement No. PIRSES-GA-2013-612652, by the National High Technology Research and Development Program of China under Grant 2015AA016306, by the Technology Research Program of the Ministry of Public Security under Grant 2016JSYJA12, by the Hubei Province Technological Innovation Major Project under Grants 2016AAA015 and 2017AAA123, and by the Natural Science Foundation of Jiangsu Province under Grant BK20160386.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Zheng, Q., Chen, J., Jiang, J., Hu, R. (2018). Reinforcing Pedestrian Parsing on Small Scale Dataset. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_34
DOI: https://doi.org/10.1007/978-3-319-73603-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)