Abstract
The goal of this paper is to evaluate how the fusion of multimodal features (i.e., audio, RGB and depth) can help in the challenging task of identifying people by their gait (i.e., the way they walk), known as gait recognition, and, by extension, in the tasks of gender and shoes recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly on binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio or depth patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio and depth recordings in addition to image sequences. The experimental results show that using either early or late fusion techniques to combine feature descriptors from the three modalities (RGB, depth and audio) improves on the state-of-the-art results in the standard experiments defined on the dataset for the tasks of gait, gender and shoes recognition. Additional experiments on CASIA-B (where only the visual modality is available) also support the benefits of feature fusion.
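The distinction between early and late fusion mentioned above can be illustrated with a minimal sketch. It assumes the generic textbook forms only: early fusion as concatenation of L2-normalized per-modality descriptors, and late fusion as a weighted sum of per-modality classifier scores. The abstract does not specify the fusion schemes actually used, so the function names, the normalization step and the weighting are illustrative assumptions rather than the authors' method.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so that no modality dominates by magnitude."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def early_fusion(descriptors):
    """Early fusion (assumed form): normalize each modality and concatenate.

    descriptors: list of arrays, each of shape (n_samples, dim_m).
    Returns one array of shape (n_samples, sum of dim_m) for a single classifier.
    """
    return np.hstack([l2_normalize(d) for d in descriptors])


def late_fusion(scores, weights=None):
    """Late fusion (assumed form): weighted sum of per-modality class scores.

    scores: list of arrays, each of shape (n_samples, n_classes),
            one per modality-specific classifier.
    Returns the fused scores and the predicted class index per sample.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in scores])  # (M, N, C)
    if weights is None:
        # Hypothetical default: every modality contributes equally.
        weights = np.full(len(stacked), 1.0 / len(stacked))
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, stacked, axes=1)  # (N, C)
    return fused, fused.argmax(axis=1)


# Toy descriptors for one sample in three modalities (values are made up).
rgb = np.array([[3.0, 4.0]])
depth = np.array([[0.0, 2.0]])
audio = np.array([[1.0, 0.0]])
fused_descriptor = early_fusion([rgb, depth, audio])

# Toy two-class scores from three modality-specific classifiers (made up).
fused_scores, predicted = late_fusion(
    [np.array([[0.7, 0.3]]), np.array([[0.4, 0.6]]), np.array([[0.2, 0.8]])]
)
```

In this sketch early fusion yields a single descriptor on which one classifier would be trained, whereas late fusion keeps one classifier per modality and combines only their outputs; the weights could be tuned on validation data.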
Acknowledgments
This work has been partially funded by project TIC-1692 (Junta de Andalucía), and the Research Project TIN2012-32952 (Spanish Ministry of Science and Technology). We also thank the reviewers for their helpful comments.
Cite this article
Castro, F.M., Marín-Jiménez, M. & Guil, N. Multimodal features fusion for gait, gender and shoes recognition. Machine Vision and Applications 27, 1213–1228 (2016). https://doi.org/10.1007/s00138-016-0767-5
DOI: https://doi.org/10.1007/s00138-016-0767-5