
Multimodal features fusion for gait, gender and shoes recognition

  • Special Issue Paper
  • Published in Machine Vision and Applications

Abstract

The goal of this paper is to evaluate how the fusion of multimodal features (i.e., audio, RGB and depth) can help in the challenging task of identifying people by the way they walk, known as gait recognition, and, by extension, in the tasks of gender and shoes recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly on binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio or depth patterns associated with the action of walking. Therefore, we propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio and depth recordings in addition to image sequences. The experimental results show that combining feature descriptors from the three modalities (i.e., RGB, depth and audio) with either early or late fusion techniques improves the state-of-the-art results on the standard experiments defined on the dataset for the tasks of gait, gender and shoes recognition. Additional experiments on CASIA-B (where only the visual modality is available) support the benefits of feature fusion as well.
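To make the two fusion strategies mentioned above concrete, the following is a minimal, self-contained Python sketch, not the authors' pipeline: random arrays stand in for pre-computed per-modality descriptors (visual, depth and audio), early fusion concatenates the normalized descriptors before training a single linear SVM, and late fusion trains one linear SVM per modality and sums their decision scores. The feature dimensions, the classifier choice and the unweighted score sum are assumptions for illustration only.

```python
# Minimal sketch (not the authors' implementation) of early vs. late fusion.
# Random arrays stand in for pre-computed per-modality descriptors; all
# dimensions, the linear-SVM classifier and the unweighted score sum are
# illustrative assumptions.
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

N = 100                                        # walking sequences
rgb   = normalize(rng.normal(size=(N, 256)))   # placeholder visual descriptors
depth = normalize(rng.normal(size=(N, 128)))   # placeholder depth descriptors
audio = normalize(rng.normal(size=(N, 64)))    # placeholder audio descriptors
labels = rng.integers(0, 5, size=N)            # subject identities

split = 70
train, test = slice(0, split), slice(split, N)

# Early fusion: concatenate the L2-normalized descriptors, train one classifier.
early = np.hstack([rgb, depth, audio])
early_clf = LinearSVC(max_iter=10000).fit(early[train], labels[train])
early_pred = early_clf.predict(early[test])

# Late fusion: one classifier per modality, then sum their decision scores.
modalities = (rgb, depth, audio)
late_clfs = [LinearSVC(max_iter=10000).fit(m[train], labels[train]) for m in modalities]
scores = sum(c.decision_function(m[test]) for c, m in zip(late_clfs, modalities))
late_pred = late_clfs[0].classes_[scores.argmax(axis=1)]

print("early-fusion accuracy:", (early_pred == labels[test]).mean())
print("late-fusion accuracy: ", (late_pred == labels[test]).mean())
```

Weighted score combinations and more elaborate schemes (e.g., multiple kernel learning for early fusion, or rank-minimization-based late fusion) are common alternatives to the plain concatenation and unweighted score sum used in this sketch.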


Acknowledgments

This work has been partially funded by project TIC-1692 (Junta de Andalucía) and research project TIN2012-32952 (Spanish Ministry of Science and Technology). We also thank the reviewers for their helpful comments.

Author information

Correspondence to Francisco M. Castro.

Cite this article

Castro, F.M., Marín-Jiménez, M. & Guil, N. Multimodal features fusion for gait, gender and shoes recognition. Machine Vision and Applications 27, 1213–1228 (2016). https://doi.org/10.1007/s00138-016-0767-5
