Grassmann multimodal implicit feature selection

Multimedia Systems

Abstract

In pattern recognition, objects are usually represented by multiple features (multimodal features). For example, characterizing a natural scene image requires a set of visual features that capture its color, texture, and shape. However, integrating multimodal features for recognition is challenging because (1) each feature has its own statistical properties and physical interpretation; (2) a huge number of features may cause the curse of dimensionality (when the data dimension is high, the pairwise distances between objects in the feature space become increasingly similar, a consequence of the central limit theorem, which degrades recognition performance); and (3) some features may be unavailable. To solve these problems, a new multimodal feature selection algorithm, termed Grassmann manifold feature selection (GMFS), is proposed. In particular, by defining a clustering criterion, the multimodal features are transformed into a matrix, which is then treated as a point on the Grassmann manifold of Hamm and Lee (Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th international conference on machine learning (ICML), pp. 376–383, Helsinki, Finland [2008]). To deal with unavailable features, the L2-Hausdorff distance, a metric between matrices of different sizes, is computed, and a kernel is derived from it. Based on this kernel, we propose supervised and unsupervised feature selection algorithms that achieve a physically meaningful embedding of the multimodal features. Experimental results on eight data sets validate the effectiveness of the proposed approach.
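
To make the idea of a distance between different-sized matrices concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each object is stored as a list of equal-length column vectors, that the distance is the classical symmetric Hausdorff distance under the Euclidean (L2) norm, and that a Gaussian kernel is built from it. The function names, the Gaussian form, and the `sigma` parameter are illustrative assumptions; the abstract does not specify how the kernel is obtained.

```python
import math


def euclidean(u, v):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(A, B):
    """Largest distance from any column of A to its nearest column of B."""
    return max(min(euclidean(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two sets of column vectors.

    A and B may hold different numbers of columns, so an object with an
    unavailable feature can still be compared with a complete one.
    """
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def gaussian_kernel(A, B, sigma=1.0):
    """Hypothetical kernel from the distance (an assumption, not the paper's formula)."""
    d = hausdorff(A, B)
    return math.exp(-d * d / (2.0 * sigma ** 2))


# Each object is a list of columns; obj_b lacks one column (a missing feature).
obj_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
obj_b = [[0.0, 0.0], [1.0, 0.0]]

dist = hausdorff(obj_a, obj_b)
kern = gaussian_kernel(obj_a, obj_b)
print(dist, kern)
```

Because the distance only asks for the nearest column in the other matrix, the two inputs never need the same number of columns. A kernel built this way is not guaranteed to be positive semidefinite, so a practical pipeline may need to check or correct the resulting Gram matrix.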

References

  1. Woods, K., Philip Kegelmeyer, W., Jr., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE T-PAMI 19(4), 405–410 (1997)

  2. Kittler, J., Hatef, M., Duin, R. P. W., Matas, J.: On combining classifiers. IEEE T-PAMI 20(3), 226–239 (1998)

  3. Zhou, X., Bhanu, B.: Integrating face and gait for human recognition. In: Proceedings of the computer vision and pattern recognition (CVPR) workshop, pp. 255 (2006)

  4. Tong, H., He, J., Li, M., Zhang, C., Ma, W.-Y.: Graph based multi-modality learning. In: Proceedings of the ACM Multimedia, pp. 862–871 (2005)

  5. Nilsback, M. E., Caputo, B.: Cue integration through discriminative accumulation. In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition (CVPR 2004), pp. 578–585 (2004)

  6. Greene, D., Cunningham, P.: A matrix factorization approach for integrating multiple data views. In: Proceedings of ECCV, pp. 423–438 (2009)

  7. Bach, F. R., Lanckriet, G.R.G., Jordan, M.I.: Multiple Kernel learning, conic duality, and the SMO algorithm. In: Proceedings of ICML (2004)

  8. Gehler, P., Nowozin, S.: On feature combination for multiclass object classification. In: Proceedings of ICCV, pp. 221–228 (2009)

  9. Xia, T., Tao, D., Mei, T., Zhang, Y.: Multiview spectral embedding. IEEE TSMC-B, pp. 929–932 (2002)

  10. Xie, B., Mu, Y., Tao, D.: m-SNE: multiview stochastic neighbor embedding. In: Proc. ICONIP 17(10), 338–346 (2010)

  11. Zhou, X., Bhanu, B.: Feature fusion of side face and gait for video-based human identification. Pattern Recogn. 41(3), 778–795 (2008)

  12. Zhang, L., Song, M., Liu, Z., Liu, X., Bu, J., Chen, C.: Probabilistic graphlet cut: exploring spatial structure cue for weakly supervised image segmentation. In: Proceedings of 26th IEEE conference on computer vision and pattern recognition (2013)

  13. Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Bu, J., Chen, C.: Semi-supervised node splitting for random forest construction. In: Proceedings of 26th IEEE conference on computer vision and pattern recognition (2013)

  14. Zhang, L., Song, M., Li, N., Bu, J., Chen, C.: Feature selection for accelerating speech based emotion recognition. ACM Multimedia, pp. 753–756 (2009)

  15. Li, Y., Gong, S., Liddell, H.: Kernel discriminant analysis. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1998)

  16. Wu, Y., Chang, E. Y., Chang, K. C.-C., Smith, J. R.: Optimal multimodal fusion for multimedia data analysis. In: Proceedings of the 12th annual ACM international conference on multimedia, pp. 572–579, New York (2004)

  17. Ma, Z., Nie, F., Yang, Y., Uijlings, J.R.R., Sebe, N.: Web image annotation via subspace-sparsity collaborated feature selection. IEEE T-MM 14(4), 1021–1030 (2012)

  18. Ma, Z., Yang, Y., Nie, F., Uijlings, J., Sebe, N.: Exploiting the entire feature space with sparsity for automatic image annotation. In: Proceedings of ACM Multimedia, pp. 283–292 (2011)

  19. Li, Y., Geng, B., Tao, D., Zha, Z.-J., Yang, L., Xu, C.: Difficulty guided image retrieval using linear multiple feature embedding. IEEE T-MM 14(6), 1618–1630 (2012)

  20. Zhang, L., Zhang, L., Tao, D., Huang, X.: On combining multiple features for hyperspectral remote sensing image classification. IEEE T. Geosci. Remote Sens. 50(3), 879–893 (2012)

  21. Yang, Y., Shen, H.T., Ma, Z., Huang, Z., Zhou, X.: l2,1-norm regularized discriminative feature selection for unsupervised learning. In: Proceedings of IJCAI, pp. 1589–1594 (2011)

  22. Hamm, J., Lee, D. D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th international conference on machine learning (ICML), pp. 376–383, Helsinki, Finland, 5–9 June (2008)

  23. Wang, L., Wang, X., Feng, J.: Subspace distance analysis with application to adaptive Bayesian face recognition. Pattern Recogn. 39(3), 456–464 (2006)

  24. Zhang, L., Song, M., Zhao, Q., Liu, X., Bu, J., Chen, C.: Probabilistic graphlet transfer for photo cropping. IEEE T-IP 21(5), 2887–2897 (2013)

  25. Ross Quinlan, J.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., Burlington (1993)

  26. Yu, H., Li, M., Zhang, H.-J., Feng, J.: Color texture moments for content-based image retrieval. In: Proceedings of the ICIP, pp. 24–28 (2003)

  27. Scholkopf, B., Smola, A., Muller, K.-R.: Kernel principal component analysis. In: Advances in Kernel methods—support vector learning, pp. 327–352, MIT Press, Cambridge (1999)

  28. Gu, Q., Li, Z., Han, J.: Joint feature selection and subspace learning. In: Proceedings of IJCAI, pp. 1294–1299 (2011)

  29. Golub, G.H., Van Loan, C.F.: Matrix computations. Johns Hopkins University Press, Baltimore (1996)

  30. Cao, B., Shen, D., Sun, J.-T., Yang, Q., Chen, Z.: Feature selection in a kernel space. In: Proceedings of the international conference on machine learning (ICML), pp. 121–128, Oregon, USA, 20–24 June 2007 (2007)

  31. Gu, Q., Li, Z., Han, J.: Generalized Fisher score for feature selection. In: Proceedings of UAI. pp. 266–273 (2011)

  32. Leibe, B., Schiele, B.: Analyzing appearance and contour based methods for object categorization. In: Proceedings of the IEEE Computer Society on computer vision and pattern recognition, pp. 409–415 (2003)

  33. http://corel.digitalriver.com

  34. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput. Vis. Image Underst. (CVIU) 106(1), 59–70 (2007)

  35. Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Technical report 7694, California Institute of Technology, Pasadena, CA (2007)

  36. Ragheb, H., Velastin, S., Remagnino, P., Ellis, T.: ViHASi: virtual human action Silhouette data for the performance evaluation of Silhouette-based action recognition methods. Workshop on activity monitoring by multi-camera surveillance systems, pp. 1–10 (2008)

  37. Li, H., Wang, M., Hua, X.: MSRA-MM2.0: a large-scale web multimedia dataset. In: Proceedings of ICDMW, pp. 164–169 (2006)

  38. Chua, T., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of CIVR, pp. 164–169 (2009)

  39. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data mining and knowledge discovery, pp. 393–423 (2002)

  40. Johnson, A., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. IEEE T-PAMI 21(5), 443–449 (1999)

  41. Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: Proceedings of the 6th ACM international conference on image and video retrieval, pp. 401–408. ACM, New York (2007)

  42. Porway, J., Wang, K., Yao, B., Zhu, S.C.: Scale-invariant shape features for recognition of object categories. In: Proceedings of ICCV, pp. 90–96 (2004)

  43. Tuzel, O., Porikli, F., Meer, P.: Human detection via classification on Riemannian manifolds. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8 (2007)

  44. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE T-PAMI 24(7), 971–987 (2002)

  45. Pinto, N., Cox, D.D., DiCarlo, J.J.: Why is real-world visual object recognition hard? PLoS Comput. Biol. 4(1), e27 (2008)

  46. Zhang, L., Song, M., Li, N., Bu, J., Chen, C.: Feature selection for fast speech emotion recognition. In: Proceedings of the 17th international conference on multimedia, pp. 753–756 (2009)

  47. Cai, D., He, X., Zhou, K., Han, J., Bao, H.: Locality sensitive discriminant analysis. In: Proceedings of IJCAI, pp. 1713–1726 (2007)

  48. Nie, F., Xiang, S., Jia, Y., Zhang, C., Yan, S.: Trace ratio criterion for feature selection. AAAI, pp. 671–676 (2008)

  49. Sun, Z.: Adaptation for multiple cue integration. In: Proceedings of the IEEE Computer Society international conference on computer vision and pattern recognition (CVPR), pp. 440–445 (2003)

  50. Vishwanathan, S.V.N., Sun, Z., Theera-Ampornpunt, N.: Multiple Kernel learning and the SMO algorithm. In: Proceedings of NIPS, pp. 2361–2369 (2010)

  51. Cristianini, N., Scholkopf, B.: Support vector machines and kernel methods: the new generation of learning machines. AI Magazine 23(3), 31–41 (2002)

  52. Liu, X., Song, M., Zhao, Q., Tao, D., Bu, J., Chen, C.: Attribute-restricted latent topic model for person re-identification. Pattern Recogn. 45(12), 4204–4213 (2012)

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grants 61170142 and 60873124, by the National Key Technology R&D Program under Grant 2011BAG05B04, by the Program of International S&T Cooperation (2013DFG12841), and by the Fundamental Research Funds for the Central Universities (2013FZA5012).

Corresponding author

Correspondence to Mingli Song.

About this article

Cite this article

Zhang, L., Tao, D., Liu, X. et al. Grassmann multimodal implicit feature selection. Multimedia Systems 20, 659–674 (2014). https://doi.org/10.1007/s00530-013-0317-1

  • DOI: https://doi.org/10.1007/s00530-013-0317-1
