Abstract
In the field of pattern recognition, objects are usually represented by multiple features (multimodal features). For example, to characterize a natural scene image, it is essential to extract a set of visual features representing its color, texture, and shape information. However, integrating multimodal features for recognition is challenging because: (1) each feature has its own statistical properties and physical interpretation; (2) a huge number of features may result in the curse of dimensionality (when the data dimension is high, the distances between pairs of objects in the feature space become increasingly similar, a consequence of the central limit theorem, and this degrades recognition performance); and (3) some features may be unavailable. To solve these problems, a new multimodal feature selection algorithm, termed Grassmann manifold feature selection (GMFS), is proposed. In particular, by defining a clustering criterion, the multimodal features are transformed into a matrix, which is then treated as a point on the Grassmann manifold as described by Hamm and Lee (Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th international conference on machine learning (ICML), pp. 376–383, Helsinki, Finland [2008]). To deal with unavailable features, the L2-Hausdorff distance, a metric between matrices of different sizes, is computed and a kernel is derived from it. Based on this kernel, we propose supervised and unsupervised feature selection algorithms that achieve a physically meaningful embedding of the multimodal features. Experimental results on eight data sets validate the effectiveness of the proposed approach.
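To make the idea of a metric between different-sized matrices concrete, the following is a minimal sketch and not the paper's implementation. It assumes each object is a d x k matrix whose columns are feature vectors, with k varying when a modality is unavailable, and it uses the standard symmetric Hausdorff distance under the Euclidean norm. The function names `hausdorff_l2` and `gaussian_kernel`, the bandwidth `sigma`, and the choice of a Gaussian kernel are all illustrative assumptions; the exact L2-Hausdorff definition and kernel used in GMFS may differ, and a kernel built this way is not guaranteed to be positive semidefinite.

```python
import numpy as np


def hausdorff_l2(a, b):
    """Symmetric Hausdorff distance between the column sets of a (d x m) and b (d x n).

    Illustrative assumption: columns are compared with the Euclidean norm.
    """
    # pairwise[i, j] = Euclidean distance between column i of a and column j of b
    diff = a[:, :, None] - b[:, None, :]      # shape (d, m, n)
    pairwise = np.linalg.norm(diff, axis=0)   # shape (m, n)
    a_to_b = pairwise.min(axis=1).max()       # farthest column of a from its nearest column in b
    b_to_a = pairwise.min(axis=0).max()       # farthest column of b from its nearest column in a
    return max(a_to_b, b_to_a)


def gaussian_kernel(dist, sigma=1.0):
    """Turn a distance into a similarity (hypothetical choice, not taken from the paper)."""
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))


# Object with two available feature columns versus one with a single column.
full = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])
partial = np.array([[1.0],
                    [0.0],
                    [0.0]])

d = hausdorff_l2(full, partial)
k = gaussian_kernel(d)
```

Because the distance only compares nearest columns, the two matrices need the same number of rows but not the same number of columns, which is what allows objects with missing features to be compared.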










References
Woods, K., Kegelmeyer, W.P., Jr., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE T-PAMI 19(4), 405–410 (1997)
Kittler, J., Hatef, M., Duin, R. P. W., Matas, J.: On combining classifiers. IEEE T-PAMI 20(3), 226–239 (1998)
Zhou, X., Bhanu, B.: Integrating face and gait for human recognition. In: Proceedings of the computer vision and pattern recognition (CVPR) workshop, pp. 255 (2006)
Tong, H., He, J., Li, M., Zhang, C., Ma, W.-Y.: Graph based multi-modality learning. In: Proceedings of the ACM Multimedia, pp. 862–871 (2005)
Nilsback, M. E., Caputo, B.: Cue integration through discriminative accumulation. In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition (CVPR 2004), pp. 578–585 (2004)
Greene, D., Cunningham, P.: A matrix factorization approach for integrating multiple data views. In: Proceedings of ECCV, pp. 423–438 (2009)
Bach, F. R., Lanckriet, G.R.G., Jordan, M.I.: Multiple Kernel learning, conic duality, and the SMO algorithm. In: Proceedings of ICML (2004)
Gehler, P., Nowozin, S.: On feature combination for multiclass object classification. In: Proceedings of ICCV, pp. 221–228 (2009)
Xia, T., Tao, D., Mei, T., Zhang, Y.: Multiview spectral embedding. IEEE TSMC-B, pp. 929–932 (2002)
Xie, B., Mu, Y., Tao, D.: m-SNE: multiview stochastic neighbor embedding. In: Proc. ICONIP 17(10), 338–346 (2010)
Zhou, X., Bhanu, B.: Feature fusion of side face and gait for video-based human identification. Pattern Recogn. 41(3), 778–795 (2008)
Zhang, L., Song, M., Liu, Z., Liu, X., Bu, J., Chen, C.: Probabilistic graphlet cut: exploring spatial structure cue for weakly supervised image segmentation. In: Proceedings of 26th IEEE conference on computer vision and pattern recognition (2013)
Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Bu, J., Chen, C.: Semi-supervised node splitting for random forest construction. In: Proceedings of 26th IEEE conference on computer vision and pattern recognition (2013)
Zhang, L., Song, M., Li, N., Bu, J., Chen, C.: Feature selection for accelerating speech based emotion recognition. ACM Multimedia, pp. 753–756 (2009)
Li, Y., Gong, S., Liddell, H.: Kernel discriminant analysis. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1998)
Wu, Y., Chang, E. Y., Chang, K. C.-C., Smith, J. R.: Optimal multimodal fusion for multimedia data analysis. In: Proceedings of the 12th annual ACM international conference on multimedia, pp. 572–579, New York (2004)
Ma, Z., Nie, F., Yang, Y., Uijlings, J.R.R., Sebe, N.: Web image annotation via subspace-sparsity collaborated feature selection. IEEE T-MM 14(4), 1021–1030 (2012)
Ma, Z., Yang, Y., Nie, F., Uijlings, J., Sebe, N.: Exploiting the entire feature space with sparsity for automatic image annotation. In: Proceedings of ACM Multimedia, pp. 283–292 (2011)
Li, Y., Geng, B., Tao, D., Zha, Z.-J., Yang, L., Xu, C.: Difficulty guided image retrieval using linear multiple feature embedding. IEEE T-MM 14(6), 1618–1630 (2012)
Zhang, L., Zhang, L., Tao, D., Huang, X.: On combining multiple features for hyperspectral remote sensing image classification. IEEE T. Geosci. Remote Sens. 50(3), 879–893 (2012)
Yang, Y., Shen, H.T., Ma, Z., Huang, Z., Zhou, X.: ℓ2,1-norm regularized discriminative feature selection for unsupervised learning. In: Proceedings of IJCAI, pp. 1589–1594 (2011)
Hamm, J., Lee, D. D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th international conference on machine learning (ICML), pp. 376–383, Helsinki, Finland, 5–9 June (2008)
Wang, L., Wang, X., Feng, J.: Subspace distance analysis with application to adaptive Bayesian face recognition. Pattern Recogn. 39(3), 456–464 (2006)
Zhang, L., Song, M., Zhao, Q., Liu, X., Bu, J., Chen, C.: Probabilistic graphlet transfer for photo cropping. IEEE T-IP 21(5), 2887–2897 (2013)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., Burlington (1993)
Yu, H., Li, M., Zhang, H.-J., Feng, J.: Color texture moments for content-based image retrieval. In: Proceedings of the ICIP, pp. 24–28 (2003)
Scholkopf, B., Smola, A., Muller, K.-R.: Kernel principal component analysis. In: Advances in Kernel methods—support vector learning, pp. 327–352, MIT Press, Cambridge (1999)
Gu, Q., Li, Z., Han, J.: Joint feature selection and subspace learning. In: Proceedings of IJCAI, pp. 1294–1299 (2011)
Golub, G.H., Van Loan, C.F.: Matrix computations. Johns Hopkins University Press, Baltimore (1996)
Cao, B., Shen, D., Sun, J.-T., Yang, Q., Chen, Z.: Feature selection in a kernel space. In: Proceedings of the international conference on machine learning (ICML), pp. 121–128, Oregon, USA, 20–24 June (2007)
Gu, Q., Li, Z., Han, J.: Generalized Fisher score for feature selection. In: Proceedings of UAI, pp. 266–273 (2011)
Leibe, B., Schiele, B.: Analyzing appearance and contour based methods for object categorization. In: Proceedings of the IEEE Computer Society on computer vision and pattern recognition, pp. 409–415 (2003)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput. Vis. Image Underst. (CVIU) 106(1), 59–70 (2007)
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Technical report 7694, California Institute of Technology, Pasadena, CA (2007)
Ragheb, H., Velastin, S., Remagnino, P., Ellis, T.: ViHASi: virtual human action Silhouette data for the performance evaluation of Silhouette-based action recognition methods. Workshop on activity monitoring by multi-camera surveillance systems, pp. 1–10 (2008)
Li, H., Wang, M., Hua, X.: MSRA-MM2.0: a large-scale web multimedia dataset. In: Proceedings of ICDMW, pp. 164–169 (2006)
Chua, T., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of CIVR, pp. 164–169 (2009)
Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data mining and knowledge discovery, pp. 393–423 (2002)
Johnson, A., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. IEEE T-PAMI 21(5), 443–449 (1999)
Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: Proceedings of the 6th ACM international conference on Image and video retrieval, pp. 401–408. ACM, New York (2007)
Porway, J., Wang, K., Yao, B., Zhu, S.C.: Scale-invariant shape features for recognition of object categories. In: Proceedings of ICCV, pp. 90–96 (2004)
Tuzel, O., Porikli, F., Meer, P.: Human detection via classification on Riemannian manifolds. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8 (2007)
Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE T-PAMI 24(7), 971–987 (2002)
Pinto, N., Cox, D.D., DiCarlo, J.J.: Why is real-world visual object recognition hard? PLoS Comput Biol 4(1), e27 (2008)
Zhang, L., Song, M., Li, N., Bu, J., Chen, C.: Feature selection for fast speech emotion recognition. In: Proceedings of the 17th international conference on multimedia, pp. 753–756 (2009)
Cai, D., He, X., Zhou, K., Han, J., Bao, H.: Locality sensitive discriminant analysis. In: Proceedings of IJCAI, pp. 1713–1726 (2007)
Nie, F., Xiang, S., Jia, Y., Zhang, C., Yan, S.: Trace ratio criterion for feature selection. AAAI, pp. 671–676 (2008)
Sun, Z.: Adaptation for multiple cue integration. In: Proceedings of the IEEE Computer Society international conference on computer vision and pattern recognition (CVPR), pp. 440–445 (2003)
Vishwanathan, S.V.N., Sun, Z., Theera-Ampornpunt, N.: Multiple Kernel learning and the SMO algorithm. In: Proceedings of NIPS, pp. 2361–2369 (2010)
Cristianini, N., Scholkopf, B.: Support vector machines and kernel methods: the new generation of learning machines. AI Magazine 23(3), 31–41 (2002)
Liu, X., Song, M., Zhao, Q., Tao, D., Bu, J., Chen, C.: Attribute-restricted latent topic model for person re-identification. Pattern Recogn. 45(12), 4204–4213 (2012)
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grants 61170142 and 60873124, by the National Key Technology R&D Program under Grant 2011BAG05B04, by the Program of International S&T Cooperation (2013DFG12841), and by the Fundamental Research Funds for the Central Universities (2013FZA5012).
Cite this article
Zhang, L., Tao, D., Liu, X. et al. Grassmann multimodal implicit feature selection. Multimedia Systems 20, 659–674 (2014). https://doi.org/10.1007/s00530-013-0317-1
DOI: https://doi.org/10.1007/s00530-013-0317-1