Abstract
Feature selection is an important step in large-scale image data analysis, which has proved difficult because of the large number of both dimensions and samples. Feature selection first eliminates redundant and irrelevant features and then chooses a subset of features that performs as well as the complete set. Generally, supervised feature selection yields better performance than unsupervised feature selection because it uses label information. However, labeled samples are expensive to obtain, which constrains the performance of supervised feature selection, especially on large web image datasets. In this paper, we propose a semi-supervised feature selection algorithm based on a hierarchical regression model. Our contributions are twofold: (1) our algorithm uses a statistical approach to exploit both labeled and unlabeled data, which preserves the manifold structure of each feature type; (2) the predicted label matrix of the training data and the feature selection matrix are learned simultaneously, so that the two benefit each other. Extensive experiments are performed on three large-scale image datasets. The experimental results show that our algorithm outperforms state-of-the-art algorithms.
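The idea of learning a predicted label matrix and a feature selection matrix at the same time can be illustrated with a minimal sketch. It is not the hierarchical regression model proposed in this paper: it handles a single feature type only, and the k-nearest-neighbour graph Laplacian, the row-sparsity (l2,1-norm) penalty, the bias handling by centering, the function names `knn_laplacian` and `semi_supervised_select`, and the parameters `k`, `mu` and `gamma` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbour graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(dist, np.inf)                      # no self-neighbours
    nbrs = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                              # symmetrize
    return np.diag(A.sum(axis=1)) - A


def semi_supervised_select(X, Y, labeled, k=5, mu=1.0, gamma=1.0,
                           n_iter=20, eps=1e-8):
    """Hypothetical sketch: jointly estimate soft labels F and weights W.

    Assumed objective (not taken from the paper):
        tr(F'LF) + tr((F-Y)'U(F-Y))
        + mu * (||XW + 1b' - F||_F^2 + gamma * ||W||_{2,1})
    X: (n, d) features, Y: (n, c) one-hot labels with zero rows for
    unlabeled samples, labeled: boolean mask of length n.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                    # centering absorbs the bias b
    C = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    L = knn_laplacian(X, k)
    U = np.diag(np.where(labeled, 1e6, 0.0))   # pins labeled rows of F to Y
    D = np.eye(d)                              # reweighting for the l2,1 term
    for _ in range(n_iter):
        # For fixed D the optimal weights are linear in F: W = A @ F.
        A = np.linalg.solve(Xc.T @ Xc + gamma * D, Xc.T)
        # Substituting W back leaves a quadratic form in F with matrix H.
        H = C - Xc @ A
        F = np.linalg.solve(L + U + mu * H, U @ Y)
        W = A @ F
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * (row_norms + eps)))
    ranking = np.argsort(-row_norms)           # largest row norm first
    return W, F, ranking
```

In this sketch, features are ranked by the row norms of `W`; rows that shrink toward zero correspond to features the model treats as irrelevant, while `F` holds the soft labels inferred for both labeled and unlabeled samples.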
Acknowledgments
This paper was partially supported by the National Program on the Key Basic Research Project (under Grant 2013CB329301), NSFC (under Grant 61202166), and Doctoral Fund of Ministry of Education of China (under Grant 20120032120042).
Cite this article
Song, X., Zhang, J., Han, Y. et al. Semi-supervised feature selection via hierarchical regression for web image classification. Multimedia Systems 22, 41–49 (2016). https://doi.org/10.1007/s00530-014-0390-0
DOI: https://doi.org/10.1007/s00530-014-0390-0