Abstract
In real-world applications, data are often represented by hundreds or thousands of features, most of which are redundant or irrelevant; their presence can directly degrade the performance of learning algorithms. Selecting the most salient features is therefore a pressing requirement in practical applications. A large number of feature selection methods employing various strategies have been proposed, and among these, mutual information based methods have recently gained considerable popularity. In this paper, a general criterion function for feature selectors using mutual information is first introduced. This function brings current mutual information based selectors together under a unifying scheme. We then present an experimental comparison of eight typical filter-style, mutual information based feature selection algorithms on thirty-three datasets. We evaluate them from four essential aspects, and the experimental results show that no single method significantly outperforms the others. Even so, the conditional mutual information feature selection algorithm dominates the other methods overall when training time is not a concern.
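The unifying criterion the abstract alludes to is commonly written in the literature as a relevance term minus a weighted redundancy term, e.g. Battiti's MIFS form J(f) = I(f; C) − β · Σ_{s∈S} I(f; s). The sketch below is an illustrative greedy selector under that family; the function names, the β = 0.5 default, and the toy data layout are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch of a MIFS-style greedy filter selector:
# score each candidate feature f by I(f; C) - beta * sum_{s in S} I(f; s),
# where S is the set of already-selected features.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mifs_select(features, labels, k, beta=0.5):
    """Greedily pick k feature columns maximizing I(f;C) - beta * sum I(f;s).

    `features` is a list of columns (one discrete sequence per feature).
    """
    remaining = set(range(len(features)))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda f: mutual_information(features[f], labels)
                   - beta * sum(mutual_information(features[f], features[s])
                                for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Varying how the redundancy term is computed (plain sum, averaged as in mRMR, or conditioned on the class as in CMIM) yields the different selectors the paper compares.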
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Liu, H., Liu, L., Zhang, H. (2008). Feature Selection Using Mutual Information: An Experimental Study. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science(), vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_24
DOI: https://doi.org/10.1007/978-3-540-89197-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0