Abstract
As a high dimensional problem, it is a hard task to analyze the text data sets, where many weakly relevant but redundant features hurt generalization performance of classifiers. There are previous works to handle this problem by using pair-wise feature similarities, which do not consider discriminative contribution of each feature by utilizing the label information. Here we define an Approximate Markov Blanket (AMB) based on the metric of DIScriminative Contribution (DISC) to eliminate redundant features and propose the AMB-DISC algorithm. Experimental results on the data set of Reuter-21578 show AMB-DISC is much better than the previous state-of-arts feature selection algorithms considering feature redundancy in terms of Micro avg F1 and Macro avg F1.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artificial Intelligence 97(1-2), 245–271 (1997)
Liu, H., Dougherty, E., Dy, J., Torkkola, K., Tuv, E., Peng, H., Ding, C., Long, F., Berens, M., Parsons, L., Zhao, Z., Yu, L., Forman, G.: Evolving feature selection. IEEE Intelligent Systems 20(6), 64–76 (2005)
Zhu, S., Wang, D., Yu, K., Li, T., Gong, Y.: Feature selection for gene expression using model-based entropy. IEEE Transactions on Computational Biology and Bioinformatics 7(1), 25–36 (2010)
Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research 5, 1205–1224 (2004)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
Zeng, X.Q., Li, G.Z., Yang, J.Y., Yang, M.Q., Wu, G.F.: Dimension reduction with redundant genes elimination for tumor classification. BMC Bioinformatics 9(suppl 6), S8 (2008)
Koller, D., Sahami, M.: Toward optimal feature selection. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 284–292 (1996)
Yang, Y., Liu, X.: A re-examination of text categorization methods. In: SIGIR 1999: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 42–49. ACM Press, New York (1999)
Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Survey 34(1), 1–47 (2002)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: International Conference on Machine Learning, pp. 359–366 (2000)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zeng, XQ., Chen, SF., Zou, HX. (2011). Redundant Feature Elimination by Using Approximate Markov Blanket Based on Discriminative Contribution. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_18
Download citation
DOI: https://doi.org/10.1007/978-3-642-23982-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer ScienceComputer Science (R0)