Abstract
We propose FIEBIT (Feature Inclusion and Exclusion Based on Information Theory), a fast and accurate feature selection method for the classification of health data. FIEBIT includes the most relevant, non-redundant features using Conditional Mutual Information (CMU), and excludes irrelevant and redundant features by comparing Individual Symmetrical Uncertainty (ISU) with Combined Symmetrical Uncertainty (CSU). It selects small feature subsets before classification without compromising classification accuracy, and it determines the size of the feature subset automatically. Our preliminary empirical results on health data with hundreds of features suggest that FIEBIT is efficient and effective in comparison with representative feature selection methods.
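The information-theoretic quantities named in the abstract can be illustrated with a minimal sketch for discrete features. This is not the FIEBIT algorithm: the abstract does not specify how ISU and CSU are compared or how features are ordered, so only the textbook definitions of entropy, symmetrical uncertainty, and conditional mutual information are shown. All function names here are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))  # frequency of each joint value
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both columns are constant; treat as uninformative
    return 2.0 * mutual_information(x, y) / (hx + hy)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


# Toy data (hypothetical): the label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

su_alone = symmetrical_uncertainty(f1, label)
cmi_given_f2 = conditional_mutual_information(f1, label, f2)
```

In this toy case `f1` carries no information about the label on its own, yet becomes fully informative once `f2` is known. Conditional measures are meant to capture this kind of interaction, which a purely pairwise relevance score would miss.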
The authors would like to acknowledge Dr H. Altay Guvenir of Bilkent University for donating the Cardiac Arrhythmia Database for public usage.
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
He, H., Jin, H., Chen, J. (2005). Automatic Feature Selection for Classification of Health Data. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_108
DOI: https://doi.org/10.1007/11589990_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)