
Supportive Utility of Irrelevant Features in Data Preprocessing

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)


Abstract

The learning performance of many classification algorithms degrades when irrelevant features are introduced. Feature selection is the process of choosing an optimal subset of features and removing the irrelevant ones. However, many feature selection algorithms focus only on filtering out attributes that are irrelevant to the learned task, without considering the hidden supportive information those attributes may carry for other attributes: are they truly irrelevant, or potentially relevant? In the medical domain, for instance, an irrelevant symptom is one that provides neither explicit information nor supportive information for disease diagnosis, so traditional feature selection methods may be unsuitable for such critical problems. In this paper, we propose a new method that selects not only the relevant features but also the latently useful "irrelevant" attributes, by measuring their supportive importance to other attributes. The empirical results compare the performance of various classification algorithms on twelve real-life datasets from the UCI repository.
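The full text is not available here, so the following is only a plausible sketch of the idea the abstract describes, not the authors' actual algorithm. It assumes discrete attributes, uses empirical mutual information as the relevance measure, and introduces hypothetical thresholds (`direct_thresh`, `support_thresh`) and function names for illustration: a feature discarded as irrelevant to the class is retained anyway if it shares enough information with some directly relevant feature.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_with_support(features, labels, direct_thresh=0.1, support_thresh=0.1):
    """Keep features directly relevant to the class label; among the discarded
    ones, also keep those sharing enough information with a retained feature
    (a stand-in for the paper's 'supportive importance')."""
    relevant = {name for name, vals in features.items()
                if mutual_information(vals, labels) >= direct_thresh}
    supportive = {name for name, vals in features.items()
                  if name not in relevant
                  and any(mutual_information(vals, features[r]) >= support_thresh
                          for r in relevant)}
    return relevant, supportive

# Toy data: f1 is noisily predictive of the label, f2 tracks f1 (but carries no
# direct information about the label), f3 is pure noise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 0],
    "f2": [0, 0, 1, 1, 1, 1, 0, 0],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
relevant, supportive = select_with_support(features, labels)
# relevant == {"f1"}; supportive == {"f2"}; f3 is dropped as truly irrelevant
```

A plain filter method would discard both f2 and f3; the supportive-importance step recovers f2 because of its dependence on the retained feature f1.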





Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Chao, S., Li, Y., Dong, M. (2007). Supportive Utility of Irrelevant Features in Data Preprocessing. In: Zhou, Z.-H., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_42


  • DOI: https://doi.org/10.1007/978-3-540-71701-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science (R0)
