Handling Incomplete Categorical Data for Supervised Learning

Chien, Been-Chian; Lu, Cheng-Feng; Hsu, Steen J.

doi:10.1007/11779568_139

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4031)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Classification is an important research topic in knowledge discovery. Most research on classification assumes that a complete dataset is given for training and that the test data contain values for all attributes. Unfortunately, incomplete data are common in real-world applications. In this paper, we propose new schemes for learning classification models from incomplete categorical data. Three methods based on rough set theory are developed and discussed for handling incomplete training data. Experiments were conducted, and the results were compared with those of previous methods, using several well-known classification models to evaluate the performance of the proposed handling schemes.
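
The abstract does not describe the three proposed schemes, so the following is general background only and not the authors' method. It is a minimal sketch of one common way rough set theory is extended to incomplete categorical data: a tolerance relation that treats a missing value as compatible with any value, from which lower and upper approximations of a class are computed. The toy attributes (`outlook`, `windy`), the use of `None` for a missing value, and all function names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

# A record maps attribute names to categorical values; None marks a missing value
# (an assumption of this sketch, not notation from the paper).
Row = Dict[str, Optional[str]]


def tolerant(x: Row, y: Row, attrs: List[str]) -> bool:
    """True if x and y agree on every attribute where both values are known."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in attrs)


def tolerance_class(i: int, rows: List[Row], attrs: List[str]) -> Set[int]:
    """Indices of all records that are indistinguishable from record i."""
    return {j for j, r in enumerate(rows) if tolerant(rows[i], r, attrs)}


def approximations(
    rows: List[Row], attrs: List[str], labels: List[str], target: str
) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of the set of records labelled `target`."""
    concept = {i for i, c in enumerate(labels) if c == target}
    lower: Set[int] = set()
    upper: Set[int] = set()
    for i in range(len(rows)):
        cls = tolerance_class(i, rows, attrs)
        if cls <= concept:   # every compatible record has the target label
            lower.add(i)
        if cls & concept:    # at least one compatible record has the target label
            upper.add(i)
    return lower, upper


# Hypothetical toy training data with two missing values.
rows: List[Row] = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": None},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": None, "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

lower, upper = approximations(rows, ["outlook", "windy"], labels, "yes")
print(sorted(lower), sorted(upper))  # prints: [0] [0, 1, 3]
```

In this toy data, only record 0 certainly belongs to class `yes`; records 1 and 3 fall in the boundary region because their missing values make them compatible with records of both classes. A handling scheme for incomplete training data has to decide how such boundary records are used when a classifier is trained.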

This work was supported in part by the National Science Council of Taiwan, R. O. C., under contract NSC94-2213-E-024-004.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National University of Tainan, 33, Sec. 2, Su-Lin St., Tainan, 70005, Taiwan, R.O.C.
Been-Chian Chien
Department of Information Engineering, I-Shou University, Kaohsiung, 840, Taiwan, R.O.C.
Cheng-Feng Lu
Department of Information Management, Ming Hsin University of Science and Technology, 1 Hsin-Hsing Road, Hsin-Fong, Hsin-Chu, 304, Taiwan, R.O.C.
Steen J. Hsu

Editor information

Editors and Affiliations

Department of Computer Science, Texas State University-San Marcos, Nueces 247, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali
ESIA Laboratoire d’Informatique, Systèmes, Traitement de l’Information et de la Connaissance, Université de Savoie, B.P. 806, F-74016, ANNECY Cedex, France
Richard Dapoigny

About this paper

Cite this paper

Chien, BC., Lu, CF., Hsu, S.J. (2006). Handling Incomplete Categorical Data for Supervised Learning. In: Ali, M., Dapoigny, R. (eds) Advances in Applied Artificial Intelligence. IEA/AIE 2006. Lecture Notes in Computer Science, vol 4031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11779568_139

DOI: https://doi.org/10.1007/11779568_139
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35453-6
Online ISBN: 978-3-540-35454-3
eBook Packages: Computer Science, Computer Science (R0)