
Meta-Learner for Unknown Attribute Values Processing: Dealing with Inconsistency of Meta-Databases

  • Published in: Journal of Intelligent Information Systems

Abstract

Efficient, robust data-mining algorithms should include routines for processing unknown (missing) attribute values when acquiring knowledge from real-world databases, because such data usually contain a certain percentage of missing values. Bruha and Franek (1996) showed that each dataset has, more or less, its own ‘favourite’ routine for processing unknown attribute values; which routine works best evidently depends on the magnitude of noise and the source of unknownness in each dataset. This paper presents one way to choose an efficient routine for processing unknown attribute values for a given database. The covering machine learning algorithm CN4, a substantial extension of the well-known CN2 algorithm, is used here as the inductive vehicle.
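The abstract does not enumerate CN4's six routines, but two common strategies for handling unknown attribute values can be sketched as follows (the routine names and behaviour here are illustrative assumptions, not CN4's actual definitions):

```python
from collections import Counter

def ignore_unknown(records, attr):
    """Illustrative routine A: drop records whose value for `attr` is unknown (None)."""
    return [r for r in records if r[attr] is not None]

def impute_most_common(records, attr):
    """Illustrative routine B: replace unknown values with the most frequent known value."""
    known = [r[attr] for r in records if r[attr] is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [{**r, attr: mode if r[attr] is None else r[attr]} for r in records]

# Example: three records, one with an unknown 'colour'
data = [{"colour": "red"}, {"colour": "red"}, {"colour": None}]
print(ignore_unknown(data, "colour"))       # two records survive
print(impute_most_common(data, "colour"))   # the unknown value becomes "red"
```

Different datasets favour different routines, which is exactly the variability the meta-learner below is meant to exploit.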

Each of the six routines for unknown attribute value processing available in CN4 is used independently to process a given database. Afterwards, a meta-learner is used to derive a meta-classifier that makes the overall (final) decision about the class of unseen input objects. The entire system is called a meta-combiner.
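The meta-combiner idea can be sketched in miniature: each base classifier labels every training example, the tuples of base-level predictions paired with the true class form the meta-database, and a meta-learner is trained on those meta-records. The majority-vote lookup table used as the meta-learner here is a deliberately trivial stand-in, not the actual learner used in Meta-CN4:

```python
from collections import Counter, defaultdict

def build_meta_database(base_classifiers, examples, labels):
    # One meta-record per training example:
    # (tuple of base-level predictions, true class).
    return [(tuple(clf(x) for clf in base_classifiers), y)
            for x, y in zip(examples, labels)]

def train_meta_classifier(meta_db):
    # Trivial meta-learner: map each prediction tuple to its majority class.
    votes = defaultdict(Counter)
    for preds, y in meta_db:
        votes[preds][y] += 1
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}

    def meta_classify(base_classifiers, x):
        return table.get(tuple(clf(x) for clf in base_classifiers))
    return meta_classify

# Two toy base classifiers that disagree near the decision boundary
clf_a = lambda x: "pos" if x > 0 else "neg"
clf_b = lambda x: "pos" if x > 1 else "neg"
meta_db = build_meta_database([clf_a, clf_b], [-1, 0.5, 2], ["neg", "pos", "pos"])
meta = train_meta_classifier(meta_db)
print(meta([clf_a, clf_b], 0.7))  # base classifiers disagree; meta-level decides
```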

The meta-database formed for the meta-learner may be inconsistent, which can decrease the performance of the entire meta-classifier. Therefore, the existing meta-system (Meta-CN4) has been enhanced by a ‘purification’ procedure that appropriately resolves the conflicts within inconsistent meta-data.
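An inconsistency arises when two meta-records carry identical base-level predictions but different true classes. One simple purification rule, collapsing each conflicting group to its majority class, can be sketched as follows (this majority rule is an illustrative assumption, not Meta-CN4's actual purification procedure):

```python
from collections import Counter, defaultdict

def purify(meta_db):
    """Collapse inconsistent meta-records (same prediction tuple, different
    classes) into a single record per tuple, keeping the majority class."""
    groups = defaultdict(Counter)
    for preds, y in meta_db:
        groups[preds][y] += 1
    return [(preds, c.most_common(1)[0][0]) for preds, c in groups.items()]

# Three records share the prediction tuple ("pos", "neg") but disagree on
# the true class; the majority ("pos") wins and one clean record remains.
db = [(("pos", "neg"), "pos"), (("pos", "neg"), "pos"), (("pos", "neg"), "neg")]
print(purify(db))  # → [(('pos', 'neg'), 'pos')]
```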

The paper first surveys the CN4 algorithm, including its six routines for unknown attribute value processing. It then introduces the methodology of the meta-learner, including the enhancement that handles inconsistent meta-databases. Finally, results of experiments with various percentages of unknown attribute values on real-world data are presented, and the performance of the meta-classifier is compared with that of the six base classifiers. The paper also explains the difference between the meta-combiner (meta-learner) described here and the cross-validation procedure used to obtain classification accuracy.


References

  • Berka, P. and Bruha, I. (1995). Various Discretizing Procedures of Numerical Attributes: Empirical Comparisons. In 8th European Conference on Machine Learning, Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases (pp. 136-141). Heraklion, Crete.


  • Boswell, R. (1990). Manual for CN2, Version 4.1. Turing Institute, Techn. Rept. P-2145/Rab/4/1.3.

  • Bruha, I. (2002). Unknown Attribute Values Processing by Meta-Learner. International Symposium on Methodologies for Intelligent Systems (ISMIS-2002), Lyon, France.

  • Bruha, I. and Franek, F. (1996). Comparison of Various Routines for Unknown Attribute Value Processing: Covering Paradigm. International Journal of Pattern Recognition and Artificial Intelligence, 10(8), 939-955.


  • Bruha, I. and Kockova, S. (1994). A Support for Decision Making: Cost-Sensitive Learning System. Artificial Intelligence in Medicine, 6, 67-82.


  • Catlett, J. (1991). On Changing Continuous Attributes into Ordered Discrete Attributes. In Y. Kodratoff (Ed.), Machine Learning (EWSL-91) (pp. 164-178). Berlin Heidelberg: Springer-Verlag.


  • Cestnik, B. (1990). Estimating Probabilities: A Crucial Task in Machine Learning. ECAI-90.

  • Cestnik, B., Kononenko, I., and Bratko, I. (1987). ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users. In I. Bratko and N. Lavrac (Eds.), Progress in Machine Learning. Proc. EWSL'87. Sigma Press.

  • Clark, P. and Boswell, R. (1991). Rule Induction with CN2: Some Recent Improvements. EWSL'91 (pp. 151-163). Porto.

  • Clark, P. and Niblett, T. (1989). The CN2 Induction Algorithm. Machine Learning, 3, 261-283.


  • Fan, D.W., Chan, P.K., and Stolfo, S.J. (1996). A Comparative Evaluation of Combiner and Stacked Generalization. Workshop Integrating Multiple Learning Models, Portland: AAAI.


  • Kononenko, I. (1992). Combining Decisions of Multiple Rules. In B. du Boulay and V. Sgurev (Eds.), Artificial Intelligence V: Methodology, Systems, Applications (pp. 87-96). Elsevier Science Publ.

  • Kononenko, I. and Bratko, I. (1991). Information-Based Evaluation Criterion for Classifier's Performance. Machine Learning, 6, 67-80.


  • Lee, C. and Shin, D. (1994). A Context-Sensitive Discretization of Numeric Attributes for Classification Learning. In A. Cohn (Ed.), ECAI-94 (pp. 428-432). Amsterdam: John Wiley.


  • Murphy, P.M. and Aha, D.W. UCI Repository of Machine Learning Databases. Irvine, University of California, Dept. of Information and Computer Science.

  • Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.


  • Quinlan, J.R. (1989). Unknown Attribute Values in ID3. International Conference on Machine Learning, 164-168.

  • Quinlan, J.R. (1992). C4.5: Programs for Machine Learning. Morgan Kaufmann.



Cite this article

Bruha, I. Meta-Learner for Unknown Attribute Values Processing: Dealing with Inconsistency of Meta-Databases. Journal of Intelligent Information Systems 22, 71–87 (2004). https://doi.org/10.1023/A:1025880714026
