
Meta-Learner for Unknown Attribute Values Processing: Dealing with Inconsistency of Meta-Databases

  • Published in: Journal of Intelligent Information Systems

Abstract

Efficient, robust data-mining algorithms should include routines for processing unknown (missing) attribute values when acquiring knowledge from real-world databases, because such data usually contain a certain percentage of missing values. Bruha and Franek (1996) showed that each dataset has, more or less, its own ‘favourite’ routine for processing unknown attribute values; which routine works best evidently depends on the magnitude of noise and the source of unknownness in each dataset. This paper presents one way to choose an efficient routine for processing unknown attribute values for a given database. The covering machine learning algorithm CN4, a substantial extension of the well-known CN2 algorithm, is used here as the inductive vehicle.
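The abstract does not enumerate CN4's six routines, but two common strategies for handling unknown attribute values can be sketched as follows (the routine names and behaviour here are illustrative assumptions, not CN4's actual definitions):

```python
from collections import Counter

def ignore_unknown(records, attr):
    """Illustrative routine A: drop records whose value for `attr` is unknown (None)."""
    return [r for r in records if r[attr] is not None]

def impute_most_common(records, attr):
    """Illustrative routine B: replace unknown values with the most frequent known value."""
    known = [r[attr] for r in records if r[attr] is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [{**r, attr: mode if r[attr] is None else r[attr]} for r in records]

# Example: three records, one with an unknown 'colour'
data = [{"colour": "red"}, {"colour": "red"}, {"colour": None}]
print(ignore_unknown(data, "colour"))       # two records survive
print(impute_most_common(data, "colour"))   # the unknown value becomes "red"
```

Different datasets favour different routines, which is exactly the variability the meta-learner below is meant to exploit.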

Each of the six routines for unknown attribute value processing available in CN4 is used independently to process a given database. Afterwards, a meta-learner is used to derive a meta-classifier that makes the overall (final) decision about the class of unseen input objects. The entire system is called a meta-combiner.
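The meta-combiner idea can be sketched in miniature: each base classifier labels every training example, the tuples of base-level predictions paired with the true class form the meta-database, and a meta-learner is trained on those meta-records. The majority-vote lookup table used as the meta-learner here is a deliberately trivial stand-in, not the actual learner used in Meta-CN4:

```python
from collections import Counter, defaultdict

def build_meta_database(base_classifiers, examples, labels):
    # One meta-record per training example:
    # (tuple of base-level predictions, true class).
    return [(tuple(clf(x) for clf in base_classifiers), y)
            for x, y in zip(examples, labels)]

def train_meta_classifier(meta_db):
    # Trivial meta-learner: map each prediction tuple to its majority class.
    votes = defaultdict(Counter)
    for preds, y in meta_db:
        votes[preds][y] += 1
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}

    def meta_classify(base_classifiers, x):
        return table.get(tuple(clf(x) for clf in base_classifiers))
    return meta_classify

# Two toy base classifiers that disagree near the decision boundary
clf_a = lambda x: "pos" if x > 0 else "neg"
clf_b = lambda x: "pos" if x > 1 else "neg"
meta_db = build_meta_database([clf_a, clf_b], [-1, 0.5, 2], ["neg", "pos", "pos"])
meta = train_meta_classifier(meta_db)
print(meta([clf_a, clf_b], 0.7))  # base classifiers disagree; meta-level decides
```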

The meta-database formed for the meta-learner may be inconsistent, which can decrease the performance of the entire meta-classifier. Therefore, the existing meta-system (Meta-CN4) has been enhanced by a ‘purification’ procedure that appropriately resolves the conflicts within inconsistent meta-data.
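An inconsistency arises when two meta-records carry identical base-level predictions but different true classes. One simple purification rule, collapsing each conflicting group to its majority class, can be sketched as follows (this majority rule is an illustrative assumption, not Meta-CN4's actual purification procedure):

```python
from collections import Counter, defaultdict

def purify(meta_db):
    """Collapse inconsistent meta-records (same prediction tuple, different
    classes) into a single record per tuple, keeping the majority class."""
    groups = defaultdict(Counter)
    for preds, y in meta_db:
        groups[preds][y] += 1
    return [(preds, c.most_common(1)[0][0]) for preds, c in groups.items()]

# Three records share the prediction tuple ("pos", "neg") but disagree on
# the true class; the majority ("pos") wins and one clean record remains.
db = [(("pos", "neg"), "pos"), (("pos", "neg"), "pos"), (("pos", "neg"), "neg")]
print(purify(db))  # → [(('pos', 'neg'), 'pos')]
```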

The paper first surveys the CN4 algorithm, including its six routines for unknown attribute value processing. It then introduces the methodology of the meta-learner, including the enhancement that handles inconsistent meta-databases. Finally, results of experiments with various percentages of unknown attribute values on real-world data are presented, and the performance of the meta-classifier is compared with that of the six base classifiers. The paper also explains the difference between the meta-combiner (meta-learner) described here and the cross-validation procedure used to obtain classification accuracy.


References

  • Berka, P. and Bruha, I. (1995). Various Discretizing Procedures of Numerical Attributes: Empirical Comparisons. In 8th European Conference on Machine Learning, Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases (pp. 136-141). Heraklion, Crete.


  • Boswell, R. (1990). Manual for CN2, Version 4.1. Turing Institute, Techn. Rept. P-2145/Rab/4/1.3.

  • Bruha, I. (2002). Unknown Attribute Values Processing by Meta-Learner. International Symposium on Methodologies for Intelligent Systems (ISMIS-2002), Lyon, France.

  • Bruha, I. and Franek, F. (1996). Comparison of Various Routines for Unknown Attribute Value Processing: Covering Paradigm. International Journal of Pattern Recognition and Artificial Intelligence, 10(8), 939-955.


  • Bruha, I. and Kockova, S. (1994). A Support for Decision Making: Cost-Sensitive Learning System. Artificial Intelligence in Medicine, 6, 67-82.


  • Catlett, J. (1991). On Changing Continuous Attributes into Ordered Discrete Attributes. In Y. Kodratoff (Ed.), Machine Learning (EWSL-91) (pp. 164-178). Berlin Heidelberg: Springer-Verlag.


  • Cestnik, B. (1990). Estimating Probabilities: A Crucial Task in Machine Learning. ECAI-90.

  • Cestnik, B., Kononenko, I., and Bratko, I. (1987). ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users. In I. Bratko and N. Lavrac (Eds.), Progress in Machine Learning. Proc. EWSL'87. Sigma Press.

  • Clark, P. and Boswell, R. (1991). Rule Induction with CN2: Some Recent Improvements. EWSL'91 (pp. 151-163). Porto.

  • Clark, P. and Niblett, T. (1989). The CN2 Induction Algorithm. Machine Learning, 3, 261-283.


  • Fan, D.W., Chan, P.K., and Stolfo, S.J. (1996). A Comparative Evaluation of Combiner and Stacked Generalization. Workshop Integrating Multiple Learning Models, Portland: AAAI.


  • Kononenko, I. (1992). Combining Decisions of Multiple Rules. In B. du Boulay and V. Sgurev (Eds.), Artificial Intelligence V: Methodology, Systems, Applications (pp. 87-96). Elsevier Science Publ.

  • Kononenko, I. and Bratko, I. (1991). Information-Based Evaluation Criterion for Classifier's Performance. Machine Learning, 6, 67-80.


  • Lee, C. and Shin, D. (1994). A Context-Sensitive Discretization of Numeric Attributes for Classification Learning. In A. Cohn (Ed.), ECAI-94 (pp. 428-432). Amsterdam: John Wiley.


  • Murphy, P.M. and Aha, D.W. UCI Repository of Machine Learning Databases. Irvine, University of California, Dept. of Information and Computer Science.

  • Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.


  • Quinlan, J.R. (1989). Unknown Attribute Values in ID3. International Conference on Machine Learning, 164-168.

  • Quinlan, J.R. (1992). C4.5: Programs for Machine Learning. Morgan Kaufmann.



Cite this article

Bruha, I. Meta-Learner for Unknown Attribute Values Processing: Dealing with Inconsistency of Meta-Databases. Journal of Intelligent Information Systems 22, 71–87 (2004). https://doi.org/10.1023/A:1025880714026
