A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data

  • Conference paper
New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (RSFDGrC 1999)

Abstract

Real-life data, in general, contain many missing attribute values. Therefore, rule induction requires preprocessing, in which missing attribute values are replaced by appropriate values. The rule induction method used in our research is based on rough set theory.

In this paper we present our results on a new approach to missing attribute values, called the closest fit. The main idea of the closest fit is to search through the set of all cases, considered as vectors of attribute values, for the case that is most similar to the given case with missing attribute values. There are two possible ways to look for the closest case: we may restrict the search to the concept the given case belongs to, or extend it to the set of all cases. These methods are compared with a special case of the closest fit principle: replacing a missing attribute value by the most common value of that attribute within the concept. All algorithms were implemented in the system OOMIS. Our experiments were performed on preterm birth data sets collected at the Duke University Medical Center.
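The abstract does not define how similarity between cases is measured, so the following Python sketch assumes one: the distance between two cases is the sum of per-attribute distances, where equal values contribute 0, a missing value or a mismatched symbolic value contributes 1, and differing numeric values contribute their absolute difference divided by that attribute's range. It illustrates both search variants (over all cases, or only over cases from the same concept) and the most-common-value-within-concept baseline. All names (`MISSING`, `closest_fit`, `most_common_in_concept`) and the toy table are hypothetical illustrations, not the OOMIS implementation or the Duke data.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def attribute_ranges(cases):
    """Per attribute: max - min of known numeric values, or None if symbolic."""
    ranges = []
    for j in range(len(cases[0])):
        known = [c[j] for c in cases if c[j] is not MISSING]
        if known and all(isinstance(v, (int, float)) for v in known):
            ranges.append(max(known) - min(known))
        else:
            ranges.append(None)
    return ranges


def distance(a, b, ranges):
    """Assumed metric: sum of per-attribute distances between two cases."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is MISSING or y is MISSING:
            total += 1.0              # a missing value counts as maximally different
        elif x == y:
            continue                  # identical values contribute nothing
        elif r:
            total += abs(x - y) / r   # numeric: range-normalized difference
        else:
            total += 1.0              # symbolic: any mismatch costs 1
    return total


def closest_fit(cases, decisions, restrict_to_concept=False):
    """Fill each missing value from the nearest case that has that value known.

    With restrict_to_concept=True, only cases with the same decision are searched.
    """
    ranges = attribute_ranges(cases)
    filled = [list(c) for c in cases]
    for i, case in enumerate(cases):
        for j, value in enumerate(case):
            if value is not MISSING:
                continue
            best_value, best_d = MISSING, float("inf")
            for k, other in enumerate(cases):
                if k == i or other[j] is MISSING:
                    continue
                if restrict_to_concept and decisions[k] != decisions[i]:
                    continue
                d = distance(case, other, ranges)
                if d < best_d:            # ties keep the earlier case
                    best_value, best_d = other[j], d
            filled[i][j] = best_value
    return filled


def most_common_in_concept(cases, decisions):
    """Fill each missing value with the attribute's most frequent value in the same concept."""
    filled = [list(c) for c in cases]
    for i, case in enumerate(cases):
        for j, value in enumerate(case):
            if value is MISSING:
                known = [c[j] for c, d in zip(cases, decisions)
                         if d == decisions[i] and c[j] is not MISSING]
                if known:
                    filled[i][j] = Counter(known).most_common(1)[0][0]
    return filled


# Hypothetical toy table (three attributes), not the Duke preterm data.
cases = [
    ("high", 30, "yes"),
    ("low", 25, MISSING),
    ("high", MISSING, "no"),
    ("low", 40, "no"),
    ("high", 35, "no"),
    ("low", 35, "no"),
]
decisions = ["preterm", "preterm", "fullterm", "fullterm", "fullterm", "fullterm"]

print(closest_fit(cases, decisions)[1][2])                            # no
print(closest_fit(cases, decisions, restrict_to_concept=True)[1][2])  # yes
print(most_common_in_concept(cases, decisions)[2][1])                 # 35
```

In this toy table, the search over all cases fills the second case's third attribute from its nearest neighbor overall, a fullterm case, whereas the concept-restricted search may only borrow from the other preterm case and therefore yields `"yes"`. Breaking ties in favor of the earliest case is a further assumption of this sketch.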

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grzymała-Busse, J.W., Grzymała-Busse, W.J., Goodwin, L.K. (1999). A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data. In: Zhong, N., Skowron, A., Ohsuga, S. (eds) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. RSFDGrC 1999. Lecture Notes in Computer Science, vol 1711. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48061-7_49

  • DOI: https://doi.org/10.1007/978-3-540-48061-7_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66645-5

  • Online ISBN: 978-3-540-48061-7

  • eBook Packages: Springer Book Archive
