
Some Criterions for Selecting the Best Data Abstractions

  • Chapter
Progress in Discovery Science

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

This paper presents and summarizes criteria for selecting the best data abstraction for relations in relational databases. A data abstraction can be understood as a grouping of attribute values: the individual differences among the grouped values are disregarded, and the values are replaced together by a single, more abstract value. The relation obtained after abstraction is therefore more compact, and data miners can work on it more efficiently. A major problem, however, is that the quality of the extracted knowledge degrades when the abstraction neglects an important aspect of the data values. The central issue is thus to give a criterion that selects only adequate data abstractions, that is, ones that preserve the important information while reducing the size of the relations. From this viewpoint, we present three criteria and test them on the task of classifying the tuples of a relation into several given target classes. All three criteria are derived from a notion of similarity among class distributions and are formalized in terms of standard information theory. We also summarize our experimental results for the classification task and discuss future work.
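The abstract does not state the three criteria themselves, so the following is only an illustrative sketch of the general idea, not the authors' method. It treats an abstraction as a dictionary that maps attribute values to abstract values, and it scores the abstraction by how much mutual information between the attribute and the class is lost. The function names, the toy relation, and the choice of mutual information as the measure are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information I(A; C) in bits for (attribute_value, class_label) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    attr = Counter(a for a, _ in pairs)
    cls = Counter(c for _, c in pairs)
    return sum(
        (k / n) * log2(k * n / (attr[a] * cls[c]))
        for (a, c), k in joint.items()
    )


def abstract(pairs, grouping):
    """Replace each attribute value by its abstract value; unmapped values are kept."""
    return [(grouping.get(a, a), c) for a, c in pairs]


def information_loss(pairs, grouping):
    """Class information lost by the abstraction (hypothetical score, in bits)."""
    return mutual_information(pairs) - mutual_information(abstract(pairs, grouping))


# Toy relation (hypothetical): one attribute and a binary target class.
rows = [("sparrow", "yes"), ("eagle", "yes"), ("salmon", "no"), ("tuna", "no")]

# Groups values whose class distributions are identical.
good = {"sparrow": "bird", "eagle": "bird", "salmon": "fish", "tuna": "fish"}
# Groups values whose class distributions are opposite.
bad = {"sparrow": "x", "salmon": "x", "eagle": "y", "tuna": "y"}

loss_good = information_loss(rows, good)
loss_bad = information_loss(rows, bad)
```

In this toy relation, grouping values with the same class distribution loses no class information, while grouping values with opposite class distributions loses all of it. Both groupings shrink the attribute domain from four values to two, which is why size reduction alone cannot decide between them.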




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Haraguchi, M., Kudoh, Y. (2002). Some Criterions for Selecting the Best Data Abstractions. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_8


  • DOI: https://doi.org/10.1007/3-540-45884-0_8


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5

  • eBook Packages: Springer Book Archive
