The Lorenz Dominance Order as a Measure of Interestingness in KDD

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

Ranking summaries generated from databases is useful within the context of descriptive data mining tasks where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph where each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space describes, then, all possible summaries consistent with the DGGs that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capabilities of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. The Lorenz dominance order defines a partial order on the summaries, in most cases, and in some cases, defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
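
The two ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's procedure: each DGG is reduced to a plain list of its node (domain) names, each summary is reduced to its list of tuple counts, and the comparison uses the standard textbook Lorenz order (one curve lying on or above the other everywhere). All names (`generalization_space`, `lorenz_points`, `lorenz_at`, `compare`) and the sample attributes are hypothetical, and which direction of dominance should count as more interesting is deliberately left open.

```python
from fractions import Fraction
from itertools import product


def generalization_space(dggs):
    """All combinations that pick one domain (node) from each attribute's DGG.

    `dggs` maps an attribute name to the list of node names in its DGG
    (edges are ignored here; only the choice of one node per DGG matters).
    """
    attrs = sorted(dggs)
    return [dict(zip(attrs, combo)) for combo in product(*(dggs[a] for a in attrs))]


def lorenz_points(counts):
    """Breakpoints (p, L(p)) of the Lorenz curve for a list of tuple counts.

    Assumes a non-empty list of non-negative counts with a positive total.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    pts, running = [(Fraction(0), Fraction(0))], 0
    for i, x in enumerate(xs, start=1):
        running += x
        pts.append((Fraction(i, n), Fraction(running, total)))
    return pts


def lorenz_at(pts, p):
    """Linearly interpolate the piecewise-linear curve at share p in [0, 1]."""
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        if p0 <= p <= p1:
            return l0 + (l1 - l0) * (p - p0) / (p1 - p0)
    raise ValueError("p must lie in [0, 1]")


def compare(a, b):
    """Compare two summaries by their Lorenz curves.

    Returns "a" if a's curve lies on or above b's everywhere, "b" for the
    reverse, "equal" if the curves coincide, and None if they cross
    (the summaries are then incomparable, which is why the order is partial).
    """
    pa, pb = lorenz_points(a), lorenz_points(b)
    # Both curves are piecewise linear, so checking the union of their
    # breakpoints is enough to decide dominance on the whole interval.
    grid = sorted({p for p, _ in pa} | {p for p, _ in pb})
    diffs = [lorenz_at(pa, p) - lorenz_at(pb, p) for p in grid]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "a"
    if all(d <= 0 for d in diffs):
        return "b"
    return None


# Hypothetical DGGs for two attributes, three domains each.
space = generalization_space({
    "age": ["age", "age_group", "ANY"],
    "city": ["city", "province", "ANY"],
})
print(len(space))                      # number of candidate summaries
print(compare([1, 1, 4], [0, 3, 3]))   # curves cross, so no ranking
```

Exact fractions are used so that coinciding curves compare as equal without floating-point tolerance. The `None` result corresponds to the abstract's remark that the order is only partial in most cases.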

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hilderman, R.J. (2002). The Lorenz Dominance Order as a Measure of Interestingness in KDD. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_17

  • DOI: https://doi.org/10.1007/3-540-47887-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4

  • eBook Packages: Springer Book Archive
