Incompleteness in Data Mining

Hosagrahar Visvesvaraya Jagadish

doi:10.1007/3-540-45357-1_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Database technology, as well as the bulk of data mining technology, is founded upon logic, with absolute notions of truth and falsehood, at least with respect to the data set. Patterns are discovered exhaustively, with carefully engineered algorithms devised to determine all patterns in a data set that belong to a certain class. For large data sets, many such data mining techniques are extremely expensive, leading to considerable research towards solving these problems more cheaply.

We argue that the central goal of data mining is to find SOME interesting patterns, and not necessarily ALL of them. As such, techniques that can find most of the answers cheaply are clearly more valuable than computationally much more expensive techniques that can guarantee completeness. In fact, it is probably the case that patterns that can be found cheaply are indeed the most important ones.

Furthermore, knowledge discovery can be most effective when the human analyst is heavily involved in the endeavor. To engage a human analyst, it is important that data mining techniques be interactive, ideally delivering (close to) real-time responses and feedback. Clearly, then, extreme accuracy and completeness (i.e., finding all patterns satisfying some specified criteria) would almost always be a luxury. Instead, incompleteness (i.e., finding only some patterns) and approximation would be essential.

We exemplify this discussion through the notion of fascicles. Often many records in a database share similar values for several attributes. If one is able to identify and group together records that share similar values for some (even if not all) attributes, one can both obtain a more parsimonious representation of the data and gain useful insight into the data from a mining perspective. Such groupings are called fascicles. We explore the relationship of fascicle-finding to association rule mining, and experimentally demonstrate the benefit of incomplete but inexpensive algorithms. We also present analytical results demonstrating both the limits and the benefits of such incomplete algorithms.
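
The idea of a fascicle, and the contrast between cheap incomplete search and costly complete search, can be illustrated with a small sketch. This is not the chapter's algorithm; it rests on assumptions made here for illustration: attributes are numeric, an attribute counts as "similar" within a group when the group's value range on it is at most a per-attribute tolerance, and a group is treated as a fascicle when at least k attributes are similar in this sense. The function names (`compact_attributes`, `is_fascicle`, `greedy_fascicles`, `exhaustive_largest_fascicle`, `summarize`), the tolerances, k, and the data are all hypothetical. The greedy pass compares each record only against existing groups, so it is inexpensive but may miss fascicles; the exhaustive search is complete but may examine up to 2^n subsets of n records.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each record is a tuple of numeric attribute values.
Record = Tuple[float, ...]


def compact_attributes(group: Sequence[Record], tolerances: Sequence[float]) -> List[int]:
    """Indices of attributes whose value range within `group` is at most the tolerance."""
    if not group:
        return []
    compact = []
    for a in range(len(group[0])):
        values = [r[a] for r in group]
        if max(values) - min(values) <= tolerances[a]:
            compact.append(a)
    return compact


def is_fascicle(group: Sequence[Record], tolerances: Sequence[float], k: int) -> bool:
    """Assumed definition: a group is a fascicle if at least k attributes are compact."""
    return len(compact_attributes(group, tolerances)) >= k


def greedy_fascicles(records: Sequence[Record], tolerances: Sequence[float], k: int) -> List[List[Record]]:
    """Cheap, incomplete heuristic: in one pass, each record joins the first group
    that remains a fascicle; otherwise it starts a new group. The result depends
    on record order and need not contain every fascicle."""
    groups: List[List[Record]] = []
    for r in records:
        for g in groups:
            if is_fascicle(g + [r], tolerances, k):
                g.append(r)
                break
        else:
            groups.append([r])
    return groups


def exhaustive_largest_fascicle(records: Sequence[Record], tolerances: Sequence[float], k: int) -> List[Record]:
    """Complete but exponential: tries subsets from largest to smallest (up to 2^n checks)."""
    for size in range(len(records), 0, -1):
        for subset in combinations(records, size):
            if is_fascicle(subset, tolerances, k):
                return list(subset)
    return []


def summarize(group: Sequence[Record], tolerances: Sequence[float]) -> Dict[int, float]:
    """Parsimonious representation: keep one value (the range midpoint) per compact attribute."""
    return {
        a: (min(r[a] for r in group) + max(r[a] for r in group)) / 2
        for a in compact_attributes(group, tolerances)
    }


# Made-up example data: three numeric attributes per record.
records: List[Record] = [
    (10.0, 5.0, 100.0),
    (10.5, 5.2, 300.0),
    (11.0, 4.9, 900.0),
    (50.0, 20.0, 110.0),
    (50.4, 20.3, 500.0),
]
tolerances = (1.0, 0.5, 50.0)  # hypothetical per-attribute tolerances
k = 2                          # hypothetical minimum number of compact attributes

groups = greedy_fascicles(records, tolerances, k)

if __name__ == "__main__":
    for g in groups:
        print(len(g), compact_attributes(g, tolerances), summarize(g, tolerances))
    print(len(exhaustive_largest_fascicle(records, tolerances, k)))
```

On this made-up data, the greedy pass groups the first three records and the last two, each group being compact on attributes 0 and 1, while attribute 2 varies too widely within either group to be compact. The exhaustive search confirms that no fascicle larger than three records exists here, but it reaches that answer by enumerating subsets, which quickly becomes prohibitive as the number of records grows.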

Author information

Authors and Affiliations

University of Michigan, Ann Arbor
Hosagrahar Visvesvaraya Jagadish

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong, China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China
Qing Li

About this paper

Cite this paper

Jagadish, H.V. (2001). Incompleteness in Data Mining. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_1

Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive