Article

Fast discovery of unexpected patterns in data, relative to a Bayesian network

Authors: Szymon Jaroszewicz, Tobias Scheffer

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 118-127

https://doi.org/10.1145/1081870.1081887

Published: 21 August 2005

Abstract

We consider a model in which background knowledge about a domain of interest is available in the form of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between the network and the database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. We introduce a sampling-based method that is efficient and yet provably finds the approximately most interesting unexpected patterns, and we give a rigorous proof of its correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
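To make the mining problem concrete, here is a minimal Python sketch under stated assumptions. A pattern is taken to be an assignment of values to a subset of variables, and its discrepancy is taken to be the absolute difference between its probability under the network and its frequency in the database. Both quantities are estimated from random samples (forward sampling from the network, uniform sampling with replacement from the database), and the sample size comes from Hoeffding's inequality with a union bound over all candidate patterns. The two-node network, the toy database, and every function name (`forward_sample`, `hoeffding_sample_size`, `top_discrepancies`, and so on) are hypothetical; this illustrates the general idea and is not the authors' algorithm.

```python
import itertools
import math
import random

# Hypothetical Bayesian network A -> B over binary variables (illustration only).
P_A = 0.3                       # P(A = 1)
P_B_GIVEN_A = {0: 0.2, 1: 0.9}  # P(B = 1 | A = a)
VARIABLES = ["A", "B"]

# Hypothetical toy database in which B is independent of A (100 records).
TOY_DATABASE = (
    [{"A": 0, "B": 0}] * 35 + [{"A": 0, "B": 1}] * 35
    + [{"A": 1, "B": 0}] * 15 + [{"A": 1, "B": 1}] * 15
)


def forward_sample(rng):
    """Draw one joint assignment from the network, sampling parents before children."""
    a = int(rng.random() < P_A)
    b = int(rng.random() < P_B_GIVEN_A[a])
    return {"A": a, "B": b}


def hoeffding_sample_size(epsilon, delta):
    """Samples needed so one empirical frequency is within epsilon of its
    expectation with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def frequency(records, pattern):
    """Fraction of records that agree with every (variable, value) pair in pattern."""
    hits = sum(all(r[v] == x for v, x in pattern) for r in records)
    return hits / len(records)


def candidate_patterns(variables):
    """All value assignments to non-empty subsets of the binary variables."""
    for size in range(1, len(variables) + 1):
        for subset in itertools.combinations(variables, size):
            for values in itertools.product((0, 1), repeat=size):
                yield tuple(zip(subset, values))


def top_discrepancies(database, k, epsilon, delta, seed=0):
    """Return the k patterns with the largest estimated |P_BN - P_DB|.

    Each of the 2m frequency estimates (m patterns, two sources) gets error
    budget epsilon / 2 and failure probability delta / (2m), so by the union
    bound and the triangle inequality every estimated discrepancy is within
    epsilon of the true one with probability at least 1 - delta.
    """
    rng = random.Random(seed)
    patterns = list(candidate_patterns(VARIABLES))
    n = hoeffding_sample_size(epsilon / 2, delta / (2 * len(patterns)))
    bn_sample = [forward_sample(rng) for _ in range(n)]
    db_sample = [rng.choice(database) for _ in range(n)]
    scored = [
        (abs(frequency(bn_sample, p) - frequency(db_sample, p)), p)
        for p in patterns
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


for score, pattern in top_discrepancies(TOY_DATABASE, k=2, epsilon=0.05, delta=0.05):
    print(f"{pattern}: {score:.3f}")
```

With these hypothetical numbers the network predicts P(A=0, B=0) = 0.56 and P(A=0, B=1) = 0.14, while the toy database has a frequency of 0.35 for each, so the two reported patterns should be those assignments, each with an estimated discrepancy near 0.21. Forward sampling sidesteps exact inference in the network, which the abstract names as one source of difficulty, at the cost of the sample sizes demanded by the bound.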



Index Terms

Information systems > Information systems applications > Data mining


Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

August 2005, 844 pages

ISBN: 159593135X

DOI: 10.1145/1081870

General Chair: Robert Grossman, University of Illinois at Chicago & Open Data Partners, USA

Program Chairs: Roberto Bayardo, IBM Almaden Research, USA; Kristin Bennett, RPI, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Article

Conference: KDD05: The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2005, Chicago, Illinois, USA

Acceptance Rates: Overall acceptance rate 1,133 of 8,635 submissions, 13%


Bibliometrics

Total Citations: 26

Total Downloads: 979

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 10 Feb 2025

Cited By

Shanghooshabad A, Kurmanji M, Ma Q, Shekelyan M, Almasi M, Triantafillou P, Li G, Li Z, Idreos S, Srivastava D (2021). PGMJoins: Random Join Sampling with Graphical Models. Proceedings of the 2021 International Conference on Management of Data, pp. 1610-1622. DOI: 10.1145/3448016.3457302. Online publication date: 9-Jun-2021.
https://dl.acm.org/doi/10.1145/3448016.3457302
Bui-Thi D, Meysman P, Laukens K (2020). Clustering association rules to build beliefs and discover unexpected patterns. Applied Intelligence. DOI: 10.1007/s10489-020-01651-1. Online publication date: 20-Feb-2020.
https://doi.org/10.1007/s10489-020-01651-1
Henriques R, Madeira S (2018). BSig. Data Mining and Knowledge Discovery, 32:1, pp. 124-161. DOI: 10.1007/s10618-017-0521-2. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.1007/s10618-017-0521-2
Liu W, Yue K, Wu H, Fu X, Zhang Z, Huang W (2018). Markov-network based latent link analysis for community detection in social behavioral interactions. Applied Intelligence, 48:8, pp. 2081-2096. DOI: 10.1007/s10489-017-1040-y. Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1007/s10489-017-1040-y
Silva A, Antunes C (2016). Constrained pattern mining in the new era. Knowledge and Information Systems, 47:3, pp. 489-516. DOI: 10.1007/s10115-015-0860-5. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.1007/s10115-015-0860-5
Kirsch A, Mitzenmacher M, Pietracaprina A, Pucci G, Upfal E, Vandin F (2012). An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets. Journal of the ACM, 59:3, pp. 1-22. DOI: 10.1145/2220357.2220359. Online publication date: 1-Jun-2012.
https://dl.acm.org/doi/10.1145/2220357.2220359
Yao D, Yin M, Luo J, Zhang S (2012). Network Anomaly Detection Using Random Forests and Entropy of Traffic Features. Proceedings of the 2012 Fourth International Conference on Multimedia Information Networking and Security, pp. 926-929. DOI: 10.1109/MINES.2012.146. Online publication date: 2-Nov-2012.
https://dl.acm.org/doi/10.1109/MINES.2012.146
Kontonasios K, Spyropoulou E, De Bie T (2012). Knowledge discovery interestingness measures based on unexpectedness. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2:5, pp. 386-399. DOI: 10.1002/widm.1063. Online publication date: 1-Sep-2012.
https://dl.acm.org/doi/10.1002/widm.1063
Li D, Laurent A, Poncelet P (2011). WebUser: mining unexpected web usage. International Journal of Business Intelligence and Data Mining, 6:1, pp. 90-111. DOI: 10.1504/IJBIDM.2011.038276. Online publication date: 1-Jan-2011.
https://dl.acm.org/doi/10.1504/IJBIDM.2011.038276
Tatti N, Mampaey M (2010). Using background knowledge to rank itemsets. Data Mining and Knowledge Discovery, 21:2, pp. 293-309. DOI: 10.1007/s10618-010-0188-4. Online publication date: 1-Sep-2010.
https://dl.acm.org/doi/10.1007/s10618-010-0188-4
