DOI: 10.1145/1081870.1081887

Fast discovery of unexpected patterns in data, relative to a Bayesian network

Published: 21 August 2005

Abstract

We consider a model in which background knowledge on a given domain of interest is available in terms of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between network and database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. A sampling-based method that we introduce is efficient and yet provably finds the approximately most interesting unexpected patterns. We give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
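
The idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the two-node network, its probabilities, and the function names `network_joint`, `discrepancy`, `hoeffding_radius`, and `sampled_discrepancy` are all hypothetical, and the discrepancy measure (largest absolute gap between a database frequency and the network's probability) is an assumption chosen for illustration. The sketch estimates that gap from a random sample of rows instead of the full database and attaches a Hoeffding-style error radius, with a union bound over the four cells.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical background knowledge: a two-node network A -> B over binary
# variables. The numbers are illustrative, not taken from the paper.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
CELLS = list(product((0, 1), repeat=2))  # all value assignments (a, b)


def network_joint(a, b):
    """Joint probability P(A=a, B=b) implied by the toy network."""
    return P_A[a] * P_B_GIVEN_A[a][b]


def discrepancy(rows):
    """Largest absolute gap between row frequencies and network probabilities."""
    n = len(rows)
    counts = Counter(rows)  # missing cells count as 0
    return max(abs(counts[cell] / n - network_joint(*cell)) for cell in CELLS)


def hoeffding_radius(n, delta):
    """Half-width eps with |sample mean - true mean| <= eps w.p. >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sampled_discrepancy(database, sample_size, delta, rng):
    """Estimate the discrepancy from rows sampled with replacement.

    The radius uses delta / len(CELLS) so that, by a union bound, all cell
    frequencies are simultaneously within eps with probability >= 1 - delta.
    """
    sample = rng.choices(database, k=sample_size)
    eps = hoeffding_radius(sample_size, delta / len(CELLS))
    return discrepancy(sample), eps


if __name__ == "__main__":
    # A database in which B is independent of A, contradicting the network.
    database = [(0, 0)] * 350 + [(0, 1)] * 350 + [(1, 0)] * 150 + [(1, 1)] * 150
    gap, eps = sampled_discrepancy(database, 2000, 0.05, random.Random(0))
    print(f"estimated discrepancy {gap:.3f} +/- {eps:.3f}")
    print(f"exact discrepancy     {discrepancy(database):.3f}")
```

In this toy setting the exact gap can be computed directly, which makes it easy to compare against the sampled estimate. The radius shrinks with the square root of the sample size, so a pattern with a large gap can be recognized as interesting long before the whole database has been read.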


    Published In

    KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
    August 2005
    844 pages
    ISBN:159593135X
    DOI:10.1145/1081870

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bayesian networks
    2. association rules
    3. sampling


    Conference

    KDD05

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
