DOI: 10.1145/2487575.2487618
research-article

Summarizing probabilistic frequent patterns: a fast approach

Published: 11 August 2013

Abstract

Mining probabilistic frequent patterns from uncertain data has received a great deal of attention in recent years because of its wide applications. However, probabilistic frequent pattern mining suffers from the problem that an exponential number of result patterns is generated, which seriously hinders further evaluation and analysis. In this paper, we focus on the problem of mining probabilistic representative frequent patterns (P-RFP), the minimal set of patterns that, with adequately high probability, represent all frequent patterns. We observe that the bottleneck lies in checking whether one pattern can probabilistically represent another, which requires computing the joint probability of the supports of the two patterns, and we introduce a novel approximation of this joint probability that is supported by both theoretical analysis and empirical evidence. Based on the approximation, we propose an Approximate P-RFP Mining (APM) algorithm, which effectively and efficiently compresses the set of probabilistic frequent patterns. To our knowledge, this is the first attempt to analyze the relationship between two probabilistic frequent patterns through an approximate approach. Our experiments on both synthetic and real-world datasets demonstrate that APM accelerates P-RFP mining dramatically, running orders of magnitude faster than an exact solution. Moreover, the error rate of APM is guaranteed to be very small when the database contains hundreds of transactions, which further affirms that APM is a practical solution for summarizing probabilistic frequent patterns.
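
The abstract names the joint probability of two patterns' supports as the bottleneck of the representation check. The Python sketch below only illustrates why an approximation helps; it is not the APM algorithm. It assumes independent transactions, a pattern X contained in a pattern Y (so every transaction that contains Y also contains X), per-transaction containment probabilities `p_x[t] >= p_y[t]` supplied as input, and a hypothetical cover event sup(Y) >= (1 - delta) * sup(X), modeled on the delta-cover of Xin et al. (VLDB 2005). The names `exact_cover_prob` and `normal_cover_prob` are illustrative. The first function computes the event's probability exactly with an O(n^3) dynamic program over (sup(X), sup(Y)); the second uses a bivariate normal approximation. The paper cites the Cramér-Wold theorem and the continuity correction, which hints at a normal approximation, but the authors' exact construction may differ.

```python
import math
import random


def exact_cover_prob(p_x, p_y, delta):
    """Exact P(sup(Y) >= (1 - delta) * sup(X)) for nested patterns X <= Y.

    p_x[t], p_y[t]: probability that transaction t contains X (resp. Y);
    transactions are assumed independent and p_y[t] <= p_x[t].
    joint[a][b] holds P(sup(X) = a, sup(Y) = b) over the transactions seen
    so far. A transaction containing Y also contains X, so b <= a and each
    transaction moves the state in one of three ways. Cost: O(n^3).
    """
    n = len(p_x)
    joint = [[0.0] * (n + 1) for _ in range(n + 1)]
    joint[0][0] = 1.0
    for t, (px, py) in enumerate(zip(p_x, p_y)):
        new = [[0.0] * (n + 1) for _ in range(n + 1)]
        for a in range(t + 1):
            for b in range(a + 1):
                w = joint[a][b]
                if w == 0.0:
                    continue
                new[a][b] += w * (1.0 - px)        # contains neither X nor Y
                new[a + 1][b] += w * (px - py)     # contains X but not Y
                new[a + 1][b + 1] += w * py        # contains Y, hence X
        joint = new
    c = 1.0 - delta
    # Small tolerance so that b == c * a is not lost to rounding.
    return sum(joint[a][b]
               for a in range(n + 1)
               for b in range(a + 1)
               if b >= c * a - 1e-12)


def normal_cover_prob(p_x, p_y, delta):
    """Normal approximation of the same probability in O(n) time.

    sup(X) and sup(Y) are sums of independent per-transaction indicators,
    so their means, variances and covariance are sums of per-transaction
    terms, and Z = sup(Y) - c * sup(X) is treated as normal (no continuity
    correction, since Z is not integer-valued in general).
    """
    c = 1.0 - delta
    mean_x, mean_y = sum(p_x), sum(p_y)
    var_x = sum(p * (1.0 - p) for p in p_x)
    var_y = sum(q * (1.0 - q) for q in p_y)
    # E[I_X * I_Y] = P(Y in t) because Y in t implies X in t.
    cov_xy = sum(q * (1.0 - p) for p, q in zip(p_x, p_y))
    mean_z = mean_y - c * mean_x
    var_z = var_y + c * c * var_x - 2.0 * c * cov_xy
    if var_z <= 0.0:  # degenerate case: Z is (numerically) constant
        return 1.0 if mean_z >= 0.0 else 0.0
    # P(Z >= 0) = Phi(mean_z / sd_z), with Phi written via math.erf.
    return 0.5 * (1.0 + math.erf(mean_z / math.sqrt(2.0 * var_z)))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    p_x = [rng.uniform(0.5, 1.0) for _ in range(n)]
    # Hypothetical data: Y adds items to X, so P(Y in t) <= P(X in t).
    p_y = [p * rng.uniform(0.8, 1.0) for p in p_x]
    print("exact :", exact_cover_prob(p_x, p_y, 0.1))
    print("normal:", normal_cover_prob(p_x, p_y, 0.1))
```

The approximation needs only a handful of running sums, so each pair of patterns costs O(n) instead of O(n^3), and its error typically shrinks as the number of transactions grows, which is consistent with the abstract's remark that the error is small for databases with hundreds of transactions.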

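Once it is known which patterns each candidate can probabilistically represent, choosing a minimal set of representatives that covers all frequent patterns can be posed as a set-cover problem, and the paper cites Chvátal's greedy heuristic for set cover (1979). The Python sketch below assumes the representation relation has already been computed and is passed in as a hypothetical `covers` dict; `greedy_representatives` is an illustrative name, and whether APM selects representatives in exactly this way is an assumption.

```python
def greedy_representatives(covers, patterns):
    """Greedy set cover (Chvatal 1979) over a precomputed cover relation.

    covers: dict mapping each candidate representative to the set of
        patterns it represents (hypothetical input format).
    patterns: iterable of all patterns that must be represented.
    Returns the chosen representatives in the order they were picked.
    """
    uncovered = set(patterns)
    chosen = []
    while uncovered:
        # Pick the candidate that represents the most still-uncovered
        # patterns; ties go to the candidate that appears first in `covers`.
        best = max(covers, key=lambda r: len(covers[r] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            raise ValueError("some patterns have no representative")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Toy cover relation: a superset pattern represents some of its subsets.
    covers = {
        "a": {"a"},
        "ab": {"a", "ab"},
        "abc": {"a", "ab", "abc", "bc"},
        "b": {"b"},
        "bc": {"b", "bc"},
    }
    print(greedy_representatives(covers, covers.keys()))  # ['abc', 'b']
```

Chvátal showed that this greedy rule returns a cover at most H(d) times the optimal size, where H(d) is the d-th harmonic number and d is the largest number of patterns any single candidate covers; since computing a minimum set cover exactly is NP-hard, a greedy selection of this kind is a common practical choice.
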
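Cited By

  • (2022) Efficient Uncertain Sequence Pattern Mining Based on Hadoop Platform. Journal of Circuits, Systems and Computers 31:15. DOI: 10.1142/S0218126622502619. Online publication date: 12-Jul-2022.
  • (2022) Mining of High-Utility Sequence Patterns in Large-Scale Uncertain Databases. 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 1-7. DOI: 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927807. Online publication date: 12-Sep-2022.
  • (2022) Mining High Utility-probability Sequential Patterns in Bigdata Environments. Genetic and Evolutionary Computing, pp. 505-514. DOI: 10.1007/978-981-16-8430-2_46. Online publication date: 4-Jan-2022.
  • (2021) Mining of High-Utility Patterns in Big IoT-based Databases. Mobile Networks and Applications 26:1, pp. 216-233. DOI: 10.1007/s11036-020-01701-5. Online publication date: 4-Jan-2021.
  • (2020) Efficient weighted probabilistic frequent itemset mining in uncertain databases. Expert Systems 38:5. DOI: 10.1111/exsy.12551. Online publication date: 7-Apr-2020.
  • (2018) Analyzing Expected Support-Based Frequent Itemsets over Uncertain Data. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1721-1725. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00279. Online publication date: Jun-2018.
  • (2017) Discovering Top-k Probabilistic Frequent Itemsets from Uncertain Databases. Procedia Computer Science 122, pp. 1124-1132. DOI: 10.1016/j.procs.2017.11.482. Online publication date: 2017.
  • (2017) Efficiently mining uncertain high-utility itemsets. Soft Computing - A Fusion of Foundations, Methodologies and Applications 21:11, pp. 2801-2820. DOI: 10.1007/s00500-016-2159-1. Online publication date: 1-Jun-2017.
  • (2016) Summarizing uncertain transaction databases by Probabilistic Tiles. 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4375-4382. DOI: 10.1109/IJCNN.2016.7727771. Online publication date: Jul-2016.
  • (2016) Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowledge-Based Systems 96:C, pp. 171-187. DOI: 10.1016/j.knosys.2015.12.019. Online publication date: 15-Mar-2016.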

    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. pattern summarization
    2. uncertain data

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 paper acceptance rate: 125 of 726 submissions (17%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)
