DOI: 10.1145/2339530.2339545
research-article

Linear space direct pattern sampling using coupling from the past

Published: 12 August 2012

Abstract

This paper shows how coupling from the past (CFTP) can be used to avoid time and memory bottlenecks in direct local pattern sampling procedures. Such procedures draw controlled amounts of suitably biased samples directly from the pattern space of a given dataset in polynomial time. Previous direct pattern sampling methods can produce patterns in rapid succession after an initial preprocessing phase. This preprocessing phase, however, turns out to be prohibitive in terms of time and memory for many datasets. We show how CFTP can be used to avoid any super-linear preprocessing and memory requirements. This makes it possible to simulate more complex distributions that were previously intractable. We show for a large number of public real-world datasets that these new algorithms are fast to execute and that their pattern collections outperform previous approaches in both unsupervised and supervised contexts.
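
For readers unfamiliar with CFTP, the following is a minimal sketch of the general technique on a toy chain. It is not the pattern sampling algorithm of the paper. It assumes a reflecting random walk on the states 0..n with a monotone update rule, so only the lowest and highest states need to be tracked. The names `update` and `cftp_sample` are hypothetical and chosen for illustration only. For p = 0.5 the transition matrix of this walk is symmetric, so its stationary distribution is uniform on 0..n.

```python
import random


def update(state, u, n, p=0.5):
    """Monotone update rule: step up if u < p, else step down, clamped to [0, n]."""
    if u < p:
        return min(state + 1, n)
    return max(state - 1, 0)


def cftp_sample(n, rng, p=0.5):
    """Draw one exact sample from the stationary distribution via monotone CFTP.

    Illustrative toy example; the chain and its update rule are assumptions,
    not taken from the paper.
    """
    # randoms[k] drives the transition from time -(k + 1) to time -k.
    # Values drawn once are reused when the start time moves further back.
    randoms = []
    horizon = 1
    while True:
        # Extend the randomness into the past without touching earlier draws.
        while len(randoms) < horizon:
            randoms.append(rng.random())
        low, high = 0, n  # bottom and top states bound all other start states
        # Simulate from time -horizon up to time 0 with shared randomness.
        for k in range(horizon - 1, -1, -1):
            u = randoms[k]
            low = update(low, u, n, p)
            high = update(high, u, n, p)
        if low == high:
            # All start states have coalesced, so the value at time 0 is exact.
            return low
        horizon *= 2  # not coalesced yet: start further in the past


rng = random.Random(0)
print([cftp_sample(4, rng) for _ in range(5)])
```

The key design point is that the random numbers for times close to 0 are kept fixed while the start time is pushed back. Redrawing them on every attempt would bias the result.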

Supplementary Material

JPG File (307_m_talk_3.jpg)
MP4 File (307_m_talk_3.mp4)

    Published In

    KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2012
    1616 pages
    ISBN:9781450314626
    DOI:10.1145/2339530

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cftp
    2. frequent sets
    3. local patterns
    4. sampling

    Conference

    KDD '12

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Advances in materials informatics: a review. Journal of Materials Science 59:7 (2602-2643). DOI: 10.1007/s10853-024-09379-w. Online publication date: 8-Feb-2024
    • (2024) Mining diverse sets of patterns with constraint programming using the pairwise Jaccard similarity relaxation. Constraints 29:1-2 (80-111). DOI: 10.1007/s10601-024-09373-8. Online publication date: 1-Jun-2024
    • (2023) Concise and interpretable multi-label rule sets. Knowledge and Information Systems 65:12 (5657-5694). DOI: 10.1007/s10115-023-01930-6. Online publication date: 28-Jul-2023
    • (2022) Concise and interpretable multi-label rule sets. 2022 IEEE International Conference on Data Mining (ICDM) (71-80). DOI: 10.1109/ICDM54844.2022.00017. Online publication date: Nov-2022
    • (2022) Generic Itemset Mining Based on Reinforcement Learning. IEEE Access 10 (5824-5841). DOI: 10.1109/ACCESS.2022.3141806. Online publication date: 2022
    • (2022) High Average-Utility Itemset Sampling Under Length Constraints. Advances in Knowledge Discovery and Data Mining (134-148). DOI: 10.1007/978-3-031-05936-0_11. Online publication date: 16-May-2022
    • (2021) Learning interpretable decision rule sets. Proceedings of the 35th International Conference on Neural Information Processing Systems (27890-27902). DOI: 10.5555/3540261.3542397. Online publication date: 6-Dec-2021
    • (2020) Gibbs Sampling Subjectively Interesting Tiles. Advances in Intelligent Data Analysis XVIII (80-92). DOI: 10.1007/978-3-030-44584-3_7. Online publication date: 22-Apr-2020
    • (2019) Accelerating Itemset Sampling using Satisfiability Constraints on FPGA. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1046-1051). DOI: 10.23919/DATE.2019.8714932. Online publication date: Mar-2019
    • (2019) Rank correlated subgroup discovery. Journal of Intelligent Information Systems 53:2 (305-328). DOI: 10.1007/s10844-019-00555-y. Online publication date: 1-Oct-2019