DOI: 10.1145/2396761.2396777
research-article

Interactive pattern mining on hidden data: a sampling-based solution

Published: 29 October 2012

Abstract

Mining frequent patterns from a hidden dataset is an important task with various real-life applications. In this research, we propose a solution to this problem that is based on Markov Chain Monte Carlo (MCMC) sampling of frequent patterns. Instead of returning all the frequent patterns, the proposed paradigm returns a small set of randomly selected patterns so that the confidentiality of the dataset can be maintained. Our solution also allows interactive sampling, so that the sampled patterns can fulfill the user's requirements effectively. We show experimental results from several real-life datasets to validate the capability and usefulness of our solution. In particular, we show by example that, using our proposed solution, an eCommerce marketplace can allow pattern mining on user session data without disclosing the data to the public; such a mining paradigm helps the sellers of the marketplace, which eventually boosts the marketplace's own revenue.
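
As a rough illustration of the general idea of MCMC sampling over frequent patterns, the sketch below runs a Metropolis-Hastings random walk over frequent itemsets and returns a single sampled pattern rather than the full pattern set or the underlying transactions. It is not the authors' algorithm: the function names, the itemset setting, the uniform target distribution, and the inclusion of the empty itemset as a starting state are all assumptions made for illustration, and the interactive feedback described in the abstract is not modeled.

```python
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def neighbors(itemset, items, transactions, min_support):
    """Frequent itemsets that differ from `itemset` by exactly one item."""
    # Removing an item always yields a frequent itemset (downward closure).
    result = [itemset - {i} for i in itemset]
    # Adding an item is allowed only if the superset stays frequent.
    for i in items - itemset:
        candidate = itemset | {i}
        if support(candidate, transactions) >= min_support:
            result.append(candidate)
    return result


def sample_frequent_itemset(transactions, min_support, steps=200, rng=None):
    """Hypothetical helper: Metropolis-Hastings walk with a uniform target.

    The walk starts at the empty itemset (assumed to be a valid state).
    A move from x to y is accepted with probability min(1, deg(x) / deg(y)),
    which corrects for states that have many neighbors.
    """
    rng = rng or random.Random()
    items = frozenset().union(*transactions)
    current = frozenset()
    current_nbrs = neighbors(current, items, transactions, min_support)
    for _ in range(steps):
        if not current_nbrs:  # no frequent single items: nothing to explore
            break
        proposal = rng.choice(current_nbrs)
        proposal_nbrs = neighbors(proposal, items, transactions, min_support)
        accept = min(1.0, len(current_nbrs) / len(proposal_nbrs))
        if rng.random() < accept:
            current, current_nbrs = proposal, proposal_nbrs
    return current


if __name__ == "__main__":
    # Toy "hidden" data: only the sampled pattern is ever printed.
    sessions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
    print(sorted(sample_frequent_itemset(sessions, min_support=2)))
```

In this sketch the data owner would run the walk privately and expose only the returned pattern. The number of steps needed for the walk to approach its target distribution depends on the data and is not addressed here.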

    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN: 9781450311564
    DOI: 10.1145/2396761

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. MCMC sampling
    2. interactive pattern mining

    Qualifiers

    • Research-article

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 14 Feb 2025.

    Cited By

    • (2022) Knowledge-Based Interactive Postmining of User-Preferred Co-Location Patterns Using Ontologies. IEEE Transactions on Cybernetics 52:9 (9467-9480). DOI: 10.1109/TCYB.2021.3054923. Online publication date: Sep-2022
    • (2022) Pattern on demand in transactional distributed databases. Information Systems 104:C. DOI: 10.1016/j.is.2021.101908. Online publication date: 1-Feb-2022
    • (2022) Pattern Mining: Current Challenges and Opportunities. Database Systems for Advanced Applications. DASFAA 2022 International Workshops (34-49). DOI: 10.1007/978-3-031-11217-1_3. Online publication date: 16-Jul-2022
    • (2020) User Group Analytics Survey and Research Opportunities. IEEE Transactions on Knowledge and Data Engineering 32:10 (2040-2059). DOI: 10.1109/TKDE.2019.2913651. Online publication date: 1-Oct-2020
    • (2020) Cohort analytics: efficiency and applicability. The VLDB Journal. DOI: 10.1007/s00778-020-00625-6. Online publication date: 27-Aug-2020
    • (2019) Data Pipelines for User Group Analytics. Proceedings of the 2019 International Conference on Management of Data (2048-2053). DOI: 10.1145/3299869.3314028. Online publication date: 25-Jun-2019
    • (2019) Sequential pattern sampling with norm-based utility. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01417-3. Online publication date: 26-Oct-2019
    • (2018) Sequential Pattern Sampling with Norm Constraints. 2018 IEEE International Conference on Data Mining (ICDM) (89-98). DOI: 10.1109/ICDM.2018.00024. Online publication date: Nov-2018
    • (2018) Cohort Representation and Exploration. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (169-178). DOI: 10.1109/DSAA.2018.00027. Online publication date: Oct-2018
    • (2017) GeoGuide: An Interactive Guidance Approach for Spatial Data. 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) (1112-1117). DOI: 10.1109/iThings-GreenCom-CPSCom-SmartData.2017.170. Online publication date: Jun-2017
