Summarizing itemset patterns: a profile-based approach

Authors:

Dong Xin

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 314 - 323

https://doi.org/10.1145/1081870.1081907

Published: 21 August 2005

Abstract

Frequent-pattern mining has been studied extensively, with much of the work devoted to scalable methods for mining various kinds of patterns, including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not efficiency but interpretability, because the mining process generates a huge number of patterns.

In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a number of patterns small enough for a user to handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of the parameter K. Polynomial-time algorithms are developed, together with several optimization heuristics that improve efficiency. Empirical studies indicate that we can obtain compact summarizations on real datasets.
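The claim that supports can be recovered from a profile alone, without the original dataset, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: that a profile stores per-item probabilities, a master pattern, and a relative support, and that items are treated as independent within a profile. The names `build_profile`, `estimate_support`, and `restoration_error` are hypothetical, and the mean relative error used here is one plausible reading of "restoration error", not necessarily the paper's exact definition.

```python
from math import prod


def build_profile(covering, n_total):
    """Summarize a group of transactions as a profile (assumed form).

    covering: list of item sets, the transactions assigned to this profile.
    n_total:  number of transactions in the whole dataset.
    Returns (item probabilities, master pattern, relative support).
    """
    if not covering:
        raise ValueError("a profile needs at least one transaction")
    n = len(covering)
    master = frozenset().union(*covering)  # assumed: union of all items seen
    probs = {item: sum(item in t for t in covering) / n for item in master}
    return probs, master, n / n_total


def estimate_support(pattern, profile):
    """Estimate a pattern's support from the profile only.

    Assumes items occur independently within the profile, so the estimate is
    the profile's relative support times the product of item probabilities.
    """
    probs, master, rel_support = profile
    if not set(pattern) <= master:
        return 0.0  # the profile says nothing about items outside its master pattern
    return rel_support * prod(probs[item] for item in pattern)


def restoration_error(patterns_with_support, profile):
    """Mean relative error between true and estimated supports (assumed measure)."""
    errors = [
        abs(true - estimate_support(pattern, profile)) / true
        for pattern, true in patterns_with_support
    ]
    return sum(errors) / len(errors)


# Toy usage: 4 of 10 transactions fall under this profile.
profile = build_profile([{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 10)
print(estimate_support({"a", "b"}, profile))
print(restoration_error([({"a", "b"}, 0.3), ({"a", "b", "c"}, 0.2)], profile))
```

In this toy case the estimate for {a, b} matches its true support, while {a, b, c} is slightly overestimated because b and c are not actually independent. An error measure of this kind is the sort of signal the abstract proposes to use when choosing K.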

Index Terms

Summarizing itemset patterns: a profile-based approach
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

August 2005

844 pages

ISBN:159593135X

DOI:10.1145/1081870

General Chair: Robert Grossman (University of Illinois at Chicago & Open Data Partners, USA)

Program Chairs: Roberto Bayardo (IBM Almaden Research, USA), Kristin Bennett (RPI, USA)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

KDD05

KDD05: The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2005

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
