research-article

Effective and efficient itemset pattern summarization: regression-based approaches

Authors:
Ruoming Jin, Kent State University, Kent, OH, USA
Muad Abu-Ata, Kent State University, Kent, OH, USA
Yang Xiang, Kent State University, Kent, OH, USA
Ning Ruan, Kent State University, Kent, OH, USA

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2008, pages 399–407, https://doi.org/10.1145/1401890.1401941

Published: 24 August 2008

ABSTRACT

In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a non-linear regression problem. We further show that, under certain conditions, this non-linear regression problem can be transformed into a linear regression problem. We propose two new methods, K-regression and tree-regression, which partition the entire collection of frequent itemsets so as to minimize the restoration error. The K-regression approach, which employs a K-means-style clustering method, is guaranteed to converge to a local minimum of the total restoration error. The tree-regression approach employs a decision-tree-style top-down partitioning process. In addition, we discuss alternative ways to estimate the frequencies of the itemsets covered by the K representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve summarization performance in terms of both accuracy (restoration error) and computational cost.
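The abstract does not state the probabilistic model, so what follows is a minimal sketch under explicit assumptions, not the authors' implementation. It assumes an independence-style model in which every itemset I in a cluster has frequency f(I) ≈ c · ∏_{i∈I} p_i, with a per-cluster constant c and per-item probabilities p_i. Taking logarithms gives log f(I) ≈ log c + Σ_{i∈I} log p_i, which is linear in the 0/1 indicator vector of I, so each cluster can be fitted by ordinary least squares on log-frequencies (one way a non-linear fit becomes a linear one; whether this matches the paper's "certain conditions" is an assumption). K-regression is then sketched as a K-means-style alternation: fit one log-linear model per cluster, then reassign each itemset to the model that restores its log-frequency with the smallest squared error. The function names (`fit_log_linear`, `k_regression`, `restore`), the random initial partition, the fallback for empty clusters, and the use of NumPy are hypothetical choices.

```python
import numpy as np


def fit_log_linear(X, y):
    """Least-squares fit of y ~ b0 + X @ w; returns [b0, w_1, ..., w_n]."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column (log c)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_log(coef, X):
    """Predicted log-frequency for each row (itemset indicator vector) of X."""
    return coef[0] + X @ coef[1:]


def k_regression(X, freqs, k, n_iter=100, seed=0):
    """Hypothetical K-means-style K-regression sketch (not the authors' code).

    X     : (m, n) 0/1 array; row j is the indicator vector of itemset j.
    freqs : (m,) positive frequencies of the itemsets.
    Returns (labels, coefs, total squared error in log space).
    """
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(freqs, dtype=float))
    m = X.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=m)  # random initial partition (assumption)
    for _ in range(n_iter):
        # Fit step: one log-linear model per cluster; an empty cluster
        # falls back to a model fitted on all itemsets.
        coefs = []
        for c in range(k):
            mask = labels == c
            Xc, yc = (X[mask], y[mask]) if mask.any() else (X, y)
            coefs.append(fit_log_linear(Xc, yc))
        # Assignment step: move each itemset to the model that restores
        # its log-frequency with the smallest squared error.
        errors = np.stack([(y - predict_log(cf, X)) ** 2 for cf in coefs], axis=1)
        new_labels = errors.argmin(axis=1)
        total = float(errors.min(axis=1).sum())
        if np.array_equal(new_labels, labels):
            break  # assignment unchanged: a local minimum has been reached
        labels = new_labels
    return labels, coefs, total


def restore(coefs, labels, X):
    """Estimated frequency exp(b0 + x @ w) of each itemset under its cluster model."""
    X = np.asarray(X, dtype=float)
    return np.array([np.exp(coefs[c][0] + x @ coefs[c][1:]) for c, x in zip(labels, X)])


if __name__ == "__main__":
    # Toy data: all 7 non-empty itemsets over 3 items; the frequencies are
    # made up, generated from c * p0^x0 * p1^x1 * p2^x2 plus log-normal noise.
    X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
                  [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
    rng = np.random.default_rng(1)
    f = np.exp(np.log(0.9) + X @ np.log([0.5, 0.4, 0.8]) + rng.normal(0, 0.3, 7))
    labels, coefs, err = k_regression(X, f, k=2)
    print("cluster labels:", labels, "total squared log-error:", err)
    print("restored frequencies:", restore(coefs, labels, X))
```

In this sketch neither step can increase the total squared log-error: refitting minimizes it for a fixed assignment, and reassignment minimizes it for fixed models, so the loop stops at a local minimum in the same sense as the guarantee the abstract states for K-regression. The least-squares fit is unconstrained, so the fitted p_i are not forced into (0, 1].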

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN: 9781605581934
DOI: 10.1145/1401890
General Chair: Ying Li, Microsoft adCenter Labs
Program Chairs: Bing Liu, University of Illinois at Chicago; Sunita Sarawagi, Indian Institute of Technology, Bombay
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
frequency restoration
pattern summarization
regression

Acceptance Rates
KDD '08 paper acceptance rate: 118 of 593 submissions, 20%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%

Article Metrics
- Total citations: 26
- Total downloads: 627
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 0