The Research of Sampling for Mining Frequent Itemsets

Hu, Xuegang; Yu, Haitao

doi:10.1007/11795131_72

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4062)

Included in the following conference series:

International Conference on Rough Sets and Knowledge Technology

Abstract

Efficient mining of frequent itemsets is the key step in extracting association rules from large-scale databases. Taking into account the min_support constraint in association rule mining, this paper proposes a weighted sampling algorithm for mining frequent itemsets. First, a weight is assigned to each transaction. Then, given the statistically optimal sample size for the database, a sample is drawn according to these weights. Under this algorithm, the sample contains a large number of transactions that include frequent itemsets with many items, so the frequent itemsets mined from the sample are similar to those obtained from the original data. Furthermore, the algorithm can reduce the sample size while maintaining sample quality. Experiments verify its validity.
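
The abstract does not give the weight formula or the sample-size rule, so the following Python fragment is only a rough sketch of the general idea of weight-based transaction sampling, not the authors' algorithm. It assumes a hypothetical weight (one plus the number of items in the transaction whose individual support meets min_support), takes the sample size as a caller-supplied parameter, and draws the sample without replacement using random keys of the form u^(1/w). All function names here are illustrative.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Return the fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    total = len(transactions)
    return {item: c / total for item, c in counts.items()}


def transaction_weights(transactions, min_support):
    """Hypothetical weighting (an assumption, not taken from the paper):
    1 + the number of individually frequent items in the transaction."""
    supports = item_supports(transactions)
    frequent = {item for item, s in supports.items() if s >= min_support}
    return [1 + len(frequent & set(t)) for t in transactions]


def weighted_sample(transactions, weights, sample_size, seed=None):
    """Draw sample_size transactions without replacement, favouring
    larger weights: each transaction gets the key u ** (1 / w) with u
    uniform in [0, 1), and the largest keys are kept."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), idx) for idx, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [transactions[idx] for _, idx in keyed[:sample_size]]


# Toy data: "a" and "b" are frequent at min_support = 0.5.
transactions = [["a", "b", "c"], ["a", "b"], ["a"], ["d"]]
weights = transaction_weights(transactions, min_support=0.5)
sample = weighted_sample(transactions, weights, sample_size=2, seed=0)
```

With this weighting, transactions that carry more frequent items are more likely to enter the sample, which is the property the abstract attributes to the proposed method. A frequent-itemset miner would then be run on `sample` in place of the full database.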

Author information

Authors and Affiliations

Department of Computer and Information Technology, Hefei University of Technology, Hefei, 230009
Xuegang Hu & Haitao Yu

Editor information

Editors and Affiliations

College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, 400065, Chongqing, P.R. China
Guo-Ying Wang
Department of Electrical and Computer Engineering, University of Manitoba, R3T 5V6, Winnipeg, Manitoba, Canada
James F. Peters
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron
Department of Computer Science, University of Regina, Regina, S4S 0A2, Saskatchewan, Canada
Yiyu Yao

About this paper

Cite this paper

Hu, X., Yu, H. (2006). The Research of Sampling for Mining Frequent Itemsets. In: Wang, GY., Peters, J.F., Skowron, A., Yao, Y. (eds) Rough Sets and Knowledge Technology. RSKT 2006. Lecture Notes in Computer Science, vol 4062. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11795131_72

DOI: https://doi.org/10.1007/11795131_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36297-5
Online ISBN: 978-3-540-36299-9
eBook Packages: Computer Science, Computer Science (R0)
