The Research of Sampling for Mining Frequent Itemsets

  • Conference paper
Rough Sets and Knowledge Technology (RSKT 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4062)

Abstract

Efficiently mining frequent itemsets is the key step in extracting association rules from large-scale databases. Taking the min_support constraint of association rule mining into account, this paper proposes a weighted sampling algorithm for mining frequent itemsets. First, each transaction is assigned a weight. Then a sample of the statistically optimal size for the database is drawn according to those weights. As a result, the sample contains a large number of transactions made up of frequent itemsets with many items, so the frequent itemsets mined from the sample are similar to those mined from the original data. The algorithm can therefore shrink the sample size while still guaranteeing sample quality. Experiments verify its validity.
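
The two steps named in the abstract (weight each transaction, then draw a weighted sample of a statistically chosen size) can be illustrated with a minimal Python sketch. The abstract specifies neither the weight function nor the sample-size formula, so everything concrete here is an assumption for illustration: the weight is taken to be the number of items in a transaction, the sample size comes from the standard Hoeffding bound, the draw uses Efraimidis-Spirakis keys, and the function names `hoeffding_sample_size`, `weighted_sample`, and `frequent_itemsets` are hypothetical. It is not the authors' algorithm.

```python
import math
import random
from collections import Counter
from itertools import combinations


def hoeffding_sample_size(epsilon, delta):
    """Size at which one itemset's sampled support is within epsilon of its
    true support with probability >= 1 - delta under uniform sampling.
    ASSUMPTION: a stand-in for the paper's 'statistical optimal sample size'."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def weighted_sample(transactions, weights, k, rng):
    """Draw k transactions without replacement, favouring large weights.
    Uses Efraimidis-Spirakis keys u ** (1 / w); every weight must be > 0."""
    keyed = [(rng.random() ** (1.0 / w), t) for t, w in zip(transactions, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in keyed[:k]]


def frequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force count of all itemsets up to max_len items; returns those
    whose relative support reaches min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, max_len + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    rng = random.Random(0)
    items = list("abcdefgh")
    # Synthetic, non-empty transactions; purely illustrative data.
    database = [frozenset(rng.sample(items, rng.randint(1, 5))) for _ in range(5000)]
    # ASSUMPTION: weight = transaction length, so long transactions are favoured.
    weights = [len(t) for t in database]
    k = min(len(database), hoeffding_sample_size(epsilon=0.05, delta=0.05))
    sample = weighted_sample(database, weights, k, rng)
    full = frequent_itemsets(database, min_support=0.2)
    approx = frequent_itemsets(sample, min_support=0.2)
    print(len(sample), len(full & approx), len(full | approx))
```

Because long transactions are over-represented, raw supports measured on such a sample are biased upward relative to the full database. The abstract does not say how the paper accounts for this, so the sketch simply compares the two result sets without any correction.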


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, X., Yu, H. (2006). The Research of Sampling for Mining Frequent Itemsets. In: Wang, GY., Peters, J.F., Skowron, A., Yao, Y. (eds) Rough Sets and Knowledge Technology. RSKT 2006. Lecture Notes in Computer Science, vol 4062. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11795131_72

  • DOI: https://doi.org/10.1007/11795131_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36297-5

  • Online ISBN: 978-3-540-36299-9

  • eBook Packages: Computer Science, Computer Science (R0)
