A Sampling-Based Method for Mining Frequent Patterns from Databases

Chen, Yen-Liang; Ho, Chin-Yuan

doi:10.1007/11540007_65

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

Mining frequent item sets (frequent patterns) in transaction databases is a well-known problem in data mining research. This work proposes a sampling-based method for finding frequent patterns. The proposed method consists of three phases. In the first phase, we draw a small sample of the data to estimate the set of frequent patterns, denoted F^S. The second phase computes the actual supports of the patterns in F^S and identifies a subset of patterns in F^S that needs to be examined further in the next phase. Finally, the third phase explores this subset and finds all missing frequent patterns. The empirical results show that our algorithm is efficient, about two or three times faster than the well-known FP-growth algorithm.
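
The three-phase structure described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how the sample is mined, how the subset for the third phase is chosen, or how that subset is explored. The sketch assumes a plain level-wise (Apriori-style) search, a lowered threshold on the sample, and a third phase that reuses the verified patterns as a seed so that only unverified candidates are counted. The names `support_counts`, `levelwise`, `mine_by_sampling`, and the `slack` parameter are all hypothetical.

```python
import random
from itertools import combinations


def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def levelwise(transactions, min_count, known=()):
    """Level-wise search for all itemsets with support count >= min_count.

    `known` holds itemsets already verified as frequent; they are trusted
    and never recounted. With `known` empty this is a plain Apriori search.
    """
    frequent = set(known)
    level = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while level:
        # Only candidates that are not already verified need a count.
        counts = support_counts(transactions, level - frequent)
        frequent |= {c for c, n in counts.items() if n >= min_count}
        level_frequent = [c for c in level if c in frequent]
        k += 1
        # Join frequent (k-1)-itemsets, then prune by downward closure.
        unions = {a | b for a in level_frequent for b in level_frequent
                  if len(a | b) == k}
        level = {c for c in unions
                 if all(frozenset(s) in frequent
                        for s in combinations(c, k - 1))}
    return frequent


def mine_by_sampling(db, min_support, sample_size, slack=0.8, rng=None):
    """Three-phase sketch; `slack` (assumed) lowers the sample threshold."""
    rng = rng or random.Random(0)
    db = [frozenset(t) for t in db]
    min_count = min_support * len(db)

    # Phase 1: estimate the frequent patterns F^S from a small sample.
    sample = rng.sample(db, min(sample_size, len(db)))
    f_s = levelwise(sample, slack * min_support * len(sample))

    # Phase 2: compute the actual supports of F^S on the full database.
    actual = support_counts(db, f_s)
    verified = {c for c, n in actual.items() if n >= min_count}

    # Phase 3 (assumed strategy): search again, seeded with the verified
    # patterns, so only patterns missing from F^S are counted.
    return levelwise(db, min_count, known=verified)
```

Because the third phase in this sketch is a complete level-wise search, the result does not depend on which transactions land in the sample; a good sample only reduces how many candidates must still be counted against the full database.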

Author information

Authors and Affiliations

Dept. of Information Management, National Central Univ, Chung-Li, 320, Taiwan
Yen-Liang Chen & Chin-Yuan Ho

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin

About this paper

Cite this paper

Chen, YL., Ho, CY. (2005). A Sampling-Based Method for Mining Frequent Patterns from Databases. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_65

DOI: https://doi.org/10.1007/11540007_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)
