SQL Based Frequent Pattern Mining with FP-Growth

Shang, Xuequn; Sattler, Kai-Uwe; Geist, Ingolf

doi:10.1007/11415763_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3392)

Abstract

Scalable data mining in large databases is one of today’s real challenges for the database research area. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we report a performance evaluation on a DBMS, IBM DB2 UDB EEE V8.
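
As a rough illustration of what SQL-based frequent pattern mining can look like, the sketch below runs the first step that FP-growth needs, namely counting item supports and reordering each transaction's frequent items by descending support, as plain SQL queries. It is not taken from the paper: the table `transactions(tid, item)`, the toy rows, the threshold `MIN_SUPPORT`, and both helper functions are hypothetical, and Python's built-in `sqlite3` module stands in for the DB2 system named in the abstract. It also assumes each (tid, item) pair is stored at most once.

```python
import sqlite3

# Hypothetical toy data: one row per (transaction id, item) pair.
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "d"),
    (4, "b"), (4, "c"),
]
MIN_SUPPORT = 2  # assumed absolute support threshold


def frequent_items(conn, min_support):
    """Return [(item, support)] for items found in at least min_support transactions."""
    return conn.execute(
        """
        SELECT item, COUNT(DISTINCT tid) AS support
        FROM transactions
        GROUP BY item
        HAVING COUNT(DISTINCT tid) >= ?
        ORDER BY support DESC, item
        """,
        (min_support,),
    ).fetchall()


def ordered_transactions(conn, min_support):
    """Map tid -> its frequent items, sorted by descending support (ties by item name)."""
    rows = conn.execute(
        """
        SELECT t.tid, t.item
        FROM transactions AS t
        JOIN (
            SELECT item, COUNT(DISTINCT tid) AS support
            FROM transactions
            GROUP BY item
            HAVING COUNT(DISTINCT tid) >= ?
        ) AS f ON f.item = t.item
        ORDER BY t.tid, f.support DESC, t.item
        """,
        (min_support,),
    ).fetchall()
    result = {}
    for tid, item in rows:
        result.setdefault(tid, []).append(item)
    return result


# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", ROWS)

print(frequent_items(conn, MIN_SUPPORT))
print(ordered_transactions(conn, MIN_SUPPORT))
```

Unlike an Apriori-style approach, nothing here enumerates candidate itemsets: infrequent items are simply filtered out, and the reordered transactions are the form in which FP-growth would insert paths into its prefix tree. How the tree itself is represented in tables is left open in this sketch.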

Author information

Authors and Affiliations

Department of Computer Science, University of Magdeburg, P.O.BOX 4120, 39106, Magdeburg, Germany
Xuequn Shang, Kai-Uwe Sattler & Ingolf Geist

Editor information

Editors and Affiliations

Universität Würzburg, Am Hubland, 97074, Würzburg, Germany
Dietmar Seipel
Institut für Informatik, CAU Kiel, Germany
Michael Hanus
Fraunhofer FIRST, Berlin
Ulrich Geske
IF Computer Japan, 5-28-2 Sendagi, Bunkyo-ku, 113-0022, Tokyo, Japan
Oskar Bartenstein

About this paper

Cite this paper

Shang, X., Sattler, K.-U., Geist, I. (2005). SQL Based Frequent Pattern Mining with FP-Growth. In: Seipel, D., Hanus, M., Geske, U., Bartenstein, O. (eds) Applications of Declarative Programming and Knowledge Management. INAP 2004, WLP 2004. Lecture Notes in Computer Science, vol 3392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415763_3

DOI: https://doi.org/10.1007/11415763_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25560-4
Online ISBN: 978-3-540-32124-8
eBook Packages: Computer Science, Computer Science (R0)
