Abstract
Scalable data mining in large databases is one of today’s real challenges in database research. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental task in data mining is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. In this study we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have carried out a performance evaluation on a DBMS, IBM DB2 UDB EEE V8.
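To make the contrast in the abstract concrete, the following minimal sketch shows what SQL-based support counting can look like. It is not the authors' FP-growth formulation. It covers only the frequent-item counting step and an Apriori-like self-join for candidate pairs, which is the kind of join whose cost grows with prolific or long patterns. The table name `T`, the columns `tid` and `item`, the helper names, and the use of SQLite instead of DB2 are all assumptions made for illustration.

```python
import sqlite3


def _load(rows):
    """Load (tid, item) rows into a hypothetical in-memory table T."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    return conn


def frequent_items(rows, min_support):
    """Return (item, support) for items in at least min_support transactions."""
    conn = _load(rows)
    result = conn.execute(
        "SELECT item, COUNT(DISTINCT tid) AS support FROM T "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ? "
        "ORDER BY support DESC, item",
        (min_support,),
    ).fetchall()
    conn.close()
    return result


def frequent_pairs(rows, min_support):
    """Apriori-like step: build candidate pairs with a self-join, then count."""
    conn = _load(rows)
    result = conn.execute(
        "SELECT a.item, b.item, COUNT(DISTINCT a.tid) AS support "
        "FROM T a JOIN T b ON a.tid = b.tid AND a.item < b.item "
        "GROUP BY a.item, b.item HAVING COUNT(DISTINCT a.tid) >= ? "
        "ORDER BY a.item, b.item",
        (min_support,),
    ).fetchall()
    conn.close()
    return result


# Four toy transactions: {a,b,c}, {a,b}, {a,c}, {b,d}
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "c"),
    (4, "b"), (4, "d"),
]

if __name__ == "__main__":
    print(frequent_items(ROWS, 2))
    print(frequent_pairs(ROWS, 2))
```

Each additional pattern length in this style needs another self-join over the transaction table, which is the overhead that a candidate-free method such as FP-growth is meant to avoid.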
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shang, X., Sattler, KU., Geist, I. (2005). SQL Based Frequent Pattern Mining with FP-Growth. In: Seipel, D., Hanus, M., Geske, U., Bartenstein, O. (eds) Applications of Declarative Programming and Knowledge Management. INAP 2004, WLP 2004. Lecture Notes in Computer Science, vol 3392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415763_3
DOI: https://doi.org/10.1007/11415763_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25560-4
Online ISBN: 978-3-540-32124-8
eBook Packages: Computer Science, Computer Science (R0)