Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications

Data Mining and Knowledge Discovery

Abstract

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which performance-wise is a factor of 3 to 4 worse than Cache. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, development ease, portability, and interoperability. As a byproduct of this study, we identify some primitives for native support in database systems for decision-support applications.
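
To make the idea of expressing a mining step as SQL queries concrete, here is a minimal sketch of one support-counting pass written in plain SQL and run through Python's standard `sqlite3` module. It is an illustration under stated assumptions, not any of the SQL-92 or SQL-OR formulations evaluated in the article: the table names `sales` and `f1`, the function `frequent_pairs`, and the choice of SQLite are all hypothetical. The sketch first finds items that meet a minimum support, then counts pairs of those items by self-joining the transaction table.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for pairs found in >= min_support transactions.

    Hypothetical helper: `transactions` is a list of item lists, one per transaction.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # One row per (transaction id, item); duplicates within a transaction are dropped.
        conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
        conn.execute("CREATE TABLE f1 (item TEXT)")
        rows = [
            (tid, item)
            for tid, items in enumerate(transactions)
            for item in set(items)
        ]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

        # Pass 1: frequent single items.
        conn.execute(
            "INSERT INTO f1 "
            "SELECT item FROM sales GROUP BY item HAVING COUNT(*) >= ?",
            (min_support,),
        )

        # Pass 2: candidate pairs come from a self-join on tid, restricted to
        # frequent items; a.item < b.item keeps each pair once, in sorted order.
        cursor = conn.execute(
            """
            SELECT a.item, b.item, COUNT(*)
            FROM sales a
            JOIN sales b ON a.tid = b.tid AND a.item < b.item
            JOIN f1 fa ON fa.item = a.item
            JOIN f1 fb ON fb.item = b.item
            GROUP BY a.item, b.item
            HAVING COUNT(*) >= ?
            """,
            (min_support,),
        )
        return {(a, b): support for a, b, support in cursor}
    finally:
        conn.close()
```

Calling `frequent_pairs` with a handful of baskets and a small `min_support` returns a dictionary keyed by sorted item pairs. Longer itemsets would need one more join per additional item, which hints at why join-heavy formulations can become expensive on large tables.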

Cite this article

Sarawagi, S., Thomas, S. & Agrawal, R. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Data Mining and Knowledge Discovery 4, 89–125 (2000). https://doi.org/10.1023/A:1009887712954
