Skip to main content
Log in

A data mining proxy approach for efficient frequent itemset mining

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper, we present a data mining proxy approach that can reduce the I/O costs to construct an initial tree by utilizing the trees that have already been resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns, because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicated that the data mining proxy significantly reduces the I/O cost to construct trees and CPU cost to mine patterns over the trees constructed.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: Depth first generation of long patterns. In: Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001)

  2. Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: A tree projection algorithm for generation of frequent item sets. J. Parallel Distrib. Comput. 61 (2001)

  3. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD Conference (1993)

  4. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (1994)

  5. Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD Conference (1998)

  6. Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM SIGMOD Conference (1997)

  7. Bucila, C., Gehrke, J., Kifer, D., White, W.M.: Dualminer: a dual-pruning algorithm for itemsets with constraints. In: Proceedings of the 8th ACM SIGKDD Conference (2002)

  8. Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of 2001 International Conference on Data Engineering (2001)

  9. Cong, G., Liu, B.: Speed-up iterative frequent itemset mining with constraint changes. In: Proceedings of the 2002 International Conference on Data Mining (2002)

  10. Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent itemsets. In: Proceedings of the 2001 IEEE International Conference on Data Mining (2001)

  11. Grahne, G., Lakshmanan, L.V.S., Wang, X.: Efficient mining of constrained correlated sets. In: Proceedings of the 16th IEEE International Conference on Data Engineering (2000)

  12. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD Conference (2000)

  13. Han, J., Wang, J., Lu, Y., Tzvethov, P.: Mining top-k frequent closed patterns without minimum. In: Proceedings of the 2002 IEEE International Conference on Data Mining (2002)

  14. Lakshmanan, L.V.S., Leung, C.K.S., Ng, R.T.: Efficient dynamic mining of constrained frequent sets. ACM Trans. Database Systems (TODS) 28(4), (2003)

  15. Lakshmanan, L.V.S., Ng, R.T., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proceedings of the 1999 ACM SIGMOD Conference (1999)

  16. Lan, B., Ooi, B.C., Tan, K.L.: Efficient indexing structure for mining frequent patterns. In: Proceedings of the 18th IEEE International Conference on Data Engineering (1994)

  17. Leung, C.S., Ng, R.T., Mannila, H.: A segmentation approach to optimize frequency counting. In: Proceedings of the 18th IEEE International Conference on Data Engineering (2002)

  18. Li, Z., Yu, J.X., Lu, H., Xu, Y., Liu, G.: Data mining proxy: serving large number of users for efficient frequent itemset mining. In: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2004)

  19. Lin, D.I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Proceedings of 6th International Conference on Extending Database Technology, EDBT (1998)

  20. Lin, J.L., Dunham, M.H.: Mining association rules: anti-skew algorithms. In: Proceedings of the 14th IEEE International Conference on Data Engineering (1998)

  21. Liu, G., Lu, H., Lou, W., Xu, Y., Yu, J.X.: Ascending frequency ordered prefix-tree: efficient mining of frequent itemsets. In: Proceedings of International Conference on Database Systems for Advanced Applications’03 (2003)

  22. Liu, J., Pan, Y., Wang, K., Han, J.: Mining frequent item sets by opportunistic projection. In: Proceedings of the 8th KDD Conference (2002)

  23. Mannila, H., Toivonen, H., Verkamo, A.I.: Efficient algorithms for discovering association rules. In: Proceedings of KDD Workshop (1994)

  24. Morzy, T., Wojciechowski, M., Zakrzewicz, M.: Materialized data mining views. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases (2000)

  25. Nag, B., Deshpande, P.M., DeWitt, D.J.: Using a knowledge cache for interactive discovery of association rules. In: Proceedings of the 5th ACM SIGKDD Conference (1999)

  26. Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM SIGMOD Conference (1998)

  27. Park, J.S., Chen, M.S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD Conference (1995)

  28. Pei, J., Han, J.: Can we push more constraints into frequent pattern mining? In: Proceedings of the 6th ACM SIGKDD Conference (2000)

  29. Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent item sets with convertible constraints. In: Proceedings of the 17th IEEE International Conference on Data Engineering (2001)

  30. Pei, J., Han, J., Lu, H., Nishio, S., Shiwei Tang, D.Y.: H-mine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings of 2001 IEEE Conference on Data Mining (2001)

  31. Pei, J., Han, J., Mao, R.: Closet: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000)

  32. Savasere, A., Omiecinski, E., Navathe, S.B.: An efficient algorithm for mining association rules in large databases. In: Proceedings of 21th International Conference on Very Large Data Bases. Morgan Kaufmann (1995)

  33. Seno, M., Karypis, G.: Lpminer: An algorithm for finding frequent itemsets using length-decreasing support constraint. In: Proceedings of 1st IEEE Conference on Data Mining (2001)

  34. Shenoy, P., Haritsa, J.R., Sudarshan, S., Bhalotia, G., Bawa, M., Shah, D.: Turbo-charging vertical mining of large databases. In: Proceedings of the 2000 ACM SIGMOD Conference (2000)

  35. Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD Conference (1997)

  36. Toivonen, H.: Sampling large databases for association rules. In: Proceedings of the 22th International Conference on Very Large Data Bases (1996)

  37. Wang, J., Han, J., Pei, J.: CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-03) (2003)

  38. Webb, G.I.: Efficient search for association rules. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000)

  39. Xu, Y., Yu, J.X., Liu, G., Lu, H.: From path tree to frequent patterns: A framework for mining frequent patterns. In: Proceedings of IEEE International Conference on Data Mining (2002)

  40. Yan, X., Han, J., Afshar, R.: Clospan: Mining closed sequential patterns in large datasets. In: Proceedings of the 3rd SAIM International Conference on Data Mining (2003)

  41. Zaki, M.J.: Scalable algorithms for association mining. Knowl. Data Eng. 12(2), (2000)

  42. Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD Conference (2003)

  43. Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed association rule mining. Tech. rep., Rensselaer Polytechnic Institute (1999)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jeffrey Xu Yu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yu, J.X., Li, Z. & Liu, G. A data mining proxy approach for efficient frequent itemset mining. The VLDB Journal 17, 947–970 (2008). https://doi.org/10.1007/s00778-007-0047-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-007-0047-0

Keywords

Navigation