Cache-conscious frequent pattern mining on modern and emerging processors

The VLDB Journal

Abstract

Algorithms are typically designed to exploit the current state of the art in processor technology. However, as processor technology evolves, these algorithms are often unable to attain the maximum achievable performance on newer architectures. In this paper, we examine the performance of frequent pattern mining algorithms on a modern processor. A detailed performance study reveals that even the best frequent pattern mining implementations, with highly efficient memory managers, still grossly under-utilize a modern processor. The primary performance bottlenecks are poor data locality and low instruction-level parallelism (ILP). We propose a cache-conscious prefix tree to address this problem. The resulting tree improves spatial locality and also increases the benefit obtained from hardware cache-line prefetching. Furthermore, the design of this data structure allows the use of path tiling, a novel tiling strategy, to improve temporal locality. The result is an overall speedup of up to 3.2 over state-of-the-art implementations. We then show how these algorithms can be improved further by realizing a non-naive thread-based decomposition that targets simultaneous multithreading (SMT) processors. A key aspect of this decomposition is to ensure cache reuse between threads that are co-scheduled at a fine granularity. This optimization affords an additional speedup of 50%, resulting in an overall speedup of up to 4.8. The proposed optimizations also provide performance improvements on symmetric multiprocessors (SMPs), and will most likely be beneficial on emerging processors.
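The abstract names a cache-conscious prefix tree but does not spell out its layout. As a rough illustration only, the Python sketch below (our own assumption, not the authors' implementation) builds an ordinary pointer-based prefix tree and then copies it into contiguous parallel arrays in depth-first order, keeping just each node's item, count, and parent index. A bottom-up walk toward the root, as used when gathering a conditional pattern base, then reads only these compact arrays. The names `build_prefix_tree`, `compact_depth_first`, and `conditional_pattern_base` are hypothetical, and the linear scan for a target item stands in for the per-item node links a real FP-tree header table would provide. Python arrays only mimic the idea; actual cache effects would require a layout in a language such as C.

```python
from array import array


class _Node:
    """Pointer-based prefix-tree node, used only while building the tree."""

    __slots__ = ("item", "count", "children")

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions):
    """Insert each transaction; items are assumed to be in a fixed global order."""
    root = _Node(-1)  # sentinel root with item -1
    for transaction in transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = _Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def compact_depth_first(root):
    """Copy the tree into parallel arrays in depth-first (preorder) order.

    Only item, count, and parent index are kept: child maps are dropped
    because bottom-up path walks need nothing but the parent link.
    """
    items, counts, parents = array("i"), array("i"), array("i")
    stack = [(root, -1)]
    while stack:
        node, parent = stack.pop()
        index = len(items)
        items.append(node.item)
        counts.append(node.count)
        parents.append(parent)
        # Push in descending item order so the smallest item is laid out next.
        for item in sorted(node.children, reverse=True):
            stack.append((node.children[item], index))
    return items, counts, parents


def conditional_pattern_base(items, counts, parents, target):
    """Collect (prefix path, count) for every node holding `target`.

    A linear scan replaces the header-table node links of a real FP-tree.
    """
    base = []
    for index, item in enumerate(items):
        if item != target:
            continue
        path = []
        ancestor = parents[index]
        while ancestor > 0:  # index 0 is the root sentinel
            path.append(items[ancestor])
            ancestor = parents[ancestor]
        path.reverse()  # report the path root-first
        base.append((path, counts[index]))
    return base
```

For the transactions `[[1, 2, 3], [1, 2], [1, 3], [2, 3]]` (items already in a fixed order), the conditional pattern base for item 3 is `[([1, 2], 1), ([1], 1), ([2], 1)]`, whose counts sum to 3, the support of item 3. Dropping child pointers shrinks each node, so more nodes fit in a cache line, and a depth-first order tends to place a node near its parent, which suits sequential hardware prefetching.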

Author information

Correspondence to Srinivasan Parthasarathy.

About this article

Cite this article

Ghoting, A., Buehrer, G., Parthasarathy, S. et al. Cache-conscious frequent pattern mining on modern and emerging processors. The VLDB Journal 16, 77–96 (2007). https://doi.org/10.1007/s00778-006-0025-y
