A sparse memory allocation data structure for sequential and parallel association rule mining

The Journal of Supercomputing

Abstract

In this paper, we present a sparse memory allocation data structure for sequential and parallel data mining. We explore three algorithms that use the proposed data structure: MASP-tree, apriori-TID, and FP-growth. We modified the data structures of the apriori-TID and FP-growth algorithms to reduce memory allocation cost, and we compare the algorithms on five data sets. With the proposed data structure, the modified apriori-TID achieves a higher speed-up than the modified FP-growth. The maximum observed speed-up, 3.42, is obtained with the MASP algorithm.
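
For readers unfamiliar with apriori-TID, the sketch below shows the textbook AprioriTid procedure of Agrawal and Srikant (1994) in Python. After the first pass, support is counted against an encoded database in which each transaction is replaced by the set of candidate itemsets it contains, and transactions that contain no candidates are dropped. This is an illustrative sketch under stated assumptions: it uses ordinary Python sets and dictionaries, not the sparse memory allocation data structure proposed in this paper, and the function names `apriori_gen` and `apriori_tid` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Hypothetical helper: candidate k-itemsets from frequent (k-1)-itemsets.

    Itemsets are sorted tuples. Join step: merge two itemsets that share their
    first k-2 items. Prune step: drop a candidate if any (k-1)-subset is infrequent.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] != b[:-1]:
                continue
            c = a + (b[-1],)  # a[-1] < b[-1] because prev is sorted and unique
            if all(s in prev_set for s in combinations(c, k - 1)):
                candidates.append(c)
    return candidates


def apriori_tid(transactions, min_support):
    """Hypothetical helper: return {itemset: count} for itemsets with count >= min_support."""
    # Pass 1: count single items directly from the database.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[(item,)] += 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)

    # Encoded database (C-bar_1): each transaction as the set of 1-itemsets it contains.
    cbar = [{(item,) for item in set(t)} for t in transactions]
    k = 2
    while frequent:
        candidates = apriori_gen(frequent.keys(), k)
        counts = defaultdict(int)
        next_cbar = []
        for entry in cbar:
            # A candidate is in the transaction iff both (k-1)-itemsets it was
            # generated from (drop last item / drop second-to-last item) are in entry.
            contained = [c for c in candidates
                         if c[:-1] in entry and c[:-2] + c[-1:] in entry]
            for c in contained:
                counts[c] += 1
            if contained:  # transactions with no candidates are not carried forward
                next_cbar.append(set(contained))
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        cbar = next_cbar
        k += 1
    return result


if __name__ == "__main__":
    db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    for itemset, count in sorted(apriori_tid(db, 2).items()):
        print(itemset, count)
```

On the four-transaction example used by Agrawal and Srikant, {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}, with a minimum support count of 2, the sketch finds nine frequent itemsets, the largest being (2, 3, 5) with a count of 2. The shrinking encoded database is what distinguishes AprioriTid from plain Apriori, and it is also where per-pass memory allocation happens, which is the cost this paper's data structure targets.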

Figs. 1–30

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Chile, pp 487–499

  2. Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6):962–969

  3. Appice A, Ceci M, Turi A, Malerba D (2011) A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets. Intell Data Anal 15:69–88

  4. Bayardo R (2014) Frequent itemset mining dataset repository. http://www.cs.rpi.edu/~zaki/Workshops/FIMI/data/ (also available at http://fimi.ua.ac.be/data/)

  5. Buza K (2014) Feedback prediction for blogs. In: Data analysis, machine learning and knowledge discovery, pp 145–152. https://archive.ics.uci.edu/ml/datasets/BlogFeedback

  6. Cheung DW, Lee SD, Xiao Y (2002) Effect of data skewness and workload balance in parallel data mining. IEEE Trans Knowl Data Eng 14(3):498–514

  7. ConcurrentQueue (2015). https://msdn.microsoft.com/en-us/library/dd287208

  8. Fakhrahmad SM, Dastghaibyfard G (2011) An efficient frequent pattern mining method and its parallelization in transactional databases. J Inf Sci Eng 27:511–525

  9. Garg R, Mishra PK (2009) Some observations of sequential, parallel and distributed association rule mining algorithms. In: International Conference on Computer and Automation Engineering, pp 336–342. doi:10.1109/ICCAE.2009.28

  10. Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y-K, Dubey P (2007) Cache-conscious frequent pattern mining on modern and emerging processors. VLDB J 16:77–96. doi:10.1007/s00778-006-0025-y

  11. Haglin D, Mayes KR, Manning AM, Feo J, Gurd JR, Elliot M, Keane JA (2009) Factors affecting the performance of parallel mining of minimal unique itemsets on diverse architectures. Concurr Comput Pract Exp 21(9):1131–1158

  12. Han E-H, Karypis G, Kumar V (2000) Scalable parallel data mining for association rules. IEEE Trans Knowl Data Eng 12(3):337–352

  13. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8:53–87

  14. HSRG (2014) Highway Safety Research Group

  15. Javed A, Khokhar A (2004) Frequent pattern mining on message passing multiprocessor systems. Distrib Parallel Databases 16(3):321–334

  16. Kambadur P, Ghoting A, Gupta A, Lumsdaine A (2012) Extending task parallelism for frequent pattern mining. CoRR, abs/1211.1658. arXiv:1211.1658v1 [cs.DC]

  17. Kambadur P, Gupta A, Ghoting A, Avron H, Lumsdaine A (2009) PFunc: modern task parallelism for modern high performance computing. Proc Conf High Perform Comput Netw Storage Anal. doi:10.1145/1654059.1654103

  18. Lin KW, Lo Y-C (2013) Efficient algorithms for frequent pattern mining in many-task computing environments. Knowl Based Syst 49:10–21. doi:10.1016/j.knosys.2013.04.004

  19. Liu L, Li E, Zhang Y, Tang Z (2007) Optimization of frequent itemset mining on multiple-core processor. In: Proceedings of the 33rd international conference on very large data bases, pp 1275–1285

  20. Negrevergne B, Termier A, Mehaut J, Uno T (2010) Discovering closed frequent itemsets on multicore: parallelizing computations and optimizing memory accesses. In: IEEE international conference on high performance computing and simulation (HPCS), pp 521–528

  21. Nguyen D, Vo B, Le B (2014) Efficient strategies for parallel mining class association rules. Expert Syst Appl 41(10):4716–4729

  22. Ozkural E, Ucar B, Aykanat C (2011) Parallel frequent item set mining with selective item replication. IEEE Trans Parallel Distrib Syst 22(10):1632–1640

  23. Shanthi MM, Irudhayaraj AA (2009) Multithreading—an efficient technique for enhancing application performance. Int J Recent Trends Eng 165–167

  24. Shen Y, Fu Z, Zhang L, Wang J (2012) Parallel apriori algorithm based on the thread pool. IEEE Int Conf Computer Sci Serv Syst 2235–2238. doi:10.1109/CSSS.2012.555

  25. Sohrabi MK, Barforoush AA (2013) Parallel frequent itemset mining using systolic arrays. Knowl Based Syst 37:462–471

  26. Souliou D, Pagourtzis A, Drosinos N, Tsanakas P (2006) Computing frequent itemsets in parallel using partial support trees. J Syst Softw 79(12):1735–1743

  27. Soysal ÖM (2015) Association rule mining with mostly associated sequential patterns. Expert Syst Appl 42(5):2582–2592

  28. Strack B, DeShazo JP, Gennings C, Olmo JL, Ventura S (2014) Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Res Int. doi:10.1155/2014/781670

  29. Vu L, Alaghband G (2014) Novel parallel method for association rule mining on multi-core shared memory systems. Parallel Comput 40(10):768–785. doi:10.1016/j.parco.2014.08.003

  30. Yu KM, Zhou J (2010) Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst Appl 37(3):2486–2494

  31. Yu K-M, Zhou J, Hong T-P, Zhou J-L (2010) A load-balanced distributed parallel mining algorithm. Expert Syst Appl 37(3):2459–2464

  32. Zaki M, Parthasarathy S, Ogihara M (1997) Parallel algorithms for discovery of association rules. Data Min Knowl Discov 1:343–373

  33. Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4):14–25

Acknowledgments

The authors thank the Louisiana Department of Transportation and Development (LA DOTD) for its continued research support.

Author information

Correspondence to Ömer M. Soysal.

Appendix

See Table 3.

Table 3 Summary of related work

About this article

Cite this article

Soysal, Ö.M., Gupta, E. & Donepudi, H. A sparse memory allocation data structure for sequential and parallel association rule mining. J Supercomput 72, 347–370 (2016). https://doi.org/10.1007/s11227-015-1566-x

  • DOI: https://doi.org/10.1007/s11227-015-1566-x
