Abstract
In this paper, we present a sparse memory allocation data structure for sequential and parallel data mining. We explore three algorithms that use the proposed data structure: MASP-tree, apriori-TID, and FP-growth. We modified the data structures of the apriori-TID and FP-growth algorithms to reduce memory allocation cost. Five data sets are used for comparison. The results show that, when the proposed data structure is used, the modified apriori-TID achieves a higher speed-up than the modified FP-growth. A maximum speed-up of 3.42 is observed for the MASP algorithm.
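For readers unfamiliar with apriori-TID, the following is a minimal Python sketch of the textbook AprioriTid algorithm of Agrawal and Srikant (1994). After each pass k, every transaction is replaced by the set of candidate k-itemsets it contains, and transactions that contain no candidates are dropped. It is not the authors' modified sparse data structure, whose details are not given here. The function names `apriori_gen` and `apriori_tid`, and the representation of itemsets as sorted tuples, are illustrative assumptions.

```python
from itertools import combinations


# Illustrative sketch of textbook AprioriTid; NOT the paper's sparse structure.
def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets that share a (k-2)-prefix, then prune any
    candidate with an infrequent (k-1)-subset (the Apriori property)."""
    prev = sorted(frequent_prev)  # itemsets are sorted tuples
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:  # same (k-2)-prefix, so a[-1] < b[-1]
                c = a + (b[-1],)
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def apriori_tid(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    # C-bar_1: each transaction as the set of 1-itemsets it contains.
    cbar = [{(item,) for item in set(t)} for t in transactions]
    counts = {}
    for entry in cbar:
        for c in entry:
            counts[c] = counts.get(c, 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = apriori_gen(frequent, k)
        counts = dict.fromkeys(candidates, 0)
        next_cbar = []
        for entry in cbar:
            # c is in the transaction iff both generators (c without its last
            # item, c without its second-to-last item) were recorded in entry.
            present = {c for c in candidates
                       if c[:-1] in entry and c[:-2] + c[-1:] in entry}
            for c in present:
                counts[c] += 1
            if present:  # transactions with no candidates drop out
                next_cbar.append(present)
        cbar = next_cbar
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, with the four transactions {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5} and a minimum support count of 2, this sketch reports nine frequent itemsets, the largest being {2,3,5} with support 2. The shrinking per-transaction candidate sets are the part of apriori-TID whose memory allocation cost a modified data structure would target.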
Acknowledgments
The authors would like to thank LA DOTD for its continuous support of this research.
Appendix
See Table 3.
About this article
Cite this article
Soysal, Ö.M., Gupta, E. & Donepudi, H. A sparse memory allocation data structure for sequential and parallel association rule mining. J Supercomput 72, 347–370 (2016). https://doi.org/10.1007/s11227-015-1566-x
DOI: https://doi.org/10.1007/s11227-015-1566-x