A sparse memory allocation data structure for sequential and parallel association rule mining

Soysal, Ömer M.; Gupta, Eera; Donepudi, Harisha

doi:10.1007/s11227-015-1566-x

Published: 21 November 2015

Volume 72, pages 347–370 (2016)

The Journal of Supercomputing

Abstract

In this paper, we present a sparse memory allocation data structure for sequential and parallel data mining. We explore three algorithms that use the proposed data structure: MASP-tree, apriori-TID, and FP-growth. We modified the data structures of the apriori-TID and FP-growth algorithms to reduce memory allocation cost. Five data sets are used for comparison. The results show that the modified apriori-TID achieves a higher speed-up than the modified FP-growth when the proposed data structure is used. The maximum observed speed-up, 3.42, is obtained with the MASP algorithm.
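
The abstract does not describe the proposed data structure itself. As a rough illustration of what sparse allocation can mean in frequent itemset mining in general, the sketch below stores, for each item, only the IDs of the transactions that contain it, and counts support by intersecting those ID sets. This is a generic vertical layout, not the authors' structure and not the apriori-TID algorithm as published. The function names (`build_tid_sets`, `support`, `frequent_pairs`) and the toy transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tid_sets(transactions):
    """Map each item to the set of transaction IDs that contain it.

    Only (item, TID) pairs that actually occur are stored, so memory grows
    with the number of item occurrences, not items x transactions.
    """
    tid_sets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_sets[item].add(tid)
    return dict(tid_sets)


def support(itemset, tid_sets):
    """Count the transactions that contain every item in `itemset`."""
    sets = [tid_sets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


def frequent_pairs(tid_sets, min_support):
    """Return {pair: support} for item pairs that meet `min_support`."""
    # Apriori-style pruning: a pair can only be frequent if both items are.
    frequent_items = sorted(
        item for item, tids in tid_sets.items() if len(tids) >= min_support
    )
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support(pair, tid_sets)
        if count >= min_support:
            result[pair] = count
    return result


# Hypothetical toy data, not one of the paper's five data sets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
tid_sets = build_tid_sets(transactions)
pairs = frequent_pairs(tid_sets, min_support=2)
```

A dense alternative would allocate one cell per item and transaction whether or not the item occurs. The dictionary of sets allocates an entry only for occurrences, which is the general idea behind sparse layouts for this kind of data.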

Acknowledgments

The authors would like to thank LA DOTD for its continuous research support.

Author information

Authors and Affiliations

Highway Safety Research Group, Louisiana State University, 3535 Nicholson Ext., Baton Rouge, LA, 70803, USA
Ömer M. Soysal, Eera Gupta & Harisha Donepudi

Corresponding author

Correspondence to Ömer M. Soysal.

Appendix

See Table 3.

Table 3 Summary of related work

About this article

Cite this article

Soysal, Ö.M., Gupta, E. & Donepudi, H. A sparse memory allocation data structure for sequential and parallel association rule mining. J Supercomput 72, 347–370 (2016). https://doi.org/10.1007/s11227-015-1566-x

Published: 21 November 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s11227-015-1566-x
