A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework

Saleti, Sumalatha; Subramanyam, R. B. V.

doi:10.1007/s10586-017-1249-x

Published: 14 October 2017

Volume 21, pages 1365–1380 (2018)

Cluster Computing

Abstract

Frequent itemset mining (FIM) is an active sub-area of research in the field of data mining. As datasets grow, conventional FIM algorithms become unsuitable, and efforts are being made to migrate to Big Data frameworks and to design algorithms on MapReduce-like computing paradigms. We likewise design a MapReduce-based algorithm. First, our Parallel Compression algorithm makes the data simpler to handle. A novel bit vector data structure is proposed to maintain the compressed transactions, and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets in a given list of transactions. Experimental results are presented to demonstrate the efficacy of our approach over some recent works.
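
The abstract does not spell out the Bit Vector Product algorithm itself, so the following is only a minimal single-machine sketch of the general idea behind bit-vector support counting, not the authors' method. It assumes a vertical layout in which each item gets one bit vector built in a single scan, and an itemset's support is the number of set bits in the AND of its items' vectors. The function names (`build_bit_vectors`, `support`, `frequent_itemsets`), the sample transactions, and the brute-force candidate enumeration are all hypothetical choices made for illustration; no compression or MapReduce step is modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_bit_vectors(transactions):
    """Build one integer bitmask per item in a single pass over the data.

    Bit i of an item's mask is set when transaction i contains that item.
    (Assumed layout for illustration; not taken from the paper.)
    """
    vectors = defaultdict(int)
    for tid, items in enumerate(transactions):
        for item in set(items):
            vectors[item] |= 1 << tid
    return dict(vectors)


def support(itemset, vectors):
    """Count transactions that contain every item of a non-empty itemset."""
    items = iter(itemset)
    mask = vectors.get(next(items), 0)
    for item in items:
        mask &= vectors.get(item, 0)  # bitwise AND intersects transaction sets
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support, max_size=3):
    """Enumerate itemsets up to max_size by brute force (illustration only)."""
    vectors = build_bit_vectors(transactions)
    frequent_items = sorted(
        item for item in vectors if support((item,), vectors) >= min_support
    )
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(frequent_items, k):
            count = support(candidate, vectors)
            if count >= min_support:
                result[candidate] = count
    return result


# Hypothetical toy dataset: four transactions over three items.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
found = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(found.items()):
    print(itemset, count)
```

In a MapReduce setting, one plausible arrangement would be for mappers to emit partial bit vectors per item and for reducers to combine them, but that split is an assumption and is not described in the abstract.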

Author information

Authors and Affiliations

National Institute of Technology, Warangal, India
Sumalatha Saleti & R. B. V. Subramanyam

Corresponding author

Correspondence to Sumalatha Saleti.

About this article

Cite this article

Saleti, S., Subramanyam, R.B.V. A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework. Cluster Comput 21, 1365–1380 (2018). https://doi.org/10.1007/s10586-017-1249-x

Received: 17 May 2017
Revised: 28 August 2017
Accepted: 05 October 2017
Published: 14 October 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10586-017-1249-x
