Abstract
Frequent itemset mining (FIM) is an interesting sub-area of research in the field of data mining. As datasets grow in size, conventional FIM algorithms become unsuitable, and efforts are being made to migrate to Big Data frameworks and to design algorithms using MapReduce-like computing paradigms. We are likewise interested in designing a MapReduce-based algorithm. First, our Parallel Compression algorithm makes the data simpler to handle. We propose a novel bit vector data structure that maintains the compressed transactions and is built by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets in a given list of transactions. Experimental results are presented to demonstrate the efficacy of our approach over some recent works.
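The abstract does not spell out the Parallel Compression or Bit Vector Product algorithms, so the following is only a minimal single-machine sketch of the general idea behind bit-vector-based frequent itemset mining, not the authors' method. It assumes a vertical layout (one bit vector per item, built in a single scan) and a plain level-wise candidate search; the function names `build_bit_vectors`, `support`, and `frequent_itemsets` are hypothetical, and no MapReduce distribution is attempted.

```python
from itertools import combinations


def build_bit_vectors(transactions):
    """Scan the dataset once; bit i of an item's vector is set
    when transaction i contains that item (assumed vertical layout)."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """AND the item vectors together; the number of set bits is the
    number of transactions that contain every item in the itemset."""
    items = iter(itemset)
    acc = vectors.get(next(items), 0)
    for item in items:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise search (illustrative only): grow itemsets one item
    at a time and keep those whose support reaches min_support."""
    vectors = build_bit_vectors(transactions)
    frequent = {}
    current = [frozenset([item]) for item in sorted(vectors)
               if support([item], vectors) >= min_support]
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, vectors)
        k += 1
        # Join pairs of frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        current = [c for c in candidates
                   if support(c, vectors) >= min_support]
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(data, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In this sketch, counting the support of an itemset reduces to bitwise AND operations plus a population count, which is why bit vector representations are attractive once the dataset has been scanned a single time. In a MapReduce setting, one could imagine mappers computing partial counts over partitions of the transactions and reducers summing them, but that division of work is an assumption here rather than something stated in the abstract.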











Cite this article
Saleti, S., Subramanyam, R.B.V. A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework. Cluster Comput 21, 1365–1380 (2018). https://doi.org/10.1007/s10586-017-1249-x
DOI: https://doi.org/10.1007/s10586-017-1249-x