Skip to main content
Log in

A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

Frequent itemset mining (FIM) is an interesting sub-area of research in the field of Data Mining. With the increase in the size of datasets, conventional FIM algorithms are not suitable and efforts are made to migrate to the Big Data Frameworks for designing algorithms using MapReduce like computing paradigms. We too interested in designing MapReduce based algorithm. Initially, our Parallel Compression algorithm makes data simpler to handle. A novel bit vector data structure is proposed to maintain compressed transactions and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets from a given list of transactions. The experimental results are present to prove the efficacy of our approach over some of the recent works.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

References

  1. Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 8(6), 962–969 (1996)

    Article  Google Scholar 

  2. Agrawal, R., Imieliski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)

    Article  Google Scholar 

  3. Al-kahtani, M.S., Karim, L.: An efficient distributed algorithm for big data processing. Arab. J. Sci. Eng. doi:10.1007/s13369-016-2405-y

  4. Bechini, A., Marcelloni, F., Segatori, A.: A MapReduce solution for associative classification of big data. Inf. Sci. 332, 33–55 (2016)

    Article  Google Scholar 

  5. Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with Roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)

    Article  Google Scholar 

  6. Colantonio, A., Pietro, R.D.: Concise: compressed ’n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)

    Article  MATH  Google Scholar 

  7. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Article  Google Scholar 

  8. Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl. Based Syst. 20(4), 329–335 (2007)

    Article  Google Scholar 

  9. Emani, C.K., Cullot, N., Nicolle, C.: Understandable big data: a survey. Comput. Sci. Rev. 17, 70–81 (2015)

    Article  MathSciNet  Google Scholar 

  10. Fournier-Viger, P., Lin, J.C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H.T.: The SPMF open-source data mining library version 2. In: Proceedings of 19th European Conference on Principles of Data Mining and Knowledge Discovery 9853, pp. 36–40 (2016)

  11. Han, E.H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. IEEE Trans. Knowl. Data Eng. 12(3), 337–352 (2000)

    Article  Google Scholar 

  12. Han, J., Yin, J.P.Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Disc. 8(1), 53–87 (2004)

    Article  MathSciNet  Google Scholar 

  13. Hong, S., Huaxuan, Z., Shiping, C., Chunyan, H.: The study of improved FP-Growth Algorithm in MapReduce. In: Proceedings of 1st International Workshop on Cloud Computing and Information Security, pp. 250–253 (2013)

  14. Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender systems, pp. 107–114 (2008)

  15. Li, L., Zhang, M.: The strategy of mining association rule based on cloud computing. In: Proceedings of IEEE International Conference on Business Computing and Global Informatization, pp. 475–478 (2011)

  16. Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on MapReduce. In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (2012)

  17. Meng, S., Dou, W., Zhang, X., Chen, J.: KASR: a keyword-aware service recommendation method on MapReduce for big data applications. IEEE Trans. Parallel Distrib. Syst. 25(12), 3221–3231 (2014)

  18. Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Proceedings of IEEE International Conference on Big data, pp. 111–118 (2013)

  19. Mohamed, M.H., Darwieesh, M.M.: Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. Cybern. 5(6), 823–833 (2013)

    Article  Google Scholar 

  20. Sakr, N.A., ELdesouky, A., Arafat, H.: An efficient fast-response content-based image retrieval framework for big data. Comput. Electr. Eng. 54, 522–538 (2016)

  21. Sandhu, R., Sood, S.K.: Scheduling of big data applications on distributed cloud based on QoS parameters. Clust. Comput. 18(2), 817–828 (2015)

  22. Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl. Based Syst. 21(6), 507–513 (2008)

    Article  Google Scholar 

  23. Spiegler, I., Maayan, R.: Storage and retrieval considerations of binary data bases. Inf. Process. Manag. 21(3), 233–254 (1985)

    Article  Google Scholar 

  24. Sun, D., Lee, V.C., Burstein, F., Haghighi, P.D.: An efficient vertical-Apriori Mapreduce algorithm for frequent item-set mining. In: Proceedings of IEEE Industrial Electronics and Applications, pp. 108–112 (2015)

  25. Tsay, Y.J., Hsu, T.J., Yu, J.R.: FIUT: a new method for mining frequent itemsets. Inf. Sci. 179(11), 1724–1737 (2009)

    Article  Google Scholar 

  26. Vennila, V., Kannan, A.R.: Symmetric Matrix-based Predictive Classifier for Big Data computation and information sharing in Cloud. Comput. Electr. Eng. 56, 831–841 (2016)

    Article  Google Scholar 

  27. Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on MapReduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)

    Article  Google Scholar 

  28. Wu, K., Stockinger, K., Shoshani, A.: Breaking the curse of cardinality on bitmap indexes. In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management 5069, pp. 348–365 (2008)

  29. Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)

    Article  Google Scholar 

  30. Xia, D., Zhou, Y., Rong, Z., Zhang, Z.: IPFP: An improved parallel FP-growth algorithm for frequent itemsets mining. In Proceedings of 59th ISI World Statistics Congress, pp. 4034–4039 (2013)

  31. Xun, Y., Zhang, J., Qin, X.: FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans. Syst. Man Cybern. Syst. 46(3), 313–325 (2016)

    Article  Google Scholar 

  32. Yu, K.M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37(3), 2486–2494 (2010)

    Article  MathSciNet  Google Scholar 

  33. Yuan, Y., Huang, T.: A matrix algorithm for mining association rules. Proc. Int. Conf. Intell. Comput. 3644, 370–379 (2005)

    Google Scholar 

  34. Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326–335 (2003)

  35. Zhang, F., Liu, M., Gui, F., Shen, W., Shami, A., Ma, Y.: A distributed frequent itemset mining algorithm using Spark for Big Data analytics. Clust. Comput. 18(4), 1493–1501 (2015)

    Article  Google Scholar 

  36. Zhou, L., Wang, X.: Research of the FP-growth algorithm based on cloud environments. J. Softw. 9(3), 676–683 (2014)

    Google Scholar 

  37. Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel FP-growth with MapReduce. In: Proceedings of IEEE Youth Conference Information Computing and Telecommunications, pp. 243–246 (2010)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sumalatha Saleti.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Saleti, S., Subramanyam, R.B.V. A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework. Cluster Comput 21, 1365–1380 (2018). https://doi.org/10.1007/s10586-017-1249-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10586-017-1249-x

Keywords

Navigation