Skip to main content

Scalable Vertical Mining for Big Data Analytics of Frequent Itemsets

  • Conference paper
  • First Online:
Database and Expert Systems Applications (DEXA 2018)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11029))

Included in the following conference series:

Abstract

Advances in technology and the increasing growth of popularity on Internet of Things (IoT) for many applications have produced huge volume of data at a high velocity. These valuable big data can be of a wide variety or different veracity. Embedded in these big data are useful information and valuable knowledge. This leads to data science, which aims to apply big data analytics to mine implicit, previously unknown and potentially useful information from big data. As a popular data analytic task, frequent itemset mining discovers knowledge about sets of frequently co-occurring items in the big data. Such a task has drawn attention in both academia and industry partially due to its practicality in various real-life applications. Existing mining approaches mostly use serial, distributed or parallel algorithms to mine the data horizontally (i.e., on a transaction basis). In this paper, we present an alternative big data analytic approach. Specifically, our scalable algorithm uses the MapReduce programming model that runs in a Spark environment to mine the data vertically (i.e., on an item basis). Evaluation results show the effectiveness of our algorithm in big data analytics of frequent itemsets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Aggarwal, R., Srikant, R.: Fast algorithms for mining association rules. In: VLDB 1994, pp. 487–399 (1994)

    Google Scholar 

  2. Arora, N.R., Lee, W., Leung, C.K.-S., Kim, J., Kumar, H.: Efficient fuzzy ranking for keyword search on graphs. In: Liddle, S.W., Schewe, K.-D., Tjoa, A.M., Zhou, X. (eds.) DEXA 2012, Part I. LNCS, vol. 7446, pp. 502–510. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32600-4_38

    Chapter  Google Scholar 

  3. Braun, P., Cuzzocrea, A., Jiang, F., Leung, C.K.-S., Pazdor, A.G.M.: MapReduce-based complex big data analytics over uncertain and imprecise social networks. In: Bellatreche, L., Chakravarthy, S. (eds.) DaWaK 2017. LNCS, vol. 10440, pp. 130–145. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64283-3_10

    Chapter  Google Scholar 

  4. Braun, P., Cuzzocrea, A., Keding, T.D., Leung, C.K., Pazdor, A.G.M., Sayson, D.: Game data mining: clustering and visualization of online game data in cyber-physical worlds. Proc. Comput. Sci. 112, 2259–2268 (2017)

    Article  Google Scholar 

  5. Brown, J.A., Cuzzocrea, A., Kresta, M., Kristjanson, K.D.L., Leung, C.K., Tebinka, T.W.: A machine learning system for supporting advanced knowledge discovery from chess game data. In: IEEE ICMLA 2017, pp. 649–654 (2017)

    Google Scholar 

  6. Chen, Y.C., Wang, E.T., Chen, A.L.P.: Mining user trajectories from smartphone data considering data uncertainty. In: Madria, S., Hara, T. (eds.) DaWaK 2016. LNCS, vol. 9829, pp. 51–67. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43946-4_4

    Chapter  Google Scholar 

  7. Cuzzocrea, A., Jiang, F., Leung, C.K., Liu, D., Peddle, A., Tanbeer, S.K.: Mining popular patterns: a novel mining problem and its application to static transactional databases and dynamic data streams. In: Hameurlain, A., Küng, J., Wagner, R., Cuzzocrea, A., Dayal, U. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXI. LNCS, vol. 9260, pp. 115–139. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47804-2_6

    Chapter  Google Scholar 

  8. Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C., Tseng, V.S.: SPMF: a Java open-source pattern mining library. JMLR 15(1), 3389–3393 (2014)

    MATH  Google Scholar 

  9. Gan, W., Lin, J.C.-W., Fournier-Viger, P., Chao, H.-C.: Mining recent high-utility patterns from temporal databases with time-sensitive constraint. In: Madria, S., Hara, T. (eds.) DaWaK 2016. LNCS, vol. 9829, pp. 3–18. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43946-4_1

    Chapter  Google Scholar 

  10. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: ACM SIGMOD 2000, pp. 1–12 (2000)

    Google Scholar 

  11. Hoi, C.S.H., Leung, C.K., Tran, K., Cuzzocrea, A., Bochicchio, M., Simonetti, M.: Supporting social information discovery from big uncertain social key-value data via graph-like metaphors. In: Xiao, J., Mao, Z.-H., Suzumura, T., Zhang, L.-J. (eds.) ICCC 2018. LNCS, vol. 10971, pp. 102–116. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94307-7_8

    Chapter  Google Scholar 

  12. Islam, M.A., Ahmed, C.F., Leung, C.K., Hoi, C.S.H.: WFSM-MaxPWS: an efficient approach for mining weighted frequent subgraphs from edge-weighted graph databases. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 664–676. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_52

    Chapter  Google Scholar 

  13. Leung, C.K.: Big data analysis and mining. In: Encyclopedia of Information Science and Technology, 4th edn, pp. 338–348 (2018)

    Google Scholar 

  14. Leung, C.K.: Data and visual analytics for emerging databases. In: Lee, W., Choi, W., Jung, S., Song, M. (eds.) Proceedings of the 7th International Conference on Emerging Databases. LNEE, vol. 461, pp. 203–213. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-6520-0_21

    Chapter  Google Scholar 

  15. Leung, C.K., Carmichael, C.L., Johnstone, P., Xing, R.R., Yuen, D.S.H.: Interactive visual analytics of big data. In: Ontologies and Big Data Considerations for Effective Intelligence, pp. 1–26 (2017)

    Google Scholar 

  16. Leung, C.K.-S., Jiang, F.: Big data analytics of social networks for the discovery of “following” patterns. In: Madria, S., Hara, T. (eds.) DaWaK 2015. LNCS, vol. 9263, pp. 123–135. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22729-0_10

    Chapter  Google Scholar 

  17. Leung, C.K.-S., MacKinnon, R.K.: Balancing tree size and accuracy in fast mining of uncertain frequent patterns. In: Madria, S., Hara, T. (eds.) DaWaK 2015. LNCS, vol. 9263, pp. 57–69. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22729-0_5

    Chapter  Google Scholar 

  18. Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: parallel FP-growth for query recommendation. In: ACM RecSys 2008, pp. 107–114 (2008)

    Google Scholar 

  19. Liu, J., Li, J., Xu, S., Fung, B.C.M.: Secure outsourced frequent pattern mining by fully homomorphic encryption. In: Madria, S., Hara, T. (eds.) DaWaK 2015. LNCS, vol. 9263, pp. 70–81. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22729-0_6

    Chapter  Google Scholar 

  20. Liu, J., Wu, Y., Zhou, Q., Fung, B.C.M., Chen, F., Yu, B.: Parallel Eclat for opportunistic mining of frequent itemsets. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds.) DEXA 2015, Part I. LNCS, vol. 9261, pp. 401–415. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22849-5_27

    Chapter  Google Scholar 

  21. Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: IEEE BigData 2013, pp. 111–118 (2013)

    Google Scholar 

  22. Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D.: H-Mine: hyper-structure mining of frequent patterns in large databases. In: IEEE ICDM 2001, pp. 441–448 (2001)

    Google Scholar 

  23. Qiu, H., Gu, R., Yuan, C., Huang Y.: YAFIM: a parallel frequent itemset mining algorithm with Spark. In: IEEE IPDPS 2014 Workshops, pp. 1664–1671 (2014)

    Google Scholar 

  24. Rahman, M.M., Ahmed, C.F., Leung, C.K., Pazdor, A.G.M.: Frequent sequence mining with weight constraints in uncertain databases. In: ACM IMCOM 2018, Article no. 48 (2018)

    Google Scholar 

  25. Shafer, T.: The 42 V’s of big data and data science (2017). https://www.kdnuggets.com/2017/04/42-vs-big-data-data-science.html

  26. Shenoy, P., Bhalotia, J.R., Bawa, M., Shah, D.: Turbo-charging vertical mining of large databases. In: ACM SIGMOD 2000, pp. 22–33 (2000)

    Google Scholar 

  27. Wang, K., Tang, L., Han, J., Liu, J.: Top down FP-growth for association rule mining. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 334–340. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47887-6_34

    Chapter  Google Scholar 

  28. Zaki, M.J.: Scalable algorithms for association mining. IEEE TKDE 12(3), 372–390 (2000)

    Google Scholar 

  29. Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: KDD 2003, pp. 326–335 (2003)

    Google Scholar 

  30. Zhang, Z., Ji, G., Tang, M.: MREclat: an algorithm for parallel mining frequent itemsets. In: CBD 2013, pp. 177–180 (2013)

    Google Scholar 

Download references

Acknowledgements

This project is partially supported by NSERC (Canada) and University of Manitoba.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Carson K. Leung .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Leung, C.K., Zhang, H., Souza, J., Lee, W. (2018). Scalable Vertical Mining for Big Data Analytics of Frequent Itemsets. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science(), vol 11029. Springer, Cham. https://doi.org/10.1007/978-3-319-98809-2_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-98809-2_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98808-5

  • Online ISBN: 978-3-319-98809-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics