Mining Frequent Itemsets with Vertical Data Layout in MapReduce

  • Conference paper
  • Information Search, Integration and Personalization (ISIP 2014)

Abstract

Frequent itemset mining is a data mining technique that aims to extract new and interesting information from a dataset in the form of sets of items. Several algorithms, such as Apriori, FP-Growth and Eclat, have been proposed, implemented and used successfully, along with numerous improvements. These algorithms deal with two types of data layout, horizontal and vertical: the former is the traditional layout (individuals as rows and items as columns) and is more widely used because of its simplicity, whereas the latter brings important reductions in computation. The standard frequent itemset mining algorithms have a high computational complexity and, given the massive datasets now available, new approaches have been proposed in the literature that implement mining algorithms in parallel, distributed and, more recently, Cloud Computing paradigms.
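As an illustration of the two layouts described above, the following minimal Python sketch converts a small horizontal dataset into a vertical one, where each item is mapped to the set of transaction identifiers (often called a tidset) that contain it. The function name `to_vertical` and the toy data are assumptions made for this example; they do not come from the paper.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it.

    `transactions` is a horizontal layout: {transaction id: set of items}.
    This helper and its name are hypothetical, for illustration only.
    """
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


# Toy horizontal dataset (made up for the example).
horizontal = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"b", "d"},
}

vertical = to_vertical(horizontal)

# In the vertical layout, the support of an itemset is the size of the
# intersection of its items' tidsets; no rescan of the transactions is needed.
support_ac = len(vertical["a"] & vertical["c"])
```

With the vertical layout, counting the support of {a, c} reduces to one set intersection, which is the kind of computation saving the abstract attributes to this layout.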

To overcome these computational drawbacks, in this paper we propose Apriori_V, a new parallel algorithm for frequent itemset mining from a vertical data layout, implemented on the MapReduce platform. Apriori_V brings significant improvements related to (1) the use of the vertical data layout with an Apriori-like strategy, which reduces the number of operations by eliminating several Apriori-specific tasks such as pruning, and (2) a decrease in the underlying complexity and thus in the execution time.
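The general idea of combining a level-wise, Apriori-like search with a vertical layout can be sketched as follows. This is not the Apriori_V algorithm itself, whose details are not given in the abstract, and it runs on a single machine rather than on MapReduce; the function name `mine_vertical`, the join rule and the toy data are assumptions chosen for illustration. The sketch shows why a separate subset-pruning pass is not needed for correctness: the tidset intersection gives the exact support of each candidate directly.

```python
from itertools import combinations


def mine_vertical(vertical, min_support):
    """Level-wise frequent itemset mining over tidsets (illustrative only).

    `vertical` maps each item to its tidset; returns {itemset: support}.
    """
    # Level 1: keep the single items whose tidset is large enough.
    level = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    frequent = {}
    while level:
        frequent.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item,
            # and skip candidates that were already evaluated.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # The tidset of the union is the intersection of the tidsets,
            # so the exact support is known without a pruning pass.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return frequent


# Toy vertical dataset (made up for the example).
vertical = {
    "a": {1, 2, 4},
    "b": {1, 3, 4},
    "c": {1, 2, 4},
    "d": {3},
}
frequent = mine_vertical(vertical, min_support=2)
```

In a MapReduce setting, one plausible decomposition would assign the intersections of each level to mappers and the support filtering to reducers, but that split is a guess and is not taken from the paper.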

Notes

  1. https://hadoop.apache.org/

  2. https://spark.apache.org/

Acknowledgements

We would like to gratefully thank Dimitris Kotzinos (ETIS - ENSEA/University of Cergy-Pontoise/CNRS 8051) for his contributions and support during this work.

Author information

Corresponding author

Correspondence to Claudia Marinica.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jen, TY., Marinica, C., Ghariani, A. (2016). Mining Frequent Itemsets with Vertical Data Layout in MapReduce. In: Kotzinos, D., Choong, Y., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration and Personalization. ISIP 2014. Communications in Computer and Information Science, vol 497. Springer, Cham. https://doi.org/10.1007/978-3-319-38901-1_5

  • DOI: https://doi.org/10.1007/978-3-319-38901-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38900-4

  • Online ISBN: 978-3-319-38901-1

  • eBook Packages: Computer Science (R0)
