Mining Frequent Itemsets with Vertical Data Layout in MapReduce

Jen, Tao-Yuan; Marinica, Claudia; Ghariani, Abir

doi:10.1007/978-3-319-38901-1_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 497)

Included in the following conference series:

International Workshop on Information Search, Integration, and Personalization

Abstract

Frequent itemset mining is a data mining technique that extracts new and interesting information from a dataset in the form of sets of items. Several algorithms, such as Apriori, FP-Growth and Eclat, have been proposed and successfully implemented and used, along with numerous improvements. These algorithms work with two types of data layout: horizontal and vertical. The former is the traditional layout (individuals as rows and items as columns) and is more widely used because it is simpler to work with, whereas the latter brings important reductions in computation. Standard frequent itemset mining algorithms have a high computational complexity and, given the massive datasets now available, new approaches have been proposed in the literature that implement mining algorithms in parallel, distributed and, more recently, Cloud Computing paradigms.
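To make the two layouts concrete, the following minimal Python sketch converts a small horizontal dataset into a vertical one and computes the support of an itemset by intersecting transaction-id sets. The function names `to_vertical` and `support` and the toy transactions are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def support(itemset, vertical):
    """Support of an itemset = size of the intersection of its items' tidsets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0

# Hypothetical horizontal dataset: one set of items per transaction.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
vertical = to_vertical(transactions)
print(sorted(vertical["a"]))           # tids of transactions containing "a"
print(support({"a", "b"}, vertical))   # number of transactions with both items
```

In the vertical layout, counting the support of an itemset needs only set intersections, with no further scan of the transactions. This is one common way to read the "computation reductions" attributed to that layout.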

To overcome these computational drawbacks, in this paper we propose Apriori_V, a new parallel algorithm for frequent itemset mining from a vertical data layout, implemented on the MapReduce platform. Apriori_V brings significant improvements related to (1) the use of the vertical data layout with an Apriori-like strategy, which reduces the number of operations by eliminating several Apriori-specific tasks such as pruning, and (2) a decrease in the underlying complexity and thus in the execution time.
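The abstract does not give the details of Apriori_V, so the following is only a generic sketch of level-wise (Apriori-like) mining over a vertical layout, with the first pass written as in-process map and reduce functions. It is not the authors' algorithm: the names `map_phase`, `reduce_phase`, `next_level` and `mine` are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transactions):
    """Map: emit (item, tid) pairs from horizontal records."""
    for tid, items in enumerate(transactions):
        for item in items:
            yield item, tid

def reduce_phase(pairs, min_support):
    """Reduce: group tids per item and keep the frequent 1-itemsets."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return {(item,): tids for item, tids in tidsets.items()
            if len(tids) >= min_support}

def next_level(frequent):
    """Join k-itemsets that share a (k-1)-prefix and intersect their tidsets."""
    candidates = {}
    for a, b in combinations(sorted(frequent), 2):
        if a[:-1] == b[:-1]:
            candidates[a + (b[-1],)] = frequent[a] & frequent[b]
    return candidates

def mine(transactions, min_support):
    """Return {itemset (sorted tuple): support} for all frequent itemsets."""
    level = reduce_phase(map_phase(transactions), min_support)
    result = {}
    while level:
        result.update({k: len(t) for k, t in level.items()})
        level = {k: t for k, t in next_level(level).items()
                 if len(t) >= min_support}
    return result

# Hypothetical toy dataset, not taken from the paper.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in sorted(mine(transactions, 2).items()):
    print(itemset, count)
```

The sketch has no separate subset-pruning pass: each candidate's exact support comes directly from the tidset intersection, so an infrequent candidate is discarded by the support test alone. Whether this corresponds to how Apriori_V eliminates pruning is an assumption.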

Notes

1. https://hadoop.apache.org/
2. https://spark.apache.org/

Acknowledgements

We would like to gratefully thank Dimitris Kotzinos (ETIS - ENSEA/University of Cergy-Pontoise/CNRS 8051) for his contributions and support during this work.

Author information

Authors and Affiliations

ETIS Laboratory, ENSEA/University of Cergy-Pontoise/CNRS 8051, Cergy-Pontoise, France
Tao-Yuan Jen, Claudia Marinica & Abir Ghariani

Corresponding author

Correspondence to Claudia Marinica.

Editor information

Editors and Affiliations

Lab. ETIS, Sciences Informatiques, Université de Cergy-Pontoise, Pontoise, France
Dimitrios Kotzinos
HELP University, Kuala Lumpur, Malaysia
Yeow Wei Choong
LRI, University of Paris South, Orsay, France
Nicolas Spyratos
Information Science, Knowledge Media Lab, Hokkaido University, Sapporo, Hokkaido, Japan
Yuzuru Tanaka

About this paper

Cite this paper

Jen, TY., Marinica, C., Ghariani, A. (2016). Mining Frequent Itemsets with Vertical Data Layout in MapReduce. In: Kotzinos, D., Choong, Y., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration and Personalization. ISIP 2014. Communications in Computer and Information Science, vol 497. Springer, Cham. https://doi.org/10.1007/978-3-319-38901-1_5

DOI: https://doi.org/10.1007/978-3-319-38901-1_5
Published: 06 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38900-4
Online ISBN: 978-3-319-38901-1
eBook Packages: Computer Science (R0)
