Abstract
Frequent itemset mining is a data mining technique that extracts new and interesting information from a dataset in the form of sets of items. Several algorithms, such as Apriori, FP-Growth and Eclat, have been proposed, implemented and used successfully, along with numerous improvements. These algorithms work with two types of data layout: horizontal and vertical. The former is the traditional layout (individuals as rows and items as columns) and is more widely used because of its simplicity, but the latter brings important reductions in computation. The standard frequent itemset mining algorithms have a high computational complexity and, given the massive datasets now available, new approaches have been proposed in the literature that implement mining algorithms in parallel, distributed and, more recently, Cloud Computing paradigms.
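To make the difference between the two layouts concrete, the following minimal sketch converts a small horizontal dataset into a vertical one, where each item is mapped to the set of transaction identifiers (a "tidset") that contain it. The function name to_vertical and the toy transactions are assumptions chosen for illustration and do not come from the paper.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


# Hypothetical horizontal dataset: one set of items per transaction (row).
horizontal = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

vertical = to_vertical(horizontal)

# In the vertical layout, the support of an itemset is the size of the
# intersection of its items' tidsets; no rescan of the rows is needed.
support_bread_milk = len(vertical["bread"] & vertical["milk"])
```

With this layout, counting the support of a candidate itemset becomes a set intersection rather than a pass over all transactions, which is the kind of computation reduction the vertical layout is credited with.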
To overcome these computational drawbacks, in this paper we propose Apriori_V, a new parallel algorithm for frequent itemset mining from a vertical data layout, implemented on the MapReduce platform. Apriori_V brings significant improvements related to (1) the use of the vertical data layout with an Apriori-like strategy, which reduces the number of operations by eliminating several Apriori-specific tasks such as pruning, and (2) a decrease in the underlying complexity and thus in the execution time.
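The abstract does not give the details of Apriori_V, so the following is only a generic, single-process sketch of a level-wise (Apriori-like) search over a vertical layout, not the authors' algorithm and not a MapReduce job. The function name mine_vertical, the toy data and the support threshold are assumptions. The sketch joins k-itemsets that share a common prefix and obtains the exact support of each candidate by intersecting tidsets, so it needs no separate subset-pruning pass.

```python
from itertools import combinations


def mine_vertical(vertical, min_support):
    """Level-wise frequent itemset mining over tidsets (illustrative sketch)."""
    # Level 1: keep the single items whose tidset is large enough.
    current = {
        (item,): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    frequent = {}
    while current:
        # Record the support of every frequent itemset found at this level.
        frequent.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Join only itemsets that share the same (k-1)-item prefix.
            if a[:-1] != b[:-1]:
                continue
            # The intersection of the tidsets gives the candidate's support.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[a + (b[-1],)] = tids
        current = next_level
    return frequent


# Hypothetical vertical dataset: item -> set of transaction ids.
example_vertical = {
    "bread": {0, 1, 2},
    "milk": {0, 2, 3},
    "butter": {1, 2},
}

result = mine_vertical(example_vertical, min_support=2)
```

In a MapReduce setting, the per-candidate intersections at each level are independent of one another and could be distributed across workers; how Apriori_V actually partitions this work is described in the paper itself, not here.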
Acknowledgements
We would like to gratefully thank Dimitris Kotzinos (ETIS - ENSEA/University of Cergy-Pontoise/CNRS 8051) for his contributions and support during this work.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jen, TY., Marinica, C., Ghariani, A. (2016). Mining Frequent Itemsets with Vertical Data Layout in MapReduce. In: Kotzinos, D., Choong, Y., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration and Personalization. ISIP 2014. Communications in Computer and Information Science, vol 497. Springer, Cham. https://doi.org/10.1007/978-3-319-38901-1_5
DOI: https://doi.org/10.1007/978-3-319-38901-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38900-4
Online ISBN: 978-3-319-38901-1
eBook Packages: Computer Science (R0)