Abstract
Mining frequent itemsets is an essential data mining problem. In the big data era, databases are becoming so large that traditional algorithms do not scale well. One approach to this issue is to parallelize the mining algorithm, which, however, remains a challenge that has not yet been well addressed. In this paper, we propose Peclat, a MapReduce-based algorithm that parallelizes the vertical mining algorithm Eclat with three improvements. First, Peclat introduces a hybrid vertical data format to represent the data, which saves both space and time in the mining process. Second, Peclat adopts the pruning technique of the Apriori algorithm to improve the efficiency of breadth-first search. Third, Peclat employs an ordering of itemsets that helps balance the workloads. Extensive experiments demonstrate that Peclat significantly outperforms existing MapReduce-based algorithms.
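To make the ideas in the abstract more concrete, the following is a minimal single-machine sketch, not the authors' Peclat implementation. It shows three of the ingredients the abstract names: the vertical data format (each itemset mapped to the set of transaction IDs, or tidset, that contain it), breadth-first (level-wise) search in which a candidate's support is the size of the intersection of two tidsets, and Apriori pruning, which discards any (k+1)-candidate that has an infrequent k-subset. The functions `map_phase`, `reduce_phase`, `vertical_format`, and `level_wise_eclat` are hypothetical names; the map/reduce step is only simulated in plain Python, not run on Hadoop. The sketch uses plain tidsets rather than Peclat's hybrid format, whose details the abstract does not give, and it performs no workload balancing.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(tid, items):
    """Hypothetical mapper: emit (item, tid) for every item in one transaction."""
    for item in items:
        yield item, tid


def reduce_phase(pairs):
    """Hypothetical reducer: group tids by item to build each item's tidset."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return tidsets


def vertical_format(transactions):
    """Convert horizontal transactions into {frozenset([item]): tidset}."""
    pairs = (pair
             for tid, items in enumerate(transactions)
             for pair in map_phase(tid, items))
    return {frozenset([item]): tids for item, tids in reduce_phase(pairs).items()}


def level_wise_eclat(transactions, min_support):
    """Breadth-first vertical mining with Apriori subset pruning.

    Returns a dict mapping each frequent itemset (a frozenset) to its support count.
    """
    # Level 1: frequent single items together with their tidsets.
    level = {itemset: tids
             for itemset, tids in vertical_format(transactions).items()
             if len(tids) >= min_support}
    frequent = {itemset: len(tids) for itemset, tids in level.items()}
    k = 1
    while level:
        next_level = {}
        # Sorting only makes the iteration order deterministic.
        keys = sorted(level, key=lambda s: sorted(s))
        for a, b in combinations(keys, 2):
            candidate = a | b
            # Keep only joins that grow the itemset by exactly one item.
            if len(candidate) != k + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(frozenset(sub) not in level
                   for sub in combinations(candidate, k)):
                continue
            # Vertical support counting: tidset(a | b) = tidset(a) & tidset(b).
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        frequent.update({s: len(t) for s, t in next_level.items()})
        level = next_level
        k += 1
    return frequent
```

As an example, consider the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` with a minimum support of 3. The sketch returns the three single items (support 4 each) and the three pairs (support 3 each). It excludes `{a,b,c}`, whose intersected tidset has only two transactions. In a MapReduce setting, the level-wise loop is the part one would distribute. How Peclat partitions and orders that work is not described in the abstract and is not modeled here.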
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61272306), and the Zhejiang Provincial Natural Science Foundation of China (LY12F02024).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, J., Wu, Y., Zhou, Q., Fung, B.C.M., Chen, F., Yu, B. (2015). Parallel Eclat for Opportunistic Mining of Frequent Itemsets. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol. 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_27
DOI: https://doi.org/10.1007/978-3-319-22849-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)