Abstract
Association rule mining is one of the prominent techniques for discovering relations between the data items of a transactional dataset. The mining process is simplified by considering only the frequent itemsets. Pincer search is a frequent itemset mining method that combines top-down and bottom-up search to get the benefits of both. The top-down approach in Pincer search reduces the number of candidates in each pass and saves considerable computing resources. In this work, we present Parallel Pincer Search (PPS), a distributed implementation of Pincer search on the Spark framework. We have adapted the search algorithm to the Spark framework so that it runs in parallel. Spark provides several features suited to iterative algorithms, such as in-memory execution, efficient data structures, and better fault tolerance. We implemented PPS on a Spark cluster and analysed its performance on multiple datasets.
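The combination of bottom-up and top-down search described above can be illustrated with a small sketch. This is not the authors' PPS implementation: it is a plain-Python toy in which lists stand in for Spark partitions, and the names `count_partition`, `global_support`, and `update_mfcs`, the toy transactions, and the `min_support` value are all assumptions made for illustration. It shows per-partition counting followed by a merge, and how the top-down set of maximal frequent candidates (MFCS) lets the bottom-up search skip candidates.

```python
from collections import Counter
from itertools import combinations


def count_partition(transactions, candidates):
    """Local support counts for one partition (the 'map' side)."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is contained in the transaction
                counts[c] += 1
    return counts


def global_support(partitions, candidates):
    """Merge the per-partition counts (the 'reduce' side)."""
    total = Counter()
    for part in partitions:
        total.update(count_partition(part, candidates))
    return total


def update_mfcs(mfcs, infrequent):
    """Top-down step: shrink every MFCS element that contains an infrequent itemset."""
    for s in infrequent:
        shrunk = set()
        for m in mfcs:
            if s <= m:
                # Drop one item of s at a time so that m no longer contains s.
                shrunk.update(m - {e} for e in s)
            else:
                shrunk.add(m)
        # Keep only the maximal elements.
        mfcs = {m for m in shrunk if not any(m < other for other in shrunk)}
    return mfcs


# Toy data (assumption): two partitions of transactions.
partitions = [
    [frozenset("abc"), frozenset("abcd")],
    [frozenset("abc"), frozenset("bd")],
]
min_support = 3
items = sorted(set().union(*(t for part in partitions for t in part)))

# Pass 1: count the 1-itemsets (bottom-up) and the initial MFCS element (top-down).
singletons = [frozenset([i]) for i in items]
mfcs = {frozenset(items)}
support1 = global_support(partitions, singletons + list(mfcs))
frequent_1 = [c for c in singletons if support1[c] >= min_support]
infrequent_1 = [c for c in singletons if support1[c] < min_support]
mfcs_after_pass1 = update_mfcs(mfcs, infrequent_1)

# Pass 2: count the refined MFCS; any 2-itemset candidate that is a subset of a
# frequent MFCS element is already known to be frequent and need not be counted.
pairs = [a | b for a, b in combinations(frequent_1, 2)]
support2 = global_support(partitions, list(mfcs_after_pass1))
maximal_frequent = {m for m in mfcs_after_pass1 if support2[m] >= min_support}
remaining_2_candidates = [
    p for p in pairs if not any(p <= m for m in maximal_frequent)
]

print(sorted("".join(sorted(m)) for m in maximal_frequent))
print(len(remaining_2_candidates))
```

In a Spark setting, `count_partition` would plausibly correspond to a per-partition transformation and `global_support` to a keyed reduction over the resulting counts, but that mapping is an assumption rather than something stated in the abstract.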
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Sethi, K.K., Dharavath, R., Nyakotey, S. (2018). PPS: Parallel Pincer Search for Mining Frequent Itemsets Based on Spark. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_35
DOI: https://doi.org/10.1007/978-3-319-60618-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)