PPS: Parallel Pincer Search for Mining Frequent Itemsets Based on Spark

Sethi, Krishan Kumar; Dharavath, Ramesh; Nyakotey, Samuel

doi:10.1007/978-3-319-60618-7_35

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition

Abstract

Association rule mining is one of the prominent techniques for discovering relations between the data items of a transactional dataset. The mining process is simplified by considering only the frequent itemsets. Pincer search is a frequent itemset mining method that combines top-down and bottom-up search techniques to gain the benefits of both. The top-down approach in Pincer search reduces the number of candidates in each pass and saves considerable computing resources. In this work, we present Parallel Pincer Search (PPS), a distributed implementation of Pincer search on the Spark framework. We adapted the search algorithm to the Spark framework so that it runs in parallel. Spark provides several features suited to iterative algorithms, such as in-memory execution, efficient data structures, and better fault tolerance. We implemented PPS on a Spark cluster, ran it on multiple datasets, and analysed its performance.
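To make the two directions of the search concrete, the following is a minimal sketch in plain Python and not the authors' Spark implementation. It assumes the top-down side is maintained as a maximum frequent candidate set (MFCS), updated as in the original Pincer-search paper by Lin and Kedem, and it imitates distributed support counting by counting per partition and then merging the counts. The function names (`count_partition`, `count_support`, `update_mfcs`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def count_partition(transactions, candidates):
    """Local support counts for one data partition (the map side)."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is contained in the transaction
                counts[c] += 1
    return counts


def count_support(partitions, candidates):
    """Merge the per-partition counts into global counts (the reduce side)."""
    local = (count_partition(p, candidates) for p in partitions)
    return reduce(lambda a, b: a + b, local, Counter())


def update_mfcs(mfcs, infrequent):
    """Top-down step: shrink the MFCS so that no element contains an
    itemset already found infrequent by the bottom-up search."""
    mfcs = set(mfcs)
    for s in infrequent:
        # Snapshot the elements that contain s, since mfcs changes below.
        for m in [m for m in mfcs if s <= m]:
            mfcs.discard(m)
            for e in s:
                reduced = m - {e}
                # Keep only itemsets not already covered by another element.
                if not any(reduced <= other for other in mfcs):
                    mfcs.add(reduced)
    return mfcs


# Toy data: two partitions of transactions over the items 1..6.
partitions = [
    [frozenset({1, 2, 3}), frozenset({1, 2})],
    [frozenset({2, 3}), frozenset({1, 3})],
]
candidates = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 6})]
support = count_support(partitions, candidates)

# Suppose the bottom-up pass found {1, 6} and {3, 6} to be infrequent.
start = {frozenset({1, 2, 3, 4, 5, 6})}
result = update_mfcs(start, [frozenset({1, 6}), frozenset({3, 6})])
print(sorted(sorted(m) for m in result))  # [[1, 2, 3, 4, 5], [2, 4, 5, 6]]
```

In a Spark setting, the per-partition counting would presumably map onto a transformation over the transaction RDD followed by a keyed reduction, while the MFCS update would run on the driver between passes. That mapping is an assumption of this sketch and is not stated in the abstract.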

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, 826004, Jharkhand, India
Krishan Kumar Sethi, Ramesh Dharavath & Samuel Nyakotey

Corresponding author

Correspondence to Ramesh Dharavath.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
VIT University, Vellore, Tamil Nadu, India
Aswani Kumar Cherukuri
School of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal
Ana Maria Madureira
Universiti Teknikal Malaysia Melaka, Durian Tunggal, Malaysia
Azah Kamilah Muda

About this paper

Cite this paper

Sethi, K.K., Dharavath, R., Nyakotey, S. (2018). PPS: Parallel Pincer Search for Mining Frequent Itemsets Based on Spark. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_35

DOI: https://doi.org/10.1007/978-3-319-60618-7_35
Published: 19 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)
