PPS: Parallel Pincer Search for Mining Frequent Itemsets Based on Spark

  • Conference paper
Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Abstract

Association rule mining is one of the prominent techniques for discovering relations between the items of a transactional dataset. The mining process is simplified by considering only the frequent itemsets. Pincer search is a frequent itemset mining method that combines top-down and bottom-up search to get the benefits of both. The top-down search reduces the number of candidates examined in each pass, which saves considerable computing resources. In this work, we present Parallel Pincer Search (PPS), a distributed implementation of Pincer search on the Spark framework. We adapted the search algorithm to the Spark framework so that it runs in parallel. Spark offers several features suited to iterative algorithms, such as in-memory execution, efficient data structures, and improved fault tolerance. We implemented PPS on a Spark cluster, ran it on multiple datasets, and analysed its performance.
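
To make the two-directional idea concrete, the following is a minimal sequential sketch in plain Python, not the authors' PPS implementation and not Spark code. It is deliberately simplified: it keeps every frequent level for candidate generation, so the recovery procedure of the full Pincer search is not needed, and it only uses the top-down set to skip counting and to discard hopeless candidates. All function and variable names (`pincer_search`, `update_mfcs`, `mfcs`, `mfs`) and the toy transactions are assumptions made for illustration. The `support_counts` pass is the step one would presumably distribute across a cluster, but that mapping is likewise an assumption.

```python
from itertools import combinations


def support_counts(transactions, itemsets):
    """One pass over the data: count the transactions containing each itemset."""
    counts = {s: 0 for s in itemsets}
    for t in transactions:
        for s in itemsets:
            if s <= t:
                counts[s] += 1
    return counts


def update_mfcs(mfcs, infrequent):
    """Top-down step: shrink MFCS until no element contains an infrequent itemset."""
    for s in infrequent:
        for m in [m for m in mfcs if s <= m]:
            mfcs.discard(m)
            for item in s:
                reduced = m - {item}
                if reduced and not any(reduced <= other for other in mfcs):
                    mfcs.add(reduced)


def next_candidates(level, k):
    """Bottom-up step: Apriori join producing (k+1)-itemsets from frequent k-itemsets."""
    out = set()
    for a, b in combinations(level, 2):
        u = a | b
        if len(u) == k + 1 and all(frozenset(c) in level for c in combinations(u, k)):
            out.add(u)
    return out


def pincer_search(transactions, min_support):
    """Return the maximal itemsets contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    mfcs = {items}      # maximum frequent candidate set (top-down frontier)
    mfs = set()         # MFCS elements confirmed frequent
    frequent = set()    # every frequent itemset confirmed so far
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        # Subsets of a confirmed MFS element are frequent without counting.
        known = {c for c in candidates if any(c <= m for m in mfs)}
        counts = support_counts(transactions, (candidates - known) | (mfcs - mfs))
        mfs |= {m for m in mfcs - mfs if counts[m] >= min_support}
        level = known | {c for c in candidates - known if counts[c] >= min_support}
        frequent |= level
        update_mfcs(mfcs, candidates - level)
        # Keep only new candidates still covered by some MFCS element.
        candidates = {c for c in next_candidates(level, k)
                      if any(c <= m for m in mfcs)}
        k += 1
    found = frequent | mfs
    return {m for m in found if not any(m < other for other in found)}


# Hypothetical toy data: five shopping baskets, absolute support threshold of 2.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]
result = pincer_search(baskets, min_support=2)
print([sorted(m) for m in result])  # [['bread', 'butter', 'milk']]
```

In this toy run the three-item set is confirmed frequent during the second pass, through the top-down set, before the bottom-up search reaches level three; the third pass then needs no counting at all.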

Author information

Corresponding author

Correspondence to Ramesh Dharavath.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Sethi, K.K., Dharavath, R., Nyakotey, S. (2018). PPS: Parallel Pincer Search for Mining Frequent Itemsets Based on Spark. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_35

  • DOI: https://doi.org/10.1007/978-3-319-60618-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60617-0

  • Online ISBN: 978-3-319-60618-7

  • eBook Packages: Engineering, Engineering (R0)
