An Improved FP-Growth Algorithm Based on SOM Partition

  • Conference paper
Data Science (ICPCSEE 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 727)

Abstract

The FP-growth algorithm mines frequent itemsets, from which association rules are derived, without generating candidate sets. It has high practical value in many fields. However, it is a memory-resident algorithm: its FP-tree must fit in main memory, so it can only handle data sets of limited size and struggles with massive ones. This paper proposes an improved FP-growth algorithm whose core idea is to partition a massive data set into smaller subsets that are processed separately. First, systematic sampling is used to extract representative samples from the large data set, and a SOM (Self-Organizing Map) cluster analysis is performed on these samples. Then, the large data set is partitioned into several subsets according to the clustering results. Finally, FP-growth is executed on each subset to mine association rules. The experimental results show that the improved algorithm reduces memory consumption and shortens mining time, improving both the capacity and the efficiency of massive-data processing.
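The first step, systematic sampling, can be sketched as follows. This is a minimal illustration under our own assumptions: the sampling interval is k = ⌊N/n⌋ for N records and n requested samples, and the start is drawn at random from the first interval. The function name `systematic_sample` is hypothetical, and the abstract does not state the sample size or interval the authors used.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(data: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Systematic sampling: pick a random start in [0, k), then take every k-th record.

    The interval k = len(data) // n is an assumption of this sketch; the paper's
    actual sample size and interval are not stated in the abstract.
    """
    if n <= 0 or not data:
        return []
    n = min(n, len(data))
    k = len(data) // n                          # sampling interval
    start = random.Random(seed).randrange(k)    # random offset within the first interval
    return [data[start + i * k] for i in range(n)]
```

For example, `systematic_sample(transactions, 1000)` on a list of one million transactions returns every 1000th transaction, beginning at a random offset.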
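The SOM clustering and partitioning steps could look roughly like the sketch below. It assumes each transaction is encoded as a binary item-presence vector, trains a small rectangular SOM on the sampled vectors with a Gaussian neighborhood and linearly decaying learning rate and radius, and then assigns every transaction in the full data set to its best-matching unit (BMU), so that each unit yields one subset. The encoding, grid size, and training schedule are illustrative choices rather than the paper's configuration, and `SOM`, `to_vector`, and `partition` are hypothetical names.

```python
import math
import random
from typing import Dict, Hashable, Iterable, List, Sequence

Vector = List[float]


def to_vector(transaction: Iterable[Hashable], items: Sequence[Hashable]) -> Vector:
    """Encode a transaction as a 0/1 presence vector over a fixed item order (assumed encoding)."""
    present = set(transaction)
    return [1.0 if item in present else 0.0 for item in items]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SOM:
    """Minimal rectangular self-organizing map (hypothetical sketch, not the paper's setup)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.grid = [(r, c) for r in range(rows) for c in range(cols)]
        self.weights = [[rng.random() for _ in range(dim)] for _ in self.grid]

    def bmu(self, x: Sequence[float]) -> int:
        """Index of the best-matching unit (closest weight vector)."""
        return min(range(len(self.weights)), key=lambda j: sq_dist(self.weights[j], x))

    def train(self, samples: Sequence[Vector], epochs: int = 50,
              lr0: float = 0.5, sigma0: float = 1.0) -> None:
        total = epochs * len(samples)
        step = 0
        for _ in range(epochs):
            for x in samples:
                frac = step / total
                lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
                sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighborhood radius
                br, bc = self.grid[self.bmu(x)]
                for j, (r, c) in enumerate(self.grid):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                    w = self.weights[j]
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])
                step += 1


def partition(transactions, items, som: SOM) -> Dict[int, list]:
    """Assign every transaction of the full data set to its BMU; each unit is one subset."""
    parts: Dict[int, list] = {}
    for t in transactions:
        parts.setdefault(som.bmu(to_vector(t, items)), []).append(t)
    return parts
```

In this sketch the grid size bounds the number of subsets (at most rows × cols); units that attract no transactions simply produce no subset.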
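Within each subset, the mining step is ordinary FP-growth. The sketch below is a compact textbook-style version, not the authors' implementation: it uses an absolute support threshold and takes transactions as `(items, count)` pairs so that conditional pattern bases can reuse the same tree-building code. The abstract does not say how results from different subsets are combined, so no merge step is shown; `fp_growth` and its helpers are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Optional

Itemset = FrozenSet[Hashable]


class _Node:
    """One FP-tree node: an item, its count, a parent link, and children keyed by item."""

    def __init__(self, item: Optional[Hashable], parent: Optional["_Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[Hashable, "_Node"] = {}


def _build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and a header table."""
    counts: Dict[Hashable, int] = defaultdict(int)
    for items, cnt in transactions:
        for it in set(items):
            counts[it] += cnt
    freq = {it: c for it, c in counts.items() if c >= min_support}
    root = _Node(None, None)
    header: Dict[Hashable, List[_Node]] = defaultdict(list)
    for items, cnt in transactions:
        # keep frequent items only, ordered by descending support (ties broken by item)
        path = sorted((it for it in set(items) if it in freq), key=lambda it: (-freq[it], it))
        node = root
        for it in path:
            child = node.children.get(it)
            if child is None:
                child = _Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += cnt
            node = child
    return freq, header


def fp_growth(transactions, min_support: int,
              suffix: Itemset = frozenset()) -> Dict[Itemset, int]:
    """Return {itemset: support} for every itemset whose absolute support >= min_support."""
    result: Dict[Itemset, int] = {}
    freq, header = _build_tree(transactions, min_support)
    for item in sorted(freq, key=lambda it: (freq[it], it)):
        itemset = suffix | {item}
        result[itemset] = freq[item]
        # conditional pattern base: the prefix path above every node that holds `item`
        cond = []
        for node in header[item]:
            path, p = [], node.parent
            while p.parent is not None:  # stop at the root
                path.append(p.item)
                p = p.parent
            if path:
                cond.append((path, node.count))
        result.update(fp_growth(cond, min_support, itemset))
    return result
```

Calling `fp_growth([(t, 1) for t in subset], min_support)` on one subset returns its frequent itemsets with their supports; association rules would then be generated from these itemsets.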

Author information

Correspondence to Kuikui Jia.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jia, K., Liu, H. (2017). An Improved FP-Growth Algorithm Based on SOM Partition. In: Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 727. Springer, Singapore. https://doi.org/10.1007/978-981-10-6385-5_15

  • DOI: https://doi.org/10.1007/978-981-10-6385-5_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6384-8

  • Online ISBN: 978-981-10-6385-5

  • eBook Packages: Computer Science, Computer Science (R0)
