An Improved FP-Growth Algorithm Based on SOM Partition

Jia, Kuikui; Liu, Haibin

doi:10.1007/978-981-10-6385-5_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 727)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

FP-growth mines frequent patterns, and from them association rules, without generating candidate itemsets, and it has high practical value in many fields. However, it is a memory-resident algorithm: the FP-tree it builds must fit in main memory, so it can handle only small data sets and breaks down on massive ones. This paper proposes an improved FP-growth algorithm whose core idea is to partition a massive data set into small subsets that are processed separately. First, systematic sampling is used to extract a representative sample from the large data set, and the sample is clustered with a SOM (Self-Organizing Map). Then, the large data set is partitioned into several subsets according to the clustering result. Finally, FP-growth is run on each subset to mine association rules. Experimental results show that the improved algorithm reduces memory consumption and shortens mining time, improving both the capacity and the efficiency of processing massive data sets.
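
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, transactions are encoded as binary item-presence vectors, the SOM is a small one-dimensional chain written in pure Python, and a brute-force itemset counter stands in for FP-growth in the final step.

```python
import random
from collections import Counter
from itertools import combinations

def systematic_sample(transactions, step, start=0):
    """Take every `step`-th transaction, beginning at index `start`."""
    return transactions[start::step]

def encode(transaction, items):
    """Binary item-presence vector (an assumed encoding, not from the paper)."""
    return [1.0 if item in transaction else 0.0 for item in items]

def best_unit(weights, vec):
    """Index of the SOM unit whose weight vector is closest to `vec`."""
    dists = [sum((w - v) ** 2 for w, v in zip(unit, vec)) for unit in weights]
    return dists.index(min(dists))

def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM (a chain of `n_units` units) on the sample vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                        # learning rate decays linearly
        radius = max(1, n_units // 2) * frac   # neighbourhood shrinks over time
        for vec in vectors:
            bmu = best_unit(weights, vec)
            for j, unit in enumerate(weights):
                if abs(j - bmu) <= radius:     # update the winner and its chain neighbours
                    for k in range(dim):
                        unit[k] += lr * (vec[k] - unit[k])
    return weights

def partition(transactions, items, weights):
    """Assign every transaction of the full data set to its closest SOM unit."""
    parts = {}
    for t in transactions:
        parts.setdefault(best_unit(weights, encode(t, items)), []).append(t)
    return parts

def frequent_itemsets(transactions, min_support, max_size=3):
    """Stand-in for FP-growth: brute-force counting of itemsets up to `max_size`."""
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(t), size))
    threshold = min_support * len(transactions)
    return {iset: c for iset, c in counts.items() if c >= threshold}

# Toy data (made up for illustration): two loosely separated shopping patterns.
transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"},
                {"beer", "chips"}, {"beer", "chips", "salsa"}] * 25
items = sorted(set().union(*transactions))

sample = systematic_sample(transactions, step=5)                      # step 1: sample
som = train_som([encode(t, items) for t in sample], n_units=2)        # step 2: cluster
parts = partition(transactions, items, som)                           # step 3: partition
results = {u: frequent_itemsets(p, min_support=0.5) for u, p in parts.items()}  # step 4: mine
```

Only the sample is used to train the map, so the clustering cost does not grow with the full data set; each transaction is then routed with a single nearest-unit lookup. In a real setting the stand-in miner would be replaced by an actual FP-growth implementation, and each subset would be small enough for its FP-tree to fit in memory.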

Author information

Authors and Affiliations

China Aerospace Academy of Systems Science and Engineering, Beijing, 100048, China
Kuikui Jia & Haibin Liu

Corresponding author

Correspondence to Kuikui Jia.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Beiji Zou
Central South University, Changsha, China
Min Li
Harbin Institute of Technology, Harbin, China
Hongzhi Wang
Harbin University of Science and Technology, Harbin, China
Xianhua Song
Harbin University of Science and Technology, Harbin, China
Wei Xie
Harbin Sea of Clouds and Computer Technology, Harbin, China
Zeguang Lu

About this paper

Cite this paper

Jia, K., Liu, H. (2017). An Improved FP-Growth Algorithm Based on SOM Partition. In: Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 727. Springer, Singapore. https://doi.org/10.1007/978-981-10-6385-5_15

DOI: https://doi.org/10.1007/978-981-10-6385-5_15
Published: 16 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6384-8
Online ISBN: 978-981-10-6385-5
eBook Packages: Computer Science, Computer Science (R0)
