Abstract
FP-growth is an algorithm for mining association rules without generating candidate sets, and it has high practical value in many fields. However, it is a memory-resident algorithm: it can handle only small data sets and breaks down on massive ones. This paper improves FP-growth by partitioning a massive data set into small data sets that are processed separately. First, systematic sampling is used to extract representative samples from the large data set, and SOM (Self-Organizing Map) cluster analysis is performed on these samples. Then, the large data set is partitioned into several subsets according to the clustering results. Finally, FP-growth is run on each subset to mine association rules. Experimental results show that the improved algorithm reduces memory consumption and shortens mining time, improving both the capacity and the efficiency of processing massive data.
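The pipeline outlined in the abstract (systematic sampling, SOM clustering of the sample, partitioning of the full data set, then mining each subset) might be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: all function names and parameters are hypothetical, transactions are assumed to be sets of items encoded as binary vectors, the SOM is a tiny one-dimensional map, and a brute-force itemset counter stands in for FP-growth to keep the example short. How results from the subsets are combined is not stated in the abstract and is not addressed here.

```python
import random
from collections import Counter
from itertools import combinations

# All names below are hypothetical; the abstract does not specify an API.

def systematic_sample(transactions, step, start=0):
    """Systematic sampling: take every `step`-th transaction from `start`."""
    return transactions[start::step]

def encode(transaction, items):
    """Assumed encoding: binary vector, 1.0 where the item is present."""
    return [1.0 if item in transaction else 0.0 for item in items]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, vec):
    """Index of the SOM unit whose weight vector is closest to `vec`."""
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k], vec))

def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a small 1-D SOM; learning rate and radius shrink over time."""
    rng = random.Random(seed)
    weights = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(1.0, n_units / 2) * frac
        for vec in vectors:
            bmu = best_unit(weights, vec)
            for k in range(n_units):
                if abs(k - bmu) <= radius:
                    influence = lr / (1 + abs(k - bmu))
                    weights[k] = [w + influence * (v - w)
                                  for w, v in zip(weights[k], vec)]
    return weights

def partition(transactions, items, weights):
    """Assign every transaction to the subset of its closest SOM unit."""
    parts = {}
    for t in transactions:
        parts.setdefault(best_unit(weights, encode(t, items)), []).append(t)
    return parts

def frequent_itemsets(transactions, min_count, max_size=2):
    """Brute-force stand-in for FP-growth (NOT FP-growth itself)."""
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(t), size))
    return {s: c for s, c in counts.items() if c >= min_count}

if __name__ == "__main__":
    data = [{"bread", "milk"}, {"bread", "butter"}, {"beer", "chips"},
            {"beer", "chips", "salsa"}, {"bread", "milk", "butter"},
            {"beer", "salsa"}] * 5
    items = sorted(set().union(*data))
    sample = systematic_sample(data, step=3)
    som = train_som([encode(t, items) for t in sample], n_units=2)
    for unit, subset in sorted(partition(data, items, som).items()):
        print(unit, len(subset), frequent_itemsets(subset, min_count=5))
```

Because each subset is mined on its own, only one subset's structures need to be held in memory at a time, which is the memory saving the abstract refers to. In practice the stand-in counter would be replaced by an actual FP-growth implementation.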
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jia, K., Liu, H. (2017). An Improved FP-Growth Algorithm Based on SOM Partition. In: Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 727. Springer, Singapore. https://doi.org/10.1007/978-981-10-6385-5_15
DOI: https://doi.org/10.1007/978-981-10-6385-5_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6384-8
Online ISBN: 978-981-10-6385-5
eBook Packages: Computer Science (R0)