Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams

Liu, Fang’ai; Wang, Qianqian; Wang, Xin

doi:10.1007/s10586-018-1859-y

Published: 30 January 2018

Volume 22, pages 6133–6141 (2019)

Cluster Computing



Abstract

With the rapid development of World Wide Web technology, complex and diverse data are growing explosively, and frequent itemset mining plays an essential role in analyzing them. Because the computing power of a single processor limits the mining of frequent itemsets in multiple data streams, an improved algorithm for Parallel Mining of Collaborative frequent itemsets in Multiple Data streams (PMCMD-Stream) is proposed. First, the algorithm compresses the potential and frequent itemsets into a CP-Tree in a single scan and applies an incremental method to insert or delete the related branches of the CP-Tree, so the databases need not be scanned repeatedly to generate many candidate frequent itemsets, which saves running time. Second, the parallelized algorithm runs in the MapReduce programming environment: because each candidate frequent itemset is independent, different candidate frequent itemsets can be processed concurrently by multiple computing machines. Finally, the valuable frequent itemsets, namely the global collaborative frequent itemsets, are obtained. The experimental results show that the PMCMD-Stream algorithm not only improves mining efficiency but also scales much better than the existing algorithms, making it suitable for discovering collaborative frequent itemsets in large-scale data streams.
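The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PMCMD-Stream implementation. The CP-Tree is replaced by a plain sliding-window counter, the MapReduce job is imitated with a local thread pool, and "collaborative" is assumed to mean "locally frequent in at least `min_streams` streams". All names and parameters (`WindowedItemsetCounter`, `map_stream`, `reduce_collaborative`, `mine`, `window_size`, `min_support`, `min_streams`, `max_len`) are hypothetical.

```python
from collections import Counter, deque
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


class WindowedItemsetCounter:
    """Sliding-window itemset counts for one stream (stand-in for a CP-Tree)."""

    def __init__(self, window_size, max_len=2):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len  # assumption: cap itemset length to keep the sketch small
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Incremental insert: each transaction is read exactly once.
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            # Incremental delete: retire the oldest transaction without a rescan.
            expired = self.window.popleft()
            for itemset in self._itemsets(expired):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]


def map_stream(stream_id, transactions, window_size, min_support):
    """Map step: emit (itemset, stream_id) for itemsets frequent in this stream."""
    counter = WindowedItemsetCounter(window_size)
    for transaction in transactions:
        counter.add(transaction)
    threshold = min_support * len(counter.window)
    return [(itemset, stream_id)
            for itemset, count in counter.counts.items()
            if count >= threshold]


def reduce_collaborative(mapped, min_streams):
    """Reduce step: keep itemsets that are frequent in at least min_streams streams."""
    streams_by_itemset = {}
    for pairs in mapped:
        for itemset, stream_id in pairs:
            streams_by_itemset.setdefault(itemset, set()).add(stream_id)
    return {itemset: ids
            for itemset, ids in streams_by_itemset.items()
            if len(ids) >= min_streams}


def mine(streams, window_size=3, min_support=0.6, min_streams=2):
    # Streams are independent, so the map step can run concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(
            lambda kv: map_stream(kv[0], kv[1], window_size, min_support),
            streams.items()))
    return reduce_collaborative(mapped, min_streams)


if __name__ == "__main__":
    demo = {
        "s1": [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a", "b"]],
        "s2": [["a", "b"], ["b", "c"], ["a", "b"]],
        "s3": [["c"], ["c", "d"], ["a", "c"]],
    }
    for itemset, ids in sorted(mine(demo).items()):
        print(itemset, sorted(ids))
```

In this toy run the itemset `("a", "c")` is frequent only in stream `s1`, so the reduce step drops it, while `("a", "b")` survives because it is frequent in both `s1` and `s2`. A real deployment would replace the thread pool with an actual MapReduce job and the counter with the compressed tree structure described in the paper.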

Acknowledgements

This work was supported by the following grants: the National Natural Science Foundation of China (No. 61572301 and No. 61772321), the Innovation Foundation of Science and Technology Development Center of Ministry of Education and New H3C Group (2017A15047), the Natural Science Foundation of Shandong Province (No. ZR2013FM008 and No. ZR2016FP07), and the Open Research Fund from Shandong Provincial Key Laboratory of Computer Network (No. SDKLCN-2016-01).

Author information

Authors and Affiliations

School of Information Science & Engineering, Shandong Normal University, No. 88 East Wenhua Road, Jinan, 250014, China
Fang’ai Liu, Qianqian Wang & Xin Wang

Corresponding author

Correspondence to Fang’ai Liu.

About this article

Cite this article

Liu, F., Wang, Q. & Wang, X. Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams. Cluster Comput 22 (Suppl 3), 6133–6141 (2019). https://doi.org/10.1007/s10586-018-1859-y

Received: 27 October 2017
Revised: 12 January 2018
Accepted: 14 January 2018
Published: 30 January 2018
Issue Date: May 2019
DOI: https://doi.org/10.1007/s10586-018-1859-y
