Parallel High Utility Itemset Mining Algorithm on the Spark

Conference paper
First Online: 05 January 2024

pp 167–181

Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2023)

Chengyan Li,
Lei Zhang &
Anqi Sun

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2012)

Included in the following conference series:

CCF Conference on Computer Supported Cooperative Work and Social Computing

Abstract

In high-utility itemset mining, considering both internal utility (e.g., purchase quantity) and external utility (e.g., unit profit) gives a more comprehensive view of item importance than traditional frequency-based methods. However, the more complex utility computations and the large number of candidate itemsets make efficient mining of large-scale datasets challenging. To address these challenges, this paper proposes a parallel mining algorithm based on the Spark framework. The algorithm uses a vertical dataset structure to store and process the data efficiently. A utility table stores the data items together with their corresponding transaction utility values, so the algorithm can access transaction utilities directly, which simplifies the computation and reduces overhead. To further improve efficiency, the algorithm combines a prefix partitioning strategy with a minimum utility threshold; this reduces the number of candidate itemsets generated, shrinks the search space, and speeds up mining. Implemented on Spark, the algorithm exploits Spark's distributed parallel processing and scalability to mine high-utility itemsets from large-scale datasets efficiently. Experimental results demonstrate the effectiveness and efficiency of the proposed algorithm: it outperforms traditional approaches and handles large-scale datasets while maintaining high performance.
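
The abstract names four ingredients: utilities computed from internal and external values, a vertical utility table, a minimum utility threshold used for pruning, and a prefix-based division of the search space. The following is a minimal, single-machine Python sketch of how these pieces could fit together; it is not the authors' implementation. The pruning bound (transaction-weighted utilization, TWU, as introduced by the Two-Phase algorithm), the TWU-ascending item order, the toy data, and all function and variable names are assumptions made for illustration, and the Spark distribution step is only indicated in a comment.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical types for this sketch: a transaction maps item -> quantity
# (internal utility); `profit` maps item -> unit profit (external utility).
Transaction = Dict[str, int]
Vertical = Dict[str, Dict[int, int]]


def build_utility_tables(db: List[Transaction], profit: Dict[str, int]) -> Tuple[Vertical, Dict[int, int]]:
    """Vertical layout item -> {tid: u(item, T_tid)} plus a table tid -> TU(T_tid)."""
    vertical: Dict[str, Dict[int, int]] = defaultdict(dict)
    tu: Dict[int, int] = {}
    for tid, trans in enumerate(db):
        tu[tid] = 0
        for item, qty in trans.items():
            u = qty * profit[item]  # u(i, T) = internal utility * external utility
            vertical[item][tid] = u
            tu[tid] += u
    return dict(vertical), tu


def mine_partition(prefix_item, order, vertical, tu, minutil) -> Dict[FrozenSet[str], int]:
    """Mine every high-utility itemset whose first item (w.r.t. `order`) is prefix_item.

    Partitions share no itemsets, so each one could be handled by a separate worker.
    """
    results: Dict[FrozenSet[str], int] = {}

    def extend(itemset, tid_utils, candidates):
        # tid_utils: tid -> u(itemset, T_tid), only for transactions containing itemset
        utility = sum(tid_utils.values())
        if utility >= minutil:
            results[frozenset(itemset)] = utility
        for k, item in enumerate(candidates):
            column = vertical[item]
            joined = {t: u + column[t] for t, u in tid_utils.items() if t in column}
            # TWU never increases as an itemset grows, so it is a safe pruning bound
            if sum(tu[t] for t in joined) >= minutil:
                extend(itemset + [item], joined, candidates[k + 1:])

    tail = order[order.index(prefix_item) + 1:]
    extend([prefix_item], dict(vertical[prefix_item]), tail)
    return results


def mine_hui(db, profit, minutil) -> Dict[FrozenSet[str], int]:
    vertical, tu = build_utility_tables(db, profit)
    twu = {item: sum(tu[t] for t in col) for item, col in vertical.items()}
    # Items whose TWU is below minutil cannot occur in any high-utility itemset
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    huis: Dict[FrozenSet[str], int] = {}
    # Sequential stand-in for the distributed step. On Spark this loop could become
    # something like sc.parallelize(order).flatMap(...) with the tables broadcast.
    for item in order:
        huis.update(mine_partition(item, order, vertical, tu, minutil))
    return huis


# Hypothetical toy data (not from the paper)
toy_db = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
toy_profit = {"a": 5, "b": 2, "c": 1}

if __name__ == "__main__":
    result = mine_hui(toy_db, toy_profit, minutil=16)
    print(sorted((sorted(k), v) for k, v in result.items()))
    # -> [(['a'], 20), (['a', 'b'], 16), (['a', 'c'], 17)]
```

Because every itemset belongs to exactly one partition (the one named by its first item in the chosen order), the per-partition results can be merged without deduplication, which is what makes this kind of prefix division convenient to parallelize.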

Acknowledgements

This work was supported in part by the Natural Science Foundation of Heilongjiang Province (No. LH2021F032).

Author information

Authors and Affiliations

Harbin University of Science and Technology, Harbin, 150080, China
Chengyan Li, Lei Zhang & Anqi Sun

Corresponding author

Correspondence to Lei Zhang.

Editor information

Editors and Affiliations

Shandong University, Jinan, China
Yuqing Sun
Fudan University, Shanghai, China
Tun Lu
Harbin Engineering University, Harbin, China
Tong Wang
Tongji University, Shanghai, China
Hongfei Fan
Guangdong University of Technology, Guangzhou, China
Dongning Liu
Tongji University, Shanghai, China
Bowen Du

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, C., Zhang, L., Sun, A. (2024). Parallel High Utility Itemset Mining Algorithm on the Spark. In: Sun, Y., Lu, T., Wang, T., Fan, H., Liu, D., Du, B. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2023. Communications in Computer and Information Science, vol 2012. Springer, Singapore. https://doi.org/10.1007/978-981-99-9637-7_12

DOI: https://doi.org/10.1007/978-981-99-9637-7_12
Published: 05 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9636-0
Online ISBN: 978-981-99-9637-7
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

China Computer Federation (CCF)
