Abstract
High utility itemset mining considers both the internal and external utility of items, which gives a more comprehensive picture than traditional frequency-based methods. However, the added computational complexity and the large number of candidate itemsets make efficient mining on large-scale datasets difficult. To address these challenges, this paper proposes a parallel mining algorithm based on the Spark framework. The algorithm stores and processes the data in a vertical dataset structure. A utility table holds each data item together with its corresponding transaction utility values, so the algorithm can read transaction utility values directly, which simplifies the computation and reduces overhead. To further improve efficiency, the algorithm combines a prefix partitioning strategy with a minimum utility threshold; this reduces the number of candidate itemsets generated and shrinks the search space. The algorithm is implemented on Spark and uses its parallel processing and distributed computing capabilities to mine high utility itemsets from large-scale datasets. Experimental results show that the proposed algorithm is effective and efficient for high utility itemset mining, outperforming traditional approaches and handling large-scale datasets while maintaining high performance.
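The ideas named in the abstract can be illustrated with a minimal single-machine Python sketch. This is not the authors' implementation; it only assumes the standard definitions from the high utility itemset mining literature: the utility of an item in a transaction is its internal utility (quantity) times its external utility (unit profit), the transaction utility (TU) is the sum of those values over a transaction, and the transaction-weighted utility (TWU) of an itemset is the sum of TU over the transactions that contain it. The toy data, the function names (`build_utility_table`, `mine_partition`, `mine`), and the choice of the first item as the partition key are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: transaction id -> {item: quantity (internal utility)}.
TRANSACTIONS = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "c": 3},
    3: {"b": 1, "c": 1, "d": 2},
    4: {"a": 1, "d": 1},
}
# Hypothetical unit profits (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}


def build_utility_table(transactions, profit):
    """Return a vertical table (item -> {tid: utility}) and the TU of each tid."""
    table = defaultdict(dict)
    tu = {}
    for tid, items in transactions.items():
        tu[tid] = 0
        for item, qty in items.items():
            utility = qty * profit[item]
            table[item][tid] = utility
            tu[tid] += utility
    return dict(table), tu


def twu(tids, tu):
    """Transaction-weighted utility: an upper bound for every superset."""
    return sum(tu[t] for t in tids)


def mine_partition(prefix, prefix_utils, extensions, table, tu, min_util, out):
    """Depth-first search inside one prefix partition.

    prefix_utils maps tid -> utility of the prefix itemset in that transaction.
    """
    for i, item in enumerate(extensions):
        # Join on shared transaction ids, adding the new item's utility.
        joined = {t: u + table[item][t]
                  for t, u in prefix_utils.items() if t in table[item]}
        if twu(joined, tu) < min_util:
            continue  # no superset of this candidate can reach the threshold
        itemset = prefix + (item,)
        total = sum(joined.values())
        if total >= min_util:
            out[itemset] = total
        mine_partition(itemset, joined, extensions[i + 1:],
                       table, tu, min_util, out)


def mine(transactions, profit, min_util):
    table, tu = build_utility_table(transactions, profit)
    # Keep only items whose TWU meets the minimum utility threshold.
    items = sorted(i for i in table if twu(table[i], tu) >= min_util)
    out = {}
    # Each first item defines an independent prefix partition; in a
    # distributed setting these loops could run as separate tasks.
    for k, item in enumerate(items):
        total = sum(table[item].values())
        if total >= min_util:
            out[(item,)] = total
        mine_partition((item,), dict(table[item]), items[k + 1:],
                       table, tu, min_util, out)
    return out


result = mine(TRANSACTIONS, PROFIT, min_util=15)
```

Because the partitions share only the read-only utility table, they do not need to exchange intermediate results, which is the property that makes a prefix-based split a natural fit for a framework such as Spark.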
Acknowledgements
This work was supported in part by the Natural Science Foundation of Heilongjiang Province (No. LH2021F032).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, C., Zhang, L., Sun, A. (2024). Parallel High Utility Itemset Mining Algorithm on the Spark. In: Sun, Y., Lu, T., Wang, T., Fan, H., Liu, D., Du, B. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2023. Communications in Computer and Information Science, vol 2012. Springer, Singapore. https://doi.org/10.1007/978-981-99-9637-7_12
DOI: https://doi.org/10.1007/978-981-99-9637-7_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9636-0
Online ISBN: 978-981-99-9637-7
eBook Packages: Computer Science (R0)