Parallel High Utility Itemset Mining Algorithm on the Spark

  • Conference paper
Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2012)

Abstract

In high utility itemset mining, considering both internal and external utility values gives a more comprehensive picture than traditional frequency-based methods. However, the added computational complexity and the large number of candidate itemsets make efficient mining on large-scale datasets difficult. To address these challenges, this paper proposes a parallel mining algorithm based on the Spark framework. The algorithm uses a vertical dataset structure to store and process the data, together with a utility table that stores each data item along with its corresponding transaction utility values. Because transaction utility values can be read directly from the utility table, the computation is simplified and its overhead is reduced. To further improve efficiency, the algorithm combines a prefix partitioning strategy with a minimum utility threshold, which reduces the number of candidate itemsets generated and therefore shrinks the search space. The algorithm is implemented on Spark, whose parallel and distributed processing allows it to mine high utility itemsets from large-scale datasets. Experimental results show that the proposed algorithm outperforms traditional approaches and handles large-scale datasets while maintaining high performance.
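The ideas named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the toy transactions, the profit values, and all function names are hypothetical, and transaction-weighted utility (TWU) pruning is assumed here as one common way to apply a minimum utility threshold. The sketch builds a vertical utility table (item to per-transaction utility), prunes items whose TWU falls below the threshold, and then enumerates candidates one prefix at a time. Each prefix partition is independent of the others, which is the property that would let a framework such as Spark process partitions in parallel.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: transaction id -> {item: purchased quantity}.
transactions = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "c": 3},
    3: {"b": 1, "c": 1, "d": 2},
}
# Hypothetical unit profits (external utility) per item.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def build_utility_table(transactions, profit):
    """Vertical layout: item -> {tid: utility of the item in that transaction}."""
    table = defaultdict(dict)
    for tid, items in transactions.items():
        for item, qty in items.items():
            table[item][tid] = qty * profit[item]
    return dict(table)


def transaction_utilities(table):
    """Total utility of each transaction, summed over its items."""
    tu = defaultdict(int)
    for tids in table.values():
        for tid, utility in tids.items():
            tu[tid] += utility
    return dict(tu)


def itemset_utility(itemset, table):
    """Utility of an itemset over the transactions that contain all its items."""
    common = set.intersection(*(set(table[item]) for item in itemset))
    return sum(table[item][tid] for item in itemset for tid in common)


def mine(transactions, profit, min_util):
    """Return {itemset: utility} for itemsets with utility >= min_util."""
    table = build_utility_table(transactions, profit)
    tu = transaction_utilities(table)
    # Assumed pruning rule: TWU of an item is the sum of the utilities of
    # the transactions containing it. It is an upper bound on the utility
    # of any itemset containing the item, so low-TWU items can be dropped.
    twu = {item: sum(tu[tid] for tid in tids) for item, tids in table.items()}
    promising = sorted(item for item, w in twu.items() if w >= min_util)

    results = {}
    # Prefix partitioning: partition i holds the itemsets whose smallest
    # item is promising[i]. Partitions share no candidates.
    for idx, prefix in enumerate(promising):
        suffix = promising[idx + 1:]
        # Exhaustive enumeration inside a partition, for brevity only.
        for k in range(len(suffix) + 1):
            for rest in combinations(suffix, k):
                itemset = (prefix,) + rest
                utility = itemset_utility(itemset, table)
                if utility >= min_util:
                    results[itemset] = utility
    return results


if __name__ == "__main__":
    print(mine(transactions, profit, 15))
    # prints {('a',): 15, ('a', 'c'): 19}
```

With a threshold of 15, item d is pruned because its TWU is 11, and only {a} and {a, c} reach the threshold. A real implementation would replace the exhaustive enumeration inside each partition with tighter bounds, since the number of candidates grows exponentially with the number of promising items.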

Acknowledgements

This work was supported in part by the Natural Science Foundation of Heilongjiang Province (No. LH2021F032).

Author information

Corresponding author

Correspondence to Lei Zhang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, C., Zhang, L., Sun, A. (2024). Parallel High Utility Itemset Mining Algorithm on the Spark. In: Sun, Y., Lu, T., Wang, T., Fan, H., Liu, D., Du, B. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2023. Communications in Computer and Information Science, vol 2012. Springer, Singapore. https://doi.org/10.1007/978-981-99-9637-7_12

  • DOI: https://doi.org/10.1007/978-981-99-9637-7_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9636-0

  • Online ISBN: 978-981-99-9637-7

  • eBook Packages: Computer Science, Computer Science (R0)
