An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scan

Han, Meng; Cheng, Haodong; Zhang, Ni; Li, Xiaojuan; Wang, Le

doi:10.1007/s10115-022-01763-9

Regular Paper
Published: 24 September 2022

Volume 65, pages 207–240, (2023)

Knowledge and Information Systems

Abstract

Mining high utility itemsets over data streams produces many redundant itemsets. To remove this redundancy, researchers have proposed mining closed high utility itemsets instead: they are far fewer than the complete set of high utility itemsets, yet the representation is lossless. However, the existing algorithm for mining closed high utility itemsets over data streams scans the dataset twice, and an algorithm that requires multiple scans cannot meet the real-time processing requirements of a streaming environment. To solve this problem, this paper proposes a new algorithm, CHUIDS_OSc, that scans the original dataset only once to mine closed high utility itemsets over data streams. CHUIDS_OSc uses a new utility-list structure that can quickly construct and update batch information without rescanning the original dataset. In addition, effective pruning strategies are applied to speed up the closed itemset mining process and eliminate potential low utility candidates. Experimental evaluations show the efficiency and feasibility of the algorithm for scanning and processing datasets. In terms of running time, it outperforms previously proposed closed high utility itemset mining algorithms that require multiple scans over data streams.
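To make the notion of a closed high utility itemset concrete, the following is a minimal brute-force sketch in Python. It is not CHUIDS_OSc and uses neither the paper's utility-list structure nor its pruning strategies, which the abstract does not specify. It assumes the definitions commonly used in the high utility itemset mining literature: the utility of an itemset is the sum of its items' utilities over all transactions that contain it, an itemset is high utility if that sum reaches a threshold, and it is closed if no proper superset has the same support. The toy batch `TRANSACTIONS`, the threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy batch: each transaction maps an item to its utility in
# that transaction (e.g. purchase quantity times unit profit, pre-multiplied).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "c": 1, "d": 2},
    {"b": 8, "d": 6},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset.issubset(t))


def utility(itemset, transactions):
    """Sum of the itemset's item utilities over the transactions containing it."""
    return sum(
        sum(t[item] for item in itemset)
        for t in transactions
        if itemset.issubset(t)
    )


def closed_high_utility_itemsets(transactions, min_util):
    """Enumerate every itemset and keep those that are closed and high utility.

    Exponential in the number of items; intended only to illustrate the
    definitions on a tiny example, not as a practical mining algorithm.
    """
    items = sorted({item for t in transactions for item in t})
    candidates = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    supp = {x: support(x, transactions) for x in candidates}

    result = {}
    for x in candidates:
        if supp[x] == 0:
            continue
        # Support never increases when items are added, so checking the
        # one-item extensions is enough to decide closedness.
        is_closed = all(supp[x | {i}] < supp[x] for i in items if i not in x)
        u = utility(x, transactions)
        if is_closed and u >= min_util:
            result[x] = u
    return result


if __name__ == "__main__":
    for itemset, u in closed_high_utility_itemsets(TRANSACTIONS, 18).items():
        print(sorted(itemset), u)
```

With a threshold of 18, this toy batch has four high utility itemsets ({a}, {a, c}, {b, d}, {a, b, c}) but only three closed ones, because {a} always occurs together with c and is therefore absorbed by {a, c}. That reduction is the redundancy removal the abstract refers to.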

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62062004), the Ningxia Natural Science Foundation Project (2020AAC03216), and the North Minzu University Innovation Project Fund (YCX20077).

Author information

Authors and Affiliations

School of Computer Science and Engineering, North Minzu University, Xixia District, Yinchuan, 750021, Ningxia Hui Autonomous Region, China
Meng Han, Haodong Cheng, Ni Zhang, Xiaojuan Li & Le Wang

Corresponding author

Correspondence to Meng Han.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, M., Cheng, H., Zhang, N. et al. An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scan. Knowl Inf Syst 65, 207–240 (2023). https://doi.org/10.1007/s10115-022-01763-9

Received: 22 December 2021
Revised: 03 September 2022
Accepted: 10 September 2022
Published: 24 September 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10115-022-01763-9
