CLS-Miner: efficient and effective closed high-utility itemset mining

Dam, Thu-Lan; Li, Kenli; Fournier-Viger, Philippe; Duong, Quang-Huy

doi:10.1007/s11704-016-6245-4

Research Article
Published: 11 April 2019

Volume 13, pages 357–381 (2019)
Frontiers of Computer Science

Thu-Lan Dam^1,2,
Kenli Li^1,3,4,
Philippe Fournier-Viger^5 &
Quang-Huy Duong^6

Abstract

High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time-consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), a concise and lossless representation of all HUIs, were recently proposed. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-of-the-art CHUD and CHUI-Miner algorithms, and that CLS-Miner scales linearly.
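
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what a closed high-utility itemset is. It is not the CLS-Miner algorithm and uses none of its pruning strategies. It assumes the usual definitions: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and an itemset is closed if no proper superset has the same support. The toy database, the profit table, and the function names (`utility_list`, `closed_high_utility_itemsets`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "b": 1, "c": 1},
    {"b": 3, "c": 2},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def utility_list(itemset):
    """Return (transaction id, utility) pairs for transactions containing itemset.

    This is a simplified stand-in for a utility-list: it keeps only the
    per-transaction utility, not the remaining-utility field used for pruning.
    """
    items = set(itemset)
    return [
        (tid, sum(t[i] * PROFIT[i] for i in items))
        for tid, t in enumerate(TRANSACTIONS)
        if items.issubset(t)
    ]


def closed_high_utility_itemsets(min_util):
    """Enumerate every itemset (exponential; for illustration only)."""
    all_items = sorted(PROFIT)
    info = {}  # itemset -> (support, total utility)
    for k in range(1, len(all_items) + 1):
        for combo in combinations(all_items, k):
            entries = utility_list(combo)
            if entries:
                info[frozenset(combo)] = (len(entries), sum(u for _, u in entries))

    result = {}
    for itemset, (support, util) in info.items():
        if util < min_util:
            continue  # not a high-utility itemset
        # Closed: no proper superset occurs in exactly the same number of transactions.
        has_equal_superset = any(
            itemset < other and support == other_support
            for other, (other_support, _) in info.items()
        )
        if not has_equal_superset:
            result[itemset] = util
    return result


if __name__ == "__main__":
    for itemset, util in closed_high_utility_itemsets(12).items():
        print(sorted(itemset), util)
```

With a minimum utility of 12, the itemset {a} is high-utility (utility 15) but is not closed, because {a, b} appears in the same two transactions. This is the sense in which the closed representation is more concise than the full set of HUIs.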

Acknowledgements

The research was funded by the National Natural Science Foundation of China (Grant Nos. 61133005, 61432005, 61370095, 61472124, 61202109, and 61472126), and the International Science and Technology Cooperation Program of China (2015DFA11240 and 2014DFBS0010). T.-L. Dam was also partially supported by the science research fund of Hanoi University of Industry, Hanoi, Vietnam.

Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Thu-Lan Dam & Kenli Li
Faculty of Information Technology, Hanoi University of Industry, Hanoi, Vietnam
Thu-Lan Dam
CIC of HPC, National University of Defense Technology, Changsha, 410073, China
Kenli Li
National Supercomputing Center in Changsha, Changsha, 410082, China
Kenli Li
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, China
Philippe Fournier-Viger
Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Quang-Huy Duong

Corresponding author

Correspondence to Kenli Li.

Additional information

Thu-Lan Dam received the BS and MS degrees in computer science from Hanoi University of Science and Technology, Vietnam in 2004 and 2009, respectively. She is currently working toward the PhD degree in computer science at the College of Computer Science and Electronic Engineering, Hunan University, China. Her research interests are optimization, operational research, data mining and knowledge discovery.

Kenli Li received the PhD degree in computer science from Huazhong University of Science and Technology, China in 2003. He was a visiting scholar at the University of Illinois at Urbana-Champaign, USA from 2004 to 2005. He is currently a full professor of computer science and technology at Hunan University, China, and deputy director of the National Supercomputing Center in Changsha. His major research areas include parallel and distributed processing, cloud computing, scheduling algorithms, computer simulation, biological computing, and big data management. He has published more than 100 research papers in international conferences and journals such as IEEE-TC, IEEE-TPDS, JPDC, ICPP, and CCGrid. He is an outstanding member of CCF. He is a member of the IEEE and serves on the editorial board of IEEE Transactions on Computers.

Philippe Fournier-Viger is a full professor at the School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, China. He received his PhD degree in cognitive computer science from the University of Quebec in Montreal, Canada in 2010. He has published more than 85 research papers in refereed international conferences and journals. His research interests include data mining, pattern mining, text mining, intelligent tutoring systems, knowledge representation and cognitive modeling. He is the founder of the popular SPMF open-source data mining library.

Quang-Huy Duong received the MS degree in computer science from Hunan University, China in 2016. He is currently working toward the PhD degree in computer science at the Faculty of Information Technology, Mathematics and Electrical Engineering, Norwegian University of Science and Technology, Norway. His research interests are optimization, data stream mining and bioinformatics.
