Efficient algorithm for mining high average-utility itemsets in incremental transaction databases

Applied Intelligence

Abstract

In this paper, we present a novel algorithm for efficiently mining high average-utility itemsets (HAUIs) from incremental databases, whose size can grow dynamically. Previous algorithms follow the basic framework of an Apriori-like approach, so they must scan the database multiple times to generate candidate itemsets and identify valid itemsets level by level. This drawback is critical for incremental databases, because each scan becomes more expensive as the database grows. In contrast, the proposed algorithm builds a compact tree structure that maintains all necessary information, which avoids such excessive database scans during mining. Previous algorithms also generate a huge number of unnecessary candidate itemsets at each level, owing to the naive combination-based candidate generation of the Apriori-like approach, which produces candidate itemsets of length k+1 by simply joining itemsets of length k. Our algorithm instead employs the pattern-growth approach, which generates only essential candidate itemsets. To preserve the compactness of its tree structure throughout the incremental mining process, the algorithm applies a restructuring technique. The performance evaluation shows that our algorithm is faster and consumes less memory than competing algorithms.
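The cost of the Apriori-like framework criticized above can be made concrete with a small level-wise miner. This is a minimal sketch of that baseline, not the proposed tree-based, pattern-growth algorithm. It assumes the average-utility definitions commonly used in the HAUI literature: the utility of item i in transaction T is its quantity times its unit profit, the average utility of itemset X in T is the utility of X in T divided by |X|, and au(X) sums this over every transaction containing X. Pruning uses the average-utility upper bound auub(X), the sum over transactions containing X of the largest single-item utility in that transaction; it never underestimates au(X) and never increases for supersets. The toy database, the absolute threshold `minutil`, and all function names are hypothetical.

```python
from itertools import combinations

# Sketch of an Apriori-like (level-wise) HAUI baseline, NOT the paper's algorithm.
# au/auub follow the definitions commonly used in the HAUI literature (assumption).

# Hypothetical toy input: each transaction maps item -> purchased quantity
# (internal utility); PROFIT maps item -> unit profit (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
    {"a": 1, "b": 1, "c": 2},
]


def evaluate_level(candidates, db, profit):
    """One full database scan: accumulate au(X) and auub(X) for every candidate."""
    au = {c: 0.0 for c in candidates}
    ub = {c: 0 for c in candidates}
    for txn in db:
        tmu = max(q * profit[i] for i, q in txn.items())  # largest item utility in txn
        for c in candidates:
            if all(i in txn for i in c):
                # Average utility of X in this transaction: u(X, T) / |X|.
                au[c] += sum(txn[i] * profit[i] for i in c) / len(c)
                ub[c] += tmu
    return au, ub


def apriori_join(promising):
    """Naive Apriori-style generation: merge two sorted k-itemsets sharing their
    first k-1 items, keeping the result only if every k-subset is promising."""
    out = []
    for a, b in combinations(sorted(promising), 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            if all(s in promising for s in combinations(cand, len(cand) - 1)):
                out.append(cand)
    return out


def levelwise_haui(db, profit, minutil):
    """Level-wise HAUI mining with auub pruning; returns (HAUIs, number of scans)."""
    candidates = [(i,) for i in sorted({i for txn in db for i in txn})]
    hauis, scans = {}, 0
    while candidates:
        au, ub = evaluate_level(candidates, db, profit)
        scans += 1  # every level costs one more pass over the whole database
        hauis.update({c: au[c] for c in candidates if au[c] >= minutil})
        # au(X) <= auub(X), and auub never grows for supersets, so any itemset
        # with auub < minutil can be dropped together with all its supersets.
        promising = {c for c in candidates if ub[c] >= minutil}
        candidates = apriori_join(promising)
    return hauis, scans


hauis, scans = levelwise_haui(DB, PROFIT, minutil=9)
print(hauis, scans)
```

With `minutil=9`, the itemset ("a", "b") passes the upper bound but is not an HAUI, and the candidate ("a", "b", "c") is generated only to be discarded after a third full scan. Repeated scans and join-based candidates of this kind are the two costs that the proposed tree structure and pattern-growth approach aim to avoid.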

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Acknowledgments

This research was supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF No. 20152062051 and NRF No. 20155054624), and by the Business for Academic-industrial Cooperative Establishments, funded by the Korea Small and Medium Business Administration in 2015 (Grant No. C0261068).

Author information

Correspondence to Unil Yun.

About this article

Cite this article

Kim, D., Yun, U. Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47, 114–131 (2017). https://doi.org/10.1007/s10489-016-0890-z

  • DOI: https://doi.org/10.1007/s10489-016-0890-z
