Efficient algorithms to mine concise representations of frequent high utility occupancy patterns

Applied Intelligence

Abstract

Identifying all frequent high utility occupancy itemsets (FHUOIs) in a quantitative transaction dataset is an emerging task in data mining. By combining frequency and utility occupancy, these patterns suit many real-world applications: they not only reflect the interests of most users but also contribute a high proportion of the utility in their supporting transactions. Nonetheless, the set of all discovered FHUOIs may be very large, especially for large and dense datasets or for low values of the predefined minimum thresholds, which makes it challenging for users to analyze and use the obtained patterns. To address this issue, this paper proposes two novel algorithms named MaxCloFHUOIM and CloFHUOIM to extract compact representations of FHUOIs. The former simultaneously mines two concise representations of FHUOIs, namely all closed FHUOIs and all maximal FHUOIs, whereas the latter discovers only the closed FHUOIs, which provide a lossless summary of all FHUOIs. The proposed algorithms rely on a novel weak upper bound on utility occupancy to reduce the search space by quickly pruning itemsets with low utility occupancy. In addition, the algorithms integrate two new efficient strategies to prune non-closed FHUOI candidate branches early in the prefix search tree. Results from an in-depth experimental evaluation conducted on several benchmark real-life and synthetic quantitative datasets show that MaxCloFHUOIM and CloFHUOIM perform well in terms of runtime, memory usage, and scalability. In particular, the proposed algorithms are up to two orders of magnitude faster than a baseline algorithm.
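
To make the three pattern families named in the abstract concrete, the following brute-force Python sketch enumerates FHUOIs on a tiny dataset and then filters the closed and maximal ones. It is not MaxCloFHUOIM or CloFHUOIM and uses no pruning. The dataset `DB`, the function names, and the thresholds are hypothetical, and the sketch assumes the relative utility is \(u_{rel}(A,t)=u(A,t)/tu(t)\), where \(tu(t)\) is the total utility of transaction \(t\).

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction maps item -> utility in that transaction.
DB = [
    {"a": 4, "b": 2, "c": 2},
    {"a": 3, "b": 3},
    {"a": 2, "c": 6},
    {"b": 5, "c": 5},
]

def cover(itemset, db):
    """Transactions that contain every item of the itemset (rho in the proofs)."""
    return [t for t in db if all(i in t for i in itemset)]

def supp(itemset, db):
    """Support counted as a number of transactions."""
    return len(cover(itemset, db))

def occ(itemset, db):
    """Utility occupancy: mean over the cover of u(A, t) / tu(t) (assumed definition)."""
    ts = cover(itemset, db)
    if not ts:
        return 0.0
    return sum(sum(t[i] for i in itemset) / sum(t.values()) for t in ts) / len(ts)

def mine_fhuoi(db, ms, muo):
    """Exhaustively enumerate itemsets with supp >= ms and occ >= muo."""
    items = sorted({i for t in db for i in t})
    found = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = supp(cand, db)
            if s >= ms and occ(cand, db) >= muo:
                found[frozenset(cand)] = s
    return found

def closed_and_maximal(fhuoi):
    """Closed: no proper superset FHUOI with equal support. Maximal: no proper superset FHUOI."""
    closed = {a for a, s in fhuoi.items()
              if not any(a < b and fhuoi[b] == s for b in fhuoi)}
    maximal = {a for a in fhuoi if not any(a < b for b in fhuoi)}
    return closed, maximal

fhuoi = mine_fhuoi(DB, ms=2, muo=0.45)
closed, maximal = closed_and_maximal(fhuoi)
print(sorted(map(sorted, fhuoi)), sorted(map(sorted, maximal)))
```

On this toy input the itemset {c} is a closed FHUOI but not a maximal one, which illustrates why the maximal representation is smaller while only the closed one keeps the support information needed for a lossless summary.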

Data availability

The datasets used to evaluate the algorithms and the synthetic dataset generator used in this study are public and were obtained from [48]. The generated synthetic datasets will be shared on request.

Notes

  1. iff means “if and only if”.

References

  1. Agrawal R, Srikant R (1994) Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94). pp 487–499

  2. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390

  3. Nguyen LTT, Vo B, Mai T, Nguyen TL (2018) A weighted approach for class association rules. In: Sieminski A, Kozierkiewicz A, Nunez M, Ha Q (eds) Modern approaches for intelligent information and database systems. Studies in computational intelligence, vol 769. Springer, Cham, pp 213–222

  4. Djenouri Y, Belhadi A, Fournier-Viger P, Fujita H (2018) Mining diversified association rules in big datasets: A cluster/GPU/genetic approach. Inf Sci (N Y) 459:117–134

  5. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree. Data Min Knowl Discov 8:53–87

  6. Vo B, Le T, Coenen F, Hong TP (2016) Mining frequent itemsets using the N-list and subsume concepts. Int J Mach Learn Cybern 7:253–265

  7. Moturi S, Tirumalarao SN, Vemuru S (2018) Frequent itemset mining algorithm: a survey. J Theor Appl Inf Technol 96:744–755

  8. Tang L, Zhang L, Luo P, Wang M (2012) Incorporating occupancy into frequent pattern mining for high quality pattern recommendation. In: ACM International Conference Proceeding Series. pp 75–84

  9. Zhang L, Luo P, Tang L et al (2015) Occupancy-based frequent pattern mining. ACM Trans Knowl Discov Data 10:1–33

  10. Deng ZH (2020) Mining high occupancy itemsets. Future Gener Comput Syst 102:222–229

  11. Zhang K, Zhang Y, Wang Z (2020) Frequent Pattern Mining Based on Occupation and Correlation. In: ICEICT 2020 - IEEE 3rd International Conference on Electronic Information and Communication Technology. pp 161–166

  12. Kim H, Ryu T, Lee C et al (2022) Mining high occupancy patterns to analyze incremental data in intelligent systems. ISA Trans 131:460–475

  13. Nguyen LTT, Mai T, Pham GH et al (2023) An efficient method for mining high occupancy itemsets based on equivalence class and early pruning. Knowl Based Syst 267:110441

  14. Tseng VS, Wu C, Shie B, Yu PS (2010) UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. pp 253–262

  15. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of ACM International Conference on Information and Knowledge Management. pp 55–64

  16. Liu Y, Wang L, Feng L, Jin B (2021) Mining high utility itemsets based on pattern growth without candidate generation. Mathematics 9:1–22

  17. Shen B, Wen Z, Zhao Y, Zhou D, Zheng W (2016) OCEAN: Fast discovery of high utility occupancy itemsets. In: Bailey J, Khan L, Washio T, Dobbie G, Huang J, Wang R (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham, pp 354–365

  18. He J, Han X, Wang J, Zhang K (2022) Efficient high-utility occupancy itemset mining algorithm on massive data. Expert Syst Appl 210:118329

  19. Gan W, Lin JCW, Fournier-Viger P et al (2020) HUOPM: High-Utility Occupancy Pattern Mining. IEEE Trans Cybern 50:1195–1208

  20. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216

  21. Yao H, Hamilton HJ, Butz CJ (2004) A Foundational Approach to Mining Itemset Utilities from Databases. In: Proceedings of the Fourth SIAM International Conference on Data Mining. pp 482–486

  22. Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on Utility-based data mining. pp 90–99

  23. Zida S, Fournier-Viger P, Lin JC-W, et al (2015) EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining. In: Proceedings of Mexican International Conference on Artificial Intelligence (MICAI 2015). pp 530–546

  24. Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183

  25. Qu J-F, Fournier-Viger P, Liu M et al (2023) Mining high utility itemsets using prefix trees and utility vectors. IEEE Trans Knowl Data Eng 35:10224–10236

  26. Yin J, Zheng Z, Cao L (2012) USpan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp 660–668

  27. Truong-Chi T, Fournier-Viger P (2019) A survey of high utility sequential pattern mining. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-Utility Pattern Mining. Studies in Big Data, vol 51. Springer, Cham, pp 97–129

  28. Truong T, Duong H, Le B et al (2021) Efficient algorithms for mining frequent high utility sequences with constraints. Inf Sci (N Y) 568:239–264

  29. Nguyen A, Nguyen NT, Nguyen LTT, Vo B (2023) Mining inter-sequence patterns with Itemset constraints. Appl Intell 53:19827–19842

  30. Zhang C, Yang Y, Du Z et al (2024) HUSP-SP: faster utility mining on sequence data. ACM Trans Knowl Discov Data 18:1–21

  31. Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. pp 2526–2530

  32. Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) New tighter upper bounds for mining high average-utility itemsets. In: ACM International Conference Proceeding Series. pp 27–32

  33. Truong T, Duong H, Le B et al (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183:104847

  34. Kim H, Yun U, Baek Y et al (2021) Efficient list-based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (N Y) 543:85–105

  35. Li G, Shang T, Zhang Y (2023) Efficient mining high average-utility itemsets with effective pruning strategies and novel list structure. Appl Intell 53:6099–6118

  36. Truong T, Duong H, Le B, Fournier-Viger P (2020) EHAUSM: An efficient algorithm for high average utility sequence mining. Inf Sci (N Y) 515:302–323

  37. Singh K, Kumar R, Biswas B (2022) High average-utility itemsets mining: a survey. Appl Intell 52:3901–3938

  38. Tseng VS, Wu C, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27:726–739

  39. Chen CM, Chen L, Gan W et al (2021) Discovering high utility-occupancy patterns from uncertain data. Inf Sci (N Y) 546:1208–1229

  40. Vemulapalli S, Mogalla S (2021) High utility-occupancy sequential pattern mining algorithm based on utility-occupancy framework. Int J Eng Trends Technol 69:228–235

  41. Wu CW, Fournier-Viger P, Gu JY, Tseng VS (2019) Mining compact high utility itemsets without candidate generation. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-utility pattern mining. Studies in big data, vol 51. Springer, Cham, pp 279–302

  42. Duong H, Hoang T, Tran T et al (2022) Efficient algorithms for mining closed and maximal high utility itemsets. Knowl Based Syst 257:109921

  43. Wu C-W, Fournier-Viger P, Gu J-Y, Tseng VS (2015) Mining Closed + High Utility Itemsets without Candidate Generation. In: Conference on Technologies and Applications of Artificial Intelligence. pp 187–194

  44. Fournier-Viger P, Zida S, Lin JC-W, et al (2016) EFIM-Closed: Fast and Memory Efficient Discovery of Closed High-Utility Itemsets. In: International Conference on Machine Learning and Data Mining in Pattern Recognition. pp 199–213

  45. Nguyen LTT, Vu VV, Lam MTH et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 495:78–99

  46. Fournier-Viger P, Wu C-W, Tseng VS (2014) Novel Concise Representations of High Utility Itemsets Using Generator Patterns. In: International Conference on Advanced Data Mining and Applications. pp 30–43

  47. Fournier-Viger P, Gomariz A, Campos M (2014) Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information. In: Proceedings of 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD ’2014. pp 40–52

  48. Fournier-Viger P, Gomariz A, Soltani A et al (2014) SPMF: A java open-source pattern mining library. J Mach Learn Res 15:3569–3573

Funding

This work was supported by Dalat University, Vietnam [Decision number: 1574/QĐ-ĐHĐL, 2023].

Author information

Contributions

Hai Duong, Huy Pham and Tin Truong contributed equally to the study conception and design. Material preparation, data collection and analysis were performed by Hai Duong, Huy Pham, and Tin Truong. The first draft of the manuscript was written by Hai Duong, Huy Pham, Tin Truong and Philippe Fournier-Viger. All authors read and approved the content of the manuscript.

Corresponding author

Correspondence to Philippe Fournier-Viger.

Ethics declarations

Compliance with ethical standards

The authors have no conflict of interest. This research was carried out using public data. No experiments were conducted with humans or animals.

Competing interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Proposition 1

  a.

    For any \(B,C\in \left[A\right]\) with \(B\subseteq C\subseteq t\), we have \({u}_{rel}\left(B, t\right)\le {u}_{rel}\left(C, t\right)\) and \(\rho \left(B\right)=\rho (C)\), hence \(supp\left(B\right)=supp(C)\). Therefore, \(occ\left(B\right)=\frac{1}{supp\left(B\right)}{\sum }_{t\in \rho \left(B\right)}{u}_{rel}\left(B, t\right)\le \frac{1}{supp\left(C\right)}{\sum }_{t\in \rho \left(C\right)}{u}_{rel}\left(C, t\right)=occ(C)\), i.e. \({\mathcal{M}}_{\sim }(occ)\).

  b.

    For any \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), we first prove that \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\ne \varnothing\). Indeed, if \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \({C}_{0}\stackrel{\scriptscriptstyle{\text{def}}}{=}A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Otherwise, \(A\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), so \(\exists {C}_{1}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \({C}_{1}\supset A\) and \(supp({C}_{1})=supp(A)\). By a similar argument, if \({C}_{1}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \({C}_{1}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Otherwise, \({C}_{1}\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), so \(\exists {C}_{2}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \({C}_{2}\supset {C}_{1}\supset A\) and \(supp({C}_{2})=supp({C}_{1})=supp(A)\), and so on. Since \(\mathcal{A}\) is finite, this process must terminate, i.e. there always exists \(n\in {\mathbb{N}}\): \({C}_{n}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) with \({C}_{n}\supset \dots \supset {C}_{1}\supset A\) and \(supp\left({C}_{n}\right)=\dots =supp({C}_{1})=supp(A).\) Thus, \({C}_{n}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\), i.e. \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\ne \varnothing .\) Furthermore, assume by contradiction that there exist two different and non-nested closed FHUO itemsets \(B\) and \(C\) in \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Then, for \(D\stackrel{\scriptscriptstyle{\text{def}}}{=}B\cup C\supset B\supseteq A\), since \(B,C\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and \(supp(C)=supp(B)=supp(A),\) we have \(\rho (C)=\rho (B)=\rho (A)\) and \(\rho (D)=\rho (B)\cap \rho (C)=\rho (A).\) Therefore, \(D\in [A],\) so \(supp(D)=supp(A)\ge ms\), i.e. \(D\in \mathcal{F}\mathcal{I}.\) Because \({\mathcal{M}}_{\sim }(occ),\) \(occ(D)\ge occ(B)\ge muo\), i.e. \(\exists D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \(D\supset B\) and \(supp(D)=supp(A)\). But this contradicts the fact that \(B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}.\) Hence, \(|\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)|=1.\) Let \(C\stackrel{\scriptscriptstyle{\text{def}}}{=}closure(A),\) then by the previous argument, \(C\) is the maximum itemset and has the highest utility occupancy in \([A]\).

  c.

    Let \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\stackrel{\scriptscriptstyle{\text{def}}}{=}\left\{A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \right| \nexists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\}\). Firstly, since \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\) \(\iff\) [\(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(A\in \mathcal{I}\mathcal{S}\) and \(\nexists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\)] \(\iff\) \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \cap \mathcal{C}\mathcal{I}\), we have \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}=\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \cap \mathcal{C}\mathcal{I}\). Secondly, we will prove that \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}=\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). The first inclusion \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) is evident. Conversely, for any \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and \((\nexists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\wedge supp\left(B\right)=supp(A))\)(*), assume by contradiction that \(A\notin \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Then, \(\exists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\), so \(\rho (B)=\rho (A)\), i.e. \(A,B\in [A]\). Since \({\mathcal{M}}_{\sim }(occ)\) and \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(occ\left(B\right)\ge occ\left(A\right)\ge muo\) and \(supp\left(B\right)=supp(A)\ge ms\), so \(B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}.\) This contradicts (*). Hence, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Thus, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}=\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Furthermore, since \(supp\) is anti-monotonic, for any \(B=A+C\supset A\) with \(x\in C\ne \varnothing\) and \(A\subset B^{\prime}=A+\{x\}\subseteq B\), we have \(\rho (A)\supseteq \rho (B^{\prime})\supseteq \rho (B)\), so \(supp(A)\ge supp(B^{\prime})\ge supp(B)\). Additionally, if \(supp(A)=supp(B)\), then \(supp(A)=supp(B^{\prime})=supp(B)\). Hence, \(A\) is closed \(\iff\) [\(A\) has neither a \(1-forward\) nor a \(1-backward\)]. Thus, the last assertion is proven.
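
The last step of the proof reduces the closedness test to single-item extensions. The Python sketch below illustrates that reduction on a toy dataset and cross-checks it against a closure computed as the intersection of the covering transactions. The dataset, the function names, and the definition \(u_{rel}(A,t)=u(A,t)/tu(t)\) are assumptions made for illustration, and the helpers assume the itemset occurs in at least one transaction.

```python
# Hypothetical toy dataset: each transaction maps item -> utility.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 2},
    {"b": 3, "d": 1},
]
ITEMS = sorted({i for t in DB for i in t})

def cover(itemset):
    """Transactions containing every item of the itemset."""
    return [t for t in DB if all(i in t for i in itemset)]

def supp(itemset):
    return len(cover(itemset))

def occ(itemset):
    """Mean relative utility over the cover (assumes supp(itemset) > 0)."""
    ts = cover(itemset)
    return sum(sum(t[i] for i in itemset) / sum(t.values()) for t in ts) / len(ts)

def is_closed(itemset):
    """Closed iff no one-item extension keeps the same support."""
    s = supp(itemset)
    return all(supp(set(itemset) | {x}) != s for x in ITEMS if x not in itemset)

def closure(itemset):
    """Largest itemset with the same cover: intersection of covering transactions."""
    ts = cover(itemset)
    common = set(ts[0])
    for t in ts[1:]:
        common.intersection_update(t)
    return frozenset(common)

print(is_closed({"a"}), sorted(closure({"a"})), occ({"a"}), occ({"a", "b"}))
```

Here {a} is not closed because adding b keeps the support at 2, and its closure {a, b} has a utility occupancy at least as high, in line with parts a and b of the proposition.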

Appendix 2: Proof of Theorem 1

  a.

    For any proper FE \(B\) of \(A\), \(A\subset B=A\oplus E\in \mathcal{F}\mathcal{I}\) with \(E\ne \varnothing\), and any transaction \({t}_{i}\in \rho (B)\subseteq TS(A)\subseteq \rho (A)\), we have \(m\stackrel{\scriptscriptstyle{\text{def}}}{=}supp(A)\ge n\stackrel{\scriptscriptstyle{\text{def}}}{=}|TS(A)|\ge p\stackrel{\scriptscriptstyle{\text{def}}}{=}supp\left(B\right)\ge ms\) and \(u\left(B,{t}_{i}\right)=u\left(A,{t}_{i}\right)+u\left(E,{t}_{i}\right)\le u\left(A,{t}_{i}\right)+u\left(rem\left(A,{t}_{i}\right)\right)={ub}_{rem}(A,{t}_{i})\), so \({u}_{rel}\left(B, {t}_{i}\right)\le {ub}_{rem\_rel}\left(A,{t}_{i}\right)\). Since \(\mathcal{X}\stackrel{\scriptscriptstyle{\text{def}}}{=}\{{wubocc}_{i}, i=1..n\}\subseteq \mathcal{Y}\stackrel{\scriptscriptstyle{\text{def}}}{=}\{{\widehat{\Phi }}_{i}, i=1..m\}\), \(\mathcal{X}\downarrow\), \(\mathcal{Y}\downarrow\) and \(\rho (B)\subseteq TS(A)\), the sequence \(\{{S}_{k}\stackrel{\scriptscriptstyle{\text{def}}}{=}\frac{1}{k}{\sum }_{1\le i\le k}{wubocc}_{i},1\le k\le n\}\) is descending, so \(occ\left(B\right)\stackrel{\scriptscriptstyle{\text{def}}}{=}\frac{1}{p}{\sum }_{{t}_{i}\in \rho \left(B\right)}{u}_{rel}\left(B, {t}_{i}\right)\le \frac{1}{p}{\sum }_{{t}_{i}\in \rho \left(B\right)}{ub}_{rem\_rel}\left(A,{t}_{i}\right)\le {S}_{p}\le {S}_{ms}=wubocc(A)\le \frac{1}{ms}{\sum }_{1\le i\le ms}{\widehat{\Phi }}_{i}=\widehat{\Phi }(A)\).

  b.

    If \(wubocc\left(A\right)<muo\), then for any proper FE \(B\) of \(A\), \(occ\left(B\right)\le wubocc\left(A\right)<muo\), so \(B\) is always a low utility occupancy itemset.
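
The bound used in this proof can be sketched in Python as the mean of the \(ms\) largest relative remaining-utility bounds. The sketch is an illustration under stated assumptions, not the paper's implementation: items are ordered alphabetically, \(rem(A,t)\) is taken to be the items of \(t\) that follow the last item of \(A\), \(TS(A)\) is taken to be the transactions of \(\rho(A)\) with a non-empty \(rem(A,t)\), \(ms\) is an absolute support count, and the bound is set to 0 when \(|TS(A)|<ms\). The dataset and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction maps item -> utility.
DB = [
    {"a": 4, "b": 2, "c": 2},
    {"a": 3, "b": 3},
    {"a": 2, "c": 6},
    {"b": 5, "c": 5},
]
ITEMS = sorted({i for t in DB for i in t})

def cover(itemset, db):
    return [t for t in db if all(i in t for i in itemset)]

def occ(itemset, db):
    """Mean of u(A, t) / tu(t) over the cover (assumed definition of occupancy)."""
    ts = cover(itemset, db)
    return sum(sum(t[i] for i in itemset) / sum(t.values()) for t in ts) / len(ts)

def wubocc(itemset, db, ms):
    """Mean of the ms largest values of (u(A,t) + u(rem(A,t))) / tu(t) over TS(A)."""
    last = max(itemset)  # last item of A under the assumed alphabetical order
    bounds = []
    for t in cover(itemset, db):
        rem = [i for i in t if i > last]
        if rem:  # only such transactions can support a proper forward extension
            upper = sum(t[i] for i in itemset) + sum(t[i] for i in rem)
            bounds.append(upper / sum(t.values()))
    if len(bounds) < ms:
        return 0.0  # assumption: no frequent proper forward extension can exist
    bounds.sort(reverse=True)
    return sum(bounds[:ms]) / ms

print(wubocc(("b",), DB, ms=2))
```

With \(ms=2\), the only frequent forward extension of {b} is {b, c}, whose occupancy equals the bound, so with any \(muo\) above that value the whole branch of {b} could be skipped.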

Appendix 3: Proof of Theorem 2

We will prove that

\(C\) is a CFHUOI ⇔ [\(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(C\) has no \(1-forward\) and \((\nexists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\)].

 + “⇒”: For any CFHUOI \(C=A\oplus y\), i.e. \(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and (\(\nexists D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C)\))(*), assume to the contrary that \(C\) has a \(1-forward\) \(D=A\oplus y\oplus z\) or \((\exists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\). Then \(D\supset C\, and\, supp\left(D\right)=supp(C)\ge ms\), so \(D\in [C]\). Since \({\mathcal{M}}_{\sim }(occ)\) and \(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(occ(D)\ge occ(C)\ge muo\), i.e. \(D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). But this contradicts (*).

 + “⇐”: Suppose that [\(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(C\) has no \(1-forward\) and \((\nexists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\)], and assume to the contrary that \(C=A\oplus y\) is not a CFHUOI, i.e. \(\exists E\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:E=F\oplus y\oplus G\supset C\) with \(y\succ F\supseteq A\) and \(G\succ y\) (such that \(F\supset A\) or \(G\ne \varnothing\)) and \(supp\left(E\right)=supp(C)\). Since such \(E\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(E\) has not yet been considered, we have \(F=A\) and \(G\ne \varnothing\), i.e. \(\exists z\in G:z\succ y\). For a special forward extension \(D\) of \(C\), \(E\supseteq D=A\oplus y\oplus z\supset C\), we have \(supp\left(E\right)=supp\left(D\right)=supp(C)\), so \(D\) is a \(1-forward\) of \(C\), i.e. a contradiction occurs.

Appendix 4: Proof of Proposition 3

Assume to the contrary that \(wubocc(C)<muo\) and \(occ(C)\ge muo\) but \(C\) has a \(1-forward\), i.e. \(\exists D=C\oplus y\supset C:\) \(supp\left(D\right)=supp(C)\) and \(\rho \left(D\right)=\rho (C).\) Then, \(D\in [C],\) so \(muo>wubocc(C)\ge occ(D)\ge occ(C)\ge muo\) since \({\mathcal{M}}_{\sim }(occ)\). This is a contradiction. The remaining assertions are derived from Theorems 1 and 2.

Appendix 5: Proof of Proposition 4

  a.

    Assume to the contrary that \(\exists A\in \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) but \(A\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(\exists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A \wedge supp\left(B\right)=supp(A)\), so \(A\notin \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), a contradiction. Hence, \(\mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). The remaining inclusion, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), is evident.

  b.

    We first prove that \(\mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \left\{A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\right| \nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\}\). For any \(A\in \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. (\(\nexists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\))(+), assume \(A\notin \left\{A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\right| \nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\}\), i.e. \(\exists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\), and this contradicts (+). Second, we prove the inverse inclusion. Assume that \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and (\(\nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\))(++), but \(A\notin \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(\exists B^{\prime}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B^{\prime}\supset A\). Then, by Proposition 1b, \(\exists B\stackrel{\scriptscriptstyle{\text{def}}}{=}closure(B^{\prime})\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}({B}^{\prime})\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}: B\supseteq B^{\prime}\supset A\), but this contradicts (++).
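
The characterization proven in part b means the maximal FHUOIs can be obtained by filtering the closed ones, without revisiting the full set of FHUOIs. A minimal Python sketch of that filter follows; the function name and the example collection of closed itemsets are hypothetical.

```python
def maximal_from_closed(closed):
    """Keep the closed itemsets that have no proper superset among the closed itemsets."""
    return {a for a in closed if not any(a < b for b in closed)}

# Hypothetical set of closed FHUOIs, each represented as a frozenset of items.
closed_fhuoi = {
    frozenset({"c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
}

for itemset in sorted(map(sorted, maximal_from_closed(closed_fhuoi))):
    print(itemset)
```

The quadratic scan is chosen for readability only; it says nothing about how MaxCloFHUOIM organizes this check internally.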

Appendix 6: Proof of Theorem 3

For any FE \(G\) of \({B}^{\prime}\) with \(F\succ y\), we have \(G={B}^{\prime}\oplus F={A}^{\prime}\oplus y\oplus F\in Br\left({B}^{\prime}\right)\). Since \(A\subseteq {A}^{\prime}\subset C\) and \(D\succ y\), then for \(H\stackrel{\scriptscriptstyle{\text{def}}}{=}S \cup F=C\oplus y\oplus (D \cup F)\) (when \(F=\varnothing\), \(H=S\) and \(G={B}^{\prime}\)), we have \(G\subset H\). Due to \(supp\left(B\right)=supp(S)\) and \(B\subseteq {B}^{\prime}\subset S\), \(\rho \left(B\right)=\rho \left({B}^{\prime}\right)=\rho (S)\), so \(\forall T\supseteq {B}^{\prime}\), \(T\supseteq S\). Then, for any \(T\supseteq G={B}^{\prime}\oplus F\), we have \(T\supseteq S\) and \(T\supseteq F\), so \(T\supseteq H\), i.e. \(\rho \left(G\right)\subseteq \rho \left(H\right)\). Because the reverse inclusion is evident, we obtain \(\rho \left(G\right)=\rho \left(H\right)\). Therefore, \(H\in \left[G\right]\). If \(occ\left(G\right)<muo\) or \(supp\left(G\right)<ms\), i.e. \(G\notin \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \(G\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). Otherwise, if \(occ\left(G\right)\ge muo\) and \(supp\left(G\right)\ge ms\), then \(supp\left(H\right)=supp\left(G\right)\ge ms\) and \(occ\left(H\right)\ge occ\left(G\right)\ge muo\) since \({\mathcal{M}}_{\sim }\left(occ\right)\). Hence, \(\exists H\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:H\supset G\) and \(supp\left(H\right)=supp\left(G\right)\), i.e. \(G\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). Thus, the \(Br\left({B}^{\prime}\right)\) branch is \(Non-CloFHUO\).

Appendix 7: Proof of Corollary 1

For the itemset \(C=A\oplus y\) under consideration, since (*) in (1) does not hold, i.e. \(\exists S\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:S\supset C\, and\, supp\left(S\right)=supp\left(C\right)\), such an \(S\) must have already been considered and \(S=C^{\prime}\oplus y\oplus D\) with \(C^{\prime}\supset A\). Thus, by Theorem 3, the \(Br\left(C\right)\) branch is \(Non-CloFHUO\).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Duong, H., Pham, H., Truong, T. et al. Efficient algorithms to mine concise representations of frequent high utility occupancy patterns. Appl Intell 54, 4012–4042 (2024). https://doi.org/10.1007/s10489-024-05296-2

  • DOI: https://doi.org/10.1007/s10489-024-05296-2
