Abstract
Identifying all frequent high utility occupancy itemsets (FHUOIs) in a quantitative transaction dataset is an emerging topic in data mining. Because they combine frequency and utility occupancy, these patterns suit many real-world applications: they not only reflect the interests of most users but also contribute a high proportion of the utility in their supporting transactions. Nonetheless, the set of all discovered FHUOIs may be very large, especially for large and dense datasets or for low values of the predefined minimum thresholds, which makes it challenging for users to analyze and use the obtained patterns. To address this issue, this paper proposes two novel algorithms named MaxCloFHUOIM and CloFHUOIM to extract compact representations of FHUOIs. The former simultaneously mines two concise representations of FHUOIs, namely all closed FHUOIs and all maximal FHUOIs, whereas the latter discovers only the closed FHUOIs, which provide a lossless summary of all FHUOIs. The proposed algorithms rely on a novel weak upper bound on utility occupancy to reduce the search space by quickly pruning itemsets with low utility occupancy. In addition, the algorithms integrate two new efficient strategies to prune non-closed FHUOI candidate branches early in the prefix search tree. Results from an in-depth experimental evaluation conducted on several benchmark real-life and synthetic quantitative datasets demonstrate that MaxCloFHUOIM and CloFHUOIM perform well in terms of runtime, memory usage, and scalability. In particular, the proposed algorithms are up to two orders of magnitude faster than a baseline algorithm.
Data availability
The datasets used to evaluate the algorithms and the synthetic dataset generator used in this study are public and were obtained from [48]. The generated synthetic datasets will be shared on request.
Notes
iff means “if and only if”.
References
Agrawal R, Srikant R (1994) Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94). pp 487–499
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390
Nguyen LTT, Vo B, Mai T, Nguyen TL (2018) A weighted approach for class association rules. In: Sieminski A, Kozierkiewicz A, Nunez M, Ha Q (eds) Modern approaches for intelligent information and database systems. Studies in computational intelligence, vol 769. Springer, Cham, pp 213–222
Djenouri Y, Belhadi A, Fournier-Viger P, Fujita H (2018) Mining diversified association rules in big datasets: A cluster/GPU/genetic approach. Inf Sci (N Y) 459:117–134
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree. Data Min Knowl Discov 8:53–87
Vo B, Le T, Coenen F, Hong TP (2016) Mining frequent itemsets using the N-list and subsume concepts. Int J Mach Learn Cybern 7:253–265
Moturi S, Tirumalarao SN, Vemuru S (2018) Frequent itemset mining algorithm: a survey. J Theor Appl Inf Technol 96:744–755
Tang L, Zhang L, Luo P, Wang M (2012) Incorporating occupancy into frequent pattern mining for high quality pattern recommendation. In: ACM International Conference Proceeding Series. pp 75–84
Zhang L, Luo P, Tang L et al (2015) Occupancy-based frequent pattern mining. ACM Trans Knowl Discov Data 10:1–33
Deng ZH (2020) Mining high occupancy itemsets. Future Gener Comput Syst 102:222–229
Zhang K, Zhang Y, Wang Z (2020) Frequent Pattern Mining Based on Occupation and Correlation. In: ICEICT 2020 - IEEE 3rd International Conference on Electronic Information and Communication Technology. pp 161–166
Kim H, Ryu T, Lee C et al (2022) Mining high occupancy patterns to analyze incremental data in intelligent systems. ISA Trans 131:460–475
Nguyen LTT, Mai T, Pham GH et al (2023) An efficient method for mining high occupancy itemsets based on equivalence class and early pruning. Knowl Based Syst 267:110441
Tseng VS, Wu C, Shie B, Yu PS (2010) UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. pp 253–262
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of ACM International Conference on Information and Knowledge Management. pp 55–64
Liu Y, Wang L, Feng L, Jin B (2021) Mining high utility itemsets based on pattern growth without candidate generation. Mathematics 9:1–22
Shen B, Wen Z, Zhao Y, Zhou D, Zheng W (2016) OCEAN: Fast discovery of high utility occupancy itemsets. In: Bailey J, Khan L, Washio T, Dobbie G, Huang J, Wang R (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham, pp 354–365
He J, Han X, Wang J, Zhang K (2022) Efficient high-utility occupancy itemset mining algorithm on massive data. Expert Syst Appl 210:118329
Gan W, Lin JCW, Fournier-Viger P et al (2020) HUOPM: High-Utility Occupancy Pattern Mining. IEEE Trans Cybern 50:1195–1208
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216
Yao H, Hamilton HJ, Butz CJ (2004) A Foundational Approach to Mining Itemset Utilities from Databases. In: Proceedings of the Fourth SIAM International Conference on Data Mining. pp 482–486
Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on Utility-based data mining. pp 90–99
Zida S, Fournier-Viger P, Lin JC-W, et al (2015) EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining. In: Proceedings of Mexican International Conference on Artificial Intelligence (MICAI 2015). pp 530–546
Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183
Qu J-F, Fournier-Viger P, Liu M et al (2023) Mining high utility itemsets using prefix trees and utility vectors. IEEE Trans Knowl Data Eng 35:10224–10236
Yin J, Zheng Z, Cao L (2012) USpan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp 660–668
Truong-Chi T, Fournier-Viger P (2019) A survey of high utility sequential pattern mining. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-Utility Pattern Mining. Studies in Big Data, vol 51. Springer, Cham, pp 97–129
Truong T, Duong H, Le B et al (2021) Efficient algorithms for mining frequent high utility sequences with constraints. Inf Sci (N Y) 568:239–264
Nguyen A, Nguyen NT, Nguyen LTT, Vo B (2023) Mining inter-sequence patterns with Itemset constraints. Appl Intell 53:19827–19842
Zhang C, Yang Y, Du Z et al (2024) HUSP-SP: faster utility mining on sequence data. ACM Trans Knowl Discov Data 18:1–21
Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. pp 2526–2530
Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) New tighter upper bounds for mining high average-utility itemsets. In: ACM International Conference Proceeding Series. pp 27–32
Truong T, Duong H, Le B et al (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183:104847
Kim H, Yun U, Baek Y et al (2021) Efficient list-based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (N Y) 543:85–105
Li G, Shang T, Zhang Y (2023) Efficient mining high average-utility itemsets with effective pruning strategies and novel list structure. Appl Intell 53:6099–6118
Truong T, Duong H, Le B, Fournier-Viger P (2020) EHAUSM: An efficient algorithm for high average utility sequence mining. Inf Sci (N Y) 515:302–323
Singh K, Kumar R, Biswas B (2022) High average-utility itemsets mining: a survey. Appl Intell 52:3901–3938
Tseng VS, Wu C, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27:726–739
Chen CM, Chen L, Gan W et al (2021) Discovering high utility-occupancy patterns from uncertain data. Inf Sci (N Y) 546:1208–1229
Vemulapalli S, Mogalla S (2021) High utility-occupancy sequential pattern mining algorithm based on utility-occupancy framework. Int J Eng Trends Technol 69:228–235
Wu CW, Fournier-Viger P, Gu JY, Tseng VS (2019) Mining compact high utility itemsets without candidate generation. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-utility pattern mining. Studies in big data, vol 51. Springer, Cham, pp 279–302
Duong H, Hoang T, Tran T et al (2022) Efficient algorithms for mining closed and maximal high utility itemsets. Knowl Based Syst 257:109921
Wu C-W, Fournier-Viger P, Gu J-Y, Tseng VS (2015) Mining Closed + High Utility Itemsets without Candidate Generation. In: Conference on Technologies and Applications of Artificial Intelligence. pp 187–194
Fournier-Viger P, Zida S, Lin JC-W, et al (2016) EFIM-Closed: Fast and Memory Efficient Discovery of Closed High-Utility Itemsets. In: International Conference on Machine Learning and Data Mining in Pattern Recognition. pp 199–213
Nguyen LTT, Vu VV, Lam MTH et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 495:78–99
Fournier-Viger P, Wu C-W, Tseng VS (2014) Novel Concise Representations of High Utility Itemsets Using Generator Patterns. In: International Conference on Advanced Data Mining and Applications. pp 30–43
Fournier-Viger P, Gomariz A, Campos M (2014) Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information. In: Proceedings of 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD ’2014. pp 40–52
Fournier-Viger P, Gomariz A, Soltani A et al (2014) SPMF: A java open-source pattern mining library. J Mach Learn Res 15:3569–3573
Funding
This work was supported by Dalat University, Vietnam [Decision number: 1574/QĐ-ĐHĐL, 2023].
Author information
Contributions
Hai Duong, Huy Pham and Tin Truong contributed equally to the study conception and design. Material preparation, data collection and analysis were performed by Hai Duong, Huy Pham, and Tin Truong. The first draft of the manuscript was written by Hai Duong, Huy Pham, Tin Truong and Philippe Fournier-Viger. All authors read and approved the content of the manuscript.
Ethics declarations
Compliance with ethical standards
The authors have no conflict of interest. This research was carried out using public data. No experiments were conducted with humans or animals.
Competing interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Appendices
Appendix 1: Proof of Proposition 1
a. For any \(B,C\in \left[A\right]\) with \(B\subseteq C\subseteq t\), we have \({u}_{rel}\left(B, t\right)\le {u}_{rel}\left(C, t\right)\) and \(\rho \left(B\right)=\rho (C)\), hence \(supp\left(B\right)=supp(C)\). Therefore, \(occ\left(B\right)=\frac{1}{supp\left(B\right)}{\sum }_{t\in \rho \left(B\right)}{u}_{rel}\left(B, t\right)\le \frac{1}{supp\left(C\right)}{\sum }_{t\in \rho \left(C\right)}{u}_{rel}\left(C, t\right)=occ(C)\), i.e. \({\mathcal{M}}_{\sim }(occ)\).
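The monotonicity in part a can be checked numerically on a small example. The sketch below is only an illustration: the four-transaction dataset is hypothetical, and it assumes that \({u}_{rel}(X,t)\) is the utility of \(X\) in \(t\) divided by the total utility of \(t\), which this appendix uses but does not restate.

```python
from fractions import Fraction

# Hypothetical toy dataset: each transaction maps an item to its utility
# in that transaction (quantity times unit profit, already multiplied out).
TRANSACTIONS = [
    {"a": 4, "b": 2, "c": 6},
    {"a": 2, "b": 1, "d": 3},
    {"a": 5, "b": 5, "c": 10},
    {"c": 3, "d": 1},
]

def rho(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return [i for i, t in enumerate(TRANSACTIONS)
            if all(x in t for x in itemset)]

def u_rel(itemset, t):
    """Assumed definition: share of the transaction utility held by `itemset`."""
    return Fraction(sum(t[x] for x in itemset), sum(t.values()))

def occ(itemset):
    """Utility occupancy: mean of u_rel over the supporting transactions."""
    tids = rho(itemset)
    return sum(u_rel(itemset, TRANSACTIONS[i]) for i in tids) / len(tids)

B, C = ("a",), ("a", "b")
same_class = rho(B) == rho(C)      # B and C lie in the same equivalence class
monotone = occ(B) <= occ(C)        # the larger itemset has the larger occupancy
print(same_class, monotone, occ(B), occ(C))
```

Exact fractions are used instead of floats so that the comparison between the two occupancies is not affected by rounding.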
b. For any \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), we first prove that \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\ne \mathrm{\varnothing }\). Indeed, if \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \(\exists {C}_{0}\stackrel{\scriptscriptstyle{\text{def}}}{=}A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Otherwise, \(A\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), so \(\exists {C}_{1}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \({C}_{1}\supset A\) and \(support({C}_{1})=support(A)\). By a similar argument, if \({C}_{1}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \({C}_{1}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Otherwise, \({C}_{1}\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), so \(\exists {C}_{2}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \({C}_{2}\supset {C}_{1}\supset A\) and \(support({C}_{2})=support({C}_{1})=support(A)\), and so on. Since \(\mathcal{A}\) is finite, this process must terminate, i.e. there always exists \(n\in {\mathbb{N}}\): \({C}_{n}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) with \({C}_{n}\supset \dots \supset {C}_{1}\supset A\) and \(support\left({C}_{n}\right)=\dots =support({C}_{1})=support(A).\) Thus, \({C}_{n}\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\), i.e. \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\ne \varnothing .\) Furthermore, assume by contradiction that there exist two different and non-nested closed FHUO itemsets \(B\) and \(C\) in \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\). Then, for \(D\stackrel{\scriptscriptstyle{\text{def}}}{=}B\cup C\supset B\supseteq A\), since \(B,C\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and \(support(C)=support(B)=support(A),\) we have \(\rho (C)=\rho (B)=\rho (A)\) and \(\rho (D)=\rho (B)\cap \rho (C)=\rho (A).\) Therefore, \(D\in [A],\) so \(support(D)=support(A)\ge ms\), i.e. \(D\in \mathcal{F}\mathcal{I}.\) Because \({\mathcal{M}}_{\sim }(occ),\) \(occ(D)\ge occ(B)\ge muo\), i.e. \(\exists D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:\) \(D\supset B\) and \(support(D)=support(A)\). But this contradicts the fact that \(B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}.\) Hence, \(|\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\left(A\right)|=1.\) Let \(C\stackrel{\scriptscriptstyle{\text{def}}}{=}closure(A),\) then by the previous argument, \(C\) is the maximum itemset and has the highest utility occupancy in \([A]\).
c. Let \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\stackrel{\scriptscriptstyle{\text{def}}}{=}\{A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \mid \nexists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\}\). Firstly, since \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\) \(\iff\) [\(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and \(A\in \mathcal{I}\mathcal{S}:\nexists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\)] \(\iff\) \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \cap \mathcal{C}\mathcal{I}\), we have \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}=\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \cap \mathcal{C}\mathcal{I}\). Secondly, we will prove that \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}=\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). The first inclusion \(\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) is evident. Conversely, for any \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and \((\nexists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\wedge supp\left(B\right)=supp(A))\)(*), assume by contradiction that \(A\notin \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Then, \(\exists B\in \mathcal{I}\mathcal{S}:B\supset A\wedge supp\left(B\right)=supp(A)\), so \(\rho (B)=\rho (A)\), i.e. \(A,B\in [A]\). Since \({\mathcal{M}}_{\sim }(occ)\) and \(A\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(occ\left(B\right)\ge occ\left(A\right)\ge muo\) and \(supp\left(B\right)=supp(A)\ge ms\), so \(B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}.\) This contradicts (*). Hence, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Thus, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}=\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{C}\mathcal{I}\). Furthermore, since \(supp\) is anti-monotonic, for any \(B=A+C\supset A\) with \(x\in C\ne \varnothing\) and \(A\subset B^{\prime}=A+\{x\}\subseteq B\), we have \(\rho (A)\supseteq \rho (B^{\prime})\supseteq \rho (B)\), so \(supp(A)\ge supp(B^{\prime})\ge supp(B)\). Additionally, if \(supp(A)=supp(B)\), then \(supp(A)=supp(B^{\prime})=supp(B)\). Hence, \(A\) is closed \(\iff\) [\(A\) has neither a \(1-forward\) nor a \(1-backward\)]. Thus, the last assertion is proven.
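The last assertion of part c says that closedness can be decided by looking at single-item extensions only. A minimal Python sketch of that check follows, on a hypothetical four-transaction dataset; the function names are illustrative and not taken from the paper. It compares the one-item test against a brute-force test over all proper supersets.

```python
from itertools import combinations

# Hypothetical toy dataset; only item membership matters for closedness.
TRANSACTIONS = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c"}, {"c", "d"}]
ITEMS = sorted(set().union(*TRANSACTIONS))

def supp(itemset):
    """Number of transactions containing `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def is_closed_one_item(itemset):
    """Closed iff adding any single missing item strictly lowers the support."""
    base = supp(itemset)
    return all(supp(itemset | {x}) < base for x in ITEMS if x not in itemset)

def is_closed_all_supersets(itemset):
    """Reference check: no proper superset at all has the same support."""
    rest = [x for x in ITEMS if x not in itemset]
    base = supp(itemset)
    return not any(
        supp(itemset | set(extra)) == base
        for k in range(1, len(rest) + 1)
        for extra in combinations(rest, k)
    )

ALL_ITEMSETS = [frozenset(c)
                for k in range(1, len(ITEMS) + 1)
                for c in combinations(ITEMS, k)]
agree = all(is_closed_one_item(X) == is_closed_all_supersets(X)
            for X in ALL_ITEMSETS)
print(agree)
```

The one-item test needs at most one support count per missing item, whereas the reference check is exponential in the number of missing items; the two agree because support is anti-monotonic.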
Appendix 2: Proof of Theorem 1
a. Consider any proper FE \(B\) of \(A\), i.e. \(A\subset B=A\oplus E\in \mathcal{F}\mathcal{I}\) with \(E\ne \varnothing\), and any transaction \({t}_{i}\in \rho (B)\subseteq TS(A)\subseteq \rho (A)\). Then \(m\stackrel{\scriptscriptstyle{\text{def}}}{=}supp(A)\ge n\stackrel{\scriptscriptstyle{\text{def}}}{=}|TS(A)|\ge p\stackrel{\scriptscriptstyle{\text{def}}}{=}supp\left(B\right)\ge ms\) and \(u\left(B,{t}_{i}\right)=u\left(A,{t}_{i}\right)+u\left(E,{t}_{i}\right)\le u\left(A,{t}_{i}\right)+u\left(rem\left(A,{t}_{i}\right)\right)={ub}_{rem}(A,{t}_{i})\), so \({u}_{rel}\left(B, {t}_{i}\right)\le {ub}_{rem\_rel}\left(A,{t}_{i}\right)\). Since \(\mathcal{X}\stackrel{\scriptscriptstyle{\text{def}}}{=}\{{wubocc}_{i}, i=1..n\}\subseteq \mathcal{Y}\stackrel{\scriptscriptstyle{\text{def}}}{=}\{{\widehat{\Phi }}_{i}, i=1..m\}\), \(\mathcal{X}\downarrow\), \(\mathcal{Y}\downarrow\) and \(\rho (B)\subseteq TS(A)\), the series \(\{{S}_{k}\stackrel{\scriptscriptstyle{\text{def}}}{=}\frac{1}{k}{\sum }_{1\le i\le k}{wubocc}_{i},1\le k\le n\}\) is descending, so \(occ\left(B\right)\stackrel{\scriptscriptstyle{\text{def}}}{=}\frac{1}{p}{\sum }_{{t}_{i}\in \rho \left(B\right)}{u}_{rel}\left(B, {t}_{i}\right)\le \frac{1}{p}{\sum }_{{t}_{i}\in \rho \left(B\right)}{ub}_{rem\_rel}\left(A,{t}_{i}\right)\le {S}_{p}\le {S}_{ms}=wubocc(A)\le \frac{1}{ms}{\sum }_{1\le i\le ms}{\widehat{\Phi }}_{i}=\widehat{\Phi }(A)\).
b. If \(wubocc\left(A\right)<muo\), then \(occ\left(B\right)\le wubocc\left(A\right)<muo\) for any proper FE \(B\) of \(A\), so every such \(B\) is a low utility occupancy itemset.
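The averaging argument behind Theorem 1 can be illustrated with a short Python sketch. It rests on several assumptions that this appendix does not spell out: items are ordered alphabetically, \(rem(A,t)\) is the set of items of \(t\) that follow the last item of \(A\), relative quantities are divided by the transaction utility, and the bound is computed over all of \(\rho (A)\) because \(TS(A)\) is not defined here. Under these assumptions the value computed below corresponds to the looser bound \(\widehat{\Phi }(A)\), not to \(wubocc(A)\) itself.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset: item -> utility in each transaction.
TRANSACTIONS = [
    {"a": 4, "b": 2, "c": 6},
    {"a": 2, "b": 1, "d": 3},
    {"a": 5, "b": 5, "c": 10},
    {"c": 3, "d": 1},
]
ITEMS = sorted({x for t in TRANSACTIONS for x in t})
MS = 2  # assumed minimum support threshold (absolute count)

def rho(itemset):
    """Transactions that contain every item of `itemset`."""
    return [t for t in TRANSACTIONS if all(x in t for x in itemset)]

def occ(itemset):
    """Utility occupancy of `itemset` (mean relative utility over rho)."""
    ts = rho(itemset)
    return sum(Fraction(sum(t[x] for x in itemset), sum(t.values()))
               for t in ts) / len(ts)

def ub_rem_rel(itemset, t):
    """Relative utility of `itemset` plus the items after its last item."""
    last = max(itemset)
    kept = sum(u for x, u in t.items() if x in itemset or x > last)
    return Fraction(kept, sum(t.values()))

def occupancy_upper_bound(itemset, ms):
    """Mean of the `ms` largest ub_rem_rel values over rho(itemset)."""
    vals = sorted((ub_rem_rel(itemset, t) for t in rho(itemset)), reverse=True)
    if len(vals) < ms:
        return Fraction(0)  # no extension can be frequent
    return sum(vals[:ms]) / ms

def forward_extensions(itemset):
    """All itemsets obtained by appending items that follow the last item."""
    tail = [x for x in ITEMS if x > max(itemset)]
    for k in range(1, len(tail) + 1):
        for extra in combinations(tail, k):
            yield tuple(itemset) + extra

A = ("b",)
bound = occupancy_upper_bound(A, MS)
bound_holds = all(occ(B) <= bound
                  for B in forward_extensions(A) if len(rho(B)) >= MS)
print(bound, bound_holds)
```

If a threshold such as `muo = Fraction(3, 4)` were used, the bound for `A` would fall below it, so by part b no frequent forward extension of `A` could reach the required utility occupancy and the whole branch could be skipped.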
Appendix 3: Proof of Theorem 2
We will prove that
\(C\) is a CFHUOI ⇔ [\(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(C\) has no \(1-forward\) and \((\nexists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\)].
+ “⇒”: For any CFHUOI \(C=A\oplus y\), i.e. \(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and (\(\nexists D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C)\))(*), assume to the contrary that \(C\) has a \(1-forward\) \(D=A\oplus y\oplus z\) or \((\exists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\). Then \(D\supset C\, and\, supp\left(D\right)=supp(C)\ge ms\), so \(D\in [C]\). Since \({\mathcal{M}}_{\sim }(occ)\) and \(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(occ(D)\ge occ(C)\ge muo\), i.e. \(D\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). But this contradicts (*).
+ “⇐”: Suppose that [\(C\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), \(C\) has no \(1-forward\) and \((\nexists D\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:D\supset C\, and\, supp\left(D\right)=supp(C))\)], and assume to the contrary that \(C=A\oplus y\) is not a CFHUOI, i.e. \(\exists E\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:E=F\oplus y\oplus G\supset C\) with \(y\succ F\supseteq A\) and \(G\succ y\) (such that \(F\supset A\) or \(G\ne \varnothing\)) and \(supp\left(E\right)=supp(C)\). Since such \(E\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(E\) is not yet considered, we have \(F=A\) and \(G\ne \varnothing\), i.e. \(\exists z\in G:z\succ y\). For a special forward extension \(D\) of \(C\), \(E\supseteq D=A\oplus y\oplus z\supset C\), we have \(supp\left(E\right)=supp\left(D\right)=supp(C)\), so \(D\) is a \(1-forward\) of \(C\), which is a contradiction.
Appendix 4: Proof of Proposition 3
Assume to the contrary that \(wubocc(C)<muo\) and \(occ(C)\ge muo\) but \(C\) has a \(1-forward\), i.e. \(\exists D=C\oplus y\supset C:\) \(supp\left(D\right)=supp(C)\) and \(\rho \left(D\right)=\rho (C).\) Then, \(D\in [C],\) so \(muo>wubocc(C)\ge occ(D)\ge occ(C)\ge muo\) since \({\mathcal{M}}_{\sim }(occ)\). This is a contradiction. The remaining assertions are derived from Theorems 1 and 2.
Appendix 5: Proof of Proposition 4
a. Assume to the contrary that \(\exists A\in \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) but \(A\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(\exists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A \wedge supp\left(B\right)=supp(A)\), so \(A\notin \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and a contradiction occurs. The remaining inclusion, \(\mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), is evident.
b. We first prove that \(\mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \{A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \mid \nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\}\). For any \(A\in \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. (\(\nexists B\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\))(+), assume \(A\notin \{A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I} \mid \nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\}\), i.e. \(\exists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\), and this contradicts (+). Second, we prove the inverse inclusion. Assume that \(A\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\subseteq \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\) and (\(\nexists B\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B\supset A\))(++), but \(A\notin \mathcal{M}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), i.e. \(\exists B^{\prime}\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:B^{\prime}\supset A\). Then, by Proposition 1b, \(\exists B\stackrel{\scriptscriptstyle{\text{def}}}{=}closure(B^{\prime})\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}({B}^{\prime})\subseteq \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}: B\supseteq B^{\prime}\supset A\), but this contradicts (++).
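Proposition 4 implies that the maximal FHUOIs can be obtained by filtering the closed ones. The brute-force Python sketch below illustrates this on a hypothetical dataset with assumed thresholds `MS` and `MUO`; it enumerates every itemset, so it only demonstrates the set relations and is not the mining procedure of the proposed algorithms.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset: item -> utility in each transaction.
TRANSACTIONS = [
    {"a": 4, "b": 2, "c": 6},
    {"a": 2, "b": 1, "d": 3},
    {"a": 5, "b": 5, "c": 10},
    {"c": 3, "d": 1},
]
ITEMS = sorted({x for t in TRANSACTIONS for x in t})
MS, MUO = 2, Fraction(1, 3)  # assumed thresholds ms and muo

def supp(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if all(x in t for x in itemset))

def occ(itemset):
    """Utility occupancy, assuming u_rel(X, t) = u(X, t) / tu(t)."""
    ts = [t for t in TRANSACTIONS if all(x in t for x in itemset)]
    return sum(Fraction(sum(t[x] for x in itemset), sum(t.values()))
               for t in ts) / len(ts)

CANDIDATES = [frozenset(c)
              for k in range(1, len(ITEMS) + 1)
              for c in combinations(ITEMS, k)]

# Frequent high utility occupancy itemsets (support is tested first, so
# occ is never called on an itemset without supporting transactions).
FHUOI = {X for X in CANDIDATES if supp(X) >= MS and occ(X) >= MUO}

# Closed: no proper superset in FHUOI with the same support.
CLOSED = {X for X in FHUOI
          if not any(Y > X and supp(Y) == supp(X) for Y in FHUOI)}

# Maximal: no proper superset in FHUOI at all.
MAXIMAL = {X for X in FHUOI if not any(Y > X for Y in FHUOI)}

# Proposition 4b: maximal = closed itemsets without a closed proper superset.
MAXIMAL_FROM_CLOSED = {X for X in CLOSED if not any(Y > X for Y in CLOSED)}

print(sorted("".join(sorted(X)) for X in CLOSED))
print(sorted("".join(sorted(X)) for X in MAXIMAL))
```

On this dataset the chain of inclusions from part a is strict, and the two ways of computing the maximal itemsets coincide.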
Appendix 6: Proof of Theorem 3
Consider any FE \(G\) of \({B}^{\prime}\), i.e. \(G={B}^{\prime}\oplus F={A}^{\prime}\oplus y\oplus F\in Br\left({B}^{\prime}\right)\) with \(F\succ y\). Since \(A\subseteq A^{\prime}\subset C\) and \(D\succ y\), for \(H\stackrel{\scriptscriptstyle{\text{def}}}{=}S \cup F=C\oplus y\oplus (D \cup F)\) (when \(F=\varnothing\), \(H=S\) and \(G={B}^{\prime}\)), we have \(G\subset H\). Due to \(supp\left(B\right)=supp(S)\) and \(B\subseteq B^{\prime}\subset S\), \(\rho \left(B\right)=\rho \left(B^{\prime}\right)=\rho (S)\), so \(\forall T\supseteq B^{\prime}\), \(T\supseteq S\). Then, for any \(T\supseteq G=B^{\prime}\oplus F\), we have \(T\supseteq S\) and \(T\supseteq F\), so \(T\supseteq H\), i.e. \(\rho \left(G\right)\subseteq \rho \left(H\right)\). Because the reverse inclusion is evident, we obtain \(\rho \left(G\right)=\rho \left(H\right)\). Therefore, \(H\in \left[G\right]\). If \(occ\left(G\right)<muo\) or \(supp\left(G\right)<ms\), i.e. \(G\notin \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\), then \(G\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). Otherwise, if \(occ\left(G\right)\ge muo\) and \(supp\left(G\right)\ge ms\), then \(supp\left(H\right)=supp\left(G\right)\ge ms\) and \(occ\left(H\right)\ge occ\left(G\right)\ge muo\) since \({\mathcal{M}}_{\sim }\left(occ\right)\). Hence, \(\exists H\in \mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:H\supset G\) and \(supp\left(H\right)=supp\left(G\right)\), i.e. \(G\notin \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}\). Thus, the \(Br\left({B}^{\prime}\right)\) branch is \(Non-CloFHUO\).
Appendix 7: Proof of Corollary 1
Consider the itemset \(C=A\oplus y\) that is currently being processed. Since (*) in (1) does not hold, i.e. \(\exists S\in \mathcal{C}\mathcal{F}\mathcal{H}\mathcal{U}\mathcal{O}\mathcal{I}:S\supset C\, and\, supp\left(S\right)=supp\left(C\right)\), such \(S\) must have already been considered and \(S=C^{\prime}\oplus y\oplus D\) with \(C^{\prime}\supset A\). Thus, by Theorem 3, the \(Br\left(C\right)\) branch is \(Non-CloFHUO\).
About this article
Cite this article
Duong, H., Pham, H., Truong, T. et al. Efficient algorithms to mine concise representations of frequent high utility occupancy patterns. Appl Intell 54, 4012–4042 (2024). https://doi.org/10.1007/s10489-024-05296-2