Frequent high minimum average utility sequence mining with constraints in dynamic databases using efficient pruning strategies

Truong, Tin; Duong, Hai; Le, Bac; Fournier-Viger, Philippe; Yun, Unil

doi:10.1007/s10489-021-02520-1


Published: 01 September 2021

Applied Intelligence, Volume 52, pages 6106–6128 (2022)



Abstract

High utility sequence mining is a popular data mining task that aims at finding sequences having a high utility (importance) in a quantitative sequence database. Although it has several applications, state-of-the-art algorithms have one or more of the following limitations: (1) they rely on a utility function that tends to be biased toward finding long patterns, (2) the algorithms that do take pattern length into account through an average-utility function adopt an optimistic perspective that can be risky or misleading for some applications, and (3) they do not let the user specify additional constraints on the patterns to be found. To address these three limitations, this paper defines a novel task of mining frequent high minimum average-utility sequences (FHAUSs) with constraints in a quantitative sequence database. This task has the following benefits. First, it uses the average-utility function au based on the minimum utility, which takes the length of a pattern into account when calculating its utility. This helps find short patterns missed by traditional algorithms, and it relies on safer, pessimistic utility calculations. Second, the user can specify a set C of monotonic and anti-monotonic constraints on patterns to filter out irrelevant patterns and improve the performance of the mining process. To efficiently find all FHAUSs with constraints, this paper first proposes novel upper bounds (UBs) and weak upper bounds (WUBs) on the average-utility, which satisfy downward-closure (DC) properties or DC-like properties. Then, to effectively reduce the search space, the paper designs novel width pruning, depth pruning, reducing, and tightening strategies based on the proposed bounds. These theoretical results are integrated into an algorithm named C-FHAUSPM (Constrained Frequent High minimum Average-Utility Sequential Pattern Mining) that efficiently discovers all FHAUSs with constraints. Results from extensive experiments on both real-life and synthetic quantitative sequence databases show that C-FHAUSPM is highly efficient in terms of runtime and memory usage.
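The pessimistic average-utility measure can be illustrated with a small Python sketch. It follows the definitions used in the appendix proofs: au(α, Ψ′) is the minimum, over the occurrences α′ of α in Ψ′, of u(α′)/length(α′), and au(α) sums this value over the sequences ρ(α) that contain α. For brevity it assumes that every itemset holds a single item (the paper's model allows general itemsets) and enumerates occurrences by brute force. The names `is_fhaus`, `minsup`, and the predicate-style `constraints` argument are hypothetical, not the paper's API, and reading "frequent" as support ≥ minsup is an assumption.

```python
from itertools import combinations

# Toy model (assumption): a quantitative sequence is a list of (item, utility)
# pairs with one item per itemset; a pattern is a list of items.


def occurrences(pattern, seq):
    """Yield each embedding of `pattern` in `seq` as a tuple of positions."""
    for positions in combinations(range(len(seq)), len(pattern)):
        if all(seq[p][0] == item for p, item in zip(positions, pattern)):
            yield positions


def au_in_sequence(pattern, seq, aggregate=min):
    """au(alpha, Psi'): aggregate (min by default) of u(alpha') / length(alpha')
    over all occurrences alpha'; None if the pattern does not occur in seq."""
    averages = [sum(seq[p][1] for p in pos) / len(pattern)
                for pos in occurrences(pattern, seq)]
    return aggregate(averages) if averages else None


def au(pattern, database, aggregate=min):
    """au(alpha): sum of au(alpha, Psi') over the sequences rho(alpha) containing alpha."""
    values = (au_in_sequence(pattern, s, aggregate) for s in database)
    return sum(v for v in values if v is not None)


def support(pattern, database):
    """Number of input sequences that contain the pattern, i.e. |rho(alpha)|."""
    return sum(1 for s in database if next(occurrences(pattern, s), None) is not None)


def is_fhaus(pattern, database, minsup, mu, constraints=()):
    """Hypothetical check: frequent, high minimum average-utility, and
    satisfying every user constraint (each constraint is a predicate)."""
    return (support(pattern, database) >= minsup
            and au(pattern, database) >= mu
            and all(c(pattern) for c in constraints))


db = [[("a", 2), ("b", 4), ("a", 6), ("b", 1)],
      [("b", 3), ("a", 5), ("b", 5)]]
print(au(["a", "b"], db))                 # 6.5  (1.5 + 5.0)
print(au(["a", "b"], db, aggregate=max))  # 8.5  (3.5 + 5.0)
```

Switching `aggregate` to `max` mimics the optimistic view described in limitation (2): on this toy database the pattern ⟨a, b⟩ then scores 8.5 instead of 6.5, because its low-utility occurrence in the first sequence is ignored.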


References

Ahmed CF, Tanbeer SK, Jeong BS (2010) A novel approach for mining high-utility sequential patterns in sequence databases. ETRI J 32(5):676–686
Ahmed CF, Tanbeer SK, Jeong BS (2010) Mining high utility web access sequences in dynamic web log data. In Proceedings of 11th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD2010, pp.76–81
Shie BE, Cheng JH, Chuang KT, Tseng VS (2012) A one-phase method for mining high utility mobile sequential patterns in mobile commerce environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp.616–626
Shie BE, Hsiao HF, Tseng VS, Yu PS (2011) Mining high utility mobile sequential patterns in mobile commerce environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp.224–238
Shie BE, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38(3):418–435
Gan W, Lin JC, Zhang J, Chao H, Fujita H, Yu PS (2020) ProUM: projection-based utility mining on sequence data. Inf Sci (NY) 513:222–240
Zihayat M, Davoudi H, An A (2017) Top-k utility-based gene regulation sequential pattern discovery. In Proceedings of 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, pp.266–273
Truong T, Tran A, Duong H, Le B, Fournier-Viger P (2020) EHUSM: mining high utility sequences with a pessimistic utility model. Data Sci Pattern Recognit 4(1):65–83
Truong T, Duong H, Le B, Fournier-Viger P (2019) FMaxCloHUSM: An efficient algorithm for mining frequent closed and maximal high utility sequences. Eng Appl Artif Intell 85(1):1–20
Truong T, Tran A, Duong H, Le B (2019) Hupsmt: An efficient algorithm for mining high utility-probability sequences in uncertain databases with multiple minimum utility thresholds. Comput Sci Cybern 35(1):1–20
Lan GC, Hong T-P, Tseng VS, Wang SL (2014) Applying the maximum utility measure in high utility sequential pattern mining. Expert Syst Appl 41(11):5071–5081
Yin J, Zheng Z, Cao L (2012) USpan: An efficient algorithm for mining high utility sequential patterns. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.660–668
Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42(1):31–60
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, et al. (2001) PrefixSpan: mining sequential patterns by prefix-projected growth. In Proceedings of the 17th International Conference on Data Engineering, pp.215–224
Fournier-Viger P, Gomariz A, Campos M (2014) Fast vertical mining of sequential patterns using co-occurrence information. In Proceedings of 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2014, pp.40–52
Wang JZ, Huang JL, Chen YC (2016) On efficiently mining high utility sequential patterns. Knowl Inf Syst 49(2):597–627
Truong T, Fournier-Viger P (2019) A survey of high utility sequential pattern mining. In: Fournier-Viger P, Lin JC-W, Nkambou R, Vo B, Tseng VS (eds) High-Utility Pattern Mining: Theory, Algorithms and Applications, pp.97–129
Gan W, Lin JC-W, Zhang J, Fournier-Viger P, Chao H, Yu PS (2019) Fast utility mining on complex sequences. CoRR 1904(2):1–15
Hong T-P, Lee CH, Wang SL (2011) Effective utility mining with the measure of average utility. Expert Syst Appl 38(7):8259–8265
Lan GC, Hong T-P, Tseng VS (2012) Efficiently mining high average-utility Itemsets with an improved upper-bound strategy. Inf Technol Decis Mak 11(05):1009–1030
Lin JC-W, Ren S, Fournier-Viger P (2018) MEMU: more efficient algorithm to mine high average-utility patterns with multiple minimum average-utility thresholds. IEEE Access 6(8):7593–7609
Lin JC-W, Ren S, Fournier-Viger P, Hong T-P (2017) EHAUPM: efficient high average-utility pattern mining with tighter upper bounds. IEEE Access 5(8):12927–12940
Wu JMT, Lin JC-W, Pirouz M, Fournier-Viger P (2018) TUB-HAUPM: tighter upper bound for mining high average-utility patterns. IEEE Access 6(1):18655–18669
Yun U, Kim D (2017) Mining of high average-utility itemsets using novel list structure and pruning strategy. Futur Gener Comput Syst 68(1):346–360
Thilagu M, Nadarajan R (2012) Efficiently Mining of Effective web Traversal Patterns with average utility. Procedia Technol 6(1):444–451
Truong T, Duong H, Le B, Fournier-Viger P (2020) EHAUSM: An efficient algorithm for high average utility sequence mining. Inf Sci (Ny) 515(1):302–323
Fournier-Viger P, Li J, Lin JC-W, Truong T, Uday Kiran R (2020) Mining cost-effective patterns in event logs. Knowledge-Based Syst 191:105241
Nguyen LTT, Nguyen P, Nguyen TDD, Vo B, Fournier-Viger P, Tseng VS (2019) Mining high-utility itemsets in dynamic profit databases. Knowledge-Based Syst 175(1):130–144
Alkan OK, Karagoz P (2015) CRoM and HuspExt: improving efficiency of high utility sequential pattern extraction. IEEE Trans Knowl Data Eng 27(10):2645–2657
Reddy PPC, Uday Kiran R, Zettsu K, Toyoda M, Krishna Reddy P, Kitsuregawa M (2019) Discovering spatial high utility frequent Itemsets in spatiotemporal databases. In Proceedings of International Conference on Big Data Analytics (BDA 2019), pp.287–306
Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In Proceedings of the 1st international workshop on Utility-based data mining, pp.90–99
Nguyen LTT, Vu D, Nguyen TDD, Vo B (2020) Mining maximal high utility Itemsets on dynamic profit databases. Cybern Syst 51(2):1–21
Nguyen LTT, Vu VV, Lam MTH, Duong TTM, Manh LT, Nguyen TTT et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (NY) 495:78–99
Gan W, Lin JC, Chao H, Fujita H, Yu PS (2019) Correlated utility-based pattern mining. Inf Sci (NY) 504:470–486
Truong T, Duong H, Le B, Fournier-Viger P (2018) Efficient vertical Mining of High Average-Utility Itemsets Based on novel upper-bounds. IEEE Trans Knowl Data Eng 31(2):301–314
Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp.2526–2530
Lin JC-W, Li T, Fournier-Viger P, Hong T-P, Zhan J, Voznak M (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Informatics 30(2):233–243
Lin JC-W, Ren S, Fournier-Viger P, Hong T-P, Su J-H, Vo B (2017) A fast algorithm for mining high average-utility itemsets. Appl Intell 47(2):331–346
Kim H, Yun U, Baek Y, Kim J, Vo B, Yoon E et al (2021) Efficient list based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (NY) 543:85–105
Truong T, Duong H, Le B, Fournier-Viger P, Yun U (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowledge-Based Syst 183(1):104847
Wu R, Li Q, Chen X (2019) Mining contrast sequential pattern based on subsequence time distribution variation with discreteness constraints. Appl Intell 49(12):4348–4360
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. Intell Inf Syst 28(2):133–160
Zaki MJ (2000) Sequence mining in categorical domains: incorporating constraints. In Proceedings of the ninth international conference on Information and knowledge management, pp.422–429
Chen YL, Chiang MC, Ko MT (2003) Discovering time-interval sequential patterns in sequence databases. Expert Syst Appl 25(3):343–354
Zhang J, Wang Y, Yang D (2015) CCSpan: mining closed contiguous sequential patterns. Knowledge-Based Syst 89(1):1–13
Mallick B, Garg D, Grover PS (2014) Constraint-based sequential pattern mining: a pattern growth algorithm incorporating compactness, length and monetary. Inf Technol 11(1):33–42
Yun U, Ryu KH (2010) Discovering important sequential patterns with length-decreasing weighted support constraints. Inf Technol Decis Mak 09(04):575–599
Van T, Vo B, Le B (2018) Mining sequential patterns with itemset constraints. Knowl Inf Syst 57(2):311–330
Van T, Yoshitaka A, Le B (2018) Mining web access patterns with super-pattern constraint. Appl Intell 48(11):3902–3914
Fournier-Viger P, Lin JC-W, Gomariz A, Gueniche T, Soltani A, Deng Z et al (2014) SPMF: a Java open-source pattern mining library version 2. Mach Learn Res 15(1):3389–3393


Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Dalat University, Dalat, Vietnam
Tin Truong & Hai Duong
Department of Computer Science, Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam
Bac Le
Vietnam National University, Ho Chi Minh City, Vietnam
Bac Le
School of Humanities and Social Sciences, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Philippe Fournier-Viger
Department of Computer Engineering, Sejong University, Seoul, South Korea
Unil Yun


Corresponding author

Correspondence to Philippe Fournier-Viger.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

1.1 Appendix 1 (Proof of Theorem 1)

To prove Theorem 1, we first introduce the following lemma.

Lemma 1 (Properties of taub)

For each \( \varPsi' \in \mathcal{D}' \), consider any sequence α and any forward extension β of α in Ψ′, with m = length(β) ≥ l = length(α).

a. The sequence \( \left\{\frac{1}{k}S_k^{\alpha}, 1 \le k \le n\right\} \), where \( n = |\mathcal{T}(\alpha, \varPsi')| \), is non-increasing.
b. \( \mathcal{T}(\beta, \varPsi') \subseteq \mathcal{T}(\alpha, \varPsi') \) and \( \frac{1}{m}S_m^{\beta} \le \frac{1}{m}S_m^{\alpha} \).
c. AM_aub(α, Ψ′) and taub(α, Ψ′) are progressively tighter UBs on au(α, Ψ′) in each Ψ′, i.e., AM_aub(α, Ψ′) ≥ taub(α, Ψ′) ≥ au(α, Ψ′) for all α ⊆ Ψ′.
d. \( \mathcal{AMF}(taub, \varPsi') \): taub(α, Ψ′) is anti-monotone with respect to forward extensions in each Ψ′, i.e., taub(β, Ψ′) ≤ taub(α, Ψ′) for any forward extension β of α in Ψ′.

Proof.

The two assertions a and d are proven similarly to Lemma 1 in.

b. Let β = α ⋄ δ be a forward extension of α in Ψ′, so that α ⊆ β ⊆ Ψ′. Since \( \alpha'_{first} \subseteq \beta'_{first} \subseteq \varPsi' \), \( lastItem(\beta'_{first}) \) does not precede \( lastItem(\alpha'_{first}) \) in Ψ′. Thus, \( \mathcal{V}(\beta, \varPsi') \subseteq \mathcal{V}(\alpha, \varPsi') \) and \( \mathcal{T}(\beta, \varPsi') \subseteq \mathcal{T}(\alpha, \varPsi') \), so \( \frac{1}{m}S_m^{\beta} \le \frac{1}{m}S_m^{\alpha} \).
c. Every \( \alpha' \in \mathcal{O}(\alpha, \varPsi') \) has length l = length(α′); in particular, \( au(\alpha'_{first}) \le \frac{1}{l}S_l^{\alpha} = taub(\alpha, \varPsi') \), so \( au(\alpha, \varPsi') = \min\{au(\alpha') \mid \alpha' \in \mathcal{O}(\alpha, \varPsi')\} \le au(\alpha'_{first}) \le taub(\alpha, \varPsi') \). Moreover, since \( S_l^{\alpha} \le \min\{u(\varPsi'), l \times \max\{q \mid (a, q) \in \varPsi'\}\} \) for all α ⊆ Ψ′, it follows that taub(α, Ψ′) ≤ AM_aub(α, Ψ′).
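The numeric facts behind Lemma 1 can be checked with a short Python sketch. The excerpt does not restate how S_k^α is built, so the code assumes it is the sum of the k largest utility values in T(α, Ψ′), with T(α, Ψ′) modelled as a list of positive numbers taken from Ψ′, and uses taub(α, Ψ′) = (1/l)S_l^α as in part c. The randomised check is a sanity test of parts a, b, and the bound in c under that assumption, not a proof.

```python
import random
from fractions import Fraction

# Assumption: S_k^alpha is the sum of the k largest utility values in
# T(alpha, Psi'); T is modelled as a plain list of positive numbers drawn
# from the utilities of Psi'.


def top_k_sum(values, k):
    """S_k: sum of the k largest values."""
    return sum(sorted(values, reverse=True)[:k])


def top_k_averages(values):
    """The sequence (1/k) * S_k for k = 1..n, as exact fractions."""
    return [Fraction(top_k_sum(values, k), k) for k in range(1, len(values) + 1)]


def taub(t_values, l):
    """taub(alpha, Psi') = (1/l) * S_l^alpha, where l = length(alpha)."""
    return Fraction(top_k_sum(t_values, l), l)


def is_non_increasing(xs):
    return all(a >= b for a, b in zip(xs, xs[1:]))


def check_lemma1(trials=1000, seed=0):
    """Randomised check of Lemma 1.a, 1.b and the bound used in 1.c."""
    rng = random.Random(seed)
    for _ in range(trials):
        psi = [rng.randint(1, 20) for _ in range(rng.randint(2, 10))]
        t_alpha = rng.sample(psi, rng.randint(1, len(psi)))         # T(alpha) drawn from Psi'
        t_beta = rng.sample(t_alpha, rng.randint(1, len(t_alpha)))  # T(beta) within T(alpha)
        # Lemma 1.a: (1/k) S_k is non-increasing in k.
        assert is_non_increasing(top_k_averages(t_alpha))
        # Lemma 1.b: a sub-multiset never has a larger top-m sum.
        for m in range(1, len(t_beta) + 1):
            assert top_k_sum(t_beta, m) <= top_k_sum(t_alpha, m)
        # Lemma 1.c: S_l <= min{u(Psi'), l * max q}.
        for l in range(1, len(t_alpha) + 1):
            assert top_k_sum(t_alpha, l) <= min(sum(psi), l * max(psi))
    return True


T_alpha = [5, 3, 8, 1]
print(top_k_averages(T_alpha))  # [Fraction(8, 1), Fraction(13, 2), Fraction(16, 3), Fraction(17, 4)]
print(check_lemma1())           # True
```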

Proof of Theorem 1.

a. (i). For any super-sequence β ⊇ α, since ρ(β) ⊆ ρ(α) and m = length(β) ≥ l = length(α), we have \( \frac{1}{m}u(\varPsi') \le \frac{1}{l}u(\varPsi') \), hence AM_aub(β) ≤ AM_aub(α), i.e., \( \mathcal{AM}(AM\_aub) \).
(ii). By Lemma 1.d, for any forward extension β of α, we have ρ(β) ⊆ ρ(α) and taub(β, Ψ′) ≤ taub(α, Ψ′), so \( t\_aub(\beta) = \sum_{\varPsi' \in \rho(\beta)} taub(\beta, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\beta, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\alpha, \varPsi') = t\_aub(\alpha) \). Thus, \( \mathcal{AMF}(t\_aub) \).
(iii). Without loss of generality, consider a forward extension β = α ⋄ z of α = ε ⋄ y. By Lemma 1.d, \( l\_aub(\beta) = \sum_{\varPsi' \in \rho(\beta)} taub(\alpha, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\varepsilon, \varPsi') = l\_aub(\alpha) \). Similarly, for any backward extension β = ε ⋄ δ ⋄ y of α = ε ⋄ y, since ε ⋄ δ is a forward extension of ε, we also have \( l\_aub(\beta) = \sum_{\varPsi' \in \rho(\beta)} taub(\varepsilon \diamond \delta, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\varepsilon, \varPsi') = l\_aub(\alpha) \). Hence, \( \mathcal{AMB}i(l\_aub) \).
b. + First, we prove that t_aub and l_aub are UBs on au and that t_aub is tighter than l_aub. For any β = α ⋄ δ ⊃ α and Ψ′ ∈ ρ(β) ⊆ ρ(α), by Lemma 1.c and d, we have \( au(\alpha) = \sum_{\varPsi' \in \rho(\alpha)} au(\alpha, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\alpha, \varPsi') = t\_aub(\alpha) \), i.e., t_aub is a UB on au. Moreover, \( t\_aub(\alpha \diamond y) \overset{\mathrm{def}}{=} \sum_{\varPsi' \in \rho(\alpha \diamond y)} taub(\alpha \diamond y, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha \diamond y)} taub(\alpha, \varPsi') = l\_aub(\alpha \diamond y) \), so t_aub is tighter than l_aub, and l_aub is therefore also a UB on au. However, as shown in Example 3, l_aub and AM_aub are incomparable.

+ Next, we prove that t_waub is a WUB on au and that it is tighter than t_aub. Since twaub(α, Ψ′) = min{topwaub(α, Ψ′), remwaub(α, Ψ′)}, Lemma 1.a gives \( twaub(\alpha, \varPsi') \le topwaub(\alpha, \varPsi') = \frac{1}{l+1}S_{l+1}^{\alpha} \le \frac{1}{l}S_l^{\alpha} = taub(\alpha, \varPsi') \). Besides, since PE(α) ⊆ ρ(α), we have \( t\_waub(\alpha) = \sum_{\varPsi' \in PE(\alpha)} twaub(\alpha, \varPsi') \le \sum_{\varPsi' \in \rho(\alpha)} taub(\alpha, \varPsi') = t\_aub(\alpha) \). By Theorem 1.a, t_waub is tighter than t_aub.

Moreover, for any proper (forward) extension β = α ⋄ δ of α with δ ≠ <> and any m such that \( l < m = length(\beta) \le n = |\mathcal{T}(\alpha, \varPsi')| \), consider Ψ′ ∈ ρ(β) ⊆ PE(α) ⊆ ρ(α).

(i). Since \( \mathcal{T}(\beta, \varPsi') \subseteq \mathcal{T}(\alpha, \varPsi') \), Lemma 1.b-c, together with Lemma 1.a (because m ≥ l + 1), gives \( au(\beta, \varPsi') \le taub(\beta, \varPsi') = \frac{1}{m}S_m^{\beta} \le \frac{1}{m}S_m^{\alpha} \le \frac{1}{l+1}S_{l+1}^{\alpha} = topwaub(\alpha, \varPsi') \).

(ii). On the other hand, for any \( \beta' \in \mathcal{O}(\beta, \varPsi') \), there exists \( \alpha' \in \mathcal{O}(\alpha, \varPsi') \) such that \( \beta' = \alpha' \diamond \delta' \), where length(rem(α′, Ψ′)) > 0, 1 ≤ k = length(δ′) ≤ length(rem(α′, Ψ′)), and length(β′) = m = l + k. Then \( au(\beta') = \frac{1}{m}\left(u(\alpha') + \sum_{(a_i, q_i) \in \delta'} q_i\right) \le \frac{l \cdot au' + k \cdot Mu}{l + k} = g(k) \). The derivative \( g'(k) = \frac{l(Mu - au')}{(l+k)^2} \) has the constant sign of Mu − au′ on the interval [1, length(rem(α′, Ψ′))], so g is monotone there. If Mu > au′, g increases, so g(k) ≤ g(length(rem(α′, Ψ′))) = remwaub′(α′, Ψ′); otherwise, g is non-increasing, so g(k) ≤ g(1) = remwaub′(α′, Ψ′). Thus, we always have au(β′) ≤ remwaub′(α′, Ψ′). Hence, \( au(\beta, \varPsi') = \min\{au(\beta') \mid \beta' \in \mathcal{O}(\beta, \varPsi')\} \le \max\{remwaub'(\alpha', \varPsi') \mid \alpha' \in \mathcal{O}(\alpha, \varPsi') \wedge rem(\alpha', \varPsi') \ne \langle\rangle\} = remwaub(\alpha, \varPsi') \).

From (i) and (ii), we have au(β, Ψ′) ≤ min{topwaub(α, Ψ′), remwaub(α, Ψ′)} = twaub(α, Ψ′). Since ρ(β) ⊆ PE(α), we obtain \( au(\beta) = \sum_{\varPsi' \in \rho(\beta)} au(\beta, \varPsi') \le \sum_{\varPsi' \in PE(\alpha)} twaub(\alpha, \varPsi') = t\_waub(\alpha) \). Thus, t_waub is a WUB on au.
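The case analysis on g(k) in part (ii) can be mirrored in Python with exact fractions. The sketch assumes Mu is an upper bound on the utility of each item appended from rem(α′, Ψ′), which is what the inequality Σ q_i ≤ k·Mu requires, and takes au′ = u(α′)/l. The function names and the tuple layout of `occurrence_stats` are hypothetical. `check_g_bound` confirms by brute force that choosing k = length(rem(α′, Ψ′)) when Mu > au′, and k = 1 otherwise, always yields the maximum of g on [1, length(rem(α′, Ψ′))].

```python
import random
from fractions import Fraction

# Symbols follow the proof: l = length(alpha'), au_prime = au(alpha'),
# max_u plays the role of Mu (assumed: an upper bound on the utility of each
# item appended from rem(alpha', Psi')), rem_len = length(rem(alpha', Psi')).


def g(k, l, au_prime, max_u):
    """g(k) = (l*au' + k*Mu) / (l + k)."""
    return Fraction(l * au_prime + k * max_u, l + k)


def remwaub_prime(l, au_prime, max_u, rem_len):
    """Maximum of g over k in [1, rem_len]; g is monotone because
    g'(k) = l(Mu - au') / (l + k)^2 has the constant sign of Mu - au'."""
    if rem_len < 1:
        raise ValueError("rem(alpha', Psi') must be non-empty")
    k_best = rem_len if max_u > au_prime else 1
    return g(k_best, l, au_prime, max_u)


def remwaub(occurrence_stats):
    """Max of remwaub' over occurrences alpha' whose remaining sequence is non-empty.
    occurrence_stats: iterable of (l, au_prime, max_u, rem_len) tuples."""
    values = [remwaub_prime(*s) for s in occurrence_stats if s[3] > 0]
    return max(values) if values else None


def t_waub(bounds_by_sequence):
    """t_waub(alpha) = sum over Psi' in PE(alpha) of min{topwaub, remwaub}.
    bounds_by_sequence: {sequence id: (topwaub, remwaub)}."""
    return sum(min(top, rem) for top, rem in bounds_by_sequence.values())


def check_g_bound(trials=1000, seed=1):
    """Brute-force check that g(k) <= remwaub'(alpha', Psi') for every admissible k."""
    rng = random.Random(seed)
    for _ in range(trials):
        l, rem_len = rng.randint(1, 6), rng.randint(1, 8)
        au_prime, max_u = rng.randint(1, 30), rng.randint(1, 30)
        bound = remwaub_prime(l, au_prime, max_u, rem_len)
        assert all(g(k, l, au_prime, max_u) <= bound for k in range(1, rem_len + 1))
    return True


print(remwaub_prime(2, 3, 5, 4))  # 13/3  (Mu > au': g increasing, take k = rem_len)
print(remwaub_prime(3, 4, 2, 3))  # 7/2   (Mu <= au': g non-increasing, take k = 1)
print(check_g_bound())            # True
```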

1.2 Appendix 2 (Proof of Theorem 2)

a. If t_aub(α) < mu, then by Theorem 1.a.(ii), for all β = α ⋄ δ ⊇ α (so that ρ(β) ⊆ ρ(α)), au(β) ≤ t_aub(β) ≤ t_aub(α) < mu; that is, the entire branch(α) can be pruned.
b. If t_waub(α) < mu, then for any proper forward extension β = α ⋄ δ (⊃ α) of α, we have au(β) ≤ t_waub(α) < mu. Hence, propBranch(α) can be pruned.
c. If \( WidthPC_{l\_aub}(\alpha) \) holds, for example if l_aub(α) < mu, branch(α) can also be pruned, because \( \mathcal{AMB}i(l\_aub) \) implies \( \mathcal{AMF}(l\_aub) \). Moreover, since \( \mathcal{AMB}i(l\_aub) \) holds, the remaining assertions are also true. For example, for any y ∈ I(α ⋄_i x), we have y ≻ x and \( l\_aub(\alpha \diamond_i x \diamond_i y) \ge mu \). Since α ⋄_i x ⋄_i y is a backward extension of α ⋄_i y, \( l\_aub(\alpha \diamond_i y) \ge l\_aub(\alpha \diamond_i x \diamond_i y) \ge mu \), so y ∈ I(α), i.e., I(α ⋄_i x) ⊆ I(α). The remaining assertions are proven similarly.
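One way Theorem 2 might be used inside a depth-first search is sketched below in Python. The bound values are taken as inputs, and `l_aub` is a hypothetical callable standing in for the paper's computation of l_aub(α ⋄_i y); the string labels and the use of Python's `>` as the item order ≻ are illustrative assumptions, not C-FHAUSPM's actual implementation.

```python
# Sketch of how Theorem 2 could drive a depth-first search. The bound values
# and the l_aub "oracle" are hypothetical inputs; computing them requires the
# paper's data structures, which are not reproduced here.


def t_aub(taub_by_sequence):
    """t_aub(alpha) = sum of taub(alpha, Psi') over Psi' in rho(alpha)."""
    return sum(taub_by_sequence.values())


def pruning_decision(t_aub_value, t_waub_value, mu):
    """Decide which part of the search tree rooted at alpha can be skipped."""
    if t_aub_value < mu:
        return "prune branch"         # Theorem 2.a: alpha and all its extensions
    if t_waub_value < mu:
        return "prune proper branch"  # Theorem 2.b: alpha is still evaluated, its proper forward extensions are skipped
    return "explore"


def width_prune(candidates, l_aub, mu):
    """I(alpha): candidate items y with l_aub(alpha <>_i y) >= mu.
    `l_aub` is a hypothetical callable mapping y to l_aub(alpha <>_i y)."""
    return [y for y in candidates if l_aub(y) >= mu]


def child_candidates(parent_items, x):
    """Theorem 2.c: I(alpha <>_i x) is a subset of {y in I(alpha) : y > x},
    so the child only re-tests items from the parent's list.
    Python's ordering is a stand-in for the paper's item order."""
    return [y for y in parent_items if y > x]


mu = 10
l_aub_of_alpha = {"b": 12, "c": 9, "d": 15}  # hypothetical l_aub(alpha <>_i y) values
items = width_prune(["b", "c", "d"], l_aub_of_alpha.get, mu)
print(items)                                        # ['b', 'd']
print(child_candidates(items, "b"))                 # ['d']
print(pruning_decision(t_aub({1: 4, 2: 5}), 8, mu))  # prune branch
```

The design point of part c is that a child node never scans the full item set: it re-tests only the items that survived in its parent's list I(α) and that follow x in the order, so the candidate lists shrink as the search goes deeper.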


About this article

Cite this article

Truong, T., Duong, H., Le, B. et al. Frequent high minimum average utility sequence mining with constraints in dynamic databases using efficient pruning strategies. Appl Intell 52, 6106–6128 (2022). https://doi.org/10.1007/s10489-021-02520-1


Accepted: 07 May 2021
Issue Date: April 2022
