Fast generation of sequential patterns with item constraints from concise representations

Duong, Hai; Truong, Tin; Tran, Anh; Le, Bac

doi:10.1007/s10115-019-01418-2

Regular Paper
Published: 08 November 2019

Volume 62, pages 2191–2223 (2020)

Knowledge and Information Systems

Abstract

Constraint-based frequent sequence mining is an important task in data mining because its results closely match the requirements and interests of users. Most existing algorithms for this task follow a traditional approach that mines patterns directly from a sequence database (SDB). However, SDBs are often very large, so these algorithms often perform poorly: the number of generated candidates and the search space are enormous, especially for low minimum support thresholds. In addition, these algorithms must read the SDB again whenever the user changes a constraint, and when constraints vary frequently, these repeated scans are time-consuming. To address this issue, we propose a novel approach for generating frequent sequences with various constraints from the two sets of frequent closed sequences (\( \mathcal{FCS} \)) and frequent generator sequences (\( \mathcal{FGS} \)), which are concise representations of the set \( \mathcal{FS} \) of all frequent sequences. The approach is based on novel, rigorously proved theoretical results that establish an explicit relationship between \( \mathcal{FS} \) and these two sets. It is then used to develop an efficient algorithm, named MFS-IC, for quickly generating frequent sequences with item constraints, a task with many real-life applications. Extensive experiments on real-life and synthetic databases show that MFS-IC outperforms state-of-the-art algorithms that mine frequent sequences with constraints directly from an SDB, in terms of runtime, memory usage and scalability.

Notes

iff means “if and only if”.

Acknowledgements

This work is funded by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2017.300.

Author information

Authors and Affiliations

VNU-HCMC, University of Science, Ho Chi Minh, Vietnam
Hai Duong & Bac Le
Department of Mathematics and Computer Science, University of Dalat, Dalat, Vietnam
Hai Duong, Tin Truong & Anh Tran

Corresponding author

Correspondence to Bac Le.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Appendix 1: Proof of Proposition 1

(i)
“\( \mathcal{FS}(\sigma,\gamma) \subseteq \mathcal{FS}'(\sigma,\gamma) \)”: Consider any \( \alpha \in \mathcal{FS}(\sigma,\gamma) \), so that \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \). Without loss of generality, we can assume that \( \gamma = E_{1} \to E_{2} \to \cdots \to E_{p} \), \( \alpha = F_{1} \to F_{2} \to \cdots \to F_{q} \) and \( \sigma = E'_{1} \to E'_{2} \to \cdots \to E'_{q} \), where \( F_{i} \subseteq E'_{i} \) for all \( i = 1, \ldots, q \), and that there exists \( lp = \{ j_{1}, j_{2}, \ldots, j_{p} \} \in LP(\alpha,\gamma) \subseteq LP(\sigma,\gamma) \) with \( 1 \le j_{1} < j_{2} < \cdots < j_{p} \le q \) and \( E_{k} \subseteq F_{j_{k}} \subseteq E'_{j_{k}} \) for all \( k = 1, \ldots, p \). For all \( i = 1, \ldots, q \), define \( d_{i} \stackrel{\text{def}}{=} \begin{cases} F_{i}, & \text{if } i \notin lp \\ F_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), \( D_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \) and \( E_{i}^{\sim} \stackrel{\text{def}}{=} \begin{cases} \emptyset, & \text{if } i \notin lp \\ E_{k}, & \text{if } i = j_{k} \in lp \end{cases} \), and set \( \delta(lp) \stackrel{\text{def}}{=} d_{1} \to d_{2} \to \cdots \to d_{q} \) and \( Ex(\gamma,lp) \stackrel{\text{def}}{=} E_{1}^{\sim} \to E_{2}^{\sim} \to \cdots \to E_{q}^{\sim} \). Then \( d_{i} \subseteq D_{i} \) and \( F_{i} = E_{i}^{\sim} + d_{i} \) for all \( i = 1, \ldots, q \), so \( \alpha = Ex(\gamma,lp) \oplus \delta(lp) \in \mathcal{FS}'(\sigma,\gamma) \).
(ii)
“\( \mathcal{FS}(\sigma,\gamma) \supseteq \mathcal{FS}'(\sigma,\gamma) \)”: Let \( \alpha \in \mathcal{FS}'(\sigma,\gamma) \), i.e., \( \alpha = Ex(\gamma,lp) \oplus \delta(lp) = F_{1} \to F_{2} \to \cdots \to F_{q} \) for some \( lp \in LP(\sigma,\gamma) \). Then \( F_{i} = E_{i}^{\sim} + d_{i} \) with \( d_{i} \subseteq D_{i} \), so \( E_{i}^{\sim} \subseteq F_{i} \subseteq E_{i}^{\sim} + D_{i} \subseteq E'_{i} \) for all \( i = 1, \ldots, q \). Thus \( \gamma \sqsubseteq \alpha \sqsubseteq \sigma \), that is, \( \alpha \in \mathcal{FS}(\sigma,\gamma) \).□
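To make the construction in Proposition 1 concrete, the following Python sketch enumerates \( \mathcal{FS}'(\sigma,\gamma) \) as all sequences \( Ex(\gamma,lp) \oplus \delta(lp) \) and compares the result with a brute-force computation of \( \{\alpha : \gamma \sqsubseteq \alpha \sqsubseteq \sigma\} \). It is an illustration under stated assumptions, not the MFS-IC implementation: sequences are tuples of frozensets, positions in lp are 0-based, \( \oplus \) is taken as itemset-wise union, and empty itemsets are dropped at the end. Since every subsequence of a frequent \( \sigma \) is frequent, no support check is needed. All function names are hypothetical.

```python
from itertools import combinations, product

def is_subseq(a, b):
    """a ⊑ b: the itemsets of a are subsets of itemsets of b at increasing
    positions (greedy leftmost matching)."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def normalize(seq):
    """Drop empty itemsets (as done before comparing alpha and beta in Theorem 2)."""
    return tuple(x for x in seq if x)

def position_lists(sigma, gamma):
    """LP(sigma, gamma): 0-based positions j_1 < ... < j_p with E_k ⊆ E'_{j_k}."""
    return [lp for lp in combinations(range(len(sigma)), len(gamma))
            if all(gamma[k] <= sigma[j] for k, j in enumerate(lp))]

def generate_fs_prime(sigma, gamma):
    """Every Ex(gamma, lp) ⊕ delta(lp) with d_i ⊆ D_i; duplicates are kept."""
    out = []
    for lp in position_lists(sigma, gamma):
        ex = [frozenset()] * len(sigma)      # Ex(gamma, lp): E_k at j_k, empty elsewhere
        for k, j in enumerate(lp):
            ex[j] = gamma[k]
        D = [sigma[i] - ex[i] for i in range(len(sigma))]   # D_i = E'_i or E'_i \ E_k
        for d in product(*(subsets(Di) for Di in D)):        # every choice d_i ⊆ D_i
            out.append(normalize(tuple(ex[i] | d[i] for i in range(len(sigma)))))
    return out

def brute_force_fs(sigma, gamma):
    """FS(sigma, gamma) by definition: every alpha with gamma ⊑ alpha ⊑ sigma."""
    below_sigma = {normalize(c) for c in product(*(subsets(E) for E in sigma))}
    return {a for a in below_sigma if is_subseq(gamma, a)}

sigma = (frozenset("ab"), frozenset("a"))   # sigma = {a,b} -> {a}
gamma = (frozenset("a"),)                   # gamma = {a}
generated = generate_fs_prime(sigma, gamma)
print(len(generated), len(set(generated)))  # 8 5
```

On this toy pair the construction yields 8 sequences but only 5 distinct ones, and the distinct ones coincide with the brute-force set. Here the repeats arise because both position lists in \( LP(\sigma,\gamma) \) can produce the same sequence; this is the kind of duplication that the pruning conditions used in Theorem 2 are meant to remove.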

1.2 Appendix 2: Proof of Theorem 1

(a)
For every \( \alpha \in \mathcal{FS} \), there exists \( \sigma \in CloSet(\alpha) \) such that \( ro \stackrel{\text{def}}{=} \rho(\alpha) = \rho(\sigma) \); hence \( \sigma \in \mathcal{FCS} \), \( ro \in RO \) and \( FS(ro) = [\sigma] \). Since \( \sim \) is an equivalence relation, distinct equivalence classes \( [\sigma] \), i.e., distinct sets \( FS(ro) \), are pairwise disjoint. Therefore \( \mathcal{FS} = \sum_{ro \in RO} FS(ro) \).
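The partition in part (a) can be checked on a toy database. The sketch below assumes, as is usual for this family of methods, that \( \rho(\alpha) \) is the set of identifiers of the data sequences containing \( \alpha \), that \( supp(\alpha) = |\rho(\alpha)| \), and that ms is an absolute count; the database and all function names are hypothetical. It enumerates all frequent sequences by brute force, groups them by \( \rho \) (each group is one \( FS(ro) \)), and reports the maximal members of each group, which are its closed sequences, and the minimal members, which are its generators.

```python
from collections import defaultdict
from itertools import combinations, product

def is_subseq(a, b):
    """a ⊑ b: the itemsets of a are subsets of itemsets of b at increasing positions."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def subsequences(seq):
    """All non-empty subsequences of seq (empty itemsets dropped)."""
    choices = [[frozenset(c) for r in range(len(E) + 1) for c in combinations(sorted(E), r)]
               for E in seq]
    result = set()
    for pick in product(*choices):
        s = tuple(x for x in pick if x)
        if s:
            result.add(s)
    return result

def rho(alpha, sdb):
    """Assumed definition: IDs of the data sequences that contain alpha."""
    return frozenset(sid for sid, s in sdb.items() if is_subseq(alpha, s))

def classes_by_rho(sdb, ms):
    """Brute-force FS grouped by rho: {ro: FS(ro)} for every ro with |ro| >= ms."""
    candidates = set().union(*(subsequences(s) for s in sdb.values()))
    groups = defaultdict(set)
    for alpha in candidates:
        ro = rho(alpha, sdb)
        if len(ro) >= ms:
            groups[ro].add(alpha)
    return dict(groups)

def closed_and_generators(cls):
    """Maximal members (closed sequences) and minimal members (generators) of one class."""
    closed = [s for s in cls if not any(t != s and is_subseq(s, t) for t in cls)]
    gens = [g for g in cls if not any(t != g and is_subseq(t, g) for t in cls)]
    return closed, gens

# Hypothetical toy SDB: 1: {a,b} -> {c}, 2: {a} -> {c}, 3: {b}
sdb = {1: (frozenset("ab"), frozenset("c")),
       2: (frozenset("a"), frozenset("c")),
       3: (frozenset("b"),)}
for ro, members in classes_by_rho(sdb, ms=2).items():
    closed, gens = closed_and_generators(members)
    print(sorted(ro), len(members), len(closed), len(gens))
```

Because the groups are keyed by \( \rho \), they are disjoint by construction and together cover all frequent sequences, which is the statement of part (a). A group may contain more than one closed sequence: in the database \( \{1: a \to b,\ 2: b \to a\} \) with ms = 2, both \( a \) and \( b \) are closed and share \( \rho = \{1, 2\} \). This is why part (b) ranges over all \( \sigma_{i} \in CS(ro) \).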
(b)
We first show, by contradiction, that the subsets \( \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \), for \( \sigma_{i} \in CS(ro) \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \), are pairwise disjoint. Assume, to the contrary, that there exist \( \sigma_{m}, \sigma_{i} \in CS(ro) \), \( \gamma_{k} \in GenSet(\sigma_{m}) \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \) with \( \sigma_{m} \ne \sigma_{i} \) or \( \gamma_{k} \ne \gamma_{j} \), and some \( \alpha \in \mathcal{FS}^{*}(\sigma_{m},\gamma_{k}) \cap \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \). Each of the two cases, \( \sigma_{m} \ne \sigma_{i} \), or \( \sigma_{m} \equiv \sigma_{i} \) and \( \gamma_{k} \ne \gamma_{j} \), contradicts, respectively, the condition \( not(DCondC(\alpha,\sigma_{i},CS(\rho(\sigma_{i})))) \) or the condition \( not(DCondG(\alpha,\sigma_{i},\gamma_{j})) \) in the definition of \( \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \). Second, we prove that \( FS(ro) = \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \) for all \( ro \in RO \).
(i)
“\( \subseteq \)”: Let \( \alpha \in FS(ro) \); then \( \alpha \in \mathcal{FS} \) and \( \rho(\alpha) = ro \). Since \( CloSet(\alpha) \ne \emptyset \) and \( GenSet(\alpha) \ne \emptyset \), \( [\exists \sigma_{i} \in CloSet(\alpha), \exists \gamma_{j} \in GenSet(\alpha): \sigma_{i} \in \mathcal{CS}, \gamma_{j} \in \mathcal{GS}, \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i}, \rho(\sigma_{i}) = \rho(\gamma_{j}) = \rho(\alpha)]^{(*)} \), so \( \gamma_{j} \in GenSet(\sigma_{i}) \), \( supp(\sigma_{i}) = supp(\gamma_{j}) = supp(\alpha) \ge ms \), \( \sigma_{i} \in \mathcal{FCS} \) and \( \sigma_{i} \in CS(ro) \). Hence \( \alpha \in \mathcal{FS}(\sigma_{i},\gamma_{j}) \). Without loss of generality, we can choose \( \sigma_{i} \) and \( \gamma_{j} \) to be, respectively, the first closed sequence and the first generator sequence satisfying condition (*); with this choice, \( \alpha \) is not discarded as a duplicate by DCondC or DCondG, so \( \alpha \in \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \). Thus \( FS(ro) \subseteq \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \).
(ii)
“\( \supseteq \)”: For all \( \sigma_{i} \in CS(ro) \), \( \gamma_{j} \in GenSet(\sigma_{i}) \) and \( \alpha \in \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \subseteq \mathcal{FS}(\sigma_{i},\gamma_{j}) \), we have \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \) with \( \rho(\gamma_{j}) = \rho(\sigma_{i}) = ro \), so \( \rho(\alpha) = ro \) and \( \alpha \in FS(ro) \). Thus \( \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \subseteq FS(ro) \), and \( FS(ro) \supseteq \sum_{\sigma_{i} \in CS(ro)} \sum_{\gamma_{j} \in GenSet(\sigma_{i})} \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \).□
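The selection rule used in part (i), taking the first closed sequence and the first generator that bracket \( \alpha \), can be sketched directly. The conditions DCondC and DCondG are defined in the main text and are not reproduced here; as a stand-in, the hypothetical function below assigns every \( \alpha \) of one class \( FS(ro) \) to the first pair \( (\sigma_{i},\gamma_{j}) \), in a fixed enumeration order, with \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \). The resulting blocks play the role of the sets \( \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \): they are disjoint because each \( \alpha \) gets exactly one owner, and they cover the class whenever every \( \alpha \) lies between some generator and some closed sequence of the class. Sequences are tuples of frozensets, as an assumption of the sketch.

```python
def is_subseq(a, b):
    """a ⊑ b: the itemsets of a are subsets of itemsets of b at increasing positions."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def interval(members, sigma, gamma):
    """Members alpha of the class with gamma ⊑ alpha ⊑ sigma, i.e. FS(sigma, gamma)."""
    return {a for a in members if is_subseq(gamma, a) and is_subseq(a, sigma)}

def assign_first_pair(members, closed_list, genset):
    """Map each alpha to the first pair (sigma_i, gamma_j), in the order of closed_list
    and genset[sigma_i], whose interval contains alpha (stand-in for DCondC/DCondG)."""
    owner = {}
    for sigma in closed_list:
        for gamma in genset[sigma]:
            for a in interval(members, sigma, gamma):
                owner.setdefault(a, (sigma, gamma))   # keep only the first owner
    return owner

A, B, C = (frozenset(x) for x in "abc")
# One class FS(ro), ro = {1, 2}, of the toy database {1: a->b->c, 2: a->c->b}, ms = 2.
members = {(A,), (B,), (C,), (A, B), (A, C)}
closed_list = [(A, B), (A, C)]                          # CS(ro) in a fixed order
genset = {(A, B): [(A,), (B,)], (A, C): [(A,), (C,)]}   # GenSet(sigma_i)
owner = assign_first_pair(members, closed_list, genset)
```

In this class the four intervals \( \mathcal{FS}(\sigma_{i},\gamma_{j}) \) contain 8 entries in total, while the class has only 5 members; for instance, \( a \) lies in the intervals of both closed sequences but is owned only by the first pair \( (a \to b, a) \).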

1.3 Appendix 3: Proof of Theorem 2

(a)
It is clear that \( \mathcal{FS}^{**}(\sigma,\gamma) \subseteq \mathcal{FS}'(\sigma,\gamma) \). Conversely, as shown in the proofs of the three pruning cases in the main text, for any \( \alpha \in \mathcal{FS}'(\sigma,\gamma) \), if \( DCond_{i} \) holds for some \( i \in \{1, 2, 3\} \), then \( \alpha \) is a duplicate of a previously generated sequence, so pruning it loses no sequence. Thus \( \mathcal{FS}'(\sigma,\gamma) \subseteq \mathcal{FS}^{**}(\sigma,\gamma) \), and hence \( \mathcal{FS}'(\sigma,\gamma) = \mathcal{FS}^{**}(\sigma,\gamma) \).
(b)
We prove the first assertion, that the sequences in \( \mathcal{FS}^{**}(\sigma,\gamma) \) are generated without duplicates, by contradiction. Assume, to the contrary, that two sequences \( \alpha = Ex(\gamma,lp_{m}) \oplus \delta(lp_{m}) \) and \( \beta = Ex(\gamma,lp_{n}) \oplus \delta(lp_{n}) \) in \( \mathcal{FS}^{**}(\sigma,\gamma) \), generated from two different position lists \( lp_{m} \ne lp_{n} \) in \( LP(\sigma,\gamma) \), satisfy \( \alpha \equiv \beta \). That is, there exist \( lp_{m} = \{ j_{1}, j_{2}, \ldots, j_{p} \} \) and \( lp_{n} = \{ i_{1}, i_{2}, \ldots, i_{p} \} \in LP(\sigma,\gamma) \) with \( m < n \), \( 1 \le j_{1} < j_{2} < \cdots < j_{p} \le q \) and \( 1 \le i_{1} < i_{2} < \cdots < i_{p} \le q \); sets \( d(lp_{m})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{m} \\ E'_{i} \setminus E_{k}, & \text{if } i = j_{k} \in lp_{m} \end{cases} \) and \( d_{i} \stackrel{\text{def}}{=} d(lp_{n})_{i} \stackrel{\text{def}}{=} \begin{cases} E'_{i}, & \text{if } i \notin lp_{n} \\ E'_{i} \setminus E_{k}, & \text{if } i = i_{k} \in lp_{n} \end{cases} \) with \( F_{i} \subseteq d(lp_{m})_{i} \) and \( F'_{i} \subseteq d_{i} \) for all \( i = 1, \ldots, q \); and two corresponding sequences \( \alpha = F_{1} \to F_{2} \to \cdots \to F_{q} \) and \( \beta = F'_{1} \to F'_{2} \to \cdots \to F'_{q} \) in \( \mathcal{FS}^{**}(\sigma,\gamma) \) such that \( \alpha \equiv \beta \) and \( DCond_{k}(lp_{m}) \), \( DCond_{k}(lp_{n}) \) are false for all \( k = 1, 2, 3 \).

After deleting all empty itemsets in \( \alpha \) and \( \beta \), we obtain \( \alpha = F_{u_{1}} \to F_{u_{2}} \to \cdots \to F_{u_{N}} \) and \( \beta = F'_{v_{1}} \to F'_{v_{2}} \to \cdots \to F'_{v_{N}} \), where \( N = size(\alpha) = size(\beta) \), \( 1 \le u_{1} < u_{2} < \cdots < u_{N} \le q \), \( 1 \le v_{1} < v_{2} < \cdots < v_{N} \le q \), and \( F_{u_{i}} = F'_{v_{i}} \ne \emptyset \) for all \( i = 1, \ldots, N \) (because \( \alpha \equiv \beta \)). Set \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} = 0 \) and \( k_{p+1} = N \), and let \( 1 \le k_{1} < k_{2} < \cdots < k_{p} \le N \) be such that \( j_{r} = u_{k_{r}} \) (so \( F_{u_{k_{r}}} = F_{j_{r}} \supseteq E_{r} \subseteq F'_{j_{r}} \)) for all \( r = 1, \ldots, p \).

For all \( r = 1, \ldots, p+1 \) and all \( k = k_{r-1}+1, \ldots, k_{r} \), we have \( u_{k_{r-1}} = j_{r-1} < u_{k} \le j_{r} \le i_{r} \). By induction, one can prove that if \( v_{h} = u_{h} \) for all \( h = 0, \ldots, k-1 \), \( v_{k-1} < i_{r} \), and \( j_{r'} = i_{r'} = u_{k_{r'}} = v_{k_{r'}} \) for all \( r' = 0, \ldots, r-1 \), then \( v_{k} = u_{k} \); moreover, if \( r \le p \), then \( v_{k} < i_{r} \) whenever \( k < k_{r} \), and \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \).

Finally, under the hypothesis \( \alpha \equiv \beta \), for all \( r = 1, \ldots, p+1 \) and all \( k = k_{r-1}+1, \ldots, k_{r} \), we always have \( v_{k} = u_{k} \), and \( j_{r} = i_{r} = u_{k_{r}} = v_{k_{r}} \) if \( r \le p \). Therefore \( (v_{k'} = u_{k'},\ \forall k' = 1, \ldots, N)^{(*)} \) and \( (j_{r} = i_{r},\ \forall r = 1, \ldots, p)^{(**)} \).

If \( lp_{m} = lp_{n} \), then \( j_{r} = i_{r} \) for all \( r = 1, \ldots, p \). Since \( \alpha \) and \( \beta \) are generated in two different ways in \( \mathcal{FS}^{**}(\sigma,\gamma) \), there exists \( h_{0} \in \{1, \ldots, N\} \) such that \( u_{h_{0}} \ne v_{h_{0}} \), which contradicts (*).
If \( lp_{m} \ne lp_{n} \), there exists \( m_{0} \in \{1, \ldots, p\} \) such that \( i_{m_{0}} \ne j_{m_{0}} \), which contradicts (**). In other words, the hypothesis \( \alpha \equiv \beta \) always leads to a contradiction.

Thus \( \alpha \not\equiv \beta \); that is, all sequences in \( \mathcal{FS}^{**}(\sigma,\gamma) \) are generated without duplicates.

From Proposition 1 and part (a), we have \( \mathcal{FS}(\sigma,\gamma) = \mathcal{FS}'(\sigma,\gamma) = \mathcal{FS}^{**}(\sigma,\gamma) \), and the remaining assertions follow from Theorem 1.□

1.4 Appendix 4: Proof of Theorem 3

(a)
Based on the equivalence relation \( \sim \), it follows directly that \( \mathcal{FS}^{E} = \sum_{ro \in RO^{E}} \mathcal{FS}^{E}(ro) \).
(b)
- “\( \subseteq \)”: For any \( \alpha \in \mathcal{FS}^{E} \), we have \( \mathcal{C}_{I}(\alpha) \) and, by Theorem 2, there exist \( ro \in RO \), \( \sigma_{i} \in CS(ro) \subseteq \mathcal{CS} \) and \( \gamma_{j} \in GenSet(\sigma_{i}) \) such that \( \alpha \in \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \). Hence \( \gamma_{j} \sqsubseteq \alpha \sqsubseteq \sigma_{i} \), \( \mathcal{C}_{I}(\gamma_{j}) \) holds and \( \gamma_{j} \in GenSet^{E}(\sigma_{i}) \); moreover, \( ro = \rho(\sigma_{i}) \in RO^{E} \), \( \sigma_{i} \in CS^{E}(ro) \) and \( \alpha \in \mathcal{FS}^{*E}(\sigma_{i},\gamma_{j}) \). The subsets \( \mathcal{FS}^{*E}(\sigma_{i},\gamma_{j}) \) are pairwise disjoint because \( GenSet^{E}(\sigma_{i}) \subseteq GenSet(\sigma_{i}) \) and \( \mathcal{FS}^{*E}(\sigma_{i},\gamma_{j}) \subseteq \mathcal{FS}^{*}(\sigma_{i},\gamma_{j}) \).
- “\( \supseteq \)”: This inclusion holds because \( GenSet^{E}(\sigma_{i}) \subseteq GenSet(\sigma_{i}) \) and \( \mathcal{FS}^{*E}(\sigma_{i},\gamma_{j}) \subseteq \mathcal{FS}^{E}(\sigma_{i},\gamma_{j}) \).
□

About this article

Cite this article

Duong, H., Truong, T., Tran, A. et al. Fast generation of sequential patterns with item constraints from concise representations. Knowl Inf Syst 62, 2191–2223 (2020). https://doi.org/10.1007/s10115-019-01418-2

Received: 15 April 2019
Revised: 08 October 2019
Accepted: 12 October 2019
Published: 08 November 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10115-019-01418-2
