Abstract
Mining frequent sequences in sequential databases is highly valuable for many real-life applications. However, in several cases, especially when databases are huge and low minimum support thresholds are used, the cardinality of the result set can be enormous. Consequently, algorithms for discovering frequent sequences exhibit poor performance, with substantial increases in execution time, memory consumption and storage space usage. To address this issue, researchers have studied the tasks of mining frequent closed and generator sequences, as they provide several benefits when compared to the set of frequent sequences. One of the most important benefits is that the cardinalities of frequent closed and generator sequences are generally much smaller than the cardinality of frequent sequences. Hence, humans find it more convenient to analyze the information provided by closed and generator sequences. Moreover, it was shown that frequent closed sequences have the advantage of being lossless, and they thus preserve information about the frequency of all frequent subsequences, while generator sequences can provide higher accuracy for sequence classification tasks since they are the smallest patterns that characterize groups of sequences. Besides, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and recover the complete set of frequent sequences and their frequencies. This paper proposes two novel algorithms named FCloSM and FGenSM to mine frequent closed and generator sequences efficiently. These algorithms are based on new pruning conditions called extended early elimination (3E) and early pruning techniques named EPCLO and EPGEN, designed to identify non-closed and non-generator patterns early.
Based on these techniques, two local pruning strategies called LPCLO and LPGEN are proposed to eliminate non-closed and non-generator patterns more efficiently at two successive levels of the prefix search tree without performing subsequence relation checking. These theoretical results, which are the basis of FCloSM and FGenSM, are mathematically proved and are shown to be more general than those presented in previous work. Extensive experiments show that FCloSM and FGenSM are one to two orders of magnitude faster than the state-of-the-art algorithms for discovering frequent closed sequences (CloSpan, BIDE, ClaSP and CM-ClaSP) and for mining frequent generators (FEAT, FSGP and VGEN), and that FCloSM and FGenSM consume much less memory.















References
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the eleventh international conference on data engineering, ICDE ’95. IEEE Computer Society, Washington, DC, pp 3–14
Agustina T, Sitanggang IS (2015) Sequential patterns for hotspot occurrences based weather data using Clospan algorithm. In: 3rd international conference on adaptive and intelligent agroindustry (ICAIA). IEEE, pp 245–249
Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’02. ACM, New York, NY, pp 429–435
Baralis E, Bruno G, Chiusano S, Domenici VC, Mahoto NA, Petrigni C (2010) Analysis of medical pathways by means of frequent closed sequences. In: International conference on Knowledge-based and intelligent information and engineering systems. Springer, Berlin, Heidelberg, pp 418–425
Chen Z, El-Nasr MS, Canossa A, Badler J, Tignor S, Colvin R (2015) Modeling individual differences through frequent pattern mining on role-playing game actions. In: Eleventh artificial intelligence and interactive digital entertainment conference, AIIDE 2015
Fournier-Viger P, Nkambou R, Tseng VS (2011) RuleGrowth: mining sequential rules common to several sequences by pattern-growth. In: Proceedings of the 2011 ACM symposium on applied computing, SAC ’11. ACM, New York, NY, pp 956–961
Fournier-Viger P, Faghihi U, Nkambou R, Mephu Nguifo E (2012) CMRULES: an efficient algorithm for mining sequential rules common to several sequences. Knowl Based Syst 25(1):63–76
Fournier-Viger P, Nkambou R, Mephu Nguifo E, Mayers A, Faghihi U (2013) A multi-paradigm intelligent tutoring system for robotic arm training. IEEE Trans Learn Technol 6(4):364–377
Fournier-Viger P, Wu CW, Tseng VS (2013) Mining maximal sequential patterns without candidate maintenance. In: Proceedings of 9th international conference on advanced data mining and applications, ADMA’13. Springer, Hangzhou, China, pp 169–180
Fournier-Viger P, Gomariz A, Campos M, Thomas R (2014) Fast vertical mining of sequential patterns using co-occurrence information. In: Proceedings of 18th Pacific-Asia conference on knowledge discovery and data mining, PAKDD’2014. pp 40–52
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C, Tseng VS (2014) SPMF: a java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393
Fournier-Viger P, Gomariz A, Šebek M, Hlosta M (2014) VGEN: Fast vertical mining of sequential generator patterns. In: Proceedings of 16th international conference on data warehousing and knowledge discovery, DWKD’14. Springer International Publishing, Munich, Germany, pp 476–488
Gao C, Wang J, He Y, Zhou L (2008) Efficient mining of frequent sequence generators. In: Proceedings of the 17th international conference on World Wide Web, WWW ’08. ACM, New York, NY, pp 1051–1052
García-Rudolph A, Gibert K (2016) Understanding effects of cognitive rehabilitation under a knowledge discovery approach. Eng Appl Artif Intell 55:165–185
Gomariz A, Campos M, Marin R, Goethals B (2013) ClaSP: an efficient algorithm for mining frequent closed sequences. In: Proceedings of 17th Pacific-Asia conference, PAKDD ’13. Springer, Gold Coast, Australia, pp 50–61
Gomez M, Rouvoy R, Adams B, Seinturier L (2016) Reproducing context-sensitive Crashes of mobile apps using crowdsourced monitoring. In: Proceedings of the 3rd IEEE/ACM international conference on mobile software engineering and systems (MOBILESoft’16). ACM, New York, NY, pp 88–99
Grunwald P, Myung IJ, Pitt M (2005) Advances in minimum description length: theory and applications. MIT Press, London
Hai D, Tin T, Bay V (2014) An efficient method for mining frequent itemsets with double constraints. Int J Eng Appl Artif Intell (EAAI) 27:148–154
Harms SK, Deogun J, Tadesse T (2002) Discovering sequential association rules with constraints and time lags in multiple sequences. In: Proceedings of 13th international symposium, ISMIS 2002. Springer, Lyon, France, pp 432–441
Huang H, Yao L, Tsai CY (2016) Transportation service quality improvement through closed sequential pattern mining approach. Cybern Inf Technol 16(3):185–194
Ignatov DI, Mitrofanova E, Muratova A, Gizdatullin D (2015) Pattern mining and machine learning for demographic sequences. In: International conference on knowledge engineering and the semantic web. Springer International Publishing, pp 225–239
Jorritsma W, Cnossen F, Dierckx RA, Oudkerk M, Van Ooijen PM (2016) Pattern mining of user interaction logs for a post-deployment usability evaluation of a radiology PACS client. Int J Med Inf 85(1):36–42
Li J, Li H, Wong L, Pei J, Dong G (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on Artificial intelligence, AAAI ’06. ACM, pp 409–414
Lo D, Khoo SC, Li J (2008) Mining and ranking generators of sequential patterns. In: Proceedings of the 2008 SIAM international conference on data mining, SIAM ’08. SIAM, pp 553–564
Lo D, Khoo SC, Wong L (2011) Non-redundant sequential rules: theory and algorithm. Inf Syst 34(4):438–453
Luo C, Chung S (2005) Efficient mining of maximal sequential patterns using multiple samples. In: SIAM international conference on data mining (SDM’05), pp 415–426
Minh-Thai T, Bac L, Bay V, Hong T (2016) Mining non-redundant sequential rules with dynamic bit vectors and pruning techniques. Int J Artif Intell 45(2):333–342
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th international conference on database theory, ICDT ’99. ACM, London, UK, pp 398–416
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Pham TT, Luo J, Hong TP, Vo B (2012) MSGPs: a novel algorithm for mining sequential generator patterns. In: Proceedings of 4th international conference on computational collective intelligence, ICCCI 2012. Springer, Ho Chi Minh City, Vietnam, pp 393–401
Pham TT, Luo J, Hong TP, Vo B (2013) An efficient algorithm for mining sequential rules with interestingness measures. Int J Innov Comput Inf Control 9:4811–4824
Pham TT, Luo J, Hong TP, Vo B (2014) An efficient method for mining non-redundant sequential rules using attributed prefix-trees. Eng Appl Artif Intell (EAAI) 32:88–99
Rahman A, Xu Y, Radke K, Foo E (2016) Finding anomalies in SCADA logs using rare sequential pattern mining. In: International conference on network and system security. Springer International Publishing, pp 499–506
Saraswati A, Chang CF, Ghose A, Dam HK (2015) Learning relationships between the business layer and the application layer in ArchiMate models. In: International conference on conceptual modeling. Springer International Publishing, pp 499–513
Schweizer D, Zehnder M, Wache H, Witschel HF, Zanatta D, Rodriguez M (2015) Using consumer behavior data to reduce energy consumption in smart homes: applying machine learning to save energy without lowering comfort of inhabitants. In: IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, pp 1123–1129
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology: advances in database technology, EDBT ’96. ACM, pp 3–17
Truong T, Tran A (2010) Structure of set of association rules based on concept lattice. In: Advances in intelligent information and database systems, AIIDS ’10. Springer, pp 217–227
Truong T, Duong H, Hoang NTN (2016) Structure of frequent itemsets with extended double constraints. Vietnam J Comput Sci 3(2):119–135
Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056
Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the 2003 SIAM international conference on data mining, pp 166–177
Yi S, Zhao T, Zhang Y, Ma S, Che Z (2011) An effective algorithm for mining sequential generators. Proc Eng 15:3653–3657
Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1):31–60
Zhang W (2016) Learning from access logs to mitigate insider threats. Doctoral dissertation, Vanderbilt University
Zhao Y, Wang G, Li Y, Wang Z (2011) Finding novel diagnostic gene patterns based on interesting non-redundant contrast sequence rules. In: IEEE 11th international conference on data mining. IEEE, pp 972–981
Zhao Y, Li Y, Yin Y, Sheng G (2015) Finding top-k covering irreducible contrast sequence rules for disease diagnosis. Comput Math Methods Med 2015:353146. doi:10.1155/2015/353146
Zhao Y, Wang G, Yin Y, Li Y, Wang Z (2016) Improving ELM-based microarray data classification by diversified sequence features selection. Neural Comput Appl 27(1):155–166
Acknowledgements
This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.05-2015.07.
Appendices
1.1 Appendix 1
To prove Proposition 1 and Theorem 1, we first consider the relation between the measures remSize and length of suffixes stated in Lemma 1.
Lemma 1
(Relation between remSize and length of suffixes).
If \(\alpha \sqsubset \beta \sqsubseteq \varPsi \), \(\textit{suf} = \textit{suf}(\varPsi , \alpha )\) and \(\textit{suf}' = \textit{suf}(\varPsi , \beta )\), then
a. \(\textit{suf} \sqsupseteq \textit{suf}'\).
b. \(\textit{suf} = \textit{suf}' \Leftrightarrow \textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\)
$$\begin{aligned} \Leftrightarrow \textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\text { and } \textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\beta ). \end{aligned}$$
Note that if \(\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\), then, in the case of 1\(-{\mathcal{SDB}}\)s, the last condition in Lemma 1.b, \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\beta )\), is always satisfied. For \(n-{\mathcal{SDB}}\)s, however, this condition can be false, so it must be checked explicitly.
Proof
Let \(\delta = \textit{prefix}(\varPsi , \alpha )\) and \(\delta ' = \textit{prefix}(\varPsi ,\beta )\).
a. It is clear that \(\delta \sqsubseteq \delta '\). Thus, \(\textit{suf} \sqsupseteq \textit{suf}'\).
b. The first equivalence is obviously true. Now, we prove the second equivalence.
\(\bullet \) “\(\Rightarrow \)”: If \(\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\), then \(\textit{length}(\delta ) = \textit{length}(\delta ')\). Hence, \(\delta = \delta '\), and thus \(\textit{lastItemOf}(\delta ) = \textit{lastItemOf}(\delta ')\) and \(\textit{size}(\delta )= \textit{size}(\delta ')\). Since \(\textit{remSize}(\textit{suf}) = \textit{size}(\varPsi ) - \textit{size}(\delta )\), \(\textit{remSize}(\textit{suf}') = \textit{size}(\varPsi ) - \textit{size}(\delta ')\), \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\delta )\) and \(\textit{lastItemOf}(\beta ) = \textit{lastItemOf}(\delta ')\), we obtain \(\textit{remSize}(\textit{suf})=\textit{remSize}(\textit{suf}')\) and \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\).
\(\bullet \) “\(\Leftarrow \)”: If \(\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')\), then \(\textit{size}(\delta ) = \textit{size}(\delta ') = k\) and the first \(k-1\) events of \(\delta \) and \(\delta '\) are identical, so \(\textit{lastEventOf}(\delta ) \subseteq \textit{lastEventOf}(\delta ')\). Moreover, by hypothesis, \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\). Thus, \(\textit{lastItemOf}(\delta ) = \textit{lastItemOf}(\delta ')\) and \(\textit{lastEventOf}(\delta )=\textit{lastEventOf}(\delta ')\). Hence, \(\delta = \delta '\), \(\textit{suf} = \textit{suf}'\) and \(\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')\). \(\square \)
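To make Lemma 1 concrete, the following Python sketch splits a toy sequence at the smallest prefix containing a pattern and compares the resulting suffixes. It is only a minimal sketch under assumptions that this appendix does not state: events are modeled as alphabetically sorted tuples of items, the names `split_at` and `length` and the toy data are hypothetical, the smallest prefix is located by greedy leftmost matching, and a partial first event of a suffix is stored without any special marker.

```python
from typing import List, Tuple

Event = Tuple[str, ...]   # items of one event, sorted alphabetically (assumption)
Seq = List[Event]


def length(seq: Seq) -> int:
    """Number of items in a sequence."""
    return sum(len(ev) for ev in seq)


def split_at(psi: Seq, alpha: Seq) -> Tuple[Seq, Seq, int]:
    """Return (prefix, suffix, remSize) for the smallest prefix of psi containing alpha.

    alpha must be non-empty and contained in psi.
    """
    pos = -1
    for ev in alpha:
        pos += 1
        # Greedy leftmost match: earliest later event of psi that contains ev.
        while pos < len(psi) and not set(ev) <= set(psi[pos]):
            pos += 1
        if pos == len(psi):
            raise ValueError("alpha is not a subsequence of psi")
    last_item = alpha[-1][-1]
    head = tuple(x for x in psi[pos] if x <= last_item)  # kept in the prefix
    tail = tuple(x for x in psi[pos] if x > last_item)   # starts the suffix
    prefix = psi[:pos] + [head]
    suffix = ([tail] if tail else []) + psi[pos + 1:]
    rem_size = len(psi) - len(prefix)                    # size(psi) - size(prefix)
    return prefix, suffix, rem_size


psi = [("a",), ("b", "c"), ("d",)]
_, suf_b, rem_b = split_at(psi, [("b",)])              # alpha = <(b)>
_, suf_bc, rem_bc = split_at(psi, [("b", "c")])        # beta  = <(b c)>
_, suf_ab, rem_ab = split_at(psi, [("a",), ("b",)])    # beta' = <(a)(b)>
```

In this toy case, the patterns <(b)> and <(b c)> yield the same remSize but different suffixes because their last items differ, which is the \(n-{\mathcal{SDB}}\) situation described in the note after Lemma 1. The patterns <(b)> and <(a)(b)> share both remSize and last item, and their suffixes coincide.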
1.2 Appendix 2
To prove Theorem 1, we need to demonstrate the anti-monotonicity of the operators: \(\rho \), support, PDB, SI and SE as described in Proposition 1.
Proposition 1
(Properties of SE and SI in PDBs). For two arbitrary sequences \(\alpha \) and \(\beta \) such that \(\alpha \sqsubseteq \beta \), the following assertions hold.
a.
  (i) \(\rho (\alpha ) \supseteq \rho (\beta )\) and \(\textit{support}(\alpha )\geqslant \textit{support}(\beta )\) (anti-monotonicity of the \(\rho \) and support operators).
  (ii) \(\rho (\alpha )= \rho (\beta ) \Leftrightarrow \textit{support}(\alpha )=\textit{support}(\beta )\).
  (iii) \(\textit{support}(\alpha ) =|{\mathcal {D}}_{\alpha }|\).
b. (Anti-monotonicity of PDB, SI, SE)
  (i) \({\mathcal {D}}_{\alpha } \sqsupseteq {\mathcal {D}}_{\beta }\).
  (ii) \(\textit{SI}({\mathcal {D}}_{\alpha }) \geqslant \textit{SI}({\mathcal {D}}_{\beta })\) and \(\textit{SE}({\mathcal {D}}_{\alpha }) \geqslant \textit{SE}({\mathcal {D}}_{\beta })\).
Proof
a. These assertions are obviously true by the definitions of the \(\rho \) and support operators.
b.
(i) \(\varPsi \in \rho (\alpha )\Leftrightarrow (\varPsi \in {\mathcal {D}} \text { and } \alpha \sqsubseteq \varPsi )\)
\(\Leftrightarrow (\varPsi \in {\mathcal {D}} \text { and } \varPsi = \delta \diamondsuit \textit{suf},\ \delta = \textit{prefix}(\varPsi , \alpha ),\ \textit{suf} = \textit{suf}(\varPsi , \alpha ))\)
\(\Leftrightarrow (\textit{suf} = \textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }, \text { with } \varPsi \in {\mathcal {D}},\ \varPsi = \delta \diamondsuit \textit{suf} \text { and } \delta = \textit{prefix}(\varPsi ,\alpha ))\).
Hence, \(\textit{support}(\alpha ) =|\rho (\alpha )| =|{\mathcal {D}}_{\alpha }|\).
(ii) For every \(\varPsi \in \rho (\beta )\) with \(\textit{suf}' = \textit{suf}(\varPsi , \beta )\in {\mathcal {D}}_{\beta }\), we have \((\varPsi = \delta '\diamondsuit \textit{suf}')\wedge (\delta ' = \textit{prefix}(\varPsi , \beta ))\)
\(\Leftrightarrow (\varPsi = \delta '\diamondsuit \textit{suf}') \wedge (\beta \sqsubseteq \delta ')\wedge (\not \exists \delta '': \varPsi = \delta ''\diamondsuit \gamma ' \wedge \beta \sqsubseteq \delta ''\sqsubset \delta ')\)
\(\Rightarrow (\varPsi \in \rho (\alpha ))\wedge (\varPsi = \delta ' \diamondsuit \textit{suf}')\wedge (\alpha \sqsubseteq \delta ')\), because \(\alpha \sqsubseteq \beta \sqsubseteq \delta ' \sqsubseteq \varPsi \).
There are two possible cases:
- If \(\not \exists \delta ''\): \(\varPsi = \delta ''\diamondsuit \gamma ' \wedge \alpha \sqsubseteq \delta '' \sqsubset \delta '\), then \(\delta ' =\textit{prefix}(\varPsi , \alpha )\), and thus \(\textit{suf}' =\textit{suf}(\varPsi , \alpha )\in {\mathcal {D}}_{\alpha }\).
- Otherwise, \(\exists \delta '': \varPsi = \delta '' \diamondsuit \gamma ' \wedge \alpha \sqsubseteq \delta ''\sqsubset \delta '\), so \(\delta ''\) is a prefix of \(\varPsi \) containing \(\alpha \). Let \(\gamma =\textit{prefix}(\varPsi ,\alpha )\) be the smallest prefix of \(\varPsi \) containing \(\alpha \), i.e., \(\exists \textit{suf} =\textit{suf}(\varPsi , \alpha ): (\varPsi = \gamma \diamondsuit \textit{suf}) \wedge (\alpha \sqsubseteq \gamma \sqsubseteq \delta ''\sqsubset \delta ')\). Therefore, \(\textit{suf} \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}'\sqsubset \textit{suf}\).
In both cases, \(\varPsi \in \rho (\alpha )\), \(\textit{suf} = \textit{suf}(\varPsi ,\alpha ) \in {\mathcal {D}}_{\alpha }\) exists and \(\textit{suf}'\sqsubseteq \textit{suf}\). Hence, \({\mathcal {D}}_{\beta } \sqsubseteq {\mathcal {D}}_{\alpha }\).
(iii) By a. (i) and b. (ii), for every \(\varPsi \in \rho (\beta ) \subseteq \rho (\alpha )\), the suffixes \(\textit{suf}' =\textit{suf}(\varPsi , \beta ) \in {\mathcal {D}}_{\beta }\) and \(\textit{suf} =\textit{suf}(\varPsi ,\alpha ) \in {\mathcal {D}}_{\alpha }\) satisfy \(\textit{suf}' \sqsubseteq \textit{suf}\). It follows that \(\textit{length}(\textit{suf}')\leqslant \textit{length}(\textit{suf})\) and \(\textit{remSize}(\textit{suf}')\leqslant \textit{remSize}(\textit{suf})\). Thus, \(\textit{SI}({\mathcal {D}}_{\beta }) \leqslant \textit{SI}({\mathcal {D}}_{\alpha })\) and \(\textit{SE}({\mathcal {D}}_{\beta })\leqslant \textit{SE}({\mathcal {D}}_{\alpha })\).
\(\square \)
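The anti-monotonicity stated in Proposition 1 can be checked numerically on a toy database. The sketch below is hedged in several ways. SI and SE are not defined in this appendix, so it assumes that SI sums \(\textit{length}(\textit{suf})+1\) and SE sums \(\textit{remSize}(\textit{suf})+1\) over the projected database, an assumption suggested only by the "+1 > 0" arguments in Appendix 3. The function names, the tuple-based encoding of events and the data are hypothetical, and the code is not the authors' implementation.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, ...]   # items of one event, sorted alphabetically (assumption)
Seq = List[Event]


def split_at(psi: Seq, alpha: Seq) -> Optional[Tuple[Seq, int]]:
    """Return (suffix, remSize) w.r.t. the smallest prefix of psi containing
    alpha, or None when alpha is not contained in psi."""
    pos = -1
    for ev in alpha:
        pos += 1
        while pos < len(psi) and not set(ev) <= set(psi[pos]):
            pos += 1
        if pos == len(psi):
            return None
    last_item = alpha[-1][-1]
    tail = tuple(x for x in psi[pos] if x > last_item)
    suffix = ([tail] if tail else []) + psi[pos + 1:]
    return suffix, len(psi) - (pos + 1)


def project(db: List[Seq], alpha: Seq) -> List[Tuple[Seq, int]]:
    """Projected database of alpha: one (suffix, remSize) pair per supporting sequence."""
    result = []
    for psi in db:
        split = split_at(psi, alpha)
        if split is not None:
            result.append(split)
    return result


def support(db: List[Seq], alpha: Seq) -> int:
    return len(project(db, alpha))


def si(db: List[Seq], alpha: Seq) -> int:
    # Assumed definition: sum of (length(suf) + 1) over the projected database.
    return sum(sum(len(ev) for ev in suf) + 1 for suf, _ in project(db, alpha))


def se(db: List[Seq], alpha: Seq) -> int:
    # Assumed definition: sum of (remSize(suf) + 1) over the projected database.
    return sum(rem + 1 for _, rem in project(db, alpha))


db = [
    [("a",), ("b", "c"), ("d",)],
    [("b",), ("c",), ("d",)],
    [("a", "b"), ("d",)],
]
alpha = [("b",)]        # alpha is a subsequence of beta
beta = [("b", "c")]

for name, measure in (("support", support), ("SI", si), ("SE", se)):
    print(name, measure(db, alpha), ">=", measure(db, beta))
```

Under these assumptions, each measure of the shorter pattern is at least that of its super-pattern, which is the behavior that Proposition 1 asserts in general.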
1.3 Appendix 3: Proof of Theorem 1
a. \({\mathcal {D}}_{\alpha }= {\mathcal {D}}_{\beta }\)
\(\Leftrightarrow (\rho (\alpha )= \rho (\beta ))\wedge (\forall \varPsi \in \rho (\alpha ),\ \textit{suf}=\textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha },\ \textit{suf}' =\textit{suf}(\varPsi , \beta ) \in {\mathcal {D}}_{\beta },\ \textit{suf} = \textit{suf}')\)
\(\Rightarrow \textit{SI}({\mathcal {D}}_{\alpha })=\textit{SI}({\mathcal {D}}_{\beta })\Rightarrow \textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}({\mathcal {D}}_{\beta })\).
Conversely, assume that \(\textit{SI}({\mathcal {D}}_{\alpha })= \textit{SI}({\mathcal {D}}_{\beta })\). By a. (i) and b. (ii)–(iii) of Proposition 1 and because \(\forall \varPsi \in \rho (\alpha )\), \(\textit{length}(\textit{suf}(\varPsi , \alpha ))+1 >0\), we have \(\rho (\alpha )= \rho (\beta )\) and, for every \(\varPsi \in \rho (\alpha )\) with \(\textit{suf} = \textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }\) and \(\textit{suf}' =\textit{suf}(\varPsi , \beta ) \in {\mathcal {D}}_{\beta }\), \(\textit{suf} \sqsupseteq \textit{suf}'\) and \(\textit{length}(\textit{suf}')= \textit{length}(\textit{suf})\). By Lemma 1.b, \(\textit{suf} = \textit{suf}'\), i.e., \({\mathcal {D}}_{\alpha } = {\mathcal {D}}_{\beta }\) and \(\textit{lastItemOf}(\alpha )= \textit{lastItemOf}(\beta )\). Thus, \(\textit{SI}({\mathcal {D}}_{\alpha })= \textit{SI}({\mathcal {D}}_{\beta })\Leftrightarrow {\mathcal {D}}_{\alpha } = {\mathcal {D}}_{\beta } \Rightarrow (\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}({\mathcal {D}}_{\beta })\) and \(\textit{lastItemOf}(\alpha ) =\textit{lastItemOf}(\beta ))\).
Finally, assume that \(\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}({\mathcal {D}}_{\beta })\). By a. (i) and b. (ii)–(iii) of Proposition 1 and because \(\forall \varPsi \in \rho (\alpha )\), \(\textit{remSize}(\textit{suf}(\varPsi , \alpha )) +1 >0\), we must have \(\rho (\alpha )= \rho (\beta )\). Thus \(\textit{support}(\alpha )=\textit{support}(\beta )\), and the first assertion b.(i) is proved. Moreover, \(\forall \varPsi \in \rho (\alpha )\), \(\textit{suf}(\varPsi , \alpha ) \in {\mathcal {D}}_{\alpha }\), \(\textit{suf}(\varPsi , \beta ) \in {\mathcal {D}}_{\beta }\) and \(\textit{remSize}(\textit{suf}(\varPsi , \alpha ))=\textit{remSize}(\textit{suf}(\varPsi , \beta ))\).
Additionally, if \(\textit{lastItemOf}(\alpha ) = \textit{lastItemOf}(\beta )\), i.e., \(\alpha \) and \(\beta \) share the same last item, then \(\forall \varPsi \in \rho (\alpha )\), \(\textit{length}(\textit{suf}(\varPsi , \alpha ))= \textit{length}(\textit{suf}(\varPsi , \beta ))\), so \(\textit{suf}(\varPsi , \alpha )=\textit{suf}(\varPsi , \beta )\). Hence, \(\textit{SI}({\mathcal {D}}_{\alpha })=\textit{SI}({\mathcal {D}}_{\beta })\).
b.
(ii) For any itemset A, consider the two s-extensions of \(\alpha \) and \(\beta \) with A: \(\gamma = \alpha {\diamondsuit }_{{s}}A\) and \(\delta = \beta {\diamondsuit }_{{s}}A\). By the above arguments, if \(\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}({\mathcal {D}}_{\beta })\), then \(\rho (\alpha )= \rho (\beta )\) and \(\forall \varPsi \in \rho (\alpha )\), \(\textit{remSize}(\textit{suf}(\varPsi ,\alpha ))=\textit{remSize}(\textit{suf}(\varPsi , \beta ))\). Moreover, since \(\gamma \) and \(\delta \) share the same last event A, we obtain \(\rho (\gamma )= \rho (\delta )\) and \(\forall \varPsi \in \rho (\gamma )\), \(\textit{remSize}(\textit{suf}(\varPsi , \gamma ))= \textit{remSize}(\textit{suf}(\varPsi ,\delta ))\). Because \(\gamma \) and \(\delta \) also share the same last item (the last item of A), we have \(\textit{length}(\textit{suf}(\varPsi , \gamma ))=\textit{length}(\textit{suf}(\varPsi ,\delta ))\). Hence, \(\textit{SI}({\mathcal {D}}_{\gamma }) =\textit{SI}({\mathcal {D}}_{\delta })\) and \({\mathcal {D}}_{\gamma } = {\mathcal {D}}_{\delta }\).
(iii) In addition, assume that \(\textit{lastEventOf}(\alpha ) = \textit{lastEventOf}(\beta )= A\). For any itemset B such that \(A \prec _{alp} B\) (which means that all items of A precede all items of B according to the total order relation \(\prec _{alp}\)), consider the two i-extensions of \(\alpha \) and \(\beta \) with B: \(\gamma ' = \alpha {\diamondsuit }_{{i}}B\) and \(\delta ' = \beta {\diamondsuit }_{{i}}B\). The equality \(\textit{SE}({\mathcal {D}}_{\alpha })=\textit{SE}({\mathcal {D}}_{\beta })\) holds only if \(\rho (\alpha )= \rho (\beta )\) and \(\forall \varPsi \in \rho (\alpha )\), \(\textit{remSize}(\textit{suf}(\varPsi , \alpha ))=\textit{remSize}(\textit{suf}(\varPsi ,\beta ))\). Moreover, since \(A \prec _{alp} B\) and \(\textit{lastEventOf}(\alpha )= \textit{lastEventOf}(\beta )= A\), we have \(\textit{lastEventOf}(\gamma ') = \textit{lastEventOf}(\delta ') = A \cup B\), i.e., \(\gamma '\) and \(\delta '\) share the same last itemset \(A \cup B\) and also the same last item of \(A\cup B\). Therefore, \(\rho (\gamma ') = \rho (\delta ')\) and \(\forall \varPsi \in \rho (\gamma ')\), \(\textit{remSize}(\textit{suf}(\varPsi , \gamma '))= \textit{remSize}(\textit{suf}(\varPsi , \delta '))\) and \(\textit{length}(\textit{suf}(\varPsi , \gamma '))=\textit{length}(\textit{suf}(\varPsi ,\delta '))\). Thus, \(\textit{SI}({\mathcal {D}}_{\gamma '})=\textit{SI}({\mathcal {D}}_{\delta '})\) and \({\mathcal {D}}_{\gamma '}= {\mathcal {D}}_{\delta '}\).
\(\square \)
1.4 Appendix 4: Proof of Corollary 1
Note that for the two cases, we always have \({\mathcal {D}}_{\gamma } = {\mathcal {D}}_{\delta }\) and \(\textit{lastEventOf}(\gamma ) = \textit{lastEventOf}(\delta )\). Hence, all i-extensions and s-extensions \(\gamma '\) of \(\gamma \) and \(\delta '\) of \(\delta \) with the same itemset also have the same last event, and thus they have the same PDB, \({\mathcal {D}}_{\gamma '} = {\mathcal {D}}_{\delta '}\). The same situation then also occurs for their next descendants. \(\square \)
1.5 Appendix 5: Proof of Corollaries 2 and 3
Corollaries 2–3 a. (i)–(iii). These assertions are true because of (1.1), \(\textit{path}(q)\sqsubset \textit{i}\_\textit{new}\), \(\textit{path}(u)\sqsubset \textit{i}\_\textit{new}\), \(\textit{path}(r)\sqsubset \textit{i}\_\textit{new}\), \(\textit{lastEventOf}(\textit{path}(r))= \textit{lastEventOf}(\textit{i}\_\textit{new}) =\{ q,u\}\) and (2.1).
Corollaries 2–3 b. These assertions are also true because of \(\textit{path}(v)\sqsubset \textit{s}\_\textit{new}\), (1.1), the fact that the last events of \(\textit{s}\_\textit{new}\) and \(\textit{path}(v)\) are identical to v, and hence (2.1). \(\square \)
Le, B., Duong, H., Truong, T. et al. FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy. Knowl Inf Syst 53, 71–107 (2017). https://doi.org/10.1007/s10115-017-1032-6
DOI: https://doi.org/10.1007/s10115-017-1032-6