FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy

Le, Bac; Duong, Hai; Truong, Tin; Fournier-Viger, Philippe

doi:10.1007/s10115-017-1032-6

Regular Paper
Published: 17 February 2017

Volume 53, pages 71–107, (2017)
Knowledge and Information Systems

Bac Le¹,
Hai Duong²,
Tin Truong² &
…
Philippe Fournier-Viger³

Abstract

Mining frequent sequences in sequential databases is highly valuable for many real-life applications. However, in several cases, especially when databases are huge and when low minimum support thresholds are used, the cardinality of the result set can be enormous. Consequently, algorithms for discovering frequent sequences exhibit poor performance, showing a significant increase in execution time, memory consumption and storage space usage. To address this issue, researchers have studied the tasks of mining frequent closed and generator sequences, as they provide several benefits when compared to the set of frequent sequences. One of the most important benefits is that the cardinalities of frequent closed and generator sequences are generally much smaller than the cardinality of frequent sequences. Hence, humans find it more convenient to analyze the information provided by closed and generator sequences. Moreover, it was shown that frequent closed sequences have the advantage of being lossless, and they thus preserve information about the frequency of all frequent subsequences, while generator sequences can provide higher accuracy for sequence classification tasks since they are the smallest patterns that characterize groups of sequences. Besides, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and recover the complete set of frequent sequences and their frequencies. This paper proposes two novel algorithms named FCloSM and FGenSM to mine frequent closed and generator sequences efficiently. These algorithms are based on new pruning conditions called extended early elimination (3E) and early pruning techniques named EPCLO and EPGEN, designed to identify non-closed and non-generator patterns early.
Based on these techniques, two local pruning strategies called LPCLO and LPGEN are proposed to eliminate non-closed and non-generator patterns more efficiently at two successive levels of the prefix search tree without performing subsequence relation checking. These theoretical results, which are the basis of FCloSM and FGenSM, are mathematically proved and are shown to be more general than those presented in previous work. Extensive experiments show that FCloSM and FGenSM are one to two orders of magnitude faster than the state-of-the-art algorithms for discovering frequent closed sequences (CloSpan, BIDE, ClaSP and CM-ClaSP) and for mining frequent generators (FEAT, FSGP and VGEN), and that FCloSM and FGenSM consume much less memory.
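
To make the two notions in the abstract concrete, the following minimal sketch enumerates frequent, closed and generator sequences by brute force on a toy database. It is not FCloSM or FGenSM and uses none of their pruning strategies. It assumes a database in which every event holds a single item (sequences are written as strings), and it assumes that the empty sequence is not counted as a pattern. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def is_subseq(a: str, b: str) -> bool:
    """True if a is a (not necessarily contiguous) subsequence of b."""
    it = iter(b)
    return all(ch in it for ch in a)


def frequent_patterns(db: List[str], minsup: int) -> Dict[str, int]:
    """Brute force: every non-empty subsequence of a database sequence is a candidate."""
    candidates = {
        "".join(c)
        for s in db
        for k in range(1, len(s) + 1)
        for c in combinations(s, k)
    }
    sup = {p: sum(is_subseq(p, s) for s in db) for p in candidates}
    return {p: n for p, n in sup.items() if n >= minsup}


def closed_and_generators(freq: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Closed: no proper super-sequence with equal support.
    Generator: no proper (non-empty) sub-sequence with equal support."""
    closed, gens = set(), set()
    for p, n in freq.items():
        if not any(q != p and freq[q] == n and is_subseq(p, q) for q in freq):
            closed.add(p)
        if not any(q != p and freq[q] == n and is_subseq(q, p) for q in freq):
            gens.add(p)
    return closed, gens


toy_db = ["abc", "abc", "ab", "ac"]
freq = frequent_patterns(toy_db, minsup=2)
closed, gens = closed_and_generators(freq)
print(len(freq), sorted(closed), sorted(gens))
```

On this toy database the seven frequent sequences reduce to four closed sequences and four generators, which illustrates why the two concise representations are easier to inspect than the full result set.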

Acknowledgements

This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.05-2015.07.

Author information

Authors and Affiliations

HCMC University of Natural Sciences, Ho Chi Minh, Vietnam
Bac Le
Department of Mathematics and Computer Science, University of Dalat, Dalat, Vietnam
Hai Duong & Tin Truong
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen, Shenzhen, China
Philippe Fournier-Viger

Corresponding author

Correspondence to Bac Le.

Appendices

1.1 Appendix 1

To prove Proposition 1 and Theorem 1, we first consider the relation between the measures remSize and length of suffixes stated in Lemma 1.

Lemma 1

(Relation between remSize and length of suffixes).

If $\alpha \sqsubset \beta \sqsubseteq \varPsi$, $\textit{suf} = \textit{suf}(\varPsi, \alpha)$ and $\textit{suf}' = \textit{suf}(\varPsi, \beta)$, then

a. $\textit{suf} \sqsupseteq \textit{suf}'$.
b. $\textit{suf} = \textit{suf}' \Leftrightarrow \textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')$
$$\begin{aligned} \Leftrightarrow \textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}') \text{ and } \textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta). \end{aligned}$$

Note that if $\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')$, then, in the case of $1$-$\mathcal{SDB}$s, the last condition in Lemma 1.b, $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta)$, is always satisfied. For $n$-$\mathcal{SDB}$s, however, this condition can be false, so it must be checked explicitly.

Proof

Let $\delta = \textit{prefix}(\varPsi, \alpha)$ and $\delta' = \textit{prefix}(\varPsi, \beta)$.

a. It is clear that $\delta \sqsubseteq \delta'$. Thus, $\textit{suf} \sqsupseteq \textit{suf}'$.
b. The first equivalence is obviously true. Now, we prove the second equivalence.

$\bullet$ “$\Rightarrow$”: If $\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')$, then $\textit{length}(\delta) = \textit{length}(\delta')$. Hence, $\delta = \delta'$, and thus $\textit{lastItemOf}(\delta) = \textit{lastItemOf}(\delta')$ and $\textit{size}(\delta) = \textit{size}(\delta')$. Since $\textit{remSize}(\textit{suf}) = \textit{size}(\varPsi) - \textit{size}(\delta)$, $\textit{remSize}(\textit{suf}') = \textit{size}(\varPsi) - \textit{size}(\delta')$, $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\delta)$ and $\textit{lastItemOf}(\beta) = \textit{lastItemOf}(\delta')$, we obtain $\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')$ and $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta)$.

$\bullet$ “$\Leftarrow$”: If $\textit{remSize}(\textit{suf}) = \textit{remSize}(\textit{suf}')$, then $\textit{size}(\delta) = \textit{size}(\delta') = k$ and the first $(k-1)$ events of $\delta$ and $\delta'$ are identical, so $\textit{lastEventOf}(\delta) \subseteq \textit{lastEventOf}(\delta')$. Moreover, by hypothesis, $\alpha$ and $\beta$ share the same last item, i.e., $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta)$. Thus, $\textit{lastItemOf}(\delta) = \textit{lastItemOf}(\delta')$ and $\textit{lastEventOf}(\delta) = \textit{lastEventOf}(\delta')$. Hence, $\delta = \delta'$, $\textit{suf} = \textit{suf}'$ and $\textit{length}(\textit{suf}) = \textit{length}(\textit{suf}')$. $\square$
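
The role of the last-item condition in Lemma 1.b can be checked on a small example. The sketch below is only an illustration under stated assumptions: a sequence is a list of events, each event is a tuple of items sorted alphabetically (standing in for the total order on items), prefix$(\varPsi, \alpha)$ is taken to be the smallest prefix of $\varPsi$ containing $\alpha$ and is found by leftmost matching, and the matched event is cut right after the last item of $\alpha$. The helper names `prefix_and_suffix`, `length` and `rem_size` are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, ...]   # items of one event, sorted alphabetically (assumed order)
Seq = List[Event]


def prefix_and_suffix(psi: Seq, alpha: Seq) -> Optional[Tuple[Seq, Seq]]:
    """Return (delta, suf): the smallest prefix of psi containing alpha and
    the remaining suffix, or None if alpha is not contained in psi.
    alpha is assumed to be non-empty."""
    j, last_match = 0, -1
    for a in alpha:
        # Leftmost event of psi, at or after position j, that contains event a.
        while j < len(psi) and not set(a) <= set(psi[j]):
            j += 1
        if j == len(psi):
            return None
        last_match = j
        j += 1
    ev = psi[last_match]
    cut = ev.index(alpha[-1][-1]) + 1      # cut right after the last item of alpha
    delta = psi[:last_match] + [ev[:cut]]
    rest = ev[cut:]                        # items left over in the cut event
    suf = ([rest] if rest else []) + psi[last_match + 1:]
    return delta, suf


def length(seq: Seq) -> int:
    """Number of items in a sequence."""
    return sum(len(e) for e in seq)


def rem_size(psi: Seq, delta: Seq) -> int:
    """remSize(suf) = size(psi) - size(delta), where size counts events."""
    return len(psi) - len(delta)


psi = [("a",), ("b", "c"), ("d",)]
alpha = [("a",), ("b",)]          # last item b
beta = [("a",), ("b", "c")]       # last item c, and alpha is contained in beta

delta, suf = prefix_and_suffix(psi, alpha)
delta2, suf2 = prefix_and_suffix(psi, beta)
print(rem_size(psi, delta), rem_size(psi, delta2))   # equal remSize
print(length(suf), length(suf2))                     # different lengths
```

Here both suffixes have the same remSize, yet they differ because the last items of $\alpha$ and $\beta$ differ. This is the situation the note after Lemma 1 warns about for databases whose events contain several items.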

1.2 Appendix 2

To prove Theorem 1, we need to demonstrate the anti-monotonicity of the operators: $\rho $, support, PDB, SI and SE as described in Proposition 1.

Proposition 1

(Properties of SE and SI in PDBs). For two arbitrary sequences $\alpha$ and $\beta$ such that $\alpha \sqsubseteq \beta$, the following assertions hold.

a.
  (i). $\rho(\alpha) \supseteq \rho(\beta)$, $\textit{support}(\alpha) \geqslant \textit{support}(\beta)$. (Anti-monotonicity of the $\rho$ and support operators)
  (ii). $\rho(\alpha) = \rho(\beta) \Leftrightarrow \textit{support}(\alpha) = \textit{support}(\beta)$.
  (iii). $\textit{support}(\alpha) = |\mathcal{D}_{\alpha}|$.
b. (Anti-monotonicity of PDB, SI, SE)
  (i). $\mathcal{D}_{\alpha} \sqsupseteq \mathcal{D}_{\beta}$.
  (ii). $\textit{SI}(\mathcal{D}_{\alpha}) \geqslant \textit{SI}(\mathcal{D}_{\beta})$ and $\textit{SE}(\mathcal{D}_{\alpha}) \geqslant \textit{SE}(\mathcal{D}_{\beta})$.

Proof

a. These assertions are obviously true by the definitions of the $\rho$ and support operators.
b.
  (i). $\varPsi \in \rho(\alpha) \Leftrightarrow (\varPsi \in \mathcal{D}$ and $\alpha \sqsubseteq \varPsi)$

  $\Leftrightarrow (\varPsi \in \mathcal{D}$ and $\varPsi = \delta \diamondsuit \textit{suf}$, $\delta = \textit{prefix}(\varPsi, \alpha)$, $\textit{suf} = \textit{suf}(\varPsi, \alpha))$

  $\Leftrightarrow (\textit{suf} = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$, with $\varPsi \in \mathcal{D}$ and $\varPsi = \delta \diamondsuit \textit{suf}$, $\delta = \textit{prefix}(\varPsi, \alpha))$.

  Hence, $\textit{support}(\alpha) = |\rho(\alpha)| = |\mathcal{D}_{\alpha}|$.

  (ii). $\forall \varPsi \in \rho(\beta)$, $\textit{suf}' = \textit{suf}(\varPsi, \beta) \in \mathcal{D}_{\beta}$, we have $(\varPsi = \delta' \diamondsuit \textit{suf}') \wedge (\delta' = \textit{prefix}(\varPsi, \beta))$

  $\Leftrightarrow (\varPsi = \delta' \diamondsuit \textit{suf}') \wedge (\beta \sqsubseteq \delta') \wedge (\not\exists \delta'': \varPsi = \delta'' \diamondsuit \gamma' \wedge \beta \sqsubseteq \delta'' \sqsubset \delta')$

  $\Rightarrow (\varPsi \in \rho(\alpha)) \wedge (\varPsi = \delta' \diamondsuit \textit{suf}') \wedge (\alpha \sqsubseteq \delta')$, because $\alpha \sqsubseteq \beta \sqsubseteq \delta' \sqsubseteq \varPsi$.

  There are two possible cases:
  - If $\not\exists \delta'': \varPsi = \delta'' \diamondsuit \gamma' \wedge \alpha \sqsubseteq \delta'' \sqsubset \delta'$, then $\delta' = \textit{prefix}(\varPsi, \alpha)$, and thus $\textit{suf}' = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$.
  - Otherwise, $\exists \delta'': \varPsi = \delta'' \diamondsuit \gamma' \wedge \alpha \sqsubseteq \delta'' \sqsubset \delta'$, so $\delta''$ is a prefix of $\varPsi$ containing $\alpha$. Let $\gamma = \textit{prefix}(\varPsi, \alpha)$ be the smallest prefix of $\varPsi$ containing $\alpha$, i.e., $\exists\, \textit{suf} = \textit{suf}(\varPsi, \alpha): (\varPsi = \gamma \diamondsuit \textit{suf}) \wedge (\alpha \sqsubseteq \gamma \sqsubseteq \delta'' \sqsubset \delta')$. Therefore, $\textit{suf} \in \mathcal{D}_{\alpha}$ and $\textit{suf}' \sqsubset \textit{suf}$.

  Finally, in all cases, $\varPsi \in \rho(\alpha)$, $\exists\, \textit{suf} = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$ and $\textit{suf}' \sqsubseteq \textit{suf}$. Hence, $\mathcal{D}_{\beta} \sqsubseteq \mathcal{D}_{\alpha}$.

  (iii). By a. (i) and b. (ii), for all $\varPsi \in \rho(\beta) \subseteq \rho(\alpha)$, with $\textit{suf}' = \textit{suf}(\varPsi, \beta) \in \mathcal{D}_{\beta}$ and $\textit{suf} = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$, we have $\textit{suf}' \sqsubseteq \textit{suf}$. It follows that $\textit{length}(\textit{suf}') \leqslant \textit{length}(\textit{suf})$ and $\textit{remSize}(\textit{suf}') \leqslant \textit{remSize}(\textit{suf})$. Thus, $\textit{SI}(\mathcal{D}_{\beta}) \leqslant \textit{SI}(\mathcal{D}_{\alpha})$ and $\textit{SE}(\mathcal{D}_{\beta}) \leqslant \textit{SE}(\mathcal{D}_{\alpha})$.

$\square$
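
The anti-monotonicity stated in Proposition 1 can be observed on a toy database. The sketch below makes several assumptions that are not taken from the text above: every event holds a single item, so sequences are written as strings and length and remSize of a suffix coincide (hence SI and SE coincide as well); the projected database $\mathcal{D}_{\alpha}$ is stored as a map from sequence index to suf$(\varPsi, \alpha)$; and SI is assumed to sum length(suf) + 1 over the projected database, which is how the "+ 1 > 0" argument in Appendix 3 reads. All function names are hypothetical.

```python
from typing import Dict, List, Optional


def suffix_after(psi: str, alpha: str) -> Optional[str]:
    """suf(psi, alpha): what is left of psi after its smallest prefix
    containing alpha (leftmost matching), or None if alpha is not in psi."""
    pos = 0
    for item in alpha:
        pos = psi.find(item, pos)
        if pos < 0:
            return None
        pos += 1
    return psi[pos:]


def projected_db(db: List[str], alpha: str) -> Dict[int, str]:
    """D_alpha, keyed by the index of the sequence each suffix comes from."""
    out = {}
    for sid, psi in enumerate(db):
        suf = suffix_after(psi, alpha)
        if suf is not None:
            out[sid] = suf
    return out


def support(db: List[str], alpha: str) -> int:
    """support(alpha) = |rho(alpha)| = |D_alpha|."""
    return len(projected_db(db, alpha))


def si(db: List[str], alpha: str) -> int:
    """Assumed measure: sum of (length(suf) + 1) over D_alpha."""
    return sum(len(suf) + 1 for suf in projected_db(db, alpha).values())


db = ["abcd", "bcad", "bacd", "ad"]

# "a" is contained in "ad": equal support, but strictly larger SI.
print(support(db, "a"), support(db, "ad"), si(db, "a"), si(db, "ad"))

# "c" is contained in "bc": equal SI, and the projected databases coincide.
print(si(db, "c"), si(db, "bc"), projected_db(db, "c") == projected_db(db, "bc"))
```

The second pair shows the kind of situation the pruning conditions exploit: when the measure agrees for a pattern and one of its super-sequences, the two projected databases are identical, so exploring both would repeat the same work.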

1.3 Appendix 3: Proof of Theorem 1

a. $\mathcal{D}_{\alpha} = \mathcal{D}_{\beta}$

$\Leftrightarrow (\rho(\alpha) = \rho(\beta)) \wedge (\forall \varPsi \in \rho(\alpha), \textit{suf} = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}, \textit{suf}' = \textit{suf}(\varPsi, \beta) \in \mathcal{D}_{\beta}, \textit{suf} = \textit{suf}')$

$\Rightarrow \textit{SI}(\mathcal{D}_{\alpha}) = \textit{SI}(\mathcal{D}_{\beta}) \Rightarrow \textit{SE}(\mathcal{D}_{\alpha}) = \textit{SE}(\mathcal{D}_{\beta})$.

Conversely, assume that $\textit{SI}(\mathcal{D}_{\alpha}) = \textit{SI}(\mathcal{D}_{\beta})$. By a. (i) and b. (ii)–(iii) of Proposition 1, and because $\textit{length}(\textit{suf}(\varPsi, \alpha)) + 1 > 0$ for all $\varPsi \in \rho(\alpha)$, we have $\rho(\alpha) = \rho(\beta)$ and, for all $\varPsi \in \rho(\alpha)$ with $\textit{suf} = \textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$ and $\textit{suf}' = \textit{suf}(\varPsi, \beta) \in \mathcal{D}_{\beta}$, $\textit{suf} \sqsupseteq \textit{suf}'$ and $\textit{length}(\textit{suf}') = \textit{length}(\textit{suf})$. Hence $\textit{suf} = \textit{suf}'$, i.e., $\mathcal{D}_{\alpha} = \mathcal{D}_{\beta}$ and $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta)$. Thus, $\textit{SI}(\mathcal{D}_{\alpha}) = \textit{SI}(\mathcal{D}_{\beta}) \Leftrightarrow \mathcal{D}_{\alpha} = \mathcal{D}_{\beta} \Rightarrow (\textit{SE}(\mathcal{D}_{\alpha}) = \textit{SE}(\mathcal{D}_{\beta})$ and $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta))$.

Finally, assume that $\textit{SE}(\mathcal{D}_{\alpha}) = \textit{SE}(\mathcal{D}_{\beta})$. By a. (i) and b. (ii)–(iii) of Proposition 1, and because $\textit{remSize}(\textit{suf}(\varPsi, \alpha)) + 1 > 0$ for all $\varPsi \in \rho(\alpha)$, we must have $\rho(\alpha) = \rho(\beta)$. Thus $\textit{support}(\alpha) = \textit{support}(\beta)$, and the first assertion b. (i) is proved. Moreover, $\forall \varPsi \in \rho(\alpha)$, $\textit{suf}(\varPsi, \alpha) \in \mathcal{D}_{\alpha}$, $\textit{suf}(\varPsi, \beta) \in \mathcal{D}_{\beta}$ and $\textit{remSize}(\textit{suf}(\varPsi, \alpha)) = \textit{remSize}(\textit{suf}(\varPsi, \beta))$.

Additionally, if $\textit{lastItemOf}(\alpha) = \textit{lastItemOf}(\beta)$, i.e., $\alpha$ and $\beta$ share the same last item, then $\forall \varPsi \in \rho(\alpha)$, $\textit{length}(\textit{suf}(\varPsi, \alpha)) = \textit{length}(\textit{suf}(\varPsi, \beta))$, so $\textit{suf}(\varPsi, \alpha) = \textit{suf}(\varPsi, \beta)$. Hence, $\textit{SI}(\mathcal{D}_{\alpha}) = \textit{SI}(\mathcal{D}_{\beta})$.

b.
  (ii). For any itemset $A$, consider the two s-extensions of $\alpha$ and $\beta$ with $A$: $\gamma = \alpha \diamondsuit_{s} A$, $\delta = \beta \diamondsuit_{s} A$. By the above arguments, if $\textit{SE}(\mathcal{D}_{\alpha}) = \textit{SE}(\mathcal{D}_{\beta})$, then $\rho(\alpha) = \rho(\beta)$ and $\forall \varPsi \in \rho(\alpha)$, $\textit{remSize}(\textit{suf}(\varPsi, \alpha)) = \textit{remSize}(\textit{suf}(\varPsi, \beta))$. Moreover, since $\gamma$ and $\delta$ share the same last event $A$, we obtain $\rho(\gamma) = \rho(\delta)$ and $\forall \varPsi \in \rho(\gamma)$, $\textit{remSize}(\textit{suf}(\varPsi, \gamma)) = \textit{remSize}(\textit{suf}(\varPsi, \delta))$. Since $\gamma$ and $\delta$ also share the same last item of $A$, we have that $\textit{length}(\textit{suf}(\varPsi, \gamma)) = \textit{length}(\textit{suf}(\varPsi, \delta))$. Hence, $\textit{SI}(\mathcal{D}_{\gamma}) = \textit{SI}(\mathcal{D}_{\delta})$ and $\mathcal{D}_{\gamma} = \mathcal{D}_{\delta}$.
  (iii). In addition, assume that $\textit{lastEventOf}(\alpha) = \textit{lastEventOf}(\beta) = A$. For any itemset $B$ such that $A \prec_{alp} B$ (which means that all items of $A$ precede all items of $B$ according to the total order relation $\prec_{alp}$), consider the two i-extensions of $\alpha$ and $\beta$ with $B$: $\gamma' = \alpha \diamondsuit_{i} B$, $\delta' = \beta \diamondsuit_{i} B$. As before, the equality $\textit{SE}(\mathcal{D}_{\alpha}) = \textit{SE}(\mathcal{D}_{\beta})$ holds only if $\rho(\alpha) = \rho(\beta)$ and $\forall \varPsi \in \rho(\alpha)$, $\textit{remSize}(\textit{suf}(\varPsi, \alpha)) = \textit{remSize}(\textit{suf}(\varPsi, \beta))$. Moreover, since $A \prec_{alp} B$ and $\textit{lastEventOf}(\alpha) = \textit{lastEventOf}(\beta) = A$, we have $\textit{lastEventOf}(\gamma') = \textit{lastEventOf}(\delta') = A \cup B$, i.e., $\gamma'$ and $\delta'$ share the same last itemset $A \cup B$ and also the same last item of $A \cup B$. Therefore, $\rho(\gamma') = \rho(\delta')$ and $\forall \varPsi \in \rho(\gamma')$, $\textit{remSize}(\textit{suf}(\varPsi, \gamma')) = \textit{remSize}(\textit{suf}(\varPsi, \delta'))$ and $\textit{length}(\textit{suf}(\varPsi, \gamma')) = \textit{length}(\textit{suf}(\varPsi, \delta'))$. Thus, $\textit{SI}(\mathcal{D}_{\gamma'}) = \textit{SI}(\mathcal{D}_{\delta'})$ and $\mathcal{D}_{\gamma'} = \mathcal{D}_{\delta'}$.

$\square $

1.4 Appendix 4: Proof of Corollary 1

Note that for the two cases, we always have $\mathcal{D}_{\gamma} = \mathcal{D}_{\delta}$ and $\textit{lastEventOf}(\gamma) = \textit{lastEventOf}(\delta)$. Hence, all i-extensions and s-extensions $\gamma'$ of $\gamma$ and $\delta'$ of $\delta$ with the same itemset also have the same last event, and thus they have the same PDB, $\mathcal{D}_{\gamma'} = \mathcal{D}_{\delta'}$. The same situation then also occurs for their next descendants. $\square$

1.5 Appendix 5: Proof of Corollaries 2 and 3

Corollaries 2–3 a. (i)–(iii). These assertions are true because of (1.1), $\textit{path}(q) \sqsubset \textit{i\_new}$, $\textit{path}(u) \sqsubset \textit{i\_new}$, $\textit{path}(r) \sqsubset \textit{i\_new}$, $\textit{lastEventOf}(\textit{path}(r)) = \textit{lastEventOf}(\textit{i\_new}) = \{q, u\}$ and (2.1).

Corollaries 2–3 b. These assertions are also true since $\textit{path}(v) \sqsubset \textit{s\_new}$, (1.1) holds, and the last events of $\textit{s\_new}$ and $\textit{path}(v)$ are both identical to $v$, and hence (2.1) applies. $\square$

About this article

Cite this article

Le, B., Duong, H., Truong, T. et al. FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy. Knowl Inf Syst 53, 71–107 (2017). https://doi.org/10.1007/s10115-017-1032-6

Received: 28 June 2016
Revised: 01 November 2016
Accepted: 04 February 2017
Published: 17 February 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10115-017-1032-6
