An Explicit Relationship Between Sequential Patterns and Their Concise Representations

  • Conference paper
Big Data Analytics (BDA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11932)

Abstract

Mining sequential patterns in a sequence database (SDB) is an important and useful data mining task. Most existing algorithms for performing this task directly mine the set \( {\mathcal{F}\mathcal{S}} \) of all frequent sequences in an SDB. However, these algorithms often exhibit poor performance on large SDBs due to the enormous search space and cardinality of \( {\mathcal{F}\mathcal{S}} \). In addition, constraint-based mining algorithms relying on this approach must read an SDB again when a constraint is changed by the user. To address this issue, this paper proposes a novel approach for generating \( {\mathcal{F}\mathcal{S}} \) from the two sets of frequent closed sequences \( \left( {{\mathcal{F}\mathcal{C}\mathcal{S}}} \right) \) and frequent generator sequences \( ({\mathcal{F}\mathcal{G}\mathcal{S}}) \), which are concise representations of \( {\mathcal{F}\mathcal{S}} \). The proposed approach is based on a novel explicit relationship between \( {\mathcal{F}\mathcal{S}} \) and these two sets. This relationship is the theoretical basis for a novel efficient algorithm named GFS-CR that directly enumerates \( {\mathcal{F}\mathcal{S}} \) from \( {\mathcal{F}\mathcal{C}\mathcal{S}} \) and \( {\mathcal{F}\mathcal{G}\mathcal{S}} \) rather than mining them from an SDB. Experimental results show that GFS-CR outperforms state-of-the-art algorithms in terms of runtime and scalability.

Notes

  1. iff means “if and only if”.

  2. \( \sum\nolimits_{1 \leqslant i \leqslant n} {A_{i} } \) denotes the union of disjoint sets \( A_{1} ,A_{2} , \ldots , A_{n} \), i.e. \( A_{i} \cap A_{j} = \emptyset \), \( \forall i \ne j \), \( 1 \leqslant i, j \leqslant n \).

Acknowledgment

This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2017.300.

Author information

Corresponding author

Correspondence to Tin Truong.

Appendices

1.1 Appendix 1: Proof of Proposition 1

  (i)

    “\( {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) \subseteq {\mathcal{F}\mathcal{S}} '\left( {\sigma ,\gamma } \right) \)”: Consider any \( \alpha \in {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right):\gamma \;{ \sqsubseteq }\;\alpha \;{ \sqsubseteq }\;\sigma \). Without loss of generality, we can assume that \( \gamma = E_{1} \to E_{2} \to \ldots \to E_{p} \), \( \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} \), \( \sigma = E_{1}^{'} \to E_{2}^{'} \to \ldots \to E_{q}^{'} \), \( F_{i} \subseteq E_{i}^{'} \), ∀i = 1, …, q and \( \exists lp = \{ j_{1} ,j_{2} , \ldots ,j_{p} \} \in LP(\alpha ,\gamma ) \subseteq LP(\sigma ,\gamma ) \), with \( 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q \): \( E_{k} \subseteq F_{j_{k}} \subseteq E_{j_{k}}^{'} \), ∀k = 1, …, p. Set \( d_{i} \mathop = \limits^{\text{def}} \begin{cases} F_{i} , & \text{if } i \notin lp \\ F_{i} \backslash E_{k} , & \text{if } i = j_{k} \in lp \end{cases} \), \( D_{i} \mathop = \limits^{\text{def}} \begin{cases} E_{i}^{'} , & \text{if } i \notin lp \\ E_{i}^{'} \backslash E_{k} , & \text{if } i = j_{k} \in lp \end{cases} \), δ(lp) \( \mathop = \limits^{\text{def}} d_{1} \to d_{2} \to \ldots \to d_{q} \), \( E_{i}^{{ \sim }} \mathop = \limits^{\text{def}} \begin{cases} \emptyset , & \text{if } i \notin lp \\ E_{k} , & \text{if } i = j_{k} \in lp \end{cases} \), ∀i = 1, …, q, Ex(γ, lp) \( \mathop = \limits^{\text{def}} E_{1}^{{ \sim }} \to E_{2}^{{ \sim }} \to \ldots \to E_{q}^{{ \sim }} \). Then \( d_{i} \subseteq D_{i} \) and \( F_{i} = E_{i}^{{ \sim }} + d_{i} \), ∀i = 1, …, q, so \( \alpha \, = \,Ex(\gamma, lp) \oplus \varDelta \left( {lp} \right)\, \in \,{\mathcal{F}\mathcal{S}} '\left( {\sigma ,\gamma } \right) \).

  (ii)

    “\( {\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) \supseteq {\mathcal{F}\mathcal{S}} '\left( {\sigma ,\gamma } \right) \)”: \( \forall \alpha \, \in \,{\mathcal{F}\mathcal{S}} '\left( {\sigma ,\gamma } \right) \), \( \alpha = Ex(\gamma , \, lp) \, \oplus \, \varDelta \left( {lp} \right) = F_{1} \to F_{2} \to \ldots \to F_{q} \), then \( F_{i} = E_{i}^{{ \sim }} + d_{i} \), \( d_{i} \subseteq D_{i} \), \( E_{i}^{{ \sim }} \subseteq F_{i} \subseteq E_{i}^{\sim} + D_{i} \subseteq E_{i}^{'} \), ∀i = 1, …, q. Thus, \( \gamma \;{ \sqsubseteq }\;\alpha \;{ \sqsubseteq }\;\sigma \), \( \alpha \, \in \,{\mathcal{F}\mathcal{S}}\left( {\sigma , \gamma } \right) \). □

1.2 Appendix 2: Proof of Theorem 1

  a.

    First, we prove that \( \forall \alpha \, \in \,{\mathcal{F}\mathcal{S}} \), \( \exists \sigma \in CloSet(\alpha ) \) such that \( ro\mathop = \limits^{\text{def}} \rho (\alpha ) = \rho (\sigma ) \), so \( \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}} \) and \( ro \in RO \), then FS(ro) =[σ]. Indeed, \( \beta \, \in \,FS\left( {ro} \right) \Leftrightarrow [\beta \, \in \,{\mathcal{F}\mathcal{S}}\, \wedge \,\rho (\beta )\, = \,ro] \Leftrightarrow [\beta \, \in \,{\mathcal{F}\mathcal{S}}\, \wedge \,\rho (\beta )\, = \,\rho (\sigma )\, = \,ro] \Leftrightarrow \beta \in [\sigma ] \).

  (i)

    “\( {\mathcal{F}\mathcal{S}}\, \subseteq \,\mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right) \)”: \( \forall \alpha \, \in \,{\mathcal{F}\mathcal{S}} \), \( \exists \sigma \in CloSet( \alpha )\;{\text{and}}\;ro\mathop = \limits^{\text{def}} \rho (\sigma )\, = \,\rho ( \alpha ) \). Then \( \alpha \in [\sigma ] \) and \( supp(\sigma ) = supp( \alpha ) \geqslant ms \). Thus, \( \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}} \), \( ro \in RO \) and \( \alpha \in FS(ro) \).

  (ii)

    “\( \mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right)\, \subseteq \,{\mathcal{F}\mathcal{S}} \)”: \( \forall ro\, \in \,RO \) and \( \alpha \in FS(ro) \), then \( \exists \sigma \, \in \,{\mathcal{F}\mathcal{C}\mathcal{S}}:\,\alpha { \sqsubseteq }\sigma \) and \( ro = \rho (\sigma ) = \rho (\alpha ) \), so \( supp(\alpha ) = supp(\sigma ) \geqslant ms \) and \( \alpha \, \in \,{\mathcal{F}\mathcal{S}} \). Hence, \( {\mathcal{F}\mathcal{S}}\, = \,\mathop {\bigcup }\nolimits_{ro\, \in \,RO} FS\left( {ro} \right) \).

Since \( { \sim } \) is an equivalence relation, different equivalence classes [σ] or FS(ro) are disjoint. Thus, the union is a disjoint one: \( {\mathcal{F}\mathcal{S}}\, = \,\mathop \sum \nolimits_{ro\, \in \,RO} FS\left( {ro} \right) \).

  b.

    First, we prove that all different subsets \( {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) \), \( \forall \sigma_{i} \in CS\left( {ro} \right) \), \( \forall \gamma_{j} \in GenSet(\sigma_{i} ) \), are disjoint. Indeed, assume conversely that \( \exists \sigma_{m} \), \( \sigma_{i} \in CS\left( {ro} \right) \), \( \exists \gamma_{k} \in GenSet(\sigma_{m} ) \), \( \exists \gamma_{j} \in GenSet(\sigma_{i} ) \): (\( \sigma_{m} \ne \sigma_{i} \) or \( \gamma_{k} \ne \gamma_{j} \)) and \( \exists \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{m} ,\,\gamma_{k} )\, \cap \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} )( \ne \emptyset ) \). Consider the following two cases.

  (i)

    If \( \sigma_{m} \ne \sigma_{i} \), then without loss of generality, we can assume that m < i and \( \exists \gamma \in GenSet(\alpha ) \) (because \( GenSet(\alpha ) \, \ne \emptyset ) \). We have \( \gamma \, \in \,\mathcal{G}\mathcal{S} \), \( \gamma \;{ \sqsubseteq }\;\alpha \;{ \sqsubseteq }\;\sigma_{i} \), \( \rho (\gamma ) = \rho (\alpha ) = \rho (\sigma_{i} ) \), because \( \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) \). Then \( \gamma \in GenSet(\sigma_{i} ) \), \( \alpha \in {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\gamma ) \) and \( \alpha \in {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\gamma ) \). Moreover, since \( \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{m} ,\,\gamma_{k} )\, \subseteq \,{\mathcal{F}\mathcal{S}}(\sigma_{m} ,\,\gamma_{k} ) \), then \( \alpha { \sqsubseteq }\sigma_{m} \) with m < i. This contradicts the condition not(DCondC(\( \alpha \), \( \sigma_{i} \), CS(ρ(\( \sigma_{i} \))))) in \( {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma ) \).

  (ii)

    Otherwise, if \( \sigma_{m} \equiv \sigma_{i} \) and \( \gamma_{k} \ne \gamma_{j} \), then without loss of generality, we can assume that k < j. Since \( \alpha \, \in \,{\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{k} ) \subseteq {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\,\gamma_{k} ) \), then \( \alpha \;{ \sqsupseteq }\;\gamma_{k} \), with k < j. This also contradicts the condition not \( (DCondG(\alpha ,\sigma_{i} ,\gamma_{j} )) \) in \( {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\,\gamma_{j} ) \).

    Finally, we prove that \( FS\left( {ro} \right) = \mathop \sum \nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} \mathop \sum \nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} , \gamma_{j} } \right) \), \( \forall ro \in RO \).

  (iii)

    “⊆”: \( \forall \alpha \in FS\left( {ro} \right) \), then \( \alpha \in {\mathcal{F}\mathcal{S}} \) and ρ(α) = ro. Since \( CloSet\left( \alpha \right) \ne \emptyset \) and \( GenSet\left( \alpha \right) \ne \emptyset \), [\( \exists \sigma_{i} \in CloSet(\alpha ) \), \( \exists \gamma_{j} \in GenSet(\alpha ):\sigma_{i} \in {\mathcal{C}\mathcal{S}},\gamma_{j} \in {\mathcal{G}\mathcal{S}} \), \( \gamma_{j} \; { \sqsubseteq }\;\alpha \;{ \sqsubseteq }\;\sigma_{i} \), \( \rho (\sigma_{i} ) = \rho (\gamma_{j} ) = \rho \left( \alpha \right)]^{{^{\left( * \right)} }} \), so \( \gamma_{j} \in GenSet(\sigma_{i} ) \), \( supp(\sigma_{i} ) = supp(\gamma_{j} ) = supp\left( \alpha \right) \geqslant ms \), \( \sigma_{i} \in {\mathcal{F}\mathcal{C}\mathcal{S}} \) and \( \sigma_{i} \in CS\left( {ro} \right) \). Hence, \( \alpha \in {\mathcal{F}\mathcal{S}}(\sigma_{i} ,\gamma_{j} ) \). Without loss of generality, we can select \( \sigma_{i} \) and \( \gamma_{j} \) that are, respectively, the first closed and generator sequences satisfying the condition(*). Thus, \( FS\left( {ro} \right) \subseteq \mathop \sum \nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} \mathop \sum \nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right) \).

  (iv)

    “⊇”: \( \forall \sigma_{i} \in CS\left( {ro} \right) \), \( \forall \gamma_{j} \in GenSet(\sigma_{i} ) \), \( \forall \alpha \in {\mathcal{F}\mathcal{S}}^{*} (\sigma_{i} ,\gamma_{j} ) \), we have \( \sigma_{i} \in {\mathcal{F}\mathcal{C}\mathcal{S}} \), \( \alpha { \sqsubseteq }\sigma_{i} \), \( \rho \left( \alpha \right) \, = \rho (\sigma_{i} ) = ro \), so \( supp\left( \alpha \right) = supp(\sigma_{i} ) \geqslant ms \), i.e. \( \alpha \in FS\left( {ro} \right) \) . Thus, \( {\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right) \subseteq FS\left( {ro} \right)\,{\text{and}}\,FS\left( {ro} \right) \supseteq \sum\nolimits_{{\sigma_{i} \in CS\left( {ro} \right)}} {\sum\nolimits_{{\gamma_{j} \in GenSet\left( {\sigma_{i} } \right)}} {{\mathcal{F}\mathcal{S}}^{*} \left( {\sigma_{i} ,\gamma_{j} } \right)} } \). □
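The partition in part (a) and the generator/closed bounds used in part (b) can be illustrated on a small database. The sketch below makes several assumptions that this excerpt does not state explicitly: ρ(α) is taken to be the set of indices of input sequences containing α, closed sequences are the maximal elements of an equivalence class, and generators are the minimal ones. The conditions DCondC and DCondG are not modelled, so the sketch only checks that every frequent sequence lies between some generator and some closed sequence of its class. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_subsequence(a, b):
    """Greedy leftmost test of a ⊑ b for itemset sequences."""
    j = 0
    for e in a:
        while j < len(b) and not e <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def all_subsequences(seq):
    """Every non-empty subsequence of seq (empty itemsets dropped)."""
    out = set()
    for picks in product(*(subsets(e) for e in seq)):
        alpha = tuple(p for p in picks if p)
        if alpha:
            out.add(alpha)
    return out

def rho(alpha, sdb):
    """Assumed meaning of rho: indices of input sequences containing alpha."""
    return frozenset(i for i, s in enumerate(sdb) if is_subsequence(alpha, s))

def frequent_classes(sdb, ms):
    """Group frequent sequences by rho; keys play the role of RO."""
    groups = defaultdict(set)
    for s in sdb:
        for alpha in all_subsequences(s):
            r = rho(alpha, sdb)
            if len(r) >= ms:
                groups[r].add(alpha)
    return dict(groups)

def closed_and_generators(cls):
    """Maximal (closed) and minimal (generator) elements of one class."""
    closed = {a for a in cls if not any(a != b and is_subsequence(a, b) for b in cls)}
    gens = {a for a in cls if not any(a != b and is_subsequence(b, a) for b in cls)}
    return closed, gens

def sandwiched(cls):
    """True if every alpha has a generator below it and a closed sequence above it."""
    closed, gens = closed_and_generators(cls)
    return all(
        any(is_subsequence(g, a) for g in gens) and any(is_subsequence(a, c) for c in closed)
        for a in cls
    )

sdb = [
    (frozenset("ab"), frozenset("c")),
    (frozenset("a"), frozenset("bc")),
    (frozenset("a"), frozenset("c")),
]
classes = frequent_classes(sdb, ms=2)
for ro, cls in classes.items():
    print(sorted(ro), len(cls), sandwiched(cls))
```

Because the classes are keyed by ρ, they are disjoint by construction, which mirrors the disjoint-union statement of part (a).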

1.3 Appendix 3: Proof of Theorem 2

  (a)

    It is clear that \( {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \subseteq { \mathcal{F}\mathcal{S}}^{{\prime }} \left( {\sigma , \gamma } \right) \). In the proof of the above three pruning cases, \( \forall \alpha \in {\mathcal{F}\mathcal{S}}^{{\prime }} \left( {\sigma , \gamma } \right) \), if ∃i = 1, 2, 3: \( DCond_{i} \) is true, then α is a duplicate of some previously generated sequence. Thus, \( {\mathcal{F}\mathcal{S}} '\left( {\sigma , \gamma } \right) \subseteq {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \) and \( {\mathcal{F}\mathcal{S}} '\left( {\sigma , \gamma } \right) = {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \).

  (b)

    We will prove the first assertion by contradiction. Assume conversely that there are two sequences \( \alpha = Ex(\gamma ,lp_{m} ) \oplus \varDelta (lp_{m} ) \) and \( \beta = Ex(\gamma ,lp_{n} ) \oplus \varDelta (lp_{n} ) \) (in \( {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \)) according to two different position lists in LP(σ, γ): \( lp_{m} \ne lp_{n} \), but α ≡ β, i.e., there exist two \( lp_{m} = \{ j_{1} ,j_{2} , \ldots ,j_{p} \} \), \( lp_{n} = \{ i_{1} ,i_{2} , \ldots ,i_{p} \} \in LP\left( {\sigma ,\gamma } \right) \) with m < n, \( 1 \leqslant j_{1} < j_{2} < \ldots < j_{p} \leqslant q \), \( 1 \leqslant i_{1} < i_{2} < \ldots < i_{p} \leqslant q \), \( d(lp_{m} )_{i} \mathop = \limits^{\text{def}} \begin{cases} E^{{\prime }}_{i} , & \text{if } i \notin lp_{m} \\ E^{{\prime }}_{i} \backslash E_{k} , & \text{if } i = j_{k} \in lp_{m} \end{cases} \), \( d_{i} \mathop = \limits^{\text{def}} d(lp_{n} )_{i} \mathop = \limits^{\text{def}} \begin{cases} E^{{\prime }}_{i} , & \text{if } i \notin lp_{n} \\ E^{{\prime }}_{i} \backslash E_{k} , & \text{if } i = i_{k} \in lp_{n} \end{cases} \), \( F_{i} \subseteq d\left( {lp_{m} } \right)_{i} \), \( F^{{\prime }}_{i} \subseteq d_{i} \), ∀i = 1, …, q, and two corresponding sequences \( \alpha = F_{1} \to F_{2} \to \ldots \to F_{q} \) and \( \beta = F_{1}^{'} \to F_{2}^{'} \to \ldots \to F^{{\prime }}_{q} \) that belong to \( {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \) such that α ≡ β and \( DCond_{k} (lp_{m} ) \), \( DCond_{k} (lp_{n} ) \) are false, ∀k = 1, 2, 3.

After deleting all empty itemsets in α and β, we obtain \( \alpha = F_{{u_{1} }} \to F_{{u_{2} }} \to \ldots \to F_{{u_{N} }} \), \( \beta = F^{{\prime }}_{{v_{1} }} \to F^{{\prime }}_{{v_{2} }} \to \ldots \to F^{{\prime }}_{{v_{N} }} \) with N = size(α) = size(β), \( 1 \leqslant u_{1} < u_{2} < \ldots < u_{N} \leqslant q \), \( 1 \leqslant v_{1} < v_{2} < \ldots < v_{N} \leqslant q \) and \( F_{{u_{i} }} = F^{{\prime }}_{{v_{i} }} \ne \emptyset ,\forall i = 1, \ldots ,N \) (because α ≡ β). We set \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} \equiv \, 0 \), \( k_{p + 1} = N \) and \( 1 \leqslant k_{1} < k_{2} < \ldots < k_{p} \leqslant N \) : \( j_{r} = u_{{k_{r} }} \) (so \( F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} \subseteq F^{{\prime }}_{{j_{r} }} \)), ∀r = 1, …, p.

For ∀r = 1, …, p + 1, \( \forall k = \left( {k_{r - 1} + 1} \right), \ldots , k_{r} \), then \( u_{{k_{r - 1} }} = j_{r - 1} < u_{k} \leqslant j_{r} \leqslant i_{r} \) and we prove that if {(\( v_{h} = u_{h} \), ∀h = 0, …, k − 1), \( \left( {v_{k - 1} < i_{r} } \right) \) and \( (\forall r^{{\prime }} = 0, \ldots ,r - 1,\;j_{{r^{{\prime }} }} = i_{{r^{{\prime }} }} = u_{{k_{{r^{{\prime }} }} }} = v_{{k_{{r^{{\prime }} }} }} ) \)}, (H(r, k)), then {\( v_{k} = u_{k} \), and if (\( r \leqslant p \)) then [(\( v_{k} < i_{r} \), if \( k < k_{r} \)) and \( \left( {j_{r} = i_{r} = u_{{k_{r} }} = v_{{k_{r} }} } \right) \)]}, (C(r, k)).

Since \( i_{0} = j_{0} = u_{0} = v_{0} = k_{0} = 0 \) and \( i_{0} < i_{1} \), then the hypothesis H(1, 1) is always true. For ∀r = 1, …, p + 1, \( \forall k = \, (k_{r - 1} + 1), ..,k_{r} \), assume that the hypothesis H(r, k) is true.

  (i)

    First, we consider any k such that [\( k_{r - 1} + 1 \leqslant k < k_{r} \), for \( r \leqslant p \)] or [\( k_{p} + 1 \leqslant k \leqslant N \), with r = p + 1]. If \( r \leqslant p \), then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} < u_{k} < j_{r} = u_{{k_{r} }} \) and \( v_{k} \leqslant i_{r} \), because \( F_{{u_{k} }} = F^{{\prime }}_{{v_{k} }} \ne \emptyset \) and \( v_{k - 1} < i_{r} \). Consider the following three cases.

    If (\( r \leqslant p \) and \( v_{k} = i_{r} \)): then \( j_{r - 1} < u_{k} < j_{r} \leqslant v_{k} = i_{r} \), \( E^{{\prime }}_{{u_{k} }} \supseteq F_{{u_{k} }} = F^{{\prime }}_{{v_{k} }} = F^{{\prime }}_{{i_{r} }} \supseteq E_{r} \), so \( u_{k} \in lp_{pre} \mathop = \limits^{\text{def}} \{ j_{1} , \ldots ,j_{r - 1} , u_{k} ,j_{r + 1} , \ldots ,j_{p} \} \) (and \( lp_{pre} \mathop = \limits^{\text{def}} \{ u_{k} ,j_{2} , \ldots ,j_{p} \} \) if r = 1), \( lp_{pre} \prec lp_{m} \), FI(pre, m) = r, \( lp_{pre} \left[ r \right] \, = u_{k} \) and the \( u_{k}^{th} \) itemset \( \left( {F_{{u_{k} }} } \right) \) contains \( E_{r} \), so \( DCond_{2} (lp_{pre} ,lp_{m} ,u_{k} ) \) is true. Therefore, \( v_{k} < i_{r} \).

    If (\( u_{k} < v_{k} \) and (\( v_{k} < i_{r} \), if \( r \leqslant p \))): then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} \leqslant u_{k - 1} = v_{k - 1} < u_{k} < v_{k} \) and (\( v_{k} < i_{r} \), with \( r \leqslant p \)), \( v_{k} \notin lp_{n} \), \( u_{k} \notin lp_{n} \), \( d\left( {lp_{n} } \right)_{{v_{k} }} = E^{{\prime }}_{{v_{k} }} \), \( d\left( {lp_{n} } \right)_{{u_{k} }} = E^{{\prime }}_{{u_{k} }} \), \( F^{{\prime }}_{{v_{k - 1} }} \ne \emptyset \), \( F^{{\prime }}_{r'} = \emptyset \), \( \forall r^{{\prime }} = (v_{k - 1} + 1), \ldots ,(v_{k} - 1) \), \( v_{k - 1} + 1 \leqslant u_{k} \leqslant v_{k} - 1 \). Moreover, for the non-empty itemset \( its_{{v_{k} }} \) of \( \beta \) such that \( its_{{v_{k} }} = F^{\prime }_{{v_{k} }} = F_{{u_{k} }} \subseteq E^{\prime }_{{v_{k} }} \cap E^{\prime }_{{u_{k} }} = d\left( {lp_{n} } \right)_{{v_{k} }} \cap d\left( {lp_{n} } \right)_{{u_{k} }} \), so \( DCond_{1} (lp_{n} ,v_{k} ,u_{k} ) \) is true.

    If \( v_{k} < u_{k} \), then \( j_{r - 1} = u_{{k_{r - 1} }} \le u_{k - 1} = v_{k - 1} < v_{k} < u_{k} \) and (\( u_{k} < j_{r} \), with \( r \leqslant p \)). Similarly, \( u_{k} \notin lp_{m} \), \( v_{k} \notin lp_{m} \), \( d\left( {lp_{m} } \right)_{{v_{k} }} = E^{{\prime }}_{{v_{k} }} \), \( d\left( {lp_{m} } \right)_{{u_{k} }} = E^{{\prime }}_{{u_{k} }} \), \( F_{{u_{k - 1} }} \ne \emptyset \), \( F_{{r^{{\prime }} }} = \emptyset \), \( \forall r^{{\prime }} = (u_{k - 1} + 1), \ldots ,(u_{k} - 1) \), \( u_{k - 1} + 1 \leqslant v_{k} \leqslant u_{k} - 1 \). Moreover, for the non-empty itemset \( its_{{u_{k} }} \) of \( \alpha \) such that \( its_{{u_{k} }} = F_{{u_{k} }} = F_{{v_{k} }}^{{\prime }} \subseteq E_{{u_{k} }}^{{\prime }} \cap E_{{v_{k} }}^{{\prime }} = d\left( {lp_{m} } \right)_{{u_{k} }} \cap d\left( {lp_{m} } \right)_{{v_{k} }} \), so \( DCond_{1} (lp_{m} ,u_{k} ,v_{k} ) \) is true.

    Hence, \( u_{k} = v_{k} \) and (\( v_{k} < j_{r} \leqslant i_{r} \), with \( r \leqslant p \)). \( \left( {C\left( {r, \, k} \right)^{(1)} } \right) \).

  (ii)

    Second, if \( r \leqslant p \), we consider \( k = k_{r} \). Then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} < u_{k} = u_{{k_{r} }} = j_{r} \leqslant i_{r} \). Since \( F_{{u_{k} }} = F^{{\prime }}_{{v_{k} }} \ne \emptyset \) and \( v_{k - 1} = v_{{k_{r} - 1}} < i_{r} \), then \( v_{{k_{r} }} \leqslant i_{r} \). Consider the following cases.

If \( v_{{k_{r} }} < u_{{k_{r} }} = j_{r} \le i_{r} \), then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} < v_{{k_{r} }} < u_{{k_{r} }} = j_{r} \le i_{r} \), \( E^{{\prime }}_{{v_{{k_{r} }} }} \supseteq F^{{\prime }}_{{v_{{k_{r} }} }} = F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} \). Thus, \( v_{{k_{r} }} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , v_{{k_{r} }} ,i_{r + 1} , \ldots ,i_{p} \} \) (and \( lp_{pre} \mathop = \limits^{\text{def}} \{ v_{{k_{1} }} ,i_{2} , \ldots ,i_{p} \} \), if r = 1), \( lp_{pre} \prec lp_{n} \), \( FI\left( {pre,n} \right) \, = r,lp_{pre} \left[ r \right] \, = v_{{k_{r} }} \) and with the itemset \( F^{{\prime }}_{{v_{{k_{r} }} }} \) of \( \beta \) such that \( F^{{\prime }}_{{v_{{k_{r} }} }} \supseteq E_{r} \), so \( DCond_{2} (lp_{pre} ,lp_{n} ,v_{{k_{r} }} ) \) is true (or \( DCond_{3} (lp_{pre} \mathop = \limits^{\text{def}} \{ j_{1} , \ldots ,j_{r - 1} , v_{{k_{r} }} ,j_{r + 1} , \ldots ,j_{p} \} ,lp_{m} ,u_{{k_{r} }} ) \) is true).

Therefore, \( j_{r} = u_{{k_{r} }} \leqslant v_{{k_{r} }} \leqslant i_{r} \). But the case \( u_{{k_{r} }} < v_{{k_{r} }} \le i_{r} \) cannot occur.

Indeed, if \( j_{r} = u_{{k_{r} }} < v_{{k_{r} }} = i_{r} \), then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} \leqslant u_{{k_{r} - 1}} < u_{{k_{r} }} = j_{r} < v_{{k_{r} }} = i_{r} \). Thus, \( F^{{\prime }}_{{r^{{\prime }} }} = \emptyset \), \( \forall r^{{\prime }} = u_{{k_{r} }} , \ldots ,(v_{{k_{r} }} - 1) \), \( E^{{\prime }}_{{u_{{k_{r} }} }} \supseteq F_{{u_{{k_{r} }} }} = F_{{j_{r} }} \supseteq E_{r} \), \( u_{{k_{r} }} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , u_{{k_{r} }} ,i_{r + 1} , \ldots ,i_{p} \} \) (and \( lp_{pre} \mathop = \limits^{\text{def}} \{ u_{{k_{1} }} ,i_{2} , \ldots ,i_{p} \} \), if r = 1), \( lp_{pre} \prec lp_{n} \), FI(pre, n) = r, \( lp_{pre} \left[ r \right] \, = u_{{k_{r} }} \), \( lp_{n} \left[ r \right] \, = i_{r} = v_{{k_{r} }} \). Since \( its_{{v_{{k_{r} }} }} = F^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{n} )_{{v_{{k_{r} }} }} = E^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} \) and \( F_{{u_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{pre} )_{{u_{{k_{r} }} }} = E^{{\prime }}_{{u_{{k_{r} }} }} \backslash E_{r} \), then for itemset \( its_{{v_{{k_{r} }} }} \) of \( \beta \) such that \( its_{{v_{{k_{r} }} }} = F^{{\prime }}_{{v_{{k_{r} }} }} \backslash E_{r} = F_{{u_{{k_{r} }} }} \backslash E_{r} \subseteq d(lp_{n} )_{{v_{{k_{r} }} }} \cap d(lp_{pre} )_{{u_{{k_{r} }} }} \). Therefore, \( DCond_{3} (lp_{pre} ,lp_{n} ,v_{{k_{r} }} ) \) is true.

If \( j_{r} = u_{{k_{r} }} < v_{{k_{r} }} < i_{r} \), then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} \leqslant v_{{k_{r} - 1}} = u_{{k_{r} - 1}} < u_{{k_{r} }} = j_{r} < v_{{k_{r} }} < i_{r} \). Hence, \( F^{{\prime }}_{{r^{{\prime }} }} = \emptyset \), \( \forall r^{{\prime }} = \, (v_{{k_{r} - 1}} + 1), \, .., \, (v_{{k_{r} }} - 1) \), \( v_{{k_{r} - 1}} + 1 \leqslant u_{{k_{r} }} \leqslant v_{{k_{r} }} - 1,u_{{k_{r} }} \) and \( v_{{k_{r} }} \notin lp_{n} ,d\left( {lp_{n} } \right)_{{v_{{k_{r} }} }} = E^{{\prime }}_{{v_{{k_{r} }} }} ,F_{{u_{{k_{r} }} }} \subseteq d\left( {lp_{n} } \right)_{{u_{{k_{r} }} }} = E^{{\prime }}_{{u_{{k_{r} }} }} \). Thus, for the non-empty itemset \( its_{{v_{{k_{r} }} }} \) of \( \beta \) such that \( its_{{v_{{k_{r} }} }} = F^{{\prime }}_{{v_{{k_{r} }} }} = F_{{u_{{k_{r} }} }} \subseteq d\left( {lp_{n} } \right)_{{v_{{k_{r} }} }} \cap d\left( {lp_{n} } \right)_{{u_{{k_{r} }} }} \) , the condition \( DCond_{1} \)(\( lp_{n} \), \( v_{{k_{r} }} \), \( u_{{k_{r} }} \)) is true.

Thus, \( v_{{k_{r} }} = u_{{k_{r} }} = j_{r} \leqslant i_{r} \). However, if \( v_{{k_{r} }} = u_{{k_{r} }} = j_{r} < i_{r} \), then \( i_{r - 1} = j_{r - 1} = u_{{k_{r - 1} }} = v_{{k_{r - 1} }} < v_{{k_{r} }} = u_{{k_{r} }} = j_{r} < i_{r} \), \( E^{{\prime }}_{{j_{r} }} \supseteq F^{{\prime }}_{{j_{r} }} = F^{'}_{{v_{{k_{r} }} }} = F_{{u_{{k_{r} }} }} \supseteq E_{r} \). Thus, \( j_{r} \in lp_{pre} \mathop = \limits^{\text{def}} \{ i_{1} , \ldots ,i_{r - 1} , j_{r} ,i_{r + 1} , \ldots ,i_{p} \} \) (and \( lp_{pre} \mathop = \limits^{\text{def}} \{ j_{1} ,i_{2} , \ldots ,i_{p} \} \), if r = 1), \( lp_{pre} \prec lp_{n} \), FI(pre, n) = r, \( lp_{pre} \)[r] = \( j_{r} \) and the itemset \( F^{{\prime }}_{{j_{r} }} \) of \( \beta \) contains \( E_{r} \), so \( DCond_{2} \)(\( lp_{pre} \), \( lp_{n} \), \( j_{r} \)) is true.

Hence, \( v_{{k_{r} }} = u_{{k_{r} }} = j_{r} = i_{r} \). \( \left( {C\left( {r, \, k} \right)^{(2)} } \right) \).

From \( \left( {C\left( {r, \, k} \right)^{(1)} } \right) \) and \( \left( {C\left( {r, \, k} \right)^{(2)} } \right) \), we have (C(r, k)).

Finally, under the hypothesis α ≡ β, then ∀r = 1, …, p + 1, ∀k = \( (k_{r - 1} + 1), \ldots , k_{r} \), we always have (\( v_{k} = u_{k} \)), (\( j_{r} = i_{r} = u_{{k_{r} }} = v_{{k_{r} }} \), if \( r \leqslant p \)). Therefore,

$$ (\forall k^{{\prime }} = 1, \ldots ,N,v_{{k^{{\prime }} }} = u_{{k^{{\prime }} }} )^{(*)} \,{\text{and}}\, (\forall r = 1, \ldots ,p,j_{r} = i_{r} )^{(**)} . $$

If \( lp_{m} = lp_{n} \), then \( j_{r} = i_{r} \), ∀r = 1,.., p. Since α and β are generated from two different position lists in \( {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \), so \( \exists i_{0} = 1, \ldots ,N \) such that \( u_{{i_{0} }} \ne v_{{i_{0} }} \). This contradicts (*).

If \( lp_{m} \ne lp_{n} \), i.e. \( \exists m_{0} = 1, \ldots ,p \) such that \( i_{{m_{0} }} \ne j_{{m_{0} }} \). This also contradicts (**). In other words, the hypothesis α ≡ β always leads to a contradiction.

Thus, α ≠ β, i.e., all sequences in \( {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \) are distinctly generated.

From Proposition 1 and Theorem 1.a, we have \( {\mathcal{F}\mathcal{S}}\left( {\sigma ,\gamma } \right) = {\mathcal{F}\mathcal{S}}^{{\prime }} \left( {\sigma ,\gamma } \right) = {\mathcal{F}\mathcal{S}}^{**} \left( {\sigma ,\gamma } \right) \) and the remaining assertions are deduced from Theorem 1. □
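To see why duplicate checks are needed at all, the sketch below counts how often each sequence is produced when every position list in LP(σ, γ) is expanded independently. It is only an illustration: the conditions \( DCond_{1} \), \( DCond_{2} \) and \( DCond_{3} \) are not defined in this excerpt, so they are not implemented here, and a plain `Counter` stands in for them. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def position_lists(sigma, gamma):
    """LP(sigma, gamma): increasing index tuples with gamma[k] ⊆ sigma[j_k]."""
    def rec(k, start):
        if k == len(gamma):
            yield ()
            return
        for j in range(start, len(sigma)):
            if gamma[k] <= sigma[j]:
                for rest in rec(k + 1, j + 1):
                    yield (j,) + rest
    return list(rec(0, 0))

def raw_candidates(sigma, gamma):
    """One entry per (lp, delta) pair, with empty itemsets dropped; no pruning."""
    out = []
    for lp in position_lists(sigma, gamma):
        ex = [frozenset()] * len(sigma)
        for k, j in enumerate(lp):
            ex[j] = gamma[k]
        choices = [subsets(sigma[i] - ex[i]) for i in range(len(sigma))]
        for delta in product(*choices):
            out.append(tuple(e | d for e, d in zip(ex, delta) if e | d))
    return out

# sigma = a -> a, gamma = a: gamma matches at position 0 and at position 1.
sigma = (frozenset("a"), frozenset("a"))
gamma = (frozenset("a"),)
raw = raw_candidates(sigma, gamma)
counts = Counter(raw)
print(len(raw), len(counts))
```

Here each of the two distinct sequences (a and a → a) is generated once per position list. Theorem 2 states that the pruned set \( {\mathcal{F}\mathcal{S}}^{**} (\sigma ,\gamma ) \) avoids this repetition during generation, without a post-hoc deduplication like the one used in this sketch.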

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Duong, H., Truong, T., Le, B., Fournier-Viger, P. (2019). An Explicit Relationship Between Sequential Patterns and Their Concise Representations. In: Madria, S., Fournier-Viger, P., Chaudhary, S., Reddy, P. (eds) Big Data Analytics. BDA 2019. Lecture Notes in Computer Science, vol 11932. Springer, Cham. https://doi.org/10.1007/978-3-030-37188-3_20

  • DOI: https://doi.org/10.1007/978-3-030-37188-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37187-6

  • Online ISBN: 978-3-030-37188-3

  • eBook Packages: Computer Science, Computer Science (R0)
