
Discovering frequent chain episodes

  • Regular Paper, Knowledge and Information Systems

Abstract

Frequent episode discovery is a popular framework in temporal data mining with many applications. An episode is a partially ordered set of nodes, with each node associated with an event-type. The episodes literature has seen different notions of frequency, and a variety of associated discovery algorithms under these different frequencies, when the associated partial order is total (serial episodes) or trivial (parallel episodes). Recently, an apriori-based discovery algorithm under the non-overlapped frequency measure was proposed for mining episodes whose partial order is unrestricted but whose node to event-type association is one-to-one (general injective episodes). That work pointed out that frequency alone is not a sufficient indicator of interestingness in the context of episodes with general partial orders and introduced a new measure of interestingness, called bidirectional evidence (BE), to address this issue. Its algorithm discovers episodes by incorporating both frequency and BE thresholds in the level-wise procedure. In this paper, we extend this BE-based algorithm to a much larger class of episodes that we call chain episodes. This class encompasses all serial and parallel episodes (injective or otherwise) and also many other non-injective episodes with unrestricted partial orders. We first discuss how the BE measure can be generalized to chain episodes and prove the monotonicity property it satisfies in this general context. We then describe our candidate generation step (with correctness proofs), which nicely exploits this new monotonicity property. We further describe the frequency counting (with correctness proofs) and BE computation steps for chain episodes. The experimental results demonstrate the effectiveness of our algorithms.



Notes

  1. An episode is injective if the associated node to event-type map is one-to-one.

  2. The class was originally introduced in [6] while the term “Chain Episodes” was explicitly used for the first time in [1].

  3. The correctness proof for chain episode counting we present here is a minor extension of the correctness proofs for injective episodes first reported in [1]. An alternate correctness proof for chain episode counting is given in [25] which appeared after [1].

  4. Given any set V, a relation R over V (i.e. a subset of \(V \times V\)) is said to be a strict partial order if it is irreflexive (for all \(v \in V\), \((v, v) \notin R\)), asymmetric (\((v_1, v_2) \in R\) implies \((v_2, v_1) \notin R\) for all distinct \(v_1, v_2 \in V\)) and transitive (for all \(v_1, v_2, v_3 \in V\), \((v_1, v_2) \in R\) and \((v_2, v_3) \in R\) imply \((v_1, v_3) \in R\)).

  5. We will elaborate later in Sect. 6 on how finite state automata can be used to track occurrences of episodes and a strategy for counting with expiry time constraints.

  6. By frequent here, we mean episodes which satisfy both the frequency and BE thresholds.

  7. An element in \(V_{\alpha }\) is minimal if there is no other element less than it as per \(<_{\alpha }\). Note that a poset can in general have multiple minimal elements.

  8. In \(t_e\), subscript e denotes the end time of the window. In \(h_i^e\), superscript e refers to earliest transiting.

  9. An episode is said to be frequency closed if every superepisode has a strictly lower frequency.

  10. A generator is an episode whose every subepisode has a strictly greater frequency.

  11. Recall from Definition 6 that an injective episode \(\alpha \) can be viewed as a partially ordered set of event-types \((X^\alpha ,R^\alpha )\). \((X^\beta ,R^\beta )\) is a maximal subepisode of an injective episode \(\alpha \) if \(X^\beta \subseteq X^\alpha \) and \(R^\beta \) is the restriction of \(R^\alpha \) on to \(X^\beta \). The notion of a maximal subepisode of a general episode is discussed in the next section.

  12. A serial extension of a chain episode \((V_\alpha , <_\alpha , g_\alpha )\) is a serial episode \(\beta = (V_\beta ,<_\beta ,g_\beta )\) where \(V_{\beta }=V_{\alpha }\) and \(g_\alpha = g_\beta \) such that \(<_\alpha \subseteq <_\beta \).

  13. The source code is written in C++. The experiments were run on a 2.5 GHz Pentium PC running Linux.

References

  1. Achar A (2010) Discovering frequent episodes with general partial orders. PhD thesis, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India

  2. Achar A, Sastry PS (2015) Statistical significance of general partial orders. Inf Sci 296:175–200

  3. Achar A, Laxman S, Sastry PS (2012) A unified view of the apriori-based algorithms for frequent episode discovery. Knowl Inf Syst 31(2):223–250

  4. Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108

  5. Achar A, Laxman S, Raajay V, Sastry PS (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108

  6. Achar A, Laxman S, Raajay V, Sastry PS (2009) Discovering general partial orders from event streams. Technical report, arXiv:0902.1227v2 [cs.AI]

  7. Atallah MJ, Gwadera R, Szpankowski W (2004) Detection of significant sets of episodes in event sequences. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK, pp 3–10

  8. Bouqata B, Carothers CD, Szymanski BK, Zaki MJ (2006) Vogue: a novel variable order-gap state machine for modeling sequences. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, vol 4213. Springer, Berlin, pp 42–54

  9. Brown EN, Kass RE, Mitra PP (2004) Multiple neuronal spike train data analysis: state of art and future challenges. Nat Neurosci 7(5):456–461

  10. Gan M, Dai H (2012) An efficient one-pass method for discovering bases of recently frequent episodes over online data streams. Int J Innov Comput Inf Control 8(7(A)):4675–4690

  11. Gan M, Dai H (2014) Detecting and monitoring abrupt emergences and submergences of episodes over data streams. Inf Syst 39:277–289

  12. Huang K, Chang C (2008) Efficient mining of frequent episodes from complex sequences. Inf Syst 33(1):96–114

  13. Ibrahim A, Sastry S, Sastry PS (2016) Discovering compressing serial episodes from event sequences. Knowl Inf Syst 47(2):405–432

  14. Iwanuma K, Takano Y, Nabeshima H (2004) On anti-monotone frequency measures for extracting sequential patterns from a single very-long sequence. In: Proceedings of the 2004 IEEE conference on cybernetics and intelligent systems, vol 1, pp 213–217

  15. Laxman S, Sastry PS, Unnikrishnan KP (2005) Discovering frequent episodes and learning hidden Markov models: a formal connection. IEEE Trans Knowl Data Eng 17:1505–1517

  16. Laxman S, Tankasali V, White RW (2008) Stream prediction using a generative model based on frequent episodes in event sequences. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08), pp 453–461

  17. Luo J, Bridges SM (2000) Mining fuzzy association rules and fuzzy frequent episodes for intrusion detection. Int J Intell Syst 15:687–703

  18. Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289

  19. Ng A, Fu AW (2003) Mining frequent episodes for relating financial events and stock trends. In: Proceedings of 7th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2003). Springer, Berlin, pp 27–39

  20. Patnaik D, Sastry PS, Unnikrishnan KP (2008) Inferring neuronal network connectivity from spike data: a temporal data mining approach. Sci Program 16:49–77

  21. Patnaik D, Laxman S, Chandramouli B, Ramakrishnan N (2012) Efficient episode mining of dynamic event streams. In: IEEE international conference on data mining, pp 605–614

  22. Sastry PS, Unnikrishnan KP (2010) Conditional probability based significance tests for sequential patterns in multi-neuronal spike trains. Neural Comput 22(4):1025–1059

  23. Tatti N (2009) Significance of episodes based on minimal windows. In: Proceedings of 2009 IEEE international conference on data mining

  24. Tatti N (2015) Ranking episodes using a partition model. Data Min Knowl Discov 29(5):1312–1342

  25. Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(3):34–66

  26. Tatti N, Cule B (2011) Mining closed episodes with simultaneous events. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1172–1180

  27. Tatti N, Cule B (2010) Mining closed strict episodes. In: Proceedings of 2010 IEEE international conference on data mining, pp 501–510

  28. Unnikrishnan KP, Shadid BQ, Sastry PS, Laxman S (2009) Root cause diagnostics using temporal data mining. US Patent 7,509,234, 24 Mar 2009

  29. Wang MF, Wu YC, Tsai MF (2008) Exploiting frequent episodes in weighted suffix tree to improve intrusion detection system. In: Proceedings of the 22nd international conference on advanced information networking and applications-workshops. IEEE Computer Society, Washington, pp 1246–1252

Author information

Correspondence to Avinash Achar.


Appendices

Comparison with the apriori-based closed episode miner

As stated earlier in Sect. 7, the monotonicity property exploited by [27] (and by its refined version [25]) is different from the one exploited here. This makes the candidate generation step proposed here substantially different from that of [27] or [25]. The algorithm in [25] produces candidate episodes that are generators of ultimately closed episodes. One ultimately needs to perform a closure operation on the generators to obtain what are called instance-closed episodes; the final set of closed episodes is obtained by post-filtering the set of instance-closed episodes. The main point to note is that [25] generates a potential candidate only if all its subepisodes (including those of the same size) are frequent. In other words, it exploits the subepisode structure that exists among episodes of the same size sharing the same g-map.

In this paper, we use both frequency and BE to prune candidates. The monotonicity property satisfied by BE is a much weaker condition than that of frequency alone. An \(\ell \)-node episode is generated as a candidate if and only if all its \((\ell - 1)\)-node maximal subepisodes, obtained by dropping the last node among all nodes mapped to the same event-type, are found frequent. In fact, the BE-based measure does not demand a check for the existence of same-size subepisodes, because subepisodes of the same size are not guaranteed to have high BE even when the given episode does. Continuing with the earlier serial episode event sequence example, the serial episode has a high BE in this data; however, all its subepisodes of the same size have zero BE, as they are obtained by dropping one or more edges from the parent serial episode. For instance, suppose there is an edge from node i to node j in \(\alpha \). If this edge is dropped from \(\alpha \) to obtain a \(\beta \), then \(H_{ij}^{\beta }\) will be zero because, in the occurrences tracked, i always precedes j.

More specifically, at each level \(\ell \), Tatti and Cule [25] first mine all frequent parallel episodes of size \(\ell \). They then generate potential candidates by progressively adding one edge at a time, performing the necessary subepisode existence and closure checks before counting each candidate's frequency and mining for frequent generators. An \(\ell \)-node episode with N edges is constructed as a potential candidate by combining two \(\ell \)-node subepisodes with \((N-1)\) edges which share \((N-2)\) edges in common. In other words, the \(\ell \)-node subepisode obtained by dropping one edge from either of the combinable episodes is the same. Note that the \(g_{\alpha }\) map is assumed to be the same across all the episodes involved. For each episode generated this way, certain intelligent checks for transitive closure are first carried out. This is followed by checking that the subepisodes obtained by dropping either an edge or a node are frequent. The last check before computing its frequency is whether it is a generator, by making sure it is not contained in the closure of any of its subepisodes.

In contrast, in the current approach we construct a potential candidate of size \((\ell + 1)\) by combining two \(\ell \)-node episodes. This is because the BE-based monotonicity we exploit does not guarantee that the same-size subepisodes of an \((\ell + 1)\)-node episode, obtained by dropping edges alone, also have high enough BE. The \((\ell - 1)\)-node subepisode obtained by dropping an appropriate node from each of the combining \(\ell \)-node episodes is the same. This is what makes the candidate generation step in our approach fundamentally different from that of [25] or [27].
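To make the contrast concrete, the following is a minimal C++ sketch of this node-based combination step, written under our own naming assumptions (the episode layout mirrors the array-plus-adjacency-matrix representation of Sect. 3.1). The combinability test, the transitive-closure check and the chain-episode validity checks of the actual algorithm are all elided; emitting up to three candidates per combinable pair, one for each possible relation between the two new nodes, follows the injective-episode construction of [5].

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Episode representation following Sect. 3.1: an array of event-types plus a
// binary adjacency matrix for the partial order.
struct Episode {
    std::vector<std::string> g;           // g[i] = event-type of node v_i
    std::vector<std::vector<uint8_t>> e;  // e[i][j] == 1 iff v_i <_alpha v_j
};

// Merge two combinable l-node episodes a1, a2 that agree on the shared
// (l-1)-node block (indices 0..l-2); their distinguishing last nodes become
// nodes l-1 and l of the (l+1)-node candidate.
std::vector<Episode> combine(const Episode& a1, const Episode& a2) {
    const int l = static_cast<int>(a1.g.size());
    Episode base;
    base.g = a1.g;                  // shared block plus a1's last node ...
    base.g.push_back(a2.g[l - 1]);  // ... then a2's last node appended
    base.e.assign(l + 1, std::vector<uint8_t>(l + 1, 0));
    for (int i = 0; i < l; ++i)     // copy a1's full order information
        for (int j = 0; j < l; ++j)
            base.e[i][j] = a1.e[i][j];
    for (int i = 0; i < l - 1; ++i) {  // copy a2's order w.r.t. its last node
        base.e[i][l] = a2.e[i][l - 1];
        base.e[l][i] = a2.e[l - 1][i];
    }
    // The relation between the two new nodes (indices l-1 and l) is open:
    std::vector<Episode> out(3, base);
    out[1].e[l - 1][l] = 1;  // a1's last node precedes a2's
    out[2].e[l][l - 1] = 1;  // a2's last node precedes a1's
    // (Transitive-closure and chain-episode validity checks are elided.)
    return out;
}
```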

Computation of BE

Algorithm 4 describes the pseudocode for computing the BE of a given episode. Maintaining multiple automata is easily done by maintaining two lists in addition to the state information, which consists of (i) \(\mathcal {Q}\), the set of currently accepted nodes, and (ii) \(\mathcal {W}\), the set of nodes an automaton is waiting for. The first list holds the first state transition time of each automaton and the second holds the associated binary matrices. Recall that if h is the occurrence tracked by an automaton, then by the time the automaton reaches its final state, the (i,j)-entry of the binary matrix is 1 if and only if \(t_{h(v_i)}<t_{h(v_j)}\). Both these lists are stored together in TimeMatrixList. The pseudocode assumes that \(\mathcal {Q}\) and \(\mathcal {W}\) store the integer indices of the associated episode nodes. Lines 6–10 consider the case when the automaton is in its start state. If an automaton is not in its start state, we first delete all those automata whose associated occurrences evidently violate the expiry time constraint (Line 13). After this filtering, if automata still remain (TimeMatrixList being non-empty), we compute the next state and update the binary matrix of each of these automata. If the next state also happens to be the final state, we use the binary matrix of the oldest automaton to update the CountMatrix. By the end of processing the entire event sequence, the (i,j)th element of CountMatrix contains \(f_{ij}^{\alpha }\), which can then be used to compute \(H(\alpha )\) as explained in Sect. 4.
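As a concrete illustration, here is a minimal C++ sketch of this bookkeeping, assuming only what the paragraph above states. TimeMatrixList and CountMatrix follow the names used in the pseudocode; the surrounding types and method names are ours, and the state transition logic itself is omitted.

```cpp
#include <cstdint>
#include <list>
#include <vector>

// One active automaton: the start time of the occurrence it tracks and its
// binary order matrix (order[i][j] == 1 iff t_{h(v_i)} < t_{h(v_j)}).
struct AutomatonRecord {
    int64_t firstTransitionTime;
    std::vector<std::vector<uint8_t>> order;
};

// Bookkeeping for BE computation of one episode with n nodes.
struct BETracker {
    int n;
    std::list<AutomatonRecord> timeMatrixList;   // active automata, oldest first
    std::vector<std::vector<long>> countMatrix;  // accumulates f_ij

    explicit BETracker(int nodes)
        : n(nodes), countMatrix(nodes, std::vector<long>(nodes, 0)) {}

    // Drop automata whose occurrences already violate the expiry constraint
    // (cf. Line 13 of Algorithm 4); 'expiry' is the allowed occurrence span.
    void expire(int64_t now, int64_t expiry) {
        while (!timeMatrixList.empty() &&
               now - timeMatrixList.front().firstTransitionTime > expiry)
            timeMatrixList.pop_front();
    }

    // When a transition reaches the final state, fold the oldest automaton's
    // binary matrix into CountMatrix; at the end of the event sequence,
    // countMatrix[i][j] equals f_ij for the episode.
    void onFinalState() {
        const auto& m = timeMatrixList.front().order;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                countMatrix[i][j] += m[i][j];
        timeMatrixList.pop_front();
    }
};
```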

Implementation issues in counting

As explained earlier in Sect. 3.1, an \(\ell \)-node episode \(\alpha \) is represented using two data structures: an array \(\alpha .g\) such that \(\alpha .g[i]=g_{\alpha }(v_i)\), \(i=1,\ldots ,\ell \), and a binary adjacency matrix \(\alpha .e\) storing the partial order (\(<_\alpha \)) information. As in the injective episodes case, to efficiently count a set of \(\ell \)-node candidates, we use a collection of lists waits(), indexed by the set of all event-types. Each element in these lists stores information about a currently active automaton of one of the candidates. A typical element is of the form \((\alpha ,\mathbf {q},\mathbf {w},j)\), where \(\alpha \) is a candidate, \(\mathbf {q}\) and \(\mathbf {w}\) essentially represent the state of an automaton and j is an integer. \(\mathbf {q}\) and \(\mathbf {w}\) are \(\ell \)-length binary vectors encoding the two sets \((\mathcal {Q}^\alpha ,\mathcal {W}^\alpha )\), which represent a state of the FSA associated with \(\alpha \). For example, \(\mathbf {q}[j]=1\) iff \(v_j \in \mathcal {Q}^\alpha \). For an event-type E, \((\alpha ,\mathbf {q},\mathbf {w},j)\in waits(E)\) denotes that an automaton of the episode \(\alpha \) is currently in state \((\mathbf {q},\mathbf {w})\) and is waiting for an event-type \(E=\alpha .g[j]=g_{\alpha }(v_j)\) to make a state transition (with \(\mathbf {w}[j]=1\)). As an example, consider the automaton (Fig. 12) corresponding to \(\beta = (F\rightarrow (E\,G)\rightarrow F)\) in a state with \(\mathcal {Q}^\beta = \{v_2\}\) and \(\mathcal {W}^\beta =\{v_1,v_4\}\). Here we would have \((\beta , \mathbf {q}, \mathbf {w}, 1)\in waits(E)\) and \((\beta ,\mathbf {q},\mathbf {w},4)\in waits(G)\), where \(\mathbf {q}=[0\,1\,0\,0]\) and \(\mathbf {w}=[1\,0\,0\,1]\).
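For concreteness, the following C++ sketch shows one plausible encoding of these structures and of the example state above; all type and field names are our own, and the transition logic is omitted.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Episode {
    std::vector<std::string> g;           // alpha.g[i] = g_alpha(v_i)
    std::vector<std::vector<uint8_t>> e;  // adjacency matrix of <_alpha
};

// One element of a waits() list: an active automaton of candidate 'alpha' in
// state (q, w), waiting for event-type alpha->g[j] to make a transition.
struct WaitsEntry {
    const Episode* alpha;
    std::vector<uint8_t> q;  // q[i] == 1 iff v_i is in Q (accepted)
    std::vector<uint8_t> w;  // w[i] == 1 iff v_i is in W (waited for)
    int j;                   // index of the node this entry waits on
};

// waits(), indexed by event-type.
using WaitsIndex = std::unordered_map<std::string, std::vector<WaitsEntry>>;

// The state from the text for beta = (F -> (E G) -> F), i.e. Q = {v_2} and
// W = {v_1, v_4}, expressed with 0-based node indices.
void exampleState(const Episode& beta, WaitsIndex& waits) {
    std::vector<uint8_t> q = {0, 1, 0, 0};
    std::vector<uint8_t> w = {1, 0, 0, 1};
    waits["E"].push_back({&beta, q, w, 0});  // waiting for g(v_1) = E
    waits["G"].push_back({&beta, q, w, 3});  // waiting for g(v_4) = G
}
```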

In the injective episode case [5], since the \(g_\alpha \)-map is injective, it was convenient to work with the set \(X^\alpha =\{g_\alpha (v_1),g_\alpha (v_2),\dots ,g_\alpha (v_N)\}\) while defining states of the associated automaton. Consequently, the binary vectors \(\mathbf {q}\) and \(\mathbf {w}\) there coded for certain subsets of \(X^\alpha \) as states. If instead \(\mathbf {q}\) and \(\mathbf {w}\) coded for subsets of \(V_\alpha \) as states, the algorithm (with pseudocode) presented for injective episodes [5] would still go through (for injective episodes). Generalizing further, since the resultant FSA for general chain episodes always turns out to be deterministic, the counting algorithm for general injective episodes, with all the implementation details of [5], similarly goes through for chain episodes. As explained in Sect. 6.3, the only addition is that for each state one maintains multiple automata (unlike injective episodes, where one needs to maintain at most one automaton per state). This is easily done by maintaining the first state transition times of the automata in a given state in a list. Hence, for all the implementation details of the counting step for chain episodes, we refer the reader to [5].

Property of ET occurrences

We now prove Property 2 introduced in Sect. 6.2. We restate it here for convenience.

Property 5

Given a chain episode \(\alpha \) and data stream \(\mathbf {D}\), consider an ET occurrence h and another occurrence \(h'\) of \(\alpha \) in \(\mathbf {D}\) such that \(h'\) starts on or after \(t_{\bar{h}(1)}\). Let \(\mathbf {D}_j\) denote the first j events of \(\mathbf {D}\). For every j, the set of all nodes in \(V_\alpha \) whose associated events under h occur in \(\mathbf {D}_j\) is a superset of the set of all nodes in \(V_\alpha \) whose associated events under \(h'\) occur in \(\mathbf {D}_j\).

Proof

We show this by induction on j. For any \(j< \bar{h}(1)\), the property is obviously true. For \(j=\bar{h}(1)\), where \(\bar{h}(1)=h(v_1^h)\), \(v_1^h \in V_\alpha \), if \(h'\) starts strictly after \(t_{\bar{h}(1)}\), then the property is immediate. If \(h'\) also starts at \(t_{\bar{h}(1)}\), then \(h'(v_1^h)\) must equal j because we are dealing with chain episodes. (Recall that all nodes in \(\mathcal {W}_0^\alpha \) are unrelated and hence must be mapped to distinct event-types under \(g_\alpha \) for a chain episode \(\alpha \).) Let us assume the property is true for some \(j\ge \bar{h}(1)\). Let \(\mathcal {Q}\) and \(\mathcal {Q}'\) denote the sets of all nodes in \(V_\alpha \) whose associated events under h and \(h'\), respectively, occur in \(\mathbf {D}_j\). By hypothesis, \(\mathcal {Q}'\subseteq \mathcal {Q}\). If \((E_{j+1},t_{j+1})\) is not a part of \(h'\), then the property is immediate for \(j+1\). Suppose \((E_{j+1}, t_{j+1})\) is a part of \(h'\). For convenience, we denote \(h'^{-1}(j+1)\) by \(v_k\). We now claim that \(h(v_k)\) lies between \(\bar{h}(1)\) and \((j+1)\) (both inclusive). Since \(h'\) is a valid occurrence, all parents of \(v_k\) belong to \(\mathcal {Q}'\). Since \(\mathcal {Q}'\subseteq \mathcal {Q}\), the events associated under h with all parents of \(v_k\) (in \(<_\alpha \)) have also been seen in \(\mathbf {D}_j\). Since h is ET (Definition 14), the event associated with \(v_k\) under h must be \((E_{j+1}, t_{j+1})\) or some event in \(\mathbf {D}_j\) before it. Hence, the property continues to hold on \(\mathbf {D}_{j+1}\) too. \(\square \)

Problems in handling non-chain episodes

The first point we want to make here is that non-chain episodes suffer from ambiguity of representation. For example, the 4-node episode \(((A\rightarrow C)(A\rightarrow B))\) is not a chain episode. This episode has two representations in spite of constraining the g-map so that \((g_\alpha (v_1),\ldots , g_\alpha (v_N))\) is ordered as per the lexicographic ordering on \(\mathcal {E}\), as shown in Fig. 17a, d. One can verify that \(\alpha \) (Fig. 17a) and \(\alpha '\) (Fig. 17d) share the same set of occurrences on any event sequence. This ambiguity creeps in mainly because the nodes which map to the event-type A are unrelated under \(<_{\alpha }\). The ambiguity is also reflected in the equivalent array-of-event-types and adjacency matrix notation. As discussed earlier in Sect. 7, Tatti and Cule [26] consider discovery algorithms that output the most general episodes, which include \(((A \rightarrow C)(A \rightarrow B))\); they too recognize this issue of inherent representational ambiguity for general episodes. The algorithm in [26] does not resolve this ambiguity for the most general episodes. It tackles it by comparing every currently generated (instance-)closed episode during the DFS traversal of the space of all episodes with the remaining currently discovered set of closed episodes. The comparison actually tests for a subepisode relationship, whose computation can be very involved for non-chain episodes in general; in fact, it is shown to be NP-hard in general.

Fig. 17: Illustrates the multiple representation problem of the non-chain episode \(((A\rightarrow B)(A\rightarrow C))\)

There are also difficulties in counting occurrences of non-chain episodes. Consider the above non-chain episode \(\alpha \) (Fig. 17a). To track an occurrence of this episode, we would initially wait for two As, and on seeing an A we would need to consider accepting it for both \(v_1\) and \(v_2\). This means that on seeing A there is more than one possible next state as per Definition 12. Generalizing this, one can show that the construction of an FSA for tracking occurrences of a non-chain episode \(\alpha \) as per Definition 12 always leads to a non-deterministic finite state automaton (NFA). To track occurrences of such an \(\alpha \), one would first need to convert this NFA into an equivalent DFA, and in the process of this conversion the number of states grows. In fact, it is shown in [26] that checking whether an event sequence contains an occurrence of an episode is an NP-complete problem. Thus, for non-chain episodes, counting occurrences is not straightforward either, in addition to the problem of ambiguous representation.
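To make the nondeterminism concrete, the toy C++ sketch below tracks this episode by carrying a set of candidate automaton states, which is effectively the NFA-to-DFA subset construction done on the fly. The encoding is entirely ours and heavily simplified (a single hard-coded episode and no timing information).

```cpp
#include <set>
#include <string>

// Toy model of the 4-node non-chain episode ((A->C)(A->B)) with nodes
// g = (v1:A, v2:A, v3:B, v4:C) and edges v1 -> v4 (A -> C) and
// v2 -> v3 (A -> B); node indices are 0-based below.
static const std::string kType[4] = {"A", "A", "B", "C"};
static const int kParent[4] = {-1, -1, 1, 0};  // v3 needs v2; v4 needs v1

using State = std::set<int>;       // accepted node indices of one automaton
using StateSet = std::set<State>;  // all states reachable so far

// One nondeterministic step over the next event in the sequence.
StateSet step(const StateSet& current, const std::string& event) {
    StateSet next;
    for (const State& s : current) {
        next.insert(s);  // occurrences allow intervening events: may stay put
        for (int v = 0; v < 4; ++v) {
            if (s.count(v) || kType[v] != event) continue;
            if (kParent[v] >= 0 && !s.count(kParent[v])) continue;
            State t = s;
            t.insert(v);     // more than one admissible v per event:
            next.insert(t);  // hence the automaton is an NFA
        }
    }
    return next;
}
```

Starting from the state set {{}}, a single A already produces the two successor states \(\{v_1\}\) and \(\{v_2\}\), and the set of states can keep growing with further events, which is precisely the blow-up discussed above.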

Given these issues, extending apriori-based discovery algorithms to the class of all episodes appears non-trivial.


Cite this article

Achar, A., Sastry, P.S. Discovering frequent chain episodes. Knowl Inf Syst 60, 447–494 (2019). https://doi.org/10.1007/s10115-019-01349-y
