Abstract
We study the problem of finding the longest common sub-pattern (LCSP) shared by two sequences of temporal intervals. In particular, we are interested in finding the LCSP of the corresponding arrangements. Arrangements of temporal intervals are a powerful way to encode multiple concurrent labeled events that have a time duration. Discovering commonalities among such arrangements is useful in a wide range of scientific fields and applications, as can be seen from the number and diversity of the datasets we use in our experiments. In this paper, we define the LCSP problem and prove that it is NP-complete by demonstrating a connection between graphs and arrangements of temporal intervals. This connection leads to a series of interesting open problems. In addition, we provide an exact algorithm that solves the LCSP problem, and we propose and experiment with three under-approximation techniques that run in polynomial time and space. We also introduce two upper bounds for LCSP and study their suitability for speeding up 1-NN search. Experiments are performed on seven datasets taken from a wide range of real application domains, plus two synthetic datasets. Lastly, we describe several application cases that demonstrate the need for and suitability of LCSP.

Additional information
Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.
Appendix: Proof of the properties described in Sect. 4.3.2
1. Suppose that \(LCS(i,j)\) were composed of more than one interval. Then there must exist a pair of intervals with the same label in \(\{{S_A}_1,\ldots ,{S_A}_{i-1}\}\) and \(\{{S_B}_1,\ldots ,{S_B}_{j-1}\}\). This is a contradiction, since it would imply that not all previous sub-problems yield \(\emptyset \) as their solution.
2. By applying the operation \(LCS(p,q)\otimes (i,j)\) or, equivalently, by selecting from \(LCS(p,q)\) only the intervals that induce similar relations to the interval corresponding to \(i\) and \(j\), we ensure that the interval corresponding to \(i\) and \(j\) has the same relations to the previous intervals in the produced arrangement. Conversely, the existing intervals have the same relations to the interval corresponding to \(i\) and \(j\). Additionally, pairs of existing intervals of \(LCS(p,q)\) have identical relations with their counterparts in \(\mathcal {A}\) and \(\mathcal {B}\); this was verified when each interval was added to the solution of the previous sub-problems.
3. In other words, the \(\otimes \) operator does not discard extra intervals. Suppose that the maximal CSPs are correctly retrieved for all previous sub-problems \(LCS(p,q)\), but not for \(LCS(i,j)\). This would imply that an interval belonging to a maximal CSP of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) (where \(A_i\) is matched to \(B_j\)) exists but was not selected for \(LCS(i,j)\). However, since the unselected interval belongs to a maximal CSP, it has the same relation to \({S_A}_i\) and \({S_B}_j\). Since the relations are the same, the interval would have been selected for \(LCS(i,j)\), which contradicts the assumption. Thus, the algorithm at point \((i,j)\) returns the maximal CSPs of \(\{ {{E_S}_A}_1, \ldots , {{E_S}_A}_i\}\) and \(\{ {{E_S}_B}_1, \ldots , {{E_S}_B}_j\}\) that match \({{E_S}_A}_i\) to \({{E_S}_B}_j\).
4. Suppose there exists a maximal CSP that matches \(A_i\) to \(B_j\) but was not discovered. This would imply that, by removing the interval corresponding to \(A_i\) and \(B_j\), one is left with a common sub-pattern \(s\). Then either \(s\subseteq r\) for some \(r\in \mathcal {M}_{i-1,j-1}\), or not. In the first case, \(s\) must have been retrieved when performing \(r\otimes (i,j)\), so this case cannot occur. In the second case, \(s\) must itself be maximal, but then it must hold that \(s\in \mathcal {M}_{i-1,j-1}\), which is a contradiction.
Alternatively, in the Cartesian graph \(G_{AB}\) (see the proof of Theorem 2 for the exact definition), this corresponds to finding all maximal cliques containing the vertex \(u\) labeled \((i,j)\) by checking all previously found maximal cliques and, for each one, returning its intersection with the neighbors of \(u\).
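The clique view of this step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `maximal_cliques_containing`, the adjacency dictionary `neighbors`, and the use of plain hashable vertex identifiers are all assumptions made for illustration, and the graph is treated as a generic undirected graph rather than the specific \(G_{AB}\) of Theorem 2. Given every maximal clique of the graph before \(u\) is added, it intersects each one with the neighbors of \(u\), adds \(u\), and keeps only the candidates that are not contained in another candidate.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Vertex = Hashable


def maximal_cliques_containing(
    u: Vertex,
    neighbors: Dict[Vertex, Set[Vertex]],
    previous_cliques: Iterable[FrozenSet[Vertex]],
) -> List[FrozenSet[Vertex]]:
    """Maximal cliques that contain u, derived from the maximal cliques
    of the graph before u was added (hypothetical helper)."""
    # Intersect each previously found maximal clique with N(u), then add u.
    candidates = {
        frozenset(clique & neighbors[u]) | {u} for clique in previous_cliques
    }
    # If there are no previous cliques, u alone is the only clique.
    candidates.add(frozenset({u}))
    # Discard every candidate strictly contained in another candidate.
    return [c for c in candidates if not any(c < other for other in candidates)]


# Toy example: path 1-2-3 has maximal cliques {1,2} and {2,3};
# the new vertex 4 is adjacent to 1 and 2.
previous = [frozenset({1, 2}), frozenset({2, 3})]
print(maximal_cliques_containing(4, {4: {1, 2}}, previous))
```

In this sketch the intersection with the neighbors of \(u\) plays the role that \(r\otimes (i,j)\) plays in the argument above, and the final filtering step removes candidates that are not maximal.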
Cite this article
Kostakis, O., Papapetrou, P. Finding the longest common sub-pattern in sequences of temporal intervals. Data Min Knowl Disc 29, 1178–1210 (2015). https://doi.org/10.1007/s10618-015-0404-3
DOI: https://doi.org/10.1007/s10618-015-0404-3