Abstract
In many daily applications, such as meteorology or patient records, the starting and ending times of events are stored in a database, resulting in time interval data. Mining such data can reveal informative patterns in which the time intervals are related by temporal relations, such as before or overlaps. When multiple temporal variables are sampled in a variety of forms and frequencies, along with irregular events that may or may not have a duration, time interval patterns can be a powerful means of discovering temporal knowledge, since all of these temporal variables can be transformed into a uniform format of time intervals. Predicting the completion of such patterns is useful when a pattern ends with an event of interest, such as the recovery of a patient, or an undesirable event, such as a medical complication. In recent years, an increasing number of studies have been published on time intervals-related patterns (TIRPs), their discovery, and their use as features for classification. However, as far as we know, no study has investigated the prediction of a TIRP’s completion. The main challenge in performing such a completion prediction arises when the time intervals coincide and have not yet finished, which introduces uncertainty into the evolving temporal relations, and thus into the TIRP’s evolution process. To overcome this challenge, we propose a new structure to represent the TIRP’s evolution process and to calculate the TIRP’s completion probabilities over time. We introduce two continuous prediction models (CPMs): the segmented continuous prediction model (SCPM) and the fully continuous prediction model (FCPM), which estimate the TIRP’s completion probability. With the SCPM, the TIRP’s completion probability changes only at a starting or ending point of one of the TIRP’s time intervals. The FCPM incorporates, in addition, the elapsed time between the TIRP’s time intervals’ starting and ending time points.
A rigorous evaluation was performed on four real-life medical and non-medical datasets. The FCPM outperformed the SCPM and the baseline models (random forest, artificial neural network, and recurrent neural network) on all datasets. However, there is a trade-off between prediction performance and earliness: as more of the TIRP’s time intervals’ starting and ending time points are revealed over time, the CPMs’ prediction performance increases.
Notes
The formula \((k^2-k)/2\) follows from the binomial coefficient \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) =k(k-1)/2=(k^2-k)/2\), which counts the pairs of STIs between which temporal relations hold, where k is the number of STIs.
Note that in the formulas we use probabilities of continuous variables. This issue will be discussed in detail below; for now, this can be regarded as shorthand notation for the probability of a duration falling in a narrow interval around the current value.
References
Chang L, Wang T, Yang D, Luan H (2008) Seqstream: mining closed sequential patterns over stream sliding windows. In: 2008 Eighth IEEE international conference on data mining, pp 83–92. IEEE
Höppner F (2001) Learning temporal rules from state sequences. In: IJCAI workshop on learning from temporal and spatial data, vol 25. Citeseer
Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2009) Mining frequent arrangements of temporal intervals. Knowl Inf Syst 21(2):133
Mordvanyuk N, López B, Bifet A (2021) Verttirp: robust and efficient vertical frequent time interval-related pattern mining. Expert Syst Appl 168:114276
Harel O, Moskovitch R (2021) Complete closed time intervals-related patterns mining. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 4098–4105
Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Disc 15(2):217–247
Lu EH-C, Tseng VS, Philip SY (2010) Mining cluster-based temporal mobile sequential patterns in location-based service environments. IEEE Trans Knowl Data Eng 23(6):914–927
Li K, Fu Y (2014) Prediction of human activity by discovering temporal sequence patterns. IEEE Trans Pattern Anal Mach Intell 36(8):1644–1657
Moskovitch R, Shahar Y (2015) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Disc 29(4):871–913
Patel D, Hsu W, Lee ML (2008) Mining relationships among interval-based events for classification. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, pp 393–404
Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4):1–22
Itzhak N, Nagori A, Lior E, Schvetz M, Lodha R, Sethi T, Moskovitch R (2020) Acute hypertensive episodes prediction. In: International conference on artificial intelligence in medicine. Springer, pp 392–402
Novitski P, Cohen CM, Karasik A, Shalev V, Hodik G, Moskovitch R (2020) All-cause mortality prediction in t2d patients. In: International conference on artificial intelligence in medicine. Springer, pp 3–13
Teinemaa I, Dumas M, Leontjeva A, Maggi FM (2018) Temporal stability in predictive process monitoring. Data Min Knowl Disc 32(5):1306–1338
Teinemaa I, Dumas M, Rosa ML, Maggi FM (2019) Outcome-oriented predictive process monitoring: review and benchmark. ACM Trans Knowl Discov Data 13(2):1–57
Di Francescomarino C, Ghidini C, Maggi FM, Milani F (2018) Predictive process monitoring methods: Which one suits me best? In: International conference on business process management. Springer, pp 462–479
Henry KE, Hager DN, Pronovost PJ, Saria S (2015) A targeted real-time early warning score (trewscore) for septic shock. Sci Transl Med 7(299):299ra122
Schvetz M, Fuchs L, Novack V, Moskovitch R (2021) Outcomes prediction in longitudinal data: study designs evaluation, use case in icu acquired sepsis. J Biomed Inform 117:103734
Sheetrit E, Nissim N, Klimov D, Shahar Y (2019) Temporal probabilistic profiles for sepsis prediction in the icu. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 2961–2969
Ryoo MS (2011) Human activity prediction: early recognition of ongoing activities from streaming videos. In: 2011 international conference on computer vision. IEEE, pp 1036–1043
Liu L, Wang S, Su G, Hu B, Peng Y, Xiong Q, Wen J (2017) A framework of mining semantic-based probabilistic event relations for complex activity recognition. Inf Sci 418:13–33
Zhu G, Cao J, Li C, Wu Z (2017) A recommendation engine for travel products based on topic sequential patterns. Multimed Tools Appl 76(16):17595–17612
da Silva Junior LLN, Kohwalter TC, Plastino A, Murta LGP (2021) Sequential coding patterns: how to use them effectively in code recommendation. Inf Softw Technol 140:106690
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X et al (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet 395(10223):497–506
Itzhak N, Tal S, Cohen H, Daniel O, Kopylov R, Moskovitch R (2022) Classification of univariate time series via temporal abstraction and deep learning. In: 2022 IEEE international conference on big data (big data). IEEE, pp 1260–1265
Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843
Moskovitch R, Shahar Y (2015) Fast time intervals mining using the transitivity of temporal relations. Knowl Inf Syst 42(1):21–48
Kujala R, Weckström C, Darst RK, Mladenović MN, Saramäki J (2018) A collection of public transport network data sets for 25 cities. Sci Data 5(1):1–14
Johnson AE, Pollard TJ, Shen L, Li-wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG (2016) Mimic-iii, a freely accessible critical care database. Sci Data 3:160035
Mirsky Y, Shabtai A, Rokach L, Shapira B, Elovici Y (2016) Sherlock vs moriarty: a smartphone dataset for cybersecurity research. In: Proceedings of the 2016 ACM workshop on artificial intelligence and security, pp 1–12
Höppner F (2002) Time series abstraction methods-a survey. In: GI Jahrestagung, pp 777–786
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM, pp 2–11
Bonomi L, Jiang X (2018) Pattern similarity in time interval sequences. In: 2018 IEEE international conference on healthcare informatics (ICHI). IEEE, pp 434–435
Ho N, Pedersen TB, Vu M, et al (2021) Efficient and distributed temporal pattern mining. In: 2021 IEEE international conference on big data (big data). IEEE, pp 335–343
Lee Z, Lindgren T, Papapetrou P (2020) Z-miner: an efficient method for mining frequent arrangements of event intervals. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining, pp 524–534
Zheng W, Hu J (2022) Multivariate time series prediction based on temporal change information learning method. IEEE Trans Neural Netw Learn Syst
Zheng W, Zhao P, Chen G, Zhou H, Tian Y (2022) A hybrid spiking neurons embedded lstm network for multivariate time series learning under concept-drift environment. IEEE Trans Knowl Data Eng
Ramírez-Gallego S, García S, Mouriño-Talín H, Martínez-Rego D, Bolón-Canedo V, Alonso-Betanzos A, Benítez JM, Herrera F (2016) Data discretization: taxonomy and big data challenge. Wiley Interdiscip Rev Data Min Knowl Discov 6(1):5–21
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Disc 15(2):107–144
Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary lp norms. In: VLDB, pp 385–394
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286
Freksa C (1992) Temporal reasoning based on semi-intervals. Artif Intell 54(1–2):199–227
Harris ZS (1954) Distributional structure. Word 10(2–3):146–162
Moskovitch R, Choi H, Hripcsak G, Tatonetti NP (2016) Prognosis of clinical outcomes with temporal patterns and experiences with one class feature selection. IEEE/ACM Trans Comput Biol Bioinf 14(3):555–563
Dvir O, Wolfson P, Lovat L, Moskovitch R (2020) Falls prediction in care homes using mobile app data collection. In: International conference on artificial intelligence in medicine. Springer, pp. 403–413
Moskovitch R, Walsh C, Wang F, Hripcsak G, Tatonetti N (2015) Outcomes prediction via time intervals related patterns. In: 2015 IEEE international conference on data mining. IEEE, pp 919–924
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J et al (2020) Scipy 1.0: fundamental algorithms for scientific computing in python. Nat Methods 17(3):261–272
Freedman D, Diaconis P (1981) On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57(4):453–476
Kalbfleisch JD, Prentice RL (2011) The statistical analysis of failure time data, vol 360. Wiley, New York
Verduijn M, Sacchi L, Peek N, Bellazzi R, de Jonge E, de Mol BA (2007) Temporal abstraction for feature extraction: a comparative case study in prediction from intensive care monitoring data. Artif Intell Med 41(1):1–12
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115–133
Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd international conference on machine learning, pp 233–240
Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML, pp 807–814
Acknowledgements
Nevo Itzhak was funded by the Kreitman School of Advanced Graduate Studies and the Israeli Ministry of Science and Technology Jabotinsky scholarship grant #3-16643.
Author information
Contributions
N.I., S.J., and R.M. designed research. N.I. and R.M. performed research. N.I. analyzed data. N.I., S.J., and R.M. wrote the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Efficient TIRP’s candidates generator
We introduce an efficient TIRP candidates generator (Algorithm 3), which exploits the transitivity property to generate all possible TIRPs that can evolve from a given TIRP-prefix with multiple unfinished STIs. The algorithm works by iterating over all possible temporal relations between each pair of adjacent unfinished STIs, according to the lexicographical order, and inferring the temporal relations between the remaining unfinished STI pairs using Allen’s transition table [26]. The relevant temporal relations that can eventually evolve given unfinished STIs \(A^*\), \(B^*\), and \(C^*\) are presented in Table 1. The temporal relation that eventually evolves between STIs \(A\) and \(B\), r(\(A\),\(B\)), together with the temporal relation between STIs \(B\) and \(C\), r(\(B\),\(C\)), can be used to infer the possible temporal relations between STIs \(A\) and \(C\), r(\(A\),\(C\)).
Algorithm 3 takes a TIRP-prefix as input and stores the set of possible TIRPs that can evolve from it in the variable fnlTIRPCnddts. The given TIRP-prefix might include both finished STIs (\(\hat{IS}_f\)) and unfinished STIs (\(\hat{IS}_*\)). Only the temporal relations among pairs of unfinished STIs are unknown and need to be determined. First, fnlTIRPCnddts is initialized to the empty set (line 1), and the variable unfSTIsLen is set to the number of unfinished STIs in the TIRP-prefix (line 2). Then, the function initGenAdjacentUnfSTIs enumerates all possible temporal relations between all pairs of adjacent (according to the lexicographical order) unfinished STIs of the TIRP-prefix. Thus, for \(\check{k}\) unfinished STIs, the function outputs \(3^{\check{k}-1}\) possible candidates, stored in the variable initTIRPCanddts (line 3). In lines 4–6, the algorithm iterates over initTIRPCanddts. Each candidate c is expanded using the function expandTIRPCand (Algorithm 4), which returns all TIRP candidates derived from c using Allen’s transition table (line 5). These are added to fnlTIRPCnddts, which is finally returned by the algorithm.
![figure f](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10115-023-01910-w/MediaObjects/10115_2023_1910_Figf_HTML.png)
Efficient TIRP’s Candidates Generator
A TIRP-prefix including unfinished STIs \(A^*\), \(B^*\), \(C^*\), and \(D^*\), where \(A^*{\texttt {+}}< B^*{\texttt {+}}< C^*{\texttt {+}} < D^*{\texttt {+}}\) and all possible TIRPs that it can evolve into, presented in two ways: (i) TIRP-prefix schematic and (ii) TIRP-prefix half matrix representation
The function expandTIRPCand takes a parameter adjJump, which specifies the “adjacency jump,” i.e., the distance between the analyzed adjacent unfinished STIs. For example, in Fig. 17, given the temporal relations between the adjacent unfinished STIs \(A\) overlaps \(B\), \(B\) overlaps \(C\), and \(C\) overlaps D, when considering the second-adjacent unfinished STIs (adjJump=2), r(A, C) must be overlaps based on r(A, B) and r(B, C). Similarly, r(B, D) must be overlaps based on r(B, C) and r(C, D). Then, the recursive procedure proceeds by inferring all the temporal relations among c’s adjJump+1 adjacent unfinished STIs. For example, when considering the third-adjacent unfinished STIs in Fig. 17, r(A, D) can be inferred from r(A, B) and r(B, D).
Algorithm 4 presents the recursive expandTIRPCand function. The function takes as arguments the current TIRP candidate to expand (c), the number of the TIRP-prefix’s unfinished STIs (unfSTIsLen), the adjacency jump between unfinished STIs (adjJump), and an index i of the earliest unfinished STI to consider. The function returns the set of c’s expanded TIRP candidates. In lines 2–3, the function getRel is called, which returns the temporal relation between two given unfinished STIs. Note that the temporal relation between the i-th and \(i+1\)-th unfinished STIs (stored in fstRel) is known, since it was determined during the enumeration step of Algorithm 3. Likewise, the temporal relation between the \(i+1\)-th and \(i+\)adjJump-th unfinished STIs (stored in scdRel) is known, since it was determined earlier in the recursive procedure. Based on fstRel and scdRel, the possible temporal relations between the i-th and \(i+\)adjJump-th unfinished STIs are inferred using Allen’s transition table and stored in inferredRels (line 4).
![figure g](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10115-023-01910-w/MediaObjects/10115_2023_1910_Figg_HTML.png)
Expand TIRP’s Candidates (expandTIRPCand function)
Then, the algorithm iterates over inferredRels (lines 5–12), with the current relation stored in the variable rel, and in each iteration executes the following steps: (a) updates c’s temporal relation between the i-th and i+adjJump-th unfinished STIs, storing the result in extC (line 6); (b) if there are more relations between adjJump-adjacent STIs to infer, calls itself recursively with extC and i+1, adding the result to fnlCnds (lines 7–8); (c) otherwise, if there are more relations between adjJump+1-adjacent STIs to infer, calls itself recursively with extC and adjJump+1, adding the result to fnlCnds (lines 9–10); (d) otherwise, adds extC to fnlCnds (lines 11–12). Lastly, fnlCnds is returned.
Given a TIRP-prefix with an STI series IS of size k, in which there are \(\check{k}\) unfinished STIs, the overall time complexity of generating the TIRP candidates using Algorithms 3 and 4 is bounded in the worst case by \(O(3^{(\check{k}^2-\check{k})/2})\). The base of the exponent (i.e., three) is the number of possible temporal relations between each pair of unfinished STIs. The formula \((\check{k}^2-\check{k})/2\) follows from the binomial coefficient \(\left( {\begin{array}{c}\check{k}\\ 2\end{array}}\right) =\check{k}(\check{k}-1)/2=(\check{k}^2-\check{k})/2\), which counts the pairs of unfinished STIs, where \(\check{k}\) is their number. However, this worst-case upper bound is far from the practical behavior of the algorithms: by exploiting the transitivity of temporal relations, the overall number of generated candidates is smaller than with naive generation, which does not exploit the transitivity property.
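The enumerate-then-infer structure of Algorithms 3 and 4 can be illustrated with a minimal sketch. This is our own simplified toy, not the paper’s implementation: it assumes three possible eventual relations between a pair of unfinished STIs and uses a mostly unconstrained stand-in transition table (a real implementation would use Allen’s transition table [26]); the names `generate_candidates` and `expand` are ours.

```python
from itertools import product

# Assumed: three possible eventual relations between a pair of unfinished STIs.
RELATIONS = ("overlaps", "contains", "finished-by")

# Toy transition table mapping (r(A,B), r(B,C)) -> possible r(A,C).
# Only one entry is constrained here, to show how transitivity prunes candidates.
TRANSITION = {pair: set(RELATIONS) for pair in product(RELATIONS, repeat=2)}
TRANSITION[("overlaps", "overlaps")] = {"overlaps"}

def expand(cand, k, jump):
    """Infer every relation at index distance `jump` from shorter-distance ones."""
    results = [cand]
    for i in range(k - jump):
        nxt = []
        for c in results:
            first = c[(i, i + 1)]          # known adjacent relation
            second = c[(i + 1, i + jump)]  # known: inferred at distance jump - 1
            for rel in TRANSITION[(first, second)]:
                ext = dict(c)
                ext[(i, i + jump)] = rel
                nxt.append(ext)
        results = nxt
    return results

def generate_candidates(k):
    """Enumerate adjacent relations, then close them under transitivity."""
    out = []
    for adjacent in product(RELATIONS, repeat=k - 1):
        stack = [{(i, i + 1): adjacent[i] for i in range(k - 1)}]
        for jump in range(2, k):
            stack = [e for c in stack for e in expand(c, k, jump)]
        out.extend(stack)
    return out
```

For k = 3 unfinished STIs, the worst-case bound is \(3^{(3^2-3)/2}=27\) candidates; with the single constrained transition entry above, this toy run already prunes the output to 25.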
Appendix B Cutoffs used for the knowledge-based abstraction
The knowledge-based abstraction was performed using cutoffs defined by a domain expert, in which the states for each dataset were as follows:
- CSP dataset:
  - MAP [mmHg]: \(\le \)60, (60,90], >90
  - CVP [mmHg]: \(\le \)5, (5,17], >17
  - FiO2 [%]: \(\le \)41, (41,60], >60
  - HR [bpm]: \(\le \)60, (60,110], >110
  - PEEP [cmH2O]: \(\le \)10, >10
  - TMP [\(^\circ \)C]: \(\le \)35, (35,38.5], >38.5
  - BE [mEq/L]: \(\le \)-6, (-6,-3], >-3
  - CI [\(L/min/m^2\)]: \(\le \)2.5, >2.5
  - Glucose [mmol/L]: \(\le \)2.5, (2.5,10], >10
  - CKMB [%]: \(\le \)25, (25,50], >50
- AHE dataset:
  - HR [bpm]: \(\le \)50, (50,100], >100
  - RESP [breath/min]: \(\le \)7, (7,20], >20
  - SpO2 [%]: \(\le \)88, (88,100], >100
  - SABP [mmHg]: \(\le \)90, (90,140], >140
- DBT dataset:
  - Blood Glucose [mg/dL]: \(\le \)100, (100,125], (126,200], >200
  - HbA1C [%]: \(\le \)7, (7,9], (9,10.5], >10.5
  - LDL Cholesterol [mg/dL]: \(\le \)100, (100,130], (130,160], >160
  - Creatinine [mg/dL]: \(\le \)1, (1,1.5], (1.5,2.5], (2.5,4], >4
  - Albumin [g/dL]: \(\le \)3.5, >3.5
The cutoffs for the EFIF dataset were not applicable.
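A cutoff list like the above can be applied with a simple threshold lookup; a minimal sketch (function name and 0-based state indexing are ours), shown with the MAP cutoffs, where values \(\le \)60, in (60,90], and >90 map to states 0, 1, and 2:

```python
import bisect

def kb_state(value, cutoffs):
    """Map a raw value to a state index: value <= c[0] -> 0,
    (c[0], c[1]] -> 1, ..., value > c[-1] -> len(c)."""
    # bisect_left gives <= semantics at each cutoff boundary
    return bisect.bisect_left(cutoffs, value)

MAP_CUTOFFS = [60, 90]  # mmHg, from the CSP list above
```

For example, a MAP of exactly 60 mmHg falls in the first state, while 90 mmHg still belongs to the middle state, matching the half-open intervals in the list.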
Appendix C Imbalance ratio
The TIRPs’ instances imbalance ratio was defined as the number of instances of a complete TIRP divided by the total number of instances. Figure 18 presents the TIRPs’ instances imbalance ratio and the prediction performance when using the early warning strategies for each imbalance ratio and value of \(\tau \). As can be seen in Fig. 18, the imbalance ratio was lower than 0.3 for most of the TIRPs. Overall, there was a positive correlation between the AUPRC performance and the imbalance ratio for all datasets. In contrast, there was no significant correlation between the imbalance ratio and the AUROC.
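The ratio defined above is straightforward to compute; a minimal sketch (function and argument names are ours):

```python
def imbalance_ratio(n_complete_instances, n_total_instances):
    """Number of instances of a complete TIRP divided by the
    total number of its detected instances."""
    return n_complete_instances / n_total_instances
```

For example, a TIRP with 30 completed instances out of 100 detected instances has an imbalance ratio of 0.3, the threshold below which most TIRPs fell in Fig. 18.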
Appendix D Parameters of baseline models
The parameters of each model were selected after testing the performance of each combination (rather than a greedy search); here we describe the parameters that performed best.
The random forest (RF) [51] classifier utilizes ensemble learning, a technique that combines the decisions of multiple models. We used 100 trees in the forest and a maximum tree depth of 3 for the RF model; combinations of 30, 50, 100, and 200 trees and depths of 2, 3, 5, and 10 were tested. Bootstrap sampling was used when building trees, and out-of-bag samples were used to estimate the generalization accuracy. RF was implemented with Python 3.6 Scikit-Learn (https://scikit-learn.org) version 0.22.1.
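As a rough sketch, the reported best configuration corresponds to the following scikit-learn setup (the toy data and variable names are ours, not the paper’s; the paper used scikit-learn 0.22.1, so minor API details may differ from current versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the TIRP-prefix feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,  # 100 trees performed best among 30, 50, 100, and 200
    max_depth=3,       # depth 3 performed best among 2, 3, 5, and 10
    bootstrap=True,    # bootstrap sampling when building trees
    oob_score=True,    # out-of-bag estimate of generalization accuracy
    random_state=0,
)
rf.fit(X, y)
```

After fitting, `rf.oob_score_` provides the out-of-bag accuracy estimate mentioned above.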
The artificial neural network (ANN) [52] comprises layers of neurons: an input layer, hidden layers, and an output layer. In this paper, a fully connected ANN architecture was used, in which every neuron in one layer is connected to every neuron in the next layer. We used an ANN with two hidden layers of 50 neurons each and the ReLU activation function [56]. We trained all models using a maximum of 20 epochs, a batch size of 16, and a learning rate of 0.001 with gradual decrease. A validation set, created by randomly taking 20 percent of all training data, was used to measure the generalization error. We used early stopping on the validation set, in which a change of less than 0.001 was not considered an improvement in the loss. For the ANN, combinations of one, two, three, or five hidden layers with 20, 50, or 100 neurons and batch sizes of 16, 32, or 64 were tested. The ANN was implemented with Python 3.6 Scikit-Learn (https://scikit-learn.org) version 0.22.1.
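A comparable sketch of this ANN configuration with scikit-learn’s `MLPClassifier` (our approximation: the gradually decreasing learning rate is left to the solver’s defaults here, and the toy data and names are ours):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the TIRP-prefix feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

ann = MLPClassifier(
    hidden_layer_sizes=(50, 50),  # two hidden layers, 50 neurons each
    activation="relu",
    batch_size=16,
    learning_rate_init=0.001,
    max_iter=20,                  # maximum of 20 epochs
    early_stopping=True,          # hold out a validation set for stopping
    validation_fraction=0.2,      # 20% of the training data
    tol=1e-3,                     # <0.001 change not counted as improvement
    random_state=0,
)
ann.fit(X, y)
```

With `early_stopping=True`, scikit-learn carves the 20% validation split out of the training data and stops when the validation score stops improving by `tol`, mirroring the procedure described above.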
The sequential deep learning recurrent neural network (RNN) [53] is designed to learn data dependencies in a sequence. We used an RNN with five hidden units per layer, a recurrent dropout of 0.2, and the ReLU activation function [56]. We trained all models using a maximum of 20 epochs, a batch size of 16, and a learning rate of 0.001 with gradual decrease. A validation set, created by randomly taking 20 percent of all training data, was used to measure the generalization error. We used early stopping on the validation set, in which a change of less than 0.001 over more than two epochs was not considered an improvement in the loss. For the RNN, combinations of one, two, three, or five hidden units per layer, dropout of 0.2, 0.3, or 0.5, and batch sizes of 16, 32, or 64 were tested. The RNN was implemented with Keras (https://keras.io) version 2.2.5.
For parameters not specified here, we used the default values.
Appendix E Preliminary analysis
1.1 E.1 TIRP’s completion prediction relative to the end
The goal of this analysis was to evaluate the prediction of a TIRP’s completion at the instances’ different portions over time using the CPMs (Sect. 5.2.1). The TIRP-prefixes’ instances were evaluated at each time stamp until the end of the entity’s (e.g., patient’s) data, in which the decisions for the TIRP’s completion were determined at the instances’ different revealed time portions. Then, the metrics (Sect. 5.3) were computed over all instances for each revealed time portion. The setup for this analysis was the same as described in Sect. 5.2.1. This preliminary analysis was excluded from the Evaluation and Results (Sects. 5–6) since it examines the TIRP’s completion prediction in retrospect: the instances’ decision points over time were selected relative to the end of the entity’s data.
The CSP, AHE, and DBT datasets were abstracted with eleven combinations of temporal abstraction methods (KB, GRAD, EWD, EFD, and SAX) and numbers of symbols (2, 3, 4, and a varied number for KB), in which the TIRPs were discovered for each combination. The EFIF dataset was not abstracted using KB, resulting in ten combinations of temporal abstraction methods (GRAD, EWD, EFD, and SAX) and numbers of symbols (2, 3, and 4). All five continuous prediction models (SCPM, FCPM, RF, ANN, and RNN) were evaluated on the TIRP-prefixes’ detected instances.
1.2 E.2 Analysis results
The results are based on 2,392 10-fold cross-validation runs on 1,094 TIRPs for the CSP dataset, 158 TIRPs for the AHE dataset, 766 TIRPs for the DBT dataset, and 374 TIRPs for the EFIF dataset.
We first evaluated the overall performance of the five continuous prediction models (SCPM, FCPM, RF, ANN, and RNN) in predicting a TIRP’s completion at the instances’ revealed portions of time. Figure 19 presents the mean AUROC and AUPRC results over the instances’ revealed portions of time, in which each point on the graph represents the mean performance results for the different TIRPs.
As expected, Fig. 19 shows that as time goes by for each instance, the continuous prediction models provide more accurate predictions, resulting in better AUROC and AUPRC performance over time. In all datasets, the FCPM performed best and the SCPM worst in terms of AUROC. In terms of AUPRC, however, the SCPM performed worst at making relatively early decisions regarding the instances’ TIRP completion and improved drastically as time went by. As a result, at relatively late decisions the SCPM achieved better AUPRC than the other baselines on all datasets. Also, for the DBT and EFIF datasets, even though the FCPM achieved better AUPRC than the other baseline models at relatively early decisions, the SCPM performed better at relatively late decisions. Overall, the FCPM performed better than the baseline models in terms of both AUROC and AUPRC, and since the SCPM’s AUPRC improved drastically over time, the AUPRC performances of the FCPM and SCPM were close at relatively late decisions regarding a TIRP’s completion.
The remaining results in this analysis involve only the FCPM, given its demonstrated superiority.
Next, we evaluated the TIRP’s completion at the instances’ revealed portions of time by the FCPM for each temporal abstraction method (KB, GRAD, EWD, EFD, and SAX). Figure 20 presents the number of TIRPs per temporal abstraction method and the mean AUROC and AUPRC results over the instances’ revealed portions of time. Each point on the chart represents the mean performance results of the FCPM in providing the completion predictions for the different TIRPs.
Figure 20 shows that abstracting the CSP, DBT, and EFIF datasets with EWD resulted in more TIRPs ending with the target event than the other temporal abstraction methods, whereas for the AHE dataset, EFD resulted in more such TIRPs. Across all datasets, the CPMs provided less accurate predictions for TIRPs discovered using GRAD. The prediction performances of EFD and SAX were quite similar, with SAX performing slightly better. Looking at the overall mean results, the FCPM provided more accurate predictions for TIRPs discovered using KB and EWD, although EWD performed much better than KB for the DBT dataset.
We also evaluated the TIRP’s completion at the instances’ revealed portions of time provided by the FCPM for each number of symbols (two, three, four, and a varied number of symbols for KB). Figure 21 presents the number of TIRPs per number of symbols and the mean AUROC and AUPRC results over the instances’ revealed portions of time. Each point on the graph represents the mean performance of the FCPM at an instance’s revealed portion of time, averaging the results of the different TIRPs.
Figure 21 shows that the CPMs provided more accurate predictions over time, in terms of AUROC, for TIRPs discovered using two symbols per variable or KB. In terms of AUPRC, two symbols and KB performed better for the CSP dataset. For the AHE, DBT, and EFIF datasets, four symbols per variable performed better than the other numbers of symbols at relatively early decisions, but as time went by, two symbols and KB closed the gap and performed better at relatively late decisions. The poor performance of three symbols per variable is related to the poor performance of the GRAD abstraction, which was tested only with three symbols. Looking at the overall mean results, the CPMs provided more accurate predictions over time for TIRPs discovered using two symbols per variable or KB than for three or four symbols per variable.
Lastly, we evaluated the performance of the different numbers of symbols (two, three, four, and a varied number of symbols for KB) only on TIRPs discovered using EWD and KB. Figure 22 presents the number of TIRPs per temporal abstraction method and number of symbols and the mean AUROC and AUPRC results over the instances’ revealed portions of time. Each point on the graph represents the mean performance of the FCPM, at an instance’s revealed portion of time, in providing the completion probabilities for the different TIRPs.
Figure 22 shows that for the CSP and DBT datasets, abstraction with EWD and two symbols per variable resulted in more TIRPs ending with the target event than the other combinations. In contrast, for the AHE dataset, there was little difference between EWD and KB in the number of TIRPs, and for the EFIF dataset, the different numbers of symbols per variable resulted in a similar number of TIRPs. EWD with two and four symbols per variable resulted in more accurate predictions over time. In addition, the FCPM provided more accurate predictions over time, in terms of AUROC, for TIRPs discovered using KB for the CSP and AHE datasets, whereas for the DBT dataset, it provided less accurate predictions with KB. In terms of AUPRC, the FCPM performed poorly with KB for the AHE and DBT datasets; for the DBT dataset, EWD performed better than KB with all numbers of symbols. Overall, the FCPM performed slightly better over time for TIRPs discovered using EWD with two symbols per variable, but poorly for TIRPs discovered using EWD with four symbols per variable.
In summary, the FCPM performed better than the baseline models in terms of AUROC and AUPRC. The AUPRC performance of the SCPM was poor at relatively early decisions but improved as time went by. Moreover, the FCPM provided less accurate predictions for TIRPs discovered using GRAD and more accurate predictions for TIRPs discovered using KB and EWD. Comparing the predictions over time provided by the FCPM, TIRPs discovered using two symbols per variable or KB led to better results; more specifically, the FCPM performed slightly better over time for TIRPs discovered using EWD with two symbols per variable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Itzhak, N., Jaroszewicz, S. & Moskovitch, R. Continuous prediction of a time intervals-related pattern’s completion. Knowl Inf Syst 65, 4797–4846 (2023). https://doi.org/10.1007/s10115-023-01910-w