Multi-period classification: learning sequent classes from temporal domains

Henriques, Rui; Madeira, Sara C.; Antunes, Cláudia

doi:10.1007/s10618-014-0376-8

Published: 27 August 2014

Data Mining and Knowledge Discovery, volume 29, pages 792–819 (2015)

Abstract

As the majority of real-world decisions change over time, extending traditional classifiers to deal with the problem of classifying an attribute of interest across different time periods becomes increasingly important. Tackling this problem, referred to as multi-period classification, is critical to answer real-world tasks, such as the prediction of upcoming healthcare needs or administrative planning tasks. In this context, although existing research provides principles for learning single labels from complex data domains, less attention has been given to the problem of learning sequences of classes (symbolic time series). This work motivates the need for multi-period classifiers, and proposes a method, cluster-based multi-period classification (CMPC), that preserves local dependencies across the periods under classification. Evaluation against real-world datasets provides evidence of the relevance of multi-period classifiers, and shows the superior performance of the CMPC method against peer methods adapted from long-term prediction for multi-period tasks with a high number of periods.

Notes

Available at http://web.tecnico.ulisboa.pt/rmch/software/evoc/
In general, this classifier slightly outperforms kNN lazy learners (Aha et al. 1991) and C4.5 decision trees (Quinlan 1993) for the used data settings. We hypothesize that this is because the learned dependencies among subsets of informative events can model relevant temporal or cross-attribute dependencies.
http://doc.gold.ac.uk/~mas02mg/software/hmmweka/
http://archive.ics.uci.edu/ml/datasets/MSNBC.com+Anonymous+Web+Data
http://archive.ics.uci.edu/ml/datasets/Diabetes
http://www.heritagehealthprize.com/c/hhp/data (under a granted permission)
Complete list of results available at http://web.tecnico.ulisboa.pt/rmch/software/evoc/

References

Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
Azuaje F (2011) Integrative data analysis for biomarker discovery. In: Bioinformatics and biomarker discovery: omic data analysis for personalized medicine, pp 137–154
Bache K, Lichman M (2013) UCI machine learning repository
Baldi P, Chauvin Y, Hunkapiller T, McClure M (1994) Hidden Markov models of biological primary sequence information. Proc Natl Acad Sci USA 91(3):1059–1063
Batista GEAPA, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In SDM’11. SIAM / Omnipress, Mesa, pp 699–710
Baxter RA, Williams GJ, He H (2001) Feature selection for temporal health records. In PAKDD, London, UK. Springer-Verlag, London, pp 198–209
Ben Taieb S, Bontempi G, Atiya AF, Sorjamaa A (2012) A review and comparison of strategies for multi-step ahead time series forecasting based on the nn5 forecasting competition. Expert Syst Appl 39(8):7067–7083
Ben Taieb S, Sorjamaa A, Bontempi G (2010) Multiple-output modeling for multi-step-ahead time series forecasting. Neurocomputing 73:1950–1957
Bengio S, Fessant F, Collobert D (1996) Use of modular architectures for time series prediction. Neural Process Lett 3:101–106
Bishop C (2006) Pattern recognition and machine learning. Information science and statistics. Springer, New York
Bontempi G, Ben Taieb S (2011) Conditionally dependent strategies for multiple-step-ahead prediction in local learning. Int J Forecast 27(2004):689–699
Bontempi G, Birattari M, Bersini H (1998) Lazy learning for iterated time-series prediction. In: Suykens JAK, Vandewalle J (eds) IW on advanced black-box tech for nonlinear modeling, Leuven, Belgium. Katholieke University, Leuven, pp 62–68
Bradley PS, Reina CA, Fayyad UM (2000) Clustering very large databases using EM mixture models. In: Pattern recognition, international conference on 2:2076+
Brahim-Belhouari S, Bermak A (2004) Gaussian process for nonstationary time series prediction. Comput Stat Data Anal 47(4):705–712
Cadez I, Heckerman D, Meek C, Smyth P, White S (2000) Visualization of navigation patterns on a web site using model-based clustering. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’00, New York, NY, USA. ACM, New York, pp 280–284
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh EJ (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Geurts P (2001) Pattern extraction for time series classification. In: Principles of data mining and knowledge discovery. LNCS, vol 2168. Springer, Heidelberg, pp 115–127
Graves A (2012) Supervised sequence labelling with recurrent neural networks. Studies in computational intelligence. Springer, New York
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. SIGKDD Explor Newsl 11(1):10–18
Hartigan JA, Wong MA (1979) A k-means clustering algorithm. JSTOR Appl Stat 28(1):100–108
Henriques R, Antunes C (2012) On the need of new approaches for the novel problem of long-term prediction over multi-dimensional data. In: Lee R (ed) Computer and information science 2012. Studies in computational intelligence, vol 429. Springer, Berlin, pp 121–138
Henriques R, Antunes C (2014) Learning predictive models from integrated healthcare data: capturing temporal and cross-attribute dependencies. In: HICSS, IEEE
Henriques R, Pina S, Antunes C (2013) Temporal mining of integrated healthcare data: methods, revealings and implications. In: SDM IW on data mining for medicine and healthcare. SIAM, pp 52–60
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Ji Y, Hao J, Reyhani N, Lendasse A (2005) Direct and recursive prediction of time series using mutual information selection. In: IWANN. LNCS, vol 3512. Springer, Heidelberg, pp 1010–1017
Kirshner S (2005) Modeling of multivariate time series using hidden Markov models. PhD thesis, AAI3164062
Kriegel H-P, Kröger P, Sander J, Zimek A (2011) Density-based clustering. Wiley Interdisc Rev 1(3):231–240
Letham B, Rudin C, Madigan D (2013) Sequential event prediction. Mach Learn 93(2–3):357–380
Lockett AJ, Miikkulainen R (2009) Temporal convolution machines for sequence learning. Technical report AI-09-04, University of Texas at Austin
Mantaci S, Restivo A, Sciortino M (2008) Distance measures for biological sequences: some recent approaches. Int J Approx Reason 47(1):109–124
Moen P (2000) Attribute, event sequence and event type similarity notions for data mining. University of Helsinki
Mörchen F (2003) Time series feature extraction for data mining using DWT and DFT. Reihe Informatik Univ
Mörchen F (2006) Time series knowledge mining. Wissenschaft in Dissertationen. Görich & Weiershäuser
Murphy K (2002) Dynamic Bayesian networks: representation, inference and learning. PhD thesis, UC Berkeley, Computer Science Division
Nguyen H-L, Ng W-K, Woon Y-K (2013) Closed motifs for streaming time series classification. KAIS, pp 1–25
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., San Francisco, CA
Povinelli RJ, Johnson MT, Lindgren AC, Ye J (2004) Time series classification using gaussian mixture models of reconstructed phase spaces. IEEE Trans Knowl Data Eng 16(6):779–789
Quinlan R (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA
Rahman S, Bakar A, Hussein Z (2008) A review on protein sequence clustering research. In: ICBE. IFMBE Proceedings, vol 21. Springer, Berlin-Heidelberg, pp 275–278
Roddick JF, Spiliopoulou M (2002) A survey of temporal knowledge discovery paradigms and methods. IEEE Trans Knowl Data Eng 14(4):750–767
Sorjamaa A, Hao J, Reyhani N, Ji Y, Lendasse A (2007) Methodology for long-term prediction of time series. Neurocomputing 70:2861–2869
Sorjamaa A, Lendasse A (2006) Time series prediction using dirrec strategy. In: ESANN’06, pp 143–148
Taieb SB, Bontempi G, Sorjamaa A, Lendasse A (2009) Long-term prediction of time series by combining direct and mimo strategies. In IJCNN, Piscataway, NJ, USA. IEEE Press, pp 1559–1566
Toft P, Rostrup E, Nielsen FA, Hansen LK, Goutte C (1998) On clustering fMRI time series. Neuroimage 9:298–310
Tseng V, Lee C-H (2009a) Effective temporal data classification by integrating sequential pattern mining and probabilistic induction. Expert Syst Appl 36(5):9524–9532
Tseng VS, Lee C-H (2009b) Effective temporal data classification by integrating sequential pattern mining and probabilistic induction. Expert Syst Appl 36(5):9524–9532
Tsoumakas G, Katakis I (2007) Multi label classification: an overview. Int J Data Wareh Min 3(3):1–13
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In ICML. ACM, New York, pp 1033–1040
Zhang M-L, Zhou Z-H (2005) A k-nearest neighbor based algorithm for multi-label classification. IEEE International Conference on Granular Computing, vol 2, pp 718–721

Acknowledgments

The authors deeply thank the reviewers of this manuscript for the detailed, attentive and insightful feedback. This work was supported by Fundação para a Ciência e Tecnologia under the multi-annual funding of INESC-ID PEst-OE/EEI/LA0021/2013 and the Ph.D. Grant SFRH/BD/75924/2011.

Author information

Authors and Affiliations

KDBIO group, INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, and Computer Science and Engineering Department, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1, 1049-001, Lisboa, Portugal
Rui Henriques, Sara C. Madeira & Cláudia Antunes

Corresponding author

Correspondence to Rui Henriques.

Additional information

Responsible editor: Dr. Eamonn Keogh.

Appendix: Complementary metrics

Multi-period classifiers can be evaluated when the attribute under classification is either nominal or ordinal. In the paper, we targeted nominal attributes on the learning codomain and adopted simple loss functions (based on matching operators) to evaluate the performance of the proposed methods. This appendix adds three complementary views: first, loss functions to deal with ordinal labels; second, evaluation metrics based on compact confusion matrices for settings with a high number of labels; third, distance metrics that can account for misalignments, such as temporal shifts.

1.1 Multi-period classification with ordinal labels

Multi-period accuracy $Acc_j$ can be derived from loss functions applied along the horizon of prediction. Representative loss functions include the simple, average normalized or relative root mean squared error. To draw comparisons with literature results, we suggest the use of the Normalized Root Mean Squared Error, NRMSE (5), and of the Symmetric Mean Absolute Percentage Error, SMAPE (6) (Ben Taieb et al. 2010).

$$\begin{aligned} {\hbox {Acc}}_j(\varvec{y}_j,\hat{\varvec{y}}_j)= 1-{\hbox {NRMSE}}(\varvec{y}_j,\hat{\varvec{y}}_j)=1-\frac{\sqrt{\frac{1}{h}\Sigma _{i=1}^{h}(y_j^i-\hat{y}_j^i)^2}}{y_{\max }-y_{\min }}\in [0,1]\end{aligned}$$

(5)

$$\begin{aligned} {\hbox {Acc}}_j(\varvec{y}_j,\hat{\varvec{y}}_j)=1-{\hbox {SMAPE}}(\varvec{y}_j,\hat{\varvec{y}}_j)=1-\frac{1}{h}\Sigma _{i=1}^h\frac{\mid y_j^{i}-\hat{y}_j^{i}\mid }{(y_j^{i}+\hat{y}_j^{i})/2}\in [0,1] \end{aligned}$$

(6)
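
As an illustration of (5) and (6), the following Python sketch computes both accuracies for a single instance $j$ over its $h$ periods. The function names `acc_nrmse` and `acc_smape`, the numeric encoding of the ordinal labels, and the handling of a zero denominator are assumptions made for this example and are not part of the original method.

```python
import math
from typing import Sequence


def acc_nrmse(y: Sequence[float], y_hat: Sequence[float],
              y_min: float, y_max: float) -> float:
    """1 - NRMSE over the h periods of one instance, following (5)."""
    h = len(y)
    if h == 0 or h != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / h)
    return 1.0 - rmse / (y_max - y_min)


def acc_smape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """1 - SMAPE over the h periods of one instance, following (6).

    The result is not clipped: a per-period term can exceed 1 when the
    observed and estimated labels are far apart.
    """
    h = len(y)
    if h == 0 or h != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(y, y_hat):
        denom = (a + b) / 2.0
        # Assumption: a period where both values are zero is a perfect match.
        total += 0.0 if denom == 0 else abs(a - b) / denom
    return 1.0 - total / h


# Ordinal labels encoded as 1..5 (hypothetical encoding), h = 4 periods.
observed = [1, 2, 3, 2]
estimated = [1, 3, 3, 1]
print(acc_nrmse(observed, estimated, y_min=1, y_max=5))
print(acc_smape(observed, estimated))
```

Encoding the ordinal labels as strictly positive integers keeps the SMAPE denominator away from zero, which is why the example uses labels starting at 1.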

1.2 Evaluation using compact confusion matrices

In order to account for further critical performance views, a classic confusion matrix can be computed for each period. This solution, illustrated in Fig. 7, has the undesirable property of not offering compact views to study performance. For instance, multiple metrics need to be computed for each label and period in order to obtain a global view of the sensitivity of the multi-period classifier. A simple option, similar to (3) and (4), would be to average the values for an instance across the $h$ periods. However, for the ordinal setting, instead of simply computing the matchings, a normalized distance needs to be applied between each pair of observed and estimated labels.

However, with this option we lose the ability to understand which periods are affecting the score. A second option is to collapse the labels’ axis by defining a predicate. For this goal, we can rely on a mapping function $T$ to map a set of observed $h$ labels into a single label. An illustrative function is one that decides whether an instance is of interest (positive) or not based on the observed values. For example, relevant patients can be defined as having at least one hospitalization across the horizon of prediction. Still, this option requires the computation of each metric for the $h$ periods. Thus, we propose the use of this option with a simple test (based on a fixed $\beta $-threshold) to evaluate the adequacy of the $h$ predictions for a particular instance, ${\hbox {Acc}}(y,\hat{y})\ge \beta $ ((7) and (8)). Understandably, this option comes at the cost of defining a new labeling function $T$ and of working with $\beta $-threshold levels. Table 6 presents the revised confusion matrix for multi-period classification when two classes are considered. The resulting sensitivity (7) and specificity (8) metrics for this setting are computed as follows:

$$\begin{aligned} {\hbox {Sensitivity}}_c&= \frac{\Sigma _{j=1}^{m}\big (c=T(\varvec{y}_j)\wedge {\hbox {Acc}}(\varvec{y}_j,\hat{\varvec{y}}_j)\ge \beta \big )}{\Sigma _{j=1}^{m}\big (c=T(\varvec{y}_j)\big )}, \end{aligned}$$

(7)

$$\begin{aligned} {\hbox {Specificity}}_c&= \frac{\Sigma _{j=1}^{m}\big (c\ne T(\varvec{y}_j)\wedge {\hbox {Acc}}(\varvec{y}_j,\hat{\varvec{y}}_j)\ge \beta \big )}{\Sigma _{j=1}^{m}\big (c\ne T(\varvec{y}_j)\big )}. \end{aligned}$$

(8)
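
The following Python sketch shows one way (7) and (8) could be computed over $m$ instances. The helper names (`matching_acc`, `sensitivity_specificity`, `has_hospitalization`), the example data and the choice $\beta = 0.6$ are hypothetical; the accuracy function defaults to a simple matching ratio, which is only one possible choice of ${\hbox {Acc}}$.

```python
from typing import Callable, Hashable, Sequence, Tuple

Labels = Sequence[Hashable]


def matching_acc(y: Labels, y_hat: Labels) -> float:
    """Fraction of the h periods whose label is predicted exactly."""
    return sum(a == b for a, b in zip(y, y_hat)) / len(y)


def sensitivity_specificity(
    Y: Sequence[Labels],
    Y_hat: Sequence[Labels],
    T: Callable[[Labels], Hashable],
    c: Hashable,
    beta: float,
    acc: Callable[[Labels, Labels], float] = matching_acc,
) -> Tuple[float, float]:
    """Sensitivity (7) and specificity (8) for class c under mapping T."""
    pos = pos_ok = neg = neg_ok = 0
    for y, y_hat in zip(Y, Y_hat):
        adequate = acc(y, y_hat) >= beta  # the beta-threshold test
        if T(y) == c:
            pos += 1
            pos_ok += adequate
        else:
            neg += 1
            neg_ok += adequate
    # Assumption: return NaN when a class has no instances.
    sens = pos_ok / pos if pos else float("nan")
    spec = neg_ok / neg if neg else float("nan")
    return sens, spec


def has_hospitalization(y: Labels) -> str:
    """Hypothetical T: positive if any period has at least one hospitalization."""
    return "positive" if any(v > 0 for v in y) else "negative"


# Four patients, h = 3 periods, labels are hospitalization counts (made-up data).
Y = [[0, 1, 0], [0, 0, 0], [2, 0, 1], [0, 0, 0]]
Y_hat = [[0, 1, 1], [0, 0, 0], [0, 0, 0], [0, 1, 1]]
print(sensitivity_specificity(Y, Y_hat, has_hospitalization, "positive", beta=0.6))
```

In this made-up example, one of the two positive patients and one of the two negative patients pass the $\beta$ test, so both metrics evaluate to 0.5.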

Table 6 Multi-period confusion matrix

1.3 Complementary evaluation metrics

Understandably, the distance functions used to evaluate the performance of multi-period classifiers are conservative for the cases where mismatches are caused by temporal shifts. To avoid a significant penalization of the performance of multi-period classifiers when misalignments occur on the time or cardinality axes, their evaluation can rely on more expressive time series’ similarity functions.

Ding et al. (2008) and Batista et al. (2011) compare the properties of alternative similarity functions when the attribute under classification is ordinal or numeric. Dynamic Time Warping (DTW) treats misalignments, which becomes critical when dealing with long horizons of prediction. Longest Common Subsequence deals with gap constraints. Pattern-based functions consider shifting and scaling in both the temporal and the amplitude axes.
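
To make the effect of temporal shifts concrete, the sketch below contrasts a point-wise absolute error with a textbook Dynamic Time Warping distance on two ordinal sequences that differ only by a one-period shift. The function name `dtw_distance`, the absolute-difference local cost and the example sequences are assumptions for illustration, not the evaluation code used in the paper.

```python
from typing import Sequence


def dtw_distance(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Classic O(n*m) Dynamic Time Warping with absolute-difference cost."""
    n, m = len(y), len(y_hat)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning y[:i] with y_hat[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(y[i - 1] - y_hat[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # y advances, y_hat repeats
                cost[i][j - 1],      # y_hat advances, y repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The estimate anticipates the observed peak by one period.
observed = [1, 1, 3, 5, 3, 1]
estimated = [1, 3, 5, 3, 1, 1]

pointwise = sum(abs(a - b) for a, b in zip(observed, estimated))
print(pointwise)                           # period-by-period absolute error
print(dtw_distance(observed, estimated))   # error after warping the time axis
```

Here the point-wise error is 8, while the warped distance is 0 because the shifted peak can be realigned.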

When the output attribute is nominal, similarity functions proposed to compare biomolecular sequences based on significant functional or structural similarity can be applied (Mantaci et al. 2008). These functions are also able to identify temporal shifts as they rely on sequence alignment operators. Moreover, they are able to deal with shifts on the amplitude axis by detecting character-level differences.
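
As a simple stand-in for alignment-based similarity on nominal labels, the sketch below uses the Levenshtein edit distance, normalized by the longer sequence length. This is a generic alignment operator chosen for illustration; it is not one of the specific functions surveyed by Mantaci et al. (2008), and the names `edit_distance` and `alignment_similarity` are hypothetical.

```python
from typing import Hashable, Sequence


def edit_distance(y: Sequence[Hashable], y_hat: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    n, m = len(y), len(y_hat)
    prev = list(range(m + 1))  # distances from the empty prefix of y
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (y[i - 1] != y_hat[j - 1])
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitute)
        prev = curr
    return prev[m]


def alignment_similarity(y: Sequence[Hashable], y_hat: Sequence[Hashable]) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(y), len(y_hat))
    return 1.0 if longest == 0 else 1.0 - edit_distance(y, y_hat) / longest


# Nominal labels over h = 5 periods; the estimate is shifted by one period.
observed = list("AABCA")
estimated = list("ABCAA")

matching = sum(a == b for a, b in zip(observed, estimated)) / len(observed)
print(matching)                                    # strict period-wise matching
print(alignment_similarity(observed, estimated))   # alignment-based similarity
```

In this example strict matching credits only two of the five periods, whereas the alignment-based score needs two edits (one deletion and one insertion) and is therefore higher.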

On the one hand, these similarity functions have the advantage of smoothing error accumulation by allowing temporal misalignments. On the other hand, their use can mask the structural accuracy of multi-period classifiers and lead to more optimistic results.

About this article

Cite this article

Henriques, R., Madeira, S.C. & Antunes, C. Multi-period classification: learning sequent classes from temporal domains. Data Min Knowl Disc 29, 792–819 (2015). https://doi.org/10.1007/s10618-014-0376-8

Received: 25 June 2013
Accepted: 07 August 2014
Published: 27 August 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s10618-014-0376-8
