Abstract
In recent years, early classification on time series has become increasingly important in time-sensitive applications. Existing shapelet-based methods still fall short on this problem, for two reasons. First, the effectiveness of traditional shapelet-based methods depends on the number of shapelet candidates. Second, previous methods have difficulty obtaining diverse shapelets during shapelet selection. In this paper, we propose an Improved Early Distinctive Shapelet Classification method named IEDSC. First, we present a new method that measures the similarity between time series more precisely by taking into account the relative trend of the time series. Second, in shapelet extraction, we propose a pruning technique that reduces the number of shapelet candidates by predicting the starting positions of good-quality shapelets. Third, we propose a new shapelet selection method that removes similar shapelets so as to maintain shapelet diversity. Finally, experimental results on 16 benchmark datasets show that the proposed method outperforms state-of-the-art methods for early classification on time series.
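The abstract does not give the details of IEDSC, so the following is only a minimal sketch of the two generic ideas it builds on: the classic shapelet-to-series distance (the minimum distance over all sliding windows) and a greedy filter that drops candidates too similar to ones already kept. The function names, the length-normalized Euclidean distance, and the `min_gap` threshold are all assumptions made for illustration; in particular, this is not the trend-aware similarity measure or the selection procedure proposed in the paper.

```python
import math
from typing import List, Sequence


def subsequence_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Smallest length-normalized Euclidean distance between the shapelet and
    any equally long window of the series (classic shapelet distance,
    NOT the trend-aware measure described in the abstract)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, math.sqrt(sq / m))
    return best


def shapelet_gap(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two shapelets: slide the shorter one over the longer."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return subsequence_distance(short, long_)


def select_diverse(ranked: List[List[float]], min_gap: float) -> List[List[float]]:
    """Greedy filter (hypothetical): walk candidates from best to worst and keep
    one only if it is at least min_gap away from every shapelet kept so far."""
    kept: List[List[float]] = []
    for cand in ranked:
        if all(shapelet_gap(cand, k) >= min_gap for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    print(subsequence_distance([1.0, 2.0, 1.0], series))
    candidates = [[1.0, 2.0, 1.0], [1.0, 2.0, 1.1], [5.0, 5.0, 5.0]]
    print(select_diverse(candidates, min_gap=0.5))
```

In this sketch the input to `select_diverse` is assumed to be already sorted by some quality score, so that among near-duplicate candidates the higher-ranked one survives.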
Acknowledgment
The authors would like to thank Prof. Eamonn Keogh and all the people who have contributed to the UCR time series classification archive for their selfless work. We also thank the anonymous reviewers for their valuable advice.
This work was supported by the National Natural Science Foundation of China (No. 61702468), the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2018B03), and the Zhejiang Provincial Natural Science Foundation of China (No. LZ18F020001).
Cite this article
Yan, W., Li, G., Wu, Z. et al. Extracting diverse-shapelets for early classification on time series. World Wide Web 23, 3055–3081 (2020). https://doi.org/10.1007/s11280-020-00820-z
DOI: https://doi.org/10.1007/s11280-020-00820-z