Abstract
Time series play a major role in many analysis tasks. For example, in the stock market they can be used to model price histories and to predict future trends. Sometimes the information contained in a time series is complemented by other kinds of data, which may be encoded as static attributes (e.g., categorical or numeric ones) or as more general discrete data sequences. In this paper, we present J48SS, a novel decision tree learning algorithm capable of natively mixing static, sequential, and time series data for classification purposes. The proposed solution is based on the well-known C4.5 decision tree learner and relies on the concept of time series shapelets, which are generated by means of multi-objective evolutionary computation techniques and, unlike in most previous approaches, are not required to be part of the training set. We evaluate the algorithm on a set of well-known UCR time series datasets and show that it achieves better classification performance than previous approaches based on decision trees, while generating highly interpretable models and effectively reducing the data preparation effort.
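To make the notion of a shapelet-based test concrete, the following minimal Python sketch shows how a shapelet could be compared against a time series and turned into a binary split. It is an illustration only, not the J48SS implementation: the function names `shapelet_distance` and `shapelet_split`, the use of plain Euclidean distance over sliding windows, and the toy data are all assumptions, and the evolutionary generation of shapelets is not shown.

```python
import math
from typing import Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any
    equally long contiguous window of the series (assumed measure)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        # Squared distance between the shapelet and the window at `start`.
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, sq)
    return math.sqrt(best)


def shapelet_split(series: Sequence[float], shapelet: Sequence[float],
                   threshold: float) -> bool:
    """Hypothetical binary node test: is the series close enough to the shapelet?"""
    return shapelet_distance(series, shapelet) <= threshold


# Hypothetical shapelet; it need not be a subsequence of any training series.
spike = [0.0, 1.0, 0.0]
with_spike = [0.0, 0.0, 0.1, 0.9, 0.1, 0.0]
flat = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

d_spike = shapelet_distance(with_spike, spike)
d_flat = shapelet_distance(flat, spike)
```

In this toy setting, a threshold of 0.5 would send `with_spike` and `flat` down different branches, which is the kind of readable, pattern-based test that makes shapelet trees easy to interpret.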
Acknowledgments
Andrea Brunello and Angelo Montanari would like to thank the PRID project ENCASE (Efforts in the uNderstanding of Complex interActing SystEms) for its support.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Brunello, A., Marzano, E., Montanari, A., Sciavicco, G. (2018). A Novel Decision Tree Approach for the Handling of Time Series. In: Groza, A., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2018. Lecture Notes in Computer Science, vol 11308. Springer, Cham. https://doi.org/10.1007/978-3-030-05918-7_32
DOI: https://doi.org/10.1007/978-3-030-05918-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05917-0
Online ISBN: 978-3-030-05918-7
eBook Packages: Computer Science (R0)