Abstract
In the time series classification (TSC) literature, approaches that incorporate multiple feature extraction domains, such as HIVE-COTE and TS-CHIEF, have generally been shown to perform better than single-domain approaches when no expert knowledge about the data is available. Time series extrinsic regression (TSER) has seen very little activity compared to TSC, but the provision of benchmark regression datasets by researchers at Monash University and the University of East Anglia provides an opportunity to test whether this insight from the TSC literature applies to regression data. We show that extracting random shapelets and intervals from different series representations and concatenating the output as part of a feature extraction pipeline significantly outperforms the single-domain approaches for both classification and regression. In addition to our main contribution, we provide results for shapelet-based algorithms on the regression archive datasets using the RDST transform, and show that current interval-based approaches such as DrCIF can gain noticeable scalability improvements by adopting the pipeline format.
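The general idea of drawing random intervals and shapelets from several series representations and concatenating the resulting features can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the two representations (raw series and first differences), the four interval summary statistics, and the plain Euclidean shapelet distance are all assumptions chosen for brevity, and RDST-style dilation is omitted.

```python
import numpy as np

# Hypothetical choice of representations: the raw series and its first differences.
REPRESENTATIONS = [lambda X: X, lambda X: np.diff(X, axis=1)]


def interval_features(X, intervals):
    """Mean, std, min and max of each (start, end) interval for every series."""
    feats = []
    for start, end in intervals:
        seg = X[:, start:end]
        feats.extend([seg.mean(axis=1), seg.std(axis=1),
                      seg.min(axis=1), seg.max(axis=1)])
    return np.column_stack(feats)


def shapelet_features(X, shapelets):
    """Minimum Euclidean distance from each series to each shapelet."""
    feats = []
    for s in shapelets:
        # windows has shape (n_cases, n_windows, len(s))
        windows = np.lib.stride_tricks.sliding_window_view(X, len(s), axis=1)
        dists = np.sqrt(((windows - s) ** 2).sum(axis=2))
        feats.append(dists.min(axis=1))
    return np.column_stack(feats)


def fit_hybrid(X, n_intervals=5, n_shapelets=5, min_len=3, seed=0):
    """Sample random intervals and shapelets for each representation."""
    rng = np.random.default_rng(seed)
    params = []
    for rep in REPRESENTATIONS:
        R = rep(X)
        n_cases, m = R.shape
        intervals, shapelets = [], []
        for _ in range(n_intervals):
            length = rng.integers(min_len, m + 1)      # upper bound is exclusive
            start = rng.integers(0, m - length + 1)
            intervals.append((start, start + length))
        for _ in range(n_shapelets):
            length = rng.integers(min_len, m + 1)
            start = rng.integers(0, m - length + 1)
            case = rng.integers(0, n_cases)
            shapelets.append(R[case, start:start + length].copy())
        params.append((intervals, shapelets))
    return params


def transform_hybrid(X, params):
    """Concatenate interval and shapelet features across all representations."""
    blocks = []
    for rep, (intervals, shapelets) in zip(REPRESENTATIONS, params):
        R = rep(X)
        blocks.append(interval_features(R, intervals))
        blocks.append(shapelet_features(R, shapelets))
    return np.hstack(blocks)
```

The concatenated matrix returned by `transform_hybrid` is an ordinary tabular feature set, so under these assumptions it could be passed to any standard classifier or regressor, which is what makes the pipeline format applicable to both TSC and TSER.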
Acknowledgements
This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant number EP/W030756/1. The experiments were carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia (UEA). We would like to thank all those responsible for helping maintain the time series dataset archives and those contributing to open source implementations of the algorithms.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Middlehurst, M., Bagnall, A. (2023). Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression. In: Ifrim, G., et al. Advanced Analytics and Learning on Temporal Data. AALTD 2023. Lecture Notes in Computer Science, vol 14343. Springer, Cham. https://doi.org/10.1007/978-3-031-49896-1_8
DOI: https://doi.org/10.1007/978-3-031-49896-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49895-4
Online ISBN: 978-3-031-49896-1
eBook Packages: Computer Science, Computer Science (R0)