Forest based on Interval Transformation (FIT): A time series classifier with adaptive features

https://doi.org/10.1016/j.eswa.2022.118923

Highlights

  • A variety of series transformations and interval features to expand the feature space.

  • Adaptive selection of transformation series and features by cross-validation.

  • Forest based on Interval Transformation for time series classification.

  • Experiments on real datasets verify the effectiveness of our proposal.

Abstract

Time series classification (TSC) is an important task in time series data mining and has attracted considerable research attention. Most TSC algorithms aim to achieve high classification accuracy while reducing computational complexity. Currently, Time Series Combination of Heterogeneous and Integrated Embedding Forest (TS-CHIEF) is considered one of the state-of-the-art TSC algorithms. However, compared with fast algorithms such as Time Series Forest (TSF), TS-CHIEF still has a high computational cost. Building on the speed of TSF, we propose a new TSC algorithm, Forest based on Interval Transformation (FIT), which takes both accuracy and efficiency into account. FIT uses cross-validation to select appropriate transformation series and their corresponding interval features and, during formal training, adaptively transforms each series and computes the selected interval features. The resulting feature set is then used to train a random forest, which forms the FIT model. We evaluate the performance of FIT on 85 UCR time series classification datasets. The experimental results demonstrate that, compared with the state-of-the-art methods, FIT can achieve better accuracy while maintaining high efficiency.

Introduction

In the past few decades, research on time series classification (TSC) has become an important direction in machine learning. Researchers have proposed a wide variety of algorithms, including methods based on time series similarity and methods based on extracting important features from time series. TSC has been applied to electrocardiogram detection (Pourbabaee et al., 2018, Wang et al., 2013), financial forecasting (Tay & Cao, 2001), and motion recognition (Savadkoohi, Oladunni, & Thompson, 2021), where it can reduce the workload of practitioners in the corresponding fields.

Among existing TSC methods, a popular approach is to use ensembles to improve classification performance. Homogeneous or heterogeneous ensembles can greatly improve the accuracy of TSC algorithms. Forest-based methods are classical ensemble algorithms, and Proximity Forest (PF) (Lucas et al., 2019) is a representative example. PF builds a forest of distance-based trees: at each node it randomly selects a distance measure and splits the data in a divide-and-conquer manner. The training complexity of PF is quasi-linear in the number of training instances and quadratic in the length of the time series. Moreover, PF achieves classification accuracy close to that of the state-of-the-art TSC algorithm HIVE-COTE (Lines, Taylor, & Bagnall, 2018). Inspired by PF, Shifaz et al. proposed TS-CHIEF (Shifaz, Pelletier, Petitjean, & Webb, 2020). In addition to the distance-based representations used in PF, TS-CHIEF uses interval-based and dictionary-based representations. It follows a tree-building scheme similar to that of PF and uses heterogeneous ensembling to improve classification accuracy. TS-CHIEF achieves accuracy comparable to that of HIVE-COTE and has the same training complexity as PF.
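To make the PF node split concrete, here is a minimal sketch of a single Proximity Forest node. It assumes only two of PF's distance measures (Euclidean and unconstrained DTW) and omits PF's random parameter sampling for the other measures. Each candidate split picks a random measure and one random exemplar per class, routes every series to its nearest exemplar, and the candidate with the lowest weighted Gini impurity is kept. The function names and the `n_candidates` parameter are illustrative, not taken from any PF implementation.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Series = Sequence[float]
Branches = Dict[str, List[int]]


def euclidean(a: Series, b: Series) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw(a: Series, b: Series) -> float:
    """Unconstrained dynamic time warping distance (squared point costs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


# Only two of PF's distance measures, for brevity.
MEASURES: List[Callable[[Series, Series], float]] = [euclidean, dtw]


def gini(labels: List[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def candidate_split(X: List[Series], y: List[str], rng: random.Random) -> Tuple[str, Branches]:
    """One candidate: a random measure plus one random exemplar per class."""
    measure = rng.choice(MEASURES)
    exemplars = {}
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        exemplars[label] = X[rng.choice(members)]
    branches: Branches = {label: [] for label in exemplars}
    for i, series in enumerate(X):
        # Route each series to the branch of its nearest exemplar.
        nearest = min(exemplars, key=lambda lab: measure(series, exemplars[lab]))
        branches[nearest].append(i)
    return measure.__name__, branches


def pf_node_split(X, y, rng, n_candidates: int = 5) -> Tuple[str, Branches]:
    """Keep the candidate split with the lowest weighted Gini impurity."""
    def weighted_gini(branches: Branches) -> float:
        return sum(len(idx) / len(y) * gini([y[i] for i in idx]) for idx in branches.values())

    candidates = [candidate_split(X, y, rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda cand: weighted_gini(cand[1]))
```

A full Proximity Forest would recurse on each branch until the nodes are pure and combine many such trees by voting.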

However, compared with fast algorithms such as BOSS (Schäfer, 2015), which relies only on a symbolic representation, and TSF (Deng, Runger, Tuv, & Vladimir, 2013), which relies only on an interval representation, the computational cost of PF and TS-CHIEF is still relatively high. Although TSF has low training time complexity, its accuracy is not competitive with other algorithms because its intervals are selected at random during training. For interval-based classification methods, obtaining the intervals that best distinguish the classes of time series would further improve classification accuracy. Building on TSF, a fast interval-based algorithm, we aim to develop an algorithm that balances accuracy and efficiency.
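TSF summarizes each randomly drawn interval by three features (mean, standard deviation, and slope) and trains a random forest on them. The sketch below covers only the feature extraction. Drawing about √m intervals of length at least 3 for a series of length m mirrors common TSF implementations, but those settings, the use of the population standard deviation, and the function names are assumptions made for illustration.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def slope(seg: Sequence[float]) -> float:
    """Least-squares slope of seg regressed on the time index 0..n-1."""
    n = len(seg)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(seg)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(seg))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def random_intervals(m: int, rng: random.Random, min_len: int = 3) -> List[Tuple[int, int]]:
    """Draw about sqrt(m) random [start, end) intervals of length >= min_len."""
    intervals = []
    for _ in range(max(1, int(math.sqrt(m)))):
        length = rng.randint(min_len, m)
        start = rng.randint(0, m - length)
        intervals.append((start, start + length))
    return intervals


def interval_features(series: Sequence[float], intervals: List[Tuple[int, int]]) -> List[float]:
    """Concatenate (mean, std, slope) for every interval."""
    feats: List[float] = []
    for start, end in intervals:
        seg = series[start:end]
        feats.extend([statistics.fmean(seg), statistics.pstdev(seg), slope(seg)])
    return feats
```

Because the intervals are random, two forests trained on the same data can see very different features, which is the source of the accuracy gap that FIT targets.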

We propose a new, more accurate TSC method based on interval features, named Forest based on Interval Transformation (FIT). Although TSF is fast, its accuracy is not competitive, because it extracts features only from the original time series and its interval features are chosen at random. FIT therefore focuses on adding more transformation series and selecting discriminative feature representations. The overall scheme uses cross-validation to select suitable transformation series and interval feature representations, and then uses the selected ones in formal training. It thus consists of two stages: a cross-validation stage and a formal training stage. In the cross-validation stage, we build a classifier from several transformation series and the features extracted from random intervals, and use cross-validation to evaluate how well each transformation series performs on the dataset, so as to select suitable transformation series and interval features. In the formal training stage, the selected transformation series and their corresponding interval features guide the training of the classifier: we compute the selected interval features on the selected series, obtaining a new feature set that serves as the input for building decision trees. FIT is a tree-based ensemble classification method that trains multiple such trees in a random-forest manner to form the final classifier. The interval features of these transformation series identify the discriminative information among time series of different classes more effectively, so FIT can classify time series of unknown class more accurately.
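The cross-validation stage can be sketched as follows. Every concrete choice here is hypothetical and made to keep the example dependency-free: the candidate transformations (raw series, first-order difference, cumulative sum), the mean/standard-deviation interval summary, the nearest-centroid stand-in classifier, and the rule "keep every transformation whose CV accuracy is within `tol` of the best". FIT itself uses a tree-based classifier, and this section does not specify its transformation set or selection rule.

```python
import random
import statistics
from itertools import accumulate
from typing import Callable, Dict, List, Tuple

Series = List[float]

# Hypothetical candidate set; FIT's actual transformations are not listed here.
TRANSFORMS: Dict[str, Callable[[Series], Series]] = {
    "raw": lambda s: list(s),
    "diff": lambda s: [b - a for a, b in zip(s, s[1:])],  # first-order difference
    "cumsum": lambda s: list(accumulate(s)),  # running sum
}


def random_intervals(length: int, k: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Draw k random [start, end) intervals, each with at least two points."""
    intervals = []
    for _ in range(k):
        start = rng.randrange(0, length - 1)
        end = rng.randrange(start + 2, length + 1)
        intervals.append((start, end))
    return intervals


def interval_summary(s: Series, intervals: List[Tuple[int, int]]) -> List[float]:
    """Mean and population standard deviation of every interval."""
    feats: List[float] = []
    for start, end in intervals:
        seg = s[start:end]
        feats += [statistics.fmean(seg), statistics.pstdev(seg)]
    return feats


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y) -> float:
    """Stand-in classifier to keep the sketch dependency-free (FIT itself uses trees)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, centroids[lab])))

    return sum(predict(x) == lab for x, lab in zip(test_X, test_y)) / len(test_y)


def select_transforms(X, y, folds=3, n_intervals=4, tol=0.05, seed=0):
    """Score each transformation by k-fold CV; keep those within tol of the best."""
    rng = random.Random(seed)
    scores: Dict[str, float] = {}
    for name, transform in TRANSFORMS.items():
        Z = [transform(s) for s in X]
        intervals = random_intervals(len(Z[0]), n_intervals, rng)
        F = [interval_summary(z, intervals) for z in Z]
        order = list(range(len(F)))
        rng.shuffle(order)
        accs = []
        for f in range(folds):
            test = order[f::folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            accs.append(nearest_centroid_accuracy(
                [F[i] for i in train], [y[i] for i in train],
                [F[i] for i in test], [y[i] for i in test]))
        scores[name] = statistics.fmean(accs)
    best = max(scores.values())
    selected = [name for name, acc in scores.items() if acc >= best - tol]
    return selected, scores
```

In FIT's formal training stage, the selected transformations and their interval features would then feed a random forest rather than the stand-in classifier.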

The main contributions of our work are summarized as follows.

  • We propose a TSC algorithm based on interval feature transformation, called FIT. FIT combines a cross-validation stage and a formal training stage to achieve both efficiency and accuracy.

  • In the cross-validation stage, we propose an adaptive selection method that automatically selects expressive transformation series and discriminative interval features for each dataset. In the formal training stage, we use the selection results from cross-validation to automatically assemble the interval features, and train a random forest classifier on the resulting feature set.

  • We conduct experiments on the UCR time series classification archive (Dau et al., 2018) to evaluate classification accuracy. The results show that FIT is competitive with state-of-the-art methods in terms of accuracy. We also evaluate the efficiency of FIT on the UCR archive, and the results show that FIT is more efficient than the competing methods.
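The adaptive selection of interval features can be illustrated by counting, in a forest built during cross-validation, how often each feature type is used at a split node (the conclusion describes this as selecting features "through the number of feature nodes"). The sketch below writes a small random-subspace forest from scratch; each column of the feature matrix is tagged with a feature type such as `mean` or `slope`. The tree depth, number of trees, and the rule "keep types that account for at least `min_share` of all split nodes" are hypothetical, since the exact criterion is not given in this excerpt.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows: List[Row], labels: List[str], cols) -> Optional[Tuple[float, int, float]]:
    """Exhaustive threshold search over cols; None if none of them varies."""
    best = None
    for f in cols:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] <= thr]
            right = [lab for r, lab in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def grow(rows, labels, types, rng, depth, max_depth, counts: Counter) -> None:
    """Grow one tree recursively, counting the feature type used at every split node."""
    if depth == max_depth or len(set(labels)) == 1:
        return
    n_cols = len(types)
    sampled = rng.sample(range(n_cols), max(1, int(math.sqrt(n_cols))))
    # Random subspace first; fall back to all columns if the sampled ones are constant.
    split = best_split(rows, labels, sampled) or best_split(rows, labels, range(n_cols))
    if split is None:
        return
    _, f, thr = split
    counts[types[f]] += 1
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    for idx in (left, right):
        grow([rows[i] for i in idx], [labels[i] for i in idx], types, rng, depth + 1, max_depth, counts)


def select_feature_types(rows, labels, types, n_trees=10, max_depth=4, min_share=0.2, seed=0):
    """Keep feature types that account for at least min_share of all split nodes."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    n = len(rows)
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, as in random forests
        grow([rows[i] for i in boot], [labels[i] for i in boot], types, rng, 0, max_depth, counts)
    total = sum(counts.values())
    if total == 0:
        return [], counts
    ordered = list(dict.fromkeys(types))  # unique types, first-seen order
    return [t for t in ordered if counts[t] / total >= min_share], counts
```

Counting split nodes rather than, say, impurity decrease keeps the criterion close to the wording in the text; either statistic could be plugged into the final selection step.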

The rest of the paper is organized as follows. Section 2 reviews related work on TSC algorithms. Section 3 presents our new TSC method, FIT, in detail. Section 4 reports the experimental evaluation. Finally, Section 5 concludes the paper and outlines future directions.

Section snippets

Related work

In recent years, TSC has attracted much interest, and many algorithms have been proposed. We review the related work below.

The proposed FIT algorithm

We propose a new TSC algorithm, FIT, a forest classifier based on interval transformation. FIT adaptively selects the transformation series and the interval feature representation. TSF randomly selects a set of intervals from a time series and summarizes each interval by three features: mean, standard deviation, and slope. These interval features are then used to train a random forest classifier. Inspired by TSF, our proposed FIT algorithm can obtain

Experiments

We conducted comprehensive experiments to evaluate the performance of FIT. We use the datasets of the UCR Time Series Classification Archive (Dau et al., 2018), the standard benchmark in TSC research. We compare the accuracy and efficiency of FIT with those of state-of-the-art TSC algorithms. In addition, we present a case study of the series transformation method.

All experimental results were obtained on a computer with an AMD Ryzen 5 4600U with Radeon Graphics (2.10 GHz) and 16 GB of RAM. The

Conclusion

In this paper, we propose a new TSC algorithm, FIT, which selects time series interval features. FIT first selects suitable series transformation methods for the current dataset through cross-validation, then selects suitable interval features according to the number of feature nodes in the classifier built during cross-validation, and finally uses the resulting interval feature set to train the FIT model. Experiments show that FIT can obtain high classification accuracy on the 85

CRediT authorship contribution statement

Guiling Li: Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing, Funding acquisition. Shaolin Xu: Methodology, Software, Visualization, Formal analysis, Writing – original draft, Writing – review & editing. Senzhang Wang: Methodology, Writing – original draft, Writing – review & editing. Philip S. Yu: Formal analysis, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank Prof. Eamonn Keogh and all the people who have contributed to the UCR time series classification archive for their selfless work.

The work is supported by the National Natural Science Foundation of China (No. 61702468) and the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing, China (No. KLIGIP-2018B03).

References (39)

  • Dau, H. A., et al. (2018). The UCR time series classification archive.
  • Dempster, A., et al. (2020). ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery.
  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.
  • Fawaz, H. I., et al. (2019). Deep learning for time series classification: a review. Data Mining and Knowledge Discovery.
  • Fawaz, H. I., et al. (2020). InceptionTime: Finding AlexNet for time series classification. Data Mining and Knowledge Discovery.
  • Fulcher, B. D., et al. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems.
  • Grabocka, J., Schilling, N., Wistuba, M., & Schmidt-Thieme, L. (2014). Learning time-series shapelets. In Proceedings...
  • He, G., et al. (2020). Online rule-based classifier learning on dynamic unlabeled multivariate time series data. IEEE Transactions on Systems, Man, and Cybernetics: Systems.
  • Hills, J., et al. (2014). Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery.