DOI: 10.1145/3549737.3549761
research-article

SCALE-BOSS: A framework for scalable time-series classification using symbolic representations

Published: 09 September 2022

Abstract

Time-Series Classification (TSC) is an important problem in many scientific fields. Many TSC algorithms use symbolic representations to combat noise. In this paper we propose a framework, SCALE-BOSS, for building TSC algorithms that exploit time-series models based on symbolic representations. While alternative symbolic representations can be incorporated, we have opted to use the Bag-of-SFA-Symbols (BOSS) approach, and thus the Symbolic Fourier Approximation (SFA), as a state-of-the-art symbolic time-series representation. We investigate the efficiency of several instantiations of this framework based on two main variations, in which the TSC model is built by either a classification algorithm or a clustering algorithm. The objective is to improve the computational efficiency of TSC algorithms without sacrificing their accuracy. We evaluate the instantiations of the SCALE-BOSS framework on the datasets in the UCR time-series repository that have the largest training sets. Comparisons with state-of-the-art TSC methods show the balance achieved between computational efficiency and predictive accuracy.
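To make the idea of a bag-of-symbolic-words model concrete, the following is a minimal sketch and not the SCALE-BOSS implementation. It assumes a simplified SFA-like transform (the first few Fourier coefficients of each z-normalised sliding window, discretised with equi-depth bins learned from the training data) and uses one mean histogram per class as a stand-in for a clustering-based model. All function names and the parameters `w`, `n_coef`, and `alphabet` are hypothetical choices made for illustration.

```python
import numpy as np
from collections import Counter


def windows(series, w):
    """All sliding windows of length w, each z-normalised."""
    x = np.lib.stride_tricks.sliding_window_view(np.asarray(series, float), w)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd > 0, sd, 1.0)


def fourier_features(win, n_coef):
    """Real and imaginary parts of the first n_coef non-DC Fourier coefficients."""
    f = np.fft.rfft(win, axis=1)[:, 1:n_coef + 1]
    return np.hstack([f.real, f.imag])


def fit_bins(features, alphabet):
    """Equi-depth cut points per feature column (alphabet - 1 per column)."""
    qs = np.linspace(0, 1, alphabet + 1)[1:-1]
    return np.quantile(features, qs, axis=0)


def to_words(features, bins):
    """Turn each feature row into a word: one symbol (bin index) per column."""
    symbols = (features[:, None, :] > bins[None, :, :]).sum(axis=1)
    return [tuple(row) for row in symbols.tolist()]


def bag(series, w, n_coef, bins, vocab):
    """Histogram of word counts for one series over a fixed vocabulary."""
    words = to_words(fourier_features(windows(series, w), n_coef), bins)
    h = np.zeros(len(vocab))
    for word, count in Counter(words).items():
        if word in vocab:  # words unseen during training are ignored
            h[vocab[word]] = count
    return h


def fit(X, y, w=16, n_coef=2, alphabet=4):
    """Learn bins and vocabulary, then keep one mean histogram per class."""
    y = np.asarray(y)
    feats = [fourier_features(windows(s, w), n_coef) for s in X]
    bins = fit_bins(np.vstack(feats), alphabet)
    vocab = {}
    for f in feats:
        for word in to_words(f, bins):
            vocab.setdefault(word, len(vocab))
    H = np.array([bag(s, w, n_coef, bins, vocab) for s in X])
    labels = np.unique(y)
    centroids = np.array([H[y == c].mean(axis=0) for c in labels])
    return dict(w=w, n_coef=n_coef, bins=bins, vocab=vocab,
                labels=labels, centroids=centroids)


def predict(model, series):
    """Assign the label of the nearest class histogram (Euclidean distance)."""
    h = bag(series, model["w"], model["n_coef"], model["bins"], model["vocab"])
    d = np.linalg.norm(model["centroids"] - h, axis=1)
    return model["labels"][int(np.argmin(d))]
```

Calling `fit(X, y)` on a list of equal-length series and then `predict(model, s)` on a new series returns a label. Keeping only a few prototype histograms, rather than every training histogram, illustrates the kind of efficiency trade-off the abstract refers to; the choice of class means here is an assumption of this sketch.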


Cited By

  • (2024) SCALE-BOSS-MR: Scalable Time Series Classification Using Multiple Symbolic Representations. Applied Sciences 14:2 (689). DOI: 10.3390/app14020689. Online publication date: 13-Jan-2024
  • (2024) CTCTime: A New Model for Unidimensional Time Series Classification. Neural Processing Letters 56:5. DOI: 10.1007/s11063-024-11694-x. Online publication date: 4-Oct-2024
  • (2024) Probabilistic SAX: A Cognitively-Inspired Method for Time Series Classification in Cognitive IoT Sensor Network. Mobile Networks and Applications 29:3 (809-824). DOI: 10.1007/s11036-024-02322-y. Online publication date: 1-Jun-2024


    Published In

    SETN '22: Proceedings of the 12th Hellenic Conference on Artificial Intelligence
    September 2022
    450 pages
    ISBN: 9781450395977
    DOI: 10.1145/3549737

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. framework
    2. scalable
    3. symbolic representation
    4. time series classification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SETN 2022
