Abstract
In this work, we quantitatively evaluate and rank explanation methods for time series classification based on their informativeness. Time series classification has many applications, and it is important to identify which parts of a time series are most informative for a classifier's decision. For example, to distinguish Arabica from Robusta coffee leaves, we can use an explanation method to highlight the parts of the time series that differentiate them. Although many explanation methods have been proposed for image and time series data, it is still unclear how to evaluate them objectively. Here, we evaluate two model-specific explanation approaches, ResNet-CAM and MrSEQL-SM, and two model-agnostic approaches, LIME combined with the classifiers MrSEQL and ROCKET (MrSEQL-LIME and ROCKET-LIME). We generate saliency-based explanations for each classifier on three time series classification datasets from the UCR benchmark. For each explanation method, we extract importance weights for all points in the time series, use them to perturb specific parts of the time series, and assess the impact on the classification accuracy of referee classifiers. We propose a new ranking-based methodology that compares multiple explanation methods on the basis of their informativeness, using explanation-based perturbation and aggregating the explanation ranks over the referee classifiers. This enables us to compare explanation methods within a single dataset and across multiple datasets. We provide an in-depth analysis of the results, including a runtime analysis for each method. Our results indicate that the model-specific approaches MrSEQL-SM and ResNet-CAM are much faster than the model-agnostic approaches MrSEQL-LIME and ROCKET-LIME, and that MrSEQL-SM achieves the highest informativeness rank among the explanation methods compared.
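The two ingredients named in the abstract, explanation-based perturbation and rank aggregation over referee classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Gaussian-noise perturbation, the choice to perturb a fixed fraction of the highest-weighted points, and the use of a plain mean rank are all assumptions made for illustration. The sketch also assumes that a larger drop in referee accuracy indicates a more informative explanation. The numbers in the usage example are invented placeholders, not results from the paper.

```python
import numpy as np


def perturb_top_k(X, weights, k_frac, rng):
    """Replace the k_frac highest-weighted points of each series with noise.

    X and weights have shape (n_series, n_points). The noise distribution
    (Gaussian with the global mean and std of X) is an assumption.
    """
    X_new = X.copy()
    n_points = X.shape[1]
    k = max(1, int(round(k_frac * n_points)))
    # Column indices of the k largest absolute weights in each row.
    top = np.argsort(-np.abs(weights), axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    X_new[rows, top] = rng.normal(X.mean(), X.std(), size=top.shape)
    return X_new


def rank_methods(acc_drop):
    """Average the per-referee ranks of each explanation method.

    acc_drop maps referee name -> {method name -> accuracy drop}.
    Rank 1 is the largest drop. Ties are not handled specially.
    """
    methods = sorted(next(iter(acc_drop.values())))
    per_referee = []
    for drops in acc_drop.values():
        values = np.array([drops[m] for m in methods])
        order = np.argsort(-values)  # largest drop first
        ranks = np.empty(len(methods))
        ranks[order] = np.arange(1, len(methods) + 1)
        per_referee.append(ranks)
    mean_rank = np.mean(per_referee, axis=0)
    return {m: float(r) for m, r in zip(methods, mean_rank)}


# Hypothetical accuracy drops for three methods and two referees.
toy_drops = {
    "referee_1": {"method_A": 0.30, "method_B": 0.10, "method_C": 0.20},
    "referee_2": {"method_A": 0.25, "method_B": 0.05, "method_C": 0.15},
}
toy_ranks = rank_methods(toy_drops)
```

In a full pipeline, each referee classifier would be scored on the original and on the perturbed test set, and the difference in accuracy would fill the acc_drop dictionary for every explanation method.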
Acknowledgments
This publication has emanated from research supported in part by a grant from Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183), the Insight Centre for Data Analytics (12/RC/2289_P2) and the VistaMilk SFI Research Centre (SFI/16/RC/3835). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The authors would like to thank the reviewers for their constructive feedback.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Agarwal, S., Nguyen, T.T., Nguyen, T.L., Ifrim, G. (2021). Ranking by Aggregating Referees: Evaluating the Informativeness of Explanation Methods for Time Series Classification. In: Lemaire, V., Malinowski, S., Bagnall, A., Guyet, T., Tavenard, R., Ifrim, G. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2021. Lecture Notes in Computer Science, vol 13114. Springer, Cham. https://doi.org/10.1007/978-3-030-91445-5_1
DOI: https://doi.org/10.1007/978-3-030-91445-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91444-8
Online ISBN: 978-3-030-91445-5
eBook Packages: Computer Science, Computer Science (R0)