DOI: 10.1145/3459637.3482355
research-article

Interpreting Convolutional Sequence Model by Learning Local Prototypes with Adaptation Regularization

Published: 30 October 2021

Abstract

In many high-stakes applications of machine learning, outputting predictions alone, or predictions with statistical confidence, is usually insufficient to gain the trust of end users, who often prefer a transparent reasoning paradigm. Despite recent encouraging progress on deep networks for sequential data modeling, the highly recursive functions these networks compute make the rationales behind their predictions difficult to explain. In this paper, we therefore aim to develop a sequence modeling approach that explains its own predictions by breaking input sequences down into evidencing segments (i.e., sub-sequences) in its reasoning. To this end, we build our model upon convolutional neural networks, which, in their vanilla form, associate local receptive fields with outputs in an obscure manner. To unveil this association, we resort to case-based reasoning and design prototype modules whose units (i.e., prototypes) resemble exemplar segments in the problem domain. Each prediction is obtained by combining the comparisons between the prototypes and the segments of an input. To enhance interpretability, we propose a training objective that carefully adapts the distribution of prototypes to the data distribution in latent space, and design an algorithm that maps prototypes to human-understandable segments. Through extensive experiments in a variety of domains, we demonstrate that our model generally achieves high interpretability, together with accuracy competitive with state-of-the-art approaches.
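To make the reasoning paradigm concrete, below is a minimal sketch of how such a prototype module could sit on top of a convolutional encoder. It is assembled from the abstract alone: the class name, shapes, the log-based distance-to-similarity transform, and the Gaussian-kernel MMD stand-in for the adaptation regularization are illustrative assumptions, not the paper's exact design.

```python
# Minimal PyTorch sketch of the pipeline the abstract describes: a 1-D
# convolutional encoder turns each local receptive field (segment) into a
# latent vector, learnable prototypes are compared against every segment,
# and the best-match similarities are linearly combined into class scores.
import torch
import torch.nn as nn


class PrototypeSequenceClassifier(nn.Module):
    def __init__(self, in_channels, hidden_dim, num_prototypes, num_classes,
                 kernel_size=5):
        super().__init__()
        # Convolutional encoder: one latent vector per segment of the input.
        self.encoder = nn.Conv1d(in_channels, hidden_dim, kernel_size,
                                 padding=kernel_size // 2)
        # Prototypes live in the same latent space as the segment embeddings.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, hidden_dim))
        # Class scores are a linear combination of prototype similarities.
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, x):
        # x: (batch, in_channels, seq_len)
        z = self.encoder(x).transpose(1, 2)          # (batch, seq_len, hidden)
        protos = self.prototypes.unsqueeze(0).expand(z.size(0), -1, -1)
        dist = torch.cdist(z, protos) ** 2           # (batch, seq_len, protos)
        # For each prototype, keep its best-matching segment in the sequence.
        min_dist = dist.min(dim=1).values            # (batch, protos)
        # Distance-to-similarity transform common in prototype networks
        # (an assumption here, not confirmed by the paper).
        sim = torch.log((min_dist + 1.0) / (min_dist + 1e-4))
        return self.classifier(sim), min_dist


def mmd_penalty(prototypes, segments, sigma=1.0):
    """Squared-MMD term (Gaussian kernel) that pulls the prototype
    distribution toward the distribution of segment embeddings; a hedged
    stand-in for the adaptation regularization the abstract mentions."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2.0 * sigma ** 2))
    return (k(prototypes, prototypes).mean()
            + k(segments, segments).mean()
            - 2.0 * k(prototypes, segments).mean())
```

In this sketch, interpretability would be completed after training by projecting each prototype onto its nearest training segment in latent space, so every prototype can be displayed as a real, human-understandable sub-sequence.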


Cited By

• (2024) Prototype-Based Interpretable Graph Neural Networks. IEEE Transactions on Artificial Intelligence 5(4), 1486-1495. DOI: 10.1109/TAI.2022.3222618
• (2024) Importance Sampling to Learn Vasopressor Dosage to Optimize Patient Mortality in an Interpretable Manner. 2024 IEEE International Conference on Big Data (BigData), 7530-7539. DOI: 10.1109/BigData62323.2024.10825148
• (2024) Exploration of an intrinsically explainable self-attention based model for prototype generation on single-channel EEG sleep stage classification. Scientific Reports 14(1). DOI: 10.1038/s41598-024-79139-y
• (2023) Interpretable Skill Learning for Dynamic Treatment Regimes through Imitation. 2023 57th Annual Conference on Information Sciences and Systems (CISS), 1-6. DOI: 10.1109/CISS56502.2023.10089648
• (2022) Deep Federated Anomaly Detection for Multivariate Time Series Data. 2022 IEEE International Conference on Big Data (Big Data), 1-10. DOI: 10.1109/BigData55660.2022.10064694

    Published In

    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    October 2021
    4966 pages
    ISBN:9781450384469
    DOI:10.1145/3459637

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. interpretation
    3. sequence modeling

    Qualifiers

    • Research-article

    Conference

    CIKM '21

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
