DOI: 10.1145/3307339.3342189
short-paper

Model-Agnostic Interpretation of Cancer Classification with Multi-Platform Genomic Data

Published: 04 September 2019

Abstract

Machine learning models are often criticised for being black boxes. Recent work in this field has aimed to address this criticism by developing methods to explain the underlying behaviour of machine learning models. These explanations are designed to help the end-user interpret how the model's input features are used to make a prediction. Here, we present an extension to one such method, local interpretable model-agnostic explanations, to interpret multimodal tumor type classification from multi-platform genomic data. We propose a framework for transparent biomedical machine learning that leverages interpretable dimensionality reduction to facilitate gene-wise explanations of the model behaviour. Using RNA-seq expression and single nucleotide variation (SNV) data from eight cancer types, our experimental results uncovered the model's use of clinically relevant genes for cancer cell stratification. We demonstrate that model-agnostic explanations can provide valuable information to a clinician or scientist when predictive ability and interpretability are of absolute importance.
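The general idea behind a local, model-agnostic explanation can be illustrated with a minimal sketch. This is not the authors' pipeline: it leaves out the interpretable dimensionality reduction and the mapping back to genes, and the function names, the Gaussian perturbation scheme, the kernel width, and the toy black-box model are all assumptions made for illustration. It perturbs one instance, queries the black box, weights the perturbed samples by proximity, and fits a weighted linear surrogate whose coefficients serve as local feature importances.

```python
import numpy as np


def explain_locally(predict_fn, x, n_samples=2000, noise_scale=0.5,
                    kernel_width=1.0, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around x; return its coefficients."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (an assumed sampling scheme).
    samples = x + noise_scale * rng.standard_normal((n_samples, x.size))
    # Query the black-box model on the perturbed points.
    targets = predict_fn(samples)
    # Weight each sample by an exponential kernel on its distance to x.
    dist = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Weighted ridge regression with an unpenalised intercept column.
    design = np.hstack([np.ones((n_samples, 1)), samples - x])
    penalty = ridge * np.eye(design.shape[1])
    penalty[0, 0] = 0.0
    weighted_design = design * weights[:, None]
    coef = np.linalg.solve(design.T @ weighted_design + penalty,
                           weighted_design.T @ targets)
    return coef[1:]  # one local importance value per input feature


def toy_black_box(batch):
    """Hypothetical classifier score that depends on features 0 and 2 only."""
    logits = 3.0 * batch[:, 0] - 2.0 * batch[:, 2]
    return 1.0 / (1.0 + np.exp(-logits))


instance = np.zeros(5)
importance = explain_locally(toy_black_box, instance)
# Indices of the two features with the largest absolute local importance.
top_two = set(np.argsort(-np.abs(importance))[:2].tolist())
print(top_two, np.round(importance, 3))
```

In this toy setting the surrogate should rank the two features the black box actually uses above the three it ignores. With genomic inputs, the same weighting-and-fitting step would presumably operate on a reduced, interpretable representation rather than on raw gene-level features.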

      Published In

      BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
      September 2019
      716 pages
      ISBN:9781450366663
      DOI:10.1145/3307339
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cancer detection
      2. information fusion
      3. machine learning
      4. model interpretation

      Qualifiers

      • Short-paper

      Acceptance Rates

      BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;
      Overall Acceptance Rate 254 of 885 submissions, 29%

      Cited By

      • (2023) Interpreting Deep Machine Learning Models: An Easy Guide for Oncologists. IEEE Reviews in Biomedical Engineering 16 (192-207). DOI: 10.1109/RBME.2021.3131358. Online publication date: 2023
      • (2023) Bladder cancer gene expression prediction with explainable algorithms. Neural Computing and Applications 36:4 (1585-1597). DOI: 10.1007/s00521-023-09142-3. Online publication date: 11-Nov-2023
      • (2022) Explainable and secure artificial intelligence: taxonomy, cases of study, learned lessons, challenges and future directions. Enterprise Information Systems 17:9. DOI: 10.1080/17517575.2022.2098537. Online publication date: 26-Jul-2022
      • (2022) A Review of Framework for Machine Learning Interpretability. Augmented Cognition (261-272). DOI: 10.1007/978-3-031-05457-0_21. Online publication date: 16-Jun-2022