short-paper

Model-Agnostic Interpretation of Cancer Classification with Multi-Platform Genomic Data

Authors:

Sanzheng Qiao

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Pages 34 - 41

https://doi.org/10.1145/3307339.3342189

Published: 04 September 2019

Abstract

Machine learning models are often criticised for being black boxes. Recent work in this field has aimed to address this criticism by developing methods to explain the underlying behaviour of machine learning models. These explanations are designed to help the end-user interpret how the model's input features are used to make a prediction. Here, we present an extension to one such method, local interpretable model-agnostic explanations (LIME), to interpret multimodal tumor type classification from multi-platform genomic data. We propose a framework for transparent biomedical machine learning that leverages interpretable dimensionality reduction to facilitate gene-wise explanations of the model's behaviour. Using RNA-seq expression and single nucleotide variation (SNV) data from eight cancer types, our experimental results uncovered the model's use of clinically relevant genes for cancer cell stratification. We demonstrate that model-agnostic explanations can provide valuable information to a clinician or scientist when predictive ability and interpretability are both of absolute importance.
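To make the idea of a local, model-agnostic explanation concrete, the following is a minimal sketch of the generic technique only: perturb one input, query the black box, weight the perturbed samples by proximity, and fit a weighted linear surrogate whose coefficients act as per-feature explanations. It is not the authors' extension and does not include their dimensionality-reduction step. The function names (`explain_locally`, `toy_black_box`), the Gaussian perturbation scheme, the exponential kernel and its width, and the toy five-feature classifier are all assumptions made for illustration.

```python
import numpy as np


def explain_locally(predict_fn, x, n_samples=2000, noise_scale=1.0,
                    kernel_width=None, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate to predict_fn around x.

    Returns (feature_weights, intercept) of the surrogate model.
    All defaults here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed heuristic width

    # 1. Perturb the instance with Gaussian noise.
    Z = x + noise_scale * rng.standard_normal((n_samples, d))

    # 2. Query the black-box model on the perturbed samples.
    y = predict_fn(Z)

    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4. Solve a weighted ridge regression with an unpenalised intercept.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    AtW = A.T * w                      # scales each sample's column by its weight
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0                    # do not shrink the intercept
    coef = np.linalg.solve(AtW @ A + reg, AtW @ y)
    return coef[1:], coef[0]


def toy_black_box(Z):
    """Hypothetical classifier: only features 0 and 3 influence the output."""
    logits = 3.0 * Z[:, 0] - 2.0 * Z[:, 3]
    return 1.0 / (1.0 + np.exp(-logits))


x0 = np.zeros(5)
weights, intercept = explain_locally(toy_black_box, x0, noise_scale=0.3)
ranking = np.argsort(-np.abs(weights))  # features ordered by local importance
```

In this toy setting the surrogate should rank features 0 and 3 highest, with opposite signs, because only they enter the hypothetical classifier. In a genomic setting the features would first need to be mapped to an interpretable representation before such weights could be read gene by gene, which is the part this sketch leaves out.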

Index Terms

Model-Agnostic Interpretation of Cancer Classification with Multi-Platform Genomic Data
1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published In

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 2019

716 pages

ISBN: 9781450366663

DOI: 10.1145/3307339

General Chairs: Xinghua (Mindy) Shi (Temple University, USA), Michael Buck (University of Buffalo, USA)

Program Chairs: Jian Ma (Carnegie Mellon University, USA), Pierangelo Veltri (University Magna Graecia of Catanzaro, Italy)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

BCB '19: 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 7 - 10, 2019

Niagara Falls, NY, USA

Sponsor: SIGBio

Acceptance Rates

BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;

Overall Acceptance Rate 254 of 885 submissions, 29%


Cited By

Amorim J, Abreu P, Fernandez A, Reyes M, Santos J, Abreu M. (2023). Interpreting Deep Machine Learning Models: An Easy Guide for Oncologists. IEEE Reviews in Biomedical Engineering, Vol. 16, 192-207. Online publication date: 2023. https://doi.org/10.1109/RBME.2021.3131358

Kırboğa K. (2023). Bladder cancer gene expression prediction with explainable algorithms. Neural Computing and Applications, Vol. 36, 4, 1585-1597. Online publication date: 11-Nov-2023. https://doi.org/10.1007/s00521-023-09142-3

Eldrandaly K, Abdel-Basset M, Ibrahim M, Abdel-Aziz N. (2022). Explainable and secure artificial intelligence: taxonomy, cases of study, learned lessons, challenges and future directions. Enterprise Information Systems, Vol. 17, 9. Online publication date: 26-Jul-2022. https://doi.org/10.1080/17517575.2022.2098537

de Abreu Araújo I, Hidaka Torres R, Neto N. (2022). A Review of Framework for Machine Learning Interpretability. Augmented Cognition, 261-272. Online publication date: 16-Jun-2022. https://doi.org/10.1007/978-3-031-05457-0_21
