Clinical intelligence: New machine learning techniques for predicting clinical drug response
Introduction
Machine learning (ML) techniques have been successfully applied to many real-world biological problems [[1], [2], [3], [4], [5], [6], [7], [8]]. In particular, ML techniques can improve prediction performance or yield accurate predictions for a given task [9]. When applied to clinical informatics, ML methods offer promising artificial intelligence solutions and could improve the cancer drug discovery process in the coming years. Fig. 1 reports the increasing number of ML publications in clinical informatics since 2014, demonstrating the strong interest in solutions that incorporate ML methods.
The primary goal of cancer research is to discover the most effective treatment for each cancer patient. Patients respond differently to a given treatment because of (1) external factors, such as tobacco use and unhealthy diet, and (2) internal factors, such as cancer cell heterogeneity and immune conditions. As the number of cancer patients worldwide increases every year, correctly predicting whether a cancer is sensitive (responding) or resistant (non-responding) to a specific drug, a task also called predicting the clinical drug response, will be of significant interest to clinicians and caregivers [10].
Data-driven approaches that employ machine learning for cancer drug sensitivity prediction fall into three main groups: supervised approaches, transfer learning approaches, and ranking approaches. Supervised approaches perform drug sensitivity prediction using labeled clinical data, and several have been developed. For example, Turki et al. [10] proposed a supervised approach that models drug sensitivity prediction as a link prediction problem. First, the approach takes gene expression data of cancer patients as input. Second, a feature learning technique is applied to the expression data to generate a new feature representation. Third, feature selection and instance selection are performed via statistical leverage scores and active learning, yielding data with fewer features and fewer examples. Finally, a machine learning algorithm is applied to the reduced data to generate drug sensitivity predictions. Geeleher et al. [11] proposed a supervised approach that uses microarray data from cancer cell lines as training data and microarray data from cancer clinical trials as testing data. The training and testing data are first processed and homogenized; a machine learning algorithm is then trained on the training data, and the resulting model is applied to the testing data to generate in-vivo drug sensitivity predictions. Majumder et al. [12] proposed CANScript, a supervised approach that processes data from colorectal cancer (CRC) and head and neck squamous cell carcinoma (HNSCC) tumors of 109 patients. A machine learning algorithm is applied to these 109 patients' data to obtain a model, which is then applied to test data from 55 cancer patients to generate drug sensitivity predictions. Basu et al. proposed a supervised approach employing a weighted version of elastic net [13] for cancer drug sensitivity prediction. Other supervised approaches have also been proposed for this task [14,15].
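The statistical leverage scores mentioned for Turki et al. [10] can be illustrated with a short Python sketch. It assumes the scores are used to rank features (columns of a patients × genes matrix), computes them from the top-k right singular vectors, and keeps the highest-scoring columns deterministically. The rank `k`, the cutoff `n_keep`, and the helper names are illustrative choices, not details taken from [10].

```python
import numpy as np


def leverage_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Feature (column) leverage scores of X with respect to its top-k right singular subspace.

    X has shape (n_samples, n_features): rows are patients, columns are genes.
    The scores are non-negative and sum to 1.
    """
    if not 1 <= k <= min(X.shape):
        raise ValueError("k must be between 1 and min(n_samples, n_features)")
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_k = vt[:k].T  # (n_features, k): top-k right singular vectors as columns
    return np.sum(v_k ** 2, axis=1) / k


def select_features(X: np.ndarray, k: int, n_keep: int) -> np.ndarray:
    """Return the sorted column indices of the n_keep highest-leverage features."""
    scores = leverage_scores(X, k)
    top = np.argsort(scores)[::-1][:n_keep]
    return np.sort(top)
```

A reduced matrix is then `X[:, select_features(X, k=5, n_keep=10)]`. Randomized sampling proportional to the scores is a common alternative to the deterministic top-`n_keep` cut used here.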
Transfer learning approaches use auxiliary data from a related task B to improve prediction performance, or to make accurate predictions, in a target task A [1,[16], [17], [18]]. Turki et al. [1] proposed transfer learning approaches to improve drug sensitivity prediction in multiple myeloma patients. These approaches employ auxiliary data from a related task, such as drug sensitivity prediction for breast cancer patients, to improve prediction accuracy for multiple myeloma patients. Several in-vitro datasets from related tasks were used to build models and perform in-vivo drug sensitivity predictions for multiple myeloma patients. More recently, Turki et al. [18] proposed another transfer learning approach for cancer drug sensitivity prediction based on Procrustes analysis and mean shift. This approach transforms the auxiliary data of a related task into a representation closer to that of the target-task data. A machine learning algorithm is then applied to the combination of the transformed auxiliary data and the target training data, yielding a model that generates predictions on the target test set.
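The "mean shift" part of the representation change in [18] can be sketched in Python under a simplifying assumption: here it is read as translating the auxiliary samples so that their feature-wise mean matches that of the target training set. The Procrustes step is omitted, logistic regression stands in for the unspecified machine learning algorithm, and the function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def mean_shift_align(X_aux: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """Translate auxiliary samples so their feature-wise mean equals the target training mean."""
    return X_aux - X_aux.mean(axis=0) + X_train.mean(axis=0)


def fit_with_auxiliary(X_train, y_train, X_aux, y_aux):
    """Fit one model on target training data plus mean-shifted auxiliary data."""
    X = np.vstack([X_train, mean_shift_align(X_aux, X_train)])
    y = np.concatenate([y_train, y_aux])
    return LogisticRegression(max_iter=1000).fit(X, y)
```

The fitted model is then applied to the target test set with `model.predict(X_test)`.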
The work described here differs from Refs. [1,18] in three ways. (1) We present two new transfer learning approaches. The first approach includes a transfer mechanism augmented with a boosting technique, which improves knowledge transfer by excluding auxiliary data that are not relevant to the target task. The second approach employs only the transfer mechanism of the first approach, without the boosting technique. (2) We report additional performance results under the transfer learning and domain adaptation settings, using only in-vivo clinical data to train models and generate predictions. (3) We evaluate existing transfer learning algorithms [19,20] and compare them with the proposed approaches.
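The excerpt does not spell out the boosting-based transfer mechanism of the first approach, so the sketch below instead shows TrAdaBoost [19], the published boosting-for-transfer algorithm the paper uses as a baseline, to illustrate how boosting can progressively discount auxiliary samples that disagree with the target task. It is a minimal Python version assuming binary 0/1 drug sensitivity labels, scikit-learn decision stumps as weak learners, and an error clip at 0.499; `tradaboost_fit` and `tradaboost_predict` are hypothetical helper names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tradaboost_fit(X_aux, y_aux, X_tgt, y_tgt, n_rounds=20):
    """TrAdaBoost-style training with decision stumps; labels must be 0/1."""
    X = np.vstack([X_aux, X_tgt])
    y = np.concatenate([y_aux, y_tgt])
    n = len(y_aux)  # auxiliary samples come first
    w = np.ones(len(y))
    # Fixed multiplier used to down-weight misclassified auxiliary samples.
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        miss = (h.predict(X) != y).astype(float)  # 1.0 where h is wrong
        # Weighted error measured on the target portion only.
        eps = np.sum(w[n:] * miss[n:]) / np.sum(w[n:])
        eps = float(np.clip(eps, 1e-10, 0.499))  # keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta_aux ** miss[:n]   # shrink auxiliary samples h gets wrong
        w[n:] *= beta_t ** (-miss[n:])  # grow target samples h gets wrong
        learners.append(h)
        betas.append(beta_t)
    return learners, np.array(betas)


def tradaboost_predict(learners, betas, X):
    """Weighted vote of the learners from the second half of the rounds."""
    start = (len(learners) - 1) // 2  # 0-based index of round ceil(N/2)
    alphas = -np.log(betas[start:])
    votes = np.array([h.predict(X) for h in learners[start:]])
    return (alphas @ votes >= 0.5 * alphas.sum()).astype(int)
```

Auxiliary samples that the weak learners keep misclassifying see their weights shrink geometrically, so they contribute less and less to later rounds; this is the sense in which boosting can "exclude" irrelevant auxiliary data.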
Ranking approaches employ data-driven techniques to identify the most effective drugs for a given cancer. Costello et al. [21] assessed the performance of many data-driven techniques for ranking the most effective drugs for each breast cancer cell line. The training data consisted of 53 breast cancer cell lines, each associated with 28 drugs ordered from the most effective drug (ranked first) to the least effective (ranked last). The goal was to find a data-driven technique capable of correctly ranking the drugs for each cell line in a test set of 18 breast cancer cell lines. All prediction algorithms were evaluated against the ground truth using a weighted probabilistic concordance index (wpc-index) and Spearman correlations. The best-performing technique employed a supervised algorithm that coupled a new feature representation with probabilistic nonlinear regression. The second-best technique used random forest regression. The predictions of the remaining algorithms were not statistically significant.
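Of the two evaluation measures mentioned, Spearman correlation is simple to reproduce. The hypothetical helper below averages, over cell lines, the Spearman correlation between the ground-truth drug ranks and the predicted ranks; it assumes both use the same orientation (1 = most effective) and does not implement the wpc-index.

```python
import numpy as np
from scipy.stats import spearmanr


def mean_spearman(true_ranks, pred_ranks) -> float:
    """Average Spearman correlation between true and predicted drug ranks.

    Each element of true_ranks / pred_ranks holds the ranks of the same drugs
    for one cell line (1 = most effective).
    """
    rhos = []
    for truth, pred in zip(true_ranks, pred_ranks):
        rho, _ = spearmanr(truth, pred)  # the p-value is ignored here
        rhos.append(rho)
    return float(np.mean(rhos))
```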
Although supervised algorithms can achieve highly accurate predictions, they require a sizeable set of positive and negative training examples, which in turn entails high costs for cancer drug sensitivity screening. Ranking approaches face a similar cost challenge, because each cancer cell line must be screened against several drug compounds. Moreover, obtaining labeled clinical data from patients may raise ethical issues. In contrast to supervised algorithms, transfer learning approaches can employ auxiliary data from related tasks to improve prediction performance, or to make accurate predictions, in a target task of cancer drug sensitivity prediction. However, this requires computational techniques with effective knowledge transfer mechanisms.
Contributions. The main contributions of this paper are as follows.
- We propose new transfer learning approaches for the clinical informatics domain, enabling state-of-the-art machine learning algorithms to achieve high performance on several real clinical datasets of patients with multiple myeloma, triple-negative breast cancer, and breast cancer.
- The proposed approaches adopt modified versions of boosting and advanced transfer learning algorithms [19,22], which are applied here to the clinical informatics domain for the first time.
- Unlike previous works, we evaluate the proposed approaches using several performance measures against baselines, including existing transfer learning algorithms such as TrAdaBoost and CORAL-SVM [19,20], based solely on in-vivo data.
- We perform an empirical study to demonstrate the predictive accuracy of the proposed transfer learning approaches. Experimental results show the effectiveness and superior performance of the proposed approaches compared with the baselines.
Organization. The rest of this paper is organized as follows. Section 2 reviews existing research related to this paper. Section 3 describes in detail the proposed transfer learning approaches. Section 4 presents experimental results, comparing the proposed approaches against the baselines. Section 5 provides a discussion of the results. Section 6 concludes the paper and suggests directions of future work.
Related work
There are two main bodies of research related to our work: correlation alignment for unsupervised domain adaptation (CORAL), particularly CORAL-SVM, and boosting for transfer learning (TrAdaBoost) [19,20].
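For reference, the core CORAL transformation (whiten the source features with the regularized source covariance, then re-color them with the regularized target covariance) can be sketched in Python as follows. The regularization weight `lam` (1.0 mirrors adding an identity matrix, as in the original CORAL formulation), the choice of scikit-learn's `LinearSVC`, and the helper names are assumptions for illustration rather than details from this paper.

```python
import numpy as np
from sklearn.svm import LinearSVC


def _sym_power(C: np.ndarray, power: float) -> np.ndarray:
    """Matrix power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** power) @ vecs.T


def coral(X_source: np.ndarray, X_target: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Re-color source features so their covariance approximately matches the target's."""
    d = X_source.shape[1]
    C_s = np.cov(X_source, rowvar=False) + lam * np.eye(d)
    C_t = np.cov(X_target, rowvar=False) + lam * np.eye(d)
    # Whiten with C_s^{-1/2}, then re-color with C_t^{1/2}.
    return X_source @ _sym_power(C_s, -0.5) @ _sym_power(C_t, 0.5)


def coral_svm(X_source, y_source, X_target, lam=1.0):
    """Train a linear SVM on CORAL-aligned source data (target labels are not used)."""
    return LinearSVC().fit(coral(X_source, X_target, lam), y_source)
```

CORAL aligns only second-order statistics and does not re-center the means, so features are usually standardized before the transformation is applied.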
The first transfer learning approach (PT1)
Fig. 2 shows the flowchart of the first transfer learning approach, which takes three inputs: a target training set consisting of tumor samples and their corresponding drug sensitivity labels; an auxiliary dataset consisting of tumor samples and their corresponding drug sensitivity labels from a related task; and a target test set consisting of unseen tumor samples. The first proposed approach, named
Experiments and results
In this section, we first describe the datasets used in this study. Then, we present our experimental methodology. Finally, we compare the proposed approaches against several baselines using different performance measures.
Discussion
Our first proposed transfer learning approach (PT1) transforms the auxiliary data into a representation closer to the target training set and includes the transformed data in the input to a machine learning algorithm, which produces a prediction model. If the transformed auxiliary data are not closer to the target training set, PT1 abstains from incorporating them into the target training data. Consequently, only the target training data are provided as input to a machine learning
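The abstention rule can be illustrated with a small Python sketch. The excerpt does not state how "closer" is measured, so this hypothetical version assumes the Euclidean distance between feature-wise means (centroids) and compares it before and after the transformation; any other closeness measure could be substituted.

```python
import numpy as np


def centroid_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Euclidean distance between the feature-wise means of two sample sets."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))


def choose_training_data(X_train, y_train, X_aux, X_aux_transformed, y_aux):
    """Add transformed auxiliary data only if the transformation moved it closer to the target."""
    before = centroid_distance(X_aux, X_train)
    after = centroid_distance(X_aux_transformed, X_train)
    if after < before:
        X = np.vstack([X_train, X_aux_transformed])
        y = np.concatenate([y_train, y_aux])
        return X, y
    # Abstain: the transformed auxiliary data are not closer, so train on target data only.
    return X_train, y_train
```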
Conclusion and future work
We propose two new transfer learning approaches to accurately predict the clinical drug response of patients with different cancer types, including breast cancer, triple-negative breast cancer, and multiple myeloma. The first proposed approach works by transferring knowledge of auxiliary data from a related task to a target task of drug sensitivity prediction. The second approach includes an abstention mechanism, whose aim is to abstain from including those tumor samples that do not have closer
Conflicts of interest
None declared.
Acknowledgements
This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. G-121-611-39. The authors therefore gratefully acknowledge the DSR for its technical and financial support.
References (66)
Prediction of anti-cancer drug response by kernelized multi-task learning. Artif. Intell. Med. (2016).
Machine learning models to predict the progression from early to late stages of papillary renal cell carcinoma. Comput. Biol. Med. (2018).
Machine-learning-based classification of real-time tissue elastography for hepatic fibrosis in patients with chronic hepatitis B. Comput. Biol. Med. (2017).
A survey of machine learning applications in HIV clinical research and care. Comput. Biol. Med. (2017).
Machine learning ensemble modelling to classify caesarean section and vaginal delivery types using Cardiotocography traces. Comput. Biol. Med. (2018).
Replicating human expertise of mechanical ventilation waveform analysis in detecting patient-ventilator cycling asynchrony using machine learning. Comput. Biol. Med. (2018).
Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet (2003).
A randomized phase 2 study of erlotinib alone and in combination with bortezomib in previously treated advanced non-small cell lung cancer. J. Thorac. Oncol. (2009).
Gene expression profiling and correlation with outcome in clinical trials of the proteasome inhibitor bortezomib. Blood (2007).
Assessment of an RNA interference screen-derived mitotic and ceramide pathway metagene as a predictor of response to neoadjuvant paclitaxel for primary triple-negative breast cancer: a retrospective analysis of five clinical trials. Lancet Oncol. (2010).
Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci.
Anti-cancer drug response prediction using neighbor-based collaborative filtering with global effect removal. Mol. Ther. Nucleic Acids.
A landscape of pharmacogenomic interactions in cancer. Cell.
A survey of multi-source domain adaptation. Inf. Fusion.
Transfer learning approaches to improve drug sensitivity prediction in multiple myeloma patients. IEEE Access.
Deep learning applications in medical image analysis. IEEE Access.
Foundations of Machine Learning.
A link prediction approach to cancer drug sensitivity prediction. BMC Syst. Biol.
Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol.
Predicting clinical response to anticancer drugs using an ex vivo platform that captures tumour heterogeneity. Nat. Commun.
RWEN: response-weighted elastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics.
Learning approaches to improve prediction of drug sensitivity in breast cancer patients.
Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One.
A survey of transfer learning. J. Big Data.
A survey on transfer learning. IEEE Trans. Knowl. Data Eng.
A transfer learning approach via Procrustes analysis and mean shift for cancer drug sensitivity prediction. J. Bioinf. Comput. Biol.
Boosting for transfer learning.
Correlation Alignment for Unsupervised Domain Adaptation. Domain Adaptation in Computer Vision Applications.
A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol.
A decision-theoretic generalization of on-line learning and an application to boosting.
Reverse engineering gene regulatory networks using sampling and boosting techniques.
Adjuvant docetaxel or vinorelbine with or without trastuzumab for breast cancer. N. Engl. J. Med.
Bortezomib/docetaxel combination therapy in patients with anthracycline-pretreated advanced/metastatic breast cancer: a phase I/II dose-escalation study. Br. J. Canc.
Turki Turki received the B.S. degree in computer science from King Abdulaziz University, the M.S. degree in computer science from NYU.POLY, and the Ph.D. degree in computer science from the New Jersey Institute of Technology. He is currently an assistant professor with the Department of Computer Science, King Abdulaziz University, Saudi Arabia. His research interests include algorithms, machine learning, data mining, big data analytics, sustainable computing, health informatics, bioinformatics, computational biology, and social networks. His works have been published in journals such as BMC Genomics, BMC Systems Biology, IEEE Access, BioMed Research International, Journal of Bioinformatics and Computational Biology, Computers in Biology and Medicine, and Genes. He received a distinction award from the Deanship of Scientific Research at King Abdulaziz University. He is supported by King Abdulaziz University and is currently working on several biomedicine-related projects. Dr. Turki has served on the program committees of several international conferences. He is also an editorial board member of Computers in Biology and Medicine and Sustainable Computing: Informatics and Systems.
Jason T. L. Wang received the Ph.D. degree in computer science from the Courant Institute of Mathematical Sciences at New York University. He is a Professor of Computer Science at New Jersey Institute of Technology, and Director of the University's Data and Knowledge Engineering Laboratory. Dr. Wang's research interests include data science and computational biomedicine. He has published 9 books and over 150 refereed journal and conference papers in these areas. Dr. Wang has served on the program committees of over 200 national and international conferences, and on the editorial boards of several journals including ACM Transactions on Knowledge Discovery from Data (TKDD).