Precise prediction of multiple anticancer drug efficacy using multi target regression and support vector regression analysis

https://doi.org/10.1016/j.cmpb.2022.107027

Highlights

  • Enhanced multi-target regression and support vector regression were found to be the most effective and accurate methods for selecting anticancer drugs in a personalized manner, based on each patient's tumor features.

  • Increasing the number of training samples and applying statistical feature engineering improve the robustness of the model.

  • EL_MTR is the best to predict multiple anticancer drug efficacies and improve the accuracy of ranking drugs, irrespective of sample size.

  • ELM_SVR performs better than other MTR models with a large sample size and precise ranking process.

Abstract

Background and objectives

Predicting the efficacy of multiple drugs with machine learning, based on the clinical and molecular attributes of tumors, is a new approach in precision oncology. Suitable and effective therapeutic drugs are selected computationally from among the candidate drugs by considering the tumor features. In this study, we developed and validated machine learning models to predict the efficacy of five anticancer drugs from the clinical and molecular attributes of a cohort of 30 oral squamous cell carcinoma (OSCC) samples. The drugs were ranked by their apoptotic priming.

Methods

We developed multiple drug efficacy prediction models based on three types of tumor characteristics by applying machine learning methods, including multi-target regression (MTR) and support vector regression (SVR). The prediction accuracy of existing machine learning methods was enhanced by introducing novel pre-processing techniques to develop Enhanced MTR (E_MTR), Enhanced Log-based MTR (EL_MTR), Enhanced Multi-target SVR (EM_SVR), and Enhanced Log-based Multi-target SVR (ELM_SVR). As a unique capability, ELM_SVR and EL_MTR rank the drugs based on their predicted efficacy. All the drug efficacy prediction models were built using real OSCC samples and theoretical samples. The best model was selected based on dataset size and evaluation metrics, such as error terms, residuals, and parameter tuning, and was cross-validated (CV) using 30 real samples and 340 theoretical samples.

Results

When 30 real tumor samples were used for the train-test and CV methods, MTR models predicted the efficacy with less error than SVR models. By comparison, when 340 theoretical samples were used for the train-test and CV methods, MTR performance improved, but SVR predicted the efficacy with zero error. We found that, for small samples, the proposed MTR gave a difference of 0.01 between the actual and predicted apoptotic priming of the five drugs. For large samples, the values predicted by the proposed SVR differed from the actual values by 0.00001. The error terms (actual vs. predicted) also reveal that the enhanced log model is suitable when MTR is applied, whereas the enhanced model is suitable for SVR learning in multiple drug efficacy prediction. The predicted ranks of the drugs based on the multi-target efficacy prediction exactly matched the actual rankings.

Conclusion

We developed efficient statistical and machine learning models using MTR and SVR analysis to predict anticancer drug efficacy, which will be useful in the field of precision medicine for choosing the most suitable drugs in a personalized manner. The performance results of the proposed enhanced ranking techniques are as follows: i) EL_MTR is the best to predict multiple anticancer drug efficacies and improve the accuracy of ranking drugs, irrespective of sample size; and ii) ELM_SVR performs better than other MTR models with a large sample size and precise ranking process.

Introduction

Oral cancer is the sixth most common cancer worldwide, with more than 300,000 deaths per year [28,29], affecting 16.1% of men and 10.4% of women [38]. Relapsed oral cancer is challenging to treat because of its aggressiveness and the limited number of chemotherapeutic agents available. Treatment requirements vary between patients, mainly because of diverse genetic features and environmental factors [37]. Early diagnosis of OSCC and prediction-based treatment may reduce the metastasis rate and thereby improve the survival rate [46,47,48]. Hence, the need for precision medicine has emerged. While therapeutic decisions are made based on tumor location, cytogenetics, and histology [34], lymph node status and histological grade have fallen short in drug response prediction [33]. Alongside large-scale clinical trials, computational predictions are effective for their precision and for forecasting drug sensitivity [29,32,49,50]. Drug sensitivity is determined by machine learning methods using genomic and therapeutic features of cancer cells [31,39], with datasets such as the Cancer Target Discovery and Development screening database [30] and Genomics of Drug Sensitivity in Cancer [39]. Earlier prediction using machine learning methods [36] and sensitivity analysis through statistical methods [34,35] have been shown to provide accurate drug efficacy predictions. Moreover, a recent review discusses the use of ML techniques for five major medical applications in depth, providing guidance for clinicians, researchers, and decision-makers [40].

Machine learning methods must be carefully selected based on the data and the objective of the application because, according to the ‘no free lunch’ theorem, there is no single best algorithm for predictive modeling [20]. Redundancy, noise, and the multivariate, compound dependencies between input and target variables are challenges in building a good predictive model [21]. Multi-target regression methods provide an effective model based on the mapping of features to target variables and, therefore, guarantee better representation and interpretability of different data [22,23,48]. Further, multi-target methods pave the way to simpler models with enhanced computational efficiency [22].

One study applied deep learning to build an optimized model for cancer classification applications. During pre-processing, log-transformation was performed to convert the data to a uniform range, which subsequently yielded interpretable patterns and helped to reduce skew [24]. Biological characteristics of the molecular subtypes were used to learn deep features of the subtype distribution of a patient's samples [25]. DNA microarray cancer data classification was done using a type-2 fuzzy system and c-means clustering, which handled noisy, outlier-prone, non-linear gene expression profiles [26]. However, the expected classification accuracy was not achieved, and computational complexity increased because of irrelevant, limited, and noisy genes in the database [27]. Later work used feature selection to avoid this issue [26]. A few studies used imputation to handle missing data. They avoided overfitting through bootstrap-based weighting, verified model calibration and discrimination, reduced bias, and validated their models with external data [49,50].
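The effect of a log-transformation on skewed data can be illustrated with a minimal sketch. It assumes a log(1 + x) transform and made-up expression-like values; neither the exact transform nor the data come from the cited study [24], and the function names are hypothetical.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Apply log(1 + x) to each value (an assumed choice; requires x > -1)."""
    return [math.log1p(v) for v in values]


# Made-up, right-skewed values used only for illustration.
raw = [1.0, 2.0, 3.0, 5.0, 8.0, 40.0, 200.0]
transformed = log_transform(raw)

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(transformed))
```

For this toy input the transformed values are noticeably less skewed than the raw ones, which is the property the pre-processing step relies on.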

Recently, we employed Multi-Linear Regression (MLR), modified MLR-Weighted Least Squares (MLR-WLS), and Enhanced MLR-WLS for the prediction of anticancer drug efficacy [1]. For this analysis, we used real data and theoretical samples with clinical, cellular, and molecular attributes of tumors to predict general efficacy. To predict the efficacy of multiple drugs individually, we developed and employed MTR and SVR-based models for 30 real samples and 340 theoretical samples. MTR models were found to be best for small datasets, while SVR models were best for large datasets; cross-validation was performed to demonstrate the models’ robustness. The models were then used to predict the apoptotic sensitivity of five drugs, namely Paclitaxel, Vincristine, Daunorubicin, Vinblastine, and Doxorubicin.
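How predicted priming values could be turned into a drug ranking and compared with the actual ranking can be sketched as follows. The priming values are invented for illustration, only three of the five drugs are shown for brevity, and ranking from highest to lowest priming is an assumption rather than a detail stated in the text.

```python
def rank_drugs(priming_by_drug):
    """Return drug names ordered from highest to lowest priming value.

    Assumption: a higher apoptotic priming value means a higher rank.
    """
    return sorted(priming_by_drug, key=priming_by_drug.get, reverse=True)


# Hypothetical actual and predicted priming values for one tumor sample.
actual = {"Paclitaxel": 0.62, "Vincristine": 0.48, "Doxorubicin": 0.71}
predicted = {"Paclitaxel": 0.61, "Vincristine": 0.49, "Doxorubicin": 0.70}

actual_rank = rank_drugs(actual)
predicted_rank = rank_drugs(predicted)

# The ranking is reproduced when both orderings are identical.
ranks_match = actual_rank == predicted_rank
print(predicted_rank, ranks_match)
```

Small prediction errors leave the ranking unchanged as long as they do not reorder the drugs, which is why ranking agreement is a useful check alongside error metrics.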

Section snippets

Materials and methods

In this work, data from a cohort of 30 oral squamous cell carcinoma (OSCC) samples were used [Robert et al., 2018]. Molecular (drug response gene expression), cellular (apoptotic priming), and clinical (age, sex, tumor grade, tumor stage, and clinical stage) attributes were used as the dataset for developing the drug efficacy prediction models (Table 1). Further, 340 theoretical samples were used to improve the prediction performance of the proposed models. Statistical analysis of necessary

Machine learning models

Multi-target regression and the widely used support vector machine technique were applied to the pre-processed data to predict multidrug efficacy.
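The idea of predicting several targets from one shared set of features can be sketched with ordinary least squares solved for all target columns at once. This is a simplified stand-in, not the PLSR-based MTR or the SVR models of this work; the data are synthetic and every function name is hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination; b may have several columns."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_multi_target(x_rows, y_rows):
    """Least-squares coefficients (intercept first), one column per target."""
    design = [[1.0] + list(row) for row in x_rows]
    p, t = len(design[0]), len(y_rows[0])
    # Normal equations: (X^T X) B = X^T Y, solved once for all targets.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [[sum(r[i] * y[k] for r, y in zip(design, y_rows)) for k in range(t)]
           for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x_row):
    """Predict every target for a single feature row."""
    features = [1.0] + list(x_row)
    return [sum(f * coef[i][k] for i, f in enumerate(features))
            for k in range(len(coef[0]))]


# Synthetic data: two features, two targets generated from exact linear rules
# (target 1 = 1 + 2*x1 + 3*x2, target 2 = 0.5 - x1 + x2).
x_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_rows = [[1.0, 0.5], [3.0, -0.5], [4.0, 1.5], [6.0, 0.5], [8.0, -0.5]]

coef = fit_multi_target(x_rows, y_rows)
prediction = predict(coef, [2.0, 0.0])
print(coef, prediction)
```

Because the feature matrix is shared, a single solve yields the coefficients for every target, which is the "single execution" property attributed to multi-target methods.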

Results and discussions

Two different machine-learning techniques were employed on the data to predict the efficacy of multiple drugs. The components created by PLSR for the MTR models and the residuals were obtained to measure the performance of the MTR models. The values of the SVR parameters were determined to verify the SVR models. The models created by MTR and SVR were evaluated based on the root mean squared error (RMSE), which is the square root of the mean squared difference between actual and predicted priming values [1,2]; the mean squared error is also used [48].
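As a brief illustration of the two error metrics named above, the following sketch computes the mean squared error and its square root, the RMSE, for one drug. The priming values are invented for the example and do not come from the study.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical priming values for one drug across four samples.
actual_priming = [0.50, 0.60, 0.70, 0.80]
predicted_priming = [0.50, 0.60, 0.70, 1.00]

print("MSE: ", mse(actual_priming, predicted_priming))
print("RMSE:", rmse(actual_priming, predicted_priming))
```

RMSE is expressed in the same units as the priming values, which makes it easier to compare with the actual-versus-predicted differences than the MSE.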

Contribution and findings

Common feature processing is applied for two different machine-learning techniques. MTR is a linear regression that can predict multiple target variables in a single execution, whereas SVR can perform linear and nonlinear regression to predict a single target variable. We have succeeded in developing two pre-processing methods that work well for both of these models. Generally, one variable from each highly correlated pair is removed to obtain good model performance. We have used
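The general practice mentioned here, removing one variable from each highly correlated pair, can be sketched as below. It illustrates that common practice only and is not the pre-processing developed in this work; the 0.9 threshold, the feature values, and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Keep features in order, skipping any that correlate too strongly
    (absolute value above the threshold) with a feature already kept."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up clinical features for five samples; clinical_stage duplicates
# tumor_stage, so the pair is perfectly correlated.
features = {
    "age": [45, 70, 38, 60, 52],
    "tumor_stage": [1, 2, 3, 1, 4],
    "clinical_stage": [1, 2, 3, 1, 4],
}

kept_features = drop_correlated(features)
print(kept_features)
```

Which member of a correlated pair is kept depends on the order of the features, so the choice should be reviewed against domain knowledge rather than left to ordering alone.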

Declaration of Competing Interest

There is no conflict of interest.

References (50)

  • F. Miao et al.

    A novel continuous blood pressure estimation approach based on data mining techniques

    IEEE J. Biomed. Health Inform.

    (2017)
  • M.R. Haque et al.

    A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models

    IEEE Access

    (2021)
  • P. Chittora

    Prediction of chronic kidney disease - a machine learning perspective

    IEEE Access

    (2021)
  • F. Yung et al.

    An investigation of demographic and drug-use patterns in fentanyl and carfentanil deaths in Ontario

    Forensic Sci. Med. Pathol.

    (2021)
  • R: A Language and Environment for Statistical Computing

    (2006)
  • D. Nguyen et al.

    Tumor classification by partial least squares using microarray gene expression data

    Bioinformatics

    (2002)
  • J. Kresta et al.

    Multivariate statistical monitoring of process operating performance

    Can. J. Chem. Eng.

    (1991)
  • R. Cramer et al.

    Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins

    J. Am. Chem. Soc.

    (1988)
  • B.-H. Mevik et al.

    The pls package: principal component and partial least squares regression in R

    J. Stat. Softw.

    (2007)
  • V.N. Vapnik

    The Nature of Statistical Learning Theory

    (1995)
  • V.N. Vapnik et al.

    Support vector method for function approximation, regression estimation and signal processing

    Adv. Neural Inf. Process. Syst.

    (1996)
  • S. Gunn

    Support Vector Machines for Classification and Regression

    (1997)
  • D.H. Wolpert et al.

    No free lunch theorems for optimization

    IEEE Trans. Evol. Comput.

    (1997)
  • H. Borchani et al.

    A survey on multi-output regression

    Wiley Interdiscipl. Rev. Data Min. Knowl. Discov.

    (2015)
  • D. Kocev et al.

    Using single- and multitarget regression trees and ensembles to model a compound index of vegetation condition

    Ecol. Model.

    (2009)