Precise prediction of multiple anticancer drug efficacy using multi target regression and support vector regression analysis

doi:10.1016/j.cmpb.2022.107027

Computer Methods and Programs in Biomedicine

Volume 224, September 2022, 107027

https://doi.org/10.1016/j.cmpb.2022.107027

Highlights

• Enhanced multi-target regression and support vector regression were found to be the most effective and accurate methods for selecting anticancer drugs in a personalized manner, based on the tumor features of the patients.
• Increasing the number of training samples and applying statistical feature engineering improve the robustness of the model.
• EL_MTR is the best model for predicting multiple anticancer drug efficacies and improves the accuracy of drug ranking, irrespective of sample size.
• ELM_SVR performs better than the other MTR models when the sample size is large, and ranks the drugs precisely.

Abstract

Background and objectives

The prediction of multiple drug efficacies using machine learning techniques based on the clinical and molecular attributes of tumors is a new approach in precision oncology. Suitable and effective therapeutic drugs are selected computationally from among the potential drugs by considering the tumor features. In this study, we developed and validated machine learning models to predict the efficacy of five anticancer drugs according to the clinical and molecular attributes of a cohort of 30 oral squamous cell carcinoma (OSCC) samples. The drugs were ranked by their apoptotic priming.

Methods

We developed multiple drug efficacy prediction models based on three types of tumor characteristics by applying machine learning methods, including multi-target regression (MTR) and support vector regression (SVR). The prediction accuracy of existing machine learning methods was enhanced by introducing novel pre-processing techniques to develop Enhanced MTR (E_MTR), Enhanced Log-based MTR (EL_MTR), Enhanced Multi-target SVR (EM_SVR), and Enhanced Log-based Multi-target SVR (ELM_SVR). As a unique capability, ELM_SVR and EL_MTR rank the drugs based on their predicted efficacy. All the drug efficacy prediction models were built using real OSCC samples and theoretical samples. The best model was selected based on dataset size and evaluation metrics, such as error terms, residuals and parameter tuning, and was cross-validated (CV) using 30 real samples and 340 theoretical samples.

Results

When 30 real tumor samples were used for the train-test and CV methods, MTR models predicted the efficacy with less error than SVR models. By comparison, with 340 theoretical samples for the train-test and CV methods, MTR performance improved, but SVR predicted the efficacy with near-zero error. We found that, for small samples, the proposed MTR gave a difference of 0.01 between the actual and predicted apoptotic priming of the five drugs. For large samples, the values predicted by the proposed SVR differed from the actual values by 0.00001. The error terms (Actual vs. Predicted) also show that the enhanced log model is suitable when MTR is applied, whereas the enhanced model is suitable for SVR learning in multiple drug efficacy prediction. The predicted ranks of the drugs based on the multi-targeted efficacy prediction exactly matched the actual rankings.

Conclusion

We developed efficient statistical and machine learning models using MTR and SVR analysis for anticancer drug efficacy, which will be useful in precision medicine for choosing the most suitable drugs in a personalized manner. The performance of the proposed enhanced ranking techniques can be summarized as follows: i) EL_MTR is the best model for predicting multiple anticancer drug efficacies and improves the accuracy of drug ranking, irrespective of sample size; and ii) ELM_SVR performs better than the other MTR models when the sample size is large, and ranks the drugs precisely.

Introduction

Oral cancer is the sixth most common cancer worldwide, with more than 300,000 deaths per year [28,29], affecting 16.1% of men and 10.4% of women [38]. Relapsed oral cancer is challenging to treat because of its aggressiveness and the limited number of chemotherapeutic agents available. Treatment requirements vary between patients, mainly because of diverse genetic features and environmental factors [37]. Early diagnosis of OSCC and prediction-guided treatment may reduce the metastasis rate and thereby improve the survival rate [46], [47], [48]. Hence, the need for precision medicine has emerged. While therapeutic decisions are made based on tumor location, cytogenetics, and histology [34], lymph node status and histological grade have fallen short in drug response prediction [33]. Alongside large-scale clinical trials, computational predictions are effective for precise drug sensitivity forecasting [29,32,49,50]. Machine learning methods determine drug sensitivity from the genomic and therapeutic features of cancer cells [31,39], using datasets such as the Cancer Target Discovery and Development screening database [30] and Genomics of Drug Sensitivity in Cancer [39]. Earlier prediction using machine learning methods [36] and sensitivity analysis through statistical methods [34,35] have been shown to provide accurate drug efficacy prediction. Moreover, a recent review discusses the use of ML techniques for five major medical applications in depth, providing guidance for clinicians, researchers, and decision-makers [40].

Machine learning methods must be carefully selected based on the data and the objective of the application because, according to the ‘no free lunch’ theorem, there is no single best algorithm for predictive modeling [20]. Redundancy, noise, and the multivariate, compound dependencies between input and target variables are challenges in building a good predictive model [21]. Multi-target regression methods provide an effective model based on the mapping of features to target variables and therefore offer better representation and interpretability of different data [22,23,48]. Further, multi-target methods pave the way to simpler models with enhanced computational efficiency [22].

One study applied deep learning to build an optimized model for cancer classification applications. During pre-processing, log-transformation was performed to convert the data to a uniform range, which subsequently yielded interpretable patterns and helped to reduce skew [24]. Biological characteristics of the molecular subtypes were used to learn deep features of the subtype distribution of a patient's samples [25]. DNA microarray cancer data classification was done using a type-2 fuzzy system and c-means clustering, which handled noisy, outlier-prone, non-linear gene expression profiles [26]. However, the expected classification accuracy was not achieved, and computational complexity increased because of irrelevant, limited, and noisy genes in the database [27]. Later, another researcher used feature selection to avoid this issue [26]. A few studies used imputation to handle missing data. Overfitting was avoided by weighting through the bootstrapping method, and model calibration and discrimination were verified. These studies reduced bias and validated their models with external data [49,50].
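To make the effect of a log-transformation during pre-processing concrete, the sketch below compresses a skewed set of values into a narrower range and then restores them. It is a minimal illustration only: the use of `log1p` (so that zeros are handled), the function names, and the expression values are assumptions, not details taken from the cited study or from the enhanced log-based models.

```python
import math

def log_transform(values):
    """Apply log(1 + v) to each value; assumes non-negative inputs."""
    if any(v < 0 for v in values):
        raise ValueError("log1p pre-processing here assumes non-negative values")
    return [math.log1p(v) for v in values]

def inverse_log_transform(values):
    """Undo log_transform, mapping predictions back to the original scale."""
    return [math.expm1(v) for v in values]

# Hypothetical, heavily skewed expression values (not from the paper)
expression = [0.0, 3.0, 12.0, 150.0, 4200.0]

transformed = log_transform(expression)
restored = inverse_log_transform(transformed)

print("original range:   ", min(expression), "to", max(expression))
print("transformed range:", round(min(transformed), 3), "to", round(max(transformed), 3))
```

If a model is trained on log-transformed targets, its predictions would need the inverse transform before being compared with values on the original scale.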

Recently, we employed Multi-Linear Regression (MLR), modified MLR-Weighted Least Squares (MLR-WLS), and Enhanced MLR-WLS for the prediction of anticancer drug efficacy [1]. For that analysis, we used real data and theoretical samples with clinical, cellular, and molecular attributes of tumors to predict general efficacy. To predict the efficacy of multiple drugs individually, we developed and employed MTR- and SVR-based models for 30 real samples and 340 theoretical samples. MTR models were found to be best for small datasets, while SVR models were best for large datasets; cross-validation was performed to demonstrate the models’ robustness. The models were then used to predict the apoptotic sensitivity of five drugs: Paclitaxel, Vincristine, Daunorubicin, Vinblastine, and Doxorubicin.
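As a rough illustration of the ranking step, the sketch below orders the five drugs by apoptotic priming and checks whether the predicted order matches the actual one. The priming values are invented for illustration, and ranking from highest to lowest priming is an assumption rather than a detail stated in the text.

```python
def rank_drugs(priming_by_drug):
    """Return drug names ordered from highest to lowest priming (assumed order)."""
    return sorted(priming_by_drug, key=priming_by_drug.get, reverse=True)

# Hypothetical actual and predicted priming values for one tumor sample
actual = {
    "Paclitaxel": 0.62,
    "Vincristine": 0.48,
    "Daunorubicin": 0.71,
    "Vinblastine": 0.35,
    "Doxorubicin": 0.55,
}
predicted = {
    "Paclitaxel": 0.61,
    "Vincristine": 0.49,
    "Daunorubicin": 0.70,
    "Vinblastine": 0.36,
    "Doxorubicin": 0.56,
}

actual_rank = rank_drugs(actual)
predicted_rank = rank_drugs(predicted)

print("actual:   ", actual_rank)
print("predicted:", predicted_rank)
print("ranks match:", actual_rank == predicted_rank)
```

Small prediction errors leave the order unchanged as long as they are smaller than the gaps between neighbouring drugs, which is why a low error and an exact rank match tend to go together.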

Section snippets

Materials and methods

In this work, data from a cohort of 30 oral squamous cell carcinoma (OSCC) samples were used [Robert et al., 2018]. Molecular (drug response gene expression), cellular (apoptotic priming), and clinical (age, sex, tumor grade, tumor stage and clinical stage) attributes formed the dataset for developing the drug efficacy prediction models (Table 1). Further, 340 theoretical samples were used to improve the prediction performance of the proposed models. Statistical analysis of necessary

Machine learning models

Multi-target regression and widely-used support vector machine techniques were applied to the pre-processed data for the prediction of multidrug efficacy.

Results and discussions

Two different machine-learning techniques were applied to the data to predict the efficacy of multiple drugs. The components created by PLSR for the MTR models and the residuals were obtained to measure the performance of the MTR models. The values of the SVR parameters were determined to verify the SVR models. The models created by MTR and SVR were evaluated based on the root mean squared error (RMSE), which is the square root of the mean squared difference between actual and predicted priming values [1,2]; the mean squared error is also used [48].
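The two error measures named above can be sketched in a few lines. This is a minimal illustration with made-up priming values for a single drug; the function and variable names are hypothetical and the numbers are not results from the study.

```python
import math

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical actual vs. predicted priming of one drug across four samples
actual_priming = [0.50, 0.60, 0.70, 0.80]
predicted_priming = [0.52, 0.58, 0.71, 0.79]

print(f"MSE:  {mse(actual_priming, predicted_priming):.6f}")
print(f"RMSE: {rmse(actual_priming, predicted_priming):.6f}")
```

For a multi-drug model, the same functions could be applied to each drug's column of predictions in turn, giving one RMSE per drug.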

Contribution and findings

Common feature processing is applied for two different machine-learning techniques. MTR is a linear regression that can predict multiple target variables in a single execution, and SVR can perform linear and nonlinear regression to predict a single target variable. We have succeeded in developing two pre-processing methods that work well for both of these models. Generally, one variable from each highly correlated pair is removed to obtain good model performance. We have used
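The general practice of removing one variable from each highly correlated pair can be sketched as follows. This is an illustrative, assumption-laden example: the 0.9 threshold, the keep-first rule, the feature names, and the values are all hypothetical and are not taken from the study's pre-processing.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns for five samples
features = {
    "age": [61, 45, 52, 70, 58],
    "tumor_stage": [1, 2, 3, 1, 4],
    "clinical_stage": [1, 2, 3, 1, 4],  # deliberately identical to tumor_stage
    "gene_expr": [2.1, 0.4, 1.7, 0.9, 3.0],
}

print(drop_correlated(features))
```

Which member of a correlated pair is kept depends on the column order here; a domain-informed choice may be preferable in practice.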

Declaration of Competing Interest

The authors declare no conflict of interest.

References (50)

B.M. Robert et al. Computational models for predicting anticancer drug efficacy: a multi linear regression analysis based on molecular, cellular and clinical data of oral squamous cell carcinoma cohort. Comput. Methods Programs Biomed. (2019)
H. Martens. Reliable and relevant modelling of real world data: a personal account of the development of PLS regression. Chemom. Intell. Lab. Syst. (2001)
S. Wold. Personal memories of the early PLS development. Chemom. Intell. Lab. Syst. (2001)
S. Ahmed. Pharmacogenomics of drug metabolizing enzymes and transporters: relevance to precision medicine. Genom. Proteom. Bioinform. (2016)
M. Shehab et al. Machine learning in medical applications: a review of state-of-the-art methods. Comput. Biol. Med. (2022)
S.N. Dorman et al. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol. Oncol. (2016)
I.K. Robert. R in Action, Data Analysis and Graphics With R. (2011)
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst. (2014)
E. Rezaei. Clinical and associated inflammatory biomarker features predictive of short-term outcomes in non-systemic juvenile idiopathic arthritis. Rheumatology (2020)
R. Tiwari et al. Correlation-based attribute selection using genetic algorithm. Int. J. Comput. Appl. (2010)
F. Miao et al. A novel continuous blood pressure estimation approach based on data mining techniques. IEEE J. Biomed. Health Inform. (2017)
M.R. Haque et al. A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models. IEEE Access (2021)
P. Chittora. Prediction of chronic kidney disease - a machine learning perspective. IEEE Access (2021)
F. Yung et al. An investigation of demographic and drug-use patterns in fentanyl and carfentanil deaths in Ontario. Forensic Sci. Med. Pathol. (2021)
R: A Language and Environment for Statistical Computing. (2006)
D. Nguyen et al. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics (2002)
J. Kresta et al. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. (1991)
R. Cramer et al. Comparative molecular-field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. (1988)
M. Björn-Helge et al. The pls package: principal component and partial least squares regression in R. J. Stat. Softw. (2007)
V.N. Vapnik. The Nature of Statistical Learning Theory. (1995)
V.N. Vapnik et al. Support vector method for function approximation, regression estimation and signal processing. Adv. Neural Inf. Process. Syst. (1996)
S. Gunn. Support Vector Machines for Classification and Regression. (1997)
D.H. Wolpert et al. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. (1997)
H. Borchani et al. A survey on multi-output regression. Wiley Interdiscipl. Rev. Data Min. Knowl. Discov. (2015)
D. Kocev et al. Using single- and multitarget regression trees and ensembles to model a compound index of vegetation condition. Ecol. Model. (2009)

