Precise prediction of multiple anticancer drug efficacy using multi target regression and support vector regression analysis
Introduction
Oral cancer is the sixth most common cancer worldwide, with more than 300,000 deaths per year [28,29], affecting 16.1% of men and 10.4% of women [38]. Relapsed oral cancer is challenging to treat because of its aggressiveness and the limited number of chemotherapeutic agents. Treatment requirements vary between patients because of diverse genetic features and environmental factors [37]. Early diagnosis of oral squamous cell carcinoma (OSCC) and prediction-based treatment may reduce the metastasis rate and thereby improve the survival rate [46], [47], [48]. Hence, the need for precision medicine has emerged. While therapeutic decisions are made based on tumor location, cytogenetics, and histology [34], lymph node status and histological grade have proven inadequate for predicting drug response [33]. Alongside large-scale clinical trials, computational methods are effective at forecasting drug sensitivity precisely [29,32,49,50]. Machine learning methods determine drug sensitivity from the genomic and therapeutic features of cancer cells [31, 39], using datasets such as the Cancer Target Discovery and Development screening database [30] and Genomics of Drug Sensitivity in Cancer [39]. Earlier work using machine learning methods [36] and statistical sensitivity analysis [34,35] has been shown to predict drug efficacy accurately. Moreover, a recent review discusses the use of ML techniques for five major medical applications in depth, providing guidance for clinicians, researchers, and decision-makers [40].
Machine learning methods must be selected carefully based on the data and the objective of the application because, according to the ‘no free lunch’ theorem, there is no single best algorithm for predictive modeling [20]. Redundancy, noise, and complex multivariate dependencies between input and target variables make it challenging to build a good predictive model [21]. Multi-target regression methods provide an effective model based on the mapping of features to target variables and, therefore, guarantee better representation and interpretability of different data [22,23,48]. Further, multi-target methods pave the way to simpler models with enhanced computational efficiency [22].
One study applied deep learning to build an optimized model for cancer classification. During pre-processing, a log-transformation was performed to convert the data to a uniform range, which reduced skew and subsequently yielded interpretable patterns [24]. Biological characteristics of the molecular subtypes were used to learn deep features of the subtype distribution of a patient's samples [25]. DNA microarray cancer data were classified using a type-2 fuzzy system and c-means clustering, which handled noisy, outlier-laden, non-linear gene expression profiles [26]. However, the expected classification accuracy was not achieved, and computational complexity increased because of irrelevant, limited, and noisy genes in the database [27]. Feature selection was later used to avoid this issue [26]. A few studies used imputation to handle missing data, avoided model overfitting by weighting through a bootstrapping method, and verified model calibration and discrimination; they also reduced bias and validated their models with external data [49,50].
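As a minimal sketch of the log-transformation step mentioned above, the following Python snippet applies a log2 transform to a right-skewed set of values and compares skewness before and after. The pseudocount, the function names, and the sample values are illustrative assumptions, not taken from the cited study [24].

```python
import math

def log_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount) to non-negative values (pseudocount is an assumption)."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    return [math.log2(v + pseudocount) for v in values]

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed expression levels, for illustration only.
raw = [1.0, 3.0, 7.0, 15.0, 31.0, 1023.0]
transformed = log_transform(raw)

print("skewness before:", round(skewness(raw), 3))
print("skewness after: ", round(skewness(transformed), 3))
```

The transform compresses the large outlier much more than the small values, which is why the skewness drops.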
Recently, we employed Multi-Linear Regression (MLR), modified MLR with Weighted Least Squares (MLR-WLS), and Enhanced MLR-WLS for the prediction of anticancer drug efficacy [1]. For that analysis, we used real data and theoretical samples with clinical, cellular, and molecular attributes of tumors to predict general efficacy. To predict the efficacy of multiple drugs individually, we developed and employed multi-target regression (MTR) and support vector regression (SVR) models for 30 real samples and 340 theoretical samples. MTR models were found best for small-size data, while SVR models were best for large-size data; cross-validation was performed to demonstrate the models’ robustness. The models were then used to predict the apoptotic sensitivity of five drugs: Paclitaxel, Vincristine, Daunorubicin, Vinblastine, and Doxorubicin.
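The cross-validation mentioned above can be sketched as a k-fold split, in which every sample is held out for testing exactly once. The snippet below is a minimal illustration in Python; the number of folds (k = 5), the shuffling seed, and the function name are assumptions, since the text does not state which cross-validation scheme was used.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # near-equal fold sizes
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 30 matches the number of real samples; k = 5 is an assumed choice.
splits = list(k_fold_indices(n_samples=30, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

A model would be fitted on each training subset and scored on the matching test subset, and the scores averaged across folds.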
Materials and methods
In this work, data from a cohort of 30 oral squamous cell carcinoma (OSCC) samples were used [Robert et al., 2018]. Molecular (drug response gene expression), cellular (apoptotic priming), and clinical (age, sex, tumor grade, tumor stage, and clinical stage) attributes formed the dataset for developing the drug efficacy prediction models (Table 1). Further, 340 theoretical samples were used to improve the prediction performance of the proposed models. Statistical analysis of necessary
Machine learning models
Multi-target regression and widely-used support vector machine techniques were applied to the pre-processed data for the prediction of multidrug efficacy.
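For reference, the standard linear ε-insensitive formulation of support vector regression can be written as the optimization problem below. This is the textbook form rather than the specific configuration used in this work; the symbols (weights w, bias b, penalty C, tube width ε, and slack variables ξ_i, ξ_i^*) are those of the standard formulation and do not appear in the text above.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)
\quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i,\\
\langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^{*},\\
\xi_i,\ \xi_i^{*} \ge 0, \quad i = 1,\dots,n.
\end{cases}
```

Errors smaller than ε are not penalized, and C controls the trade-off between model flatness and tolerance of larger deviations. A separate model of this kind is fitted for each target variable, whereas a multi-target method fits all targets jointly.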
Results and discussions
Two different machine-learning techniques were employed on the data to predict the efficacy of multiple drugs. For the MTR models, the components created by partial least squares regression (PLSR) and the residuals were obtained to measure model performance. The values of the SVR parameters were determined to verify the SVR models. The models created by MTR and SVR were evaluated based on the root mean squared error (RMSE), the square root of the mean squared difference between actual and predicted priming values [1,2]; mean squared error has similarly been used [48].
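The RMSE criterion can be illustrated with a short Python sketch that scores each drug separately, as one would when comparing multi-drug predictions. The function names and the numeric values are made up for illustration and are not results from this study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_per_target(actual_by_drug, predicted_by_drug):
    """Compute one RMSE value per drug (target variable)."""
    return {drug: rmse(actual_by_drug[drug], predicted_by_drug[drug])
            for drug in actual_by_drug}

# Hypothetical priming values for two of the drugs, for illustration only.
actual = {"Paclitaxel": [10.0, 20.0, 30.0], "Doxorubicin": [5.0, 5.0, 5.0]}
predicted = {"Paclitaxel": [12.0, 18.0, 30.0], "Doxorubicin": [5.0, 8.0, 1.0]}

scores = rmse_per_target(actual, predicted)
for drug, score in scores.items():
    print(f"{drug}: RMSE = {score:.3f}")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.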
Contribution and findings
Common feature processing was applied for two different machine-learning techniques. MTR is a linear regression method that can predict multiple target variables in a single execution, whereas SVR can perform linear and nonlinear regression to predict a single target variable. We have succeeded in developing two pre-processing methods, which work well for both of these models. Generally, one variable from each highly correlated pair is removed to obtain good model performance. We have used
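The general practice of removing one variable from each highly correlated pair can be sketched in Python as follows. This illustrates the common approach only, not necessarily the pre-processing developed in this work; the 0.9 threshold, the function names, the feature names, and the values are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns, for illustration only.
features = {
    "age": [45, 70, 38, 61, 52],
    "tumor_stage": [1, 2, 3, 1, 4],
    "clinical_stage": [2, 4, 6, 2, 8],  # exactly 2 * tumor_stage, so r = 1
    "gene_expr": [3.1, 0.4, 2.2, 1.9, 0.7],
}
selected = drop_correlated(features)
print(selected)
```

The first member of each correlated pair is retained, so the result depends on column order; a real pipeline might instead keep the variable more strongly related to the targets.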
Declaration of Competing Interest
There is no conflict of interest.
References (50)
- et al. Computational models for predicting anticancer drug efficacy: a multi linear regression analysis based on molecular, cellular and clinical data of oral squamous cell carcinoma cohort. Comput. Methods Programs Biomed. (2019)
- Reliable and relevant modelling of real world data: a personal account of the development of PLS regression. Chemom. Intell. Lab. Syst. (2001)
- Personal memories of the early PLS development. Chemom. Intell. Lab. Syst. (2001)
- Pharmacogenomics of drug metabolizing enzymes and transporters: relevance to precision medicine. Genom. Proteom. Bioinform. (2016)
- et al. Machine learning in medical applications: a review of state-of-the-art methods. Comput. Biol. Med. (2022)
- et al. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol. Oncol. (2016)
- R in Action, Data Analysis and Graphics With R (2011)
- Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst (2014)
- Clinical and associated inflammatory biomarker features predictive of short-term outcomes in non-systemic juvenile idiopathic arthritis. Rheumatology (2020)
- et al. Correlation-based attribute selection using genetic algorithm. Int. J. Comput. Appl. (2010)
- A novel continuous blood pressure estimation approach based on datamining techniques. IEEE J. Biomed. Health Inform.
- A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models. IEEE Access
- Prediction of chronic kidney disease - a machine learning perspective. IEEE Access
- An investigation of demographic and drug-use patterns in fentanyl and carfentanil deaths in Ontario. Forensic Sci. Med. Pathol.
- R: A Language and Environment for Statistical Computing
- Tumor classification by partial least squares using microarray gene expression data. Bioinformatics
- Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng.
- Comparative molecular-field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc.
- The pls package: principal component and partial least squares regression in R. J. Stat. Softw.
- The Nature of Statistical Learning Theory
- Support vector method for function approximation, regression estimation and signal processing. Adv. Neural Inf. Process. Syst.
- Support Vector Machines for Classification and Regression
- No free lunch theorems for optimization. IEEE Trans. Evol. Comput.
- A survey on multi-output regression. Wiley Interdiscipl. Rev. Data Min. Knowl. Discov.
- Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecol. Model.