Small dataset solves big problem: An outlier-insensitive binary classifier for inhibitory potency prediction
Introduction
Cancer cells show the ability to take up glucose and an increased rate of glycolysis to support a high proliferation rate, even in low-oxygen environments and under other energy-limited conditions. Malignant cells invoke the glycolysis pathway by switching away from mitochondrial oxidative phosphorylation (OXPHOS). Thus, compared to noncancerous cells, the amounts of the redox cofactor nicotinamide adenine dinucleotide (NAD) are higher in cancer cells, indicating the important role of NAD+ in many critical cellular processes of cancer cell growth, including the cell cycle, transcriptional regulation, and chromatin dynamic regulation [1].
Nicotinamide phosphoribosyltransferase (NAMPT) is a rate-limiting enzyme in the NAD salvage synthesis pathway, which affects the activity of NAD-dependent enzymes to regulate cellular biological processes [2]. The de novo synthesis pathway cannot be effectively utilized by tumor cells because their NA levels are generally not enough to drive NAD generation, so they regenerate more NAD through the NAMPT salvage synthesis pathway [3], [4]. Hence, cancer cells without sufficient NAPRT1 depend on NAMPT for NAD generation and cellular metabolism regulation, which makes them more vulnerable to the cytotoxic effects of NAMPT inhibitors. Moreover, NAMPT is found to indirectly promote the binding of reduced glutathione (GSH) to reactive oxygen species (ROS) by enhancing the activity of NAD-dependent enzymes and up-regulating the level of NADPH, so that tumor cells are able to adapt to an environment lacking energy and aerobic conditions [2], [5]. Therefore, NAMPT is a potential target for antitumor therapy development, and NAMPT inhibitors might be promising in cancer treatment.
FK866 and GMX1777 were identified as the first-generation NAMPT inhibitors and showed strong anticancer efficacy [6]. However, toxicity profiles and pharmacokinetic data from phase I clinical trials for advanced solid tumor malignancies indicated that FK866 and GMX1777 caused toxicity with various symptoms in the alimentary canal and dose-dependent thrombocytopenia [6]. To overcome the drawbacks of poor oral bioavailability and short plasma half-life, urea-containing second-generation NAMPT inhibitors, including GNE-617 [7], were developed, and it was shown that potent cellular activity in vitro and preclinical efficacy in vivo required an aromatic nitrogen positioned for phosphoribosylation [8]. A novel inhibitor named LSN3154567, discovered by a virtual screening and structure-guided design pipeline, showed potent and broad-spectrum anticancer activity while mitigating the retinal toxicities observed with other NAMPT inhibitors [9]. OT-82, identified by Korotchkina et al. [10], was generated from chemical library screening followed by hit-to-lead optimization and showed strong potency against hematopoietic malignant tumors. OT-82 has the advantage of causing no retinal or cardiac toxicity in a mouse model. Its application for treating relapsed lymphoma is currently being studied in phase I clinical trials. Additionally, novel nonsubstrate NAMPT inhibitors with potent preclinical efficacy and pharmacokinetics that enable oral dosing were reported, with the lead molecule, A-1293201, containing an isoindoline head group [11]. Furthermore, a series of isoindoline ureas were studied as first-in-class non-substrate NAMPT inhibitors [12].
Various experiment-based in vitro or in vivo methods for ranking compounds according to inhibitory potency have been investigated, yet most of them are costly and time-consuming. Computational methods that predict the precise value of binding affinity are another important class of screening approaches, but they can be seriously affected by small regression errors. Moreover, model generalization is considered a challenging task because of variations in experimental data obtained from different research institutions or experimental methodologies [13]. In this paper, we first propose an adaptive in vitro method for data augmentation that, at the same time, transforms the inhibitory potency prediction problem into a binary classification problem. Our method needs only an extremely small training dataset obtained from the same experimental environment and conditions, which keeps it reliable and practical. The adaptive data augmentation strategy originates from two perspectives. First, in drug screening, we only need the relative ranks of the inhibitory potency between different compounds, while the exact value is unnecessary. Second, the differences between compounds are worth considering because similar compounds are difficult to distinguish in prediction. In this way, we can arrange the experimental priority for the query compounds according to these ranks.
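The idea of turning potency values into pairwise comparisons can be sketched as follows. This is only an illustrative sketch, not the paper's feature representation: it assumes that each ordered pair of compounds is represented by the element-wise difference of their descriptor vectors, and that a lower measured value (for example Ki) means higher potency. The function name `make_pairs` and the toy data are hypothetical.

```python
from itertools import permutations

def make_pairs(features, potencies):
    """Build ordered-pair samples from per-compound data.

    features:  list of equal-length descriptor vectors (one per compound)
    potencies: list of measured values where LOWER means MORE potent
    Returns (X, y): X[k] is the element-wise difference f_i - f_j (an
    assumed pair representation) and y[k] is 1 if compound i is more
    potent than compound j, else 0.
    """
    X, y = [], []
    for i, j in permutations(range(len(features)), 2):
        if potencies[i] == potencies[j]:
            continue  # ties carry no ordering information; skip them
        diff = [a - b for a, b in zip(features[i], features[j])]
        X.append(diff)
        y.append(1 if potencies[i] < potencies[j] else 0)
    return X, y

# Toy data: 3 hypothetical compounds, 2 descriptors each, values in nM.
feats = [[0.2, 1.0], [0.5, 0.1], [0.9, 0.4]]
ki = [5.0, 50.0, 20.0]
X, y = make_pairs(feats, ki)
print(len(X), y)
```

With n compounds and no ties, this construction yields n(n-1) labelled samples, which is why a very small set of measured compounds can still produce a usable training set for a binary classifier.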
Usually, only a limited number of known compounds are available for training, while their molecular-descriptor features are very high dimensional. Models trained on such a small dataset tend to generalize poorly because the number of features is far larger than the number of samples. Outliers in the dataset make this situation even worse by dragging the learned distribution of the model toward the outliers. In addition, non-Gaussian systematic errors caused by measurement generally exist in data from different experimental assays. To address this issue, we develop an outlier-insensitive classifier to bypass the impact of the outliers. Moreover, a norm-based feature selection module is designed to deal with the redundancy of the feature dimensions. Finally, the relative rank of the inhibitory potency of an unknown compound is easily predicted by one-to-rest classifications against the known compounds in a reference library. Three benchmark datasets, each containing only a limited number of inhibitors, are utilized for evaluation.
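As a rough illustration of how a relative rank can be read off one-to-rest comparisons, the sketch below counts how many reference compounds a query is predicted to lose against. The pairwise classifier is passed in as a plain function; `rank_query`, `toy_classifier`, and the `score` field are hypothetical stand-ins, not part of the proposed model.

```python
def rank_query(query, references, more_potent):
    """Return the 1-based rank of `query` among the references.

    more_potent(a, b) -> bool is any pairwise classifier that predicts
    whether compound a is more potent than compound b.
    Rank 1 means the query is predicted to beat every reference.
    """
    losses = sum(1 for ref in references if not more_potent(query, ref))
    return losses + 1

# Stand-in classifier (hypothetical): compares a single made-up score.
def toy_classifier(a, b):
    return a["score"] > b["score"]

library = [{"name": "ref_A", "score": 0.9},
           {"name": "ref_B", "score": 0.4},
           {"name": "ref_C", "score": 0.1}]
query = {"name": "query", "score": 0.5}
print(rank_query(query, library, toy_classifier))  # prints 2
```

Counting losses is one simple aggregation rule; it assumes the pairwise predictions are reasonably consistent with each other and makes no attempt to resolve cyclic predictions.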
In this study, we propose a machine learning model to accelerate the discovery of drugs targeting NAMPT. This study aims to give priority to inhibitors for experimental validation by ranking candidate inhibitors, hence reducing the cost of drug design. The main contributions can be summarized as follows:
- We develop a novel feature representation and an adaptive data augmentation strategy to directly rank compounds by solving a simpler binary classification problem, where the relationship of the compounds in the original training set is effectively restored and expanded.
- We design a novel outlier-insensitive learning classifier to bypass the impact of the outliers, and a norm-based feature selection module for the binary classification.
- We evaluate the proposed model through extensive experiments on benchmark datasets and support the results with theoretical analysis. The results show that the proposed model is reliable in prioritizing candidate inhibitors for further biochemical experiments.
Related works
Drug design is important for responding to epidemic outbreaks, yet the process is challenging, lengthy, and demanding [14]. A great deal of effort has been invested in finding ideal drugs to fight the threat of diseases [15]. In our previous work, for example, a human monoclonal antibody (mAb), ZKA190, was isolated which strongly cross-neutralizes multi-lineage Zika virus strains [16]. We also illustrated that dengue virus serotype 2 (DENV2) specific human monoclonal antibody
Methodology
Fig. 1 illustrates our framework. In this study, we first propose a novel feature representation method to make full use of the relative information between compounds and significantly augment the training set. Then we develop an outlier-insensitive classifier that is trained on the resulting features. In this way, the ranking information for drug screening can be easily derived from the trained classifier.
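To make the notion of outlier insensitivity concrete, the sketch below compares an unbounded squared loss with a bounded correntropy-induced (Welsch) loss. This is an assumption chosen purely for illustration: the text here does not specify which loss the proposed classifier uses, and the function names and the bandwidth `sigma` are hypothetical.

```python
import math

def squared_loss(residual):
    """Unbounded loss: a single large residual can dominate training."""
    return residual ** 2

def welsch_loss(residual, sigma=1.0):
    """Correntropy-induced (Welsch) loss, used here only as an example
    of a bounded loss; it never exceeds 1 however large the residual."""
    return 1.0 - math.exp(-(residual ** 2) / (2.0 * sigma ** 2))

# Small, moderate, and outlier-sized residuals.
for r in (0.1, 1.0, 10.0):
    print(r, squared_loss(r), round(welsch_loss(r), 4))
```

Because the bounded loss saturates, a grossly mislabeled or mismeasured sample contributes at most a constant to the objective, whereas under the squared loss its contribution grows quadratically.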
Dataset
In this study, we carry out extensive experiments on three benchmark datasets collected by Curtin et al. [12], Zak et al. [54] and Lockman et al. [55], respectively. We compare the proposed model with state-of-the-art models. The public datasets can be obtained from the following website: http://www.bindingdb.org/bind/index.jsp. The average enzymatic competitive inhibition constant (Ki) represents the binding affinity of the compounds to the enzyme. The half-maximal inhibitory concentration (
Conclusion
In this paper, we have proposed a novel approach to identify the relative rank of a query compound with respect to one or a set of known references. First, we develop an adaptive method for data augmentation that, more importantly, makes full use of the relative information between compounds instead of just individual information. Hence, we can employ a reliable yet small training set obtained under the same experimental conditions. In this way, we relax the demand for the precise value of inhibitory potency
CRediT authorship contribution statement
Teng Zhou: Investigation, Methodology, Data curation, Visualization, Writing – original draft, Funding acquisition. Haowen Dou: Investigation, Methodology, Data curation, Visualization, Writing – original draft. Jie Tan: Data curation, Methodology. Youyi Song: Data curation, Methodology. Fei Wang: Conceptualization, Methodology. Jiaqi Wang: Conceptualization, Supervision, Writing – review & editing, Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The research was supported by Natural Science Foundation of China (No. 61902232, 81902059, 32071448), Guangdong Basic and Applied Basic Research Foundation, China (No. 2022A1515011590, 2020A1515011170, 2022A1515011978, 2021A1515012302), Sun Yat-sen University’s Basic Scientific Research Grant, China (No. 19ykzd28), STU Incubation Project for the Research of Digital Humanities and New Liberal Arts (No. 2021DH-3), 2020 Li Ka Shing Foundation, Hong Kong Cross-Disciplinary Research Grant (No.
References (58)
- Identification of 2,3-dihydro-1H-pyrrolo[3,4-c]pyridine-derived ureas as potent inhibitors of human nicotinamide phosphoribosyltransferase (NAMPT). Bioorgan. Med. Chem. Lett. (2013)
- SAR and characterization of non-substrate isoindoline urea inhibitors of nicotinamide phosphoribosyltransferase (NAMPT). Bioorgan. Med. Chem. Lett. (2017)
- A human bi-specific antibody against Zika virus with high therapeutic potential. Cell (2017)
- Intelligent deep learning-enabled autonomous small ship detection and classification model. Comput. Electr. Eng. (2022)
- Unsupervised deep learning based variational autoencoder model for COVID-19 diagnosis and classification. Pattern Recognit. Lett. (2021)
- Intelligent video anomaly detection and classification using Faster RCNN with deep reinforcement learning model. Image Vis. Comput. (2021)
- REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. (2016)
- Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing (2019)
- A temporal-aware LSTM enhanced by loss-switch mechanism for traffic flow forecasting. Neurocomputing (2021)
- Mixture correntropy for robust learning. Pattern Recognit. (2018)
- Robust support vector machines based on the rescaled hinge loss function. Pattern Recognit.
- Correntropy-based robust extreme learning machine for classification. Neurocomputing
- A STAT1 bound enhancer promotes NAMPT expression and function within tumor associated macrophages. Nature Commun.
- Physiological and pathophysiological roles of NAMPT and NAD metabolism. Nat. Rev. Endocrinol.
- eNAMPT actions through nucleus accumbens NAD+/SIRT1 link increased adiposity with sociability deficits programmed by peripuberty stress. Sci. Adv.
- A proteogenomic analysis of clear cell renal cell carcinoma in a Chinese population. Nature Commun.
- SIRT1 selectively exerts the metabolic protective effects of hepatocyte nicotinamide phosphoribosyltransferase. Nature Commun.
- The pharmacokinetics, toxicities, and biologic effects of FK866, a nicotinamide adenine dinucleotide biosynthesis inhibitor. Investig. New Drugs
- Structural and biochemical analyses of the catalysis and potency impact of inhibitor phosphoribosylation by human nicotinamide phosphoribosyltransferase. Chembiochem
- Discovery of a highly selective NAMPT inhibitor that demonstrates robust efficacy and improved retinal toxicity with nicotinic acid coadministration. Mol. Cancer Therapeutics
- Discovery and characterization of novel nonsubstrate and substrate NAMPT inhibitors. Mol. Cancer Therapeutics
- OT-82, a novel anticancer drug candidate that targets the strong dependence of hematological malignancies on NAD biosynthesis. Leukemia
- Transfer inhibitory potency prediction to binary classification: A model only needs a small training set. Comput. Methods Programs Biomed.
- Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nature Genet.
- Natural products in drug discovery: Advances and opportunities. Nat. Rev. Drug Discov.
- Cryo-EM structure of an antibody that neutralizes dengue virus type 2 by locking E protein dimers. Science
- Charged residue implantation improves the affinity of a cross-reactive dengue virus antibody. Int. J. Mol. Sci.
- Preclinical characterization of an intravenous coronavirus 3CL protease inhibitor for the potential treatment of COVID19. Nature Commun.
- Antibody-drug conjugates with dual payloads for combating breast tumor heterogeneity and drug resistance. Nature Commun.
- 1 Teng Zhou and Haowen Dou contributed equally.