Elsevier

Knowledge-Based Systems

Volume 251, 5 September 2022, 109242

Small dataset solves big problem: An outlier-insensitive binary classifier for inhibitory potency prediction

https://doi.org/10.1016/j.knosys.2022.109242

Abstract

Nicotinamide phosphoribosyltransferase (NAMPT) inhibitors are important in cancer treatment, and selecting compounds from a library according to their inhibitory potency for further experiments is a main route to drug discovery. Computational methods have been widely used to accelerate this process. In this work, we propose a machine learning model that needs to be trained on only an extremely small dataset to predict the inhibition constant (Ki) and half maximal inhibitory concentration (IC50) of a compound. The key idea is to rank compounds directly according to inhibitory potency by solving a simpler binary classification problem, since drug screening only requires the relative ranks of the inhibitors. To this end, we develop an adaptive data augmentation method that effectively captures the relative information between compounds in the original dataset. However, outliers in small samples are often difficult to detect and may severely distort the distribution learned by the classifier. We therefore propose an outlier-insensitive classifier with an effective feature selection module for the one-to-all classification task. Extensive experiments show that our model achieves high and reliable accuracy in ranking compounds according to inhibitory potency. These results demonstrate that the proposed model reliably prioritizes chemicals for experimental research and analysis through a ligand-based in silico approach.

Introduction

Cancer cells show the ability to take up glucose and an increased rate of glycolysis to support a high proliferation rate, even in low-oxygen environments and under other energy-limited conditions. Malignant cells switch from mitochondrial oxidative phosphorylation (OXPHOS) to the glycolysis pathway. Consequently, cancer cells contain higher amounts of the redox cofactor nicotinamide adenine dinucleotide (NAD+) than noncancerous cells, indicating the important role of NAD+ in many cellular processes critical to cancer cell growth, including the cell cycle, transcriptional regulation, and regulation of chromatin dynamics [1].

Nicotinamide phosphoribosyltransferase (NAMPT) is a rate-limiting enzyme in the NAD salvage synthesis pathway, which affects the activity of NAD-dependent enzymes that regulate cellular biological processes [2]. Tumor cells cannot effectively utilize the de novo synthesis pathway because their nicotinic acid (NA) levels are generally insufficient to drive NAD generation, so they regenerate more NAD through the NAMPT salvage pathway [3], [4]. Hence, cancer cells lacking sufficient nicotinic acid phosphoribosyltransferase (NAPRT1) depend on NAMPT for NAD generation and metabolic regulation, which makes them more vulnerable to the cytotoxic effects of NAMPT inhibitors. Moreover, NAMPT has been found to indirectly promote the binding of reduced glutathione (GSH) to reactive oxygen species (ROS) by enhancing the activity of NAD-dependent enzymes and up-regulating the level of NADPH, enabling tumor cells to adapt to environments with limited energy and oxygen [2], [5]. Therefore, NAMPT is a potential target for antitumor therapy, and NAMPT inhibitors may be promising in cancer treatment.

FK866 and GMX1777 were identified as first-generation NAMPT inhibitors and showed strong anticancer efficacy [6]. However, toxicity and pharmacokinetic data from phase I clinical trials in advanced solid tumor malignancies indicated that FK866 and GMX1777 caused various gastrointestinal toxicities and dose-dependent thrombocytopenia [6]. To overcome their poor oral bioavailability and short plasma half-life, second-generation urea-containing NAMPT inhibitors, including GNE-617 [7], were developed; these studies showed that potent cellular activity in vitro and preclinical efficacy in vivo required an aromatic nitrogen positioned for phosphoribosylation by NAMPT [8]. A novel inhibitor, LSN3154567, discovered through a virtual screening and structure-guided design pipeline, showed potent, broad-spectrum anticancer activity while mitigating the retinal toxicities observed with other NAMPT inhibitors [9]. OT-82, identified by Korotchkina et al. [10] through chemical library screening followed by hit-to-lead optimization, showed strong potency against hematopoietic malignancies. Notably, OT-82 did not cause retinal or cardiac toxicity in a mouse model, and its application to relapsed lymphoma is currently being studied in phase I clinical trials. Additionally, novel nonsubstrate NAMPT inhibitors with potent preclinical efficacy and pharmacokinetics that enable oral dosing were identified [11], with the lead molecule A-1293201 containing an isoindoline head group. Furthermore, a series of isoindoline ureas were studied as first-in-class nonsubstrate NAMPT inhibitors [12].

Various experiment-based in vitro and in vivo methods for ranking compounds according to inhibitory potency have been investigated, yet most of them are costly and time-consuming. Computational methods that predict the precise value of binding affinity are another important class of screening approaches, but they can be seriously affected by small regression errors. Moreover, model generalization is challenging because experimental data vary across research institutions and experimental methodologies [13]. In this paper, we first propose an adaptive data augmentation method that simultaneously transforms the inhibitory potency prediction problem into a binary classification problem. To remain reliable and practical, our method needs only an extremely small training dataset obtained under the same experimental environment and conditions. The adaptive data augmentation strategy is motivated by two observations. First, drug screening only requires the relative ranks of inhibitory potency between compounds, not their exact values. Second, the differences between compounds are worth modeling explicitly, because similar compounds are difficult to distinguish in prediction. In this way, experimental priority can be assigned to query compounds according to these ranks.
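The two observations above suggest a pairwise reformulation: every ordered pair of training compounds becomes one binary example asking "is the first compound more potent than the second?". The excerpt does not specify the paper's feature representation, so the sketch below is only an illustration under stated assumptions: the hypothetical function `make_pairwise_dataset` represents a pair as both descriptor vectors plus their difference, labels a pair 1 when the first compound has the smaller Ki, and skips tied pairs.

```python
import itertools

import numpy as np


def make_pairwise_dataset(X, ki):
    """Turn n compounds into ordered-pair training examples (illustrative sketch).

    X  : (n, d) array of molecular descriptors.
    ki : (n,) inhibition constants; a smaller Ki means a more potent inhibitor.
    Returns pair features of shape (m, 3d) and 0/1 labels of shape (m,),
    where m <= n * (n - 1) because tied pairs are skipped.
    """
    X = np.asarray(X, dtype=float)
    ki = np.asarray(ki, dtype=float)
    feats, labels = [], []
    for i, j in itertools.permutations(range(len(X)), 2):
        if ki[i] == ki[j]:
            continue  # a tie carries no ordering information
        # Assumed pair representation: both descriptors plus their difference,
        # so the classifier sees what distinguishes the two compounds.
        feats.append(np.concatenate([X[i], X[j], X[i] - X[j]]))
        # Label 1 if compound i is more potent (smaller Ki) than compound j.
        labels.append(1 if ki[i] < ki[j] else 0)
    # reshape keeps a (0, 3d) shape even when every pair is a tie
    return np.array(feats).reshape(-1, 3 * X.shape[1]), np.array(labels, dtype=int)


# Three toy compounds with 2 descriptors each.
F, y = make_pairwise_dataset([[0, 1], [1, 0], [2, 2]], [1.0, 2.0, 0.5])
print(F.shape)     # (6, 6)
print(y.tolist())  # [1, 0, 0, 0, 1, 1]
```

The arithmetic shows why this helps with very small datasets: 60 compounds with distinct Ki values yield 60 × 59 = 3540 ordered pairs, half of them labeled 1.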

Usually, only a limited number of known compounds are available for training, and each is described by very high-dimensional molecular descriptors. Models trained on such small datasets tend to generalize poorly because the number of features far exceeds the number of samples. Outliers in the dataset make this situation worse by dragging the learned distribution of the model toward them. In addition, non-Gaussian systematic measurement errors generally exist in data from different experimental assays. To address these issues, we develop an outlier-insensitive classifier that bypasses the impact of outliers. Moreover, an ℓ2,1-norm-based feature selection module is designed to deal with redundant feature dimensions. Finally, the relative rank of an unknown compound's inhibitory potency is predicted by one-to-rest classification against the known compounds in a reference library. Three benchmark datasets, each containing no more than 60 inhibitors, are used for evaluation.
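The excerpt does not give the classifier's exact loss or solver, so the following is only one plausible instance assembled from standard techniques, not the authors' method. It combines a correntropy-induced (Welsch-type) loss, in the spirit of the correntropy-based robust extreme learning machine listed in the references, handled by half-quadratic reweighting so that samples with large residuals receive small weights, with an ℓ2,1 penalty on the rows of the weight matrix, handled by the usual iterative reweighting. The linear model, one-hot targets, and the names and defaults of `fit_robust_l21`, `predict`, `lam`, `sigma`, and `n_iter` are all assumptions for illustration.

```python
import numpy as np


def fit_robust_l21(X, y, lam=0.1, sigma=1.0, n_iter=30, eps=1e-8):
    """Illustrative robust linear classifier with l2,1 feature selection.

    Approximately minimizes
        sum_i (1 - exp(-||t_i - W^T x_i||^2 / (2 sigma^2))) + lam' * ||W||_{2,1}
    by alternating two reweighting steps (lam absorbs constant factors).

    X : (N, d) feature matrix; y : (N,) labels in {0, 1}.
    Returns W of shape (d, 2); rows with a small norm mark features to drop.
    """
    X = np.asarray(X, dtype=float)
    T = np.eye(2)[np.asarray(y, dtype=int)]        # one-hot targets, (N, 2)
    W = np.linalg.lstsq(X, T, rcond=None)[0]        # ordinary least-squares start
    for _ in range(n_iter):
        E = T - X @ W                               # residuals, (N, 2)
        # Half-quadratic step: large-residual samples (likely outliers) get
        # weights near 0; well-fitted samples get weights near 1.
        a = np.exp(-np.sum(E ** 2, axis=1) / (2.0 * sigma ** 2))
        # l2,1 reweighting: a row with a small norm gets a large penalty,
        # which pushes the whole row (one feature) further toward zero.
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        XtA = X.T * a                               # equals X^T diag(a)
        W = np.linalg.solve(XtA @ X + lam * D, XtA @ T)
    return W


def predict(W, X):
    """Class with the larger linear score (0 or 1)."""
    return np.argmax(np.asarray(X, dtype=float) @ W, axis=1)
```

Because each sample weight is exp(-‖e‖²/(2σ²)), a mislabeled pair with a large residual contributes little to the next solve, whereas a plain squared loss would let it pull the decision boundary; `sigma` controls where that cutoff falls.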

In this study, we propose a machine learning model to accelerate the discovery of drugs targeting NAMPT. This study aims to prioritize candidate inhibitors for experimental validation by ranking them, thereby reducing the cost of drug design. The main contributions can be summarized as follows:

  • We develop a novel feature representation and an adaptive data augmentation strategy to rank compounds directly by solving a simpler binary classification problem, in which the relationships between compounds in the original training set are effectively captured and expanded.

  • We design a novel outlier-insensitive classifier that bypasses the impact of outliers, together with an ℓ2,1-norm-based feature selection module for the binary classification.

  • We evaluate the proposed model through extensive experiments on benchmark datasets and provide a theoretical analysis. The results show that the proposed model reliably prioritizes candidate inhibitors for further biochemical experiments.
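For reference, the standard definition of the ℓ2,1-norm of a weight matrix W ∈ ℝ^{d×c}, whose k-th row w^k holds the weights of feature k, is given below with a small worked example. This is the textbook definition, not necessarily the paper's exact objective. Because the norm sums the ℓ2 norms of rows, penalizing it drives entire rows, and hence entire features, to zero, which is why it serves as a feature selection term.

```latex
\|W\|_{2,1} = \sum_{k=1}^{d} \|\mathbf{w}^{k}\|_2
            = \sum_{k=1}^{d} \sqrt{\sum_{j=1}^{c} W_{kj}^{2}}

% Worked example with d = 3 features and c = 2 outputs:
W = \begin{pmatrix} 3 & 4 \\ 0 & 0 \\ 1 & 0 \end{pmatrix}
\quad\Rightarrow\quad
\|W\|_{2,1} = 5 + 0 + 1 = 6
```

Here the second row is entirely zero, so the second feature plays no role in the classifier.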


Related works

Drug design is critical in responding to epidemic outbreaks, yet the process is challenging, lengthy, and demanding [14]. A great deal of effort has been invested in finding ideal drugs to fight the threat of diseases [15]. In our previous work, for example, a human monoclonal antibody (mAb), ZKA190, was isolated that strongly cross-neutralizes multi-lineage Zika virus strains [16]. We also illustrated that dengue virus serotype 2 (DENV2) specific human monoclonal antibody

Methodology

Fig. 1 illustrates our framework. In this study, we first propose a novel feature representation method that makes full use of the relative information between compounds and significantly augments the training set. We then develop an outlier-insensitive classifier trained on these well-represented features. In this way, the ranking information needed for drug screening can be readily derived from the trained classifier.
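The paragraph above says the ranking is derived from the trained classifier but does not spell out how. A minimal sketch, assuming a hypothetical callable `pair_prob(a, b)` that returns the classifier's probability that compound `a` is more potent than compound `b`: the query is compared one-to-rest with every reference compound, and its rank is one plus the number of references judged more potent. Averaging both query orders and the toy probability function `toy_prob` are assumptions for illustration only.

```python
import math
from typing import Callable, Sequence

# Hypothetical pairwise model: returns P(a is more potent than b).
PairProb = Callable[[Sequence[float], Sequence[float]], float]


def rank_query(query, references, pair_prob: PairProb) -> int:
    """One-to-rest ranking sketch: 1-based rank of `query` among the
    reference compounds (rank 1 = most potent)."""
    rank = 1
    for ref in references:
        # Query the classifier in both orders and average, since a learned
        # pairwise model is not guaranteed to be antisymmetric.
        p_query_wins = 0.5 * (pair_prob(query, ref) + 1.0 - pair_prob(ref, query))
        if p_query_wins < 0.5:
            rank += 1  # this reference is judged more potent than the query
    return rank


def toy_prob(a, b):
    # Toy stand-in for a trained classifier: potency is encoded directly in
    # feature 0 (for example a pKi-like value, larger = more potent).
    return 1.0 / (1.0 + math.exp(b[0] - a[0]))


refs = [[1.0], [3.0], [5.0], [7.0]]
print(rank_query([4.0], refs, toy_prob))  # 3
```

The resulting rank directly gives the experimental priority of the query relative to the reference library.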

Dataset

In this study, we carry out extensive experiments on three benchmark datasets collected by Curtin et al. [12], Zak et al. [54], and Lockman et al. [55], and compare the proposed model with state-of-the-art models. The public datasets can be obtained from BindingDB: http://www.bindingdb.org/bind/index.jsp. The average enzymatic competitive inhibition constant (Ki) represents the binding affinity of a compound to the enzyme. The half-maximal inhibitory concentration (

Conclusion

In this paper, we have proposed a novel approach to identify the relative rank of an unknown query compound with respect to one or a set of known reference compounds. First, we develop an adaptive method for data augmentation that, more importantly, makes full use of the relative information between compounds instead of only individual information. Hence, we can employ a reliable yet small training set obtained under the same experimental conditions. In this way, we relax the demand for the precise value of inhibitory potency

CRediT authorship contribution statement

Teng Zhou: Investigation, Methodology, Data curation, Visualization, Writing – original draft, Funding acquisition. Haowen Dou: Investigation, Methodology, Data curation, Visualization, Writing – original draft. Jie Tan: Data curation, Methodology. Youyi Song: Data curation, Methodology. Fei Wang: Conceptualization, Methodology. Jiaqi Wang: Conceptualization, Supervision, Writing – review & editing, Funding acquisition, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The research was supported by the Natural Science Foundation of China (No. 61902232, 81902059, 32071448), the Guangdong Basic and Applied Basic Research Foundation, China (No. 2022A1515011590, 2020A1515011170, 2022A1515011978, 2021A1515012302), Sun Yat-sen University’s Basic Scientific Research Grant, China (No. 19ykzd28), STU Incubation Project for the Research of Digital Humanities and New Liberal Arts (No. 2021DH-3), 2020 Li Ka Shing Foundation, Hong Kong Cross-Disciplinary Research Grant (No.

References (58)

  • Xu G. et al. Robust support vector machines based on the rescaled hinge loss function. Pattern Recognit. (2017)
  • Ren Z. et al. Correntropy-based robust extreme learning machine for classification. Neurocomputing (2018)
  • Huffaker T.B. et al. A STAT1 bound enhancer promotes NAMPT expression and function within tumor associated macrophages. Nature Commun. (2021)
  • Garten A. et al. Physiological and pathophysiological roles of NAMPT and NAD metabolism. Nat. Rev. Endocrinol. (2015)
  • Morató L. et al. eNAMPT actions through nucleus accumbens NAD+/SIRT1 link increased adiposity with sociability deficits programmed by peripuberty stress. Sci. Adv. (2022)
  • Qu Y. et al. A proteogenomic analysis of clear cell renal cell carcinoma in a Chinese population. Nature Commun. (2022)
  • Higgins C.B. et al. SIRT1 selectively exerts the metabolic protective effects of hepatocyte nicotinamide phosphoribosyltransferase. Nature Commun. (2022)
  • Holen K. et al. The pharmacokinetics, toxicities, and biologic effects of FK866, a nicotinamide adenine dinucleotide biosynthesis inhibitor. Investig. New Drugs (2008)
  • Oh A. et al. Structural and biochemical analyses of the catalysis and potency impact of inhibitor phosphoribosylation by human nicotinamide phosphoribosyltransferase. ChemBioChem (2014)
  • Zhao G. et al. Discovery of a highly selective NAMPT inhibitor that demonstrates robust efficacy and improved retinal toxicity with nicotinic acid coadministration. Mol. Cancer Therapeutics (2017)
  • Wilsbacher J.L. et al. Discovery and characterization of novel nonsubstrate and substrate NAMPT inhibitors. Mol. Cancer Therapeutics (2017)
  • Korotchkina L. et al. OT-82, a novel anticancer drug candidate that targets the strong dependence of hematological malignancies on NAD biosynthesis. Leukemia (2020)
  • Dou H. et al. Transfer inhibitory potency prediction to binary classification: A model only needs a small training set. Comput. Methods Programs Biomed. (2022)
  • Szustakowski J.D. et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nature Genet. (2021)
  • Atanasov A.G. et al. Natural products in drug discovery: Advances and opportunities. Nat. Rev. Drug Discov. (2021)
  • Fibriansah G. et al. Cryo-EM structure of an antibody that neutralizes dengue virus type 2 by locking E protein dimers. Science (2015)
  • Wei H. et al. Charged residue implantation improves the affinity of a cross-reactive dengue virus antibody. Int. J. Mol. Sci. (2022)
  • Boras B. et al. Preclinical characterization of an intravenous coronavirus 3CL protease inhibitor for the potential treatment of COVID19. Nature Commun. (2021)
  • Yamazaki C.M. et al. Antibody-drug conjugates with dual payloads for combating breast tumor heterogeneity and drug resistance. Nature Commun. (2021)

Cited by (13)

  • Error-distribution-free kernel extreme learning machine for traffic flow forecasting. 2023, Engineering Applications of Artificial Intelligence
  • Mutual gain adaptive network for segmenting brain stroke lesions. 2022, Applied Soft Computing
Teng Zhou and Haowen Dou contributed equally.
