In silico methods and tools for drug discovery

https://doi.org/10.1016/j.compbiomed.2021.104851

Highlights

  • In silico drug discovery methods can reduce the time and cost of drug discovery.

  • Advancements in computational methods have made in silico drug discovery practical.

  • Several FDA-approved drugs were developed with in silico methods.

Abstract

Conventional drug discovery strategies have been successfully employed to develop new drugs, but the process from lead identification to clinical trials takes more than 12 years and costs approximately $1.8 billion USD on average. Recently, in silico approaches have attracted considerable interest because of their potential to reduce the time, labor, and cost of drug discovery. Many new drug compounds have been successfully developed using computational methods. In this review, we briefly introduce computational drug discovery strategies and outline up-to-date tools for carrying them out, as well as knowledge bases available to those who develop their own computational models. Finally, we present successful examples of anti-bacterial, anti-viral, and anti-cancer drug discoveries made using computational methods.

Introduction

Conventional drug discovery and development are risky, time-consuming processes that include target identification and validation, lead compound discovery and optimization, and preclinical and clinical trials [1]. In recent years, the estimated cost of bringing a new drug to market has reached about $1.8 billion USD, and the attrition rate of drug candidates is as high as 96% [2]. This high attrition rate stems mainly from poor drug efficacy and from deficiencies in absorption, distribution, metabolism, excretion, and toxicity (ADME-Tox) [3]. Typically, in vivo and in vitro techniques are employed to examine drug safety, including adverse effects and toxicity. Recent advancements in in vitro models, such as organ-on-chip technology, have accelerated ADME-Tox assessments [4]. However, these approaches remain time-consuming, labor-intensive, and costly. High-throughput screening (HTS) methods have been developed to accelerate the identification of pharmacologically active chemical compounds from large numbers of molecules using automated assays [5]. Although automated HTS systems reduce the need for human intervention, the scale of HTS remains small relative to the diversity of chemical structures. In addition, automated instruments remain expensive.

Recently, computer-aided drug discovery (CADD) approaches have attracted increasing attention because they can help mitigate the scale, time, and cost issues faced by conventional experimental approaches. CADD includes computational identification of potential drug targets, virtual screening of large chemical libraries for effective drug candidates, further optimization of candidate compounds, and in silico assessment of their potential toxicity. After these steps are conducted computationally, candidate compounds are subjected to in vitro/in vivo experiments for confirmation. Thus, CADD approaches can reduce the number of chemical compounds that must be evaluated experimentally while increasing the success rate by removing ineffective and toxic chemical compounds from consideration [6]. To date, CADD has been successfully employed to bring new drug compounds to market for diverse diseases, including human immunodeficiency virus (HIV)-1-inhibiting drugs (atazanavir [7], saquinavir [8], indinavir [9], and ritonavir [10]), anti-cancer drugs (raltitrexed [11]), and antibiotics (norfloxacin [12]).
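The filtering idea described above can be illustrated with a minimal sketch. It assumes each compound already carries two precomputed in silico predictions. The `Candidate` class, the score ranges, the compound names, and the thresholds are all hypothetical and chosen only for illustration; they are not taken from any tool discussed in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A compound with hypothetical, precomputed in silico predictions."""
    name: str
    predicted_activity: float  # hypothetical score in [0, 1]; higher is better
    predicted_toxicity: float  # hypothetical score in [0, 1]; lower is better


def select_for_experiment(library, min_activity=0.7, max_toxicity=0.3):
    """Keep only compounds predicted to be both active and non-toxic.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    shortlist = [
        c for c in library
        if c.predicted_activity >= min_activity
        and c.predicted_toxicity <= max_toxicity
    ]
    # Rank the survivors so the most promising compounds are tested first.
    return sorted(shortlist, key=lambda c: c.predicted_activity, reverse=True)


library = [
    Candidate("cmpd-A", 0.91, 0.10),
    Candidate("cmpd-B", 0.85, 0.60),  # predicted toxic: filtered out
    Candidate("cmpd-C", 0.40, 0.05),  # predicted inactive: filtered out
    Candidate("cmpd-D", 0.75, 0.25),
]

shortlist = select_for_experiment(library)
print([c.name for c in shortlist])
```

In this toy library, only two of the four compounds would proceed to in vitro/in vivo confirmation, which is the sense in which computational triage reduces the experimental workload.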

Several CADD approaches have been developed and integrated with machine learning techniques to improve their accuracy and efficiency [13]. Structure-based drug discovery (SBDD) [14] and ligand-based drug discovery (LBDD) [15] are the two main approaches taken in CADD. The choice between them depends on the availability of structural information on the target protein. The SBDD approach requires the structure of the target protein, which is usually obtained experimentally by nuclear magnetic resonance or X-ray crystallography [14]. When no experimental structure is available, in silico prediction methods such as homology modeling [16] or ab initio modeling [17] can be used to predict the 3D structure of the target protein. Once the structure is available, structure-based virtual screening and molecular docking are possible [18]. When the structure is not available and a high-quality structure cannot be predicted using in silico methods, the LBDD approach is often taken as an alternative. This approach requires prior information on molecules known to be active against the target protein; unless the target is novel, such information is often available, because many disease-treating compounds have been discovered and compiled in public databases [19], [20], [21]. These approaches are introduced in section 4.
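The selection logic in the preceding paragraph can be summarized as a small decision function. This is only a sketch of the reasoning stated in the text. The function name, its boolean inputs, and the `UNDETERMINED` fallback are assumptions introduced here for illustration; the text itself does not say what to do when neither a structure nor known actives are available.

```python
from enum import Enum


class Approach(Enum):
    SBDD = "structure-based drug discovery"
    LBDD = "ligand-based drug discovery"
    UNDETERMINED = "insufficient data for either approach"


def choose_approach(experimental_structure: bool,
                    reliable_predicted_structure: bool,
                    known_actives: bool) -> Approach:
    """Pick a CADD approach from the data available for a target."""
    # SBDD needs a 3D structure, either experimental (NMR, X-ray) or a
    # high-quality in silico prediction (homology or ab initio modeling).
    if experimental_structure or reliable_predicted_structure:
        return Approach.SBDD
    # Without a usable structure, LBDD is the usual alternative, but it
    # requires molecules already known to be active against the target.
    if known_actives:
        return Approach.LBDD
    # Assumption of this sketch: the source text does not cover this case.
    return Approach.UNDETERMINED


print(choose_approach(True, False, False).name)
print(choose_approach(False, False, True).name)
```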

The field of CADD is rapidly advancing, and techniques and methods are under active development. Over the past few years, the integration of biological big data and machine learning approaches has opened new possibilities to increase the accuracy and efficiency of in silico drug discovery. This review introduces the overall procedures and methodologies behind in silico drug discovery, including target protein identification, chemical library screening, and toxicity assessment using machine learning approaches. It also summarizes available prediction tools and databases and lists Food and Drug Administration (FDA)-approved and reported drug compounds developed using CADD techniques.

Section snippets

Increase in biological data on chemical molecules for drug discovery

Over the past few decades, large-scale data has been generated on hundreds of thousands of small molecules through biological screening, and this data is compiled in online repositories that are available for research. For example, advancements in HTS techniques have enabled large-scale experiments on >1 million chemicals [22]. In addition, this biological assay data has been compiled in chemical library databases, and the amount of data is increasing rapidly due to advancements

Target identification

A drug target is defined as a biological entity, usually a protein, that can modulate disease phenotypes [29]. Thus, the identification of prime drug targets is the first and most important step in drug discovery. Conventional drug target identification is performed experimentally, for example by identifying genes that are differentially expressed between normal and diseased cells or tissues and proteins that are highly interconnected with disease-related proteins.

In silico methods for drug screening

The goal of drug discovery is to find small molecules that can modulate the function of an identified target protein and thereby modulate the disease phenotype. Furthermore, it is necessary to identify small molecules that possess effective pharmacokinetic properties and low toxicity. Drug discovery involves a long, expensive, and risky cascade of complicated steps, including drug candidate identification, candidate validation, pharmacokinetics, and preclinical toxicity assessments. Traditional

ADME-Tox assessment

Once drug candidates are discovered, the next step is to assess their pharmacokinetic properties, such as ADME-Tox. Due to advances in machine learning algorithms and accumulated datasets, ADME-Tox can also be predicted using computational methods.

It is estimated that 40%–60% of drug candidates are withdrawn in preclinical tests because of ADME-Tox concerns [85]. Drug compounds must cross various physiological barriers, such as the gastrointestinal barrier, the blood-brain barrier, and

Successful applications of in silico drug design

The development of new therapeutic drugs is an expensive and time-consuming process. In silico technology has become essential in the contemporary pharmaceutical industry because it can reduce the time and resources required for drug discovery. Due to advancements in computational algorithms and accumulated knowledge databases, computational prediction tools have now been integrated into every stage of the drug discovery process. Computational drug discovery methods have been successfully used

Conclusions

Over the past few decades, the in silico identification of disease-associated drug targets and therapeutic drugs has become increasingly efficient and accurate. Recently, in silico drug discovery has accelerated due to rapid advancements in computational methods and accumulating publicly available biological data. Chemical biology is involved in the elucidation of the biological functions of targets, while CADD techniques make use of structural information of either the drug target

Author contributions

BS, SA, and DN: conceptualization. BS and JL: data curation. BS and CJ: methodology. DN: supervision. BS and DN: manuscript writing. All authors have read and agreed to the published version of the manuscript.

Declaration of competing interests

There are no conflicts of interest to declare.

A conflict of interest statement

None declared.

Acknowledgments

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2019M3E5D4065682). This research was also supported by the Center for Women in Science, Engineering, and Technology grant funded by the Ministry of Science and ICT (MSIT) under the Program for Returners into R&D.

References (267)

  • N.J. Tatum et al.

    New active leads for tuberculosis booster drugs by structure-based drug discovery

    Org. Biomol. Chem.

    (2017)
  • P.E. Brandish et al.

    A cell-based ultra-high-throughput screening assay for identifying inhibitors of D-amino acid oxidase

    J. Biomol. Screen.

    (2006)
  • T. Katsila et al.

    Computational approaches in target identification and drug discovery

    Comput. Struct. Biotechnol. J.

    (2016)
  • K. Kubota et al.

    Target deconvolution from phenotype-based drug discovery by using chemical proteomics approaches

    Biochim. Biophys. Acta Proteins Proteom.

    (2019)
  • S.-E. Ong et al.

    Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics

    Mol. Cell. Proteom.

    (2002)
  • J.N. Chan et al.

    Recent advances and method development for drug target identification

    Trends Pharmacol. Sci.

    (2010)
  • J.L. Jenkins et al.

    In silico target fishing: predicting biological targets from chemical structure

    Drug Discov. Today Technol.

    (2006)
  • A. Lavecchia

    Machine-learning approaches in drug discovery: methods and applications

    Drug Discov. Today

    (2015)
  • K.K. Jain

    RNAi and siRNA in target validation

    Drug Discov. Today

    (2004)
  • T. Kennedy

    Managing the drug discovery/development interface

    Drug Discov. Today

    (1997)
  • S. Venkatesh et al.

    Role of the development scientist in compound lead selection and optimization

    J. Pharm. Sci.

    (2000)
  • D. Prada-Gracia et al.

    Aplicación de métodos computacionales para el descubrimiento, diseño y optimización de fármacos contra el cáncer

    Bol. Méd. Hosp. Infan. Méx.

    (2016)
  • S.M. Paul et al.

    How to improve R&D productivity: the pharmaceutical industry's grand challenge

    Nat. Rev. Drug Discov.

    (2010)
  • B.S. Robinson et al.

    BMS-232632, a highly potent human immunodeficiency virus protease inhibitor that can be used in combination with other available antiretroviral agents

    Antimicrob. Agents Chemother.

    (2000)
  • A. Krohn et al.

    Novel binding mode of highly potent HIV-proteinase inhibitors incorporating the (R)-hydroxyethylamine isostere

    J. Med. Chem.

    (1991)
  • D.J. Kempf et al.

    ABT-538 is a potent inhibitor of human immunodeficiency virus protease and has high oral bioavailability in humans

    Proc. Nat. Acad. Sci.

    (1995)
  • J. Vamathevan et al.

    Applications of machine learning in drug discovery and development

    Nat. Rev. Drug Discov.

    (2019)
  • H. Jhoti et al.

    Structure-based Drug Discovery

    (2007)
  • D. Vidal et al.

    Ligand-based approaches to in silico pharmacology

    Chemoinformatics and Computational Chemical Biology

    (2011)
  • D.B. Kitchen et al.

    Docking and scoring in virtual screening for drug discovery: methods and applications

    Nat. Rev. Drug Discov.

    (2004)
  • A. Evers et al.

    Structure-based drug discovery using GPCR homology modeling: successful virtual screening for antagonists of the alpha1A adrenergic receptor

    J. Med. Chem.

    (2005)
  • T.T. Talele et al.

    Successful applications of computer aided drug discovery: moving drugs from concept to the clinic

    Curr. Topics Med. Chem.

    (2010)
  • A. Tropsha

    QSAR in drug discovery

    Drug Design: Structure- and Ligand-Based Approaches

    (2010)
  • A.C. Nascimento et al.

    A multiple kernel learning algorithm for drug-target interaction prediction

    BMC Bioinf.

    (2016)
  • W. Wang et al.

    Developing enhanced blood–brain barrier permeability models: integrating external bio-assay data in QSAR modeling

    Pharm. Res.

    (2015)
  • P. Schyman et al.

    vNN web server for ADMET predictions

    Front. Pharmacol.

    (2017)
  • R. Liu et al.

    Locally weighted learning methods for predicting dose-dependent toxicity with application to the human maximum recommended daily dose

    Chem. Res. Toxicol.

    (2012)
  • S. Hochreiter et al.

    Machine Learning in Drug Discovery

    (2018)
  • G. Giaever et al.

    Genomic profiling of drug sensitivities via induced haploinsufficiency

    Nat. Genet.

    (1999)
  • S.-E. Ong et al.

    Identifying the proteins to which small-molecule probes and drugs bind in cells

    Proc. Nat. Acad. Sci.

    (2009)
  • A. Ezzat et al.

    Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey

    Brief. Bioinform.

    (2019)
  • A. Cichonska et al.

    Computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors

    PLoS Comput. Biol.

    (2017)
  • S. Zheng et al.

    Text mining for drug discovery

    Methods Mol. Biol.

    (2019)
  • F.E. Agamah et al.

    Computational/in silico methods in drug target and lead prediction

    Brief. Bioinform.

    (2020)
  • A.D. Rouillard et al.

    The Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins

    Database

    (2016)
  • D. Ochoa et al.

    Open Targets Platform: supporting systematic drug–target identification and prioritisation

    Nucleic Acids Res.

    (2021)
  • R. Byrne et al.

    In silico target prediction for small molecules

    Systems Chemical Biology

    (2019)
  • Y. Chen et al.

    Ligand–protein inverse docking and its potential use in the computer search of protein targets of a small molecule

    Proteins

    (2001)
  • N. Paul et al.

    Recovering the true targets of specific ligands by virtual screening of the protein data bank

    Proteins

    (2004)
  • H. Li et al.

    TarFisDock: a web server for identifying drug targets with docking approach

    Nucleic Acids Res.

    (2006)