Elsevier

Computational Biology and Chemistry

Volume 78, February 2019, Pages 353-358

Research Article
dbHDPLS: A database of human disease-related protein-ligand structures

https://doi.org/10.1016/j.compbiolchem.2018.12.023

Abstract

Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database of human disease-related protein-ligand structures (dbHDPLS) collects 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. Each entry is annotated with information on ligands and drugs, related human diseases and protein-ligand interactions, together with information on the protein structures. The database may serve as a resource for structure-based drug target discovery, druggability prediction of protein-ligand binding sites, and the study of drug-disease relationships based on protein-ligand complex structures. It is publicly accessible at http://DeepLearner.ahu.edu.cn/web/dbDPLS/.

Introduction

Proteins are important constituents of living organisms and participate in the entire physiological cycle (Šrajer and Schmidt, 2017). When proteins behave abnormally in the body, the normal physiological balance is disrupted and lesions, that is, diseases, develop (Hanash, 2003, Mulder et al., 2018). Many human diseases arise from abnormal protein function in vivo, and such proteins are regarded as human disease-related proteins (Hajduk et al., 2005, Murakami et al., 2017). Many proteins need to interact with small molecules to perform specific functions, and the binding of small molecules to proteins, together with its biological effects, has been one of the most important problems in biology and chemistry (Hu et al., 2017, Lian et al., 1994, Schneider, 1991, Barolo, 2002). Knowledge of the interactions between disease proteins and ligands is key to studying pathogenesis, drug repositioning and drug discovery (Liu et al., 2018, Chen et al., 2013, Eyk et al., 2016, Drews, 2000). Some works have studied disease mechanisms and drug repositioning through drug-disease networks built from databases of drugs, known protein complexes and diseases (Yu et al., 2015, Dudley et al., 2011, Daminelli et al., 2012). Other studies have shown that some ligands of disease-related proteins are drug molecules that treat certain diseases, and one approach to drug discovery is based on structural research on drugs and disease proteins (Imming et al., 2006, Jubb et al., 2017, Chen et al., 2018, Awan et al., 2017).

With the development of structural biology and structural genomics, the number of macromolecular structures stored in the Protein Data Bank (PDB) (Berman, 2000) is growing rapidly. At present, the PDB holds more than 110000 solved protein structures. Databases of protein-ligand interactions have grown rapidly as well. PDBsum (Laskowski, 2004) is a graphical database that briefly summarizes the three-dimensional protein structures stored in the PDB and provides detailed information on protein sequences and related ligands. BioLip (Yang et al., 2012) is a database of biologically relevant protein-ligand interactions that annotates each entry with ligand-binding residues, ligand-binding affinities, catalytic sites, EC numbers, Gene Ontology terms and cross-links to other databases. The PLID database (Reddy et al., 2008) stores the protein-ligand binding environment and the physicochemical properties of the ligands for each entry. Binding MOAD (Hu et al., 2005) and PDBbind (Liu et al., 2014) provide binding affinity data for protein-ligand complexes with known three-dimensional structures. They use Ki (inhibition constant), Kd (dissociation constant) and IC50 (concentration at 50% inhibition) to measure the strength of each protein-ligand interaction.
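As an illustration of how such affinity values are commonly compared, the following minimal sketch converts a Ki, Kd or IC50 value to its negative base-10 logarithm in molar units, so that larger numbers indicate tighter binding. The function name, the unit table and the sample measurements are assumptions made for this example and are not taken from Binding MOAD, PDBbind or dbHDPLS. Ki, Kd and IC50 are different quantities, so values of different types are not strictly interchangeable.

```python
import math

# Multipliers that convert common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_affinity(value: float, unit: str) -> float:
    """Return -log10 of an affinity constant (Ki, Kd or IC50) given in `unit`."""
    if value <= 0:
        raise ValueError("affinity must be positive")
    molar = value * UNIT_TO_MOLAR[unit]
    return -math.log10(molar)


# Hypothetical measurements, not taken from any database entry.
measurements = [("Ki", 40.0, "uM"), ("Kd", 2.5, "nM"), ("IC50", 0.3, "mM")]

# Sort from strongest to weakest interaction on the logarithmic scale.
ranked = sorted(measurements, key=lambda m: p_affinity(m[1], m[2]), reverse=True)
```

On this scale a 1 nM constant corresponds to a value of 9, and each tenfold weakening of the interaction lowers the value by 1.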

Some databases of drug-target interactions are also based on the PDB. PDTD (Gao et al., 2008) is a drug-target database containing crystal structures derived from the PDB and pathology information for each target. PDID (Wang et al., 2015) provides two data sets, one of known drug-target interactions and one of predicted interactions based on three predictive algorithms: ILbind (Hu et al., 2012), SMAP (Xie and Bourne, 2008) and eFindSite (Brylinski and Feinstein, 2013, Feinstein and Brylinski, 2014). The Guide to PHARMACOLOGY database (Southan et al., 2015) provides pharmacological, chemical, genetic and pathophysiological data. SuperTarget (Gunther et al., 2007) is a database of drug-target biology and pathology that integrates drug molecules, 3D structures of drug targets, therapeutic ranges of drugs, side effects of drugs, drug metabolic pathways and gene annotation information for drug targets. At present, it contains over 2500 drug targets and 1500 drug molecules.

Among these databases, few contain information on disease-related protein complexes. For example, UniProt (Consortium, 2014) collects protein sequences together with their annotations and allows a large number of proteins with known disease information to be found. At present, it stores more than 80 million protein sequences. The Therapeutic Target Database (TTD) (Yang et al., 2015, Chen, 2002, Li et al., 2018) provides information about known therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the drugs related to the corresponding targets. The main purpose of the SGC (Structural Genomics Consortium, http://www.thesgc.org/), a not-for-profit, public-private genomics consortium, is to determine the 3-D structures of human proteins that are important for medicine. However, it does not provide sufficient structural information. Overall, these protein-ligand databases, protein databases and drug-target databases provide a wealth of knowledge for specific research areas. However, most annotations of drug molecules, human protein complex structures and human diseases are scattered across different databases.

In this article, we built an integrated database, named dbHDPLS, that stores information about drug-target interactions with related human diseases as well as information about drugs and protein-ligand complex structures. Human disease-related protein-ligand complex structures were curated starting from the PDB and drawing on other protein and drug databases, such as UniProt and DrugBank. Each protein-ligand complex in the database is also annotated with related information: protein structure properties, protein functions and disease information, as well as the physicochemical properties of ligands and drug information. As a result, the database contains the vast majority of the human disease-related protein-ligand complexes in the PDB. The details of the database are freely accessible from our web server. Fig. 1 displays the pipeline of the database.
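The curation idea described above can be pictured with a minimal sketch that keeps only PDB entries that are human, contain at least one ligand and map to a protein annotated with a disease. This is an illustration under stated assumptions, not the authors' actual pipeline: the `PdbEntry` record, the `select_disease_complexes` function and all identifiers in the toy data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PdbEntry:
    """Hypothetical, simplified view of one PDB structure."""
    pdb_id: str
    organism: str
    uniprot_ids: list[str]
    ligand_ids: list[str] = field(default_factory=list)


def select_disease_complexes(entries, disease_accessions):
    """Keep human entries with a ligand whose protein is disease-annotated."""
    selected = []
    for entry in entries:
        is_human = entry.organism == "Homo sapiens"
        has_ligand = bool(entry.ligand_ids)
        disease_related = any(acc in disease_accessions for acc in entry.uniprot_ids)
        if is_human and has_ligand and disease_related:
            selected.append(entry)
    return selected


# Toy data with made-up identifiers, for illustration only.
entries = [
    PdbEntry("1AAA", "Homo sapiens", ["P00001"], ["LIG"]),    # kept
    PdbEntry("1BBB", "Homo sapiens", ["P00001"]),             # no ligand
    PdbEntry("1CCC", "Mus musculus", ["P00002"], ["LIG"]),    # not human
    PdbEntry("1DDD", "Homo sapiens", ["P00003"], ["LIG"]),    # no disease annotation
]
disease_accessions = {"P00001", "P00002"}

selected = select_disease_complexes(entries, disease_accessions)
```

In practice the disease annotations would come from a resource such as UniProt and the ligand lists from the structure files; here both are supplied by hand to keep the sketch self-contained.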


Construction and content

Our database was constructed from human proteins with known structures in the PDB and related information from DrugBank, UniProt, BioLip, PDBbind and Binding MOAD. As of 1 January 2018, the PDB held more than 120 thousand protein structures. According to the statistics given in the PDB, 32747 human protein structures were obtained; these are heterogeneous and contain not only protein-ligand complex structures but also protein structures or protein-nucleic

Utility and discussion

The database contains 8833 human proteins that are annotated as disease proteins in UniProt. For each protein-ligand complex structure, the database provides the structure name, the experimental method used to solve the complex structure, the resolution of the structure, the PubMed identifier of the primary literature for the structure, and a download link for the PDB structure file. For the protein in each structure entry, the database provides the protein
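The structure-level fields listed above can be pictured as a simple record type. The sketch below is an assumption about how such an entry might be represented, not the schema used by dbHDPLS; the class name, field names, example values and the `example.org` URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructureRecord:
    """Hypothetical record holding the structure-level fields of one entry."""
    pdb_id: str
    name: str
    method: str                   # experimental method, e.g. "X-ray diffraction"
    resolution: Optional[float]   # in angstroms; None when not reported
    pubmed_id: Optional[str]      # identifier of the primary literature
    download_url: str             # link to the structure file


def high_resolution(records, cutoff=2.5):
    """Return records whose reported resolution is at or below `cutoff` angstroms."""
    return [r for r in records if r.resolution is not None and r.resolution <= cutoff]


# Made-up example entries; none of these values come from the database.
records = [
    StructureRecord("1AAA", "Example kinase with inhibitor", "X-ray diffraction",
                    1.8, "00000001", "https://example.org/1AAA.pdb"),
    StructureRecord("1BBB", "Example receptor with agonist", "X-ray diffraction",
                    3.1, "00000002", "https://example.org/1BBB.pdb"),
    StructureRecord("1CCC", "Example domain with peptide", "Solution NMR",
                    None, None, "https://example.org/1CCC.pdb"),
]

sharp = high_resolution(records)
```

Keeping the resolution optional avoids forcing a numeric value onto structures for which none is reported.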

Conclusions

This paper presented a comprehensive database of protein-ligand structures related to human diseases, together with a wealth of biological, chemical and pharmacological information on the protein structures. Information on ligand drugs, protein-related diseases and protein-ligand interactions is also provided in the database. We also built a web server for browsing and searching for specific information, and its usage is presented in this work. Moreover, a case study of the analysis of

Availability of data and materials

The datasets analysed during the current study are available at the following websites,

All data generated during this study are available at the web page http://DeepLearner.ahu.edu.cn/web/dbDPLS/. This database

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61672035 and 61472282), Anhui Province Funds for Excellent Youth Scholars in Colleges (gxyqZD2016068) and Anhui Scientific Research Foundation for Returned Scholars.

References (46)

  • A.S. Reddy et al.

    Protein ligand interaction database (PLID)

    Comput. Biol. Chem.

    (2008)
  • V. Šrajer et al.

    Watching proteins function with time-resolved X-ray crystallography

    J. Phys. D: Appl. Phys.

    (2017)
  • F.M. Awan et al.

    Mutation-structure-function relationship based integrated strategy reveals the potential impact of deleterious missense mutations in autophagy related proteins on hepatocellular carcinoma (HCC): a comprehensive informatics approach

    Int. J. Mol. Sci.

    (2017)
  • K.A. Barlow et al.

    Flex ddG: Rosetta ensemble-based estimation of changes in protein-protein binding affinity upon mutation

    J. Phys. Chem. B

    (2018)
  • S. Barolo

    Three habits of highly effective signaling pathways: principles of transcriptional control by developmental cell signaling

    Genes Dev.

    (2002)
  • H.M. Berman

    The protein data bank

    Nucleic Acids Res.

    (2000)
  • M. Brylinski et al.

    eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands

    J. Comput.-Aided Mol. Des.

    (2013)
  • P. Chen et al.

    Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    Proteins

    (2013)
  • P. Chen et al.

    A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    IEEE/ACM Trans. Comput. Biol. Bioinform.

    (2016)
  • R. Chen et al.

    Machine learning for drug-target interaction prediction

    Molecules

    (2018)
  • X. Chen

    TTD: therapeutic target database

    Nucleic Acids Res.

    (2002)
  • U. Consortium

    UniProt: a hub for protein information

    Nucleic Acids Res.

    (2014)
  • A. Dal Corso et al.

    Affinity enhancement of protein ligands by reversible covalent modification of neighboring lysine residues

    Angew. Chem. Int. Ed. Engl.

    (2018)

    1

    These two authors contributed equally to this study.
