dbHDPLS: A database of human disease-related protein-ligand structures

doi:10.1016/j.compbiolchem.2018.12.023

Computational Biology and Chemistry

Volume 78, February 2019, Pages 353-358


Abstract

Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database of human disease-related protein-ligand structures (dbHDPLS) collects 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. Each entry is annotated with comprehensive information on ligands and drugs, related human diseases, protein-ligand interactions and protein structure. The database may be a reliable resource for structure-based drug target discovery, for druggability prediction of protein-ligand binding sites, and for studying drug-disease relationships based on protein-ligand complex structures. It is publicly accessible at http://DeepLearner.ahu.edu.cn/web/dbDPLS/.

Introduction

Proteins are important constituents of living organisms and participate throughout the physiological cycle (Šrajer and Schmidt, 2017). When proteins function abnormally in the body, the normal physiological balance is disturbed and lesions, that is, diseases, appear (Hanash, 2003, Mulder et al., 2018). Many human diseases arise from abnormal protein function in vivo, and such proteins are regarded as human disease-related proteins (Hajduk et al., 2005, Murakami et al., 2017). Many proteins, however, need to interact with small molecules to perform specific functions, and the binding of small molecules to proteins, together with its biological effects, has been one of the most important problems in biology and chemistry (Hu et al., 2017, Lian et al., 1994, Schneider, 1991, Barolo, 2002). Knowledge of the interactions between disease proteins and ligands is key to studying pathogenesis, drug repositioning and drug discovery (Liu et al., 2018, Chen et al., 2013, Eyk et al., 2016, Drews, 2000). Some works have studied disease mechanisms and drug repositioning through drug-disease networks built on databases of drugs, known protein complexes and diseases (Yu et al., 2015, Dudley et al., 2011, Daminelli et al., 2012). In addition, other studies have shown that some ligands of disease-related proteins are drug molecules that treat certain diseases, and one approach to drug discovery is based on structural research on drugs and disease proteins (Imming et al., 2006, Jubb et al., 2017, Chen et al., 2018, Awan et al., 2017).

With the development of structural biology and structural genomics, the number of macromolecular structures stored in the Protein Data Bank (PDB) (Berman, 2000) is growing rapidly. At present, the PDB holds more than 110000 solved protein structures. Databases of protein-ligand interactions have grown rapidly as well. PDBsum (Laskowski, 2004) is a graphical database that briefly summarizes the three-dimensional protein structures stored in the PDB and provides detailed information on protein sequences and related ligands. BioLip (Yang et al., 2012) is a database of biologically relevant protein-ligand interactions in which each entry is annotated with ligand-binding residues, ligand-binding affinities, catalytic sites, EC numbers, Gene Ontology terms and cross-links to other databases. The PLID database (Reddy et al., 2008) stores the protein-ligand binding environment and the physicochemical properties of ligands for each entry. Binding MOAD (Hu et al., 2005) and PDBbind (Liu et al., 2014) provide binding affinity data for protein-ligand complexes with known three-dimensional structures. They use K_i (inhibition constant), K_d (dissociation constant) and IC50 (half-maximal inhibitory concentration) to measure the strength of each protein-ligand interaction.
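As an illustration of how such affinity values are commonly compared, the sketch below converts a constant given in mol/L to its negative decimal logarithm (pK_d, pK_i or pIC50) and converts a K_d to a standard binding free energy through ΔG = RT ln(K_d / 1 M). The function names, the default temperature of 298.15 K and the 1 M standard state are assumptions made for this example; they are not taken from Binding MOAD or PDBbind. The free-energy relation is meant for K_d (or a K_i treated as a dissociation constant), since IC50 depends on assay conditions.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def p_affinity(value_molar: float) -> float:
    """Return -log10 of an affinity constant given in mol/L (pKd, pKi or pIC50)."""
    if value_molar <= 0:
        raise ValueError("affinity constant must be positive")
    return -math.log10(value_molar)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the standard binding free energy in kcal/mol from Kd.

    Uses dG = R * T * ln(Kd / 1 M); a smaller Kd gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    kd = 1e-9  # a hypothetical 1 nM binder, not a value from any database
    print(f"pKd = {p_affinity(kd):.2f}")
    print(f"dG  = {binding_free_energy(kd):.2f} kcal/mol")
```

Working on the logarithmic scale makes affinities that span many orders of magnitude directly comparable, which is why curated affinity sets are often analysed in this form.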

Some databases of drug-target interactions are also based on the PDB. PDTD (Gao et al., 2008) is a drug-target database containing crystal structures derived from the PDB together with pathology information for each target. PDID (Wang et al., 2015) provides two data sets, a drug-target interaction data set and a predicted one based on three predictive algorithms: ILbind (Hu et al., 2012), SMAP (Xie and Bourne, 2008) and eFindSite (Brylinski and Feinstein, 2013, Feinstein and Brylinski, 2014). The Guide to PHARMACOLOGY database (Southan et al., 2015) provides pharmacological, chemical, genetic and pathophysiological data. SuperTarget (Gunther et al., 2007) is a database of drug-target biology and pathology that integrates drug molecules, 3D structures of drug targets, therapeutic ranges of drugs, drug side effects, drug metabolic pathways and gene annotation for drug targets. At present, it contains over 2500 drug targets and 1500 drug molecules.

Among these databases, few contain information on disease-related protein complexes. For example, UniProt (Consortium, 2014) collects protein sequences together with their annotation and makes it possible to find a large number of proteins with known disease information. At present, it stores more than 80 million protein sequences. The Therapeutic Target Database (TTD) (Yang et al., 2015, Chen, 2002, Li et al., 2018) provides information about known therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the drugs related to the corresponding targets. The main purpose of the SGC (Structural Genomics Consortium, http://www.thesgc.org/), a not-for-profit, public-private genomics consortium, is to determine the 3D structures of human proteins that are important for medicine. However, it does not provide sufficient structural information. Overall, these protein-ligand databases, protein databases and drug-target databases provide a wealth of knowledge for specific research areas. Nevertheless, most annotations of drug molecules, human protein complex structures and human diseases are scattered across different databases.

In this article, we present an integrated database, named dbHDPLS, that stores information about drug-target interactions and the related human diseases, together with information about drugs and protein-ligand complex structures. Starting from the PDB and drawing on other protein and drug databases, such as UniProt and DrugBank, we curated human disease-related protein-ligand complex structures. Each protein-ligand complex in the database is annotated with protein structure properties, protein functions and disease information, as well as the physicochemical properties of ligands and drug information. As a result, the database contains the vast majority of the human disease-related protein-ligand complexes in the PDB. The details of the database are freely accessible from our web server. Fig. 1 displays the pipeline of the database.
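The kind of integration described above can be pictured as a filter followed by a join. The sketch below is only an illustration under stated assumptions and is not the authors' pipeline: the `Structure` record, the `curate` function, the two lookup tables and all identifiers in the usage note are hypothetical placeholders, standing in for data that would come from the PDB, UniProt and DrugBank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Structure:
    """Minimal stand-in for one PDB entry (hypothetical schema)."""
    pdb_id: str
    organism: str
    uniprot_ac: str
    ligands: List[str] = field(default_factory=list)


def curate(
    structures: List[Structure],
    disease_by_uniprot: Dict[str, List[str]],
    drug_by_ligand: Dict[str, str],
) -> List[dict]:
    """Keep human, ligand-bound, disease-annotated structures and attach annotations."""
    curated = []
    for s in structures:
        if s.organism != "Homo sapiens":
            continue  # not a human protein
        if not s.ligands:
            continue  # no bound ligand, so not a protein-ligand complex
        diseases = disease_by_uniprot.get(s.uniprot_ac)
        if not diseases:
            continue  # no disease annotation for this protein
        ligands = []
        for ligand_id in s.ligands:
            drug: Optional[str] = drug_by_ligand.get(ligand_id)  # None if not a known drug
            ligands.append({"ligand_id": ligand_id, "drug": drug})
        curated.append(
            {
                "pdb_id": s.pdb_id,
                "uniprot_ac": s.uniprot_ac,
                "diseases": list(diseases),
                "ligands": ligands,
            }
        )
    return curated
```

A call such as `curate([Structure("0AAA", "Homo sapiens", "P00000", ["LIG"])], {"P00000": ["example disease"]}, {"LIG": "DB00000"})` returns one annotated record; every identifier in that call is a made-up placeholder.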


Construction and content

Our database was constructed using human proteins with known structures in the PDB and related information from DrugBank, UniProt, BioLip, PDBbind and Binding MOAD. As of 1 January 2018, the PDB held more than 120 thousand protein structures. According to the statistics given in the PDB, 32747 human protein structures were obtained; this set is heterogeneous and contains not only protein-ligand complex structures but also protein structures or protein-nucleic

Utility and discussion

The database contains 8833 human proteins that are annotated as disease proteins in UniProt. For each protein-ligand complex structure, the database provides the structure name, the experimental method used to solve the complex structure, the resolution of the structure, the PubMed identifier of the primary literature for the structure and a download link for the PDB structure file. For the protein in each structure item, the database provides the protein
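The per-structure fields listed above (structure name, experimental method, resolution, PubMed identifier and download link) could be modeled as a small record type, as in the following sketch. The class and function names are hypothetical, and the download link is built from the RCSB file-download URL pattern as an assumption; the dbHDPLS web server may expose its own links and field names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StructureEntry:
    """Per-structure metadata (hypothetical schema, not the dbHDPLS one)."""
    pdb_id: str                   # four-character PDB identifier
    name: str                     # structure name
    method: str                   # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: Optional[float]   # in angstroms; None when not reported
    pubmed_id: Optional[int]      # primary citation in PubMed, if any

    def download_url(self) -> str:
        # Assumption: RCSB file-download URL pattern for PDB-format files.
        return f"https://files.rcsb.org/download/{self.pdb_id.upper()}.pdb"


def at_most_resolution(entries: List[StructureEntry], cutoff: float) -> List[StructureEntry]:
    """Return entries with a reported resolution no worse than the cutoff."""
    return [e for e in entries if e.resolution is not None and e.resolution <= cutoff]
```

Keeping the resolution optional lets entries solved by methods that report no resolution sit in the same table while still being excluded from resolution-based filtering.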

Conclusions

This paper presents a comprehensive database of protein-ligand structures related to human diseases, together with a wealth of biological, chemical and pharmacological information on the protein structures. Information on ligand drugs, protein diseases and protein-ligand interactions is also provided in the database. We also built a web server for browsing and searching for specific information, and its usage is presented in this work. Moreover, a case study of the analysis of

Availability of data and materials

The datasets analysed during the current study are available at the following websites:

PDB database: http://www.rcsb.org/pdb/home/home.do
PDBbind database: http://www.pdbbind.org.cn/
Binding MOAD database: http://www.bindingmoad.org/
UniProt database: http://www.uniprot.org/
DrugBank database: http://www.drugbank.ca/
TTD database: http://bidd.nus.edu.sg/BIDD-Databases/TTD/TTD.asp.

All data generated during this study are available at http://DeepLearner.ahu.edu.cn/web/dbDPLS/. This database

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61672035 and 61472282), Anhui Province Funds for Excellent Youth Scholars in Colleges (gxyqZD2016068) and Anhui Scientific Research Foundation for Returned Scholars.

References (46)

D. Barradas-Bautista et al.
Chapter Seven - Structural prediction of protein-protein interactions by docking: application to biomedical problems

P.J. Hajduk et al.
Predicting protein druggability
Drug Discovery Today
(2005)

G. Hu et al.
Finding protein targets for small biologically relevant ligands across fold space using inverse ligand binding predictions
Structure
(2012)

S.S. Hu et al.
Protein binding hot spots prediction from sequence only by a new ensemble learning method
Amino Acids
(2017)

Y. Ivarsson et al.
Affinity and specificity of motif-based protein-protein interactions
Curr. Opin. Struct. Biol.
(2019)

H.C. Jubb et al.
Mutations at protein-protein interfaces: small changes over big surfaces have large impacts on human health
Prog. Biophys. Mol. Biol.
(2017)

L. Lian et al.
Protein-ligand interactions: exchange processes and determination of ligand conformation and protein-ligand contacts
Methods Enzymol.
(1994)

Q. Liu et al.
dbMPIKT: a database of kinetic and thermodynamic mutant protein interactions
BMC Bioinform.
(2018)

C. Mulder et al.
Proteomic tools to study drug function
Curr. Opin. Syst. Biol.
(2018)

Y. Murakami et al.
Network analysis and in silico prediction of protein-protein interactions with applications in drug discovery
Curr. Opin. Struct. Biol.
(2017)

A.S. Reddy et al.
Protein ligand interaction database (PLID)
Comput. Biol. Chem.
(2008)

V. Šrajer et al.
Watching proteins function with time-resolved X-ray crystallography
J. Phys. D: Appl. Phys.
(2017)

F.M. Awan et al.
Mutation-structure-function relationship based integrated strategy reveals the potential impact of deleterious missense mutations in autophagy related proteins on hepatocellular carcinoma (HCC): a comprehensive informatics approach
Int. J. Mol. Sci.
(2017)

K.A. Barlow et al.
Flex ddG: Rosetta ensemble-based estimation of changes in protein-protein binding affinity upon mutation
J. Phys. Chem. B
(2018)

S. Barolo
Three habits of highly effective signaling pathways: principles of transcriptional control by developmental cell signaling
Genes Dev.
(2002)

H.M. Berman
The protein data bank
Nucleic Acids Res.
(2000)

M. Brylinski et al.
eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands
J. Comput.-Aided Mol. Des.
(2013)

P. Chen et al.
Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences
Proteins
(2013)

P. Chen et al.
A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction
IEEE/ACM Trans. Comput. Biol. Bioinform.
(2016)

R. Chen et al.
Machine learning for drug-target interaction prediction
Molecules
(2018)

X. Chen
TTD: therapeutic target database
Nucleic Acids Res.
(2002)

U. Consortium
UniProt: a hub for protein information
Nucleic Acids Res.
(2014)

A. Dal Corso et al.
Affinity enhancement of protein ligands by reversible covalent modification of neighboring lysine residues
Angew. Chem. Int. Ed. Engl.
(2018)


¹: These two authors contributed equally to this study.
