Research Article
dbHDPLS: A database of human disease-related protein-ligand structures
Introduction
Proteins are essential constituents of living organisms and participate throughout the physiological cycle (Šrajer and Schmidt, 2017). When proteins function abnormally in the body, the normal physiological balance is disrupted and lesions, that is, diseases, can develop (Hanash, 2003, Mulder et al., 2018). Many human diseases arise from abnormal protein function in vivo, and such proteins are regarded as human disease-related proteins (Hajduk et al., 2005, Murakami et al., 2017). Many proteins must interact with small molecules to perform specific functions, and the binding of small molecules to proteins, together with its biological effects, is one of the most important problems in biology and chemistry (Hu et al., 2017, Lian et al., 1994, Schneider, 1991, Barolo, 2002). Knowledge of the interactions between disease proteins and ligands is key to studying pathogenesis, drug repositioning and drug discovery (Liu et al., 2018, Chen et al., 2013, Eyk et al., 2016, Drews, 2000). Some studies have investigated disease mechanisms and drug repositioning through drug-disease networks built from databases of drugs, known protein complexes and diseases (Yu et al., 2015, Dudley et al., 2011, Daminelli et al., 2012). Other studies have shown that some ligands of disease-related proteins are themselves drug molecules used to treat certain diseases, and that one approach to drug discovery is based on structural studies of drugs and disease proteins (Imming et al., 2006, Jubb et al., 2017, Chen et al., 2018, Awan et al., 2017).
With the development of structural biology and structural genomics, the number of macromolecular structures stored in the Protein Data Bank (PDB) (Berman, 2000) is growing rapidly. At present, the PDB holds more than 110,000 solved protein structures. Databases of protein-ligand interactions have grown rapidly as well. PDBsum (Laskowski, 2004) is a graphical database that briefly summarizes the three-dimensional protein structures stored in the PDB and provides detailed information on protein sequences and related ligands. BioLip (Yang et al., 2012) is a database of biologically relevant protein-ligand interactions whose entries are annotated with ligand-binding residues, ligand-binding affinities, catalytic sites, EC numbers, Gene Ontology terms and cross-links to other databases. The PLID database (Reddy et al., 2008) stores the protein-ligand binding environment and the physicochemical properties of the ligands for each entry. Binding MOAD (Hu et al., 2005) and PDBbind (Liu et al., 2014) provide binding affinity data for protein-ligand complexes with known three-dimensional structures. Both use Ki (inhibition constant), Kd (dissociation constant) and IC50 (concentration at 50% inhibition) to measure the strength of each protein-ligand interaction.
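Affinity values such as Ki, Kd and IC50 are commonly compared on a negative base-10 logarithmic scale of the molar concentration. The following is a minimal sketch of that conversion, not code from any of the databases above; the function name `to_p_scale` and the unit table `UNIT_TO_MOLAR` are illustrative assumptions.

```python
import math

# Conversion factors from common concentration units to molar (M).
# This table is an illustrative assumption, not taken from any database.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_scale(value, unit):
    """Return -log10 of an affinity value (Ki, Kd or IC50) expressed in molar."""
    if value <= 0:
        raise ValueError("affinity value must be positive")
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unknown unit: {unit}")
    return -math.log10(value * UNIT_TO_MOLAR[unit])
```

For instance, `to_p_scale(10, "nM")` gives a value of about 8, and a larger value indicates tighter binding. IC50 depends on assay conditions, so values converted from IC50 are not directly interchangeable with those converted from Ki or Kd.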
Several drug-target interaction databases are also based on the PDB. PDTD (Gao et al., 2008) is a drug-target database that contains crystal structure information derived from the PDB and pathology information for each target. PDID (Wang et al., 2015) provides two data sets: a drug-target interaction data set and a predicted one generated by three prediction algorithms, ILbind (Hu et al., 2012), SMAP (Xie and Bourne, 2008) and eFindSite (Brylinski and Feinstein, 2013, Feinstein and Brylinski, 2014). The Guide to PHARMACOLOGY database (Southan et al., 2015) provides pharmacological, chemical, genetic and pathophysiological data. SuperTarget (Gunther et al., 2007) is a database of drug-target biology and pathology that integrates drug molecules, 3D structures of drug targets, therapeutic ranges of drugs, drug side effects, drug metabolic pathways and gene annotations for drug targets. At present, it contains over 2500 drug targets and 1500 drug molecules.
Among these databases, few contain information on disease-related protein complexes. For example, UniProt (Consortium, 2014) collects protein sequences with their annotations and makes it possible to find a large number of proteins with known disease information. At present, it stores more than 80 million protein sequences. The Therapeutic Target Database (TTD) (Yang et al., 2015, Chen, 2002, Li et al., 2018) provides information about known therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the drugs directed at each target. The SGC (Structural Genomics Consortium, http://www.thesgc.org/), a non-profit public-private partnership, aims mainly to determine the 3-D structures of human proteins that are important for medicine. However, it does not provide sufficient structural information. Overall, these protein-ligand, protein and drug-target databases provide a wealth of knowledge for specific research areas. However, most annotations of drug molecules, human protein complex structures and human diseases remain scattered across different databases.
In this article, we built an integrated database, named dbHDPLS, that stores information about drug-target interactions and their related human diseases, together with information about drugs and protein-ligand complex structures. Starting from the PDB and drawing on other protein and drug databases, such as UniProt and DrugBank, we curated human disease-related protein-ligand complex structures. Each protein-ligand complex in the database is annotated with protein structure properties, protein functions and disease information, as well as the physicochemical properties of the ligands and drug information. As a result, the database contains the vast majority of the human disease-related protein-ligand complexes in the PDB. The details of the database are freely accessible from our web server. Fig. 1 displays the pipeline of the database.
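The kind of integration described in this paragraph can be illustrated with a toy sketch: structures are kept only when their protein carries a disease annotation, and drug information is attached to the ligand where available. This is not the authors' pipeline; the `ComplexRecord` schema, the `build_records` function and all identifiers in the usage note are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ComplexRecord:
    """One protein-ligand complex with merged annotations (hypothetical schema)."""
    pdb_id: str
    uniprot_acc: str
    ligand_id: str
    diseases: List[str] = field(default_factory=list)
    drug_name: Optional[str] = None


def build_records(
    structures: List[Tuple[str, str, str]],
    disease_by_protein: Dict[str, List[str]],
    drug_by_ligand: Dict[str, str],
) -> List[ComplexRecord]:
    """Keep complexes whose protein has a disease annotation and attach drug info."""
    records = []
    for pdb_id, uniprot_acc, ligand_id in structures:
        diseases = disease_by_protein.get(uniprot_acc)
        if not diseases:
            continue  # protein has no disease annotation, so the complex is skipped
        records.append(
            ComplexRecord(
                pdb_id=pdb_id,
                uniprot_acc=uniprot_acc,
                ligand_id=ligand_id,
                diseases=list(diseases),
                drug_name=drug_by_ligand.get(ligand_id),  # None if ligand is not a known drug
            )
        )
    return records
```

Called with placeholder inputs such as `[("PDB_A", "PROT_1", "LIG_X"), ("PDB_B", "PROT_2", "LIG_Y")]`, `{"PROT_1": ["disease_1"]}` and `{"LIG_X": "drug_1"}`, the function keeps only the first complex, because only `PROT_1` has a disease annotation.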
Section snippets
Construction and content
Our database was constructed from human proteins with known structures in the PDB and related information from DrugBank, UniProt, BioLip, PDBbind and Binding MOAD. As of 1 January 2018, the PDB held more than 120 thousand protein structures. From the statistics given in the PDB, 32,747 human protein structures were obtained. These are heterogeneous and contain not only protein-ligand complex structures but also protein structures or protein-nucleic
Utility and discussion
The database contains 8833 human proteins that are annotated as disease proteins in UniProt. For each protein-ligand complex structure, the database provides the structure name, the experimental method used to solve the complex structure, the resolution of the structure, the PubMed identifier of the primary literature for the structure and a download link for the PDB structure file. For the protein in each structure entry, the database provides the protein
Conclusions
This paper presents a comprehensive database of protein-ligand structures related to human diseases, together with a wealth of biological, chemical and pharmacological information for the protein structures. Information on ligand drugs, protein diseases and protein-ligand interactions is also provided in the database. We also built a web server for browsing and searching for specific information, and its usage is presented in this work. Moreover, a case study of the analysis of
Availability of data and materials
The datasets analysed during the current study are available at the following websites:
PDB database: http://www.rcsb.org/pdb/home/home.do
PDBbind database: http://www.pdbbind.org.cn/
Binding MOAD database: http://www.bindingmoad.org/
UniProt database: http://www.uniprot.org/
DrugBank database: http://www.drugbank.ca/
TTD database: http://bidd.nus.edu.sg/BIDD-Databases/TTD/TTD.asp.
All data generated during this study are available at the web page http://DeepLearner.ahu.edu.cn/web/dbDPLS/. This database
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61672035 and 61472282), Anhui Province Funds for Excellent Youth Scholars in Colleges (gxyqZD2016068) and Anhui Scientific Research Foundation for Returned Scholars.
References (46)
- et al. Chapter seven-structural prediction of protein-protein interactions by docking: application to biomedical problems
- et al. Predicting protein druggability. Drug Discovery Today (2005)
- et al. Finding protein targets for small biologically relevant ligands across fold space using inverse ligand binding predictions. Structure (2012)
- et al. Protein binding hot spots prediction from sequence only by a new ensemble learning method. Amino Acids (2017)
- et al. Affinity and specificity of motif-based protein-protein interactions. Curr. Opin. Struct. Biol. (2019)
- et al. Mutations at protein-protein interfaces: small changes over big surfaces have large impacts on human health. Prog. Biophys. Mol. Biol. (2017)
- et al. Protein-ligand interactions: exchange processes and determination of ligand conformation and protein-ligand contacts. Methods Enzymol. (1994)
- et al. dbMPIKT: a database of kinetic and thermodynamic mutant protein interactions. BMC Bioinform. (2018)
- et al. Proteomic tools to study drug function. Curr. Opin. Syst. Biol. (2018)
- et al. Network analysis and in silico prediction of protein-protein interactions with applications in drug discovery. Curr. Opin. Struct. Biol. (2017)
- Protein ligand interaction database (PLID). Comput. Biol. Chem.
- Watching proteins function with time-resolved X-ray crystallography. J. Phys. D: Appl. Phys.
- Mutation-structure-function relationship based integrated strategy reveals the potential impact of deleterious missense mutations in autophagy related proteins on hepatocellular carcinoma (HCC): a comprehensive informatics approach. Int. J. Mol. Sci.
- Flex ddG: Rosetta ensemble-based estimation of changes in protein-protein binding affinity upon mutation. J. Phys. Chem. B
- Three habits of highly effective signaling pathways: principles of transcriptional control by developmental cell signaling. Genes Dev.
- The protein data bank. Nucleic Acids Res.
- eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands. J. Comput.-Aided Mol. Des.
- Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences. Proteins
- A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction. IEEE/ACM Trans. Comput. Biol. Bioinform.
- Machine learning for drug-target interaction prediction. Molecules
- TTD: therapeutic target database. Nucleic Acids Res.
- UniProt: a hub for protein information. Nucleic Acids Res.
- Affinity enhancement of protein ligands by reversible covalent modification of neighboring lysine residues. Angew. Chem. Int. Ed. Engl.
Cited by (5)
Databases of ligand-binding pockets and protein-ligand interactions
2024, Computational and Structural Biotechnology Journal
Protein-Ligand CH−π Interactions: Structural Informatics, Energy Function Development, and Docking Implementation
2023, Journal of Chemical Theory and Computation
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction
2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Semi-supervised prediction of protein interaction sites from unlabeled sample information
2019, BMC Bioinformatics
1 These two authors contributed equally to this study.