Abstract
In this chapter we review current approaches to store, retrieve and integrate diverse protein interaction data. To incorporate the heterogeneous results of computational predictions and protein interaction experiments, methods of data integration have been widely used which provide efficient presentation, and analysis of interaction data. Among them statistical meta-analysis and supervised machine learning methods are becoming very popular in this respect. While integration methods reduce complexity of system representation, the databases provide efficient storage and retrieval of data. A large variety of interaction databases exist which differ in scope, type and coverage of data as well as query search capabilities. We categorize the databases of protein interactions into comprehensive, specialized, structural and databases developed for network analysis. This gives a rough grouping of resources based on how they might be used. In particular, one might often start with a comprehensive database search and afterwards perform a refined search of the obtained results using a database with a more specific focus.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Birkland A, Yona G: BIOZON: a system for unification, management and analysis of heterogeneous biological data. BMC Bioinformatics 2006, 7:70.
Joyce AR, Palsson BO: The model organism as a system: integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 2006, 7(3):198–210.
Lacroix Z, Raschid L, Eckman BA: Techniques for optimization of queries on integrated biological resources. J Bioinform Comput Biol 2004, 2(2):375–411.
Hedges LV, Olkin I: Statistical methods for meta-analysis: Academic Press; 1985.
Hunter JE, Schmidt FL: ‘Methods of Meta-Analysis : Correcting Error and Bias in Research’ Sage Publications; 1990.
Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 2002, 1(5):349–356.
Hwang D, Rust AG, Ramsey S, Smith JJ, Leslie DM, Weston AD, de Atauri P, Aitchison JD, Hood L, Siegel AF et al.: A data integration methodology for systems biology. Proc Natl Acad Sci U S A 2005, 102(48):17296–17301.
Hwang D, Smith JJ, Leslie DM, Weston AD, Rust AG, Ramsey S, de Atauri P, Siegel AF, Bolouri H, Aitchison JD et al.: A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci U S A 2005, 102(48):17302–17307.
Gilchrist MA, Salter LA, Wagner A: A statistical framework for combining and interpreting proteomic datasets. Bioinformatics 2004, 20(5):689–700.
Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302(5644):449–453.
Qi Y, Klein-Seetharaman J, Bar-Joseph Z: Random forest similarity for protein-protein interaction prediction from multiple sources. Pac Symp Biocomput 2005:531–542.
Chen XW, Liu M: Prediction of protein-protein interactions using random decision forest framework. Bioinformatics 2005, 21(24):4394–4400.
Qi Y, Bar-Joseph Z, Klein-Seetharaman J: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 2006, 63(3):490–500.
Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol 2004, 22(1):78–85.
Lee I, Date SV, Adai AT, Marcotte EM: A probabilistic functional network of yeast genes. Science 2004, 306(5701):1555–1558.
Yamanishi Y, Vert JP, Kanehisa M: Protein network inference from multiple genomic data: a supervised approach. Bioinformatics 2004, 20 Suppl 1:I363–I370.
Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494.
Huttenhower C, Troyanskaya OG: Bayesian data integration: a functional perspective. Comput Syst Bioinformatics Conf 2006:341–351.
Zhang LV, Wong SL, King OD, Roth FP: Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 2004, 5:38.
Lin N, Wu B, Jansen R, Gerstein M, Zhao H: Information assessment on predicting protein-protein interactions. BMC Bioinformatics 2004, 5:154.
Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM: Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 2005, 23(8):951–959.
Lu LJ, Xia Y, Paccanaro A, Yu H, Gerstein M: Assessing the limits of genomic data integration for predicting protein networks. Genome Res 2005, 15(7):945–953.
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The database of interacting proteins: 2004 update. Nucleic Acids Res 2004, 32(Database issue):D449–451.
Duan XJ, Xenarios I, Eisenberg D: Describing biological protein interactions in terms of protein states and state transitions: the LiveDIP database. Mol Cell Proteomics 2002, 1(2): 104–116.
Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D: Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol 2004, 5(5):R35.
Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E et al.: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res 2005, 33(Database issue):D418–424.
von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P: STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res 2005, 33(Database issue):D433–437.
Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Lett 2002, 513(1):135–140.
Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A et al.: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, 32(Database issue):D452–455.
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006, 34(Database issue): D535–539.
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006, 34(Database issue):D354–357.
Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C et al.: The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 2004, 22(2):177–183.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504.
Bader GD, Hogue CW: BIND–a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. Bioinformatics 2000, 16(5):465–477.
Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM et al.: Human protein reference database–2006 update. Nucleic Acids Res 2006, 34(Database issue):D411–414.
Li J, Ning Y, Hedley W, Saunders B, Chen Y, Tindill N, Hannay T, Subramaniam S: The Molecule Pages database. Nature 2002, 420(6916):716–717.
Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res 2006, 34(Database issue):D436–441.
Pacifico S, Liu G, Guest S, Parrish JR, Fotouhi F, Finley RL, Jr.: A database and tool, IM Browser, for exploring and integrating emerging gene and protein interaction data for Drosophila. BMC Bioinformatics 2006, 7:195.
Hoebeke M, Chiapello H, Noirot P, Bessieres P: SPiD: a subtilis protein interaction database. Bioinformatics 2001, 17(12):1209–1212.
Koike A, Kobayashi Y, Takagi T: Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource. Genome Res 2003, 13(6A):1231–1243.
Marino-Ramirez L, Minor JL, Reading N, Hu JC: Identification and mapping of self-assembling protein domains encoded by the Escherichia coli K-12 genome by use of lambda repressor fusions. J Bacteriol 2004, 186(5):1311–1319.
Ng A, Bursteinas B, Gao Q, Mollison E, Zvelebil M: pSTIING: a ‘systems’ approach towards integrating signalling pathways, interaction and transcriptional regulatory networks in inflammation and cancer. Nucleic Acids Res 2006, 34(Database issue):D527–534.
Mathew JP, Taylor BS, Bader GD, Pyarajan S, Antoniotti M, Chinnaiyan AM, Sander C, Burakoff SJ, Mishra B: From bytes to bedside: data integration and computational biology for translational cancer research. PLoS Comput Biol 2007, 3(2):e12.
Obenauer JC, Cantley LC, Yaffe MB: Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 2003, 31(13):3635–3641.
Kikuno R, Nagase T, Nakayama M, Koga H, Okazaki N, Nakajima D, Ohara O: HUGE: a database for human KIAA proteins, a 2004 update integrating HUGEppi and ROUGE. Nucleic Acids Res 2004, 32(Database issue):D502–504.
Thorn KS, Bogan AA: ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 2001, 17(3):284–285.
Fischer TB, Arunachalam KV, Bailey D, Mangual V, Bakhru S, Russo R, Huang D, Paczkowski M, Lalchandani V, Ramachandra C et al.: The binding interface database (BID): a compilation of amino acid hot spots in protein interfaces. Bioinformatics 2003, 19(11): 1453–1454.
Keskin O, Ma B, Nussinov R: Hot regions in protein–protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol 2005, 345(5): 1281-1294.
Teyra J, Doms A, Schroeder M, Pisabarro MT: SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces. BMC Bioinformatics 2006, 7:104.
Kumar MD, Gromiha MM: PINT: Protein-protein Interactions Thermodynamic Database. Nucleic Acids Res 2006, 34(Database issue):D195–198.
Ng SK, Zhang Z, Tan SH, Lin K: InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res 2003, 31(1):251–254.
Pagel P, Oesterheld M, Stumpflen V, Frishman D: The DIMA web resource–exploring the protein domain network. Bioinformatics 2006, 22(8):997–998.
Raghavachari B, Tasneem A, Przytycka TM, Jothi R: DOMINE: a database of protein domain interactions. Nucleic Acids Res 2007, 36(Database issue):D656–.
Kundrotas PJ, Alexov E: PROTCOM: searchable database of protein complexes enhanced with domain-domain structures. Nucleic Acids Res 2007, 35(Database issue):D575–579.
Stein A, Russell RB, Aloy P: 3did: interacting protein domains of known three-dimensional structure. Nucleic Acids Res 2005, 33(Database issue):D413–417.
Aloy P, Russell RB: InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics 2003, 19(1):161–162.
Henrick K, Thornton JM: PQS: a protein quaternary structure file server. Trends Biochem Sci 1998, 23(9):358–361.
Davis FP, Sali A: PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics 2005, 21(9):1901–1907.
Pieper U, Eswar N, Braberg H, Madhusudhan MS, Davis FP, Stuart AC, Mirkovic N, Rossi A, Marti-Renom MA, Fiser A et al.: MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 2004, 32(Database issue):D217–222.
Shoemaker BA, Panchenko AR, Bryant SH: Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci 2006, 15(2):352–361.
Winter C, Henschel A, Kim WK, Schroeder M: SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res 2006, 34(Database issue):D310–314.
Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 2005, 21(3):410–412.
Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D et al.: MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2006, 34(Database issue):D291–295.
Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C et al.: Protein interaction mapping: a Drosophila case study. Genome Res 2005, 15(3):376–384.
Kemmer D, Huang Y, Shah SP, Lim J, Brumm J, Yuen MM, Ling J, Xu T, Wasserman WW, Ouellette BF: Ulysses - an application for the projection of molecular interactions across species. Genome Biol 2005, 6(12):R106.
Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR, Ideker T: PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res 2004, 32(Web Server issue):W83–88.
Prieto C, De Las Rivas J: APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res 2006, 34(Web Server issue):W298–302.
Yip KY, Yu H, Kim PM, Schultz M, Gerstein M: The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. Bioinformatics 2006, 22(23):2968–2970.
Aragues R, Jaeggi D, Oliva B: PIANA: protein interactions and network analysis. Bioinformatics 2006, 22(8):1015–1017.
Hu Z, Ng DM, Yamada T, Chen C, Kawashima S, Mellor J, Linghu B, Kanehisa M, Stuart JM, DeLisi C: VisANT 3.0: new modules for pathway visualization, editing, prediction and construction. Nucleic Acids Res 2007, 35(Web Server issue):W625–632.
Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, Stockwell BR, Ideker T: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A 2003, 100(20):11394–11399.
Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, Sittler T, Karp RM, Ideker T: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci U S A 2005, 102(6):1974–1979.
Yin Y, Tainsky MA, Bischoff FZ, Strong LC, Wahl GM: Wild-type p53 restores cell cycle control and inhibits gene amplification in cells with mutant p53 alleles. Cell 1992, 70(6): 937–948.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Shoemaker, B., Panchenko, A. (2008). Handling Diverse Protein Interaction Data: Integration, Storage and Retrieval. In: Panchenko, A., Przytycka, T. (eds) Protein-protein Interactions and Networks. Computational Biology, vol 9. Springer, London. https://doi.org/10.1007/978-1-84800-125-1_2
Download citation
DOI: https://doi.org/10.1007/978-1-84800-125-1_2
Publisher Name: Springer, London
Print ISBN: 978-1-84800-124-4
Online ISBN: 978-1-84800-125-1
eBook Packages: Computer ScienceComputer Science (R0)