Handling Diverse Protein Interaction Data: Integration, Storage and Retrieval

Shoemaker, Benjamin; Panchenko, Anna

doi:10.1007/978-1-84800-125-1_2

Part of the book series: Computational Biology (COBO, volume 9)

Abstract

In this chapter we review current approaches to storing, retrieving and integrating diverse protein interaction data. Data integration methods are widely used to combine the heterogeneous results of computational predictions and protein interaction experiments, and they support efficient presentation and analysis of interaction data. Among these methods, statistical meta-analysis and supervised machine learning are becoming particularly popular. While integration methods reduce the complexity of the system representation, databases provide efficient storage and retrieval of data. A large variety of interaction databases exists; they differ in scope, type and coverage of data, as well as in query and search capabilities. We categorize protein interaction databases as comprehensive, specialized, structural, or developed for network analysis. This gives a rough grouping of resources based on how they might be used. In particular, one might often start with a search of a comprehensive database and then refine the results using a database with a more specific focus.
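As a rough illustration of what statistical meta-analysis of interaction evidence can look like, the sketch below combines p-values from several independent evidence sources for a single candidate protein pair using Fisher's method. This is one common meta-analysis technique chosen for illustration, not necessarily the approach reviewed in the chapter; the function name, the evidence labels and the p-values are all hypothetical, and the method assumes the sources are statistically independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis it
    # follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square survival function has the
    # closed form P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical evidence for one candidate protein pair (made-up values).
evidence = {
    "yeast two-hybrid": 0.04,
    "co-expression": 0.10,
    "phylogenetic profile": 0.20,
}
combined = fisher_combined_p(list(evidence.values()))
print(f"combined p-value: {combined:.4f}")
```

A single source passes through unchanged, and several weakly significant sources can jointly yield a smaller combined p-value. If the sources are correlated, as experimental and predicted interaction data often are, the independence assumption fails and the combined value would be overconfident.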

Author information

Authors and Affiliations

National Center for Biotechnology Information, National Institutes of Health, Bethesda, USA
Benjamin Shoemaker & Anna Panchenko

Corresponding author

Correspondence to Benjamin Shoemaker.

Editor information

Editors and Affiliations

National Institutes of Health, Bethesda, Maryland, USA
Anna Panchenko & Teresa Przytycka

About this chapter

Cite this chapter

Shoemaker, B., Panchenko, A. (2008). Handling Diverse Protein Interaction Data: Integration, Storage and Retrieval. In: Panchenko, A., Przytycka, T. (eds) Protein-protein Interactions and Networks. Computational Biology, vol 9. Springer, London. https://doi.org/10.1007/978-1-84800-125-1_2

DOI: https://doi.org/10.1007/978-1-84800-125-1_2
Publisher Name: Springer, London
Print ISBN: 978-1-84800-124-4
Online ISBN: 978-1-84800-125-1
eBook Packages: Computer Science, Computer Science (R0)