Automated Extraction and Visualization of Protein–Protein Interaction Networks and Beyond: A Text-Mining Protocol

Protein-Protein Interaction Networks

Part of the book series: Methods in Molecular Biology (MIMB, volume 2074)

Abstract

Proteins perform their functions by interacting with other proteins. Knowledge of protein–protein interactions (PPIs) is critical for understanding the functions of individual proteins, the mechanisms of biological processes, and disease mechanisms. High-throughput experiments have reported a huge number of PPIs in articles indexed in PubMed, and extracting them at that scale is practical only through automated approaches. The standard text-mining protocol comprises four major tasks: recognizing protein mentions, normalizing protein names and aliases to unique identifiers such as gene symbols, extracting PPIs, and visualizing the PPI network using Cytoscape or other visualization tools. Each task is challenging, and the methods for each have been revised over several years to improve performance. We present a protocol based on our hybrid approaches and show that each task can be offered as an independent web-based tool: NAGGNER for protein name recognition, ProNormz for protein name normalization, PPInterFinder for PPI extraction, and HPIminer for PPI network visualization. The protocol is specific to human proteins but can be generalized to other organisms. We also include KinderMiner, our most recent text-mining tool, which predicts PPIs by retrieving protein pairs that co-occur significantly. The algorithm is simple, easy to implement, and generalizable to other biological challenges.
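To make the idea of "significant co-occurrence" concrete, the following is a minimal sketch, not the KinderMiner implementation. It assumes that significance is judged with a one-sided Fisher's exact test on a 2×2 table of article counts; the abstract itself does not name the test. The function names, the toy corpus, and the gene symbols used in it are illustrative assumptions, and each article is assumed to have already been reduced to a set of normalized gene symbols by the recognition and normalization steps.

```python
from math import comb


def cooccurrence_table(docs, a, b):
    """Count articles mentioning both, only a, only b, or neither symbol.

    docs is a list of sets of normalized gene symbols (hypothetical input).
    """
    both = sum(1 for d in docs if a in d and b in d)
    only_a = sum(1 for d in docs if a in d and b not in d)
    only_b = sum(1 for d in docs if b in d and a not in d)
    neither = len(docs) - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_one_sided(both, only_a, only_b, neither):
    """One-sided Fisher's exact test: P(overlap >= both) under independence.

    Uses the hypergeometric distribution with fixed row and column totals.
    """
    n = both + only_a + only_b + neither  # total number of articles
    row = both + only_a                   # articles mentioning protein a
    col = both + only_b                   # articles mentioning protein b
    upper = min(row, col)                 # largest possible overlap
    tail = sum(
        comb(row, k) * comb(n - row, col - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(n, col)


# Toy corpus (made up for illustration, not real literature counts).
docs = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2"},
    {"TP53", "MDM2"},
    {"BRCA1"},
    {"EGFR"},
    {"BRCA1", "EGFR"},
]

table = cooccurrence_table(docs, "TP53", "MDM2")
p_value = fisher_one_sided(*table)
print(table, p_value)
```

A small p-value suggests that the pair appears together more often than independence would predict; in practice a threshold and a correction for testing many pairs would be needed, and both are left out of this sketch.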

Acknowledgments

K.R., F.K., J.S., J.T., and R.S. acknowledge funding from the Morgridge Institute for Research and a grant from Marv Conney. I.R. acknowledges the GeoDeepDive Infrastructure, funded by NSF ICER 1343760.

Author information

Corresponding author

Correspondence to Ron Stewart.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Raja, K. et al. (2020). Automated Extraction and Visualization of Protein–Protein Interaction Networks and Beyond: A Text-Mining Protocol. In: Canzar, S., Ringeling, F. (eds) Protein-Protein Interaction Networks. Methods in Molecular Biology, vol 2074. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9873-9_2

  • DOI: https://doi.org/10.1007/978-1-4939-9873-9_2

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9872-2

  • Online ISBN: 978-1-4939-9873-9

  • eBook Packages: Springer Protocols
