Automated Extraction and Visualization of Protein–Protein Interaction Networks and Beyond: A Text-Mining Protocol

Raja, Kalpana; Natarajan, Jeyakumar; Kuusisto, Finn; Steill, John; Ross, Ian; Thomson, James; Stewart, Ron

doi:10.1007/978-1-4939-9873-9_2

Part of the book series: Methods in Molecular Biology (MIMB, volume 2074)

Abstract

Proteins perform their functions by interacting with other proteins. Knowledge of protein–protein interactions (PPIs) is critical for understanding the functions of individual proteins, the mechanisms of biological processes, and the mechanisms of disease. High-throughput experiments have deposited a huge number of PPIs in articles indexed by PubMed, and extracting them at that scale is feasible only with automated approaches. The standard text-mining protocol comprises four major tasks: recognizing protein mentions, normalizing protein names and aliases to unique identifiers such as gene symbols, extracting PPIs, and visualizing the PPI network using Cytoscape or other visualization tools. Each task is challenging and has been refined over several years to improve performance. We present a protocol based on our hybrid approaches and show that each task can be offered as an independent web-based tool: NAGGNER for protein name recognition, ProNormz for protein name normalization, PPInterFinder for PPI extraction, and HPIminer for PPI network visualization. The protocol is specific to human proteins but can be generalized to other organisms. We also include KinderMiner, our most recent text-mining tool, which predicts PPIs by retrieving protein pairs that co-occur significantly. The algorithm is simple, easy to implement, and generalizable to other biological challenges.
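
To make the idea of "significantly co-occurring protein pairs" concrete, the following is a minimal sketch rather than the KinderMiner implementation. The abstract does not name the statistic used, so the sketch assumes a one-sided Fisher's exact test on article counts. It also assumes that each article has already been reduced to a set of normalized gene symbols (the output of the recognition and normalization tasks). The function names, the toy corpus, and the `alpha` threshold are all hypothetical, and no multiple-testing correction is applied.

```python
from itertools import combinations
from math import comb


def cooccurrence_table(articles, a, b):
    """Count articles mentioning both proteins, only a, only b, or neither."""
    both = only_a = only_b = neither = 0
    for mentions in articles:
        has_a, has_b = a in mentions, b in mentions
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def fisher_right_tail(both, only_a, only_b, neither):
    """One-sided Fisher's exact test (assumed statistic, see lead-in).

    Returns the probability of seeing at least `both` co-mentions if the
    two proteins were mentioned independently (hypergeometric null).
    """
    n = both + only_a + only_b + neither
    with_a = both + only_a
    with_b = both + only_b
    total = comb(n, with_b)
    hits = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(both, min(with_a, with_b) + 1)
    )
    return hits / total


def rank_pairs(articles, alpha=0.05):
    """Return (p_value, protein_a, protein_b) for pairs with p <= alpha."""
    proteins = sorted(set().union(*articles))
    scored = []
    for a, b in combinations(proteins, 2):
        p = fisher_right_tail(*cooccurrence_table(articles, a, b))
        if p <= alpha:
            scored.append((p, a, b))
    return sorted(scored)


# Hypothetical toy corpus: one set of normalized gene symbols per article.
articles = [
    {"TP53", "MDM2"}, {"TP53", "MDM2"}, {"TP53", "MDM2"},
    {"BRCA1"}, {"EGFR"}, {"BRCA1", "EGFR"},
]
for p, a, b in rank_pairs(articles, alpha=0.1):
    print(f"{a}-{b}: p = {p:.3f}")
```

The tail probability is computed with exact integer binomial coefficients, so the sketch needs only the standard library. On a real corpus the counts would come from a literature index rather than an in-memory list, and the threshold would need adjusting for the number of pairs tested.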

Acknowledgments

K.R., F.K., J.S., J.T., and R.S. acknowledge funding from the Morgridge Institute for Research and a grant from Marv Conney. I.R. acknowledges the GeoDeepDive Infrastructure, funded by NSF ICER 1343760.

Author information

Authors and Affiliations

Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
Kalpana Raja, Finn Kuusisto, John Steill, James Thomson & Ron Stewart
Data Mining and Text Mining Laboratory, Department of Bioinformatics, School of Life Sciences, Bharathiar University, Coimbatore, Tamil Nadu, India
Kalpana Raja & Jeyakumar Natarajan
Computer Sciences Department, Center for High Throughput Computing, University of Wisconsin, Madison, WI, USA
Ian Ross
Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI, USA
James Thomson

Corresponding author

Correspondence to Ron Stewart.

Editor information

Editors and Affiliations

Gene Center, Ludwig-Maximilians-Universität München, München, Germany
Stefan Canzar & Francisca Rojas Ringeling

About this protocol

Cite this protocol

Raja, K. et al. (2020). Automated Extraction and Visualization of Protein–Protein Interaction Networks and Beyond: A Text-Mining Protocol. In: Canzar, S., Ringeling, F. (eds) Protein-Protein Interaction Networks. Methods in Molecular Biology, vol 2074. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9873-9_2

DOI: https://doi.org/10.1007/978-1-4939-9873-9_2
Published: 04 October 2019
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9872-2
Online ISBN: 978-1-4939-9873-9
eBook Packages: Springer Protocols
