Extracting Protein-Protein Interactions in Biomedical Literature Using an Existing Syntactic Parser

Jang, Hyunchul; Lim, Jaesoo; Lim, Joon-Ho; Park, Soo-Jun; Park, Seon-Hee; Lee, Kyu-Chul

doi:10.1007/11683568_7

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Included in the following conference series:

International Workshop on Knowledge Discovery in Life Science Literature

Abstract

We are developing an information extraction system for life science literature. We currently focus on PubMed abstracts and aim to extract named entities and their relationships, especially protein names and protein-protein interactions. We adopt methods from natural language processing, machine learning, and text processing, but we are not developing a new tagging or parsing technique. Developing a new tagger or parser specialized for life science literature is a very complex job, and it is not easy to get good results by tuning an existing parser or by training it without a sufficient corpus. These are separate research topics; our goal is to extract information, not to build tools that support the extraction task. In this paper, we introduce our method for using an existing full parser without training or tuning. After tagging sentences and extracting protein names, we simplify each sentence by substituting a single word for certain expressions, such as named entities and nouns. This sentence simplification reduces parsing errors and increases parsing precision. We then parse the simplified sentences with an existing syntactic parser and extract protein-protein interactions from its output. We show the effects of sentence simplification and syntactic parsing.
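
The sentence simplification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes protein names have already been identified by an earlier tagging step, it handles only protein names (the abstract also mentions nouns), and the function names `simplify_sentence` and `restore_names` as well as the `PROTEIN<n>` placeholder format are hypothetical choices made for illustration.

```python
import re


def simplify_sentence(sentence, protein_names):
    """Replace each listed protein name with a one-word placeholder.

    Returns the simplified sentence and a dict mapping each placeholder
    back to the original name. Placeholder naming is an assumption.
    """
    # Try longer names first so a short name never splits a longer one.
    names = sorted(set(protein_names), key=len, reverse=True)
    if not names:
        return sentence, {}

    # Match a name only when it is not embedded in a larger word.
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")

    placeholders = {}  # original name -> placeholder token

    def substitute(match):
        name = match.group(0)
        if name not in placeholders:
            # Number placeholders in order of first appearance.
            placeholders[name] = f"PROTEIN{len(placeholders) + 1}"
        return placeholders[name]

    simplified = pattern.sub(substitute, sentence)
    mapping = {token: name for name, token in placeholders.items()}
    return simplified, mapping


def restore_names(tokens, mapping):
    """Map placeholder tokens (e.g. an extracted pair) back to real names."""
    return tuple(mapping.get(token, token) for token in tokens)


if __name__ == "__main__":
    text = "The tumor suppressor p53 binds to MDM2 in vivo."
    simplified, mapping = simplify_sentence(text, ["p53", "MDM2"])
    print(simplified)
    print(restore_names(("PROTEIN1", "PROTEIN2"), mapping))
```

In a pipeline of this kind, the simplified sentence would be passed to an off-the-shelf parser, and any interaction found between placeholders would be mapped back to the original protein names with `restore_names`.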

Author information

Authors and Affiliations

Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Gajeong-Dong, Yuseong-Gu, Daejeon, 305-350, Republic of Korea
Hyunchul Jang, Jaesoo Lim, Joon-Ho Lim, Soo-Jun Park & Seon-Hee Park
Department of Computer Engineering, Chungnam National University, Gung-Dong, Yuseong-Gu, Daejeon, 305-764, Republic of Korea
Kyu-Chul Lee

Editor information

Editors and Affiliations

Brain Tumor Research Program, Children’s Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Eric G. Bremer
Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Jörg Hakenberg
iXmatch Inc., 5555 West 78th Street Suite E, 55439-2702, Minneapolis, MN, USA
Eui-Hong (Sam) Han
School of Biomedical Sciences, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Daniel Berrar
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

About this paper

Cite this paper

Jang, H., Lim, J., Lim, JH., Park, SJ., Park, SH., Lee, KC. (2006). Extracting Protein-Protein Interactions in Biomedical Literature Using an Existing Syntactic Parser. In: Bremer, E.G., Hakenberg, J., Han, EH., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_7

DOI: https://doi.org/10.1007/11683568_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)