Abstract
We are developing an information extraction system for life science literature. We currently focus on PubMed abstracts and aim to extract named entities and their relationships, especially protein names and protein-protein interactions. Our approach draws on natural language processing, machine learning, and text processing, but we do not develop a new tagging or parsing technique. Building a tagger or parser specialized for life science literature is a complex task, and it is hard to obtain good results by tuning or retraining an existing parser without a sufficiently large corpus. These are separate research topics; our goal is to extract information, not to build supporting tools for extraction. In this paper, we introduce a method that uses an existing full parser without training or tuning. After tagging sentences and extracting protein names, we simplify each sentence by replacing words such as named entities and nouns with a single word. This simplification reduces parsing errors and increases parsing precision. We then parse the simplified sentences with an existing syntactic parser and extract protein-protein interactions from its output. We show the effects of sentence simplification and syntactic parsing.
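The sentence simplification step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simplify_sentence`, the `PROTEINn` placeholder scheme, and the assumption that protein names are already available as a plain list of strings are all hypothetical, chosen only to show how substituting each entity with a single token yields a shorter sentence for a general-purpose parser. The tagging and parsing steps are omitted.

```python
import re


def simplify_sentence(sentence, protein_names):
    """Replace each protein mention with a single placeholder token.

    Hypothetical illustration only: returns the simplified sentence and a
    mapping from placeholder back to the original protein name.
    """
    # Try longer names first so a multi-word name wins over a shorter
    # name that happens to be contained in it.
    names = sorted(set(protein_names), key=len, reverse=True)
    if not names:
        return sentence, {}

    # Match a name only when it is not embedded inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )

    placeholder_for = {}  # protein name -> placeholder
    mapping = {}          # placeholder -> protein name

    def substitute(match):
        name = match.group(0)
        if name not in placeholder_for:
            # Number placeholders in order of first appearance.
            placeholder = f"PROTEIN{len(placeholder_for) + 1}"
            placeholder_for[name] = placeholder
            mapping[placeholder] = name
        return placeholder_for[name]

    return pattern.sub(substitute, sentence), mapping


if __name__ == "__main__":
    text = "The Raf-1 kinase domain binds to Ras p21 in vitro."
    simplified, mapping = simplify_sentence(text, ["Raf-1", "Ras p21"])
    print(simplified)
    print(mapping)
```

After parsing, the mapping lets any relation found between placeholders be translated back to the original protein names.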
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jang, H., Lim, J., Lim, JH., Park, SJ., Park, SH., Lee, KC. (2006). Extracting Protein-Protein Interactions in Biomedical Literature Using an Existing Syntactic Parser. In: Bremer, E.G., Hakenberg, J., Han, EH., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_7
DOI: https://doi.org/10.1007/11683568_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science (R0)