Abstract
In this paper, we introduce a way to apply rough set data analysis to the problem of extracting protein-protein interaction sentences from biomedical literature. Our approach builds on decision rules based on protein names, interaction words, and their mutual positions in sentences. To broaden the set of potential interaction words, we develop a morphological model that generates spelling and inflection variants of the interaction words. We evaluate the proposed method on a hand-tagged dataset of 1894 sentences and show a precision-recall break-even performance of 79.8% using leave-one-out cross-validation.
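The abstract reports its result as a precision-recall break-even point. As a minimal sketch of how such a figure is commonly computed (the abstract does not say how the authors computed theirs), the Python function below ranks sentences by classifier score and returns the precision at the cutoff where the number of predicted positives equals the number of actual positives; at that cutoff precision and recall coincide. The function name `break_even_point` and the example scores and labels are hypothetical and are not taken from the paper.

```python
def break_even_point(scores, labels):
    """Precision-recall break-even point for a ranked list.

    scores: classifier confidence per sentence (higher = more likely positive).
    labels: 1 for an interaction sentence, 0 otherwise.

    When exactly n_pos items are predicted positive (n_pos = number of true
    positives in the data), precision and recall share the same numerator
    and denominator, so they are equal. Ties in score are broken by input
    order, which is an assumption of this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Hypothetical toy data: three positives, and two of the three
# highest-scored sentences are positive.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
example_labels = [1, 0, 1, 1, 0]
print(break_even_point(example_scores, example_labels))
```

Under leave-one-out cross-validation, each sentence's score would come from a model trained on all the other sentences, and the break-even point would then be computed once over the pooled scores; that pooling step is likewise an assumption here.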
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ginter, F., Pahikkala, T., Pyysalo, S., Boberg, J., Järvinen, J., Salakoski, T. (2004). Extracting Protein-Protein Interaction Sentences by Applying Rough Set Data Analysis. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds) Rough Sets and Current Trends in Computing. RSCTC 2004. Lecture Notes in Computer Science, vol 3066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25929-9_99
DOI: https://doi.org/10.1007/978-3-540-25929-9_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22117-3
Online ISBN: 978-3-540-25929-9
eBook Packages: Springer Book Archive