A Method of Extracting Sentences Containing Protein Function Information from Articles by Iterative Learning with Feature Update

Miyanishi, Kazunori; Ohkawa, Takenao

doi:10.1007/978-3-642-38342-7_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7845)

Abstract

Proteins are important macromolecules in living systems and serve various functions in almost all biological processes. Protein function information is reported in many scientific articles, and extracting it is useful for drug discovery, for understanding life phenomena, and for related research. However, manually extracting function information from a large number of articles is infeasible. In this paper, we propose a method of extracting sentences containing protein function information by iterative learning with feature update. The method uses a classifier to distinguish sentences containing function information from other sentences, and introduces a semi-automatic procedure in which a new classifier is reconstructed based on the user’s feedback on the previous classification results. In an experiment with twelve articles as feedback data, we confirmed that the F-measure improved as learning was iterated, without negative effects from the feedback.
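The feedback loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier or the feature set, so everything below is an assumption made for illustration: a small Naive Bayes model over bag-of-words features stands in for the classifier, the "feature update" is modeled as rebuilding the vocabulary from all labeled sentences, and the names `BagOfWordsClassifier`, `iterate_with_feedback`, and `ask_user` are hypothetical. The example sentences are invented toy data, not taken from the paper.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (word.strip(".,;:()").lower() for word in sentence.split())
    return [tok for tok in tokens if tok]


class BagOfWordsClassifier:
    """Toy multinomial Naive Bayes; an assumed stand-in, not the paper's model."""

    def __init__(self, labeled):
        self.vocab = set()  # the feature set, rebuilt on every reconstruction
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter()
        for sentence, label in labeled:
            tokens = tokenize(sentence)
            self.vocab.update(tokens)
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1

    def predict(self, sentence):
        """Return True if the sentence is classified as containing function info."""
        total = sum(self.class_counts.values())
        scores = {}
        for label in (True, False):
            # Laplace-smoothed log prior plus log likelihood of known tokens.
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(sentence):
                if tok in self.vocab:
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]


def iterate_with_feedback(seed, batches, ask_user):
    """Classify each batch, collect user feedback, then rebuild the classifier."""
    labeled = list(seed)
    clf = BagOfWordsClassifier(labeled)
    for batch in batches:
        for sentence in batch:
            predicted = clf.predict(sentence)
            # ask_user (hypothetical) confirms or corrects the predicted label.
            labeled.append((sentence, ask_user(sentence, predicted)))
        # Reconstruct the classifier so new words become features.
        clf = BagOfWordsClassifier(labeled)
    return clf


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: one seed sentence per class, then one feedback batch.
seed = [("Thrombin cleaves fibrinogen.", True),
        ("The samples were stored overnight.", False)]
batches = [["Kinase phosphorylates the substrate."]]
# The lambda simulates a user who marks the new sentence as positive.
clf = iterate_with_feedback(seed, batches, lambda sentence, predicted: True)
```

In this sketch the classifier is rebuilt from scratch after each batch, which mirrors the abstract's wording that a new classifier is "reconstructed" from feedback; tracking `f_measure` on a held-out set after each round would be one way to check that iteration helps rather than hurts.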

Author information

Authors and Affiliations

Graduate School of Science and Technology, Kobe University, 1-1, Rokkodai, Nada, Kobe, 657-8501, Japan
Kazunori Miyanishi
Graduate School of System Informatics, Kobe University, 1-1, Rokkodai, Nada, Kobe, 657-8501, Japan
Takenao Ohkawa

Editor information

Editors and Affiliations

Center for Biostatistics, TMHRI, Weill Cornell Medical College, Cornell University, 6565 Fannin Street, Mary Gibbs Jones Ha, 77030, Houston, TX, USA
Leif E. Peterson
DIBRIS, University of Genova, Via Dodecaneso 35, 16146, Genova, Italy
Francesco Masulli
Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Temple University, 1900 N 12th Street, BioLife Science Bldg, 19122, Philadelphia, PA, USA
Giuseppe Russo

About this paper

Cite this paper

Miyanishi, K., Ohkawa, T. (2013). A Method of Extracting Sentences Containing Protein Function Information from Articles by Iterative Learning with Feature Update. In: Peterson, L.E., Masulli, F., Russo, G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2012. Lecture Notes in Computer Science, vol 7845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38342-7_8

DOI: https://doi.org/10.1007/978-3-642-38342-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38341-0
Online ISBN: 978-3-642-38342-7
eBook Packages: Computer Science, Computer Science (R0)
