Abstract
In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, due to issues such as lack of labeled data, class imbalance and computational cost, they have usually been unable to surpass simpler approaches based on string-matching. In this paper, we investigate the use of semantic kernels as a way to address the task’s inherent data scarcity and we propose a simple yet effective solution to deal with class imbalance. In experiments on the TREC Genomics Track data, our approach achieves a better F1-score than two state-of-the-art approaches based on string-matching and cross-species information.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blondel, M., Seki, K., Uehara, K. (2011). Application of Semantic Kernels to Literature-Based Gene Function Annotation. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_8
DOI: https://doi.org/10.1007/978-3-642-24477-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)