Abstract
Functional genomic annotation data banks, which store the associations between genes (or gene products) and controlled-vocabulary terms describing their features, are paramount in computational biology. Despite their undeniable importance, these data sources can be considered neither complete nor totally accurate; their curated updates often add new annotations and revise existing ones. In this scenario, computational methods that can speed up the curation of such data banks are very important. To this end, Latent Semantic Indexing (LSI) by Singular Value Decomposition and its Semantically IMproved (SIM) variant have been shown to predict novel functional annotations from a set of available ones. In this work, we propose a further improvement of those techniques, based on a preparatory weighting of the associations between genes (or gene products) and functional annotation terms. We tested the effectiveness of our approach on nine Gene Ontology annotation datasets. The results showed that this technique improves the prediction of novel annotations.
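The general idea of weighting an annotation matrix before an LSI-style truncated SVD can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `predict_annotations`, the IDF-style weight, the rank `k`, and the `threshold` are all assumptions chosen for illustration, and the abstract does not state which weighting schemes the paper evaluates.

```python
import numpy as np


def predict_annotations(A, k, threshold, weighted=True):
    """Score gene-term pairs from a binary annotation matrix.

    A: genes-by-terms matrix of 0/1 entries (1 = known annotation).
    k: number of singular values kept (hypothetical tuning parameter).
    threshold: score above which a missing annotation is suggested.
    """
    A = np.asarray(A, dtype=float)
    W = A.copy()
    if weighted:
        # Assumed IDF-style weight: terms annotated to few genes count more.
        n_genes = A.shape[0]
        genes_per_term = A.sum(axis=0)
        idf = np.log((1.0 + n_genes) / (1.0 + genes_per_term)) + 1.0
        W = A * idf

    # Rank-k reconstruction of the (weighted) matrix via SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = min(k, len(s))
    scores = (U[:, :k] * s[:k]) @ Vt[:k, :]

    if weighted:
        # Undo the column weights so scores are back on the 0/1 scale.
        scores = scores / idf

    # Suggest only pairs that are not already annotated.
    predicted = (scores >= threshold) & (A == 0)
    return scores, predicted


# Toy matrix: 4 genes (rows) by 3 terms (columns).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
scores, predicted = predict_annotations(A, k=2, threshold=0.5)
```

With `k` equal to the full rank, the reconstruction reproduces the input and nothing new is suggested; lowering `k` is what lets co-occurring terms raise the score of absent gene-term pairs.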
Acknowledgments
This research is part of the Search Computing project (2008–2013) funded by the European Research Council (ERC), IDEAS Advanced Grant. The authors would like to thank Luke Lloyd-Jones for help in revising the English style of the text.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pinoli, P., Chicco, D., Masseroli, M. (2014). Weighting Scheme Methods for Enhanced Genomic Annotation Prediction. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_6
DOI: https://doi.org/10.1007/978-3-319-09042-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science, Computer Science (R0)