Abstract
A key activity for life scientists is exploring the relatedness of a set of genes in order to distinguish genes that perform coherently related functions from randomly grouped genes. This paper explores relatedness within two popular bio-organizations, namely gene families and pathways. The exploration integrates different resources (ontologies, texts, expert classifications) and aims to suggest patterns that help biologists obtain a more comprehensive view of differences in gene behaviour. Our approach is based on annotating a specialized corpus of texts, the gene summaries, which condense the description of the functions and processes in which genes are involved. By annotating these summaries with different ontologies, we derive a set of descriptor terms and compare them to obtain a measure of relatedness within the bio-organizations considered. Finally, we extract the most important annotations within each family using a text categorization method.
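The comparison step can be illustrated with a small sketch. The abstract does not name the relatedness measure, so this example assumes, for illustration only, that each gene is represented by the set of descriptor terms obtained from its annotated summary, and that relatedness within a group is the mean pairwise Dice coefficient between those sets. It also scores randomly drawn groups of the same size, echoing the goal of separating coherent gene sets from random ones. The gene names (`GENE_A`, ...), term identifiers (`T1`, ...) and the functions `dice`, `group_relatedness` and `random_baseline` are hypothetical placeholders, not the authors' implementation.

```python
import random
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two descriptor-term sets."""
    if not a and not b:
        return 0.0  # convention assumed here: two empty sets share no evidence
    return 2 * len(a & b) / (len(a) + len(b))


def group_relatedness(annotations: dict) -> float:
    """Mean pairwise Dice similarity over all gene pairs in one group.

    `annotations` maps a gene symbol to the set of ontology terms assumed to be
    derived from annotating that gene's summary (hypothetical input format).
    """
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        return 0.0  # a group with fewer than two genes has no pairs to compare
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


def random_baseline(all_annotations: dict, size: int, trials: int = 1000, seed: int = 0) -> float:
    """Average relatedness of `trials` random groups of `size` genes (illustrative baseline)."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    genes = list(all_annotations)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(genes, size)
        total += group_relatedness({g: all_annotations[g] for g in sample})
    return total / trials


if __name__ == "__main__":
    # Hypothetical descriptor-term sets; real ones would come from ontology annotation.
    family = {
        "GENE_A": {"T1", "T2", "T3"},
        "GENE_B": {"T2", "T3"},
        "GENE_C": {"T3", "T4"},
    }
    background = dict(family)
    background.update({
        "GENE_D": {"T5", "T6"},
        "GENE_E": {"T7"},
        "GENE_F": {"T1", "T8"},
    })
    print(f"family relatedness: {group_relatedness(family):.3f}")  # 0.567
    print(f"random baseline:    {random_baseline(background, size=3):.3f}")
```

Under these assumptions, a family whose score clearly exceeds the random baseline would be a candidate for functional coherence; the measure and any statistical test used in the paper may differ.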
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Dessì, N., Dessì, S., Pascariello, E., Pes, B. (2015). Exploring the Relatedness of Gene Sets. In: Di Serio, C., Liò, P., Nonis, A., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2014. Lecture Notes in Computer Science, vol 8623. Springer, Cham. https://doi.org/10.1007/978-3-319-24462-4_4
DOI: https://doi.org/10.1007/978-3-319-24462-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24461-7
Online ISBN: 978-3-319-24462-4
eBook Packages: Computer Science, Computer Science (R0)