Abstract
Gene Ontology (GO) is a comprehensive resource describing the properties of gene products and the relationships among them. A GO-based similarity measure can be defined between two gene products, and the resulting similarity score can be treated as the likelihood that the two products physically interact. However, GO is updated regularly through the addition of new terms and the removal or merging of obsolete ones. The similarity score for a given pair may therefore differ from one version of GO to another. In this paper, we systematically study the impact of the continuous evolution of GO on the performance of similarity measures for the task of scoring the confidence of protein–protein interactions (PPIs). We find that the performance of a similarity measure is affected by the continuous evolution of GO. We further observe that the robustness of a similarity measure depends strongly on the particular setting considered.
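As a rough illustration of why a score can change between GO versions, the sketch below computes a simple set-overlap (Jaccard) score over ancestor-induced term sets on two toy ontology snapshots. This is not one of the measures evaluated in the paper; the term identifiers, the two snapshots, and the annotations are all hypothetical and chosen only to show the effect of inserting a new term.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def induced_terms(annotations, parents):
    """Union of the ancestor sets of every term annotating a gene product."""
    terms = set()
    for term in annotations:
        terms |= ancestors(term, parents)
    return terms


def overlap_similarity(ann_a, ann_b, parents):
    """Jaccard overlap of the two induced term sets (illustrative measure only)."""
    a = induced_terms(ann_a, parents)
    b = induced_terms(ann_b, parents)
    return len(a & b) / len(a | b)


# Hypothetical ontology snapshots, stored as child -> list of parents.
go_old = {"t1": ["root"], "t2": ["root"], "t3": ["t1"], "t4": ["t1"]}
# Later snapshot: a new term t5 is inserted between t1 and its children.
go_new = {"t1": ["root"], "t2": ["root"], "t5": ["t1"],
          "t3": ["t5"], "t4": ["t5"]}

# Hypothetical annotations of two gene products (unchanged across snapshots).
protein_a = {"t3"}
protein_b = {"t4"}

score_old = overlap_similarity(protein_a, protein_b, go_old)
score_new = overlap_similarity(protein_a, protein_b, go_new)
print(f"old snapshot: {score_old:.2f}, new snapshot: {score_new:.2f}")
```

In this toy case the annotations stay the same, yet the score rises from 0.50 to 0.60 purely because the ontology structure changed, which is the kind of drift the study examines.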





Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This article is part of the topical collection “Computational Biology and Biomedical Informatics” guest edited by Dhruba Kr Bhattacharyya, Sushmita Mitra and Jugal Kr Kalita.
About this article
Cite this article
Paul, M., Anand, A. & Pyne, S. Impact of the Continuous Evolution of Gene Ontology on the Performance of Similarity Measures for Scoring Confidence of Protein Interactions. SN COMPUT. SCI. 1, 351 (2020). https://doi.org/10.1007/s42979-020-00350-5
DOI: https://doi.org/10.1007/s42979-020-00350-5