Abstract
Semantic similarity measures (SSMs) quantify the similarity between terms of an ontology. Biological entities, e.g., gene products, are often annotated with terms drawn from existing ontologies, so a common application is to estimate the similarity or dissimilarity between two entities by applying SSMs to their annotations. More recently, researchers have introduced semantic similarity networks (SSNs), i.e., edge-weighted graphs in which the nodes are concepts (e.g., proteins) and the weight of each edge represents the semantic similarity between the pair of nodes it connects. Community detection algorithms applied to SSNs may reveal clusters of functionally associated concepts; for instance, on networks built from proteins they may find protein complexes. However, SSNs contain a large number of low-weight arcs, and classical community detection algorithms perform poorly on such raw networks. One way to improve their performance is to simplify the structure of SSNs through a preprocessing step that deletes the arcs regarded as noise. We therefore propose a novel preprocessing strategy for simplifying SSNs, implemented in an open-source tool, SSN-Analyzer. As a proof of concept, we show that community detection algorithms applied to filtered (thresholded) networks yield more biologically relevant results than the same algorithms applied to raw, unfiltered networks.
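The idea of filtering an SSN before community detection can be illustrated with a minimal sketch. It is not the SSN-Analyzer implementation: the abstract does not specify the thresholding rule, so the single global cutoff `tau`, the toy protein names and weights, and the use of connected components as a stand-in for a community detection algorithm are all assumptions made purely for illustration.

```python
from collections import defaultdict


def threshold_ssn(edges, tau):
    """Keep only edges with weight >= tau.

    `tau` is a hypothetical global cutoff chosen for this sketch; the
    actual SSN-Analyzer strategy is not described in the abstract.
    """
    return {pair: w for pair, w in edges.items() if w >= tau}


def connected_components(nodes, edges):
    """Group nodes into connected components (a crude stand-in for a
    real community detection algorithm)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Toy SSN: invented proteins and semantic similarity weights in [0, 1].
nodes = {"P1", "P2", "P3", "P4", "P5"}
ssn = {
    ("P1", "P2"): 0.90,
    ("P2", "P3"): 0.85,
    ("P1", "P3"): 0.80,
    ("P3", "P4"): 0.10,  # low-weight arc, treated as noise
    ("P4", "P5"): 0.75,
    ("P1", "P5"): 0.05,  # low-weight arc, treated as noise
}

raw_groups = connected_components(nodes, ssn)
filtered = threshold_ssn(ssn, tau=0.5)
filtered_groups = connected_components(nodes, filtered)

print([sorted(g) for g in raw_groups])       # [['P1', 'P2', 'P3', 'P4', 'P5']]
print([sorted(g) for g in filtered_groups])  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

On the raw toy network the two low-weight arcs merge every protein into one group, whereas after thresholding two groups emerge. This mirrors, in a very simplified way, why removing noisy arcs may help clustering.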
Acknowledgments
This work has been partially supported by the following research projects funded by MIUR: PRIN 2010–2011 2010NFEB9L_003; PON04a2_D “DICET-INMOTO-ORCHESTRA”; PON04a2_C Staywell SH 2.0.
Cite this article
Guzzi, P.H., Milano, M., Veltri, P. et al. Using SSN-Analyzer for analysis of semantic similarity networks. Netw Model Anal Health Inform Bioinforma 4, 6 (2015). https://doi.org/10.1007/s13721-015-0077-2
DOI: https://doi.org/10.1007/s13721-015-0077-2