Abstract
Community detection has become an important research direction for data mining in complex networks. It aims to identify topological structures and discover patterns in such networks. In this paper, we are interested in detecting communities in Protein-Protein or Gene-Gene Interaction (PPI) networks. These networks represent sets of proteins or genes that collaborate in the same cellular function. The goal is to identify such semantic and topological communities using gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based approach to detect communities of different sizes in PPI networks. For this purpose, we introduce three specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution representation for communities of dynamic size, and a specific mutation operator. In the computational tests carried out in this work, the proposed algorithm achieved excellent results in detecting existing or even new communities in PPI networks.
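The three components named in the abstract can be illustrated with a small sketch. It is not the GA-PPI-Net implementation: the toy gene names, the score tables, the weighting parameter `alpha`, the mean-pairwise fitness, and the add/remove/swap mutation are all assumptions chosen for illustration, and the loop uses mutation with elitist truncation selection only (no crossover).

```python
import random

# Hypothetical toy data (not from the paper): pairwise interaction values
# and semantic similarities, both assumed to lie in [0, 1].
GENES = ["A", "B", "C", "D", "E"]
INTERACTION = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.2,
}
SIMILARITY = {
    frozenset(("A", "B")): 0.8,
    frozenset(("A", "C")): 0.7,
    frozenset(("B", "C")): 0.9,
    frozenset(("C", "D")): 0.3,
}


def pair_score(g1, g2, alpha=0.5):
    """Assumed weighted mix of semantic similarity and interaction value."""
    key = frozenset((g1, g2))
    return alpha * SIMILARITY.get(key, 0.0) + (1 - alpha) * INTERACTION.get(key, 0.0)


def fitness(community, alpha=0.5):
    """Mean pairwise score over all gene pairs in the community."""
    members = sorted(community)
    if len(members) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return sum(pair_score(a, b, alpha) for a, b in pairs) / len(pairs)


def mutate(community, rng):
    """Variable-size mutation: add, remove, or swap one gene."""
    child = set(community)
    outside = [g for g in GENES if g not in child]
    op = rng.choice(["add", "remove", "swap"])
    if op == "add" and outside:
        child.add(rng.choice(outside))
    elif op == "remove" and len(child) > 2:
        child.remove(rng.choice(sorted(child)))
    elif op == "swap" and outside:
        child.remove(rng.choice(sorted(child)))
        child.add(rng.choice(outside))
    return frozenset(child)


def evolve(generations=50, pop_size=10, seed=0):
    """Mutation-only evolutionary loop with elitist truncation selection."""
    rng = random.Random(seed)
    # Each individual is one candidate community, stored as a set of genes,
    # so its size can change from generation to generation.
    population = [frozenset(rng.sample(GENES, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(ind, rng) for ind in population]
        combined = population + offspring
        combined.sort(key=fitness, reverse=True)
        population = combined[:pop_size]
    return population[0]


best = evolve()
print(sorted(best), round(fitness(best), 3))
```

Storing an individual as a set of genes is one simple way to let community size vary. Note that a mean-pairwise fitness like this one tends to favour small, dense groups, so a practical fitness would likely need an additional term or constraint on size.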
Notes
- 1. The degree of a node is the number of edges incident to the node.
- 2.
Acknowledgements
We would like to thank Dr. Walid BEDHIAFI (Laboratoire de Génétique Immunologie et Pathologies Humaines, Université de Tunis El Manar) for his assistance in understanding the biological domain and in interpreting the results.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ben M’barek, M., Borgi, A., Ben Hmida, S., Rukoz, M. (2020). GA-PPI-Net: A Genetic Algorithm for Community Detection in Protein-Protein Interaction Networks. In: van Sinderen, M., Maciaszek, L. (eds) Software Technologies. ICSOFT 2019. Communications in Computer and Information Science, vol 1250. Springer, Cham. https://doi.org/10.1007/978-3-030-52991-8_7
DOI: https://doi.org/10.1007/978-3-030-52991-8_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52990-1
Online ISBN: 978-3-030-52991-8
eBook Packages: Computer Science, Computer Science (R0)