Abstract
Detecting communities in large complex networks is important for understanding their structure and for extracting features useful for visualization or for predicting phenomena such as information diffusion or the dynamics of the network. A community is defined as a set of strongly interconnected nodes. An α-quasi-clique is a group of nodes in which each member is connected to more than a proportion α of the other nodes. By construction, an α-quasi-clique has a density greater than α. The size of an α-quasi-clique is limited by the degree of its nodes. In complex networks whose degree distribution follows a power law, α-quasi-cliques are usually small sets of nodes for high values of α. In this paper, we present an efficient method for finding the maximal α-quasi-clique of a given node in the network. The communities produced by our method therefore have two main characteristics: they are α-quasi-cliques (very dense for high α) and they are local to the given node. Detecting the local community of specific nodes is very important for applications dealing with huge networks, where iterating through all nodes would be impractical or where the network is not entirely known. The proposed method, called RANK-NUM-NEIGHS (RNN), is evaluated experimentally on real and computer-generated networks in terms of quality (community size), execution time and stability. We also provide an upper bound on the optimal solution.
Notes
The density of links \(\delta \) of a graph G with |E| edges and |V| nodes is given by \(\frac{2|E|}{|V|(|V|-1)}\).
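As an illustration of this formula, the following minimal sketch computes the density from the edge and node counts. The function name `link_density` and the choice to reject graphs with fewer than two nodes are assumptions made for this example; they are not taken from the paper.

```python
def link_density(num_edges: int, num_nodes: int) -> float:
    """Return 2|E| / (|V| (|V| - 1)) for a simple undirected graph."""
    # Assumption of this sketch: the density is treated as undefined
    # when there are fewer than two nodes (the denominator would be zero).
    if num_nodes < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# A triangle (3 nodes, 3 edges) is a complete clique.
triangle_density = link_density(3, 3)
# A path on 4 nodes has 3 of the 6 possible edges.
path_density = link_density(3, 4)
```

Under these assumptions, a complete clique reaches the maximum density of 1, and any graph with fewer edges has a lower value.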
A complete clique is a set of nodes such that every two distinct nodes are connected to each other.
This definition of an \(\alpha \)-quasi-clique is not unique. Most authors define an \(\alpha \)-quasi-clique as a set of nodes that has a density greater than \(\alpha \); see, for instance, Abello et al. (2002). The definition considered in this paper is a relative relaxation of a complete clique, as it depends on the size of the quasi-clique.
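The degree-based definition used in this paper can be checked directly on a candidate node set. The sketch below is only an illustration, not the RNN method: the adjacency-dictionary representation, the name `is_alpha_quasi_clique`, and the convention that a set of fewer than two nodes is trivially accepted are all assumptions of this example.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_alpha_quasi_clique(adj: Graph, nodes: Iterable[Hashable], alpha: float) -> bool:
    """Check that every member is linked to more than alpha * (|S| - 1) other members."""
    members = set(nodes)
    # Assumption of this sketch: sets with fewer than two nodes are accepted.
    if len(members) < 2:
        return True
    threshold = alpha * (len(members) - 1)
    for v in members:
        # Count neighbors of v that lie inside the candidate set (excluding v).
        internal_degree = len((adj.get(v, set()) & members) - {v})
        if internal_degree <= threshold:
            return False
    return True


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

triangle_ok = is_alpha_quasi_clique(toy, ["a", "b", "c"], 0.5)
with_pendant_ok = is_alpha_quasi_clique(toy, ["a", "b", "c", "d"], 0.5)
```

In this toy graph the triangle satisfies the condition for \(\alpha = 0.5\), while adding the pendant node d breaks it, since d is linked to only one of the three other members. The check is stricter than the density-based definition because it constrains every node rather than the set as a whole.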
Notice that we use the word maximal instead of maximum. In graph theory, a maximal clique is a clique that is not a proper subset of another clique, whereas a maximum clique is a clique of maximum cardinality in the graph. Since we aim to find \(\alpha \)-quasi-cliques containing a given node of interest, we are looking for maximal \(\alpha \)-quasi-cliques rather than the maximum \(\alpha \)-quasi-clique.
If \(\alpha <0.5\), Theorem 1 no longer holds; the input of the algorithm is then not limited to the second neighborhood and may extend to the whole graph, especially for small \(\alpha \).
The average degree is 20, the maximum degree is 50, the exponent of the degree distribution is −2 and that of the community size distribution is −1. We chose three values of the mixing parameter \(\lambda \): 0.10, 0.20 and 0.30. The results presented in this paper for size, density and stability are those for \(\lambda =0.10\). The results behave in nearly the same way for the other values of the mixing parameter.
The results obtained for other network sizes have nearly the same behavior.
References
Abello J, Resende MGC, Sudarsky S (2002) Massive quasi-clique detection. In: Proceedings of the 5th Latin American symposium on theoretical informatics, LATIN ’02. Springer, London, pp 598–612
Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election. In: Proceedings of the WWW-2005 workshop on the weblogging ecosystem. ACM, New York, pp 36–43
Akoglu L, Mcglohon M, Faloutsos C (2009) Anomaly detection in large graphs. Technical report CMU-CS-09-173
Asahiro Y, Hassin R, Iwama K (2002) Complexity of finding dense subgraphs. Discrete Appl Math 121(1–3):15–26. https://doi.org/10.1016/S0166-218X(01)00243-8
Bagrow JP (2008) Evaluating local community methods in networks. J Stat Mech 2008:05001
Bahmani B, Kumar R, Vassilvitskii S (2012) Densest subgraph in streaming and mapreduce. CoRR abs/1201.6567. http://arxiv.org/abs/1201.6567
Battiti R, Mascia F (2007) Reactive local search for maximum clique: a new implementation. Technical report DIT-07-018, Informatica e Telecomunicazioni, University of Trento, Trento, Italy
Battiti R, Protasi M (2001) Reactive local search for the maximum clique problem. Algorithmica 29(4):610
Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6(3):281–297
Blondel VD, Guillaume J, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008:P10008
Bomze IM, Budinich M, Pardalos PM, Pelillo M (1999) The maximum clique problem. In: Du D-Z, Pardalos PM (eds) Handbook of combinatorial optimization. Kluwer Academic Publishers, Dordrecht, pp 1–74
Brunato M, Hoos HH, Battiti R (2007) On effectively finding maximal quasi-cliques in graphs. In: Maniezzo V, Battiti R, Watson JP (eds) LION, vol 5313. Lecture Notes in Computer Science. Springer, Berlin, pp 41–55
Campigotto R, Conde-Céspedes P, Guillaume J (2014) A generalized and adaptive method for community detection. CoRR abs/1406.2518. http://arxiv.org/abs/1406.2518
Chen J, Saad Y (2012) Dense subgraph extraction with application to community detection. IEEE Trans Know Data Eng 24(7):1216–1230
Chen J, Zaiane OR, Goebel R (2009) Local communities identification in social networks. In: ASONAM, pp 237–242
Clauset A (2005) Finding local community structure in networks. Phys Rev 72:026132
Conde-Céspedes P, Marcotorchino J, Viennet E (2015) Comparison of linear modularization criteria using the relational formalism, an approach to easily identify resolution limit. Revue des Nouvelles Technologies de l’Information Extraction et Gestion des Connaissances, RNTI-E-28, pp 203–214
Conde-Céspedes P, Marcotorchino JF, Viennet E (2017) Comparison of linear modularization criteria using the relational formalism, an approach to easily identify resolution limit. In: Guillet F, Pinaud B, Venturini G (eds) Advances in knowledge discovery and management (AKDM-6). Springer, Cham, pp 101–120
Conde-Céspedes P, Ngonmang B, Viennet E (2015) Approximation of the maximal \(\alpha \)-consensus local community detection problem in complex networks. In: IEEE SITIS 2015, complex networks and their applications. Bangkok, Thailand
Condorcet CAMd (1785) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. J Math Sociol 1(1): 113–120
Cui W, Xiao Y, Wang H, Wang W (2014) Local search of communities in large graphs. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, SIGMOD ’14. ACM, New York, pp 991–1002
Dang TA, Viennet E (2012) Community detection based on structural and attribute similarities. In: International conference on digital society (ICDS), pp 7–14
Dang TA, Viennet E (2013) Collaborative filtering in social networks: a community-based approach. In: IEEE ComManTel 2013, international conference on computing, management and telecommunications
Fortunato S (2010) Community detection in graphs. Phys Rep 486:75–174
Fortunato S, Barthelemy M (2006) Resolution limit in community detection. In: Proceedings of the National Academy of Sciences of the United States of America
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci U. S. A. 99(12):7821–7826
Harary F, Ross IC (1957) A procedure for clique detection using the group matrix. Sociometry 20:205–215
Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW (eds) Complexity of computer computations, the IBM research symposia series. Plenum Press, New York, pp 85–103
Komusiewicz C (2016) Multivariate algorithmics for finding cohesive subnetworks. Algorithms 9(1):21
Krebs V (2004) Books about US politics http://www.orgnet.com/
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110
Lee VE, Ruan N, Jin R, Aggarwal CC (2010) A survey of algorithms for dense subgraph discovery. In: Aggarwal CC, Wang H (eds) Managing and mining graph data, advances in database systems, vol 40. Springer, Berlin, pp 303–336
Liang R, Hua J, Wang X (2012) VCD: a network visualization tool based on community detection. In: 2012 12th international conference on control, automation and systems (ICCAS), pp 1221–1226
Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, vol 5212. Lecture notes in computer science. Springer, Berlin, pp 33–49
Luo F, Wang JZ, Promislow E (2006) Exploring local community structure in large networks. In: WI’06, pp 233–239
Marcotorchino F, Michaud P (1979) Optimisation en analyse ordinale des données. Masson, Paris
Matsuda H, Ishihara T, Hashimoto A (1999) Classifying molecular sequences using a linkage graph with their pairwise similarities. Theor Comput Sci 210(2):305–325
Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
Ngonmang B, Tchuente M, Viennet E (2012) Local communities identification in social networks. Parallel Process Lett. https://doi.org/10.1142/S012962641240004X
Ngonmang B, Viennet E, Tchuente M (2012) Churn prediction in a real online social network using local community analysis. In: International conference on advances in social networks analysis and mining, ASONAM 2012, Istanbul, Turkey, 26–29 August 2012, pp 282–288
Owsiński J, Zadrożny S (1986) Clustering for ordinal data: a linear programming formulation. Control Cybern 15(2):183–193
Pattillo J, Veremyev A, Butenko S, Boginski V (2013) On the maximum quasi-clique problem. Discret Appl Math 161:244–257
Pattillo J, Youssef N, Butenko S (2013) On clique relaxation models in network analysis. Eur J Oper Res 226(1):9–18
Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD ’05. ACM, New York, pp 228–238
Pullan WJ, Hoos HH (2006) Dynamic local search for the maximum clique problem. J Artif Intell Res (JAIR) 25:159–185
Tanay A, Sharan R, Shamir R (2002) Discovering statistically significant biclusters in gene expression data. In: Proceedings of ISMB 2002, pp 136–144
Tsourakakis C, Bonchi F, Gionis A, Gullo F, Tsiarli M (2013) Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’13. ACM, New York, pp 104–112
Wu Q, Hao JK (2015) A review on algorithms for maximum clique problems. Eur J Oper Res 242(3):693–709
Yang J, Leskovec J (2014) Overlapping communities explain core-periphery organization of networks. Technical report, Stanford University. http://ilpubs.stanford.edu:8090/1103/
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zahn C (1964) Approximating symmetric relations by equivalence relations. SIAM J Appl Math 12:840–847
Zhang Y, Lin H, Yang Z, Wang J (2016) Construction of dynamic probabilistic protein interaction networks for protein complex identification. BMC Bioinform. https://doi.org/10.1186/s12859-016-1054-1
Acknowledgements
This work is supported by the REQUEST project.
Cite this article
Conde-Cespedes, P., Ngonmang, B. & Viennet, E. An efficient method for mining the maximal α-quasi-clique-community of a given node in complex networks. Soc. Netw. Anal. Min. 8, 20 (2018). https://doi.org/10.1007/s13278-018-0497-y
DOI: https://doi.org/10.1007/s13278-018-0497-y