Abstract
In the field of NLP, word embeddings have recently attracted a lot of attention. A textual corpus is represented as a sparse words co-occurrences matrix. Then, the matrix can be factorized, for example using SVD, which allows to obtain a shorter matrix with dense and continuous vectors. To help SVD, PMI measure is applied on the initial co-occurrence matrix, assigning a relevant weight to the co-occurrences by normalizing them using both the considered words frequencies. In this paper, we follow this idea to study if weighted networks can benefit from pre-processing that can help community detection. We first design a benchmark using LFR networks. Then, we consider PMI and another NLP inspired measure as a preprocessing of the links weights, and show that PMI worsens the results while the other one improves them. By distinguishing links inside communities and links between communities into two classes, we show that this is due to the weights distributions of these links. Links between communities are in average bigger, leading to bigger values of PMI. From this analysis, we design another set of experiments that show that it is possible to classify efficiently links into these two classes, using a small set of features. Finally, we introduce the Supervised Label Propagation (SLP) algorithm that takes into account the classification results during the propagation. This algorithm clearly improves the results, leading us to a major questioning: is community detection on weighted networks a fully unsupervised task? We conclude with our thoughts on this topic.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Experiments and graphs can be found at https://github.com/nicolasdugue/WeightedCommunityDetection/
References
Barthélemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A.: Characterization and modeling of weighted networks. Phys. A Stat. Mech. Appl. 346(1), 34–43 (2005)
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10,008 (2008)
Bruna, J., Li, X.: Community detection with graph neural networks. arXiv:1705.08415 (2017)
Chen, T., Guestrin, C.: Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)
De Meo, P., Ferrara, E., Fiumara, G., Ricciardello, A.: A novel measure of edge centrality in social networks. Knowl.-Based Syst. 30, 136–150 (2012)
Dugué, N., Labatut, V., Perez, A.: A community role approach to assess social capitalists visibility in the twitter network. Soc. Netw. Anal. Min. 5(1), 26 (2015)
Hubert, L., Arabie, P.: Comparing partitions. J. Classification 2(1), 193–218 (1985)
Lancichinetti, A., Fortunato, S.: Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E 80(1), 016,118 (2009)
Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Computat. Linguistics 3, 211–225 (2015)
Lu, X., Kuzmin, K., Chen, M., Szymanski, B.K.: Adaptive modularity maximization via edge weighting scheme. Informat. Sci. 424, 55–68 (2018)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Advanc. Neural Informat. Process. Syst. 3111–3119 (2013)
Newman, M.E.: Analysis of weighted networks. Phys. Rev. E 70(5), 056,131 (2004)
Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036,106 (2007)
Sarkar, S., Dong, A.: Community detection in graphs using singular value decomposition. Phys. Rev. E 83, 046,114 (2011)
Strehl, A., Ghosh, J.: Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3(Dec), 583–617 (2002)
Van Laarhoven, T., Marchiori, E.: Network community detection with edge classifiers trained on LFR graphs. In: ESANN (2013)
Wang, J., Leng, M.: A new active learning semi-supervised community detection algorithm in complex networks. In: Proceedings of Recent Developments in Mechatronics and Intelligent Robotics (2019)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Connes, V., Dugué, N., Guille, A. (2019). Is Community Detection Fully Unsupervised? The Case of Weighted Graphs. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 812. Springer, Cham. https://doi.org/10.1007/978-3-030-05411-3_21
Download citation
DOI: https://doi.org/10.1007/978-3-030-05411-3_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05410-6
Online ISBN: 978-3-030-05411-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)