EADP: An extended adaptive density peaks clustering for overlapping community detection in social networks
Introduction
It is very common for people in social networks to join several communities. Community detection, which aims to uncover the community structure hidden in a network, has become a popular research direction in social network mining. Numerous algorithms have been proposed to find disjoint community structure, such as DBSCAN [1] and spectral clustering [2]. However, people in social networks are often characterized by multiple community memberships. For instance, a boy could be a member of a music club and a basketball club simultaneously because he is interested in both music and basketball. For this reason, research on overlapping community detection has drawn increasing attention. The clique percolation method (CPM) [3] is one of the representative algorithms. To the best of our knowledge, many existing overlapping community detection methods do not consider linking weights, and methods that do take weights into account seldom perform well on networks whose weights vary dramatically.
Density peaks clustering (DPC) [4], published in Science, introduces a novel approach to finding community structures. DPC is based on the assumption that cluster centers are surrounded by lower-density points and that the distance between cluster centers is relatively large. DPC can detect communities of arbitrary shape efficiently and accurately. However, communities in social networks often overlap, whereas DPC detects only non-overlapping communities. Moreover, DPC requires a distance matrix as its input, but, as pointed out in [5], for most social networks the input is the adjacency matrix. Therefore, the distance matrix has to be constructed from the linking information in the adjacency matrix. What is more, DPC requires choosing cluster centers by hand, which is not practical for large networks.
Since DPC was proposed, many scholars have made various improvements to it. However, just like DPC, most of these improvements require the distances between nodes as input. Thus, we argue that those methods cannot be applied to social networks directly.
In this paper, we propose an extended adaptive density peaks clustering for overlapping community detection, called EADP. EADP incorporates a novel distance function based on common nodes to handle social networks directly. Furthermore, the distance function is designed to take linking weights into consideration so that EADP can be used on weighted networks. Our main contributions can be summarized as follows:
- Extend DPC to detect overlapping communities. To do this, we adopt a two-step allocation strategy. In the first step, we assign nodes with the same strategy as DPC. Then, we calculate the community memberships of each non-center node and add it to the clusters in which its membership is higher than in its initial community.
- Propose a distance function based on common nodes that takes linking weights into account. With this distance function, EADP can be applied directly to social networks. Because linking weights are considered when calculating the distance between nodes, EADP is suitable for both weighted and unweighted social networks.
- Use a linear-fitting-based strategy to select cluster centers adaptively. We choose cluster centers by performing linear fitting on the difference vector of γ, which is the product of the local density ρ and the separation distance δ.
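To make the second allocation step more concrete, the following Python sketch adds non-center nodes to extra communities after an initial DPC-style assignment. The membership measure used here (the share of a node's link weight that points into a community) is an assumption chosen for illustration; this excerpt does not give the membership definition used by EADP, and the helper names `build_weights`, `membership` and `add_overlaps` are hypothetical.

```python
from collections import defaultdict


def build_weights(edges):
    """Turn (u, v, weight) triples into a symmetric neighbour -> weight map."""
    weights = defaultdict(dict)
    for u, v, w in edges:
        weights[u][v] = w
        weights[v][u] = w
    return weights


def membership(weights, labels, node, community):
    """ASSUMED measure: share of the node's link weight pointing into `community`."""
    total = sum(weights[node].values())
    inside = sum(w for nbr, w in weights[node].items() if labels[nbr] == community)
    return inside / total if total else 0.0


def add_overlaps(weights, labels, centers):
    """Second allocation step: start from the initial (disjoint) labels and add
    each non-center node to every community where its membership is strictly
    higher than in its initial community."""
    communities = defaultdict(set)
    for node, home in labels.items():
        communities[home].add(node)
    community_ids = list(communities)
    for node, home in labels.items():
        if node in centers:
            continue  # centers keep a single community
        base = membership(weights, labels, node, home)
        for c in community_ids:
            if c != home and membership(weights, labels, node, c) > base:
                communities[c].add(node)
    return dict(communities)


# Toy network: two groups bridged by node 3, which step one placed in "A".
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2),
         (3, 5, 2), (4, 5, 3), (4, 6, 1), (5, 6, 1)]
labels = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
result = add_overlaps(build_weights(edges), labels, centers={1, 5})
print(result)
```

In this toy network, node 3 sends four fifths of its link weight into community B and only one fifth into A, so it ends up in both communities while every other node stays where it was.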
The rest of this paper is organized as follows. In Section 2, we discuss the related work. In Section 3, we introduce the preliminaries of DPC. Section 4 presents the proposed algorithm EADP. Section 5 introduces the experimental setup. Section 6 presents and analyses the experimental results on both real and synthetic datasets. Finally, we conclude this paper and outline our future work in Section 7.
Section snippets
Related work
Our work focuses on overlapping community detection based on density peaks, so we will discuss traditional overlapping community detection methods and density peaks based methods, respectively.
Preliminary
Our algorithm EADP is based on the density peaks method proposed in [50]; we therefore give a brief introduction to its main idea here.
Density peaks clustering (DPC) is based on the assumption that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from nodes with higher density. It relies on two quantities: the local density ρ and the separation distance δ.
There are several ways to calculate the local density ρ. One way is to use a cutoff distance, as shown in Eq. (1).
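The equation itself is not reproduced in this excerpt. In the original DPC formulation, the cutoff-based density of a point counts the other points that lie closer than a cutoff distance d_c, and the separation distance δ is the distance to the nearest point of higher density (for the densest point, the largest distance to any other point). The Python sketch below computes ρ, δ and their product γ from a distance matrix under those definitions; the function name `dpc_quantities` and the toy data are illustrative only.

```python
import numpy as np


def dpc_quantities(dist, d_c):
    """Return (rho, delta, gamma) for a symmetric distance matrix `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # Cutoff kernel: number of OTHER points closer than d_c.
    within = dist < d_c
    np.fill_diagonal(within, False)
    rho = within.sum(axis=1)

    # Separation distance: nearest point with strictly higher density.
    # Points with no denser point (ties at the top included) get their
    # largest distance to any other point.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    gamma = rho * delta
    return rho, delta, gamma


# Five points on a line: a tight group around 1 and a pair near 10.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
dist = np.abs(x[:, None] - x[None, :])
rho, delta, gamma = dpc_quantities(dist, d_c=1.5)
print(rho, delta, gamma)
```

On this toy data the point at 1.0 has both the highest density and the largest separation distance, so it also has the largest γ and would be the first candidate for a cluster center.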
EADP: an extended adaptive density peaks clustering for overlapping community detection in social networks
To extend DPC so that it can find overlapping communities in both weighted and unweighted social networks, we do the following: (1) define a distance function to measure the distance between nodes in social networks; (2) introduce linear fitting to choose cluster centers adaptively; (3) change the allocation strategy adopted by DPC so that communities can overlap.
Next, we discuss these three components, which form the core of our method EADP, one by one.
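As a rough illustration of the first component, the sketch below derives a distance matrix from a weighted adjacency matrix. It does not reproduce the EADP distance function, which is not given in this excerpt. Instead it uses a stand-in, the weighted Jaccard (Ruzicka) distance between the neighbor weight vectors of two nodes, so that nodes sharing many strongly linked common neighbors end up close together. The function name `common_neighbor_distance` and the choice of giving each node a self-link equal to its strongest edge are assumptions made for this example.

```python
import numpy as np


def common_neighbor_distance(adj):
    """Weighted Jaccard distance between the neighbour weight vectors of nodes.

    `adj` is a symmetric, non-negative weighted adjacency matrix
    (use 0/1 entries for an unweighted network).
    """
    a = np.asarray(adj, dtype=float).copy()
    n = a.shape[0]

    # ASSUMPTION: treat each node as its own neighbour, with a self-link as
    # strong as its strongest edge, so directly linked nodes share something.
    np.fill_diagonal(a, a.max(axis=1))

    dist = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1, n):
            shared = np.minimum(a[u], a[v]).sum()  # weight on common nodes
            total = np.maximum(a[u], a[v]).sum()   # weight on all neighbours
            d = 1.0 - shared / total if total > 0 else 1.0
            dist[u, v] = dist[v, u] = d
    return dist


# Triangle 0-1-2 with a pendant node 3 attached to node 2 (unweighted).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
print(common_neighbor_distance(adj))
```

Because the result is an ordinary distance matrix, it can be passed to any DPC-style procedure that expects one. Note that structurally equivalent nodes, such as nodes 0 and 1 above, receive distance 0 under this stand-in measure.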
Experiment setup
In this section, we briefly introduce the baseline algorithms, the community validation metrics and the datasets.
Experiments and analysis
In this section, we first present and analyse the results of our effectiveness evaluation on real-world and synthetic networks. Then, we compare running times.
Conclusion and future work
In this paper, we extend DPC to overlapping community detection in social networks and propose our algorithm EADP. EADP adopts a two-step strategy to allocate non-center nodes so that communities can overlap. To handle social networks directly without extra work to construct the distance matrix, EADP incorporates a distance function based on common nodes. Moreover, unlike DPC, which requires choosing cluster centers by hand, EADP utilizes a linear fitting based strategy to select cluster
Acknowledgments
This work is supported by the National Key Research and Development Program of China under grants 2016QY01W0202 and 2016YFB0800402, National Natural Science Foundation of China under grants 61572221, U1836204, 61672254, 61433006, 61772219, U1401258 and 61502185, Major Projects of the National Social Science Foundation under grant 16ZDA092 and Guangxi High level innovation Team in Higher Education Institutions Innovation Team of ASEAN Digital Cloud Big Data Security and Mining Technology. We
References (57)
- An overlapping community detection algorithm based on density peaks. Neurocomputing (2017)
- A spreading activation-based label propagation algorithm for overlapping community detection in dynamic social networks. Data Knowl. Eng. (2018)
- An overlapping network community partition algorithm based on semi-supervised matrix factorization and random walk. Expert Syst. Appl. (2018)
- Local spectral clustering for overlapping community detection. ACM Trans. Knowl. Disc. Data (2018)
- A multi-objective genetic algorithm for overlapping community detection based on edge encoding. Inf. Sci. (2018)
- A link clustering based overlapping community detection algorithm. Data Knowl. Eng. (2013)
- Community detection in graphs. Phys. Rep. (2010)
- Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl.-Based Syst. (2016)
- Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf. Sci. (2016)
- A new unsupervised approach for fuzzy clustering. Fuzzy Sets Syst. (2007)
- Delta-density based clustering with a divide-and-conquer strategy: 3DC clustering. Pattern Recognit. Lett.
- Adaptive density peak clustering based on k-nearest neighbors with aggregating strategy. Knowl.-Based Syst.
- Comparative density peaks clustering. Expert Syst. Appl.
- DenPEHC: density peak based efficient hierarchical clustering. Inf. Sci.
- Multi-topic tracking model for dynamic social network. Phys. A: Stat. Mech. Appl.
- A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the KDD
- Spectral ensemble clustering. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Uncovering the overlapping community structure of complex networks in nature and society. Nature
- Clustering by fast search and find of density peaks. Science
- Finding overlapping communities in networks by label propagation. New J. Phys.
- Towards linear time overlapping community detection in social networks. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining
- Overlapping community detection based on discrete biogeography optimization. Appl. Intell.
- Review on community detection algorithms in social networks. Proceedings of the 2015 IEEE International Conference on Progress in Informatics and Computing (PIC)
- Detecting the overlapping and hierarchical community structure in complex networks. New J. Phys.
- Detecting highly overlapping communities with model-based overlapping seed expansion. Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
- Detecting highly overlapping community structure by greedy clique expansion. Proceedings of the 4th Workshop on Social Network Mining and Analysis
- A Markov random walk under constraint for discovering overlapping communities in complex networks. J. Stat. Mech.: Theory Exp.
- Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E
Mingli Xu received the B.S. degree in software engineering from Chongqing University. She is currently pursuing the M.S. degree at Huazhong University of Science and Technology, Wuhan, China. Her current research interests include social networks and machine learning.
Yuhua Li received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2006. She is currently an associate professor in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. She has published more than 40 journal and conference papers. She is a senior member of the China Computer Federation (CCF). Her research interests include data mining, social networks, machine learning, and big data.
Ruixuan Li received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2004. He is currently a professor in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. He is a senior member of the China Computer Federation (CCF) and a member of IEEE and ACM. He has published more than 70 journal and conference papers. His research interests include social networks, big data management, distributed computing, and big data security.
Fuhao Zou received the B.E. degree in computer science from Huazhong Normal University, Wuhan, Hubei, China, in 1998, and the M.S. and Ph.D. degrees in computer science and technology from Huazhong University of Science and Technology (HUST), Wuhan, Hubei, China, in 2003 and 2006, respectively. He is currently an associate professor with the School of Computer Science and Technology, HUST. His research interests include deep learning, multimedia understanding and analysis, and big data analysis. He is a senior member of the China Computer Federation (CCF) and a member of IEEE and ACM.
Xiwu Gu received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2007. He is currently an associate research fellow in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. He has published more than 30 journal and conference papers. His research interests include distributed computing, data mining, social computing, and big data.