Elsevier

Neurocomputing

Volume 337, 14 April 2019, Pages 287-302
EADP: An extended adaptive density peaks clustering for overlapping community detection in social networks

https://doi.org/10.1016/j.neucom.2019.01.074

Highlights

  • Extend DPC to be able to detect overlapping communities.

  • Propose a common-node-based distance function that takes linking weights into account.

  • Utilize a linear fitting based strategy to select cluster centers adaptively.

Abstract

Overlapping community detection plays an important role in studying social networks. Existing overlapping community detection methods seldom perform well on networks with complex weight distributions. Density peaks clustering (DPC) can find communities of arbitrary shape efficiently and accurately. However, DPC cannot be applied directly to overlapping community detection. In this paper, we propose an extended adaptive density peaks clustering method for overlapping community detection, called EADP. To handle both weighted and unweighted social networks, EADP takes weights into consideration and incorporates a novel distance function based on common nodes to measure the distance between nodes. Moreover, unlike DPC, which requires cluster centers to be chosen by hand, EADP adopts a linear fitting based strategy to choose cluster centers adaptively. Experiments on real-world social networks and synthetic networks show that EADP is an effective overlapping community detection algorithm. Compared with state-of-the-art methods, EADP performs better on networks with complex weight distributions.

Introduction

People in social networks commonly join several communities. Community detection, which aims to uncover the community structure hidden in a network, has become a very popular research direction in social network mining. Numerous algorithms have been proposed to find disjoint community structures, such as DBSCAN [1] and spectral clustering [2]. However, people in social networks are often characterized by multiple community memberships. For instance, a boy could be a member of a music club and a basketball club simultaneously because he is interested in both music and basketball. For this reason, research on overlapping community detection has drawn more and more attention. The clique percolation method (CPM) [3] is one of the representative algorithms. To the best of our knowledge, many of the existing overlapping community detection methods do not consider linking weights, and methods that do take weights into account seldom perform well on networks whose weights vary dramatically.

Density peaks clustering (DPC) [4], published in Science, introduces a novel approach to finding community structures. DPC is based on the assumption that cluster centers are surrounded by lower-density points and that the distance between cluster centers is relatively large. DPC can detect communities of arbitrary shape efficiently and accurately. However, communities in social networks often overlap, whereas DPC detects only non-overlapping communities. Moreover, DPC requires a distance matrix as its input, yet, as pointed out in [5], the input for most social networks is the adjacency matrix. Therefore, the distance matrix has to be constructed from the linking information in the adjacency matrix. Furthermore, DPC requires cluster centers to be chosen by hand, which is impractical for large networks.

Since DPC was proposed, many scholars have made various improvements to it. However, just like DPC, most of these improvements require the distances between nodes as input. Thus, we claim that those methods cannot be used on social networks directly.

In this paper, we propose an extended adaptive density peaks clustering method for overlapping community detection, called EADP. EADP incorporates a novel distance function based on common nodes to handle social networks directly. Furthermore, the distance function is designed to take linking weights into consideration so that EADP can be used on weighted networks. Our main contributions can be summarized as follows:

  • Extend DPC to detect overlapping communities. To do this, we adopt a two-step allocation strategy. In the first step, we assign nodes with the same strategy as DPC. In the second step, we calculate the community memberships of each non-center node and add it to every cluster in which its membership is higher than in its initial community.

  • Propose a common-node-based distance function that takes linking weights into account. With this distance function, EADP can be applied to social networks directly. Because linking weights are considered when the distance between nodes is calculated, EADP is suitable for both weighted and unweighted social networks.

  • Utilize a linear fitting based strategy to select cluster centers adaptively. We choose cluster centers by performing linear fitting on the difference vector of γ, where γ is the product of the local density ρ and the separation distance δ.
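As a rough illustration of the third contribution, the sketch below computes γ = ρ·δ, sorts it in descending order, forms the difference vector, and fits a line to the tail of that vector. The function name `select_centers`, the parameters `tail_fraction` and `n_sigma`, and the rule of keeping every node ranked before the last difference that rises clearly above the fitted line are assumptions made for this example; the text above does not say how the fitted line is turned into a decision, so this should not be read as the exact EADP procedure.

```python
import numpy as np

def select_centers(rho, delta, tail_fraction=0.5, n_sigma=3.0):
    """Pick cluster centers from gamma = rho * delta (illustrative rule only)."""
    gamma = np.asarray(rho, dtype=float) * np.asarray(delta, dtype=float)
    if gamma.size < 3:
        raise ValueError("need at least three nodes")
    # Rank nodes by decreasing gamma and take differences of neighbouring ranks.
    order = np.argsort(-gamma, kind="stable")
    diffs = -np.diff(gamma[order])          # non-negative difference vector
    # Assumption: the tail of the difference vector belongs to non-center nodes,
    # so a line fitted there describes "ordinary" gaps between gamma values.
    start = min(int(len(diffs) * (1.0 - tail_fraction)), len(diffs) - 2)
    x = np.arange(len(diffs))
    slope, intercept = np.polyfit(x[start:], diffs[start:], 1)
    fitted = slope * x + intercept
    spread = np.std(diffs[start:] - fitted[start:])
    # Assumption: a gap well above the fitted line separates centers from the rest.
    jumps = np.nonzero(diffs > fitted + n_sigma * spread + 1e-12)[0]
    k = int(jumps.max()) + 1 if jumps.size else 1
    return order[:k]

# Toy input: two nodes with clearly larger gamma than the others.
rho = np.array([1, 2, 1, 1, 2, 1])
delta = np.array([1.0, 11.0, 1.0, 1.0, 10.0, 1.0])
centers = select_centers(rho, delta)
```

In this toy input the two nodes with large γ are separated from the remaining nodes by a single large gap, which is the situation the adaptive rule is meant to detect without manual inspection of a decision graph.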

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the preliminaries of DPC. Section 4 presents the proposed algorithm, EADP. Section 5 describes the experimental setup. Section 6 presents and analyses the experimental results on both real and synthetic datasets. Finally, Section 7 concludes the paper and outlines our future work.

Section snippets

Related work

Our work focuses on overlapping community detection based on density peaks, so we will discuss traditional overlapping community detection methods and density peaks based methods, respectively.

Preliminary

Our algorithm EADP is based on the density peaks approach proposed in [50], so we give a brief introduction to its main idea here.

Density peaks clustering (DPC) is based on the assumption that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from nodes with higher density. It relies on two quantities: the local density ρ and the separation distance δ.

There are different ways to calculate the local density ρ. One way is to use a cutoff distance, as shown in Eq. (1), ρ
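For reference, the two quantities can be computed from a distance matrix as in the following sketch, which follows the cutoff-kernel formulation of the original DPC paper: ρ of a point is the number of other points closer than a cutoff distance, and δ is the distance to the nearest point of higher density (for the densest point, the largest distance to any other point). The function name `dpc_quantities`, the argument names, and the tie-breaking by rank order are choices made for this example.

```python
import numpy as np

def dpc_quantities(dist, d_c):
    """Return (rho, delta, nearest_higher) for a symmetric distance matrix.

    Assumes a zero diagonal and a positive cutoff distance d_c.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Cutoff kernel: count points closer than d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    # Visit points by decreasing density; ties are broken by index order,
    # so an earlier-ranked point counts as "higher density".
    order = np.argsort(-rho, kind="stable")
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher

# Toy input: six points on a line forming two well-separated groups.
xs = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(xs[:, None] - xs[None, :])
rho, delta, nearest_higher = dpc_quantities(dist, d_c=1.5)
```

The `nearest_higher` array is what plain DPC uses for assignment: once centers are fixed, every other point joins the cluster of its nearest higher-density neighbor.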

EADP: an extended adaptive density peaks clustering for overlapping community detection in social networks

To extend DPC so that it can find overlapping communities in both weighted and unweighted social networks, our work consists of three parts: (1) define a distance function to measure the distance between nodes in social networks; (2) introduce linear fitting to choose cluster centers adaptively; (3) change the allocation strategy adopted by DPC so that communities can overlap.

Next, we discuss these three parts one by one; together they form the core of our method EADP.
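To make part (1) concrete, the sketch below derives a distance matrix from a weighted adjacency matrix using shared neighbors. It does not reproduce the EADP distance function, which is not given in this excerpt. As a stand-in it uses the weighted Jaccard (Ruzicka) similarity between the closed neighborhoods of two nodes and takes one minus that similarity as the distance. The function name `common_neighbor_distance` and the `self_weight` parameter are assumptions made for this example.

```python
import numpy as np

def common_neighbor_distance(adj, self_weight=1.0):
    """Distance from shared weighted neighbors (stand-in, not the EADP formula).

    adj is a symmetric, non-negative weighted adjacency matrix; unweighted
    graphs use 0/1 entries. self_weight must be positive.
    """
    w = np.asarray(adj, dtype=float).copy()
    # Closed neighborhoods: each node counts as its own neighbor, so two
    # adjacent nodes always share at least something.
    np.fill_diagonal(w, self_weight)
    # Weighted Jaccard: sum of element-wise minima over sum of maxima.
    # Builds an n x n x n array, so this form only suits small graphs.
    mins = np.minimum(w[:, None, :], w[None, :, :]).sum(axis=2)
    maxs = np.maximum(w[:, None, :], w[None, :, :]).sum(axis=2)
    dist = 1.0 - mins / maxs
    np.fill_diagonal(dist, 0.0)
    return dist

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
d = common_neighbor_distance(adj)
```

A matrix built this way can be passed to any DPC-style procedure that expects pairwise distances, which is the gap between adjacency input and distance input described in the introduction.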

Experiment setup

In this section, we briefly introduce the baseline algorithms, the community validation metrics and the datasets.

Experiments and analysis

In this section, we first present and analyse the results of the effectiveness evaluation on real-world networks and synthetic networks. Then, we compare running times.

Conclusion and future work

In this paper, we extend DPC to overlapping community detection in social networks and propose our algorithm EADP. EADP adopts a two-step strategy to allocate non-center nodes so that communities can overlap. To handle social networks directly, without extra work to construct the distance matrix, EADP incorporates a common-node-based distance function. Moreover, unlike DPC, which requires cluster centers to be chosen by hand, EADP utilizes a linear fitting based strategy to select cluster

Acknowledgments

This work is supported by the National Key Research and Development Program of China under grants 2016QY01W0202 and 2016YFB0800402, National Natural Science Foundation of China under grants 61572221, U1836204, 61672254, 61433006, 61772219, U1401258 and 61502185, Major Projects of the National Social Science Foundation under grant 16ZDA092 and Guangxi High level innovation Team in Higher Education Institutions Innovation Team of ASEAN Digital Cloud Big Data Security and Mining Technology. We

Mingli Xu received the B.S. degree from Software Engineering, Chongqing University. She is currently pursuing the M.S. degree at Huazhong University of Science and Technology, Wuhan, China. Her current research interests include social network and machine learning.

References (57)

  • Z. Liang et al.

    Delta-density based clustering with a divide-and-conquer strategy: 3DC clustering

    Pattern Recognit. Lett.

    (2016)
  • L. Yaohui et al.

    Adaptive density peak clustering based on k-nearest neighbors with aggregating strategy

    Knowl.-Based Syst.

    (2017)
  • Z. Li et al.

    Comparative density peaks clustering

    Expert Syst. Appl.

    (2018)
  • J. Xu et al.

    DenPEHC: density peak based efficient hierarchical clustering

    Inf. Sci.

    (2016)
  • Y. Li et al.

    Multi-topic tracking model for dynamic social network

    Phys. A: Stat. Mech. Appl.

    (2016)
  • M. Ester et al.

    A density-based algorithm for discovering clusters in large spatial databases with noise

    Proceedings of the KDD

    (1996)
  • H. Liu et al.

    Spectral ensemble clustering

    Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    (2015)
  • G. Palla et al.

    Uncovering the overlapping community structure of complex networks in nature and society

    Nature

    (2005)
  • A. Rodriguez et al.

    Clustering by fast search and find of density peaks

    Science

    (2014)
  • S. Gregory

    Finding overlapping communities in networks by label propagation

    New J. Phys.

    (2010)
  • J. Xie et al.

    Towards linear time overlapping community detection in social networks

    Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining

    (2012)
  • H. Fan et al.

    Overlapping community detection based on discrete biogeography optimization

    Appl. Intell.

    (2018)
  • C. Wang et al.

    Review on community detection algorithms in social networks

    Proceedings of the 2015 IEEE International Conference on Progress in Informatics and Computing (PIC)

    (2015)
  • A. Lancichinetti et al.

    Detecting the overlapping and hierarchical community structure in complex networks

    New J. Phys.

    (2009)
  • A. McDaid et al.

    Detecting highly overlapping communities with model-based overlapping seed expansion

    Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

    (2010)
  • C. Lee et al.

    Detecting highly overlapping community structure by greedy clique expansion

    Proceedings of the 4th Workshop on Social Network Mining and Analysis

    (2010)
  • D. Jin et al.

    A Markov random walk under constraint for discovering overlapping communities in complex networks

    J. Stat. Mech.: Theory Exp.

    (2011)
  • M.E. Newman et al.

    Random graphs with arbitrary degree distributions and their applications

    Phys. Rev. E

    (2001)
    Yuhua Li received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2006. She is currently an associate professor in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. She has published more than 40 journal and conference papers. She is a senior member of China Computer Federation (CCF). Her research interests include data mining, social network, machine learning, big data.

    Ruixuan Li received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2004. He is currently a professor in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. He is a senior member of China Computer Federation (CCF), a member of IEEE and ACM. He has published more than 70 journal and conference papers. His research interests include social network, big data management, distributed computing, big data security.

    Fuhao Zou received the B.E. degree in computer science from Huazhong Normal University, Wuhan, Hubei, China, in 1998, and the M.S. and Ph.D. degrees in computer science and technology from Huazhong University of Science and Technology (HUST), Wuhan, Hubei, China, in 2003 and 2006. He is currently an associate professor with the School of Computer Science and Technology, HUST. His research interests include deep learning, multimedia understanding and analysis, and big data analysis. He is a senior member of the China Computer Federation (CCF) and a member of IEEE and ACM.

    Xiwu Gu received the Ph.D. degree in computer application technology from Huazhong University of Science and Technology, Wuhan, China, in 2007. He is currently an associate research fellow in the College of Computer Science and Technology, Huazhong University of Science and Technology, China. He has published more than 30 journal and conference papers. His research interests include distributed computing, data mining, social computing, and big data.
