Elsevier

Neurocomputing

Volume 331, 28 February 2019, Pages 97-107

Digging into it: Community detection via hidden attributes analysis

https://doi.org/10.1016/j.neucom.2018.11.059

Abstract

Identifying the community structure of complex systems is essential for characterizing and understanding their functions and properties. Over the past decades, considerable effort has been devoted to analyzing the community structure of networks, and numerous community detection methods have been developed. However, none of these methods explores the community membership in depth, even though it may carry useful information about the nodes and the communities. In this paper, we call the information contained in the community membership the hidden attributes of nodes and communities. We design a carefully constructed model based on nonnegative matrix factorization (NMF), a widely used framework for both disjoint and overlapping community detection, that extracts these hidden attributes and uses them to refine the community detection results on unannotated networks. To test the model's extensibility, we also extend it to annotated networks by incorporating the observed node attributes. Experimental results on both unannotated and annotated real-world networks show that our model outperforms state-of-the-art approaches.

Introduction

To study the functionality of real-world complex systems, including the human brain, the Internet, and modern society, a typical approach is to model them as complex networks and study the properties of their community structure. Here, a community is a set of nodes that have more connections to each other than to the rest of the network [10]. Besides shedding light on complex systems, community detection has many practical applications, such as exploring friendships in social networks [26], classifying multidomain networks [4], identifying topics in information networks [7], detecting criminal organizations in mobile phone networks [6], and finding bird flyways in biological networks [3]. Therefore, considerable effort has been devoted to finding community structure in networks [8]. In the literature, most classical community detection approaches focus on identifying disjoint communities. However, it is well understood that nodes of a network naturally belong to multiple communities. Taking social networks as an example, recent studies have shown that some members take part in various communities to acquire available resources and gain sources of information flow [28], [34]. Accordingly, many overlapping community detection algorithms have been proposed in recent years to obtain more realistic community memberships [9], [34], such as stochastic block models [14], deep learning based models [38], and nonnegative matrix factorization (NMF) based models [31]. As with disjoint community detection, detecting overlapping communities serves several purposes. For example, on the World Wide Web, identifying overlapping communities in hyperlink networks can reveal web pages with high content similarity [5].
In author-collaboration networks, overlapping community detection can be used to identify research areas that attract attention from different communities [23]. Studying the overlapping community structure can also help control the spread of disease [29].

Among the proposed community detection methods, NMF is a widely used framework for both disjoint and overlapping community detection. Its success lies in its ability to extract useful information from a given matrix by factorizing it. Symmetric nonnegative matrix factorization (SNMF) [30] is the basic NMF based community detection model, which finds the community membership by factorizing the network's adjacency matrix. Specifically, for a network with adjacency matrix $A$, SNMF factorizes $A$ into the product of a nonnegative matrix $U$ and its transpose, i.e., $A \approx UU^{T}$ with $U \geq 0$. $U$ can be interpreted as the community membership matrix, where each entry $U_{ij}$ represents the likelihood of node $i$ belonging to community $j$. Because each row of $U$ is a nonnegative vector of membership strengths, SNMF supports both disjoint community detection (assigning each node to the community with its largest entry) and overlapping community detection (allowing a node to belong to every community in which its membership is large).
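The SNMF factorization can be illustrated with a minimal NumPy sketch. It assumes the damped multiplicative update $U \leftarrow U \circ \left(\tfrac{1}{2} + \tfrac{AU}{2\,UU^{T}U}\right)$ (elementwise product and division), one commonly used rule for SNMF; the solver in [30] may differ. The toy graph (two 4-node cliques joined by a single edge), the helper names `snmf` and `two_cliques`, and the 0.3 overlap cutoff are hypothetical choices made only for illustration.

```python
import numpy as np


def snmf(A, k, n_iter=500, seed=0, eps=1e-10):
    """Fit A ≈ U U^T with U >= 0 using a damped multiplicative update."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(0.1, 1.0, size=(A.shape[0], k))  # strictly positive start
    for _ in range(n_iter):
        numer = A @ U
        denom = U @ (U.T @ U) + eps      # U U^T U, guarded against division by zero
        U *= 0.5 + 0.5 * numer / denom   # elementwise; keeps U nonnegative
    return U


def two_cliques(size=4):
    """Toy graph: two cliques of `size` nodes joined by one bridge edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0
    return A


A = two_cliques()
U = snmf(A, k=2)

# Disjoint communities: each node goes to its strongest community.
disjoint = U.argmax(axis=1)

# Overlapping communities: keep every community holding at least 30%
# of a node's row-normalized membership (the cutoff is arbitrary).
shares = U / U.sum(axis=1, keepdims=True)
overlapping = shares >= 0.3

print(disjoint)
print(overlapping.astype(int))
```

With $k = 2$, every row of `shares` sums to 1, so each node keeps at least one community under the 0.3 cutoff; raising the cutoff trades overlap for sharper, more disjoint assignments.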

Although SNMF is simple and straightforward, it has a serious drawback. Real-world networks are typically noisy, with missing or even misleading edges, so a model that focuses only on the network topology will inevitably produce degraded community detection results. To address this problem, many NMF based variants have been proposed [24], [28], [32], [36], [39], [40]. Among them, bounded nonnegative matrix tri-factorization (BNMTF) takes the interactions between communities into consideration [39], and preference-based nonnegative matrix factorization (PNMF) uses the relationship between links and communities to improve performance [40]. Although these methods benefit from modeling the relationships in the network, they do not explore the properties of nodes and communities. Many algorithms also incorporate node attributes into the model: social relations and user-generated content are used to detect higher-quality communities in social networks [28], and some models are designed specifically for annotated networks [24], [32]. By using more information about the network, such models are less affected by noise in the network topology; however, they cannot further explore the relationship between node attributes and the community detection result. A probabilistic model has been proposed to capture the relationship between community structure and node attributes [37], but its update rules are not guaranteed to converge.

Although many community detection methods have been proposed to improve robustness to noise, none of them examines the community detection result itself more closely. Since the community membership matrix characterizes the relationship between nodes and communities, it may contain hidden information about the nodes and communities that could be used to refine the community detection result. In other words, errors in the community detection result caused by missing or misleading edges might be corrected by digging into the community membership matrix and extracting useful hidden information from it. With this hidden information, we can improve the community detection result without side information such as node attributes, which may be unavailable or irrelevant to the community structure (and even when node attributes are available, we can still extract information about the communities by digging into the community membership matrix). Two important questions therefore arise naturally: (1) Is there any useful information hidden in U? (2) If so, how can we extract it from U and use it to refine the community detection result? For the first question, inspired by collaborative filtering (CF) [16], we assume that U contains information about the hidden attributes of nodes and communities. Although the exact meaning of these hidden attributes is hard to pin down, they are inferred from the community membership matrix and can therefore serve to characterize nodes and communities. This assumption is motivated by the success of CF, and its effectiveness is verified by our experimental results. For the second question, we design the following model. We first extract the hidden attributes of nodes and communities by factorizing U into two different matrices.
Then we represent the hidden attributes of each community as a function of its members' hidden attributes. Let $Y$ denote the communities' hidden attributes obtained by factorizing U directly, i.e., $U \approx XY^{T}$ where $X$ holds the nodes' hidden attributes, and let $\tilde{Y}$ denote the communities' hidden attributes expressed as a function of their members' hidden attributes. The difference between $Y$ and $\tilde{Y}$ is then used to improve the community detection result. This approximation step is necessary because if we only extracted $X$ and $Y$ from U, we would merely be fitting U with two arbitrary matrices, and $X$ and $Y$ could not be used to refine the community detection result.
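The two-step idea can be sketched as follows, under loudly stated assumptions. The factorization $U \approx XY^{T}$ uses standard Lee–Seung multiplicative updates for the Frobenius loss. The form of $\tilde{Y}$ is hypothetical: the text says only that it is "a function of its members' hidden attributes", so the sketch takes each community's membership-weighted mean of its members' rows of $X$, i.e. $\tilde{Y} = D^{-1}U^{T}X$ with $D$ the diagonal matrix of column sums of $U$. The toy membership matrix, the attribute dimension `r`, and the helper names are likewise illustrative, not the paper's.

```python
import numpy as np


def extract_hidden_attributes(U, r, n_iter=300, seed=0, eps=1e-10):
    """Factorize membership U (n x k) as U ≈ X Y^T with X, Y >= 0.

    X (n x r): nodes' hidden attributes; Y (k x r): communities' hidden
    attributes. Lee-Seung multiplicative updates for the Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n, k = U.shape
    X = rng.uniform(0.1, 1.0, size=(n, r))
    Y = rng.uniform(0.1, 1.0, size=(k, r))
    for _ in range(n_iter):
        Y *= (U.T @ X) / (Y @ (X.T @ X) + eps)
        X *= (U @ Y) / (X @ (Y.T @ Y) + eps)
    return X, Y


def member_based_attributes(U, X, eps=1e-10):
    """Hypothetical Y~: membership-weighted mean of members' attributes."""
    weights = U.sum(axis=0)[:, None] + eps  # k x 1 total membership per community
    return (U.T @ X) / weights


# Toy membership matrix: 6 nodes, 2 communities, node 2 overlapping.
U = np.array([
    [0.9, 0.1],
    [0.8, 0.1],
    [0.6, 0.5],
    [0.1, 0.9],
    [0.1, 0.8],
    [0.2, 0.7],
])
X, Y = extract_hidden_attributes(U, r=2)
Y_tilde = member_based_attributes(U, X)
gap = np.linalg.norm(Y - Y_tilde, "fro")  # discrepancy between Y and Y~ (assumed form)
print(Y.shape, Y_tilde.shape, round(gap, 4))
```

Because $\tilde{Y}$ is tied to the members through $U$, the gap cannot be driven to zero by fitting $U$ alone, which is the intuition behind the approximation step described above.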

In summary, we propose a novel community detection model named Digging Into It (DII), which answers the two questions above. The main contributions of this paper are as follows:

  • To the best of our knowledge, we are the first to explore the information hidden in the community membership matrix, which we call the hidden attributes of nodes and communities.

  • We design a model, DII, that first extracts the hidden attributes of nodes and communities from the community membership matrix and then uses them to refine the community detection result, which gives the model a way to “learn” from its own learned result.

  • We extend our model to annotated networks and conduct extensive experiments on both unannotated and annotated real-world networks. Experimental results demonstrate that DII outperforms several state-of-the-art algorithms.

The remainder of this paper is organized as follows. Section 2 gives the problem definition and model assumptions. Section 3 introduces the framework of DII in detail. Section 4 presents the optimization algorithm. Section 5 reports the experimental results. Section 6 reviews related work, and Section 7 concludes the paper.


Preliminaries

In this section, we give the problem definition and introduce the model’s assumptions.

DII: The proposed model

In this section, we describe DII in detail on unannotated networks. DII consists of three stages: finding the elementary community membership, extracting the hidden attributes, and refining the community detection result.

Optimization

Since the objective function (5) is non-convex, it is hard to find its global optimum directly. However, we can decompose (5) into four subproblems and solve them alternately and efficiently. Here we adopt an algorithm that guarantees the objective function is nonincreasing by iteratively updating each variable with the other variables fixed.
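Why alternating updates keep the objective from increasing can be seen on a much smaller problem. The sketch below is not DII's update rules for (5); it is a hypothetical rank-one example, $\min_{x, y \geq 0} \|M - xy^{T}\|_F^2$, in which each block update is the exact constrained minimizer with the other block fixed (a projected one-dimensional least-squares solution per entry). Exact per-block minimization is one simple way to obtain the nonincreasing property; the matrix `M` and the helper names are illustrative assumptions.

```python
import numpy as np


def objective(M, x, y):
    """Squared Frobenius error of the rank-one approximation x y^T."""
    return float(np.sum((M - np.outer(x, y)) ** 2))


def rank1_nonneg_bcd(M, n_iter=20, seed=0):
    """Alternately minimize ||M - x y^T||_F^2 over x >= 0 and y >= 0.

    Each update exactly minimizes its block with the other block fixed,
    so the recorded objective values can never increase.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.1, 1.0, size=M.shape[0])
    y = rng.uniform(0.1, 1.0, size=M.shape[1])
    history = [objective(M, x, y)]
    for _ in range(n_iter):
        x = np.maximum(0.0, M @ y) / max(y @ y, 1e-12)    # update x, y fixed
        history.append(objective(M, x, y))
        y = np.maximum(0.0, M.T @ x) / max(x @ x, 1e-12)  # update y, x fixed
        history.append(objective(M, x, y))
    return x, y, history


M = np.array([[3.0, 1.0, 2.0],
              [6.0, 2.5, 4.0],
              [1.0, 0.5, 0.7]])
x, y, history = rank1_nonneg_bcd(M)
print(round(history[0], 3), round(history[-1], 3))
```

Multiplicative-update schemes such as those commonly used for NMF reach the same nonincreasing guarantee through auxiliary functions rather than exact block minimization.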

Update $U$: When $X$, $Y$, and $B$ are fixed, the $U$-subproblem is
$$\min_{U \geq 0} \mathcal{L}(U) = \|A - a\,UU^{T}\|_F^2 + \alpha \|U - XY^{T}\|_F^2 + \beta \|Y - U^{T} a\, B X\|_F^2.$$

It is hard to derive an update rule

Experiments

In this section, we compare DII with several state-of-the-art approaches on real-world datasets. Experiments are mainly conducted on unannotated networks. To test DII's extensibility, we also conduct experiments on annotated networks.

Related work

In recent years, considerable effort has been devoted to finding community structure in networks. Fortunato [8] gives a comprehensive survey on community detection in graphs. Since a node can belong to multiple communities, overlapping community detection has also drawn a lot of attention. Detailed surveys on overlapping community detection can be found in [9], [34]. In [34], the authors categorized the algorithms into five categories: clique percolation, link partitioning, local

Conclusion

In this paper, we propose a novel NMF based community detection model, DII, which explores the community membership in depth and extracts the hidden attributes for better community detection. The key idea underlying DII is that by digging into the community membership, we can extract useful hidden information from it. By incorporating the extracted hidden information into the model, its robustness to noise in the data can be improved. Extensive experimental results show that DII can indeed improve

Acknowledgment

The work described in this paper was supported by the National Key Research and Development Program (2016YFB1000101), the National Natural Science Foundation of China (11801595), the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2016), Natural Science Foundation of Guangdong (2018A030310076), CCF Opening Project of Information System.

Rui Li is currently pursuing the B.S. degree with the School of Physics, Sun Yat-sen University, Guangzhou, China. Her research interests include graph mining, social network analysis and machine learning.

References (41)

  • M. Girvan et al., Community structure in social and biological networks, Proc. Natl. Acad. Sci. U.S.A. (2002)
  • S. Gregory, Finding overlapping communities in networks by label propagation, J. Phys. (2009)
  • F. Havemann et al., Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels, Comput. Sci. (2010)
  • L. Hubert et al., Comparing partitions, J. Classif. (1985)
  • B. Karrer et al., Stochastic blockmodels and community structure in networks, Phys. Rev. E Stat. Nonlinear Soft Matter Phys. (2011)
  • J. Kim et al., Sparse Nonnegative Matrix Factorization for Clustering (2008)
  • Y. Koren et al., Matrix factorization techniques for recommender systems, Computer (2009)
  • J.M. Kumpula et al., Sequential algorithm for fast clique percolation, Phys. Rev. E Stat. Nonlinear Soft Matter Phys. (2008)
  • A. Lancichinetti et al., Detecting the overlapping and hierarchical community structure of complex networks, J. Phys. (2008)
  • A. Lancichinetti et al., Finding statistically significant communities in networks, PLoS One (2010)

Fanghua Ye received the B.S. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2016. He is currently pursuing the M.S. degree with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His research interests include social network analysis, data mining, and machine learning.

Shaoan Xie received the B.S. degree from the Sun Yat-sen University, Guangzhou, China, in 2016. He is currently pursuing the M.S. degree with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. His current research interests include blockchain and machine learning.

Chuan Chen received the B.S. degree from Sun Yat-sen University, Guangzhou, China, in 2012, and the Ph.D. degree from Hong Kong Baptist University, Hong Kong, in 2016. He is currently an Associate Research Fellow with the School of Data and Computer Science, Sun Yat-Sen University. His main research interests include machine learning, numerical linear algebra, and numerical optimization.

Zibin Zheng received the Ph.D. degree from The Chinese University of Hong Kong in 2011. He is currently a Full Professor with Sun Yat-sen University, Guangzhou, China. His research interests include service computing, software engineering, and blockchain. He was a recipient of the IBM Ph.D. Fellowship Award. He received the ACM SIGSOFT Distinguished Paper Award at the ICSE 2010 and the Best Student Paper Award at the ICWS 2010.
