Elsevier

Information Sciences

Volume 432, March 2018, Pages 164-184

Overlapping community detection in heterogeneous social networks via the user model

https://doi.org/10.1016/j.ins.2017.11.055

Abstract

Clustering users who share common interests and interact frequently on social networking sites has attracted much attention from researchers because of its high economic value and broad application prospects. Community detection is a widely accepted means of clustering users, but conventional methods are inadequate because social media involve billions of vertices and many types of relations. In this study, a heterogeneous network containing both undirected and directed edges is built from user models to accurately represent a social network. A novel approach for overlapping community detection in a heterogeneous social network (OCD-HSN) is proposed; it consists of seed selection, community initialization, and community expansion, and unfolds modules accurately and efficiently in parallel. Experimental results on artificial and real-world social networks demonstrate that the proposed scheme achieves higher accuracy and lower time consumption than existing state-of-the-art algorithms.

Introduction

With the rapid development of Web 2.0 technology, the widespread adoption of social networking sites (SNSs) has changed the modes of communication, enabling individuals from diverse areas to interact and share interests. Social networks such as Twitter, Facebook, Google+, Orkut, LinkedIn, Friendster and Multiply have attracted billions of users and have received attention from many researchers and the media. While these sites offer potential benefits to users [22], they also create the problem of information overload and place greater demands on community detection.

Communities, also known as clusters or modules, are sets of vertices that share common properties or play the same roles in the network [12]. Community detection is the process of identifying all communities in a network, and a wide variety of techniques are available for this task; for more information, see the recent survey by Schaffer [33]. The ability to unfold these substructures in a social network provides insight into how network topology and function affect each other, and further provides theoretical support for interest prediction, friend recommendation, and event evolution.

Entities in social systems are treated as vertices, and edges connect vertices according to various relations. With this representation, the theories of complex networks can be applied to many specific problems, and this has become the leading approach in related research [13]. However, constructing complex networks that reflect the characteristics of social media, and designing effective community detection algorithms for large-scale social networks, remain challenging. This study aims to meet this challenge.

The user model, also called the user profile, mirrors a real user in cyberspace and depicts basic user attributes, interests, and social relations. Although the Vector Space Model [19] has been widely adopted to model users, it suffers from several critical weaknesses: it represents users in a bloated form, has no uniform representation standard, and cannot perform semantic expansion. To address these shortcomings, this study adopts the semantic ontology representation [9] to model users’ semantic interests, and expresses social relations among users independently of their interests.

The Pearson correlation coefficient is applied to calculate the interest similarity between user models on each corresponding layer of the semantic ontology, and the similarity values of the multiple layers are merged using the hyperbolic tangent function. We then place interest similarities and social interactions in the same network to construct a heterogeneous network that simultaneously contains undirected and directed edges.
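The excerpt does not spell out how the per-layer scores are merged, so the following Python sketch is illustrative only. It assumes each user's interests on one ontology layer are stored as a weight vector over that layer's concepts (aligned across users), computes a Pearson coefficient per layer, and merges the layers as tanh of a weighted sum. The layer weights and this exact combination rule are assumptions, not the authors' formula.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two interest-weight vectors on one ontology layer."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector has no defined correlation; treat as no evidence
    return cov / (sd_x * sd_y)


def interest_similarity(
    layers_u: List[Sequence[float]],
    layers_v: List[Sequence[float]],
    layer_weights: Sequence[float],
) -> float:
    """Hypothetical mixing rule: tanh of the weighted sum of per-layer Pearson scores."""
    if not (len(layers_u) == len(layers_v) == len(layer_weights)):
        raise ValueError("both users need one vector per layer, and one weight per layer")
    mixed = sum(
        w * pearson(lu, lv) for w, lu, lv in zip(layer_weights, layers_u, layers_v)
    )
    return math.tanh(mixed)  # squashes the mixed score into (-1, 1)
```

For instance, two users with identical two-layer profiles and weights `[0.6, 0.4]` score `tanh(1.0)`, roughly 0.76. A tanh-based merge keeps the result inside (-1, 1) however many layers are added.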

In this study, we demonstrate that community detection in heterogeneous networks using seed set expansion can not only mine overlapping and hierarchical communities but also calculate the membership degrees of overlapping vertices in different communities. To the best of our knowledge, this is the first study of overlapping community detection in heterogeneous social networks (OCD-HSN) containing both undirected and directed edges.

The main characteristics and innovations of this study are the following:

  • The interest similarities and social interactions among user models are both included in the same heterogeneous network.

  • A novel approach is proposed to detect communities in heterogeneous networks containing both undirected and directed edges.

  • Multiple seeds are selected in the heterogeneous network to facilitate community detection in parallel.

  • Each community is initialized to a vertex set according to the characteristics of its seed vertex, to improve community accuracy and accelerate detection.

  • The election principle is applied to add and remove multiple vertices during the iterative expansion of communities, to further accelerate community detection.
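The three algorithmic ideas in the list (multiple seeds, seed-based initialization, and batch add/remove during expansion) can be illustrated with a simple local-expansion sketch in Python. Everything concrete in it is a hypothetical stand-in: seeds are high-strength vertices not adjacent to an earlier seed, each community starts as its seed plus the seed's neighbours, and the "election" is approximated by adding every frontier vertex whose edge-weight share inside the community exceeds a threshold and removing every member whose share falls below it, all in one batch per iteration. The graph is assumed to be a symmetric weighted adjacency map, i.e. directed and undirected edges already merged.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # symmetric weighted adjacency: u -> {v: w}


def from_edges(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def strength(g: Graph, u: str) -> float:
    """Total edge weight incident to u."""
    return sum(g[u].values())


def select_seeds(g: Graph, k: int) -> List[str]:
    """Greedily pick up to k high-strength seeds, skipping neighbours of earlier seeds."""
    seeds: List[str] = []
    blocked: Set[str] = set()
    for u in sorted(g, key=lambda x: (-strength(g, x), x)):
        if u in blocked:
            continue
        seeds.append(u)
        blocked |= {u, *g[u]}
        if len(seeds) == k:
            break
    return seeds


def affinity(g: Graph, u: str, community: Set[str]) -> float:
    """Share of u's edge weight that lands inside the community."""
    s = strength(g, u)
    return sum(w for v, w in g[u].items() if v in community) / s if s else 0.0


def expand(g: Graph, seed: str, threshold: float = 0.5, max_iter: int = 20) -> Set[str]:
    """Grow one community from its seed, adding and removing vertices in batches."""
    community = {seed, *g[seed]}  # initialise with the seed's neighbourhood
    for _ in range(max_iter):
        frontier = {v for u in community for v in g[u]} - community
        to_add = {v for v in frontier if affinity(g, v, community) > threshold}
        to_remove = {u for u in community
                     if u != seed and affinity(g, u, community) < threshold}
        if not to_add and not to_remove:
            break  # no vertex wins or loses a vote: the community is stable
        community = (community | to_add) - to_remove
    return community
```

Because each `expand` call touches only its own seed's community, calls for different seeds are independent and could be dispatched in parallel, for example with `concurrent.futures`. On two triangles joined by a single bridge edge, the two selected seeds recover the two triangles.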

To assess the accuracy and effectiveness of our method, we compare it with two existing state-of-the-art algorithms. Experimental results on both artificial and real-world social networks indicate that our approach outperforms both methods, with higher accuracy and lower time complexity. With these advantages, our approach is well suited to the requirements of large-scale social networks with billions of vertices.

The rest of this paper is organized as follows. In Section 2, we summarize related work on the techniques of user modeling and community detection. We introduce the user modeling method and build a heterogeneous social network in Section 3. In Section 4, the proposed community detection approach is described in depth. The experimental results of the proposed scheme are compared and analyzed in Section 5. Finally, we conclude the paper in Section 6 by discussing plans for future work.


Related work

Unlike traditional complex networks, social networks represent the users of SNSs and the relations among them. Analyzing such networks not only yields insight into social phenomena and processes that take place in the real world, but also extracts actionable knowledge that benefits retrieval and information management tasks, such as online content navigation and recommendation systems.

Heterogeneous social networks

In this section, we generate a heterogeneous social network that provides the foundation for community detection. First, by analyzing user features, each user is represented as a user model with semantic interests and social interactions. We then calculate interest similarities and connect user models according to their social relations, yielding a graph whose vertices represent user models and whose edges represent various relations.
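One way to hold such a graph in memory is to keep the two edge kinds in separate adjacency maps, so undirected similarity edges are stored symmetrically while directed interaction edges keep their orientation. The Python sketch below assumes similarities have already been computed per user pair and that interactions arrive as (source, target) events. The class name, the `threshold` cut-off of 0.3, and the accumulation of repeated interactions into one weighted edge are illustrative choices, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping, Tuple

Adjacency = Dict[str, Dict[str, float]]


@dataclass
class HeteroSocialGraph:
    """User-model vertices with the two edge kinds kept apart."""
    similar: Adjacency = field(default_factory=dict)    # undirected: interest similarity
    interacts: Adjacency = field(default_factory=dict)  # directed: u -> v social interaction

    def add_user(self, u: str) -> None:
        self.similar.setdefault(u, {})
        self.interacts.setdefault(u, {})

    def add_similarity(self, u: str, v: str, s: float) -> None:
        self.add_user(u)
        self.add_user(v)
        self.similar[u][v] = s  # stored in both directions: the edge is undirected
        self.similar[v][u] = s

    def add_interaction(self, u: str, v: str, w: float = 1.0) -> None:
        self.add_user(u)
        self.add_user(v)
        # repeated interactions (e.g. several replies) accumulate on one directed edge
        self.interacts[u][v] = self.interacts[u].get(v, 0.0) + w


def build_network(
    users: Iterable[str],
    similarities: Mapping[Tuple[str, str], float],
    interactions: Iterable[Tuple[str, str]],
    threshold: float = 0.3,
) -> HeteroSocialGraph:
    g = HeteroSocialGraph()
    for u in users:
        g.add_user(u)
    for (u, v), s in similarities.items():
        if u != v and s >= threshold:  # hypothetical cut-off for weak similarity
            g.add_similarity(u, v, s)
    for u, v in interactions:
        if u != v:
            g.add_interaction(u, v)
    return g
```

Keeping the edge kinds separate lets later steps weigh similarity and interaction evidence differently instead of collapsing them too early.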

Community detection approach

In this section, we first present the framework of our proposed approach. Next, we describe in detail the multiple-seed selection strategy, the overlapping community detection algorithm, and the membership degree calculation method in the heterogeneous social network. Finally, we show how to discover the hierarchical module structure and analyze the time complexity.
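The excerpt names a membership degree calculation without giving its formula. As a hypothetical illustration only, the Python sketch below assigns an overlapping vertex a degree in each of its communities proportional to the edge weight it sends into that community, normalized so the degrees sum to 1. The graph is again assumed to be a symmetric weighted adjacency map.

```python
from typing import Dict, Set

Graph = Dict[str, Dict[str, float]]  # symmetric weighted adjacency: u -> {v: w}


def membership_degrees(
    g: Graph, communities: Dict[str, Set[str]], v: str
) -> Dict[str, float]:
    """Hypothetical rule: v's share of within-community edge weight, per community."""
    internal = {
        cid: sum(w for u, w in g[v].items() if u in members and u != v)
        for cid, members in communities.items()
        if v in members
    }
    if not internal:
        return {}  # v belongs to no detected community
    total = sum(internal.values())
    if total == 0:
        return {cid: 1.0 / len(internal) for cid in internal}  # no evidence: split evenly
    return {cid: s / total for cid, s in internal.items()}
```

A vertex linked to two members of one community with weight 1 each and to one member of another with weight 3 would get degrees 0.4 and 0.6 under this rule.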

Experiments

In this section, we first introduce four assessment criteria extended from those used for traditional unweighted, undirected networks, and two state-of-the-art methods used for comparison with our approach. We then present the experimental results on both synthetic and real-world heterogeneous social networks. Finally, we discuss and analyze these results.
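The criteria themselves are not named in this excerpt, but the reference list includes Collins et al.'s Omega index, a generalization of the Rand index to non-disjoint solutions. The Python sketch below implements the standard Omega index for unweighted, undirected covers; it does not reproduce the paper's extension to heterogeneous networks. For each node pair it compares how many communities the pair shares in the two covers, then corrects the observed agreement for chance as (observed - expected) / (1 - expected).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def _shared_counts(cover: List[Set[Hashable]], nodes: List[Hashable]) -> Dict[Pair, int]:
    """For every unordered node pair, count the communities that contain both nodes."""
    return {
        (a, b): sum(1 for c in cover if a in c and b in c)
        for a, b in combinations(nodes, 2)
    }


def omega_index(
    cover_a: List[Set[Hashable]],
    cover_b: List[Set[Hashable]],
    nodes: Iterable[Hashable],
) -> float:
    """Omega index (Collins et al., 1988) between two possibly overlapping covers."""
    node_list = list(nodes)
    t_a = _shared_counts(cover_a, node_list)
    t_b = _shared_counts(cover_b, node_list)
    m = len(t_a)
    if m == 0:
        raise ValueError("need at least two nodes")
    observed = sum(1 for p in t_a if t_a[p] == t_b[p]) / m
    n_a, n_b = Counter(t_a.values()), Counter(t_b.values())
    expected = sum(n_a[j] * n_b[j] for j in n_a) / (m * m)
    if expected == 1.0:
        return 1.0  # every pair has the same count in both covers
    return (observed - expected) / (1.0 - expected)
```

Identical covers score 1, and a score near 0 means agreement no better than chance. The pairwise loop is quadratic in the number of nodes, so this form suits small benchmark graphs rather than networks with billions of vertices.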

Conclusions and future work

In this study, we propose an overall framework for overlapping community detection in heterogeneous social networks. First, we construct a heterogeneous social network by calculating similarities across multiple user interests and incorporating social interaction relations, where each user’s interests and social features are depicted by a user model serving as a vertex in the network; this provides a new perspective for social computing. We then present an integrative approach for community

Acknowledgments

This work is partially funded by the National Key Research and Development Program of China (No. 2017YFC0907505), the National Natural Science Foundation of China (Nos. 61772128 and 61303096), the Fundamental Research Funds for the Central Universities (No. 16D111208), the Shanghai Natural Science Foundation (No. 17ZR1400200), and the Xinjiang Social Science Foundation (No. 2015BGL100).

References (37)

  • R. Andersen et al.

    Local graph partitioning using PageRank vectors

    Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06)

    (2006)
  • A. Arenas et al.

    Synchronization reveals topological scales in complex networks

    Phys. Rev. Lett.

    (2006)
  • S. Asur et al.

    An event-based framework for characterizing the evolutionary behavior of interaction graphs

    ACM Trans. Knowl. Discov. Data (TKDD)

    (2009)
  • V.D. Blondel et al.

    Fast unfolding of communities in large networks

    J. Stat. Mech: Theory Exp.

    (2008)
  • L.M. Collins et al.

    Omega: a general formulation of the Rand index of cluster recovery suitable for non-disjoint solutions

    Multivariate Behav. Res.

    (1988)
  • S.G. Esparza et al.

    Mining the real-time web: a novel approach to product recommendation

    Knowl. Based Syst.

    (2012)
  • G. Ghoshal et al.

    Social system as complex networks

    Soc. Netw. Anal. Min.

    (2014)
  • M. Girvan et al.

    Community structure in social and biological networks

    Proc. Natl. Acad. Sci.

    (2002)
Cited by (42)

    • CEO: Identifying Overlapping Communities via Construction, Expansion and Optimization

      2022, Information Sciences
      Citation Excerpt :

      However, increasing evidence shows that many real-world systems are characterized by statistics of overlapping communities where a node may belong to multiple communities [4]. Mainstream overlapping community detection algorithms can be classified into local expansion methods [5], clique percolation methods [6], link partitioning methods [7], agent-based dynamical methods [8], fuzzy detection methods [9], multi-objective evolutionary methods [10], learning-based methods [11], etc. In addition, by performing on multiple groups of query nodes, community search methods can generate communities with overlaps [12].

    • Social influence based community detection in event-based social networks

      2020, Information Processing and Management
      Citation Excerpt :

      Community detection is an important problem in social network analysis and it has received a great deal of attention from many disciplines (Fortunato, 2010; He, Li, Soundarajan, & Hopcroft, 2018; Newman, 2004; Wu, Kwong, Zhou, Jia, & Gao, 2018). The ability to detect internally connected communities in social networks can provide supports for real-world applications, such as friend recommendation and interest prediction (Huang et al., 2018). In general, a community in social networks represents a group of people who are connected internally (Newman, 2004).

    • Identification of topical subpopulations on social media

      2020, Information Sciences
      Citation Excerpt :

      As the processing and storage costs of individual topical profiles is high, the application of such an approach in ad-hoc search setup is non-trivial. More recently, Huang et al. [20] similarly mapped the content associated with users on social media onto an ontology. They inferred topical similarity between users, and co-represented this information alongside social relations in a heterogeneous joint graph for the purpose of overlapping community detection.
