Elsevier

Information Sciences

Volume 569, August 2021, Pages 544-556
Boolean factor based community extraction from directed networks with the non reciprocal link relationship

https://doi.org/10.1016/j.ins.2021.05.027

Abstract

Community extraction from networks has gained increasing attention over the last decade, owing to the data made available by online social media. The task consists in extracting homogeneous groups from a network modelled as a graph, whose edges represent the interactions between the entities of the network. Most existing community extraction approaches were designed for undirected graphs, or assume that the relationship between entities is symmetric (reciprocal). In most real-world applications, such as food webs or hierarchical relationships between employees, this is not the case. In this paper, we propose a Boolean factor based approach for community detection in directed networks. The main advantage of Boolean factors, which are based on formal concepts, is that they preserve the relationship between the two sets of related entities. The semantics of the (non-reciprocal) relationship are taken into account during candidate community extraction by splitting each concept into two parts. The final communities are obtained after refining these candidates. We evaluated the approach on several directed networks available on the internet, and the results show its effectiveness.

Introduction

One of the most important applications in network analysis is the detection of homogeneous groups called communities. This is because many real-life situations involving interacting entities, such as social networks, are modelled using graph structures. Depending on the nature of the relationship between the entities, the graph can be either directed or undirected.

For undirected graphs, many interesting community detection approaches [1], [2] have been proposed. However, substantial work remains to be done on directed graphs, whose fundamental characteristic is the semantics carried by the orientation of the links. Several studies have addressed community detection in directed graphs [3], but they share the general limitation of disregarding the direction of the links, and thereby discard information that the graph encodes. There is no unanimous definition of what a community is. However, it is commonly accepted to be a dense part of the graph modelling the social network interactions [1], [4]. This definition is appropriate if the relationship represented by the links is symmetric or reciprocal, such as friendship or fellowship in social media, or hypertext links between the pages of a website. But in many situations the relationship between entities in the network is not reciprocal. This occurs in food web networks, where the relationship is generally prey/predator, and in companies, where the relationship can be hierarchical, such as superior/subordinate. In these cases, defining a community as a set of densely connected nodes is biased. The nature of the relationship between entities therefore needs to be taken into account during the extraction process. This paper focuses on community extraction from networks modelled by directed graphs in which the relationship between nodes is not reciprocal.
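As a rough illustration of the distinction drawn above, the following sketch measures how reciprocal a directed network is, using the simple ratio of edges whose reverse edge also exists. It is not part of the proposed method; the function name `reciprocity` and both toy edge lists are assumptions made for the example.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present.

    Self-loops are ignored. Returns 0.0 for a graph without edges.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum((v, u) in edge_set for (u, v) in edge_set)
    return mutual / len(edge_set)


# Hypothetical friendship network: every link is returned.
friendship = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]

# Hypothetical food web: an edge (x, y) means "x eats y"; no link is returned.
food_web = [("fox", "rabbit"), ("fox", "mouse"), ("rabbit", "grass")]

friendship_score = reciprocity(friendship)
food_web_score = reciprocity(food_web)
```

A score close to 1 suggests a symmetric relationship, for which the density-based definition of a community is reasonable; a score close to 0 suggests the non-reciprocal setting studied here.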

The main challenge here is how to integrate the semantics carried by the link property into the graph clustering process. Taking this knowledge into account can guide the process and lead it to appropriate clustering results. In this paper, the knowledge is the nature of the relationship between entities. We focus mainly on the non-reciprocal relationship between actors represented by the graph nodes.

In this paper, a community is defined as a set of homogeneous nodes that share common properties. Homogeneity is based on knowledge about the modelled network phenomenon. In the case of a non-reciprocal relationship, the homogeneity of a group is defined by the interaction of the entities within this group with the entities within another group of the network. Accordingly, a new network clustering algorithm based on the factorization of the (Boolean) adjacency matrix is proposed. Matrix factorization consists in decomposing a matrix into a sum of products of other matrices. The well-known GRECOND (Greedy Concept on Demand) algorithm [5], designed for Boolean matrix factorization, is adapted to this task. This algorithm has three advantages. First, it has polynomial time complexity. Second, it preserves the relation between rows (objects) and columns (attributes), as it is based on FCA (Formal Concept Analysis) theory [6]. Finally, because it follows a greedy strategy, it is easy to apply: it stops as soon as the introduced constraints are satisfied. After the extraction phase, the resulting Boolean factors are transformed into communities; the overlap between them is pruned and the isolated entities are migrated to other communities.
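The greedy search can be sketched as follows, assuming a plain GreConD-style procedure on a small Boolean adjacency matrix. This is a minimal illustration, not the adapted algorithm of this paper: the stopping constraints, the overlap pruning and the migration of isolated entities are omitted, and the helper names (`down`, `up`, `grecond`) and the toy matrix are hypothetical.

```python
def down(cols, matrix):
    """Rows that contain a 1 in every column of `cols`."""
    return frozenset(i for i, row in enumerate(matrix)
                     if all(row[j] for j in cols))


def up(rows, matrix):
    """Columns that contain a 1 in every row of `rows`."""
    return frozenset(j for j in range(len(matrix[0]))
                     if all(matrix[i][j] for i in rows))


def grecond(matrix):
    """Greedy cover of the 1-entries by formal concepts (extent, intent)."""
    n_cols = len(matrix[0])
    uncovered = {(i, j) for i, row in enumerate(matrix)
                 for j, v in enumerate(row) if v}
    factors = []
    while uncovered:
        extent, intent, best = frozenset(), frozenset(), 0
        while True:
            choice = None
            for j in range(n_cols):
                if j in intent:
                    continue
                ext = down(intent | {j}, matrix)   # rows sharing the columns
                itt = up(ext, matrix)              # closure of the column set
                gain = sum((i, k) in uncovered for i in ext for k in itt)
                if gain > best:
                    best, choice = gain, (ext, itt)
            if choice is None:                     # no column improves the cover
                break
            extent, intent = choice
        factors.append((extent, intent))
        uncovered -= {(i, k) for i in extent for k in intent}
    return factors


# Toy directed network (assumed): 0->2, 0->3, 1->2, 1->3, 4->0, 4->1.
adjacency = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
factors = grecond(adjacency)
```

On this toy network the sketch returns two factors: nodes 0 and 1 as a group pointing to nodes 2 and 3, and node 4 pointing to nodes 0 and 1. Each factor keeps the row side and the column side together, which is the property the paragraph above attributes to the FCA basis of the algorithm.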

The main contributions of this article are summarized as follows:

  1. Showing that, by taking into account the semantics (the nature of the relationship) represented by the link direction in the graph, other types of communities, different from the classical definition as sets of densely connected nodes, can emerge from directed networks.

  2. Proposing a new approach based on Boolean matrix factorization to cluster the adjacency matrix of directed networks. The link direction (domain knowledge), whether symmetric or not, is expressed by representing formal concepts as Boolean factors. This avoids the loss of link semantics that occurs in many related works.

  3. Proposing a new measure, QNR2, called NR-modularity (for Non-Reciprocal modularity), for evaluating the degree of homogeneity of the extracted communities based on the non-symmetric relationship, since no clustering approach focusing on the non-symmetric relationship between actors in the network was found in the literature.

  4. Finally, proposing an adaptation of the approach designed for the non-reciprocal relationship between entities, in order to cluster networks where the relationship is reciprocal.

The next section describes the problem of graph clustering through two example networks and the results of some existing algorithms applied to them. The following section presents the proposed approach. The fourth section reports the experimental results.

Section snippets

Problem statement and related works

The first part of this section focuses on the problem of link semantics in the network and its use in the clustering process; the second part presents some methods of community detection in directed networks.

Method and algorithms

This section first describes Boolean matrix factorization (BMF) based on Formal Concept Analysis (FCA) [6], as Boolean factors are closely related to formal concepts. It then presents the use of Boolean factors in community detection.
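To make the link between Boolean factors and formal concepts concrete, the sketch below checks a Boolean factorization of a toy adjacency matrix and reads each factor as a pair of node groups. The matrices `A`, `P` and `Q`, the helper names, and the reading of each factor as a source group and a target group are assumptions made for illustration; they are not taken from the paper's implementation.

```python
def boolean_product(P, Q):
    """Boolean matrix product: entry (i, j) is OR over k of P[i][k] AND Q[k][j]."""
    n_factors = len(Q)
    return [[int(any(P[i][k] and Q[k][j] for k in range(n_factors)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def factor_groups(P, Q):
    """Read factor k as (source nodes, target nodes).

    Column k of P marks the rows (sources) of the factor;
    row k of Q marks its columns (targets).
    """
    groups = []
    for k in range(len(Q)):
        source = {i for i, row in enumerate(P) if row[k]}
        target = {j for j, v in enumerate(Q[k]) if v}
        groups.append((source, target))
    return groups


# Toy directed network (assumed): 0->2, 0->3, 1->2, 1->3, 4->0, 4->1.
A = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]

# Two factors, each a full rectangle of ones in A (a formal concept).
P = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 1]]   # node-by-factor
Q = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 0]]         # factor-by-node

reconstructed = boolean_product(P, Q)
groups = factor_groups(P, Q)
```

Here the Boolean product of `P` and `Q` reproduces `A` exactly, and each factor separates the nodes that emit links from the nodes that receive them, so the direction of the relationship is not lost in the decomposition.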

Experiments and results

The approach presented in this paper has been tested on several datasets. The experiments were conducted on a Windows 10 personal computer with a 2 GHz CPU and 4 GB of RAM, using the R software [28] and its igraph package [29]. Table 1 summarizes these data. mayaan is freely available on the internet.² It describes the food web dataset collected in the Little Rock Lake, Wisconsin, in the United States of America. Nodes in

Conclusion

The problem addressed in this work was to extract communities from directed networks by taking into account the semantics associated with edge directionality. To do this, we proposed a Boolean factor based approach. The main advantage of Boolean factors is that they preserve the relationship between two groups of nodes (the source group and the target group). The community candidates are defined from these Boolean factors; the overlap between these candidates has been pruned and the residual nodes

CRediT authorship contribution statement

Norbert Tsopze: Conceptualization, Data curation, Formal analysis, Methodology, Resources, Software, Supervision, Visualization, Writing - original draft, Writing - review & editing. Félicité Gamgne Domgue: Investigation, Validation, Software, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (29)

  • U. Brandes et al. On Modularity Clustering. IEEE Transactions on Knowledge and Data Engineering (2008)

  • B. Ganter et al. Formal Concept Analysis: Mathematical Foundations (1999)

  • L. Wang et al. Detecting Community Kernels in Large Social Networks

  • D. Garlaschelli et al. Patterns of link reciprocity in directed networks. Physical Review Letters (2005)