Boolean factor based community extraction from directed networks with the non reciprocal link relationship
Introduction
One of the most important applications of network analysis is the detection of homogeneous groups called communities. This is because many real-life situations, such as the interactions between entities in a social network, are modelled using graph structures. Depending on the nature of the relationship between the entities, the graph can be either directed or undirected.
For undirected graphs, many interesting community detection approaches [1], [2] have been proposed. However, real work remains to be done on directed graphs, whose fundamental characteristic is the semantics carried by the orientation of the links. Several studies have already addressed community detection in directed graphs [3], but they generally share the limitation of disregarding the direction of the links, and with it the information that the direction conveys in this type of graph. There is no unanimity on the definition of a community. However, it is commonly accepted that a community is a dense part of the graph modelling the social network interactions [1], [4]. This definition is appropriate if the relationship represented by the links is symmetric or reciprocal, such as friendship or fellowship in social media, or hypertext links between the pages of a website. But in many situations, the relationship between entities in the network is non-reciprocal (non-symmetric). This occurs in food web networks, where the relationship is generally prey/predator, and in company environments, where the relation can be hierarchical, such as superior/subordinate. In these cases, defining a community as a set of densely connected nodes is biased. The nature of the relationship between entities therefore needs to be taken into account during the extraction process. This paper focuses on community extraction from networks modelled by directed graphs in which the relationship between nodes is not reciprocal.
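To make the reciprocal / non-reciprocal distinction concrete, the following minimal Python sketch computes the fraction of directed links that are matched by a link in the opposite direction. The function name `reciprocity` and both toy edge lists are illustrative assumptions, not data or code from the paper (whose experiments were run in R).

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def successors(edges, node):
    """Set of nodes that `node` points to."""
    return {v for u, v in edges if u == node}


# Hypothetical friendship network: every link is returned.
friendship = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]

# Hypothetical food web: an edge (x, y) means "x eats y"; no link is returned.
food_web = [("fox", "rabbit"), ("fox", "mouse"),
            ("owl", "rabbit"), ("owl", "mouse")]

r_friendship = reciprocity(friendship)
r_food_web = reciprocity(food_web)

# The two predators are not linked to each other, so they do not form a
# dense group, yet they interact with exactly the same group of prey.
same_prey = successors(food_web, "fox") == successors(food_web, "owl")

print(r_friendship, r_food_web, same_prey)
```

In the toy food web the two predators share no link at all, so a density-based definition would not group them, although they behave identically towards the prey group.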
The main challenge here is how to integrate the semantics carried by the link property into the graph clustering process. Taking this knowledge into account can guide the process towards appropriate clustering results. In this paper, the knowledge is the nature of the relationship between entities. We mainly focus on the non-reciprocal relationship between the actors represented by the graph nodes.
In this paper, a community is defined as a set of homogeneous nodes that share common properties. The homogeneity is based on knowledge about the modelled network phenomenon. In the case of a non-reciprocal relationship, the homogeneity of a group is defined by the interaction of the entities within this group with the entities within another group of the network. Therefore, a new network clustering algorithm based on Boolean factorization of the adjacency matrix is proposed. Matrix factorization consists in decomposing a matrix into a sum of products of other matrices. The well-known algorithm GRECOND (Greedy Concept on Demand) [5], designed for Boolean matrix factorization, is adapted for this task. This algorithm has three advantages. First, it has polynomial time complexity. Second, it preserves the relation between rows (objects) and columns (attributes), as it is based on FCA (Formal Concept Analysis) theory [6]. Finally, because it follows a greedy strategy, it is easy to apply: it stops its execution as soon as the introduced constraints are satisfied. After the extraction phase, the resulting Boolean factors are transformed into communities; the overlap between them is pruned and the isolated entities are migrated to other communities.
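The greedy idea can be sketched as follows. This is a simplified illustration in the spirit of a greedy, concept-based Boolean factorization, not the authors' adapted GRECOND: the helper names (`down`, `up`, `greedy_factors`, `rebuild`), the stopping rule (cover every 1-entry) and the toy adjacency matrix are all assumptions made for the example. Rows are read as source nodes and columns as target nodes.

```python
def down(matrix, cols):
    """Rows that have a 1 in every column of `cols`."""
    return {i for i, row in enumerate(matrix) if all(row[j] for j in cols)}


def up(matrix, rows):
    """Columns that have a 1 in every row of `rows`."""
    return {j for j in range(len(matrix[0])) if all(matrix[i][j] for i in rows)}


def greedy_factors(matrix):
    """Greedily pick (sources, targets) rectangles until all 1s are covered."""
    n_cols = len(matrix[0])
    uncovered = {(i, j) for i, row in enumerate(matrix)
                 for j, v in enumerate(row) if v}
    factors = []
    while uncovered:
        extent, intent, best = set(), set(), 0
        improved = True
        while improved:
            improved, candidate = False, None
            for j in range(n_cols):
                if j in intent:
                    continue
                rows = down(matrix, intent | {j})  # sources sharing the targets
                cols = up(matrix, rows)            # closure: all shared targets
                gain = sum(1 for i in rows for c in cols if (i, c) in uncovered)
                if gain > best:                    # keep only strict improvements
                    best, candidate, improved = gain, (rows, cols), True
            if improved:
                extent, intent = candidate
        factors.append((frozenset(extent), frozenset(intent)))
        uncovered -= {(i, j) for i in extent for j in intent}
    return factors


def rebuild(factors, n_rows, n_cols):
    """Boolean sum (OR) of the rectangles described by the factors."""
    return [[int(any(i in e and j in t for e, t in factors))
             for j in range(n_cols)] for i in range(n_rows)]


# Hypothetical food web: nodes 0 and 1 eat 2 and 3; nodes 2 and 3 eat 4.
adjacency = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

factors = greedy_factors(adjacency)
for sources, targets in factors:
    print(sorted(sources), "->", sorted(targets))
```

Each factor pairs a source group with a target group, which is the relationship the text says the factorization preserves. A stricter or looser stopping constraint (for example, a maximum number of factors) would replace the `while uncovered` condition.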
The main contributions of this article are summarized as follows:
1. Showing that, by taking into account the semantics or nature of the relationship represented by the link direction in the graph, other types of communities (different from the classical definition as a set of densely connected nodes) can emerge from directed networks.
2. Proposing a new approach based on Boolean matrix factorization to cluster the adjacency matrix of directed networks. The link direction (domain knowledge), whether symmetric or not, is expressed by representing formal concepts in the form of Boolean factors. This helps avoid the loss of link semantics that occurs in many related works.
3. As no clustering approach focusing on the non-symmetric relationship between actors in the network was found in the literature, a new measure called NR-modularity (for Non-Reciprocal modularity) is proposed for evaluating the degree of homogeneity of the extracted communities based on the non-symmetric relationship.
4. Finally, we propose an adaptation of the approach designed for the non-reciprocal relationship between entities, so that it can cluster networks in which the relationship is reciprocal.
The next section describes the problem of graph clustering through two example networks and the results of some existing algorithms applied to these networks. The following section focuses on the proposed approach. In the fourth section, the experimental results are presented.
Section snippets
Problem statement and related works
The first part of this section focuses on the problem of link semantics in the network and its use in the clustering process; the second part presents some methods of community detection in directed networks.
Method and algorithms
This section first describes Boolean matrix factorization (BMF) based on Formal Concept Analysis (FCA) [6], as Boolean factors are closely related to formal concepts. It then presents the use of Boolean factors in community detection.
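The link between formal concepts and Boolean factors can be illustrated with a small sketch. It assumes the usual FCA reading in which a formal concept is a pair (extent, intent) closed under the two derivation operators, and it turns a given list of concepts into two factor matrices whose Boolean product is compared with the original matrix. The matrix, the two concepts and all function names are hypothetical choices for this example and are not taken from the paper.

```python
def down(matrix, cols):
    """Rows (source nodes) that have a 1 in every column of `cols`."""
    return {i for i, row in enumerate(matrix) if all(row[j] for j in cols)}


def up(matrix, rows):
    """Columns (target nodes) that have a 1 in every row of `rows`."""
    return {j for j in range(len(matrix[0])) if all(matrix[i][j] for i in rows)}


def is_formal_concept(matrix, extent, intent):
    """A pair is a formal concept when each side determines the other."""
    return up(matrix, extent) == set(intent) and down(matrix, intent) == set(extent)


def factor_matrices(concepts, n_rows, n_cols):
    """Column l of A marks the extent, row l of B marks the intent of concept l."""
    a = [[int(i in extent) for extent, _ in concepts] for i in range(n_rows)]
    b = [[int(j in intent) for j in range(n_cols)] for _, intent in concepts]
    return a, b


def boolean_product(a, b):
    """(A o B)[i][j] = OR over l of (A[i][l] AND B[l][j])."""
    return [[int(any(x and y for x, y in zip(row, col))) for col in zip(*b)]
            for row in a]


# Hypothetical directed network: rows are sources, columns are targets.
adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

# Two overlapping concepts: entries (0, 3) and (1, 3) belong to both.
concepts = [({0, 1}, {2, 3}), ({0, 1, 2}, {3})]

all_concepts = all(is_formal_concept(adjacency, e, t) for e, t in concepts)
a, b = factor_matrices(concepts, 4, 4)
product = boolean_product(a, b)
print(all_concepts, product == adjacency)
```

Because the product uses OR rather than ordinary addition, an entry covered by several concepts still evaluates to 1, which is why overlapping factors can reproduce a 0/1 adjacency matrix exactly.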
Experiments and results
The approach presented in this paper has been tested on several datasets. The experiments were conducted on a Windows 10 personal computer with a 2 GHz CPU and 4 GB of RAM, using the R software [28] and its package igraph [29]. Table 1 summarizes these data. mayaan is freely available on the internet. It is a food web dataset collected in Little Rock Lake, Wisconsin, in the United States of America. Nodes in
Conclusion
The problem addressed in this work was to extract communities from directed networks by taking into account the semantics associated with the edge direction. To do this, we proposed a Boolean factor based approach. The main advantage of Boolean factors is that they preserve the relationship between two groups of nodes (the source group and the target group). The community candidates are defined from these Boolean factors; the overlap between these candidates has been pruned and the residual nodes
CRediT authorship contribution statement
Norbert Tsopze: Conceptualization, Data curation, Formal analysis, Methodology, Resources, Software, Supervision, Visualization, Writing - original draft, Writing - review & editing. Félicité Gamgne Domgue: Investigation, Validation, Software, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (29)
Community detection in graphs. Physics Reports (2010)
et al. Community detection in complex networks by density-based clustering. Physica A: Statistical Mechanics and its Applications (2013)
et al. Clustering and community detection in directed networks: A survey. Physics Reports (2013)
et al. Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer System (2010)
et al. Community detection based on modularity and k-plexes. Information Sciences (2020)
et al. Finding communities in directed networks by pagerank random walk induced network embedding. Physica A: Statistical Mechanics and its Applications (2010)
et al. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. Computer Networks (2000)
Directed LPA: Propagating labels in directed networks. Physics Letters A (2019)
et al. Factorizing Boolean matrices using formal concepts and iterative usage of essential entries. Information Sciences (2019)
et al. Toward quality assessment of Boolean matrix factorizations. Information Sciences (2018)
On Modularity Clustering. IEEE Transactions on Knowledge and Data Engineering
Formal Concept Analysis: Mathematical Foundations
Detecting Community Kernels in Large Social Networks
Patterns of link reciprocity in directed networks. Physical Review Letters
Cited by (5)
Graph clustering using triangle-aware measures in large networks
2022, Information Sciences
Citation Excerpt: Community detection is a fundamental problem in network analysis [38], and a large amount of work in applied mathematics, computer science and statistical physics has been done to abstract the communities in complex networks. A community is typically defined as a set of densely connected vertices in a network [6,8,11,15,23,27,30,33,34], where vertices in the same community always share common properties [31]. And community detection aims to identify all the communities in a network.
Social network analysis of Twitter interactions: a directed multilayer network approach
2023, Social Network Analysis and Mining
A Metaheuristic-Based Modularity Optimization Algorithm Driven by Edge Directionality for Directed Networks
2023, IEEE Transactions on Network Science and Engineering
Knowledge-Guidance Based Directed Graph Clustering
2023, ICNC-FSKD 2023 - 2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery
Markov Chains Based on Random Generalized 1-Flipper Operations for Connected Regular Multi-digraphs
2023, Journal of Donghua University (English Edition)