Elsevier

Future Generation Computer Systems

Volume 124, November 2021, Pages 467-479

Detecting covert communities in multi-layer networks: A network embedding approach

https://doi.org/10.1016/j.future.2021.06.027

Highlights

  • Used a Log-BiLinear embedding to find communities in multi-layer criminal networks.

  • A goal-directed approach optimizes both network embedding and graph clustering mutually.

  • Block removal network disruption is investigated to simulate a police raid arrest.

  • The efficiency is evaluated using different levels of simulated cover-up strategies.

  • The learned representations are evaluated for the prediction of missing links.

Abstract

Graph clustering is a fundamental task to discover community ties in multi-layer networks. In this paper, we propose a network embedding technique to find covert communities in multi-layer dark networks using a Log-BiLinear (LBL) approach. Recent works on graph clustering using network embedding have focused on new ways of learning representations of nodes and relations, upon which a classic clustering method is then used to identify the communities (clusters). However, these embedding approaches do not yield accurate communities from the clustering task. Hence, we address this issue with a sequence-based network embedding technique on a multi-layer network. Our proposal learns structural representations of nodes and relations simultaneously by capturing the position of a given node within a set of neighboring nodes (the anchor-set), and the type of connections between nodes in the anchor-set. To find the clusters (communities), clustering centroids are also learned as the representations of nodes and relations are extracted. Our solution is well-suited to detecting covert communities, such as terrorist networks. In our experiments on three real-world terrorist datasets and one synthetic network, our approach is found to deliver a higher level of accuracy in detecting covert communities compared with six baseline methods.

Introduction

Detecting community structures in social networks plays an important part in understanding the functions of complex networks. A common technique to detect communities is graph clustering, where nodes in a graph (or network) are partitioned into disjoint groups. Many graph clustering methods exist, including hierarchical clustering, cut-based partitioning, and the Girvan–Newman algorithm [1]. A major challenge with using graph clustering techniques is translating the structural information of networks into a suitable set of features for machine learning [2]. To overcome this issue, recent studies have turned to “network embedding” (also known as representation learning) techniques to extract features that show improvements in classification and prediction results [2], [3], [4], [5]. Network embedding allows us to transform the rich structural information of networks into a low-dimensional vector space that can be subsequently used for different machine learning tasks, such as node classification, node clustering, regression, and link prediction [6], [7], [8].

In many studies, network embedding techniques have mostly focused on the existence of edges between nodes and ignored the different edge types in the network [3]. This is akin to recognizing the existence of a transportation route (edge) between two towns (nodes) but ignoring the distance or route type (e.g., road or railway). In detecting communities in social networks, this focus could lead to information loss and may prevent important information from being discovered [9], [10]. In reality, social interactions among communities comprise multiple types of relationships, a feature that recent works have increasingly recognized by steering towards analyzing multi-layer networks with nodes connected by different edge types (i.e., relationships) [9], [10]. Multi-layer networks are known by different names depending on the context. In this paper, we follow [11], where “multi-layer network”, “multi-graph”, “multiplex network”, “multi-relational network”, “multi-slice network”, and “multi-level network” are all treated as “multi-layer networks” on the basis of their similar network structure.

Current works that use network embedding to detect communities typically involve a two-step approach [12], [13]. The first step is network representation learning, and the second is the application of a clustering algorithm (e.g., classical k-means) to find communities from the learned features. The drawback of this two-step approach is that these representations may not be the best fit for preserving social communities [12]. Therefore, a goal-directed training framework is required to adjust the learned representations according to the clustering result and so find more accurate communities.
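The two-step pipeline described above can be sketched as follows. This is a minimal illustration, not the method of any cited work: the toy `embeddings` list stands in for the output of step one, and the farthest-point initialization is an assumption chosen only to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Step 1 (assumed already done): node embeddings from some embedding model.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
# Step 2: cluster the fixed embeddings; the embeddings never see this result.
labels, centroids = kmeans(embeddings, k=2)
```

Because step two cannot influence step one, the embeddings are never adjusted to make the clusters tighter, which is the limitation a goal-directed framework is meant to address.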

Research in multi-layer network analysis has led to many new solutions in various application domains. However, because researchers may not have access to network data in some domains, not every domain has received the same level of attention. One such domain is dark or covert social networks, for which very little data are available [3].

Analyzing dark or covert networks involving illicit activities, such as drug or arms trafficking and terrorist activities [14], is especially problematic. Members in these networks tend to actively conceal their actual network information by engaging in a range of “diversion” or “cover-up” activities to reduce the chances of being noticed by the authorities [14], [15]. They would attempt to remain anonymous by exhibiting different ties (e.g., friendship and kinship) and engaging in activities that distract or divert attention from their real intent, e.g., distributing illegal drugs [14].

Current state-of-the-art studies for community detection in multi-layer networks, such as [16], [17], [18], [19], have not been successful in discovering the actual organization [9] when applied to dark networks. In some cases, it was noted that the proposed approaches produced distorted insights into the network topology and its embedded dynamics [10]. According to [20], finding covert communities within a dark multi-layer network requires enforcement agencies to understand individual behaviors and qualities, psychological predispositions, and network effects. To do this effectively, the algorithm must accurately capture the position of actors within the covert network, as this is the salient feature from which the above information can be derived. Without accurate insights into covert network activities, it would remain challenging for law enforcement to disrupt dark network operations. The better law enforcement agencies can understand the interactions and relationships within covert communities, the more efficient and effective they would be in disrupting criminal activities and halting the cascade of influence from one community to another in the network [21], [22].

Through a systematic literature review, Pourhabibi et al. [3] found that network embedding has not been applied to discover covert communities in multi-layer networks, specifically those involving crime and terror-related activities. We contend that network embedding has the potential to find all covert communities within a dark multi-layer network, including the position of actors within a covert network, which could reveal not only their standing but also how they communicate with other members of the network, as Robins [20] has uncovered. To capture such structural characteristics, this study uses a Log-BiLinear (LBL) model, a type of network embedding that can be applied to sequences of nodes randomly sampled from the neighborhood of each node in the network. LBL is able to preserve the structural information related to different types of relations in multi-layer networks using relation-specific matrices [23]. The context-specific matrices also enable LBL to preserve the structural position of the actors within the network [23].
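To make the idea of node sequences sampled from a multi-layer neighborhood concrete, the sketch below draws a uniform random walk that records the relation type of each step. The sampling rule, the `layers` dictionary layout, and the function name `sample_sequence` are illustrative assumptions; the excerpt does not specify the paper's actual sampling procedure.

```python
import random

def sample_sequence(layers, start, length, rng):
    """Random walk that records (relation, node) at every step.

    layers: dict mapping relation name -> dict mapping node -> neighbor list.
    The first entry has relation None because no edge led to the start node.
    """
    seq = [(None, start)]
    node = start
    for _ in range(length):
        # Every (relation, neighbor) pair reachable from `node` in any layer.
        options = [(rel, nb)
                   for rel, adj in layers.items()
                   for nb in adj.get(node, [])]
        if not options:
            break  # dead end: the node has no neighbors in any layer
        rel, node = rng.choice(options)
        seq.append((rel, node))
    return seq

# Hypothetical two-layer network with undirected adjacency lists.
layers = {
    "kinship": {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
    "communication": {"a": ["c"], "c": ["a", "d"], "d": ["c"]},
}
walk = sample_sequence(layers, "a", 5, random.Random(0))
```

Keeping the relation label alongside each node is what would let a downstream model apply a relation-specific transformation per step.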

Given a multi-layer network with $L \in \mathbb{R}^{+}$ different types of relationships, where each type of relationship can be represented as a sub-graph $G_i = (V, E_i, L_i)$ and $G = \bigcup_{i=1}^{L} G_i$ [24], our task is to learn a $d$-dimensional vector representation $z_i \in Z$, with $Z \in \mathbb{R}^{v \times d}$, for each node $v_i \in V$, and partition the nodes within the network into $k$ disjoint communities (clusters) $c_k \in C$. Following Lu et al. [25], we argue that nodes within the same community should: (i) be close to each other and distant from other communities, and (ii) have similar attribute values (e.g., similar structural representations).
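The formulation above maps onto simple data structures. The following sketch assumes toy data and hypothetical helper names (`union_graph`, `init_embeddings`, `is_partition`); it only illustrates the shapes involved: a shared node set, one edge set per layer, one embedding row per node, and a partition into disjoint communities.

```python
import random

# Hypothetical toy network: a shared node set V and one edge set per layer.
V = ["a", "b", "c", "d"]
layers = {
    "kinship": {("a", "b"), ("b", "c")},
    "communication": {("a", "b"), ("c", "d")},
}

def union_graph(layers):
    """Union of the layer sub-graphs: every edge, tagged with its layer."""
    return {(u, v, rel) for rel, edges in layers.items() for (u, v) in edges}

def init_embeddings(nodes, d, seed=0):
    """One d-dimensional vector per node, initialized to small random values."""
    rng = random.Random(seed)
    return {v: [rng.uniform(-0.1, 0.1) for _ in range(d)] for v in nodes}

def is_partition(communities, nodes):
    """True if the communities are pairwise disjoint and cover every node."""
    seen = [v for c in communities for v in c]
    return len(seen) == len(set(seen)) and set(seen) == set(nodes)

G = union_graph(layers)
Z = init_embeddings(V, d=8)
```

Note that the edge ("a", "b") appears twice in `G`, once per layer, since the two relation types are kept distinct rather than collapsed into a single edge.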

On this formulation, we make the following contributions:

  • We apply network embedding to community detection in criminal networks to learn representations of nodes in a multi-layer network by incorporating (i) nodes' positional information relative to neighboring nodes, and (ii) the types of relations that connect nodes.

  • We propose a goal-directed approach for multi-layer network clustering. Our approach optimizes both network embedding and graph clustering to benefit from these two components mutually.

  • Compared to current state-of-the-art network embedding models, our approach shows a significant increase in accuracy in the majority of cases when applied to finding covert communities in real-world dark multi-layer network datasets as well as synthetic data (see Section 4).

  • We investigate the problem of network disruption within the detected communities in a police raid scenario.

  • We analyze the efficiency of the proposed approach by simulating different levels of cover-up strategies taken by criminals to evade detection.

  • We analyze the sensitivity of the proposed approach when different numbers of communities are chosen.

  • We evaluate the accuracy of learned representations in missing link prediction.
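As an illustration of the police raid scenario in the contributions above, the sketch below removes an entire detected community (a block) from every layer and measures how fragmented the remaining network is. The excerpt does not state the disruption measure used in the paper, so the size of the largest connected component is an assumption here, as are the toy data and function names.

```python
from collections import deque

def remove_block(layers, block):
    """Drop every node in `block`, and its edges, from every layer."""
    return {
        rel: {u: [v for v in nbrs if v not in block]
              for u, nbrs in adj.items() if u not in block}
        for rel, adj in layers.items()
    }

def largest_component(layers):
    """Size of the largest connected component, ignoring edge types."""
    merged = {}
    for adj in layers.values():
        for u, nbrs in adj.items():
            merged.setdefault(u, set()).update(nbrs)
            for v in nbrs:
                merged.setdefault(v, set()).add(u)
    best, seen = 0, set()
    for start in merged:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            size += 1
            for v in merged[u] - seen:
                seen.add(v)
                queue.append(v)
        best = max(best, size)
    return best

# Hypothetical network: node "c" bridges the two layers.
layers = {
    "kinship": {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
    "finance": {"c": ["d"], "d": ["c", "e"], "e": ["d"]},
}
before = largest_component(layers)
after = largest_component(remove_block(layers, {"c"}))
```

Comparing `before` and `after` gives a rough sense of how much a single raid on one community fragments the rest of the network.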

The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 describes the main components of the proposed approach to extract representations from a multi-layer network and to find communities while learning the representations. Section 4 presents the empirical results of our experiments by applying the proposed approach on three real-world terrorist networks and one synthetic network and comparing its performance against six baseline methods. Section 5 concludes the study with suggestions for further research.

Section snippets

Related works

Embedding networks into a low dimensional space is a way of simplifying the graph information by associating each node with a point in space [26]. Most network embedding methods preserve the positional information of each node among its neighboring nodes when simplifying graph information. Positional node information is an important feature in detecting covertness as the nodes of interest may be associated with a specific order in the network [26]. Sequence-based network embedding methods,

Proposed approach

Our proposed solution, as outlined in Algorithm 1, discovers covert communities while learning node representations. It consists of two major mutually beneficial components: (a) network embedding, and (b) self-clustering. We describe these components in more detail below.
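One common way to couple the two components is to add a clustering penalty to the embedding loss, so that node vectors are pulled towards their nearest centroid while both are trained. The sketch below shows that generic form only. Algorithm 1 is not included in this excerpt, so the penalty (distance to the nearest centroid), the weight `gamma`, and the function names are assumptions rather than the paper's objective.

```python
import math

def clustering_cost(Z, centroids):
    """Sum over nodes of the Euclidean distance to the nearest centroid."""
    return sum(min(math.dist(z, mu) for mu in centroids) for z in Z)

def joint_loss(embedding_loss, Z, centroids, gamma):
    """Embedding loss plus a weighted clustering penalty (assumed form)."""
    return embedding_loss + gamma * clustering_cost(Z, centroids)

# Toy values: two node vectors, one centroid, and a placeholder embedding loss.
Z = [[0.0, 0.0], [3.0, 4.0]]
centroids = [[0.0, 0.0]]
loss = joint_loss(1.0, Z, centroids, gamma=0.5)
```

Minimizing such a loss with respect to both `Z` and `centroids` is what makes the two components mutually beneficial: the embeddings become easier to cluster, and the centroids track the embeddings as they change.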

Experiments

The problem we are addressing, i.e., finding covert communities such as terrorist networks, has very few real-world datasets that come with a precise ground-truth. Hence, evaluating the performance of our algorithm is a challenge. To ascertain the effectiveness of our solution, we consider two things: (a) comparing the results generated by our algorithm with those produced by state-of-the-art baseline algorithms, and (b) selecting datasets that match the application problem we want to address.

Conclusion

Detecting communities within criminal networks is an important step for law enforcement to disrupt criminal organizations. Law enforcement can benefit from understanding the inter-relationships between communities and the operations of those organizations. Furthermore, disrupting one community of the organization can cause a cascading disruption to other communities. In this paper, we propose an unsupervised Log-BiLinear model to jointly perform graph clustering and learn graph embedding on

CRediT authorship contribution statement

Tahereh Pourhabibi: Methodology development (proposing the main idea and developing the algorithm), Resources (provision of study materials and experimental datasets), Investigation (experimental experiences), Drafting original manuscript. Kok-Leong Ong: Methodology validation and reliability analysis (validating the methodology), Supervision of algorithm development, Review, redrafting and editing of manuscript. Yee Ling Boo: Methodology validation and reliability analysis (validating the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (77)

  • BengioY. et al.

    Representation learning: A review and new perspectives

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • SalimA. et al.

    Design of multi-view graph embedding using multiple kernel learning

    Eng. Appl. Artif. Intell.

    (2020)
  • LiT. et al.

    Deep dynamic network embedding for link prediction

    IEEE Access

    (2018)
  • DomenicoM.D. et al.

    Identifying modular flows on multi-layer networks reveals highly overlapping organization in interconnected systems

    Phys. Rev. X

    (2015)
  • RosvallM. et al.

    Memory in network flows and its effects on spreading dynamics and community detection

    Nat. Commun.

    (2014)
  • KiveläM. et al.

    Multilayer networks

    J. Complex Netw.

    (2014)
  • WangC. et al.

    Attributed graph clustering: A deep attentional embedding approach

  • RozemberczkiB. et al.

    Gemsec: Graph embedding with self-clustering

  • EricksonB.

    Secret societies and social structure

    Soc. Forces

    (1981)
  • WarnkeS.

    Partial Information Community Detection in a Multilayer Network

    (2016)
  • DickisonM. et al.

    Multilayer social networks

    (2016)
  • BerlingerioM. et al.

    Finding redundant and complementary communities in multidimensional networks

  • RocklinM. et al.

    Latent clustering on graphs with multiple edge types

  • BerlingerioM. et al.

    Abacus: Frequent pattern mining-based community discovery in multidimensional networks

    Data Min. Knowl. Discov.

    (2013)
  • RobinsG.

    Understanding individual behaviors within covert networks: The interplay of individual qualities, psychological predispositions, and network effects

    Trends Organ. Crime

    (2009)
  • SaxenaA. et al.

    Discovering and leveraging communities in dark multi-layered networks for network disruption

  • LiuQ. et al.

    Multi-behavioral sequential prediction with recurrent log-bilinear model

    IEEE Trans. Knowl. Data Eng.

    (2017)
  • PourhabibiT. et al.

    Behavioral analysis of users for spammer detection in a multiplex social network

  • LuM. et al.

    Hete_mese: Multi-dimensional community detection algorithm based on multiplex network extraction and seed expansion for heterogeneous information networks

    IEEE Access

    (2018)
  • RallapalliS. et al.

    Sense: Semantically enhanced node sequence embedding

    (2019)
  • RozemberczkiB. et al.

    Fast sequence-based embedding with diffusion graphs

  • PerozziB. et al.

    Deepwalk: Online learning of social representations

  • GroverA. et al.

    Node2vec: Scalable feature learning for networks

  • RibeiroL. et al.

    Struc2vec: Learning node representations from structural identity

  • TangJ. et al.

    Line: Large-scale information network embedding

  • CavallariS. et al.

    Learning community embedding with community detection and node embedding on graphs

  • HongmingZ. et al.

    Scalable multiplex network embedding

  • LiuW. et al.

    Principled multilayer network embedding


    Tahereh Pourhabibi is a Ph.D. candidate in the School of Accounting, Information Systems and Supply Chain, RMIT University, Melbourne, Australia. She received her Master of Science in Artificial Intelligence from Al-Zahra University, Tehran, Iran. Her research interests include machine learning, data mining, anomaly detection, and their application in suspicious activity detection and fraud detection.

    Kok-Leong Ong is an Associate Professor at the Centre for Data Analytics and Cognition, La Trobe University. He received his Ph.D. in 2004 and B. A. Sc. (Hons) in 1999 from the Nanyang Technological University, Singapore. His research interests include data mining and analytics, machine learning, and AI, and his work has been supported by over $1.46m of grants to date. He has published over 80 peer-reviewed papers and has served on over 60 Program Committees.

    Yee Ling Boo received her Ph.D. in Information Technology from Monash University, Australia. She is currently a senior lecturer in the School of Accounting, Information Systems and Supply Chain, RMIT University, Melbourne, Australia. Her research interests include Data Mining, Brain Inspired Computing, Cognitive Analytics and their applications in business, education and health. Before pursuing her Ph.D. degree, she was a software engineer in Malaysia. Her research works have appeared in reputable journals and conferences.

    Booi Kam is a Professor in the School of Accounting, Information Systems and Supply Chain, RMIT University. His current research interests are in the areas of strategic digital supply chain operations and supply chain relationships. A recipient of an Emerald Literati Network Award for Excellence, Booi is regularly invited by universities in China, England, France, Korea, and Taiwan to give public lectures and teach in their degree programs. Booi holds a Ph.D. from the University of California at Los Angeles. He co-authored Consumer Logistics, a book published by Edward Elgar Publishing.
