Efficient pattern matching on big uncertain graphs

doi:10.1016/j.ins.2015.12.034

Information Sciences

Volume 339, 20 April 2016, Pages 369-394

https://doi.org/10.1016/j.ins.2015.12.034

Abstract

A significant amount of research has been devoted to seeking efficient solutions to the problem of pattern matching over graphs. This interest is largely due to the many applications that require such efficient solutions, including protein complex prediction, social network analysis, and structural pattern recognition. However, in many real applications, the graph data are often noisy, incomplete, and inaccurate. In other words, there exist many uncertain graphs. Therefore, in this paper, we study pattern matching in the context of large uncertain graphs. Specifically, we want to retrieve all qualified matches of a query pattern in the uncertain graph. Though pattern matching over uncertain graphs is NP-hard, we employ a filtering-and-verification framework to speed up the search. In the filtering phase, we propose a probabilistic matching tree (PM-tree) built from match cuts obtained by a cut selection process. Based on the PM-tree, we devise a collective pruning strategy to prune a large number of unqualified matches. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed algorithms. Finally, we show how our solution can be applied to querying knowledge graphs.

Introduction

Graphs constitute a generic data model with wide applicability in numerous domains, such as social networks, biological networks, and the World Wide Web. Indeed, it is often less complex for users to shoehorn semi-structured or sparse data into a vertex-edge-vertex data model than into a relational data model. Furthermore, it is also most natural for users to reason about an increasing number of popular datasets, such as the underlying networks of Twitter, Facebook, or LinkedIn, within a graph paradigm. Various types of queries over graph data have been investigated, such as subgraph search queries [62], [69], [73], shortest-path queries [6], [23], reachability queries [34], [57], and pattern matching queries [18], [42]. Reachability, or shortest-path, queries focus on the relation between two vertices in a graph. On the other hand, pattern matching queries are concerned with the connectivity among sets of vertices. Thus, a pattern matching query is more informative than a simple shortest-path, or reachability, query. Furthermore, a pattern matching query can be answered in polynomial time [20], while processing a subgraph query is #P-complete [25]. Therefore, the database community has devoted considerable effort to the study of the pattern matching query problem [18], [19], [20], [42], [74].

Interestingly, all of the aforementioned studies focus exclusively on applications where the edges of the graph are deterministic. Yet, in most applications, there is inherent uncertainty about the presence of edges due to often inevitable noise, incompleteness, and delays during data collection. For example, in a protein–protein interaction (PPI) network, the interactions obtained from experiments may include non-existent protein interactions or, on the contrary, miss existing ones [10], [28], [52], [54]; in social networks, graphs are often used to represent communities of users, where probabilities can be assigned to edges to model the degree of influence among users [1], [40], [46]; in communication or road networks, edge probabilities are used to quantify the connectivity between nodes, or to take traffic uncertainty into consideration [9], [30]; finally, the uncertainty in a Resource Description Framework (RDF) graph is caused by data errors or semantic extraction inaccuracy in the data integration process [13], [31], [39].

Based on the above discussion, in this paper, we study pattern matching queries over large uncertain graphs. In the following, we describe the problem of probabilistic graph pattern matching and outline our contributions.

We first introduce graph pattern matching on deterministic graphs, and then proceed to discuss uncertain graph pattern matching.

Given a graph pattern query q with n vertices $\{v_{1}, \dots, v_{n}\}$ and a deterministic graph g^c, a deterministic pattern matching query retrieves all matches of q in g^c. For a given q and an n-vertex set $m = \{u_{1}, \dots, u_{n}\}$ in g^c, m is a match for q in g^c if (1) the n vertices $\{u_{1}, \dots, u_{n}\}$ in g^c have the same labels as the corresponding vertices $\{v_{1}, \dots, v_{n}\}$ in q; and (2) for any two adjacent vertices v_i and v_j in q, the shortest-path distance between the two corresponding vertices u_i and u_j in g^c is no larger than a given threshold γ [19], [74].
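The two matching conditions can be checked directly with a bounded breadth-first search. The following is a minimal sketch, assuming unit edge weights (as in Example 1) and a plain adjacency-list graph; the function names `bfs_distances` and `is_match`, and the toy path graph, are hypothetical and do not correspond to Fig. 1.

```python
from collections import deque


def bfs_distances(adj, source, limit):
    """Hop distances from `source`, exploring no farther than `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue  # neighbours of u would exceed the distance threshold
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_match(adj, labels, q_labels, q_edges, m, gamma):
    """Check whether m is a match; m[i] is the data vertex for query vertex i."""
    # Condition (1): labels agree position by position.
    if any(labels[u] != q_labels[i] for i, u in enumerate(m)):
        return False
    # Condition (2): each query edge maps to a pair within distance gamma.
    for i, j in q_edges:
        if m[j] not in bfs_distances(adj, m[i], gamma):
            return False
    return True


# Hypothetical toy data (not the graph of Fig. 1): the path 1-2-3-4-5.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = {1: "A", 2: "B", 3: "C", 4: "B", 5: "A"}
q_labels = ["A", "B", "C"]          # query vertices v_1, v_2, v_3
q_edges = [(0, 1), (1, 2), (0, 2)]  # a triangle pattern

print(is_match(adj, labels, q_labels, q_edges, (1, 2, 3), gamma=2))  # True
print(is_match(adj, labels, q_labels, q_edges, (5, 2, 3), gamma=2))  # False
```

In the second call the labels agree, but vertices 5 and 2 are three hops apart, so the distance constraint fails for γ = 2.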

Example 1

Consider the pattern query q and the deterministic graph ug^c in Fig. 1. For this example, the edge probabilities can be ignored. Let the weight of each edge be 1 and the distance constraint γ be 3. Each of the vertex sets {2, 5, 7} and {5, 6, 7} forms a match for q in ug^c, since its vertex labels are the same as those of q, namely, {A, B, C}, and the shortest-path distance between each pair of its vertices does not exceed 3. Though the vertex set {1, 5, 7} also has labels {A, B, C}, it is not a match because the shortest-path distance between vertices 1 and 7 is 4, which violates the distance constraint.

The semantics of pattern matching queries have many real life applications [19], [20], [74]. For example, suppose that Fig. 1 is a graph model of LinkedIn, where vertices represent active users and edges indicate the friendship relations among users. Job attributes are used to label the vertices, e.g., {A, B, C} = {Scientist, Professor, Student}. The pattern matching query q looks for relations among scientists, professors and students. Finding such patterns may help social science researchers discover close connections (due to the distance constraint) between a successful scientist and his/her circle of students or professors.

For the uncertain graph pattern matching problem, we focus here on threshold-based probabilistic pattern matching (T-PM) over large uncertain graphs, where vertices are deterministic and edges are uncertain. Specifically, let g be an uncertain graph, let q be a graph pattern query, and let ϵ be a probability threshold. A T-PM query retrieves all vertex sets $m = \{u_{1}, \dots, u_{n}\}$ in g (i.e., n vertices in g), such that the pattern matching probability (PMP) of m in g is at least ϵ. We will formally define PMP later.

We employ the possible world semantics [53], which has been widely used for modeling query processing over uncertain databases, to explain the semantics of PMP. A possible world graph (PWG) of an uncertain graph is a possible instance of the uncertain graph. It contains all of the vertices and a subset of the edges of the uncertain graph, and its weight is the product of the probabilities associated with the edges, i.e., Pr(e) for each edge e that is present in the PWG and 1 − Pr(e) for each edge e that is absent. Then, for a graph pattern query q with n vertices $\{v_{1}, \dots, v_{n}\}$ and an n-vertex set $m = \{u_{1}, \dots, u_{n}\}$ in an uncertain graph g, the probability of m being a match for q is the sum of the weights of those PWGs g′ of g in which m is a match for q. For m to be a match for q in g′, it must satisfy the two conditions of deterministic graph pattern matching defined above.
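As a compact restatement of the possible world semantics, and assuming that edges exist independently of one another, the weight of a PWG $g' = (V, E')$ and the PMP of a vertex set m can be written as follows. The symbols $\Pr(g \Rightarrow g')$, $PWG(g)$ and $\mathrm{PMP}(m)$ are notation introduced for this sketch only.

$$
\Pr(g \Rightarrow g') = \prod_{e \in E'} \Pr(e) \prod_{e \in E \setminus E'} \bigl(1 - \Pr(e)\bigr),
\qquad
\mathrm{PMP}(m) = \sum_{\substack{g' \in PWG(g) \\ m \text{ is a match for } q \text{ in } g'}} \Pr(g \Rightarrow g').
$$

Since every edge is either present or absent in each PWG, an uncertain graph with $|E|$ edges has $2^{|E|}$ PWGs, and their weights sum to 1.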

Example 2

Fig. 2 shows two of the PWGs of the uncertain graph ug of Fig. 1 and their respective weights. There are altogether $2^{9} = 512$ PWGs for ug, and the sum of all weights is 1. To decide if a vertex set $m = \{5, 6, 7\}$ is a match for q in the uncertain graph ug, we first find all of ug’s PWGs that contain m as a match for q. Again, recall that m is a match for q in g′ if (1) vertices in m and q have the same labels, and (2) each pair of corresponding vertices in m has a shortest-path distance of at most 3 (γ = 3). Here, the result includes both of the PWGs depicted in Fig. 2, as well as many others. Next, we sum the probabilities of all of these PWGs: $0.01248 + 0.009126 + \dots = 0.65$. If a threshold of 0.6 is used for the query, then m is a qualified match for q in the uncertain graph ug.

The above example gives a naive solution to T-PM query processing. We call it SCAN, as it needs to enumerate all PWGs of the uncertain graph, and to conduct a pattern matching between the query and each PWG. SCAN is very inefficient because the number of PWGs is exponential in the number of edges. Therefore, in this paper, we propose a filtering-and-verification method to reduce the search space.
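The naive enumeration can be sketched in a few lines. The code below is an illustration only, assuming independent edges, unit edge weights, and that the label condition has already been checked; the names `within` and `scan_pmp` and the three-vertex graph are hypothetical.

```python
from collections import deque
from itertools import product


def within(adj, s, t, gamma):
    """True if t can be reached from s in at most gamma hops (BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == gamma:
            continue  # do not expand beyond the distance threshold
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False


def scan_pmp(edge_prob, q_edges, m, gamma):
    """Exact PMP of m, obtained by enumerating all 2^|E| possible worlds."""
    edges = list(edge_prob)
    total = 0.0
    for present in product([False, True], repeat=len(edges)):
        weight = 1.0
        adj = {}
        for (u, v), keep in zip(edges, present):
            p = edge_prob[(u, v)]
            weight *= p if keep else 1.0 - p  # weight of this possible world
            if keep:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Add the world's weight if every query edge meets the distance bound.
        if all(within(adj, m[i], m[j], gamma) for i, j in q_edges):
            total += weight
    return total


# Hypothetical uncertain graph: a triangle a-b-c with probability 0.5 per edge.
edge_prob = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
q_edges = [(0, 1)]  # one query edge between v_1 and v_2
pmp = scan_pmp(edge_prob, q_edges, ("a", "b"), gamma=2)
print(pmp)  # 0.625
```

Here a and b are within two hops either when edge a-b exists (probability 0.5) or when it is absent and both other edges exist (0.5 × 0.25), which gives 0.625. The loop over `product` is what makes the approach exponential in the number of edges.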

Specifically, given a graph pattern query q and a large uncertain graph g, our solution performs T-PM query processing in three steps, namely structural pruning, probabilistic pruning, and verification. In the structural pruning step, we run q on a deterministic graph g^c that removes uncertainty from g, and get a match candidate set SC_q. In the probabilistic pruning step, we first obtain a tight upper bound for PMP via a pre-computed index, which is based on edge cuts of g^c. Next we refine the set of candidates in SC_q, by pruning those potential matches whose upper bound is smaller than the probability threshold. In the verification phase, we validate each remaining candidate match to determine the final answer set.
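The control flow of the last two steps can be summarised as a small skeleton. This is a sketch under stated assumptions rather than the actual implementation: the function `t_pm_query`, its parameters, and the numbers in the toy example are all hypothetical, and the upper bound and the PMP estimator are passed in as plain callables.

```python
def t_pm_query(candidates, upper_bound, estimate_pmp, eps):
    """Filter-and-verify skeleton for a T-PM query.

    candidates   -- matches that survived structural pruning (the set SC_q)
    upper_bound  -- callable returning an upper bound on a candidate's PMP
    estimate_pmp -- callable returning the exact or estimated PMP
    eps          -- probability threshold
    """
    # Probabilistic pruning: an upper bound below eps rules a candidate out.
    survivors = [m for m in candidates if upper_bound(m) >= eps]
    # Verification: only the survivors pay for the expensive PMP computation.
    return [m for m in survivors if estimate_pmp(m) >= eps]


# Hypothetical bounds and probabilities for three candidate matches.
bounds = {"m1": 0.9, "m2": 0.4, "m3": 0.7}
pmps = {"m1": 0.65, "m2": 0.3, "m3": 0.5}
verified = []


def pmp_fn(m):
    verified.append(m)  # record which candidates reach verification
    return pmps[m]


answers = t_pm_query(["m1", "m2", "m3"], bounds.get, pmp_fn, eps=0.6)
print(answers)   # ['m1']
print(verified)  # ['m1', 'm3']
```

Candidate m2 is discarded by its bound alone and never reaches verification, which is where the savings of the filtering phase come from.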

The following is a summary of the contributions we make with this paper.

• We give a general framework for answering pattern matching queries over large uncertain graphs.
• We calculate a very tight upper bound for removing a large number of false candidates. We also devise the “Collective Pruning” strategy to speed up the pruning process.
• We propose a lightweight index to avoid storing the exponential number of cuts, and devise a query cost model to maximize the pruning capability of the index with a small number of cuts.
• We propose an efficient hybrid sampling algorithm to rapidly validate the final query answers.
• We conduct extensive experiments to confirm the efficiency and effectiveness of our proposed approaches on real uncertain graph datasets.

Our earlier work [64] set the stage for the more in-depth study of the uncertain graph pattern matching problem found here. The extension includes the following new content. First, we include proofs for all of the theorems. Second, we previously used a probabilistic index consisting of edge cuts of the graph, and we found that the number of cuts was extremely large, which led to a very large index. Therefore, in this paper, we propose an optimal cut selection algorithm, so that the index has great pruning power and a very small size. Third, we design a basic sampling algorithm to verify the candidates, so as to avoid the hard problem of computing pattern matching probabilities exactly. To speed up the basic algorithm, we use a hybrid sampling approach based on unequal probability sampling techniques, which samples many possible worlds at once. Fourth, we show how to apply uncertain graph pattern matching to the problem of querying knowledge graphs. The experimental results show that the proposed approach is significantly better than state-of-the-art methods in terms of both efficiency and match quality.

The remainder of this paper is organized as follows. We formally define T-PM queries over uncertain graphs, and give the complexity of the problem in Section 2. In Section 3, we give an overview of our approach, while Section 4 details the algorithms for efficient probabilistic pruning and the derivation of the upper bounds of the PMP. Index construction and sampling-based verification algorithms are presented in Sections 5 and 6, respectively. We discuss the results of performance tests on real datasets in Section 7. Section 8 applies our techniques to querying knowledge graphs. Related work is presented in Section 9. Finally, Section 10 concludes the paper.

Section snippets

Problem definition

In this section, we define some necessary concepts and discuss the complexity of the graph matching problem.

Definition 1 Uncertain graph

An undirected deterministic graph g^c is denoted by (V, E, Σ, L), where V is a set of vertices, E is a set of edges (⊆ V × V), Σ is a set of labels, and L: V → Σ is a function that assigns labels to vertices. An uncertain graph is defined as $g = (g^{c}, Pr)$, where Pr: E → (0, 1] is a function that assigns existence probabilities to edges in E.

Definition 2 Possible world graph

A PWG $g' = (V', E', \Sigma', L')$ is an instantiation of an

Overview of our approach

Fig. 3 gives a high-level overview of our general framework for a pattern matching query q over an uncertain graph g. It consists of three phases, namely Structural pruning, Probabilistic pruning, and Verification. The first two phases belong to the filtering step, and the last one is the verification step. We briefly present each step in what follows.

Structural pruning. The idea of structural pruning is straightforward. For n vertices $m = \{u_{1}, \dots, u_{n}\}$ in g, if we remove all of the uncertainty in

Probabilistic pruning

As mentioned above, we first conduct structural pruning to obtain a set of qualified candidate matches of q in g. We then use probabilistic pruning techniques to further filter the remaining match set, SC_q.

The idea behind probabilistic pruning is to compute and use an upper bound for PMP. To facilitate this process, we propose an indexing structure, called probabilistic matching tree (PM-tree).
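To illustrate why edge cuts yield an upper bound, consider a simple bound that is not necessarily the tighter one derived from the PM-tree: if a set of edges C separates u_i from u_j in the deterministic graph, the two vertices can be within distance γ only if at least one edge of C exists. Assuming independent edges, that event has probability 1 − ∏_{e ∈ C} (1 − Pr(e)), so the PMP cannot exceed it. The names `cut_bound`, `pmp_upper_bound` and `cuts_for_pair`, and the toy graph, are hypothetical.

```python
from math import prod


def cut_bound(edge_prob, cut):
    """P(at least one edge of the cut exists), assuming independent edges."""
    return 1.0 - prod(1.0 - edge_prob[e] for e in cut)


def pmp_upper_bound(edge_prob, cuts_for_pair, q_edges, m):
    """Smallest cut bound over all query edges of candidate m.

    cuts_for_pair maps a vertex pair to a list of edge cuts that separate the
    pair in the deterministic graph (a hypothetical precomputed input).
    """
    bound = 1.0
    for i, j in q_edges:
        for cut in cuts_for_pair.get((m[i], m[j]), []):
            bound = min(bound, cut_bound(edge_prob, cut))
    return bound


# Hypothetical uncertain graph: a triangle a-b-c with probability 0.5 per edge.
edge_prob = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
# Two cuts separating a from b: the edges incident to a, and those incident to b.
cuts_for_pair = {
    ("a", "b"): [[("a", "b"), ("a", "c")], [("a", "b"), ("b", "c")]],
}
ub = pmp_upper_bound(edge_prob, cuts_for_pair, [(0, 1)], ("a", "b"))
print(ub)  # 0.75
```

With a threshold of ϵ = 0.8, this candidate could be pruned without computing its PMP, because the bound 0.75 is already below the threshold. When no cut is known for a pair, the function falls back to the trivial bound 1.0 and the candidate is kept.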

Before we describe the structure of PM-trees, we begin with some definitions. Given a deterministic

Probabilistic matching tree

Definition 6 introduced the structure and properties of PM-trees. Here, we first describe how to construct PM-trees, and then show that PM-trees have effective pruning capabilities.

Recall from Definition 6 that a PM-tree is a tree $T = (V(T), E(T))$, where $V(T) = V(g^{c})$ and each edge e ∈ E(T) satisfies the following property.

Property 1

For each pair of distinct nodes (s, t) and edge e on the unique path between s and t, deleting e from T separates V(T) into two components, X and Y, such that (X, Y) is an s–t cut

Verification

In this section, we compute the PMP of a match in C_q to determine the final answer set. Specifically, given the hardness of computing PMP, we propose sampling algorithms to estimate PMP.
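The basic idea of estimating PMP by sampling can be sketched as plain Monte Carlo over possible worlds. This is a minimal illustration, assuming independent edges and unit edge weights; it is not the hybrid unequal-probability sampler mentioned in the introduction, and the names `within` and `sample_pmp`, the toy graph, and the sample count are hypothetical.

```python
import random
from collections import deque


def within(adj, s, t, gamma):
    """True if t can be reached from s in at most gamma hops (BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == gamma:
            continue  # do not expand beyond the distance threshold
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False


def sample_pmp(edge_prob, q_edges, m, gamma, n_samples, rng):
    """Estimate PMP as the fraction of sampled worlds in which m is a match."""
    hits = 0
    for _ in range(n_samples):
        adj = {}
        # Draw one possible world: keep each edge with its own probability.
        for (u, v), p in edge_prob.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        if all(within(adj, m[i], m[j], gamma) for i, j in q_edges):
            hits += 1
    return hits / n_samples


# Hypothetical uncertain graph: a triangle a-b-c with probability 0.5 per edge.
edge_prob = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
rng = random.Random(0)  # fixed seed so that the run is repeatable
est = sample_pmp(edge_prob, [(0, 1)], ("a", "b"), 2, 20000, rng)
print(est)  # an estimate of the exact value 0.625 for this toy graph
```

Each sample costs one pass over the edges plus a bounded search per query edge, so the total cost grows linearly with the number of samples instead of exponentially with the number of edges.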

Performance evaluation

In this section, we report on the effectiveness and efficiency of our proposed approach. Our methods are implemented on a Windows XP machine with a Core 2 Duo CPU (2.8 GHz) and 8 GB of main memory. Programs are compiled using Microsoft Visual C++ 2010.

Real-world uncertain dataset. We use the real-world uncertain graph, Yeast, from the STRING database.¹ Yeast contains all known and predicted protein interactions. The graph consists of 5862 vertices, 16,651 edges and 91 distinct

Application: Querying knowledge graphs

As knowledge graphs, such as DBpedia [2], YAGO [17], Probase [58] and Freebase [4], keep track of millions of entities (e.g., persons, products, organizations) together with their relationships, the potential for querying these graphs is tremendous. We now show how the graph pattern matching techniques described above may be applied in this context.

A knowledge graph can be represented as a tuple $KG = (V, E, L_{V}, L_{E}, c)$, where V, E, L_V and L_E denote nodes, edges, node labels and edge labels,

Querying uncertain data

The topic most related to our work is managing and mining uncertain graphs, and it can be divided into two categories. The first category uses online algorithms, i.e., sampling approaches, to answer queries. Zou et al. [76], [77] study frequent subgraph mining on uncertain graph data. Potamias et al. [50] study k-nearest neighbor queries (k-NN) over uncertain graphs. Gao et al. [24] study the probability distribution of the diameter in uncertain graphs. Jin et al. [32] develop fast peeling

Conclusions

Uncertain graphs are pervasive in many real-world applications, such as bioinformatics, where data often exhibit uncertainties. In this paper, we study the problem of retrieving matches from large uncertain graphs that satisfy a query graph pattern with high confidence. To efficiently tackle this problem, we propose a tree index structure to enable an adaptive pruning process, designed according to a formal cost model, so that the index not only has a small size but also has powerful pruning

Acknowledgments

Ye Yuan is supported by the NSFC (grant nos. 61100024 and 61173029) and the Fundamental Research Funds for the Central Universities (grant no. N130504006). Guoren Wang is supported by the NSFC (grant nos. 61025007, 61328202 and U1401256), the National Basic Research Program of China (973, grant no. 2011CB302200-G), and the National High Technology Research and Development 863 Program of China (grant no. 2012AA011004). Lei Chen is supported by the NSFC (grant no. 61328202). Bo Ning is supported by the NSFC

References (77)

L. Du et al., Probabilistic simrank computation over uncertain graphs, Inform. Sci. (2015)

Y. Gao et al., On distribution function of the diameter in uncertain graph, Inform. Sci. (2015)

S. Han et al., The maximum flow problem of uncertain network, Inform. Sci. (2014)

L. Xu et al., Patterns from nature: Distributed greedy colouring with simple messages and minimal graph knowledge, Inform. Sci. (2015)

E. Adar et al., Managing uncertainty in social networks, IEEE Data Eng. Bull. (2007)

S. Auer et al., Dbpedia: A Nucleus for a Web of Open Data (2007)

E. Balas et al., Weighted and unweighted maximum clique algorithms with upper bounds from fractional coloring, Algorithmica (1996)

K. Bollacker et al., Freebase: A collaboratively created graph database for structuring human knowledge, Proceedings of the Special Interest Group on Management of Data (2008)

L.S. Chandran et al., On the number of minimum cuts in a graph, SIAM J. Discrete Math. (2004)

J. Cheng et al., Efficient processing of distance queries in large graphs: A vertex cover approach

J. Cheng et al., Fg-index: Towards verification free query processing on graph databases, Proceedings of Special Interest Group on Management of Data (2007)

J. Cheng et al., Fast graph pattern matching

Y. Cheng et al., Threshold-based shortest path query over large correlated uncertain graphs, J. Comput. Sci. Technol. (2015)

H. Chui et al., Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions, Bioinformatics (2007)

C.J. Colbourn, The Combinatorics of Network Reliability (1987)

W.J. Cook et al., Combinatorial Optimization (1997)

D. Dimitrov et al., Query operators for comparing uncertain graphs (2015)

D.S. Hochbaum, Approximation Algorithms for NP-Hard Problems (1997)

P. Ernst et al., Knowlife: A versatile approach for constructing a large knowledge graph for biomedical sciences, BMC Bioinform. (2015)

M. Fabian et al., Yago: A core of semantic knowledge unifying wordnet and wikipedia

W. Fan et al., Incremental graph pattern matching

W. Fan et al., Adding regular expressions to graph reachability and pattern queries

W. Fan et al., Graph pattern matching: From intractable to polynomial time, Proceedings of Very Large Data Base (2010)

L. Fang et al., Rex: Explaining relationships between entity pairs, Proc. VLDB Endow. (2011)

G.S. Fishman, A monte carlo sampling plan based on product form estimation, Proceedings of the 23rd Conference on Winter Simulation (1991)

A.W.-C. Fu et al., Is-label: An independent-set based labeling scheme for point-to-point distance querying, Proc. VLDB Endow. (2013)

M.R. Garey et al., Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)

R.E. Gomory et al., Multi-terminal network flows, SIAM (1961)

J.L. Herman et al., Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs, BMC Bioinform. (2015)

A. Hogan et al., Towards fuzzy query-relaxation for RDF, The Semantic Web: Research and Applications (2012)

M. Hua et al., Probabilistic path queries in road networks: traffic uncertainty aware path selection

H. Huang et al., Query evaluation on probabilistic rdf databases

R. Jin et al., Discovering highly reliable subgraphs in uncertain graphs

R. Jin et al., Distance-constraint reachability computation in uncertain graphs

R. Jin et al., Simple, fast, and scalable reachability oracle, Proc. VLDB Endow. (2013)

G. Kasneci et al., Ming: Mining informative entity relationship subgraphs, Proceedings of the Conference on Information and Knowledge Management (2009)

G. Kasneci et al., Naga: Searching and ranking knowledge, International Conference on Data Engineering (2008)

A. Khan et al., Nema: Fast graph search with label similarity, Proc. VLDB Endow. (2013)
