Error-tolerant graph matching in linear computational cost using an initial small partial matching

https://doi.org/10.1016/j.patrec.2018.04.003

Highlights

  • A sub-optimal algorithm is presented to compute the graph edit distance.

  • The computational cost is linear w.r.t. the order of the graphs.

  • It needs an initial node-to-node mapping.

  • It allows mapping huge graphs, such as social nets.

Abstract

Error-tolerant graph matching has been shown to be NP-hard; known exact algorithms therefore have an exponential computational cost, and several sub-optimal algorithms have been presented with the aim of making the runtime acceptable in some applications. Some well-known sub-optimal algorithms have sixth-order, cubic or quadratic computational costs with respect to the order of the graphs. Although these costs could be considered very low, when applications deal with large graphs (for instance, in social networks), even a quadratic cost remains unacceptable. For this reason, we present an error-tolerant graph-matching algorithm that has an $O(d^{3.5} \cdot n)$ computational cost, $d$ being the number of output edges per node and $n$ the order of the graphs. Note that, in social networks, it usually holds that $d \ll n$, and for this reason we consider the cost to be linear, in other words $O(k \cdot n)$, $k$ being a low constant. Our method needs an initial seed, which is composed of one or several node-to-node mappings. The algorithm has been applied to analyse the evolution of social networks.

Introduction

Recently, we have seen an increase both in the number of people registered in social networks and in the number of different social networks. In some applications, for instance personalised publicity, it would be interesting to locate people from one network in another network, in order to increase the knowledge we have of these people. It is worth noting that in some cases we know the nodes in each network that represent the same person, since we have this knowledge from other sources of information. However, this is not the most common case, given that several people could have the same name in a network or people could use different aliases in each network. We call these few known mappings between both networks Seeds, and they are crucial information in the model we present. Fig. 1 shows a naïve example of these Seeds.

Attributed Graphs are good models for representing social networks; thus, if we want to correlate two networks, we have to find a mapping between the nodes of the graphs that represent these networks. Methods that return a distance between two graphs and a mapping between their nodes are called error-tolerant graph matching methods [1].

Error-tolerant graph matching has been shown to be NP-hard [2]; therefore, several algorithms have been presented that apply certain heuristics in order to reduce the computational cost [3], [4], [5], [6]. However, sub-optimal algorithms have also been presented that deduce a distance and a matching between nodes in polynomial time, for instance Graduated Assignment [7], Bipartite graph matching [8], [9], [10], [11] and the Greedy edit distance algorithm [12], [13]. All of these algorithms define a two-dimensional matrix in which the number of rows or columns is related to the graph order.

The aim of this paper is to present an error-tolerant graph-matching algorithm designed to match huge graphs. To achieve this, we have imposed two main restrictions. Firstly, no two-dimensional matrix may be defined in which the number of rows or columns is the graph order or higher (nor a vector whose length is quadratic in the order of the graphs). Secondly, the computational cost has to be linear with regard to the order of the graphs. Moreover, we assume that a few initial mappings between nodes of both networks are given. We call these initial mappings seeds since, as we will see in the following sections, they are the initial node-to-node mappings from which the algorithm begins to spread its knowledge of the partial matching. We accept that the quality of the deduced mappings might be lower than that of other non-exponential algorithms, such as the ones in [7], [8] or [12], which are polynomial but have a higher computational cost. Nevertheless, we consider a linear computational cost to be the only way in which two huge networks can currently be mapped. Note that the introduction of the seeds makes the algorithm more informed and could compensate for the reduction of the explored search space.
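The idea of spreading a partial matching outward from a few seeds can be illustrated with a minimal Python sketch. This is not the authors' Algorithm 1: the function name `propagate_seeds`, the adjacency-dictionary input and the greedy "closest degree" pairing rule are all assumptions made for illustration, standing in for a real local edit-cost computation. What the sketch does share with the restrictions above is that it only ever inspects the neighbourhoods of already matched pairs, so it never builds a matrix whose size depends on the graph order.

```python
from collections import deque


def propagate_seeds(adj_g, adj_h, seeds):
    """Grow a node-to-node mapping outward from seed pairs.

    adj_g, adj_h: dict mapping each node to a list of its neighbours.
    seeds: dict mapping some nodes of G to nodes of H (assumed correct).
    Hypothetical pairing rule: an unmatched neighbour of u is paired
    with the unmatched neighbour of v whose degree is closest.
    """
    mapping = dict(seeds)
    used_h = set(seeds.values())
    queue = deque(seeds.items())
    while queue:
        u, v = queue.popleft()
        # Only the local neighbourhoods of the matched pair are examined.
        free_h = [b for b in adj_h[v] if b not in used_h]
        for a in adj_g[u]:
            if a in mapping:
                continue
            if not free_h:
                break
            b = min(free_h, key=lambda x: abs(len(adj_h[x]) - len(adj_g[a])))
            free_h.remove(b)
            mapping[a] = b
            used_h.add(b)
            queue.append((a, b))
    return mapping


# Two path graphs with four nodes each and a single seed (0 -> "a").
adj_g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
adj_h = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = propagate_seeds(adj_g, adj_h, {0: "a"})
print(result)  # {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
```

Each node of G is mapped at most once and therefore enters the queue at most once, and the work per dequeued pair depends only on the sizes of the two neighbourhoods, which is why a scheme of this kind can scale linearly with the order of the graphs when node degrees are small.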

The rest of the paper is organised as follows. In the next section, we introduce Attributed Graphs, the Graph Edit Distance and the graph-matching algorithms that compute the Graph Edit Distance. In Section 3, we explain our graph-matching algorithm and deduce its computational cost. Section 4 presents the experimental validation, which is composed of two parts. In the first part, we randomly generate some graphs and compare our algorithm to some state-of-the-art error-tolerant graph-matching algorithms. In the second part, we show how we have used our method to map people on two social networks. We conclude the paper in Section 5. An appendix gives an example of how the algorithm works on two small graphs.

Section snippets

Attributed graphs and graph matching

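Let $G = (\Sigma_v, \Sigma_e, \gamma_v, \gamma_e)$ and $G' = (\Sigma'_v, \Sigma'_e, \gamma'_v, \gamma'_e)$ be two Attributed Graphs. $\Sigma_v = \{v_a \mid a = 1, \ldots, n\}$ is the set of vertices and $\Sigma_e = \{e_{a,b} \mid a, b \in 1, \ldots, n\}$ is the set of edges. Functions $\gamma_v: \Sigma_v \to \Delta_v$ and $\gamma_e: \Sigma_e \to \Delta_e$ assign attribute values in any domain to vertices and edges. $\gamma_v(v_a) = v_a$ and $\gamma_e(e_{a,b}) = e_{a,b}$. Coherent definitions hold for $G' = (\Sigma'_v, \Sigma'_e, \gamma'_v, \gamma'_e)$. We call $n$ the order of the graph.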
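As a rough illustration of this definition, an attributed graph can be held in two dictionaries, one playing the role of the vertex attribute function γv and one the role of the edge attribute function γe. The class name `AttributedGraph`, its method names and the example attributes below are hypothetical and chosen only for this sketch; the attribute domains are left open, as in the definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Tuple


@dataclass
class AttributedGraph:
    # node_attr plays the role of gamma_v: vertex -> attribute value.
    node_attr: Dict[Hashable, Any] = field(default_factory=dict)
    # edge_attr plays the role of gamma_e: (a, b) -> attribute value.
    edge_attr: Dict[Tuple[Hashable, Hashable], Any] = field(default_factory=dict)

    @property
    def order(self) -> int:
        """Number of vertices, i.e. the order n of the graph."""
        return len(self.node_attr)

    def add_node(self, v: Hashable, attr: Any = None) -> None:
        self.node_attr[v] = attr

    def add_edge(self, a: Hashable, b: Hashable, attr: Any = None) -> None:
        # Edges may only connect vertices that already exist.
        if a not in self.node_attr or b not in self.node_attr:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_attr[(a, b)] = attr


# Hypothetical example: two users and one directed "follows" edge.
g = AttributedGraph()
g.add_node(1, {"name": "alice"})
g.add_node(2, {"name": "bob"})
g.add_edge(1, 2, {"type": "follows"})
```

Storing only existing vertices and edges keeps the memory footprint proportional to their number, in contrast to a dense adjacency matrix with one row and one column per vertex.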

A local structure of a node is the set of edges and nodes of the graph adjacent to it. The influence on selecting different local

Belief propagation graph matching

Algorithm 1 shows the pseudo-code of our error-tolerant graph-matching algorithm, which we have called Belief Propagation Graph Matching. The input of the algorithm is the same as any error-tolerant graph-matching algorithm that computes the Graph Edit Distance in addition to the initial Seeds. In other words, the input is composed of a pair of graphs, the edit cost functions and a set of Seeds. The output is the deduced node-to-node mapping between both graphs. The main assumption of the

Experimental validation

In the first part of this section, we validate and analyse our algorithm using synthetic graphs, whereas in the second part we show a real application of it. We have used small graphs to compare our algorithm against other non-linear algorithms, and large graphs to show its runtime.

Conclusions and further work

We have presented, for the first time, an error-tolerant graph-matching algorithm that has a linear computational cost and a linear space cost with respect to the number of nodes. Specifically, the computational cost is $O(d^{3.5} \cdot n)$, $d$ being the number of output edges per node and $n$ the order of the graphs. This algorithm is useful for computing the correspondence between huge graphs or social networks.
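To get a feeling for what this bound means in practice, the following Python sketch compares the growth terms of the stated cost with those of quadratic and cubic algorithms. The values n = 1,000,000 and d = 10 are hypothetical and chosen only for illustration, and asymptotic bounds hide constant factors, so the numbers are order-of-magnitude indications rather than runtime predictions.

```python
def growth_terms(n: int, d: int) -> dict:
    """Return the dominant growth terms for a graph of order n
    whose nodes have d output edges each (constants ignored)."""
    return {
        "d^3.5 * n": d ** 3.5 * n,
        "n^2": n ** 2,
        "n^3": n ** 3,
    }


# Hypothetical sizes: one million nodes, ten output edges per node.
terms = growth_terms(n=1_000_000, d=10)
for name, value in terms.items():
    print(f"{name:>10}: {value:.2e}")

# Ratio between the quadratic term and the d^3.5 * n term.
ratio = terms["n^2"] / terms["d^3.5 * n"]
```

Under these assumed sizes the quadratic term is a few hundred times larger than the d^3.5 · n term, and the gap widens as n grows while d stays fixed.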

To achieve this low computational cost, it needs an initial node-to-node mapping to begin to spread the

Acknowledgements

This research is supported by the Spanish projects TIN2016-77836-C2-1-R and ColRobTransp MINECO DPI2016-78957-R AEI/FEDER EU; and also the European project AEROARMS, H2020-ICT-2014-1-644271.
