1 Introduction

Graph matching is a fundamental problem in computer science and closely relates to many computer vision problems including feature registration [1–3], shape matching [4–6], object recognition [7, 8], visual tracking [9], activity analysis [10], etc. Despite decades of research effort devoted to graph matching, it remains a challenging problem due to the non-convexity of the objective function and the constraints on the solutions. A typical remedy is to use relaxation to guide the solution search. Popular algorithms fall into, among others, three categories: spectral relaxation [11–13], continuous optimization [14–18] and probabilistic modeling [19, 20].

Among recently proposed graph matching algorithms, those utilizing the path following strategy have exhibited state-of-the-art performance [15–18]. These path following algorithms reformulate graph matching as a convex-concave relaxation procedure (CCRP) problem, which is solved by interpolating between two simpler approximate formulations, and they use the path following strategy to iteratively push the doubly stochastic (bistochastic) matrix solution toward the discrete domain. The path following algorithms can be viewed as special cases of the numerical continuation method (NCM) [21], which computes approximate solutions of parameterized nonlinear equation systems. These algorithms succeed at regular points but may fail at singular points (details in Sect. 4). How to address this issue and thereby improve matching performance therefore deserves research attention.

Motivated by the above discussion, we propose a novel path following strategy, branching path following (BPF), to improve path following graph matching algorithms. In particular, BPF extends the traditional path following strategy by branching new paths at singular points. It first discovers singular points on the original path by examining the Jacobian of the associated KKT system, and then branches a new path at each singular point using the pseudo-arclength continuation method [22, 23]. After searching along all branching paths, BPF chooses the best one in terms of the objective function as the final solution. Since the original path is always searched, BPF is guaranteed to achieve the same or a better optimization solution, and thus the same or better matching performance. Using the BPF strategy, we develop a new graph matching algorithm, named BPF-G, by applying BPF to the GNCCP (graduated nonconvexity and concavity procedure) algorithm [17]. Note that GNCCP is chosen because it is one of the latest path following algorithms, while BPF is by no means limited to working with GNCCP.

For a thorough evaluation, we test the proposed BPF-G algorithm on four popular benchmarks and a synthetic dataset. Experimental results show that the proposed algorithm significantly improves the path following procedure and outperforms the compared state-of-the-art graph matching algorithms.

In summary, our main contribution is the new path following strategy for graph matching, which is three-fold: (1) we discuss the pitfalls of path following algorithms at singular points, and propose an efficient singular point discovery method; (2) we design a novel branching path following strategy to bypass these pitfalls and thus improve matching performance; and (3) we develop a new graph matching algorithm by applying the proposed BPF strategy to the GNCCP algorithm, and demonstrate its effectiveness in a thorough evaluation.

2 Related Work

Graph matching has been investigated for decades and many algorithms have been invented, as summarized in [24, 25]. In general, graph matching has a combinatorial nature that makes the global optimum hard to obtain. As a result, approximate solutions are commonly used for graph matching. In this section we review the studies most related to ours, and refer the reader to the surveys above for general graph matching research. Recent examples include [26], which uses discrete methods in the linear approximation framework, and [27], which adapts discrete tabu search for graph matching.

A popular way to approximate graph matching is based on spectral relaxation with notable work by Leordeanu and Hebert [11], who model graph matching with spectral relaxation and propose an eigen-analysis solution. Later, the work is extended by Cour et al. [12] by first encoding affine constraints into the spectral decomposition and then applying bistochastic normalization. Cho et al. [13] reformulate graph matching as a vertex selection problem and introduce an affinity-preserving random walk algorithm. From a different perspective, Zass and Shashua [19] present a probabilistic framework for (hyper-)graph matching. The two lines somewhat merge in Egozi et al. [20], where a probabilistic view of the spectral relaxation scheme is presented.

Being inherently a discrete optimization problem, graph matching is often relaxed to the continuous domain, and many important algorithms have been designed on top of such relaxations. For example, Gold and Rangarajan [14] propose the graduated assignment algorithm to iteratively solve a series of linear approximations of the cost function using Taylor expansion. Leordeanu and Hebert [28] develop an integer projection algorithm to optimize the objective function in the integer domain. The studies most related to ours are the so-called path following ones. In particular, Zaslavskiy et al. [15] reformulate graph matching as a convex-concave relaxation procedure (CCRP) problem and then solve it by interpolating between simpler relaxed formulations. More specifically, their path following algorithm iteratively searches for a solution by tracing a path of local minima of a series of functions that linearly interpolate between the two relaxations. Later, Zhou and De la Torre [16] apply a similar strategy and factorize the affinity matrix into a Kronecker product of smaller matrices, each of which encodes the structure of the graphs and the affinities between vertices and between edges. Liu and Qiao [17] propose the graduated nonconvexity and concavity procedure (GNCCP) to equivalently realize CCRP on the set of partial permutation matrices; GNCCP provides a much simpler way to implement CCRP without explicitly involving the convex or concave relaxation. Wang and Ling [29] propose a novel search strategy with adaptive path estimation to improve the computational efficiency of path following algorithms.

Our work falls into the group of path following algorithms, but focuses on improving the path following strategy itself, which has not been fully explored in previous studies. To this end, we propose a novel branching path following (BPF) strategy, which effectively boosts graph matching performance, as demonstrated in a thorough evaluation (Sect. 6).

3 Path Following for Graph Matching

3.1 Problem Formulation

An undirected graph of n vertices can be represented by \(\mathbb {G=(V,E)}\), where \(\mathbb {V}=\{v_1,\ldots ,v_n\}\) and \(\mathbb {E\subseteq V \times V}\) denote the vertex and edge sets, respectively. A graph is often conveniently represented by a symmetric adjacency matrix \(A\in \mathbb {R}^{n\times n}\), such that \(A_{i j}>0\) if and only if there is an edge between \(v_i\) and \(v_j\).

For graph matching, given two graphs \(\mathbb {G}^{(i)}=(\mathbb {V}^{(i)},\mathbb {E}^{(i)})\) of size \(n_i\), \(i=1,2\), the problem is to find a vertex correspondence \(X\in \{0,1\}^{n_1\times n_2}\) between \(\mathbb {G}^{(1)}\) and \(\mathbb {G}^{(2)}\) that maximizes the following global consistency:

$$\begin{aligned} \mathcal {E}_1(X)=\sum _{i_1,i_2} c_{i_1 i_2} X_{i_1 i_2} + \sum _{i_1,i_2,j_1,j_2} d_{i_1 j_1 i_2 j_2} X_{i_1 i_2} X_{j_1 j_2}, \end{aligned}$$
(1)

where \(c_{i_1i_2}\) measures the consistency between the \(i_1\)-th vertex in \(\mathbb {G}^{(1)}\) and the \(i_2\)-th vertex in \(\mathbb {G}^{(2)}\), and \(d_{i_1j_1 i_2j_2}\) measures the consistency between edge \((i_1,j_1)\) in \(\mathbb {G}^{(1)}\) and edge \((i_2,j_2)\) in \(\mathbb {G}^{(2)}\). The correspondence matrix X denotes the matching result, i.e., \(X_{i_1 i_2}=1\) if and only if \(v_{i_1}\in \mathbb {V}^{(1)}\) corresponds to \(v_{i_2}\in \mathbb {V}^{(2)}\). In practice, the matching is often restricted to be one-to-one, which requires \(X \mathbf 1 _{n_2}\le \mathbf 1 _{n_1}\) and \(X^\top \mathbf 1 _{n_1}\le \mathbf 1 _{n_2}\), where \(\mathbf 1 _n\) denotes a vector of n ones.

Let \(A^{(i)}\) be the adjacency matrix of \(\mathbb {G}^{(i)}\), \(i=1,2\). A more commonly used formulation for graph matching is then defined as

$$\begin{aligned} \mathcal {E}_2(X)=\mathrm {tr}(C^\top X) + \alpha {\Vert A^{(1)} - X A^{(2)} X^\top \Vert }_F^2, \end{aligned}$$
(2)

where \(C=(c_{i_1i_2}) \in \mathbb {R}^{n_1\times n_2}\) is the vertex consistency matrix, \(\alpha \ge 0\) the weight balancing between the vertex and edge comparisons, and \({\Vert \cdot \Vert }_F\) the Frobenius norm.
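For concreteness, a minimal NumPy sketch of evaluating Eq. (2) could look as follows; the function name and the dense-matrix representation are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def energy_E2(C, A1, A2, X, alpha=1.0):
    """Eq. (2): vertex term tr(C^T X) plus the weighted edge-discrepancy term."""
    vertex_term = np.trace(C.T @ X)
    edge_term = np.linalg.norm(A1 - X @ A2 @ X.T, 'fro') ** 2
    return vertex_term + alpha * edge_term
```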

Eq. (1) can be written in a more general pairwise compatibility form

$$\begin{aligned} \mathcal {E}_3(\mathbf x )=\mathbf x ^\top K \mathbf x , \end{aligned}$$
(3)

where \(\mathbf x \doteq \mathrm {vec}(X) \in \{0,1\}^{n_1 n_2}\) is the vectorized version of matrix X and \(K\in \mathbb {R}^{n_1 n_2\times n_1 n_2}\) is the corresponding affinity matrix defined as:

$$\begin{aligned} K_{\mathrm {ind}(i_1,i_2) \mathrm {ind}(j_1,j_2)} = \left\{ \begin{array}{ll} c_{i_1 i_2} &{} \text {if} \, i_1=j_1 \, \text {and} \, i_2=j_2,\\ d_{i_1 j_1 i_2 j_2}~ &{} \text {if} \, A^{(1)}_{i_1 j_1} \, A^{(2)}_{i_2 j_2} > 0,\\ 0 &{} \text {otherwise.} \end{array} \right. \end{aligned}$$
(4)

where \(\mathrm {ind}(\cdot ,\cdot )\) is a bijection mapping a vertex correspondence to an integer index.

In this paper, we mainly discuss and test graph matching algorithms for Eq. (3), since it encodes not only differences of edge weights but also many more complex graph compatibility functions.
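As an illustration of Eq. (4), a direct (unoptimized) construction of K is sketched below; the row-major index mapping \(\mathrm{ind}(i_1,i_2)=i_1 n_2 + i_2\) and the callable d are assumptions made for concreteness.

```python
import numpy as np

def build_affinity(C, A1, A2, d):
    """Eq. (4): affinity matrix K, assuming ind(i1, i2) = i1 * n2 + i2.

    C[i1, i2]          : vertex consistency c.
    d(i1, j1, i2, j2)  : edge consistency, queried only where both edges exist.
    """
    n1, n2 = C.shape
    K = np.zeros((n1 * n2, n1 * n2))
    for i1 in range(n1):
        for i2 in range(n2):
            K[i1 * n2 + i2, i1 * n2 + i2] = C[i1, i2]      # diagonal: vertex term
            for j1 in range(n1):
                for j2 in range(n2):
                    if (j1, j2) == (i1, i2):
                        continue
                    if A1[i1, j1] * A2[i2, j2] > 0:        # both edges exist
                        K[i1 * n2 + i2, j1 * n2 + j2] = d(i1, j1, i2, j2)
    return K
```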

3.2 The Path Following Algorithm

In [15], Zaslavskiy et al. introduce the convex-concave relaxation procedure (CCRP) into the graph matching problem by reformulating it as interpolation between two relaxed and simpler formulations. The first relaxation is obtained by expanding the convex quadratic function \(\mathcal {E}_2(X)\) from the set of permutation matrices \(\mathcal {P}\) to the set of doubly stochastic matrices \(\mathcal {D}\). The second relaxation is a concave function

$$\begin{aligned} \min _{X\in \mathcal {D}}{\mathcal {E}_4(X)} = -\mathrm {tr}(\varDelta X)-2\mathrm {vec}(X)^\top \big ((L^{(1)})^\top \otimes (L^{(2)})^\top \big ) \mathrm {vec}(X), \end{aligned}$$
(5)

where \(\varDelta \) is a matrix with element \(\varDelta _{i j}=\big (D^{(1)}_{i i}-D^{(2)}_{j j}\big )^2\); \(D^{(i)}\) and \(L^{(i)}\) represent respectively the diagonal degree matrix and the Laplacian matrix of an adjacency matrix \(A^{(i)}\), \(i=1,2\); and \(\otimes \) denotes the Kronecker product. A key property is that the optimum solution of \(\mathcal {E}_4(X)\) over \(\mathcal {P}\) is the solution of the original graph matching problem.
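For reference, the ingredients of Eq. (5) (degree matrices, Laplacians and \(\varDelta\)) can be assembled as in the following sketch, assuming dense NumPy adjacency matrices.

```python
import numpy as np

def concave_terms(A1, A2):
    """Degree matrices, Laplacians and the Delta matrix appearing in Eq. (5)."""
    D1, D2 = np.diag(A1.sum(axis=1)), np.diag(A2.sum(axis=1))
    L1, L2 = D1 - A1, D2 - A2
    Delta = (np.diag(D1)[:, None] - np.diag(D2)[None, :]) ** 2   # Delta_ij = (D1_ii - D2_jj)^2
    return L1, L2, Delta
```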

The path following strategy proposed in [15] can be interpreted as an iterative procedure that smoothly projects an initial solution of \(\mathcal {E}_2\) in the continuous space \(\mathcal {D}\) to the discrete space \(\mathcal {P}\) by tracking a path of local minima of a series of functionals \(\mathcal {E}_\lambda \) over \(\mathcal {D}\)

$$\begin{aligned} \mathcal {E}_\lambda = (1-\lambda )\mathcal {E}_2 + \lambda \mathcal {E}_4, \end{aligned}$$
(6)

for \(0\le \lambda \le 1\). Each local minimum of \(\mathcal {E}_{\lambda + d_{\lambda }}\) is obtained by the Frank-Wolfe algorithm [30], using the local minimum of \(\mathcal {E}_{\lambda }\) as the starting point. Increasing \(\lambda \) from 0 to 1, this approach searches toward a local minimum of \(\mathcal {E}_4\) starting from the unique local minimum of \(\mathcal {E}_2\), and takes it as the final solution. For more details about the path following algorithm, please refer to [15].
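In outline, the procedure can be sketched as below; `frank_wolfe` stands in for a hypothetical local solver over \(\mathcal{D}\) and is not part of the original pseudo-code.

```python
def ccrp_path(E2_grad, E4_grad, X0, frank_wolfe, d_lambda=0.01):
    """Sketch of the CCRP path: track local minima of E_lambda = (1-lambda)*E2 + lambda*E4.

    frank_wolfe(grad_fn, X_init) is an assumed local solver over the doubly
    stochastic set D; each minimizer warm-starts the next value of lambda.
    """
    X, lam = X0, 0.0          # X0: minimizer of the convex relaxation E2 over D
    while lam <= 1.0:
        grad_fn = lambda Y, l=lam: (1 - l) * E2_grad(Y) + l * E4_grad(Y)
        X = frank_wolfe(grad_fn, X)
        lam += d_lambda
    return X                  # (near-)permutation minimizer of E4
```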

Recently, Liu and Qiao [17] proposed the graduated nonconvexity and concavity procedure (GNCCP) to equivalently realize CCRP on the set of partial permutation matrices without explicitly involving the convex or concave relaxation. As one of the latest works adopting the path following strategy, GNCCP provides a general optimization framework for combinatorial optimization problems defined on the set of partial permutation matrices. In Sect. 5.3, we improve GNCCP by integrating the proposed branching path following strategy.

4 Numerical Continuation Method Interpretation

In this section, we interpret the path following algorithms from a numerical continuation view, and then discuss their pitfalls due to singular points. This discussion will guide the subsequent extension of these algorithms.

4.1 KKT System

According to the path following strategy described above and by converting \(\mathcal {D}\) into constraints, we need to solve a series of optimization problems with equality and inequality constraints parameterized by \(\lambda \):

$$\begin{aligned} \begin{aligned} \mathbf x ^*=\mathrm {arg}\max _\mathbf x \mathcal {E}_{\lambda }(\mathbf x ),\\ \mathrm {s.t.~} \left\{ \begin{array} {l} B\mathbf x = \mathbf 1 _{2n}, \\ \mathbf x \ge \mathbf 0 _{n^2}. \end{array} \right. \end{aligned} \end{aligned}$$
(7)

where \(B\mathbf x = \mathbf 1 _{2n}\) encodes the one-to-one matching constraints (\(B\in \mathbb {R}^{2n\times n^2}\)).
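For equal-size graphs (\(n_1=n_2=n\)) and a row-major vectorization \(x_{in+j}=X_{ij}\) (an assumption for concreteness), the constraint matrix B has a simple Kronecker structure, sketched below.

```python
import numpy as np

def matching_constraints(n):
    """One-to-one matching constraints B x = 1_{2n}, assuming x = vec(X) row-major.

    The first n rows of B sum each row of X, the last n rows sum each column.
    """
    row_sums = np.kron(np.eye(n), np.ones((1, n)))   # sum_j X[i, j] = 1
    col_sums = np.kron(np.ones((1, n)), np.eye(n))   # sum_i X[i, j] = 1
    return np.vstack([row_sums, col_sums])           # shape (2n, n^2)
```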

Using Lagrange multipliers \(\alpha _i\) and KKT multipliers \(\mu _i\), the above constrained problem can be converted to the following unconstrained one

$$\begin{aligned} \mathbf x ^*=\mathrm {arg}\max _\mathbf{x } \Big (\mathcal {E}_{\lambda }(\mathbf x ) + \sum _{i=1}^{2n} {\alpha _i h_i(\mathbf x )} - \sum _{i=1}^{n^2} {\mu _i g_i(\mathbf x )} \Big ), \end{aligned}$$
(8)
$$\begin{aligned} \mathrm {where~~} \left\{ \begin{array} {l} h_i(\mathbf x )=B(i,:)\mathbf x - 1, \\ g_i(\mathbf x )=-\mathbf x _i.\\ \end{array} \right. \end{aligned}$$
(9)

This results in the following system of Karush-Kuhn-Tucker (KKT) equations [31]:

$$\begin{aligned} \left\{ \begin{array}{lll} \nabla \mathcal {E}_{\lambda }(\mathbf x )+ B^\top \alpha + \mu &{}=&{} \mathbf 0 _{n^2}, \\ \quad \quad \quad \quad \quad h_i(\mathbf x ) &{}=&{} 0, \quad 1\le i \le 2n, \\ \quad \quad \quad \quad \quad \mu _{i} g_{i}(\mathbf x ) &{}=&{} 0, \quad 1\le i \le n^2, \\ \quad \quad \quad \quad \quad \mu _{i} &{}\ge &{} 0, \quad 1\le i \le n^2. \\ \end{array} \right. \end{aligned}$$
(10)

In the next subsection we formulate the KKT system as a constrained nonlinear system \(F(\lambda , \mathbf x , \alpha , \mu )=\mathbf 0 , \mathrm {s.t.~} \mu \ge \mathbf 0 \).

4.2 Numerical Continuation Method

The existing path following algorithms (PATH [15], FGM [16], GNCCP [17]) can be viewed as special cases of the numerical continuation method (NCM) [21]. In general, NCM computes approximate solutions of parameterized nonlinear equation systems, and it estimates curves given in the following implicit form:

$$\begin{aligned} F(\lambda , u) = \mathbf 0 _m, \quad \text {where } F \text { is a mapping } \mathbb {R}^{m+1}\rightarrow \mathbb {R}^m. \end{aligned}$$
(11)

This method works as well in the presence of constraints on any or all of the variables [32, 33]. In particular, for graph matching, we have \(u=[\mathbf x ^\top , \alpha ^\top , \mu ^\top ]^\top \) and \(m = 2n^2+2n\).

Most solution methods for nonlinear equation systems are iterative. For a particular parameter value \(\lambda _0\), a mapping is repeatedly applied to an initial guess \(u_0\). In fact, the existing PATH, FGM and GNCCP algorithms correspond to a particular implementation of the so-called generic predictor corrector (GPC) approach [21]. The solution at a specific \(\lambda \) is used as the initial guess for the solution at \(\lambda +\varDelta \lambda \). With \(\varDelta \lambda \) sufficiently small, the iteration applied to the initial guess converges [21].

4.3 Pitfalls at Singular Points

A solution component \(\varGamma (\lambda _0,u_0)\) of the nonlinear system F is a set of points \((\lambda ,u)\) such that \(F(\lambda ,u)=0\) and these points are connected to the initial solution \(( \lambda _0,u_0)\) by a path of solutions. A regular point of F is a point \((\lambda , u)\) at which the Jacobian of F is of full rank, while a singular point of F is a point \((\lambda , u)\) at which the Jacobian of F is rank deficient. As discussed in [23], near a regular point the solution component is an isolated curve passing through the regular point. By contrast, for a singular point, there may be multiple curves passing through it. The local structure of a point in \(\varGamma \) is determined by high-order derivatives of F.

An advantage of the GPC approach is that it uses the solution procedure for the original problem as a black box, where all that is required is an initial solution. However, this approach may fail at singular points, where the branch of solutions turns around [23]. In general, solution components \(\varGamma \) of a nonlinear system are branching curves whose branching points are singular [34]. Therefore, for problems with singular points, more sophisticated handling is desired.

5 Branching Path Following

In this section, we propose the branching path following (BPF) strategy that branches new curves at singular points toward potentially better matching results. BPF contains two main steps: singular point discovery and branch switching, as described in the following subsections. In the last subsection, we apply BPF to GNCCP to develop a new graph matching algorithm.

5.1 Singular Point Discovery

The first step of BPF is to discover singular points. In theory, these points can be detected by checking whether the Jacobian of F is of full rank. However, it is impractical to do so on discrete samples of \(\lambda _i\), since such samples rarely coincide with the exact singular points.

Denote by \(J_{\lambda }\) the Jacobian of F parameterized by \(\lambda \). A singular point \((\lambda , u)\) satisfies \(|J_{\lambda }|=0\). A reasonable assumption is that the curve formed by \((\lambda , |J_{\lambda }|)\) over \(\lambda \) is continuous. This implies that there is at least one singular point between two points \((\lambda _1, u_1)\) and \((\lambda _2, u_2)\) if \(|J_{\lambda _1}|\) and \(|J_{\lambda _2}|\) have different signs.

Thus inspired, we design a simple yet effective way to discover singular points by checking the signs of the determinants of the Jacobian at consecutive sampled points. Specifically, denoting by \((\lambda _t, u_t)\) the point at iteration t on the path, we mark \((\lambda _t, u_t)\) as a singular point if \(|J_{\lambda _t}||J_{\lambda _{t+1}}|\le 0\).

It is computationally expensive to evaluate determinants of large Jacobian matrices. Since we are only interested in the signs of these determinants, we develop an efficient solution that first decomposes each Jacobian matrix using the LU decomposition and then accumulates the signs of the diagonal elements of the decomposed matrices.
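A minimal sketch of this sign test is given below (using SciPy's LU factorization); note that a pivoting LU routine also contributes the sign of its row permutation, which the sketch folds in explicitly.

```python
import numpy as np
from scipy.linalg import lu

def det_sign(J):
    """Sign of det(J) from an LU factorization, without forming the full determinant."""
    P, L, U = lu(J)                      # J = P @ L @ U, L has unit diagonal
    d = np.diag(U)
    if np.any(d == 0):
        return 0                         # (numerically) singular Jacobian
    sign = int(np.prod(np.sign(d)))
    perm = np.argmax(P, axis=0)          # permutation encoded by P
    visited = np.zeros(len(perm), dtype=bool)
    for i in range(len(perm)):           # parity of P via its cycle decomposition
        if not visited[i]:
            j, length = i, 0
            while not visited[j]:
                visited[j] = True
                j, length = perm[j], length + 1
            if length % 2 == 0:
                sign = -sign
    return sign

# (lambda_t, u_t) is marked as singular if det_sign(J_t) * det_sign(J_{t+1}) <= 0
```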

5.2 Branch Switching

Finding the solution curves passing a singular point is called branch switching. Once a singular point \((\lambda _t, u_t)\) is discovered, we branch a new curve using the pseudo-arclength continuation (PAC) algorithm [22, 23].

PAC is based on the observation that an ideal parameterization of a curve is through its arclength s. With this parameterization, we extend the equations in (10) to the following form

$$\begin{aligned} G(\lambda ,u)=\left\{ \begin{array}{l} F(\lambda ,u) \\ N(\lambda ,u,s) \end{array}\right\} = \left\{ \begin{array}{l} 0\\ 0 \end{array} \right\} , \end{aligned}$$
(12)

where the normalization equation \(N(\cdot ) = 0\) approximates the statement that s represents arclength. Denoting by \((\dot{\lambda },\dot{u})\) the tangent vector at point \((\lambda _t, u_t)\), we have

$$\begin{aligned} N(\lambda ,u,s)=\dot{u}^\top (u-u_t) + \dot{\lambda }^\top (\lambda -\lambda _t) - s = 0. \end{aligned}$$
(13)

According to the implicit function theorem [35], the tangent vector \((\dot{\lambda },\dot{u})\) can be computed as

$$\begin{aligned} (\dot{\lambda },\dot{u}) = \big (\varDelta \lambda , \varDelta \lambda \frac{\partial u}{\partial \lambda } (\lambda _t, u_t)\big ) =\big (\varDelta \lambda , \varDelta \lambda (F_u(\lambda _t, u_t))^{-1} F_{\lambda }(\lambda _t, u_t)\big ). \end{aligned}$$
(14)

However, this computation is inapplicable at bifurcation points and is computationally expensive. We therefore approximate the tangent vector using previous points for computational efficiency. The tangent vector at a previous iteration i \((i<t)\) is approximated as \((\varDelta \lambda ,u_{i+1}-u_i)\), and the tangent vector \((\dot{\lambda },\dot{u})\) at t is estimated as

$$\begin{aligned} (\dot{\lambda },\dot{u})=\Big (\varDelta \lambda ,\frac{\sum _{i=1}^{k} {(k-i+1)(u_{t-i+1}-u_{t-i})}}{\sum _{i=1}^{k} {i}} \Big ), \end{aligned}$$
(15)

where k controls the size of the window used for the approximation. The motivation behind this approximation is the smoothness of the path, which implies that tangent vectors at consecutive iterations are similar.
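A sketch of this weighted tangent estimate (Eq. (15)) is given below; the window of stored iterates is an implementation assumption.

```python
import numpy as np

def approx_tangent(us, delta_lambda, k=3):
    """Eq. (15): weighted finite-difference estimate of the tangent (lambda_dot, u_dot).

    us : previous solutions [..., u_{t-1}, u_t] (most recent last), with len(us) > k.
    Differences closer to iteration t receive the larger weights k, k-1, ..., 1.
    """
    num = sum((k - i + 1) * (us[-i] - us[-i - 1]) for i in range(1, k + 1))
    u_dot = num / (k * (k + 1) / 2)      # denominator: sum_{i=1}^{k} i
    return delta_lambda, u_dot
```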

The Jacobian of the pseudo-arclength system is the bordered matrix \( \Big [ \begin{array}{ll} F_u &{} F_{\lambda } \\ \dot{u} &{} \dot{\lambda } \end{array} \Big ]\).

Appending the tangent vector as the last row can be seen as determining the coefficient of the null vector in the general solution of the Newton system [36] (particular solution plus an arbitrary multiple of the null vector).

Finally, we solve Eq. (12) using the trust-region-reflective algorithm [37], and then branch a new curve using the solution \((\lambda ^*,u^*)\) as the initial solution.
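A possible realization of this corrector step with SciPy's trust-region-reflective solver is sketched below; the residual function F of the KKT system and the step length s are assumed to be supplied by the caller.

```python
import numpy as np
from scipy.optimize import least_squares

def branch_switch(F, lam_t, u_t, lam_dot, u_dot, s=1e-2):
    """One predictor-corrector step for the bordered system of Eqs. (12)-(13)."""
    def G(z):
        lam, u = z[0], z[1:]
        N = u_dot @ (u - u_t) + lam_dot * (lam - lam_t) - s   # arclength normalization
        return np.append(F(lam, u), N)

    # predictor: step of length s along the tangent, then correct with 'trf'
    z0 = np.concatenate(([lam_t + s * lam_dot], u_t + s * u_dot))
    sol = least_squares(G, z0, method='trf')
    return sol.x[0], sol.x[1:]                                # (lambda*, u*) for the new branch
```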

5.3 Applying BPF to GNCCP

Now we apply the BPF strategy to GNCCP [17] to develop our new graph matching algorithm named BPF-G.

Note that, in both the singular point discovery and branch switching steps, we need the Jacobian of F, which includes a parameter-dependent sub-matrix, namely the Jacobian of \(\nabla \mathcal {E}_{\lambda }(\mathbf x )\) (denoted \(J(\lambda , \mathbf x )\)). In GNCCP,

$$\begin{aligned} \mathcal {E}_\lambda (\mathbf x )=\left\{ \begin{array}{ll} (1-\lambda )\mathbf x ^\top K\mathbf x +\lambda \mathrm {tr}(\mathbf x ^\top \mathbf x ), &{} \text {if} \, \, \lambda \ge 0,\\ (1+\lambda )\mathbf x ^\top K\mathbf x +\lambda \mathrm {tr}(\mathbf x ^\top \mathbf x ), &{} \text {if} \, \, \lambda < 0. \end{array} \right. \end{aligned}$$
(16)

Accordingly, the required Jacobian is

$$\begin{aligned} J(\lambda ,\mathbf x )=\left\{ \begin{array}{ll} (1-\lambda )(K^\top +K)+2\lambda I, &{} \text {if} \, \, \lambda \ge 0,\\ (1+\lambda )(K^\top +K)+2\lambda I, &{} \text {if} \, \, \lambda < 0. \end{array} \right. \end{aligned}$$
(17)
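A direct NumPy transcription of Eq. (17) might read as follows, with K stored as a dense array for simplicity.

```python
import numpy as np

def gnccp_jacobian(K, lam):
    """Eq. (17): the lambda-dependent Jacobian of the GNCCP gradient."""
    w = (1.0 - lam) if lam >= 0 else (1.0 + lam)
    return w * (K.T + K) + 2.0 * lam * np.eye(K.shape[0])
```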

The pseudo-code of BPF-G is shown in Algorithm 1. Note that, since the solution of the original algorithm is always included in the candidate set T, the new algorithm is guaranteed to achieve the same or a better solution in terms of the objective.

The computational complexity of GNCCP is \(O(n^3)\) [17], where n is the number of vertices in the graph. Thus, the computational complexity of our algorithm is \(O(kn^3)\), where k is the number of explored additional branches. Since k is a small bounded integer, the complexity of the proposed algorithm remains roughly \(O(n^3)\).

Algorithm 1. Pseudo-code of the BPF-G algorithm.

6 Experiments

We compare the proposed BPF-G algorithm with four state-of-the-art graph matching algorithms, namely GNCCP [17], IPFP [28], RRWM [13] and PSM [20], and report experimental results on a synthetic dataset and four benchmark datasets. Two indicators, matching accuracy and objective ratio, are used to evaluate the algorithms. Specifically, denoting by \(f_i\) the objective achieved by the i-th algorithm \(g_i\), the objective ratio \(r_i\) of \(g_i\) is computed as \(r_i = {f_i} / \max _{k}{f_k}\).

6.1 Synthetic Dataset

We first perform a comparative evaluation of the algorithms on graphs randomly synthesized following the experimental protocol in [13]. For each trial, we construct two graphs, \(\mathbb {G}^{(1)}\) and \(\mathbb {G}^{(2)}\), with 20 inlier nodes each, and then add \(n_{out}\) outlier nodes to both graphs. The edges between nodes are randomly generated according to an edge density parameter \(\rho \). Each edge (i,j) in the first graph \(\mathbb {G}^{(1)}\) is assigned a random edge weight \(A^{(1)}_{i j}\) distributed uniformly in [0,1], and the weight of the corresponding edge (a,b) in \(\mathbb {G}^{(2)}\) is perturbed by random Gaussian noise, \(A^{(2)}_{a b}=A^{(1)}_{i j}+\epsilon \) with \(\epsilon \sim \mathcal {N}(0,\sigma ^2)\). The edge affinity is computed as \(K_{\mathrm {ind}(i,a)\mathrm {ind}(j,b)}=\exp \big (-(A^{(1)}_{i j}-A^{(2)}_{a b})^2/0.15\big )\) and the node affinity is set to zero.
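A simplified sketch of this protocol is given below (it keeps the inliers in corresponding order and omits any random permutation of \(\mathbb{G}^{(2)}\), both simplifying assumptions); the edge affinity then follows the Gaussian kernel above.

```python
import numpy as np

def synthetic_pair(n_in=20, n_out=0, rho=0.5, sigma=0.0, seed=0):
    """Random graph pair: shared, noise-perturbed inlier edges plus independent outliers."""
    rng = np.random.default_rng(seed)
    n = n_in + n_out

    def random_graph():
        W = np.triu(rng.uniform(0, 1, (n, n)) * (rng.uniform(size=(n, n)) < rho), k=1)
        return W + W.T                               # symmetric weights, density ~ rho

    A1, A2 = random_graph(), random_graph()
    inlier = A1[:n_in, :n_in]
    noise = np.triu(rng.normal(0, sigma, (n_in, n_in)), 1)
    noise = noise + noise.T
    A2[:n_in, :n_in] = np.where(inlier > 0, inlier + noise, 0)   # A2_ab = A1_ij + eps
    return A1, A2
```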

Fig. 1. Evaluation on the synthetic dataset under different settings.

We compare the performance of the algorithms under three different settings by varying the number of outliers \(n_{out}\), edge density \(\rho \) and edge noise \(\sigma \), respectively. For each setting, we construct 100 different pairs of graphs and evaluate the average matching accuracy and objective ratio. In the first setting (Fig. 1(a)), we fix edge density \(\rho =0.5\) and edge noise \(\sigma =0\), and increase the number of outliers \(n_{out}\) from 0 to 10. In the second setting (Fig. 1(b)), we change the edge noise parameter \(\sigma \) from 0 to 0.2 while fixing \(n_{out}=0\) and \(\rho =0.5\). In the last case (Fig. 1(c)), the edge density \(\rho \) ranges from 0.3 to 1, and the other two parameters are fixed as \(n_{out}=0\) and \(\sigma =0.1\).

It can be observed that, in almost all cases under varying parameters, our approach achieves the best performance in terms of both objective ratio and matching accuracy. From Fig. 1(c), the PSM, RRWM and GNCCP algorithms are comparable to our approach when the graphs are nearly fully connected (density parameter \(\rho \) close to 1). All algorithms fail to achieve reasonable solutions when graph pairs exhibit extreme deformation or sparsity. A comparison of running time is also provided in Fig. 1. Our approach takes several times the running time of the GNCCP algorithm, where the multiple depends on the number of explored additional branches.

6.2 CMU House Dataset

The CMU house dataset includes 111 frames of an image sequence, all containing the same house object undergoing gradual transformation across the sequence. To assess matching accuracy, following [2, 38], 30 landmarks were manually tracked and labeled across all frames. We matched all possible image pairs, in total 560 pairs gapped by 10, 20, ..., 100 frames, where larger sampling gaps imply larger deformation. To evaluate the graph matching algorithms against noise, we use two settings of node numbers \((n_1,n_2)\) = (30, 30) and (20, 30), where decreasing \(n_1\) implies more outliers. In the setting where \(n_1<30\), \(n_1\) points are randomly chosen out of the 30 landmark points.

Fig. 2. An example of graph matching on the CMU house dataset. The algorithm and the number of true matches out of the ground-truth matches are captioned for each subfigure. Graph edges are shown as yellow lines, true matches as green lines and false matches as red lines (best viewed in color; the same style is used in Figs. 5, 6 and 7). (Color figure online)

Fig. 3. Comparison of graph matching on the CMU house dataset.

Fig. 4. Evaluation on (a) the Pascal dataset, and (b) the Willow dataset.

We model each landmark as a graph node, and then build graph edges by Delaunay triangulation [39]. Each edge (ij) is associated with a weight \(A_{ij}\) which is computed as the Euclidean distance between the connected nodes \(v_i\) and \(v_j\). The node affinity is set to zero, and the edge affinity between edges (ij) in \(\mathbb {G}^{(1)}\) and (ab) in \(\mathbb {G}^{(2)}\) is computed as \(K_{\mathrm {ind}(i,a)\mathrm {ind}(j,b)}=\mathrm {exp}(-(A^{(1)}_{i j}-A^{(2)}_{a b})^2/2500)\).
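The graph construction for this experiment can be sketched as follows, using SciPy's Delaunay triangulation on the 2D landmark coordinates (an assumed input format).

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(points):
    """Weighted adjacency: Delaunay edges with Euclidean-distance weights (Sect. 6.2)."""
    n = len(points)
    A = np.zeros((n, n))
    for (i, j, k) in Delaunay(points).simplices:     # each simplex is a triangle
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = A[b, a] = np.linalg.norm(points[a] - points[b])
    return A

# edge affinity: K[ind(i,a), ind(j,b)] = exp(-(A1[i, j] - A2[a, b])**2 / 2500)
```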

Figure 2 presents an example of graph matching with 10 outliers and significant deformation. Figure 3 shows the performance curves for \(n_1\) = 30 and 20 with respect to varying sequence gaps. All algorithms except IPFP achieve perfect matching when no outliers exist (\(n_1 = 30\)). When we increase the number of outliers to 10 (\(n_1 = 20\)), our approach gains a remarkable improvement in both accuracy and objective compared with the original GNCCP algorithm. It is interesting to see that PSM attains matching accuracy comparable to our approach at certain sequence gaps, but achieves lower objectives.

6.3 Pascal Dataset

The Pascal dataset [40] consists of 30 pairs of car images and 20 pairs of motorbike images selected from Pascal 2007. The authors provide detected feature points and manually labeled ground-truth correspondences for each pair of images. To evaluate the performance of each algorithm against noise, we randomly select \(0\sim 20\) outlier nodes from the background.

For each node \(v_i\), we associate a feature \(p_i\), computed as the orientation of the normal vector to the contour at the point where \(v_i\) was sampled. The node affinity between node \(v_i\) in \(\mathbb {G}^{(1)}\) and node \(v_a\) in \(\mathbb {G}^{(2)}\) is consequently computed as \(\mathrm {exp}(-|p_i-p_a|)\). We use Delaunay triangulation to build graph edges, and associate each edge (i,j) with two features \(d_{ij}\) and \(\theta _{ij}\), where \(d_{ij}\) is the pairwise distance between the connected nodes \(v_i\) and \(v_j\), and \(\theta _{ij}\) is the absolute angle between the edge and the horizontal line. Consequently, the edge affinity between edges (i,j) in \(\mathbb {G}^{(1)}\) and (a,b) in \(\mathbb {G}^{(2)}\) is computed as \(K_{\mathrm {ind}(i,a)\mathrm {ind}(j,b)}=\mathrm {exp}(-(|d_{ij}-d_{ab}|+|\theta _{ij}-\theta _{ab}|)/2)\).
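The node and edge affinities of this section can be sketched as below; the dense arrays of per-edge features are an assumed precomputed representation.

```python
import numpy as np

def pascal_affinities(p1, p2, d1, th1, d2, th2):
    """Node affinity exp(-|p_i - p_a|) and edge affinity of Sect. 6.3.

    p1, p2   : contour-normal orientations of the nodes of G1 and G2.
    d*, th*  : pairwise distances and absolute edge angles (defined on graph edges).
    """
    node_aff = np.exp(-np.abs(p1[:, None] - p2[None, :]))            # C[i, a]
    def edge_aff(i, j, a, b):
        return np.exp(-(abs(d1[i, j] - d2[a, b]) + abs(th1[i, j] - th2[a, b])) / 2)
    return node_aff, edge_aff
```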

Figure 5 presents an example of graph matching on motorbike images (with 10 outliers). The matching accuracy and objective ratio of each algorithm with respect to the number of outliers are summarized in Fig. 4(a). It can be observed that our approach outperforms the other algorithms remarkably in both matching accuracy and objective ratio. In the case where no outliers exist, our method achieves a matching rate of nearly 90%, much higher than that of the other algorithms.

Fig. 5. A matching example of motorbike images in the Pascal dataset.

6.4 Willow Object Dataset

In this experiment, we create 500 pairs of images using the Willow object class dataset [41]. This dataset provides images of five classes, namely car, duck, face, motorbike and winebottle. Each class contains at least 40 images with different instances, and 10 distinctive landmarks were manually labeled on the target object across all images in each class. We randomly select 100 pairs of images from each class.

We use the Hessian detector [42] to extract interest points and the SIFT descriptor [43] to represent the node attributes. To test the performance against noise, we randomly select \(0\sim 10\) outlier nodes from the background. We use the Delaunay triangulation to connect nodes and compute the affinity between nodes via their appearance similarity. The edge affinity is computed following the method used in Sect. 6.3.

Figure 6 shows a representative example of graph matching from the duck class (with 10 outliers). The comparison of matching accuracy and objective ratio of each algorithm is summarized in Fig. 4(b). Our approach achieves a remarkable improvement over GNCCP and outperforms the other algorithms.

Fig. 6. A matching example of duck images on the Willow dataset.

6.5 Caltech Image Dataset

The Caltech image dataset provided by Cho et al. [13] contains 30 pairs of real images. The authors provide detected MSER keypoints [44], initial matches, affinity matrix, and manually labeled ground-truth correspondences for each image pair. In [13], the low-quality candidate matches are filtered out according to the distance between SIFT features [43]. The affinity matrix is consequently computed by the mutual projection error function [45].

Figure 7 shows a representative example of graph matching with significant deformation and plenty of repeated patterns. The matching accuracy and objective ratio of each algorithm are summarized in Table 1. Our approach gains a remarkable improvement over the original GNCCP algorithm in both matching accuracy and objective ratio. It is interesting to see that RRWM performs better in terms of accuracy, whereas our approach obtains much higher objectives.

Fig. 7. A matching example on the Caltech dataset.

Table 1. Comparison of graph matching on the Caltech dataset (the top 1 and top 2 results are highlighted).

7 Conclusion

In this paper, we proposed a novel branching path following strategy for graph matching, aiming to improve matching performance. To avoid the pitfalls at singular points in the original path following strategy, our new strategy first discovers singular points and then branches new paths from them in search of potentially better solutions. We integrated the strategy into a state-of-the-art graph matching algorithm that employs the original path following strategy. Experimental results show that our approach gains a remarkable improvement in matching performance compared to the original algorithm, and also outperforms other state-of-the-art algorithms.