Elsevier

Pattern Recognition Letters

Volume 100, 1 December 2017, Pages 96-103
Graph edit distance contest: Results and future challenges

https://doi.org/10.1016/j.patrec.2017.10.007

Highlights

  • The context of the Graph Edit Distance Contest (GDC), organized during ICPR 2016, is presented.

  • Eight methods from three research groups are evaluated.

  • The evaluation metrics, methods and datasets of GDC are described in detail.

  • A crystal clear picture of the accuracy and speed of each method is provided.

  • Future challenges and possible tracks in graph edit distance are highlighted.

Abstract

The Graph Distance Contest (GDC) was organized in the context of ICPR 2016. Its main goal was to examine and report the performance and effectiveness of exact and approximate graph edit distance methods by comparison with a ground truth. This paper presents the context of the competition, the metrics and datasets used for evaluation, and the results obtained by the eight submitted methods. Results are analyzed and discussed in terms of computation time and accuracy. We also highlight future challenges in graph edit distance regarding both future methods and evaluation metrics. The contest was supported by the Technical Committee on Graph-Based Representations in Pattern Recognition (TC-15) of the International Association for Pattern Recognition (IAPR).

Introduction

Computing a similarity or dissimilarity measure between graphs is a major challenge in pattern recognition. One of the best-known and most widely used ways to compute a distance between two graphs is the Graph Edit Distance (GED). Computing the GED consists of finding a minimum-cost sequence of graph edit operations (insertions, deletions and substitutions of vertices and edges) that transforms one graph into another. However, computing the GED is NP-hard. Therefore, over the last four decades, several approaches have been proposed to compute approximations in polynomial time [22].
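To make the definition concrete, the following minimal sketch computes the exact GED of two tiny graphs by enumerating every node mapping. It assumes undirected graphs with labeled vertices, unlabeled edges, and a unit cost for every edit operation; these cost choices and the names `mapping_cost` and `exact_ged` are illustrative assumptions, not the settings used in the contest. The enumeration grows factorially with the number of vertices, so it is only usable on toy inputs.

```python
from itertools import permutations


def mapping_cost(nodes1, edges1, nodes2, edges2, mapping):
    """Cost of the edit path induced by a node mapping (unit costs, assumed)."""
    cost = 0
    # Vertex deletions (target is None) and substitutions (labels differ).
    for u, target in mapping.items():
        if target is None or nodes1[u] != nodes2[target]:
            cost += 1
    # Vertices of the second graph that nothing maps to must be inserted.
    used = {t for t in mapping.values() if t is not None}
    cost += len(nodes2) - len(used)
    # An edge of the first graph is kept only if its image is an edge of the
    # second graph; otherwise it is deleted.
    matched = set()
    for edge in edges1:
        u, v = tuple(edge)
        mu, mv = mapping[u], mapping[v]
        image = frozenset((mu, mv))
        if mu is not None and mv is not None and image in edges2:
            matched.add(image)
        else:
            cost += 1
    # Edges of the second graph that are not the image of any edge are inserted.
    cost += len(edges2) - len(matched)
    return cost


def exact_ged(nodes1, edges1, nodes2, edges2):
    """Brute-force GED: try every injective mapping, with None meaning deletion."""
    sources = list(nodes1)
    targets = list(nodes2) + [None] * len(sources)
    best = float("inf")
    for image in set(permutations(targets, len(sources))):
        mapping = dict(zip(sources, image))
        best = min(best, mapping_cost(nodes1, edges1, nodes2, edges2, mapping))
    return best


# Toy example: a three-vertex path C-C-O versus a single edge C-N.
g1_nodes = {0: "C", 1: "C", 2: "O"}
g1_edges = {frozenset((0, 1)), frozenset((1, 2))}
g2_nodes = {0: "C", 1: "N"}
g2_edges = {frozenset((0, 1))}
print(exact_ged(g1_nodes, g1_edges, g2_nodes, g2_edges))  # 3
```

In this toy case the cheapest edit path deletes one end vertex together with its edge and relabels one remaining vertex, for a total cost of 3 under the assumed unit costs.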

This paper reports the results of the Graph Distance Contest (GDC), which was organized in the context of ICPR 2016. The aim of the contest was to examine the performance and effectiveness of recent methods that compute an exact or an approximate GED. The methods were assessed on the quality of the distances they output and on their execution times. Seven datasets were included, each composed of several types of graphs with symbolic or numerical attributes attached to vertices and edges.

GDC was open to any method that computes a sequence of edit operations transforming one graph into another. All participants were required to download the datasets in order to prepare their submissions. All programs were executed by the organizers on the same computer. Two constraints were imposed. First, no submitted method could exceed 30 s per graph comparison. Second, parallel methods were limited to 4 threads.

This paper is organized as follows. Section 2 describes the methods submitted to GDC. Section 3 specifies the protocol and the datasets used for the contest. The results are presented and discussed in Section 4. A complementary and exhaustive presentation of the results is provided on the GDC website, http://gdc2016.greyc.fr. Finally, Section 5 highlights the bottlenecks of both the tested methods and the GED performance evaluation metrics, and proposes some possible directions for overcoming them.

Inspected methods

Eight methods proposed by three different research groups were submitted. The beam search algorithm of Neuhaus et al. [21] was also added to the list of inspected methods. All these methods can be broadly divided into three categories.

Protocol and datasets

Our evaluation was conducted on a machine with four quad-core AMD Opteron 8350 processors clocked at 2.0 GHz and 16 GB of memory. The maximum number of threads was limited to 4 (i.e., for F24threads and PDFS). The time constraint used in GDC was fixed to 30 s: methods that needed more than 30 s were stopped, and the best answer found so far was output. Note that F2, F24threads, DF and PDFS are exact algorithms when no time constraint is imposed.
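The stopping rule described above (interrupt at the time limit and report the best answer found so far) can be sketched as a generic anytime loop. This is a minimal illustration, not the contest harness: the function name `anytime_best`, the `candidates` iterable, and the toy cost function are hypothetical stand-ins for a method that enumerates candidate edit paths.

```python
import time
from itertools import permutations


def anytime_best(candidates, cost, time_limit_s):
    """Scan candidates until the deadline and keep the cheapest one seen.

    Returns (best_cost, best_candidate, finished). `finished` is True only
    if every candidate was examined, in which case the answer is exact.
    """
    deadline = time.monotonic() + time_limit_s
    best_cost, best = float("inf"), None
    finished = True
    for index, candidate in enumerate(candidates):
        # Always evaluate the first candidate so that some answer is returned.
        if index > 0 and time.monotonic() >= deadline:
            finished = False
            break
        current = cost(candidate)
        if current < best_cost:
            best_cost, best = current, candidate
    return best_cost, best, finished


# Toy stand-in for an edit-path cost: mismatches against a hidden target.
target = (3, 2, 1, 0)


def toy_cost(perm):
    return sum(1 for a, b in zip(perm, target) if a != b)


# With a generous budget the scan completes and the result is optimal.
print(anytime_best(permutations(range(4)), toy_cost, 5.0))
# With a zero budget only the first candidate is scored (an upper bound).
print(anytime_best(permutations(range(4)), toy_cost, 0.0))
```

The `finished` flag separates the two situations that the protocol distinguishes: a search that ran to completion yields an exact value, while an interrupted one yields only an upper bound on the distance.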

Results and discussion

In this section, we present and analyze the results obtained by the submitted methods on all datasets. Table 3 summarizes the GED methods that were included in GDC.

For the sake of clarity, we summarize our conclusions in figures. For exhaustive numerical results, we refer the interested reader to the contest website: http://gdc2016.greyc.fr.

Challenges in graph edit distance

In this section, we present and discuss perspectives for the evaluated methods and the GED challenges expected in the near future. Regarding the three inspected GED categories, the branch-and-bound based algorithms are highly dependent on the lower bound. These algorithms could be improved by devising better lower bounds, possibly with the help of machine learning techniques. On the other hand, the current implementations of the assignment-based algorithms cannot handle graphs with numeric attributes.
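To illustrate why the lower bound matters, the sketch below runs a depth-first branch-and-bound over vertex mappings twice, once with a trivial bound of zero and once with a simple label-counting bound, and reports how many search nodes each run visits. It is a hedged toy example: the unit costs, the unlabeled undirected edges, the label-counting bound, and the names `label_lower_bound` and `branch_and_bound_ged` are assumptions made for illustration and are not the bounds used by the submitted methods.

```python
from collections import Counter


def label_lower_bound(labels1_left, labels2_left):
    """Admissible bound on the remaining vertex edit cost under unit costs."""
    overlap = sum((Counter(labels1_left) & Counter(labels2_left)).values())
    return max(len(labels1_left), len(labels2_left)) - overlap


def branch_and_bound_ged(nodes1, edges1, nodes2, edges2, use_bound=True):
    """Depth-first branch-and-bound; returns (ged, number of visited nodes)."""
    order = list(nodes1)
    best = float("inf")
    visited = 0

    def partial_cost(mapping):
        # Vertex deletions and substitutions decided so far.
        cost = sum(1 for u, t in mapping.items()
                   if t is None or nodes1[u] != nodes2[t])
        # Deletions of edges whose two endpoints are both decided.
        for edge in edges1:
            u, v = tuple(edge)
            if u in mapping and v in mapping:
                mu, mv = mapping[u], mapping[v]
                if mu is None or mv is None or frozenset((mu, mv)) not in edges2:
                    cost += 1
        return cost

    def full_cost(mapping):
        used = {t for t in mapping.values() if t is not None}
        kept = sum(1 for edge in edges1
                   if all(mapping[u] is not None for u in edge)
                   and frozenset(mapping[u] for u in edge) in edges2)
        # Add vertex insertions and edge insertions to the partial cost.
        return (partial_cost(mapping)
                + (len(nodes2) - len(used)) + (len(edges2) - kept))

    def search(depth, mapping, unused):
        nonlocal best, visited
        visited += 1
        if depth == len(order):
            best = min(best, full_cost(mapping))
            return
        u = order[depth]
        for target in unused + [None]:
            mapping[u] = target
            rest2 = [y for y in unused if y != target]
            h = 0
            if use_bound:
                h = label_lower_bound([nodes1[x] for x in order[depth + 1:]],
                                      [nodes2[y] for y in rest2])
            # Prune branches that cannot beat the best complete mapping.
            if partial_cost(mapping) + h < best:
                search(depth + 1, mapping, rest2)
            del mapping[u]

    search(0, {}, list(nodes2))
    return best, visited


g1_nodes = {0: "C", 1: "C", 2: "O"}
g1_edges = {frozenset((0, 1)), frozenset((1, 2))}
g2_nodes = {0: "C", 1: "N"}
g2_edges = {frozenset((0, 1))}
print(branch_and_bound_ged(g1_nodes, g1_edges, g2_nodes, g2_edges, use_bound=True))
print(branch_and_bound_ged(g1_nodes, g1_edges, g2_nodes, g2_edges, use_bound=False))
```

Both runs return the same distance because the bound never overestimates the remaining cost; the run with the label-counting bound can only visit the same number of search nodes or fewer, and a tighter bound would prune more.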

References (29)

  • V. Carletti et al. Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. Graph-Based Representations in Pattern Recognition (2015)

  • X. Cortés et al. Active-learning query strategies applied to select a graph node given a graph labelling. IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition (2013)

  • X. Cortés et al. Learning graph matching substitution weights based on the ground truth node correspondence. Int. J. Pattern Recognit. Artif. Intell. (2016)

  • B. Gaüzère et al. Approximate graph edit distance guided by bipartite matching of bags of walks. Structural, Syntactic, and Statistical Pattern Recognition (2014)