Elsevier

Pattern Recognition

Volume 46, Issue 2, February 2013, Pages 551-565

Fuzzy multilevel graph embedding

https://doi.org/10.1016/j.patcog.2012.07.029

Abstract

Structural pattern recognition approaches offer the most expressive, convenient and powerful, but computationally expensive, representations of underlying relational information. To benefit from the mature, less expensive and efficient state-of-the-art machine learning models of statistical pattern recognition, these representations must be mapped to a low-dimensional vector space. Our method of explicit graph embedding bridges the gap between structural and statistical pattern recognition. We extract the topological, structural and attribute information from a graph and encode numeric details by fuzzy histograms and symbolic details by crisp histograms. The histograms are concatenated to achieve a simple and straightforward embedding of a graph into a low-dimensional numeric feature vector. Experimentation on standard public graph datasets shows that our method outperforms the state-of-the-art methods of graph embedding for richly attributed graphs.

Highlights

  • We propose an explicit graph embedding method.

  • We perform multilevel analysis of a graph to extract global, topological/structural and attribute information.

  • We use the homogeneity of subgraphs in a graph for extracting topological/structural details.

  • We encode numeric information by fuzzy histograms and symbolic information by crisp histograms.

  • Our method outperforms graph embedding methods for richly attributed graphs.

Introduction

Pattern recognition has emerged as an important research domain and has supported the development of numerous applications in many different areas of activity. For a general introduction to the field we refer the interested reader to [1], [2]. The methods for pattern recognition are broadly categorized as statistical, structural or syntactic approaches [3]. In this paper we address the lack of computational tools for structural pattern recognition and propose to exploit the computational efficiency of statistical pattern recognition. This permits a pattern recognition application to benefit from the representational power of structural methods and the computational efficiency of statistical methods, while avoiding the limitations of both. The next two paragraphs briefly introduce the main advantages and limitations of structural and statistical pattern recognition.

Structural pattern recognition is characterized by the use of symbolic data structures, i.e. graphs, strings and trees. Graphs are widely used in structural pattern recognition and can safely be termed representative of symbolic data structures (strings and trees are special instances of graphs [4]). Graphs provide a convenient and powerful representation of relational information. They are able to represent not only the values of both symbolic and numeric properties of an object, but can also explicitly model the spatial, temporal and conceptual relations that exist between its parts. Moreover, graphs do not suffer from the constraint of fixed dimensionality. For example, the number of nodes and edges in a graph is not limited a priori and depends on the size and the complexity of the actual object to be modeled [5]. Above all, graphs rest on a strong mathematical formulation and a mature theory. However, two serious drawbacks of graph based representations are that they are sensitive to noise and that the algorithmic tools for performing different operations on them are computationally expensive. For instance, the much needed operation of graph matching is costly: subgraph isomorphism is NP-complete, and no polynomial-time algorithm is known for graph isomorphism. For further reading on structural pattern recognition we refer the interested reader to [4], [5], [6], [7], [8].

Statistical pattern recognition is characterized by the use of numeric feature vectors. A very important advantage of these representations is that, because of their simple structure, the basic operations used in machine learning can easily be executed on them. This makes a large number of mature algorithms for pattern analysis and classification immediately available to statistical pattern recognition. As a result, statistical pattern recognition offers state-of-the-art, computationally efficient tools for learning, classification and clustering. However, feature vector based representations have representational limitations, which arise from their simple structure and from the fact that they have the same length and structure regardless of the complexity of the object to be modeled [9]. For further reading on statistical pattern recognition and classification we refer the interested reader to [10].

Graph embedding is a natural outcome of parallel advancements in structural and statistical pattern recognition. Over decades, the pattern recognition research community has developed a range of expressive and powerful approaches for diverse problem domains. Graph based structural representations are usually employed for extracting the structure, topology and geometry, in addition to the statistical details, of the underlying data. These representations could not be exploited to their full strength during the next step in the processing chain, because of the limited availability of computational tools. On the other hand, the efficient and mature computational models offered by statistical approaches work only on vector data and cannot be directly applied to high-dimensional structural representations. Graph embedding acts as a bridge between structural and statistical approaches [3], [6], [11], and allows a pattern recognition method to benefit from the computational efficiency of state-of-the-art statistical models and tools [12] along with the convenience and representational power of classical symbolic representations [7]. This makes it possible to address the problems of learning, classification and clustering for graphs, which are among the most basic tasks in pattern recognition [13]. Graph embedding applies to the whole variety of domains that are addressed by pattern recognition and where the use of a relational data structure is mandatory for performing high level tasks. Graph embedding methods are also employed to solve computationally hard problems geometrically [14]. For further reading on graph embedding we refer the interested reader to [15], [16].

Graph embedding methods are formally categorized as implicit graph embedding or explicit graph embedding. The implicit graph embedding methods are based on graph kernels. A graph kernel is a function that can be thought of as a dot product in some implicitly existing vector space. Instead of mapping graphs from graph space to vector space and then computing their dot product, the value of the kernel function is evaluated in graph space. Since it does not explicitly map a graph to a point in vector space, a strict limitation of implicit graph embedding is that it does not permit all the operations that could be defined on vector spaces. For further reading on graph kernels and implicit graph embedding we refer the interested reader to [16], [17]. The more useful explicit graph embedding methods explicitly embed an input graph into a feature vector and enable the use of all the methodologies and techniques devised for vector spaces. The vectors obtained by an explicit graph embedding method can also be employed in a standard dot product for defining an implicit graph embedding function between two graphs [4]. An interesting property of explicit graph embedding is that it embeds graphs in pattern spaces in such a manner that similar structures lie close to each other and dissimilar structures lie far apart, i.e. an implicit clustering is achieved [18]. Another important property of explicit graph embedding is that graphs of different size and order are embedded into a fixed size feature vector. This means that, for constructing the feature vector, an important step is to identify the details that are available in all the graphs and are applicable to a broad range of graph types. For further reading on explicit graph embedding we refer the interested reader to [16].
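The link between the two families can be illustrated with a minimal Python sketch: once graphs have been embedded explicitly, a standard dot product over their vectors defines a kernel, while the vectors themselves remain available for any other vector-space operation. The embedding values below are hypothetical placeholders rather than the output of any method discussed here, and `linear_kernel_matrix` is an illustrative name.

```python
import numpy as np


def linear_kernel_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Gram matrix K[i, j] = <phi(g_i), phi(g_j)> of explicitly embedded graphs."""
    return embeddings @ embeddings.T


# Hypothetical explicit embeddings phi(g) of three graphs (one row per graph).
phi = np.array([
    [4.0, 3.0, 0.5],
    [4.0, 4.0, 0.5],
    [9.0, 12.0, 0.1],
])

# Kernel values obtained from the explicit vectors.
K = linear_kernel_matrix(phi)

# Operations an implicit embedding does not offer directly,
# for example the mean vector of a set of graphs.
centroid = phi.mean(axis=0)
```

A Gram matrix built this way is symmetric and positive semi-definite, so it can be handed to any kernel machine, whereas the reverse direction (recovering vectors from a graph kernel) is not generally available.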

Graph embedding is an interesting approximate solution for addressing the problem of inexact graph matching, which belongs to the class of NP-complete problems. By mapping a high dimensional graph into a point in a suitable vector space, graph embedding makes it possible to perform the basic mathematical computations that are required by various statistical pattern recognition techniques, and offers interesting solutions to the problems of graph clustering and classification. However, in our opinion, because the resulting feature vector cannot preserve the matching between nodes of graphs, graph embedding is unable to address the problem of graph isomorphism (i.e. exact graph matching).

In this paper we present an unsupervised method for explicit embedding of directed and undirected attributed graphs with many numeric as well as symbolic attributes on both nodes and edges (which represent a very general super class of graphs) into feature vectors. The method is equally applicable to strings and trees. We employ fuzzy logic for addressing the noise sensitivity of graph based representations whilst achieving a simple and straightforward embedding of the topological, structural and attribute information of a graph into a low-dimensional numeric feature vector. The method is named Fuzzy Multilevel Graph Embedding and abbreviated as FMGE. It embeds an attributed graph into a feature vector by extracting graph level details, subgraph homogeneity details and elementary level details. The feature vector is constructed by employing a direct encoding of graph level details followed by an encoding of the distribution of subgraph homogeneity and elementary level details of the graph. The latter is achieved by constructing fuzzy histograms for numeric information and crisp histograms for symbolic information. The parameters for these histograms are learned during a prior unsupervised learning phase which does not necessarily require any labeled learning set. The work presented in this paper is an evolved version of our previous work [19]. Apart from formalizing our previously proposed graph embedding method to be generally applicable to a wide range of graph representations, and the theoretical contributions of our work highlighted in the next section, the experimentation has been enlarged by testing on more graph databases, and the method is applied to the real problem of graph retrieval and subgraph spotting.
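The histogram encoding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three trapezoidal intervals and the attribute values are all hypothetical, and the intervals are given by hand here rather than learned.

```python
from collections import Counter


def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with feet a, d and shoulders b, c (a <= b <= c <= d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzy_histogram(values, intervals):
    """Each numeric value adds its membership degree to every interval it touches."""
    hist = [0.0] * len(intervals)
    for v in values:
        for i, (a, b, c, d) in enumerate(intervals):
            hist[i] += trapezoid(v, a, b, c, d)
    return hist


def crisp_histogram(labels, vocabulary):
    """Each symbolic value is counted in exactly one bin."""
    counts = Counter(labels)
    return [float(counts[s]) for s in vocabulary]


# Hypothetical overlapping fuzzy intervals for a numeric edge attribute.
intervals = [(0.0, 0.0, 1.0, 2.0), (1.0, 2.0, 3.0, 4.0), (3.0, 4.0, 5.0, 5.0)]
edge_lengths = [0.5, 1.5, 3.5]
node_labels = ["corner", "corner", "endpoint"]
vocabulary = ["corner", "endpoint", "junction"]

fuzzy = fuzzy_histogram(edge_lengths, intervals)
crisp = crisp_histogram(node_labels, vocabulary)
embedding = fuzzy + crisp  # concatenation gives a fixed-length vector
```

Because neighbouring intervals overlap, a value near a boundary contributes partly to both bins, so a small perturbation of an attribute shifts the histogram gradually instead of moving a whole count from one bin to the next. This is the property that makes the fuzzy encoding less sensitive to noise than a crisp one.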

In Section 2 we outline related works on graph embedding. Section 3 presents definitions and formalizes the notation used in this paper. Section 4 introduces an overall global description of FMGE. In Section 5 we present details on the unsupervised learning phase and graph embedding phase of FMGE. Experimental results are presented in Section 6 and are followed by a discussion on the parameters of FMGE in Section 7. The paper concludes by presenting future directions of work in Section 8.


Related works

In the literature, the problem of graph embedding has been approached by three important families of algorithms. Recent surveys on graph embedding are presented in [4], [15], [16]. Among these, a first family of graph embedding methods is based on the frequencies of appearance of specific knowledge-dependent substructures in a graph. These works are mostly proposed for chemical compounds and molecular structures. Graph representations of molecules are assigned feature vectors whose components are

Definitions and notations

Definition 1 Attributed graph (AG)

Let $A_V$ and $A_E$ denote the domains of possible values for attributed vertices and edges respectively. These domains are assumed to include a special value that represents a null value of a vertex or an edge. In this paper the term attributed graph is used to refer to an undirected attributed graph, unless explicitly specified. An attributed graph $AG$ over $(A_V, A_E)$ is defined to be a four-tuple $AG = (V, E, \mu_V, \mu_E)$, where

  • $V$ is a set of vertices,

  • $E \subseteq V \times V$ is a set of edges,

  • $\mu_V : V \rightarrow A_V^k$ is a function assigning $k$

Overview of fuzzy multilevel graph embedding (FMGE)

A block diagram of FMGE is presented in Fig. 1. It accepts a collection of m attributed graphs as input and encodes their topological, structural and attribute details into m equal size feature vectors. The feature vector of FMGE is named as Fuzzy Structural Multilevel Feature Vector and abbreviated as FSMFV.
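To make the input side concrete, an attributed graph can be held in a small container such as the one sketched below. The class `AttributedGraph` and its methods are hypothetical names chosen for illustration, the attribute names are invented, and treating graph order and graph size as graph level details is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    # vertex id -> attribute dict; values may be numeric or symbolic
    nodes: dict = field(default_factory=dict)
    # (u, v) pair -> attribute dict; undirected, so each pair is stored once
    edges: dict = field(default_factory=dict)

    def order(self) -> int:
        """Number of vertices."""
        return len(self.nodes)

    def size(self) -> int:
        """Number of edges."""
        return len(self.edges)

    def degrees(self) -> list:
        """Degree of every vertex, in vertex insertion order."""
        deg = {v: 0 for v in self.nodes}
        for (u, v) in self.edges:
            deg[u] += 1
            deg[v] += 1
        return [deg[v] for v in self.nodes]


g = AttributedGraph(
    nodes={"a": {"label": "corner"}, "b": {"label": "corner"}, "c": {"label": "endpoint"}},
    edges={("a", "b"): {"length": 1.5}, ("b", "c"): {"length": 0.5}},
)
graph_level = [g.order(), g.size()]
```

The degree list, like every per-node or per-edge attribute list, grows with the graph. Summarizing such lists by histograms with a fixed number of bins is what allows graphs of different sizes to be mapped to vectors of equal length.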

Framework of fuzzy multilevel graph embedding (FMGE)

In the FMGE framework, the mapping of the input collection of graphs to appropriate points in a suitable vector space $\mathbb{R}^n$ is achieved in two phases, i.e. the off-line unsupervised learning phase and the on-line graph embedding phase. The unsupervised learning phase learns a set of fuzzy intervals for the features linked to distribution analysis of the input graphs, i.e. features for node degree, numeric node and edge attributes and the corresponding resemblance attributes. We refer to each of them as an

Experimentations

The experimentation has been performed to confirm that FMGE and the subsequent classification and clustering in real-valued vector space are not only applicable to different graph classification and clustering problems, but also that in certain cases they outperform the classical techniques for the latter. We have employed various standard datasets from the fields of graphics recognition, object recognition and document image analysis for addressing the problems of recognition of graphic

Discussion

We have outlined the use of two basic discretization techniques in this paper. However, FMGE is fully capable of employing sophisticated state-of-the-art discretization methods. Also, our proposed framework employs the trapezoidal membership function from fuzzy logic, but FMGE is capable of utilizing any of the available membership functions. In light of domain knowledge, appropriate choices can be made for the discretization technique and the fuzzy membership function.
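One way to picture how a discretization technique and a trapezoidal membership function fit together is the following Python sketch. It assumes equal-frequency binning as the discretization (the text above does not name the two techniques, so this choice is an assumption) and turns each crisp cut point into a fuzzy transition zone. The function name `learn_trapezoids` and the `overlap` parameter are hypothetical.

```python
import numpy as np


def learn_trapezoids(values, n_bins=3, overlap=0.5):
    """Derive overlapping trapezoids (a, b, c, d) from equal-frequency bins.

    overlap in [0, 1] controls how far each fuzzy transition zone
    extends into the two bins that meet at a cut point.
    """
    data = np.asarray(values, dtype=float)
    # Crisp cut points: quantiles at 0, 1/n_bins, ..., 1.
    edges = np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))
    widths = np.diff(edges)
    # Half-width of the transition zone around each interior cut point.
    delta = np.zeros(n_bins + 1)
    for i in range(1, n_bins):
        delta[i] = 0.5 * overlap * min(widths[i - 1], widths[i])
    trapezoids = []
    for i in range(n_bins):
        a = edges[i] - delta[i]
        b = edges[i] + delta[i]
        c = edges[i + 1] - delta[i + 1]
        d = edges[i + 1] + delta[i + 1]
        trapezoids.append((float(a), float(b), float(c), float(d)))
    return trapezoids


# Hypothetical values of one numeric attribute collected from a set of graphs.
observed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
trapezoids = learn_trapezoids(observed, n_bins=3, overlap=0.5)
```

In this construction the falling slope of one trapezoid coincides with the rising slope of the next, so the memberships of any value inside the observed range sum to one. Swapping in another discretization only changes how `edges` is computed, and swapping in another membership function only changes how the cut points are turned into shapes.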

An important

Conclusion and perspectives

We have presented a method of explicit graph embedding, with the aim of bridging the gap between structural and statistical approaches to pattern recognition. Our work proposes a straightforward, simple and computationally efficient solution for facilitating the use of powerful graph based representations together with the learning and computational strengths of state-of-the-art machine learning, classification and clustering. The proposed method exploits multilevel analysis of graph for embedding it

Acknowledgments

This work was partially supported by the Spanish projects TIN2008-04998, TIN2009-14633-C03-03 & CSD2007-00018 and partially by the PhD grant PD-2007- 1/Overseas/FR/HEC/222 from Higher Education Commission of Pakistan.


References (41)

  • A. Shokoufandeh et al., Indexing hierarchical structures using graph spectra, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
  • R. Duda et al. (2000)
  • V. Roth et al., Optimal cluster preserving embedding of nonmetric proximity data, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
  • T. Chen, Q. Yang, X. Tang, Directed graph embedding, in: International Joint Conference on Artificial Intelligence,...
  • E.L. Nathan Linial et al., The geometry of graphs and some of its algorithmic applications, Combinatorica (1995)
  • B. Shaw, T. Jebara, Structure preserving embedding, in: International Conference on Machine Learning, 2009, pp....
  • G. Lee, A. Madabhushi, Semi-supervised graph embedding scheme with active learning (SSGEAL): classifying high...
  • P. Foggia, M. Vento, Graph embedding for pattern recognition, in: D. Ünay, Z. Çataltepe, S. Aksoy (Eds.), Recognizing...
  • K. Riesen et al., Graph Classification and Clustering Based on Vector Space Embedding (2010)
  • R.C. Wilson et al., Pattern vectors from algebraic graph theory, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)

    Muhammad Muzzamil Luqman received his Ph.D. degree in Computer Science from François-Rabelais University of Tours, France and Autonoma University of Barcelona, Spain in 2012. He got his masters from François-Rabelais University of Tours, France in 2008. In 2004 he was awarded gold medal and academic roll of honor by Government College University Lahore, Pakistan for achieving distinction in Bachelors of Computer Science (honors). Currently Luqman is a teaching and research assistant at François-Rabelais University of Tours, France. His main research interests include Structural Pattern Recognition, Machine Learning, Document Image Analysis/Recognition and Graphics Recognition.

    Jean-Yves Ramel received his Ph.D. degree in Computer Sciences in 1996 from the National Institute of Applied Sciences of Lyon (INSA Lyon, France). After being an Assistant Professor at the Industrial Engineering Department of the National Institute of Applied Sciences of Lyon from 1998 to 2002, Jean-Yves Ramel is currently a Full Professor at the Computer Science Department of PolytechTours (French engineering school). He is also a staff researcher of the Computer Science Laboratory of Tours in the Image Analysis and Pattern Recognition Group (EA 2101). His current research fields are image analysis, document image indexation and structural pattern recognition. He has been the head of a number of Image Analysis and Indexation R+D projects and has published several papers in national and international conferences and journals. Jean-Yves Ramel is an active member of the Pattern Recognition for Image Understanding French Association (AFRIF), a member society of the IAPR (TC-10, TC-11 and TC-15), and also a PC member of a number of international conferences. His team has developed several open source software packages dealing with Document Image Analysis. Jean-Yves Ramel was the recipient of a Google Digital Humanities Award in 2010 and also has experience in technological transfer and patent registration.

    Josep Lladós received the degree in Computer Sciences in 1991 from the Universitat Politècnica de Catalunya and the Ph.D. degree in Computer Sciences in 1997 from the Universitat Autònoma de Barcelona (Spain) and the Université Paris 8 (France). Currently he is an Associate Professor at the Computer Science Department of the Universitat Autònoma de Barcelona and a staff researcher of the Computer Vision Center, where he is also the director. He is the head of the Pattern Recognition and Document Analysis Group (2005SGR-00472). His current research fields are document analysis, graphics recognition and structural and syntactic pattern recognition. He has been the head of a number of Computer Vision R+D projects and has published several papers in national and international conferences and journals. J. Lladós is an active member of the Image Analysis and Pattern Recognition Spanish Association (AERFAI), a member society of the IAPR. He is currently the chairman of the IAPR-ILC (Industrial Liaison Committee). Formerly he served as chairman of the IAPR TC-10, the Technical Committee on Graphics Recognition, and he is also a member of the IAPR TC-11 (Reading Systems) and IAPR TC-15 (Graph based Representations). He serves on the Editorial Board of the ELCVIA (Electronic Letters on Computer Vision and Image Analysis) and the IJDAR (International Journal in Document Analysis and Recognition), and is also a PC member of a number of international conferences. He was the recipient of the IAPR-ICDAR Young Investigator Award in 2007. Josep Lladós also has experience in technological transfer, and in 2002 he created the company ICAR Vision Systems, a spin-off of the Computer Vision Center working on Document Image Analysis, after winning the entrepreneurs award from the Catalonia Government on business projects on Information Society Technologies in 2000.
