Improving vector space embedding of graphs through feature selection algorithms

doi:10.1016/j.patcog.2010.05.016

Pattern Recognition

Volume 44, Issue 9, September 2011, Pages 1928-1940

https://doi.org/10.1016/j.patcog.2010.05.016

Abstract

Graph-based pattern representation offers a versatile alternative to vectorial data structures. Therefore, a growing interest in graphs can be observed in various fields. However, a serious limitation in the use of graphs is the lack of elementary mathematical operations in the graph domain, which are required by many pattern recognition algorithms. In order to overcome this limitation, the present paper proposes an embedding of a given graph population in a vector space $\mathbb{R}^{n}$. The key idea of this embedding approach is to interpret the distances of a graph g to a number of prototype graphs as numerical features of g. In previous works, the prototypes were selected beforehand with heuristic selection algorithms. In the present paper we take a more fundamental approach and regard the problem of prototype selection as a feature selection or dimensionality reduction problem, for which many methods are available. With several experiments we show the feasibility of graph embedding based on prototypes obtained from such feature selection algorithms and demonstrate their potential to outperform previous approaches.

Introduction

After decades of focusing on independent and identically distributed representation formalisms in pattern recognition, more and more effort is now being devoted in various research fields to graph-based representation [1]. Object representation by means of graphs is advantageous compared to vectorial approaches for two reasons. First, graphs do not suffer from the constraint of fixed dimensionality. That is, the number of nodes and edges in a graph is not limited a priori and depends on the size and the complexity of the actual object to be modeled. Second, graphs are able to represent not only the values of object properties, i.e. features, but can also be used to explicitly model relations that exist between different parts of an object.

Due to their ability to represent properties of entities and binary relations at the same time, graphs have found widespread applications in science and engineering. They are used, for instance, in bioinformatics and chemoinformatics [2], [3], [4], web content and data mining [5], [6], [7], classifying images from various fields [8], [9], [10], symbol and character recognition [11], [12], [13], and in computer network analysis [14], [15].

However, one drawback of graphs, when compared to feature vectors, is the significantly increased complexity of many algorithms. For example, the comparison of two feature vectors for identity can be accomplished in linear time with respect to the length of the two vectors. For the analogous operation on general graphs, i.e. testing two graphs for isomorphism, only exponential algorithms are known today. Another serious limitation in the use of graphs for pattern recognition tasks is the lack of mathematical structure in the domain of graphs. For example, computing the (weighted) sum or the product of a pair of entities (which are elementary operations required in many classification and clustering algorithms) is not possible in the domain of graphs, or is at least not defined in a standardized way. Due to these general problems in the graph domain, we observe a lack of algorithmic tools for graph-based pattern recognition.

The present paper's objective is to benefit from both the universality of graphs for pattern representation and the computational convenience of vectors for pattern recognition. To this end we propose a general procedure for mapping graphs from arbitrary graph domains to a real vector space by means of functions $\varphi : G \to \mathbb{R}^{n}$. Based on the resulting graph maps, the considered pattern recognition task is eventually carried out. Hence, the whole arsenal of algorithmic tools readily available for vectorial data can be applied to graphs (more precisely, to graph maps $\varphi(g) \in \mathbb{R}^{n}$).

The presented approach for graph embedding is primarily based on the idea proposed in [16] where the dissimilarity representation for pattern recognition in conjunction with feature vectors was first introduced. In the current work we go one step further and generalize the methods described in [16] to the domain of graphs. The key idea of the novel graph embedding approach is to use the distances of an input graph g to n prototype graphs $P = \{p_{1}, \dots, p_{n}\}$ as a vectorial description of g. Clearly, the definition of the prototype set $P$ is a critical issue since the graphs in $P$ affect the resulting vectors. Thus, a good selection of prototypes is crucial to the success of the algorithm applied in the embedding space. Commonly, the prototypes are selected from a training set $T$ of existing graphs before the embedding is carried out. In previous work, this prototype selection relied on heuristics based on distances between the members of $T$ [17]. In the present paper, however, a new approach is proposed where all available elements from the training set are used as prototypes in a first step, i.e. for our embedding we define $P = T$. Subsequently, feature subset selection algorithms are applied to the vector space embedded graphs. In other words, rather than selecting the prototypes beforehand, the embedding is carried out first and then the problem of prototype selection is reduced to a feature selection problem. Thus, by means of this more fundamental approach, we bypass the difficult problem of selecting adequate prototypes.
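The embedding idea above, mapping a graph g to the vector of its distances to the prototypes, $(d(g, p_{1}), \dots, d(g, p_{n}))$, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: graphs are reduced to sets of undirected edges, and `toy_distance` (the number of edges present in exactly one of the two graphs) merely stands in for a real graph dissimilarity such as graph edit distance. The names `make_graph`, `toy_distance` and `embed` are hypothetical and do not come from the paper.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical toy representation: a graph is a frozenset of undirected edges.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def make_graph(edges: Sequence[Edge]) -> Graph:
    """Build a graph, storing each undirected edge with sorted endpoints."""
    return frozenset((min(u, v), max(u, v)) for u, v in edges)


def toy_distance(g1: Graph, g2: Graph) -> float:
    """Stand-in dissimilarity: number of edges present in exactly one graph.

    This is NOT graph edit distance; it only plays the role of d(g, p).
    """
    return float(len(g1 ^ g2))


def embed(
    g: Graph,
    prototypes: Sequence[Graph],
    d: Callable[[Graph, Graph], float] = toy_distance,
) -> List[float]:
    """Map g to the vector (d(g, p_1), ..., d(g, p_n))."""
    return [d(g, p) for p in prototypes]


# Training set T; following the text, every training graph is used
# as a prototype in the first step (P = T).
T = [
    make_graph([("a", "b"), ("b", "c")]),
    make_graph([("a", "b"), ("b", "c"), ("a", "c")]),
    make_graph([("a", "d")]),
]
P = T

g = make_graph([("a", "b"), ("a", "c")])
vector = embed(g, P)  # one coordinate per prototype
```

With $P = T$ the dimension of the embedding equals the size of the training set, which is why a subsequent feature selection step is needed to keep only the informative coordinates.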

A preliminary version of the current paper appeared in [18]. The current paper has been significantly extended with respect to the underlying methodology and the experimental evaluation. First, two additional feature selection algorithms are applied to vector space embedded graphs. In addition, and for the sake of completeness, the results of two dimensionality reduction algorithms (originally presented in [19], [20]) are also included in the experimental evaluation. The number of data sets on which our embedding procedure is tested is also considerably increased, and results of an additional reference system, a similarity kernel, are added. Finally, a detailed discussion of the validation of the meta parameters of our approach, together with the corresponding results, is provided.

The remainder of this paper is organized as follows. In the next section we define basic concepts and introduce our notation. Then, in Section 3, the proposed approach for graph embedding in real vector spaces is described. The feature selection algorithms applied to the vector space embedded graphs are described in Section 4. In Section 5 we report a number of experiments and present results achieved with our embedding method. Finally, in Section 6 we summarize our work and draw conclusions.

Section snippets

Basic terminology

Depending on the considered application, various definitions for graphs can be found in the literature. The following well-established definition is sufficiently flexible for a large variety of tasks.

Definition 1 Graph

Let $L_{V}$ and $L_{E}$ be finite or infinite label sets for nodes and edges, respectively. A graph g is a four-tuple $g = (V, E, \mu, \nu)$, where V is the finite set of nodes, $E \subseteq V \times V$ is the set of edges, $\mu : V \to L_{V}$ is the node labeling function, and $\nu : E \to L_{E}$ is the edge labeling function.
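As an illustration of Definition 1, the four-tuple can be written down as a small Python data structure. This is only a sketch under assumptions: the class name `LabeledGraph` and its field names are hypothetical, the labeling functions are represented as dictionaries, and the example labels (chemical symbols and bond types) are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
EdgeKey = Tuple[Node, Node]


@dataclass
class LabeledGraph:
    """g = (V, E, mu, nu) as in Definition 1 (class name is hypothetical)."""

    nodes: Set[Node]                  # V, the finite set of nodes
    edges: Set[EdgeKey]               # E, a subset of V x V
    node_labels: Dict[Node, object]   # mu : V -> L_V
    edge_labels: Dict[EdgeKey, object]  # nu : E -> L_E

    def __post_init__(self) -> None:
        # Every edge must connect two nodes of V.
        for u, v in self.edges:
            if u not in self.nodes or v not in self.nodes:
                raise ValueError(f"edge ({u!r}, {v!r}) uses a node outside V")
        # mu and nu must be defined on all of V and E, and nowhere else.
        if set(self.node_labels) != self.nodes:
            raise ValueError("mu must assign a label to every node in V")
        if set(self.edge_labels) != self.edges:
            raise ValueError("nu must assign a label to every edge in E")


# Invented example: three nodes labeled with chemical symbols,
# two directed edges labeled with a bond type.
g = LabeledGraph(
    nodes={1, 2, 3},
    edges={(1, 2), (2, 3)},
    node_labels={1: "C", 2: "O", 3: "H"},
    edge_labels={(1, 2): "single", (2, 3): "single"},
)
```

Because edges are ordered pairs, this sketch models directed graphs; an undirected graph could be represented by inserting both `(u, v)` and `(v, u)` with the same label.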

The number of nodes of a graph g is

General embedding procedure and properties

The idea of our graph embedding framework stems from the seminal work done by Duin and Pekalska [16] where dissimilarities for pattern representation are used for the first time. Later this method was extended so as to map string representations into vector spaces [28]. In the current work we go one step further and generalize and substantially extend the methods described in [16], [28] to the domain of graphs. The key idea of this approach is to use the distances of an input graph to a number

Feature selection algorithms

Feature subset selection aims at selecting a suitable subset of features such that the performance of a given algorithm is improved [32], [33]. By means of forward selection search strategies, the search starts with an empty set and iteratively adds useful features to this set. Conversely, backward elimination refers to the process of iteratively removing useless features starting with the full set of features. Also floating search methods are available, where alternately useful features are
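The forward selection strategy described in this snippet can be sketched in a few lines of Python. The sketch is an assumption-laden illustration and not the algorithm evaluated in the paper: the selection criterion is a leave-one-out 1-nearest-neighbour accuracy chosen only because it is easy to state, the data are invented, and the names `loo_1nn_accuracy` and `forward_selection` are hypothetical. Since each feature of an embedded graph is the distance to one prototype, selecting feature indices amounts to selecting prototypes.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def loo_1nn_accuracy(
    X: Sequence[Vector], y: Sequence[int], features: Sequence[int]
) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the given feature indices (illustrative criterion)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_selection(
    X: Sequence[Vector],
    y: Sequence[int],
    k: int,
    score: Callable[[Sequence[Vector], Sequence[int], Sequence[int]], float] = loo_1nn_accuracy,
) -> List[int]:
    """Start with the empty set and greedily add the feature that
    maximises the score, until k features have been selected."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented embedded graphs: each column is the distance to one prototype.
# Only column 1 separates the two classes well.
X = [
    [5.0, 0.0, 1.0],
    [1.0, 0.2, 4.0],
    [4.0, 3.0, 1.5],
    [0.5, 3.1, 3.5],
]
y = [0, 0, 1, 1]

chosen = forward_selection(X, y, k=2)
```

Backward elimination follows the same pattern in reverse: it starts from all feature indices and repeatedly drops the feature whose removal hurts the score least.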

Experimental evaluation

The purpose of the experimental evaluation described in this section is to empirically verify the power and applicability of the proposed graph embedding framework. To this end, several classification tasks are carried out using vector space embedded graphs.

Conclusions and future work

For objects given in terms of feature vectors a rich repository of algorithmic tools for classification has been developed over the past decades. Graphs are a versatile alternative to feature vectors, and are known to be a powerful and flexible representation formalism. The representational power of graphs is due to their ability to represent not only feature values but also relationships among different parts of an object, and their flexibility comes from the fact that there are no size or

Acknowledgement

This work has been supported by the Swiss National Science Foundation (Project 200021-113198/1).

References (52)

L. Ralaivola et al., Graph kernels for chemical informatics, Neural Networks (2005)
P. Suganthan et al., Recognition of handprinted Chinese characters by constrained graph matching, Image and Vision Computing (1998)
H. Bunke et al., Inexact graph matching for structural pattern recognition, Pattern Recognition Letters (1983)
M. Neuhaus et al., Automatic learning of cost functions for graph edit distance, Information Sciences (2007)
K. Riesen et al., Approximate graph edit distance computation by means of bipartite graph matching, Image and Vision Computing (2009)
R. Kohavi et al., Wrappers for feature subset selection, Artificial Intelligence (1997)
P. Pudil et al., Floating search methods in feature-selection, Pattern Recognition Letters (1994)
D. Conte et al., Thirty years of graph matching in pattern recognition, International Journal of Pattern Recognition and Artificial Intelligence (2004)
P. Mahé et al., Graph kernels for molecular structures-activity relationship analysis with support vector machines, Journal of Chemical Information and Modeling (2005)
K. Borgwardt, Graph kernels, Ph.D. Thesis, Ludwig-Maximilians-University Munich,...
A. Schenker et al., Graph-Theoretic Techniques for Web Content Mining (2005)
A. Schenker et al., Classification of web documents using graph matching, International Journal of Pattern Recognition and Artificial Intelligence (2004)
D. Cook, L. Holder (Eds.), Mining Graph Data, Wiley-Interscience,...
Z. Harchaoui et al., Image classification with segmentation graph kernels
B. Luo et al., Spectral embedding of graphs, Pattern Recognition (2003)
R. Ambauen et al., Graph edit distance with node splitting and merging and its application to diatom identification
J. Lladós et al., Graph matching versus graph parsing in graphics recognition, International Journal of Pattern Recognition and Artificial Intelligence (2004)
J. Rocha et al., A shape analysis model with applications to a character recognition system, IEEE Transactions on Pattern Analysis and Machine Intelligence (1994)
H. Bunke, P. Dickinson, M. Kraetzl, W. Wallis, A graph-theoretic approach to enterprise network dynamics, Progress in...
P. Dickinson et al., Matching graphs with unique node labels, Pattern Analysis and Applications (2004)
E. Pekalska et al., The Dissimilarity Representation for Pattern Recognition: Foundations and Applications (2005)
K. Riesen et al., Graph classification based on vector space embedding, International Journal of Pattern Recognition and Artificial Intelligence (2009)
K. Riesen et al., Feature ranking algorithms for improving classification of vector space embedded graphs
K. Riesen et al., Reducing the dimensionality of vector space embeddings of graphs
K. Riesen et al., Non-linear transformations of vector space embedded graphs
A. Sanfeliu et al., A distance measure between attributed relational graphs for pattern recognition, IEEE Transactions on Systems, Man, and Cybernetics (Part B) (1983)


About the Author—HORST BUNKE received his M.S. and Ph.D. degrees in Computer Science from the University of Erlangen, Germany. In 1984, he joined the University of Bern, Switzerland, where he is a professor in the Computer Science Department. He was Department Chairman from 1992 to 1996, Dean of the Faculty of Science from 1997 to 1998, and a member of the Executive Committee of the Faculty of Science from 2001 to 2003. Horst Bunke served as 1st Vice-President of the International Association for Pattern Recognition (IAPR) from 1998 to 2000. In 2000 he also was Acting President of this organization. Horst Bunke is a Fellow of the IAPR, former Editor-in-Charge of the International Journal of Pattern Recognition and Artificial Intelligence, Editor-in-Chief of the journal Electronic Letters of Computer Vision and Image Analysis, Editor-in-Chief of the book series on Machine Perception and Artificial Intelligence by World Scientific Publ. Co., Advisory Editor of Pattern Recognition, Associate Editor of Acta Cybernetica and Frontiers of Computer Science in China, and Former Associate Editor of the International Journal of Document Analysis and Recognition, and Pattern Analysis and Applications. Horst Bunke received an honorary doctor degree from the University of Szeged, Hungary, and held visiting positions at the IBM Los Angeles Scientific Center (1989), the University of Szeged, Hungary (1991), the University of South Florida at Tampa (1991, 1996, 1998–2006), the University of Nevada at Las Vegas (1994), Kagawa University, Takamatsu, Japan (1995), Curtin University, Perth, Australia (1999), and Australian National University, Canberra (2005). He served as a co-chair of the 4th International Conference on Document Analysis and Recognition held in Ulm, Germany, 1997 and as a Track Co-Chair of the 16th and 17th International Conference on Pattern Recognition held in Quebec City, Canada, and Cambridge, UK, in 2002 and 2004, respectively. 
He was also chairman of the IAPR TC2 Workshop on Syntactic and Structural Pattern Recognition held in Bern in 1992, a co-chair of the 7th IAPR Workshop on Document Analysis Systems held in Nelson, NZ, in 2006, and a co-chair of the 10th International Workshop on Frontiers in Handwriting Recognition, held in La Baule, France, in 2006. Horst Bunke was on the program and organization committees of many other conferences and served as a referee for numerous journals and scientific organizations. He is on the Scientific Advisory Board of the German Research Center for Artificial Intelligence (DFKI). Horst Bunke has more than 550 publications, including 36 authored, co-authored, edited or co-edited books and special editions of journals.

About the Author—KASPAR RIESEN received his M.S. and Ph.D. degrees in Computer Science from the University of Bern, Switzerland, in 2006 and 2009, respectively. Currently he is a researcher and lecture assistant in the research group of Computer Vision and Artificial Intelligence at the University of Bern, Switzerland. His research interests include structural pattern recognition and in particular graph embeddings in real vector spaces. He has more than 30 publications, including six journal papers.
