High-level pattern-based classification via tourist walks in networks

doi:10.1016/j.ins.2014.09.048

Information Sciences

Volume 294, 10 February 2015, Pages 109-126

https://doi.org/10.1016/j.ins.2014.09.048

Highlights

• Proposal of a novel hybrid (low- and high-level) classification technique.
• The high-level term is realized by a combination of several tourist walk processes.
• Tourist walks are used in a totally novel approach: high-level classification.
• The high-level model’s learning weights are self-adjusted in a statistical way.
• Simulations on a real-world application: handwritten digit recognition.

Abstract

In this paper, we present a hybrid classification technique that combines the decisions of low- and high-level classifiers. The low-level term performs the classification task considering only the physical features of the input data, such as geometrical or statistical characteristics. In contrast, the high-level term checks the compliance of new test instances with the pattern formation of each class in the training data. To this end, we extract suitable organizational and topological descriptors of a network constructed from the input data. With these descriptors, we show that the high-level term is able to detect data patterns with semantic and global meaning. Here, the pattern formations of the input data are extracted using the dynamical information generated by several tourist walk processes performed on the resulting network. Specifically, weighted combinations of transient and cycle lengths, which are variables derived from the tourist walks, are employed. Moreover, we present an effective statistical method for calibrating the learning weights of these terms. Furthermore, we show that the tourist’s memory size determines the extent to which organizational and complex features of the network are captured. This means that local, quasi-local, and global features can be extracted, depending on the value of the memory size parameter. We also uncover the existence of a critical memory length, here called complexity saturation, beyond which larger values produce no change in the behavior of the transient and cycle lengths. In addition, we investigate several artificial and real-world situations in which the low-level term alone fails to identify intrinsic data patterns, but the high-level term performs well. Our investigation suggests that the proposed technique is able to improve the already optimized performance of traditional classification techniques. Finally, we apply the proposed technique to the recognition of handwritten digit images, and interesting results are obtained.

Introduction

In supervised data classification, a map from the input data to the corresponding desired output is estimated from a given training set. The constructed map, called a classifier, is then used to predict the labels of new input instances. Many supervised data classification techniques have been developed [6], [11], [19], [23], [30], [31], [35], [42], [43], such as k-nearest neighbors, Bayesian classifiers, neural networks, decision trees, committee machines, spectral biclustering, genetic algorithms, and gravitational-based methods. In essence, all these techniques train and, consequently, classify unlabeled data items according to the physical features (e.g., distance, similarity or distribution) of the input data. Techniques that predict class labels using only physical features are called low-level classification techniques [36].

Usually, data items are not isolated points in the attribute space, but instead tend to form certain patterns. For example, in Fig. 1, the two test instances represented by triangles would most probably be classified as members of the square-shaped class if only physical features, such as distances among data instances, were considered. On the other hand, if we take into account the relationships among the data, we would intuitively classify the triangle-shaped items as members of the circle-shaped class, since a clear pattern (a lozenge) is formed. The human (animal) brain performs both low and high orders of learning and readily identifies patterns according to the semantic meaning of the input data. In general, however, this kind of task is still hard for computers to perform. Supervised data classification that considers not only physical attributes but also pattern formation is referred to as high-level classification [36].

Broadly speaking, low-level classification techniques often share the same heuristic: division of the data space into sub-spaces, each of which represents a class. They fall short in reproducing complex-shaped or twisted classes, because they often rely on assumptions such as fixed shapes or predefined distributions. In contrast, the salient feature of the proposed technique is that it provides two distinct classification heuristics: low- and high-level classification. The former performs the prediction from the data’s physical features, while the latter captures the data’s pattern formations, which, in turn, permits the classifier to reproduce complex-shaped and (or) twisted classes. As a result, a test instance is declared a member of the class with which it complies in a structural sense, no matter how far it is from the center or from any member of that class.
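As an illustration of how two such heuristics might be merged into one decision, the sketch below mixes per-class membership scores from a low-level and a high-level classifier. The text above does not state the combination rule, so the convex combination, the mixing coefficient `lam`, and the function names `hybrid_scores` and `predict` are all assumptions made for this example.

```python
from typing import Dict


def hybrid_scores(low: Dict[str, float], high: Dict[str, float],
                  lam: float) -> Dict[str, float]:
    """Mix per-class scores: (1 - lam) * low + lam * high.

    The convex combination and the name `lam` are assumptions of this sketch,
    not a rule quoted from the text.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if low.keys() != high.keys():
        raise ValueError("both classifiers must score the same classes")
    return {k: (1.0 - lam) * low[k] + lam * high[k] for k in low}


def predict(low: Dict[str, float], high: Dict[str, float], lam: float) -> str:
    """Return the class with the largest mixed score."""
    scores = hybrid_scores(low, high, lam)
    return max(scores, key=scores.get)


# Toy scores for a test instance that is physically close to the "square"
# class but structurally compliant with the "circle" class.
low_scores = {"square": 0.7, "circle": 0.3}
high_scores = {"square": 0.1, "circle": 0.9}
print(predict(low_scores, high_scores, lam=0.0))  # low-level decision only
print(predict(low_scores, high_scores, lam=0.5))  # equal weight on both terms
```

With `lam = 0` the decision reduces to the low-level classifier alone. Increasing `lam` gives more weight to structural compliance, which is the situation described for the triangle-shaped instances of Fig. 1.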

It is well known that the network representation can capture arbitrary levels of relationships or interactions in the input data [37], [38], [39]. For this reason, we show here how a network’s topological properties can help identify pattern formation and, consequently, be used for general high-level classification. In this work, these topological properties are revealed by tourist walks. A tourist walk can be defined as follows [21]. Given a set of cities, at each time step, the tourist (walker) goes to the nearest city that has not been visited in the past μ time steps. Each tourist walk can be decomposed into two parts: (i) an initial transient part of length t and (ii) a cycle (attractor) with period c. Tourist walks have been shown to be useful for data clustering [8] and image processing [3]. However, all of these works are carried out on regular lattices. Here, we study tourist walks in networks and show that they are able to capture the topological properties of the underlying network in a local-to-global fashion. It is worth observing that the application of tourist walks to graph-based environments is a new approach taken here. In addition, the use of the tourist walks’ dynamics to discover patterns in networks is a totally novel scheme in the literature.
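The walk rule above can be sketched on a weighted graph as follows. This is a minimal illustration, not the paper’s implementation: the dictionary encoding of the graph, the function name `tourist_walk`, the tie-breaking by node label, and the convention of reporting a stuck walker with c = 0 are assumptions. The memory window is taken to include the current node, so μ = 1 only forbids staying in place.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical graph encoding: node -> {neighbor: distance}
Graph = Dict[Hashable, Dict[Hashable, float]]


def tourist_walk(graph: Graph, start: Hashable, mu: int,
                 max_steps: int = 10_000) -> Tuple[int, int]:
    """Return (t, c): transient length and cycle period of one walk."""
    if mu < 1:
        raise ValueError("mu must be at least 1")
    path: List[Hashable] = [start]
    seen: Dict[tuple, int] = {}  # memory window -> step of first occurrence
    for step in range(max_steps):
        memory = tuple(path[-mu:])  # last mu nodes, current node included
        if memory in seen:
            # The window determines the future, so a repeat closes a cycle.
            t, c = seen[memory], step - seen[memory]
            # Move the cycle start back while the node sequence is periodic.
            while t > 0 and path[t - 1] == path[t - 1 + c]:
                t -= 1
            return t, c
        seen[memory] = step
        here = path[-1]
        allowed = [n for n in graph[here] if n not in memory]
        if not allowed:
            return step, 0  # stuck walker: reported as "no cycle" (assumption)
        # Nearest admissible neighbor; ties broken by node label (assumption).
        path.append(min(allowed, key=lambda n: (graph[here][n], str(n))))
    raise RuntimeError("no attractor found within max_steps")


# Toy network: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
toy = {
    0: {1: 1.0, 2: 2.0, 3: 3.0},
    1: {0: 1.0, 2: 1.5},
    2: {0: 2.0, 1: 1.5},
    3: {0: 3.0},
}
for mu in (1, 2, 3):
    print(mu, tourist_walk(toy, start=3, mu=mu))
```

On this toy network, a walker starting at node 3 with μ = 1 settles into a two-node cycle, with μ = 2 it circulates around the triangle, and with μ = 3 it runs out of admissible neighbors. Unlike the set of cities in the original definition, a sparse network can leave the walker with nowhere to go, which is why the sketch needs an explicit convention for that case.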

Following the literature stream on such matter, several kinds of works related to high-level classification may be highlighted, such as:

• the Semantic Web [4], [12], [34], which uses ontologies to describe the semantics of the data;
• statistical relational learning, which may be divided into methods that perform collective inference [15], [25], [45], [46], [47] and graph-based semi-supervised learning [9], [48];
• contextual classification techniques [5], [10], [24], [27], [40], [41], [44], which consider the spatial relationships between the individual pixels and the local and global configurations of neighboring pixels in an image for assigning classes.

All of the above-mentioned techniques make inferences for a new data item in accordance with the neighborhood relationships between data samples (nodes) within the graph. Our approach, on the other hand, aims at finding global patterns formed by all the training samples. At the implementation level, while the former determine the class label of a test instance by analyzing transition probabilities or other kinds of relational information, such as neighbors’ edge weights, our approach calculates topological measures of the network, permitting the extraction of some kinds of semantic structures present in the training data.

Another interesting related area is graph-based structural pattern recognition [14], [16]. This topic is usually characterized as a graph matching problem. Both graph matching and the proposed approach seek structural information, rather than purely geometrical information, in the input patterns or data. However, in graph matching, pairs of patterns are compared. From the viewpoint of data classification, such structural information can be considered local, because only a limited number of patterns is analyzed at a time. In contrast, the proposed approach extracts pattern formations by considering the training data as a whole. As a consequence, our approach may reveal global organizations of the data under analysis.

In this paper, we propose a technique that combines low- and high-level supervised data classification. The idea of this paper is built upon the general framework recently proposed in [36], where the high-level classification problem is treated using three existing network measures in a combined way: assortativity, clustering coefficient, and average degree. As highlighted by Silva and Zhao [36], a serious open problem is how one may choose other network measures in an intuitive way and also how one may define the learning weights associated with each of them. For instance, in their original paper, those three network measures were chosen through a series of trial-and-error attempts over several well-known network measures. In this paper, we address these two open issues as follows:

• We propose a unified measure to capture the pattern formation of the data. In this way, one does not need to discover suitable and convenient sets of network measures to build the high-level classifier, as occurs in [36]. In this paper, we show that the dynamical information generated by the tourist walk process can itself capture local-to-global organizational and complex features of the network by adjusting the walker’s memory length parameter. For example, when the memory window of the tourist is small, local structural features of the network are extracted. As the memory window grows larger, the walk dynamics compel the walker to venture far away from its starting point, permitting it to learn global features of the network.
• The model selection procedure is simplified. In the original work [36], the several learning weights of the high-level classifier must be carefully adjusted by the user. Because there are many of them, the model selection procedure takes time and may be infeasible for large data sets. In contrast, in this work, they are automatically adjusted by a statistical approach that fits the training data and runs in linear time. As a result, the model selection effort is reduced to a large extent.

In addition, the adoption of tourist walks in this paper presents some interesting characteristics and advantages over the previous approach taken in [36]. For example, the tourist walk method presents a class-dependent critical memory length, beyond which larger values produce no change in the behavior of the transient and cycle lengths. This interesting phenomenon is observed when the memory length reaches a sufficiently large value. We say that, when this happens, the walks have reached the “complexity saturation” of the class component. In this situation, the global topological and organizational features of the network are said to be completely characterized in the sense of the tourist walk process. Moreover, we relate this phenomenon to phase transitions in the context of complex networks. Finally, we show how the proposed technique can be used to solve general invariant pattern recognition problems [17], [29], [33], particularly when the pattern variances are nonlinear and there is no closed form to describe the invariance.
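One way to look for such a critical memory length is to scan the memory length μ, average the transient and cycle lengths over all starting nodes, and find the point after which the averages stop changing. The sketch below does this for a single graph standing in for one class component. The helper names `walk`, `profile`, and `saturation_memory`, the stuck-walker convention (c = 0), the tie-breaking rule, and the tolerance `tol` are assumptions of this example, and the result is only meaningful within the scanned range of μ.

```python
from statistics import mean
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: distance}


def walk(graph: Graph, start: Hashable, mu: int) -> Tuple[int, int]:
    """(transient, cycle) of one walk; a stuck walk gets cycle 0 (assumption)."""
    path: List[Hashable] = [start]
    seen: Dict[tuple, int] = {}
    while True:  # terminates: the set of possible memory windows is finite
        memory, step = tuple(path[-mu:]), len(path) - 1
        if memory in seen:
            t, c = seen[memory], step - seen[memory]
            while t > 0 and path[t - 1] == path[t - 1 + c]:
                t -= 1
            return t, c
        seen[memory] = step
        here = path[-1]
        allowed = [n for n in graph[here] if n not in memory]
        if not allowed:
            return step, 0
        path.append(min(allowed, key=lambda n: (graph[here][n], str(n))))


def profile(graph: Graph, mu_max: int) -> List[Tuple[float, float]]:
    """Average (t, c) over all start nodes; entry i corresponds to mu = i + 1."""
    out = []
    for mu in range(1, mu_max + 1):
        results = [walk(graph, s, mu) for s in graph]
        out.append((mean(t for t, _ in results), mean(c for _, c in results)))
    return out


def saturation_memory(curve: List[Tuple[float, float]], tol: float = 1e-9) -> int:
    """Smallest scanned mu from which the profile equals its final value.

    Returns len(curve) when no plateau is visible in the scanned range.
    """
    last_t, last_c = curve[-1]
    mu_c = len(curve)
    for mu in range(len(curve), 0, -1):
        t, c = curve[mu - 1]
        if abs(t - last_t) > tol or abs(c - last_c) > tol:
            break
        mu_c = mu
    return mu_c


# Toy component: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
toy = {
    0: {1: 1.0, 2: 2.0, 3: 3.0},
    1: {0: 1.0, 2: 1.5},
    2: {0: 2.0, 1: 1.5},
    3: {0: 3.0},
}
curve = profile(toy, mu_max=6)
print(curve)
print("saturation at mu =", saturation_memory(curve))
```

In this sketch, once μ is at least the number of nodes in the component, the memory window always covers the whole path, so larger values of μ cannot change any walk. This gives an upper bound on where the plateau starts for the toy rule used here.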

The remainder of the paper is organized as follows. The proposed model is defined in Section 2. Computer simulations are performed on synthetic and real-world data sets in Section 3. In Section 4, the proposed technique is adapted to perform handwritten digit recognition. Finally, Section 5 concludes the paper.

Section snippets

Model description

In this section, the proposed model is described in detail.

Computer simulations

In this section, computer experiments are performed in order to assess the effectiveness of the proposed hybrid classification model based on tourist walks.

Application: handwritten digits recognition

In this section, the proposed high-level scheme is applied to a real-world task: handwritten digit recognition. The goal here is to show that the hybrid classification technique is able to perform well in real situations (as opposed to the previous section, in which we focused on conveying the model’s interesting properties).

While recognizing individual digits is only one of a myriad of problems that involves specific designing of practical recognition systems, it still is, undoubtedly, an

Conclusions

In this work, we have proposed an alternative and novel technique for data classification, which combines both low- and high-level characteristics of the data. The former classifies data instances by their physical features and the latter measures the compliance of the test instance with the pattern formation of the input data. To this end, tourist walks have been employed to capture the complex topological properties of the network constructed from the input data. A quite interesting feature

References (48)

A.R. Backes et al., Texture analysis and classification using deterministic tourist walk, Pattern Recogn. (2010)
X. Peng et al., Structural regularized projection twin support vector machine for data classification, Inform. Sci. (2014)
P. Shafigh et al., Gravitation based classification, Inform. Sci. (2013)
T.C. Silva et al., Uncovering overlapping cluster structures via stochastic competitive learning, Inform. Sci. (2013)
S. Abe, T. Inoue, Fuzzy support vector machines for multiclass problems, in: European Symposium on Artificial Neural...
E. Alpaydin, Introduction to Machine Learning (2004)
T. Berners-Lee et al., The semantic web, Sci. Am. (2001)
E. Binaghi et al., A cognitive pyramid for contextual classification of remote sensing images, IEEE Trans. Geosci. Remote Sens. (2003)
C.M. Bishop, Pattern Recognition and Machine Learning (2006)
S. Boriah, V. Chandola, V. Kumar, Similarity measures for categorical data: a comparative evaluation, in: SIAM Data...
M.G. Campiteli et al., Deterministic walks as an algorithm of pattern recognition, Phys. Rev. E (2006)
R.W. Donaldson et al., Use of contextual constraints in recognition of contour-traced handprinted characters, IEEE Trans. Comput. (1970)
R.O. Duda et al., Pattern Classification (2001)
L. Feigenbaum et al., The semantic web in action, Sci. Am. (2007)
A. Frank, A. Asuncion, UCI machine learning repository,...
Z. Galil, Efficient algorithms for finding maximum matching in graphs, ACM Comput. Surv. (1986)
B. Gallagher, H. Tong, T. Eliassi-rad, C. Faloutsos, Using ghost edges for classification in sparsely labeled networks,...
M. Gori et al., Exact and approximate graph matching using random walks, IEEE Trans. Pattern Anal. Machine Intell. (2005)
O.C. Hamsici et al., Rotation invariant kernels and their application to shape analysis, IEEE Trans. Pattern Anal. Machine Intell. (2009)
T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2009)
V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models (2001)
Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
G.F. Lima et al., Deterministic walks in random media, Phys. Rev. Lett. (2001)
