Neurocomputing

Volume 136, 20 July 2014, Pages 152-161

Comparative study among three strategies of incorporating spatial structures to ordinal image regression

https://doi.org/10.1016/j.neucom.2014.01.017

Highlights

  • Through a summary, identify three strategies for exploiting the prior spatial structure of images.

  • Apply these strategies to establish OR variants for classifying ordinal image data.

  • Conduct comprehensive comparisons among the developed novel OR variants.

  • Conclude that the effectiveness of spatial structure depends on how it is embedded.

Abstract

Images usually have specific spatial structures, and related research has shown that these structures can contribute to building more effective image classification algorithms. Although many solutions that exploit such spatial structures have been proposed separately, little attention has been paid to summarizing them systematically, let alone to comparing them. On the other hand, we find that existing image-oriented ordinal regression (OR) methods do not utilize such structure information, which motivates us to carry out a comparative study by embedding spatial structure into OR. To this end, in this paper we (1) identify, through a summary of the literature, three typical strategies for using the prior spatial information of images, namely the structure-embedded Euclidean distance strategy, the structure-regularized modeling strategy for classifier learning, and the strategy of directly manipulating images without vectorization; and, more importantly, (2) apply these strategies to establish corresponding OR variants for classifying data with ordinal characteristics, conduct comprehensive comparisons among them, and analyze the results under three evaluation criteria. Experimental results on the typical ordinal image datasets JAFFE, UMIST and FG-NET show that the latter two strategies can, on the whole, achieve a distinct gain in OR performance, whereas the first cannot necessarily do so as expected; the difference lies in whether the spatial information is directly embedded into the objective function involved.

Introduction

Images have inherent two-dimensional spatial structures, which carry explicit and implicit discriminative information beneficial to image classification. For example, in human faces the eyes, nose and mouth are distributed in different regions, and specific geometric relations hold between them. However, most currently developed pattern recognition and machine learning algorithms are based on vector patterns, and the matrix-to-vector conversion they require seriously destroys spatial structure information that is useful for classification, thus leaving room for performance improvement.

Over the past years, although strategies for taking advantage of spatial structure information have been developed separately to improve image classification performance, a systematic summary and comparative study of them is still lacking. For this purpose, in this paper we first summarize the scattered related literature and group the strategies into three categories; then, to compare them, we choose one of the currently popular topics in image classification, namely image-oriented OR, as the comparative platform. OR is a special machine learning paradigm that possesses the duality of classification and regression and is therefore often applied in scenarios where the predicted labels are discrete but ordered [26], [31], e.g., human facial age estimation and movie scoring. Beyond this duality, further reasons for choosing image-oriented OR as the comparative paradigm are that (1) the ORs specially designed for ordinal image classification have so far hardly exploited such spatial information, and (2) the multi-index evaluation that follows from the duality of classification and regression reflects the utilization of this information from more facets than a single-index evaluation for classification or regression alone. We then develop three image-oriented OR variants that compensate for the missing spatial information using the aforementioned three strategies, and make an extensive comparison from a joint regression and classification view under the three evaluation criteria MAE, Acc and OCI.

In this subsection we analyze the existing, separately proposed schemes designed to utilize spatial structure information and summarize them into three main families as follows:

It is known that the Euclidean distance (ED) is one of the most frequently used metrics in pattern recognition. However, when it is used to measure the similarity/distance between two images, the spatial structure information contained in them is not sufficiently reflected, which unfavorably affects classification performance. To compensate for this loss, many attempts [1], [2], [3], [4], [5], [6], [7], [8] have been made, among which Ref. [1] can be viewed as representative. In Ref. [1], the authors developed the IMage Euclidean Distance (IMED) by embedding the spatial structure of images into the ED and applied it to handwritten digit and human face recognition, with better performance than the ED. Owing to its insensitivity to small image distortions and its generality, which allows it to be embedded into classifiers such as SVM, IMED has subsequently been extended. For example, Li et al. [4] extended IMED to multi-view gender classification and achieved higher classification accuracy; Liu et al. [5] further proposed multi-linear locality-preserved maximum information embedding for face recognition, with more stable performance. Moreover, Li and Lu [8] developed an adaptive IMED (AIMED) by fusing the gray-level knowledge of the image with the prior spatial information, achieving more satisfactory identification performance on human faces and handwritten digits. In summary, these methods originating from IMED are either adapted to different applications or embedded into other learning machines such as SVM for performance gains. Thus, in the following comparative study we simply adopt IMED as the basic embedding, but any of its effective variants could be utilized in a similar way.
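As a concrete illustration of this first strategy, the sketch below (Python/NumPy; the image size and the width parameter sigma are illustrative assumptions) computes the IMED of Ref. [1] with its usual Gaussian pixel-relation matrix G, together with the standardizing transform u = G^{1/2} x from the same reference, which lets any ED-based learner work with IMED by simply pre-transforming the vectorized images.

```python
import numpy as np

def pixel_metric(h, w, sigma=1.0):
    """Gaussian pixel-relation matrix G of IMED [1]:
    G_ij = exp(-|P_i - P_j|^2 / (2 sigma^2)) / (2 pi sigma^2),
    where P_i, P_j are the 2-D coordinates of pixels i and j."""
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    sq = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def imed(img_a, img_b, sigma=1.0):
    """Squared IMED: d^2(x, y) = (x - y)^T G (x - y) for equally sized gray images."""
    G = pixel_metric(*img_a.shape, sigma=sigma)
    diff = (img_a - img_b).ravel()
    return float(diff @ G @ diff)

def imed_standardize(images, sigma=1.0):
    """Standardizing transform u = G^{1/2} x: the ordinary Euclidean distance
    between transformed vectors equals the IMED between the original images,
    so ED-based models need no further modification."""
    n, h, w = images.shape
    vals, vecs = np.linalg.eigh(pixel_metric(h, w, sigma))
    G_half = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return images.reshape(n, -1) @ G_half  # G_half is symmetric
```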

In this family, the strategy for exploiting spatial structure usually adopts a regularization technique that penalizes the objective function so that the resulting solution (obtained by optimizing the objective) is as spatially smooth as possible [9], [10], [11], [12], [13]. The spatially smooth subspace learning (SSSL) proposed in Ref. [9] can be regarded as the representative: a Laplacian penalty is imposed to constrain the projection coefficients to be spatially smooth. Zuo et al. [12] went further by weighting the Laplacian penalty function with Gaussian weights to realize multi-scale image smoothing. Chen et al. [13] developed a regularized metric learning framework, again imposing the Laplacian penalty, and achieved competitive face recognition performance on several benchmark datasets. From these works it is easy to see that structure-regularized modeling can indeed compensate for the spatial information loss induced by tensor- or matrix-to-vector conversion. Therefore, we also adopt such a spatially regularized strategy for image-oriented OR. Since adapting the successors of the spatial regularization in Ref. [9] to our problem is straightforward, without loss of generality we take the spatial smoothness constraint of Ref. [9] as the basic regularization strategy in the following comparative study.
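To make this second strategy concrete, the following sketch builds one common discretization of the Laplacian penalty used in SSSL-style spatial regularization [9] for an h×w image grid, and shows how it would enter a generic regularized objective. The ridge-style base objective and the weight lambda are illustrative assumptions, not the LDLOR formulation developed later in the paper.

```python
import numpy as np

def laplacian_operator(h, w):
    """Discrete 2-D Laplacian acting on a vectorized h-by-w coefficient image
    (row-major/C-order vectorization), built as a Kronecker sum of 1-D
    second-difference operators."""
    def second_diff(n):
        D = -2.0 * np.eye(n)
        D += np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
        return D
    return np.kron(np.eye(h), second_diff(w)) + np.kron(second_diff(h), np.eye(w))

def spatially_smooth_solution(X, y, h, w, lam=1e-2):
    """Illustrative regularized least squares with an SSSL-style penalty:
    minimize ||X a - y||^2 + lam * ||Delta a||^2, so that the learned
    projection vector a, reshaped to h-by-w, varies smoothly over the grid."""
    Delta = laplacian_operator(h, w)
    A = X.T @ X + lam * (Delta.T @ Delta)
    return np.linalg.solve(A, X.T @ y)
```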

The strategies in the former two families are all vector-pattern-oriented. Although they utilize the spatial structure information of images and thus boost learning performance, they usually suffer from (1) high computational complexity and (2) the so-called "small sample problem", i.e., the dimensionality of the feature vector is higher than the training set size, which leads to over-fitting. Hence, a natural way to mitigate or avoid these problems is to operate directly on image (or reshaped image) patterns. Along this line, many studies have been conducted, for example Refs. [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], among which the works of Chen et al. [14], [15], [16], [17], [18] and Tao et al. [20], [21], [22], [23], [24], [25] can be regarded as representatives. More specifically, Chen et al. developed a series of classifiers, such as MatMHKS [14] and MatFE+MatCD [18], based on bilinear projections of image (or reshaped image) patterns, and achieved performance competitive with vector-pattern-oriented counterparts in classification tasks such as human face and handwritten digit identification; Tao et al. developed dimensionality reduction and classification models that manipulate (higher-order) tensor patterns directly and applied them to human gait recognition [20] and visual tracking [25], respectively. It is precisely the direct manipulation of the matrix or tensor as the operating unit that allows schemes such as bilinear projection on an image (a second-order tensor) to make fuller use of the inherent spatial structure information in the data than their vectorized counterparts. Out of the same consideration, in the following comparative study we take the bilateral manipulation of images as the direct learning scheme whose ordinal learning performance is compared with that of the other methods.
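The essence of this third strategy is that the learner treats the image as a matrix and acts on it from both sides. The sketch below is a minimal bilinear scorer f(X) = u^T X v + b fitted by alternating regularized least squares; it is meant only to illustrate the bi-lateral (matrix-pattern) manipulation behind models such as MatMHKS [14], and is not the exact algorithm of that reference.

```python
import numpy as np

def fit_bilinear(images, targets, n_iter=20, lam=1e-3, seed=0):
    """Alternating least-squares fit of f(X) = u^T X v + b on matrix patterns.
    Only h + w + 1 parameters are learned instead of h*w + 1, which keeps the
    row/column structure of the image and eases small-sample over-fitting."""
    n, h, w = images.shape
    rng = np.random.default_rng(seed)
    u, v, b = rng.standard_normal(h), rng.standard_normal(w), 0.0
    for _ in range(n_iter):
        # Fix v: each image collapses to the h-vector X v; ridge-solve for (u, b).
        Zu = np.concatenate([images @ v, np.ones((n, 1))], axis=1)
        sol = np.linalg.solve(Zu.T @ Zu + lam * np.eye(h + 1), Zu.T @ targets)
        u, b = sol[:h], sol[h]
        # Fix u: each image collapses to the w-vector X^T u; ridge-solve for (v, b).
        Zv = np.concatenate([np.einsum('nhw,h->nw', images, u), np.ones((n, 1))], axis=1)
        sol = np.linalg.solve(Zv.T @ Zv + lam * np.eye(w + 1), Zv.T @ targets)
        v, b = sol[:w], sol[w]
    return u, v, b

def bilinear_score(images, u, v, b):
    """Score f(X) = u^T X v + b for a batch of images of shape (n, h, w)."""
    return np.einsum('nhw,h,w->n', images, u, v) + b
```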

Finally, Table 1 gives a brief comparative summary of the aforementioned three strategies.

Following the categorization and summary of strategies for utilizing spatial structure information, we are now in a position to take OR as the research platform on which to make an extensive empirical comparison, over three image sets, among the three afore-summarized categories. Before that, let us briefly review OR. OR is a special learning strategy for designing classifiers over ordinal classes, e.g., for human age estimation. Owing to its duality of regression and classification and its strong modeling ability, OR has been widely applied in domains such as recommender systems [26], web page ranking [27], image retrieval [28], medical image diagnosis [29], [30] and age estimation [31], [32]. Various approaches have been put forward to implement it [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], including KDLOR [44], one of the distinguished ORs. Although most of these ORs have achieved good performance to different extents, when operating on images almost all of them neglect to compensate for the spatial structure information lost by vectorizing the images; it is therefore reasonable to choose image-oriented OR as the research platform for comparing the three summarized categories of spatial structure utilization. Although incorporating spatial information into an existing OR may appear trivial, to the best of our knowledge no related research has been done yet.

Now, for the sake of clarity but without loss of generality, we take the linear version of KDLOR, a typical OR model proposed in Ref. [44], as the basic OR approach (denoted here as LDLOR), and select IMED [1], SSSL [9] and bilinear modeling [14] as the representatives of the three families of spatial structure information utilization with which to re-model LDLOR. This yields three modified LDLOR versions, named IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR respectively, for which we conduct a series of experiments on several benchmark image datasets and report comparison results in terms of OR-specific evaluation criteria.

The remainder of the paper is organized as follows. In Section 2, we briefly review a representative OR, i.e., LDLOR, which is taken as the base model (baseline). In Section 3, three re-modeled LDLOR counterparts derived from three spatial structure information utilizing strategies are detailed. Section 4 shows the experimental results and gives comparison analysis. The conclusions are drawn in Section 5.

Section snippets

Review of LDLOR

LDLOR, one of the distinguished ORs, aims to find the best projection direction along which the ordinal indices of the ordered classes can be preserved well after projection. Based on this principle, LDLOR has two main characteristics: maximizing the distance between each pair of mean vectors of neighboring ordinal classes, and simultaneously minimizing the within-class scatters, which makes it different from the discriminant principles used in DA models such as LDA [45] due to the imposition
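For reference, the linear KDLOR problem of Ref. [44] that this snippet summarizes can be written, in our notation and roughly (omitting kernelization and regularization details), as the following constrained program, where S_w is the within-class scatter matrix, m_k the mean of the k-th ordinal class (classes ordered 1, ..., K), and C > 0 a trade-off parameter:

```latex
\min_{\mathbf{w},\,\rho}\; \mathbf{w}^{\top} S_w \mathbf{w} \;-\; C\,\rho
\qquad \text{s.t.}\qquad
\mathbf{w}^{\top}\bigl(\mathbf{m}_{k+1} - \mathbf{m}_{k}\bigr) \;\ge\; \rho,
\quad k = 1,\dots,K-1 .
```

Maximizing rho pushes the projected means of neighboring ordinal classes apart in the prescribed order, while the scatter term keeps each projected class compact, matching the two characteristics described in the snippet above.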

Three re-modeled LDLORs fusing spatial structure information

In order to incorporate into LDLOR the spatial structure information contained in data such as images, in the following subsections we briefly review the principles of IMED [1], SSSL [9] and bilinear modeling [14] (as the representatives of the three spatial information utilization strategies) and then employ them to re-model the basic LDLOR, generating its new variants IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR.

Experiments

In this section, we conduct experiments to make empirical comparisons among LDLOR, IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR on three benchmark image datasets, i.e., JAFFE (human facial expression intensity regression), UMIST (human head pose regression) and FG-NET (human age group regression), whose classes are all ordinal. To eliminate the influence of image size on the experiments, all images are cropped and resized to 16×16, and the raw (pixel) gray levels are directly extracted as
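Of the three evaluation criteria used in the comparison, MAE and Acc have the standard definitions sketched below for ordinal labels; OCI is the ordinal-classification-specific index used in the paper, whose exact definition is not reproduced here.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ordinal ranks."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def acc(y_true, y_pred):
    """Fraction of exactly correct rank predictions."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```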

Conclusions

In this paper, we first classified the separately proposed schemes for utilizing spatial structure information, through a systematic summary, into three main categories: structure-embedded Euclidean distance preserving, structure-regularized modeling, and direct manipulation of images; second, to compare them, given that spatial structure information is rarely reflected in existing image-oriented ORs, we respectively took IMED, SSSL and Bilinear

Acknowledgments

This work is partially supported by NSFC (61170151 and 61073112), Jiangsu SFC (BK2012793), Research Fund for the Doctoral Program (RFDP) (20123218110033), Funding of Jiangsu Innovation Program for Graduate Education (CXLX13_159), the Fundamental Research Funds for the Central Universities (NZ2013306) and the Jiangsu Qinglan project.


References (53)

  • K. Gray et al., Random forest-based similarity measures for multi-modal classification of Alzheimer's disease, NeuroImage (2013).
  • L. Wang et al., On the Euclidean distance of images, IEEE Trans. Pattern Anal. Mach. Intell. (2005).
  • T. Tangkuampien, D. Suter, 3D Object Pose Inference via Kernel Principal Component Analysis with Image Euclidean...
  • J. Chen, R. Wang, S. Shan, et al., Isomap Based on the Image Euclidean Distance, in: ICPR,...
  • J. Li et al., A framework for multi-view gender classification, Lecture Notes Comput. Sci. (2008).
  • Y. Liu et al., Tensor distance based multilinear locality preserved maximum information embedding, IEEE Trans. Neural Netw. (2010).
  • B. Sun, J. Feng, L. Wang, Learning IMED via Shift-Invariant Transformation, in: CVPR,...
  • D. Cai, X. He, Y. Hu, et al., Learning a Spatial Smooth Subspace for Face Recognition, in: CVPR,...
  • S. Gu et al., Laplacian smoothing transform for face recognition, Sci. China (2010).
  • W. Zuo, L. Liu, K. Wang, et al., Spatially Smooth Subspace Face Recognition Using LOG and DOG Penalties, in: ISNN,...
  • X. Chen, Z. Tong, H. Liu, et al., Metric Learning with Two-Dimensional Smoothness for Visual Analysis, in: CVPR,...
  • Z. Wang et al., New least squares support vector machines based on matrix patterns, Neural Process. Lett. (2007).
  • Z. Wang et al., Pattern representation in feature extraction and classifier design: matrix versus vector, IEEE Trans. Neural Netw. (2008).
  • D. Tao et al., General tensor discriminant analysis and Gabor features for gait recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2007).
  • D. Tao et al., Bayesian tensor approach for 3-D face modeling, IEEE Trans. Circuits Syst. Video Technol. (2008).
  • B. Wang et al., A unified tensor level set for image segmentation, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2010).

Qing Tian received the B.S. degree in computer science from Southwest University for Nationalities, China, and the M.S. degree in computer science from Zhejiang University of Technology, China, respectively with the honors of Sichuan provincial level outstanding graduate and Zhejiang provincial level outstanding graduate in 2008 and 2011. From February 2011 to February 2012, as a researcher in the field of gender/age recognition, he worked at Arcsoft, U.S. He is now a Ph.D. candidate in computer science at Nanjing University of Aeronautics and Astronautics, and his current research interests include machine learning and pattern recognition.

Songcan Chen received the B.S. degree from Hangzhou University (now merged into Zhejiang University), the M.S. degree from Shanghai Jiao Tong University and the Ph.D. degree from Nanjing University of Aeronautics and Astronautics (NUAA) in 1983, 1985, and 1997, respectively. He joined NUAA in 1986, and since 1998 he has been a full-time Professor with the Department of Computer Science and Engineering. He has authored/co-authored over 170 scientific peer-reviewed papers and received Honorable Mentions for the 2006, 2007 and 2010 Best Paper Awards of the Pattern Recognition journal. His current research interests include pattern recognition, machine learning, and neural computing.

Xiaoyang Tan received his B.S. and M.S. degrees in computer applications from Nanjing University of Aeronautics and Astronautics (NUAA) in 1993 and 1996, respectively. He then joined NUAA in June 1996 as an assistant lecturer. He received a Ph.D. degree from the Department of Computer Science and Technology of Nanjing University, China, in 2005. From September 2006 to October 2007, he worked as a postdoctoral researcher in the LEAR (Learning and Recognition in Vision) team at INRIA Rhone-Alpes in Grenoble, France. His research interests are in face recognition, machine learning, pattern recognition, and computer vision.
