Comparative study among three strategies of incorporating spatial structures to ordinal image regression
Introduction
Images have an inherent two-dimensional spatial structure that carries both explicit and implicit discriminative information useful for image classification. For example, in human faces, the eyes, nose and mouth lie in different regions, and specific geometric relations exist between them. However, most current pattern recognition and machine learning algorithms operate on vector patterns and therefore require a matrix-to-vector conversion. As a consequence, much of the spatial structure information useful for classification is lost, which leaves room for performance improvement.
Over the past years, strategies that exploit spatial structure information have been developed separately to improve image classification performance, but a systematic summary and comparative study of them is still lacking. To fill this gap, in this paper we first summarize the scattered related literature and group the strategies into three categories. Then, to compare them, we choose a currently popular topic in image classification, image-oriented ordinal regression (OR), as the comparative platform. OR is a special machine learning paradigm that possesses the duality of classification and regression, and is thus often applied in scenarios where the predicted labels are discrete but ordered [26], [31], e.g., human facial age estimation and movie scoring. Besides this duality, there are two further reasons for choosing image-oriented OR as the comparative paradigm: (1) the ORs specially designed for ordinal image classification have so far hardly exploited such spatial information, and (2) the multi-index evaluation arising from the duality of classification and regression can reflect the effect of using such information from more facets than a single-index evaluation for classification or regression alone. Next, we develop three image-oriented OR variants that compensate for the lost spatial information using the aforementioned three strategies, and then make an extensive comparison from a joint view of regression and classification under three evaluation criteria: MAE, Acc and OCI.
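To illustrate why a joint view of regression and classification is informative for ordinal labels, the following minimal sketch computes two of the three criteria, assuming that MAE denotes the mean absolute error between true and predicted ordinal labels and Acc the fraction of exactly correct predictions. The function names and the toy labels are hypothetical; OCI is not covered here.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between ordinal labels (regression view)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def acc(y_true, y_pred):
    """Fraction of exactly correct labels (classification view)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy ordinal labels (hypothetical): both predictors make one mistake,
# but one misses by a single rank and the other by four ranks.
y_true = [1, 2, 3, 4, 5]
pred_near = [1, 2, 3, 4, 4]
pred_far = [1, 2, 3, 4, 1]

acc_near, acc_far = acc(y_true, pred_near), acc(y_true, pred_far)
mae_near, mae_far = mae(y_true, pred_near), mae(y_true, pred_far)
print(acc_near, acc_far, mae_near, mae_far)
```

In this toy case Acc cannot distinguish the two predictors, whereas MAE penalizes the prediction that lands farther from the true rank, which is why a single index can hide differences between ordinal models.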
In this subsection we analyze the existing scattered schemes designed to utilize the spatial structure information and summarize them into three main families as follows:
Euclidean distance (ED) is one of the most commonly used metrics in pattern recognition. However, when it is used to measure the similarity/distance between two images, the spatial structure information they contain is not sufficiently reflected, which adversely affects classification performance. To compensate for this loss, many attempts [1], [2], [3], [4], [5], [6], [7], [8] have been made, among which Ref. [1] can be viewed as representative. In Ref. [1], the authors developed an IMage Euclidean Distance (IMED) by embedding the spatial structure of images into ED, and applied it to handwritten digit and human face recognition with better performance than ED. Owing to its insensitivity to small image distortions and its generality, which allows it to be embedded into classifiers such as SVM, IMED has subsequently been extended. For example, Li et al. [4] extended IMED to multi-view gender classification and achieved higher classification accuracy; Liu et al. [5] further proposed multi-linear locality-preserved maximum information embedding for face recognition with more stable performance. Moreover, Li and Lu [8] developed an adaptive IMED (AIMED) that fuses gray-level knowledge of the image into IMED in addition to the prior spatial information, achieving more satisfactory identification performance for human faces and handwritten digits. In summary, these methods originating from IMED are either adapted to different applications or embedded into other learning models such as SVM for performance gains. Thus, in the following comparative study, we adopt only IMED as the basic embedding, but any of its effective variants can be used straightforwardly in a similar way.
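The idea of embedding spatial structure into ED can be sketched as a quadratic form (x - y)^T G (x - y), where G couples pixels that are spatially close. The sketch below assumes a Gaussian coupling of pixel coordinates with a hypothetical width parameter `sigma`; the exact form and normalization used in Ref. [1] should be checked against that paper, and the function names are illustrative only.

```python
import numpy as np

def imed_metric(height, width, sigma=1.0):
    """Assumed metric matrix G: Gaussian function of the distance between pixel positions."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def imed(a, b, G):
    """Spatially aware distance sqrt((a-b)^T G (a-b)) on row-major vectorized images."""
    d = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).ravel()
    return float(np.sqrt(max(d @ G @ d, 0.0)))

def ed(a, b):
    """Plain Euclidean distance on vectorized images."""
    return float(np.linalg.norm((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).ravel()))

# Toy 8x8 images with a single bright pixel (hypothetical data).
base = np.zeros((8, 8)); base[3, 3] = 1.0
near = np.zeros((8, 8)); near[3, 4] = 1.0   # shifted by one pixel
far = np.zeros((8, 8)); far[7, 7] = 1.0     # moved far away

G = imed_metric(8, 8, sigma=1.0)
ed_near, ed_far = ed(base, near), ed(base, far)
imed_near, imed_far = imed(base, near, G), imed(base, far, G)
print(ed_near, ed_far, imed_near, imed_far)
```

Under ED the one-pixel shift and the large displacement are equally distant from the base image, while the spatially coupled distance treats the small shift as the smaller change, which is the insensitivity to small distortions described above.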
In this family, spatial structure is usually exploited through a regularization technique that penalizes the objective function so that the resulting solution (obtained by optimizing the objective) is as spatially smooth as possible [9], [10], [11], [12], [13]. The spatial smooth subspace learning (SSSL) proposed in Ref. [9] can be regarded as representative: a Laplacian penalty is imposed to constrain the projection coefficients to be spatially smooth. Zuo et al. [12] went further by weighting the Laplacian penalty function with Gaussian weights to realize multi-scale image smoothing. Chen et al. [13] developed a regularized metric learning framework, again by imposing the Laplacian penalty, and achieved competitive face recognition performance on several benchmark datasets. These studies show that structure-regularized modeling can also compensate for the spatial information loss induced by tensor- or matrix-to-vector conversion. Therefore, we also adopt such a spatially regularized strategy for image-oriented OR. Since adapting the later extensions of the spatial regularization [9] to our problem is straightforward, we take, without loss of generality, the spatial smoothness constraint of Ref. [9] as the basic regularization strategy in the following comparative study.
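A Laplacian smoothness penalty on a projection vector can be sketched as follows. This is a minimal illustration, assuming a 5-point finite-difference Laplacian with replicated borders applied to the projection vector reshaped to the image grid; the discretization used in Ref. [9] may differ, and the function name is hypothetical.

```python
import numpy as np

def laplacian_penalty(w, height, width):
    """Sum of squared discrete Laplacian responses of w viewed as a height x width image."""
    W = np.asarray(w, dtype=float).reshape(height, width)
    P = np.pad(W, 1, mode="edge")  # replicate border pixels (an assumption of this sketch)
    lap = (P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:]) - 4.0 * W
    return float(np.sum(lap ** 2))

h, wd = 16, 16
rng = np.random.default_rng(0)

w_const = np.ones(h * wd)                                   # perfectly flat projection
w_smooth = np.tile(np.linspace(0.0, 1.0, wd), (h, 1)).ravel()  # slowly varying ramp
w_noisy = rng.standard_normal(h * wd)                       # spatially incoherent projection

pen_const = laplacian_penalty(w_const, h, wd)
pen_smooth = laplacian_penalty(w_smooth, h, wd)
pen_noisy = laplacian_penalty(w_noisy, h, wd)
print(pen_const, pen_smooth, pen_noisy)
```

Adding such a term to a learning objective, weighted by a regularization parameter, discourages projection vectors whose coefficients change abruptly between neighboring pixels.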
The strategies in the former two families are all vector-pattern-oriented. Although they exploit the spatial structure information of images and thereby boost learning performance, they usually suffer from (1) high computational complexity and (2) the so-called “small sample problem”, i.e., the dimensionality of the feature vector exceeds the training set size, which leads to over-fitting. A natural way to mitigate or address these problems is to operate directly on image (or reshaped image) patterns. Along this line, many studies have been carried out, for example Refs. [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], among which the works of Chen et al. [14], [15], [16], [17], [18] and Tao et al. [20], [21], [22], [23], [24], [25] can be regarded as representative. More specifically, Chen et al. developed a series of classifiers, such as MatMHKS [14] and MatFE+MatCD [18], based on bilinear projection of image (or reshaped image) patterns, and achieved competitive performance against their vector-pattern-oriented counterparts in classification tasks such as human face and handwritten digit identification. Tao et al. developed dimensionality reduction and classification models that operate directly on (higher-order) tensor patterns and applied them to human gait recognition [20] and visual tracking [25], respectively. Because they use the matrix or tensor directly as the operating unit, schemes such as bilinear projection on images (second-order tensors) can make fuller use of the inherent spatial structure information in the data than their vectorized counterparts. For the same reason, in the following comparative study we take bilateral manipulation of images as the direct learning scheme and compare its ordinal learning performance with that of the other methods.
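The relation between a bilinear projection and its vectorized counterpart can be sketched as follows, assuming the generic bilinear form u^T X v with a left vector `u` and a right vector `v` (these symbol names are illustrative and not taken from Refs. [14], [18]). The bilinear projection equals a vector projection whose weight vector is constrained to the Kronecker product of `u` and `v`, which reduces the number of free parameters from m*n to m+n.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 16, 16                      # image size used as an example
X = rng.standard_normal((m, n))    # a toy image (hypothetical data)
u = rng.standard_normal(m)         # left projection vector (acts on rows)
v = rng.standard_normal(n)         # right projection vector (acts on columns)

# Bilinear projection applied directly to the matrix pattern.
f_bilinear = float(u @ X @ v)

# Equivalent vector-pattern projection: w = kron(u, v) on the row-major vectorized image.
w = np.kron(u, v)
f_vector = float(w @ X.ravel())

params_bilinear = m + n            # free parameters of the bilinear model
params_vector = m * n              # free parameters of an unconstrained vector model
print(f_bilinear, f_vector, params_bilinear, params_vector)
```

With far fewer parameters to estimate, the matrix-pattern model is less exposed to the small sample problem mentioned above, at the price of a more restricted family of projections.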
Finally, Table 1 gives a brief comparative summary of the aforementioned three strategies.
Following this categorization and summary of spatial structure information utilization strategies, our next step is to take OR as a research platform on which we make an extensive empirical comparison, on three image sets, among the three families summarized above. Before that, let us briefly review OR. OR is a special learning strategy used to design classifiers for ordinal classes, e.g., in human age estimation. Owing to its duality of regression and classification, OR has been widely applied in domains such as recommender systems [26], web page ranking [27], image retrieval [28], medical image diagnosis [29], [30] and age estimation [31], [32]. Various approaches have been put forward to implement it [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], including KDLOR [44], one of the distinguished ORs. Although most of these ORs have achieved good performance to different extents, when applied to images almost all of them neglect to compensate for the spatial structure information lost in vectorization. Choosing image-oriented OR as the research platform for comparing the three summarized categories is therefore reasonable. Although incorporating spatial information into existing OR may seem trivial, to the best of our knowledge no related research has yet been done.
For the sake of clarity, but without loss of generality, we take the linear version of KDLOR, a typical OR model proposed in Ref. [44], as the basic OR approach (herein denoted as LDLOR). We select IMED [1], SSSL [9] and bilinear modeling [14] as representatives of the three families of spatial structure information utilization and use them to re-model LDLOR, yielding three modified LDLOR versions named IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR, respectively. We then conduct a series of experiments on several image benchmark datasets and report comparison results in terms of OR-specific evaluation criteria.
The remainder of the paper is organized as follows. In Section 2, we briefly review a representative OR, i.e., LDLOR, which is taken as the base model (baseline). In Section 3, three re-modeled LDLOR counterparts derived from three spatial structure information utilizing strategies are detailed. Section 4 shows the experimental results and gives comparison analysis. The conclusions are drawn in Section 5.
Section snippets
Review of LDLOR
LDLOR, one of the distinguished ORs, aims to find the best projection direction along which the ordinal indices of the ordered classes can be preserved well after projection. Based on this principle, LDLOR has two main characteristics: maximizing the distance between each pair of mean vectors of neighboring ordinal classes, and simultaneously minimizing the within-class scatters, which makes it different from the discriminant principles used in DA models such as LDA [45] due to the imposition
Three re-modeled LDLORs fused spatial structure information
To exploit the spatial structure information contained in data such as images within LDLOR, in the following sub-sections we briefly review IMED [1], SSSL [9] and bilinear modeling [14] (as representatives of the three spatial information utilization strategies), and then employ them to re-model the basic LDLOR, generating its new variants: IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR.
Experiments
In this section, we conduct experiments to make empirical comparisons among LDLOR, IMED–LDLOR, SSSL–LDLOR and Bil–LDLOR on three benchmark image datasets whose classes are all ordinal, i.e., JAFFE (for human facial expression intensity regression), UMIST (for human head pose regression), and FG-NET (for human age group regression). To eliminate the influence of image size on the experiments, all images are cropped and resized to 16×16, and the raw (pixel) gray levels are directly extracted as
Conclusions
In this paper, we first gave a systematic summary of separately proposed spatial structure information utilization schemes and classified them into three main categories: structure-embedded Euclidean distance preserving, structure-regularized modeling, and direct manipulation on images. Second, to further compare them, given that spatial structure information is rarely reflected in existing image-oriented ORs, we respectively took IMED, SSSL and Bilinear
Acknowledgments
This work is partially supported by NSFC (61170151 and 61073112), Jiangsu SFC (BK2012793), Research Fund for the Doctoral Program (RFDP) (20123218110033), Funding of Jiangsu Innovation Program for Graduate Education (CXLX13_159), the Fundamental Research Funds for the Central Universities (NZ2013306) and Jiangsu Qinglan project.
References (53)
- et al. Post-processed LDA for face and palmprint recognition: what is the rationale. Signal Process. (2010)
- et al. An adaptive image Euclidean distance. Pattern Recognit. (2009)
- et al. Contextual constraints based linear discriminant analysis. Pattern Recognit. Lett. (2011)
- et al. Matrix-pattern-oriented Ho-Kashyap classifier with regularization learning. Pattern Recognit. (2007)
- et al. Matrix-pattern-oriented least squares support vector classifier with AdaBoost. Pattern Recognit. Lett. (2008)
- et al. Three-fold structured classifier design based on matrix pattern. Pattern Recognit. (2013)
- et al. Maximum margin multisurface support tensor machines with application to image classification and segmentation. Expert Syst. Appl. (2012)
- et al. Tensor rank one discriminant analysis – a convergent method for discriminative multilinear subspace selection. Neurocomputing (2008)
- et al. Incremental tensor biased discriminant analysis: a new color-based visual tracking method. Neurocomputing (2010)
- et al. Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage (2011)
- Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. NeuroImage
- On the Euclidean distance of images. IEEE Trans. Pattern Anal. Mach. Intell.
- A framework for multi-view gender classification. Lecture Notes Comput. Sci.
- Tensor distance based multilinear locality preserved maximum information embedding. IEEE Trans. Neural Netw.
- Laplacian smoothing transform for face recognition. Sci. China
- New least squares support vector machines based on matrix patterns. Neural Process. Lett.
- Pattern representation in feature extraction and classifier design: matrix versus vector. IEEE Trans. Neural Netw.
- General tensor discriminant analysis and gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell.
- Bayesian tensor approach for 3-D face modeling. IEEE Trans. Circuits Syst. Video Technol.
- A unified tensor level set for image segmentation. IEEE Trans. Syst. Man Cybern. Part B: Cybern.
Qing Tian received the B.S. degree in computer science from Southwest University for Nationalities, China, and the M.S. degree in computer science from Zhejiang University of Technology, China, in 2008 and 2011, with the honors of Sichuan provincial level outstanding graduate and Zhejiang provincial level outstanding graduate, respectively. From February 2011 to February 2012, he worked at Arcsoft, U.S., as a researcher in the field of gender/age recognition. He is now a Ph.D. candidate in computer science at Nanjing University of Aeronautics and Astronautics, and his current research interests include machine learning and pattern recognition.
Songcan Chen received the B.S. degree from Hangzhou University (now merged into Zhejiang University), the M.S. degree from Shanghai Jiao Tong University and the Ph.D. degree from Nanjing University of Aeronautics and Astronautics (NUAA) in 1983, 1985, and 1997, respectively. He joined NUAA in 1986, and since 1998 he has been a full-time Professor with the Department of Computer Science and Engineering. He has authored/co-authored over 170 peer-reviewed scientific papers and received Honorable Mentions for the 2006, 2007 and 2010 Best Paper Awards of the Pattern Recognition journal. His current research interests include pattern recognition, machine learning, and neural computing.
Xiaoyang Tan received his B.S. and M.S. degrees in computer applications from Nanjing University of Aeronautics and Astronautics (NUAA) in 1993 and 1996, respectively. He began working at NUAA in June 1996 as an assistant lecturer. He received a Ph.D. degree from the Department of Computer Science and Technology of Nanjing University, China, in 2005. From September 2006 to October 2007, he worked as a postdoctoral researcher in the LEAR (Learning and Recognition in Vision) team at INRIA Rhone-Alpes in Grenoble, France. His research interests are in face recognition, machine learning, pattern recognition, and computer vision.