Pattern Recognition Letters

Volume 33, Issue 9, 1 July 2012, Pages 1196-1204

Image classification by multimodal subspace learning

https://doi.org/10.1016/j.patrec.2012.02.002

Abstract

In recent years we have witnessed a surge of interest in subspace learning for image classification. However, previous methods suffer from limited accuracy because they do not consider multiple features of the images. For instance, a color image can be represented by a set of visual features describing its color, texture and shape. Based on the “Patch Alignment” Framework, we develop a new subspace learning method, termed Semi-Supervised Multimodal Subspace Learning (SS-MMSL), which encodes features from different modalities to build a meaningful subspace. In particular, the new method adopts the discriminative information of the labeled data to construct local patches and aligns these patches to obtain the optimal low dimensional subspace for each modality. For local patch construction, the data distribution revealed by unlabeled data is utilized to enhance the subspace learning. To find a low dimensional subspace wherein the distribution of each modality is sufficiently smooth, SS-MMSL adopts an alternating and iterative optimization algorithm to explore the complementary characteristics of different modalities. The iterative procedure reaches the global minimum of the criterion owing to the criterion's strong convexity. Our experiments on image classification and cartoon retrieval demonstrate the validity of the proposed method.

Highlights

► We encode different features to build a physically meaningful subspace.
► We adopt discriminative information to obtain the optimal low-dimensional subspace for each view.
► We utilize unlabeled data to enhance the subspace learning.
► We use alternating optimization to explore complementary characteristics of different features.

Introduction

In many applications of computer vision and multimedia management, images are represented in several different ways; such data are termed multimodal data. A typical example is a color image, which has features from different modalities (Tao et al., 2006, Bian and Tao, 2010), e.g., color, texture, and shape. Classifying images into meaningful categories is a challenging and important task. Many methods (Vapnik, 1998, Domingos and Pazzani, 1996) have been applied to image classification in order to better organize, represent and browse images, and to improve the performance of related applications such as Content Based Image Retrieval (CBIR) (Liu, 2004), image annotation and image indexing (Datta et al., 2008). In these methods, an image of n1 × n2 pixels is described by a feature vector of n1 × n2 dimensions. Because of the high dimensionality of such data (Donoho, 2000), these methods do not work well in practice. Previous studies show that the performance of image classification can be improved significantly in a low dimensional subspace (Belhumeur et al., 1997). Representative subspace learning methods include Principal Component Analysis (PCA) (Belhumeur et al., 1997), Linear Discriminant Analysis (LDA) (Belhumeur et al., 1997), Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Laplacian Eigenmaps (LE) (Belkin and Niyogi, 2002, Belkin and Niyogi, 2003) and Local Tangent Space Alignment (LTSA) (Zhang and Zha, 2005). However, these methods cannot deal with data represented by multiple feature spaces.
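To make the dimensionality issue concrete, the following sketch (illustrative only, with hypothetical image sizes and random data rather than the datasets used in this paper) flattens n1 × n2 images into vectors and projects them onto a low dimensional PCA subspace, i.e., the single-modality setting addressed by the methods cited above.

```python
import numpy as np

# Hypothetical example: 200 grayscale images of 64 x 48 pixels.
rng = np.random.default_rng(0)
n1, n2, n_images = 64, 48, 200
images = rng.random((n_images, n1, n2))

# Each image becomes a feature vector of n1 * n2 = 3072 dimensions.
X = images.reshape(n_images, n1 * n2)

# PCA: project onto the top-k principal directions.
k = 20
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt[:k].T                 # (n1 * n2, k) projection matrix
X_low = X_centered @ W       # (n_images, k) low dimensional representation
print(X.shape, "->", X_low.shape)
```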

Although much progress has been made in multimodal data learning, including classification (Zien and Ong, 2007), clustering (Bickel and Scheffer, 2004) and feature selection (Zhao and Liu, 2008), little progress has been made in multimodal subspace learning. A possible solution is to concatenate the vectors from different modalities into a single new vector and to apply dimensionality reduction directly to the concatenated vector. However, this solution has its own problem: the features describe different aspects of an image’s properties, that is, they are intrinsically embedded in heterogeneous feature spaces. Concatenation ignores the diversity of the modalities and thus cannot efficiently explore their complementary nature. Another solution is the Distributed Multiple-view Subspace Learning (DMSL) proposed in Long et al. (2008). DMSL performs a subspace learning algorithm on each modality independently and then, based on the obtained low dimensional representations, learns a common low dimensional representation that is as close as possible to each of them. Although DMSL allows different subspace learning algorithms to be selected for different modalities, the original data are invisible to the final learning process, and thus the complementary nature cannot be explored effectively.
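For reference, the sketch below (with hypothetical feature dimensions, not those used in our experiments) illustrates the concatenation strategy; the mismatch in dimension and scale between the modalities is precisely what prevents this baseline from exploiting their complementary nature.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Hypothetical modalities with different dimensions and scales.
color_hist = rng.random((n, 64))            # color histogram
correlogram = rng.random((n, 144)) * 10.0   # color correlogram (larger scale)
edge_hist = rng.random((n, 18)) * 0.1       # edge direction histogram

# Naive concatenation: one long vector per image.  Distance-based
# subspace learning on X_concat is dominated by the modality with the
# largest dimension and scale, so the smaller modalities contribute
# little, which is the drawback discussed above.
X_concat = np.hstack([color_hist, correlogram, edge_hist])
print(X_concat.shape)   # (100, 226)
```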

In this paper, we propose a novel method, termed Semi-Supervised Multimodal Subspace Learning (SS-MMSL), which learns a unified low dimensional subspace over all modalities simultaneously. We consider the problem of multimodal subspace learning for image classification based on the “Patch Alignment” Framework (PAF) (Zhang et al., 2009; Guan et al., 2011a, Guan et al., 2011b; Wang et al., 2011). According to PAF, subspace learning proceeds in two stages: local patch construction and whole alignment. We apply the discriminative information revealed by the labeled data in building local patches. In addition, we introduce the unlabeled data (Belkin et al., 2005, Zhu et al., 2003, Zheng et al., 2008, Belkin and Niyogi, 2004) into the local patch construction and incorporate them in the whole alignment stage to obtain the optimal low dimensional subspace for each modality. To find a unified low dimensional subspace wherein the distribution of each modality is sufficiently smooth, we derive an iterative algorithm based on alternating optimization (Bezdek and Hathaway, 2002) that obtains a group of appropriate weights capturing the complementary nature of the modalities. Our experimental results on image classification and cartoon retrieval demonstrate the effectiveness of the proposed method.
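The precise SS-MMSL criterion is given in Section 2; purely as an illustration of the alternating optimization idea, the sketch below assumes a generic weighted objective sum_m alpha_m^r tr(Y^T L_m Y) with a shared n × d embedding Y, per-modality n × n alignment matrices L_m, an exponent r > 1 and the constraint sum_m alpha_m = 1. These modelling choices are assumptions made for the sketch and are not the exact formulation derived in this paper.

```python
import numpy as np

def alternating_weights(L_list, dim, r=5, n_iter=20):
    """Sketch of alternating optimization under the assumed objective
    sum_m alpha_m**r * trace(Y.T @ L_m @ Y), with sum_m alpha_m = 1."""
    m = len(L_list)
    alpha = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        # Step 1: with the weights fixed, the optimal Y (Y.T @ Y = I) spans
        # the eigenvectors of the weighted matrix with smallest eigenvalues.
        L = sum(a**r * Lm for a, Lm in zip(alpha, L_list))
        _, eigvecs = np.linalg.eigh(L)
        Y = eigvecs[:, :dim]
        # Step 2: with Y fixed, each weight has a closed-form update
        # obtained from the Lagrangian of the constrained problem.
        costs = np.array([np.trace(Y.T @ Lm @ Y) for Lm in L_list])
        inv = (1.0 / np.maximum(costs, 1e-12)) ** (1.0 / (r - 1))
        alpha = inv / inv.sum()
    return Y, alpha

# Toy usage with three random symmetric alignment matrices over 50 samples.
rng = np.random.default_rng(2)
L_list = []
for _ in range(3):
    A = rng.random((50, 50))
    L_list.append(A @ A.T)   # symmetric positive semidefinite
Y, alpha = alternating_weights(L_list, dim=5)
print(Y.shape, alpha)
```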

The rest of this paper is organized as follows. In Section 2, we present the proposed Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method and the solution to image classification using SS-MMSL. Experimental results are presented in Section 3. The application to cartoon retrieval is described in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

Semi-supervised multimodal subspace learning

In this section, we present a novel Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method based on the “Patch Alignment” Framework. The workflow of SS-MMSL is shown in Fig. 1. First, SS-MMSL extracts multiple features, including the color histogram, color correlogram and edge direction histogram, for each image. Subsequently, SS-MMSL builds the so-called local patches of both labeled and unlabeled data in each modality. Based on these patches, the whole alignment is performed to obtain a low dimensional subspace.
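A minimal sketch of the feature extraction stage is given below, assuming simple NumPy implementations of a joint RGB color histogram and an edge direction histogram; the color correlogram used in our experiments involves spatial co-occurrence statistics and is omitted here for brevity.

```python
import numpy as np

def color_histogram(img, bins=8):
    """RGB image (H, W, 3) with values in [0, 1] -> joint color histogram."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel()
    return hist / hist.sum()

def edge_direction_histogram(img, bins=18):
    """Histogram of gradient directions of the grayscale image."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    angles = np.arctan2(gy, gx)                      # in [-pi, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

# Hypothetical image standing in for a database image.
rng = np.random.default_rng(3)
img = rng.random((48, 64, 3))
features = {"color_hist": color_histogram(img),
            "edge_dir_hist": edge_direction_histogram(img)}
print({name: f.shape for name, f in features.items()})
```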

Experiments

In this section, we compare the effectiveness of the proposed SS-MMSL method with that of Feature Concatenation based Subspace Learning (FCSL), DMSL (Long et al., 2008), the Average performance of Single-modality Subspace Learning (ASSL) and the Best performance of Single-modality Subspace Learning (BSSL) in image classification. The image classification experiments are conducted with LIBSVM (Fan et al., 2005) in the constructed low dimensional subspace.
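As a rough outline of this evaluation protocol, the sketch below uses synthetic low dimensional representations and scikit-learn's SVC, which wraps libsvm, as a stand-in for the LIBSVM package; the data, dimensions and parameters are placeholders rather than the settings used in our experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder low dimensional representations (e.g., 30 dimensions)
# produced by a subspace learning step, with synthetic labels.
rng = np.random.default_rng(4)
X_low = rng.random((300, 30))
y = rng.integers(0, 5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_low, y, test_size=0.3,
                                          random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # SVC wraps libsvm
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```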

Application in cartoon retrieval

As an extension of our approach, in this section, we demonstrate that SS-MMSL performs well in cartoon retrieval by conducting experiments on the cartoon database (Yu et al., 2007, Zhuang et al., 2008, Yang et al., 2009). The key issue is how to evaluate the shape similarities among cartoon images. It is important to carefully choose the features of cartoon images, e.g., in (Juan and Bodenheimer, 2004, Yu et al., 2011a), the edge feature is adopted to estimate the similarities. More recently,
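As one concrete (and deliberately simple) way to compare such edge features, the sketch below scores two L1-normalized edge direction histograms by histogram intersection; this is only an illustrative choice of similarity measure and not necessarily the one adopted in the cited work or in our system.

```python
import numpy as np

def edge_hist_similarity(h1, h2):
    """Histogram intersection between two L1-normalized edge direction
    histograms; 1.0 means identical distributions, 0.0 means no overlap."""
    return float(np.minimum(h1, h2).sum())

# Hypothetical edge direction histograms of two cartoon characters.
rng = np.random.default_rng(5)
h1 = rng.random(18); h1 /= h1.sum()
h2 = rng.random(18); h2 /= h2.sum()
print("similarity:", edge_hist_similarity(h1, h2))
```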

Conclusions

This paper presents a new Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method for image classification. By integrating the discriminative information from the labeled data into the local patch construction and utilizing the data distribution revealed by the unlabeled data, a new objective function for subspace learning is constructed that obtains the advantages of both the “Patch Alignment” Framework and semi-supervised learning. Moreover, an iterative algorithm based on alternating optimization is derived to explore the complementary characteristics of the different modalities.

Acknowledgements

This work is supported by the grant of the National Natural Science Foundation of China (No. 61100104), the Grant of the National Defense Basic Scientific Research Program of China (No. B1420110155), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20110121110020), the Grant of the Fundamental Research Funds for the Central Universities (No. 2011121049), the Singapore National Research Foundation & Interactive Digital Media R&D Program Office Grant (

References (37)

  • P. Belhumeur et al.

    Eigenfaces vs. fisherfaces: Recognition using class specific linear projection

    IEEE Trans. Pattern Anal. Machine Intell.

    (1997)
  • M. Belkin et al.

    Laplacian eigenmaps and spectral techniques for embedding and clustering

    Proc. Adv. Neural Inf. Process. Syst. (NIPS 02)

    (2002)
  • M. Belkin et al.

    Laplacian eigenmaps for dimensionality reduction and data representation

    Neural Comput.

    (2003)
  • M. Belkin et al.

    Semi-supervised learning on Riemannian manifolds

    J. Mach. Learn.

    (2004)
  • Belkin, M., Niyogi, P., Sindhwani, V., 2005. On manifold regularization. In: Proceedings of the International Workshop...
  • Y. Bengio et al.

    Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering

    Adv. Neur. Inform. Process. Syst.

    (2003)
  • J.C. Bezdek et al.

    Some notes on alternating optimization

    Proc. AFSS Internat. Conf. Fuzzy Syst.

    (2002)
  • W. Bian et al.

    Biased discriminant Euclidean embedding for content based image retrieval

    IEEE Trans. Image Process.

    (2010)
  • S. Bickel et al.

    Multi-view clustering

    Proc. Internat. Conf. Mach. Learn. (ICML 04)

    (2004)
  • Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y., 2009. NUS-WIDE: A real-world web image database from...
  • R. Datta et al.

    Image retrieval: Ideas, influences, and trends of the new age

    ACM Comput. Surveys

    (2008)
  • P. Domingos et al.

    Beyond independence: Conditions for the optimality of the simple Bayesian classifier

    Proc. Internat. Conf. Mach. Learn. (ICML 96)

    (1996)
  • Donoho, D.L. 2000. High dimensional data analysis: The curses and blessings of dimensionality. In: Lecture at the...
  • R.E. Fan et al.

    Working set selection using the second order information for training SVM

    J. Mach. Learn. Res.

    (2005)
  • N. Guan et al.

    Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent

    IEEE Trans. Image Process.

    (2011)
  • N. Guan et al.

    Non-negative patch alignment framework

    IEEE Trans. Neural Netw.

    (2011)
  • X. He et al.

    Locality preserving projections

    Adv. Neur. Inform. Process. Syst.

    (2004)
  • I. Jolliffe

    Principal Component Analysis

    (2002)