Image classification by multimodal subspace learning
Highlights
► We encode different features to build a physically meaningful subspace.
► We adopt discriminative information to obtain the optimal low-dimensional subspace for each view.
► We utilize unlabeled data to enhance the subspace learning.
► We use alternating optimization to explore the complementary characteristics of different features.
Introduction
In many applications of computer vision and multimedia management, images are represented in several different ways; such data are termed multimodal data. A typical example is a color image, which has features from different modalities (Tao et al., 2006, Bian and Tao, 2010), e.g., color, texture, and shape. Classifying images into meaningful categories is an important and challenging task. Many methods (Vapnik, 1998, Domingos and Pazzani, 1996) have been applied to image classification in order to better organize, represent and browse images, and to improve the performance of related applications such as Content Based Image Retrieval (CBIR) (Liu, 2004), image annotation and image indexing (Datta et al., 2008). In these methods, an image of n1 × n2 pixels is described by a feature vector with n1 × n2 dimensions. Because of the high dimensionality of such data (Donoho, 2000), these methods often do not work well in practice. Previous studies show that the performance of image classification can be improved significantly in a low dimensional subspace (Belhumeur et al., 1997). Representative subspace learning methods include Principal Component Analysis (PCA) (Belhumeur et al., 1997), Linear Discriminant Analysis (LDA) (Belhumeur et al., 1997), Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Laplacian Eigenmaps (LE) (Belkin and Niyogi, 2002, Belkin and Niyogi, 2003) and Local Tangent Space Alignment (LTSA) (Zhang and Zha, 2005). However, none of these methods can handle data represented in multiple feature spaces.
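As a minimal illustration of linear subspace learning on flattened image vectors, the sketch below applies PCA (computed via the SVD) to synthetic data; the image sizes and data are invented for the example and are not from the paper's experiments.

```python
# Minimal PCA sketch on flattened "images"; all data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_images, n1, n2 = 100, 8, 8                 # tiny images of n1 x n2 pixels
X = rng.normal(size=(n_images, n1 * n2))     # each image flattened to n1*n2 dims

def pca(X, d):
    """Project rows of X onto the d leading principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                     # low dimensional representation

Y = pca(X, d=5)
print(Y.shape)   # (100, 5)
```

Classification or retrieval would then operate on the 5-dimensional rows of `Y` instead of the 64-dimensional pixel vectors.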
Although much progress has been made in learning from multimodal data, including classification (Zien and Ong, 2007), clustering (Bickel and Scheffer, 2004) and feature selection (Zhao and Liu, 2008), little has been made in multimodal subspace learning. A straightforward solution is to concatenate the vectors from different modalities into a single long vector and apply dimensionality reduction to it directly. However, because the individual features describe different aspects of an image’s properties, they are intrinsically embedded in heterogeneous feature spaces; concatenation ignores this diversity of modalities and thus cannot effectively exploit their complementary nature. Another candidate solution is the Distributed Multiple-view Subspace Learning (DMSL) proposed in (Long et al., 2008). DMSL runs a subspace learning algorithm on each modality independently and then, from the resulting low dimensional representations, learns a common low dimensional representation that is as close as possible to each of them. Although DMSL allows a different subspace learning algorithm to be selected for each modality, the original data are invisible to the final learning step, so DMSL also cannot effectively exploit the complementary nature of the modalities.
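The scale-mismatch problem of naive concatenation can be made concrete with a small synthetic example: when one modality lives on a much larger numeric scale, it dominates Euclidean distances in the concatenated space, and the other modality is effectively ignored. The feature sizes and scales below are invented for illustration.

```python
# Why naive concatenation can fail: one modality dominates the distances.
# All feature dimensions and scales here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
color_feat = rng.normal(scale=100.0, size=(50, 64))  # modality 1, large scale
edge_feat  = rng.normal(scale=0.1,  size=(50, 18))   # modality 2, small scale
X = np.hstack([color_feat, edge_feat])               # naive concatenation

# Contribution of each modality to a squared pairwise distance:
d_color = ((color_feat[0] - color_feat[1]) ** 2).sum()
d_edge  = ((edge_feat[0]  - edge_feat[1])  ** 2).sum()
ratio = d_edge / (d_color + d_edge)
print(ratio)   # tiny: modality 2 barely influences the concatenated distance
```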
In this paper, we propose a novel method, termed Semi-Supervised Multimodal Subspace Learning (SS-MMSL), which learns a unified low dimensional subspace over all modalities simultaneously. We consider the problem of multimodal subspace learning for image classification within the “Patch Alignment” Framework (PAF) (Zhang et al., 2009; Guan et al., 2011a, Guan et al., 2011b; Wang et al., 2011). Under PAF, a subspace learning technique proceeds in two stages: local patch construction and whole alignment. We apply the discriminative information revealed by the labeled data when building local patches. In addition, we introduce unlabeled data (Belkin et al., 2005, Zhu et al., 2003, Zheng et al., 2008, Belkin and Niyogi, 2004) into the local patch construction and incorporate them into the whole alignment stage to obtain the optimal low dimensional subspace for each modality. To find a unified low dimensional subspace in which the distribution of each modality is sufficiently smooth, we derive an iterative algorithm based on alternating optimization (Bezdek and Hathaway, 2002) that obtains a group of appropriate weights capturing the complementary nature of the modalities. Experimental results on image classification and cartoon retrieval demonstrate the effectiveness of the proposed method.
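Alternating optimization over modality weights is a standard device in multi-view learning. The hedged sketch below shows one common closed-form weight update, for the objective sum_v alpha_v**r * c_v subject to sum_v alpha_v = 1 with r > 1; this is an illustrative scheme and not necessarily the exact update used in SS-MMSL.

```python
# Hedged sketch of the modality-weight step in an alternating optimization:
# for min over alpha of sum_v alpha_v**r * c_v, s.t. sum_v alpha_v = 1 (r > 1),
# the optimum is alpha_v proportional to c_v ** (1 / (1 - r)).
import numpy as np

def update_weights(costs, r=2.0):
    """Closed-form weight update: modalities with smaller cost get more weight."""
    w = costs ** (1.0 / (1.0 - r))
    return w / w.sum()

costs = np.array([4.0, 1.0, 2.0])   # per-modality objective values (synthetic)
alpha = update_weights(costs)
print(alpha)   # for r = 2 each weight is inversely proportional to its cost
```

In a full alternating scheme, one would iterate between this weight update (with the embedding fixed) and recomputing the embedding (with the weights fixed) until convergence.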
The rest of this paper is organized as follows. In Section 2, we present the proposed Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method and the solution to image classification using SS-MMSL. Experimental results are presented in Section 3. The application to cartoon retrieval is described in Section 4. Finally, conclusions are drawn in Section 5.
Section snippets
Semi-supervised multimodal subspace learning
In this section, we present a novel Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method based on the “Patch Alignment” Framework. The workflow of SS-MMSL is shown in Fig. 1. First, SS-MMSL extracts multiple features including color histogram, color correlogram and edge direction histogram for images. Subsequently, SS-MMSL builds the so-called local patches of both labeled and unlabeled data in each modality. Based on these patches, the whole alignment is performed to obtain a low
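Two of the features mentioned above can be sketched as follows; the bin counts, quantization levels, and the synthetic image are illustrative assumptions, not necessarily the settings used in the paper.

```python
# Illustrative sketches of a color histogram and an edge direction histogram.
# The image and all parameter choices here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(32, 32, 3))     # synthetic RGB image

# Color histogram: quantize each channel to 4 levels -> 64 color bins.
q = img // 64
codes = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]
color_hist = np.bincount(codes.ravel(), minlength=64) / codes.size

# Edge direction histogram: bin gradient orientations of the gray image.
gray = img.mean(axis=2)
gy, gx = np.gradient(gray)
angles = np.arctan2(gy, gx)                      # orientations in [-pi, pi]
edge_hist, _ = np.histogram(angles, bins=18, range=(-np.pi, np.pi))
edge_hist = edge_hist / edge_hist.sum()

print(color_hist.shape, edge_hist.shape)   # (64,) (18,)
```

Each modality then contributes its own feature vector per image, and SS-MMSL builds local patches separately in each of these feature spaces.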
Experiments
In this section, we compare the effectiveness of the proposed SS-MMSL method with Feature Concatenation based Subspace Learning (FCSL), DMSL (Long et al., 2008), the Average performance of Single-modality Subspace Learning (ASSL) and the Best performance of Single-modality Subspace Learning (BSSL) on image classification. Classification is performed with LIBSVM (Fan et al., 2005) in the constructed low dimensional subspace.
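The experimental pipeline, reduced to a sketch: learn a low dimensional subspace, then classify in it. A nearest-centroid classifier stands in for LIBSVM here only to keep the example dependency-free, and the data are synthetic.

```python
# Sketch of the subspace-then-classify pipeline; a nearest-centroid rule
# replaces LIBSVM purely for illustration, and all data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 30)), rng.normal(3, 1, (40, 30))])
y = np.array([0] * 40 + [1] * 40)

# Subspace step: project onto the top-2 principal directions (PCA via SVD).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Classification step, carried out entirely in the low dimensional subspace.
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
acc = (pred == y).mean()
print(acc)   # training accuracy in the 2-D subspace
```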
Application in cartoon retrieval
As an extension of our approach, in this section, we demonstrate that SS-MMSL performs well in cartoon retrieval by conducting experiments on the cartoon database (Yu et al., 2007, Zhuang et al., 2008, Yang et al., 2009). The key issue is how to evaluate the shape similarities among cartoon images. It is important to carefully choose the features of cartoon images, e.g., in (Juan and Bodenheimer, 2004, Yu et al., 2011a), the edge feature is adopted to estimate the similarities. More recently,
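Retrieval by shape similarity can be sketched as ranking edge-direction histograms under a similarity measure; cosine similarity and the synthetic database below are illustrative assumptions, not the paper's actual measure or data.

```python
# Hedged retrieval sketch: rank a database of edge-direction histograms by
# cosine similarity to a query. The database and the choice of cosine
# similarity are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(4)
db = rng.random((20, 18))                    # 20 histograms with 18 bins each
db /= db.sum(axis=1, keepdims=True)
query = db[7] + rng.normal(0, 1e-3, 18)      # a slightly perturbed database item

def cosine(a, B):
    """Cosine similarity between vector a and each row of B."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a))

ranking = np.argsort(-cosine(query, db))     # most similar items first
print(ranking[0])   # the near-duplicate item 7 should rank first
```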
Conclusions
This paper presents a new Semi-Supervised Multimodal Subspace Learning (SS-MMSL) method for image classification. By integrating the discriminative information from the labeled data into local patch construction and utilizing the data distribution revealed by the unlabeled data, a new objective function for subspace learning is constructed that combines the advantages of the “Patch Alignment” Framework and semi-supervised learning. Moreover, an iterative algorithm using the
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61100104), the National Defense Basic Scientific Research Program of China (No. B1420110155), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20110121110020), the Fundamental Research Funds for the Central Universities (No. 2011121049), and the Singapore National Research Foundation & Interactive Digital Media R&D Program Office Grant (
References (37)
- Belhumeur et al., 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell.
- Belkin and Niyogi, 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. Proc. Adv. Neural Inf. Process. Syst. (NIPS 02).
- Belkin and Niyogi, 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput.
- Belkin and Niyogi, 2004. Semi-supervised learning on Riemannian manifolds. J. Mach. Learn.
- Belkin, M., Niyogi, P., Sindhwani, V., 2005. On manifold regularization. In: Proceedings of the International Workshop...
- Bengio et al., 2003. Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. Adv. Neur. Inform. Process. Syst.
- Bezdek and Hathaway, 2002. Some notes on alternating optimization. Proc. AFSS Internat. Conf. Fuzzy Syst.
- Bian and Tao, 2010. Biased discriminant Euclidean embedding for content based image retrieval. IEEE Trans. Image Process.
- Bickel and Scheffer, 2004. Multi-view clustering. Proc. Internat. Conf. Mach. Learn. (ICML 04).
- Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y., 2009. NUS-WIDE: A real-world web image database from...
- Datta et al., 2008. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surveys.
- Domingos and Pazzani, 1996. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Proc. Internat. Conf. Mach. Learn. (ICML 96).
- Fan et al., 2005. Working set selection using the second order information for training SVM. J. Mach. Learn. Res.
- Guan et al., 2011. Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Trans. Image Process.
- Guan et al., 2011. Non-negative patch alignment framework. IEEE Trans. Neural Networ.
- He and Niyogi, 2003. Locality preserving projections. Adv. Neur. Inform. Process. Syst.
- Jolliffe, I.T. Principal Component Analysis. Springer.