Neurocomputing

Volume 269, 20 December 2017, Pages 21-28

Image categorization using non-negative kernel sparse representation

https://doi.org/10.1016/j.neucom.2016.08.144

Abstract

Sparse representations of signals have become an important tool in computer vision. In many computer vision applications such as image denoising, image super-resolution and object recognition, sparse representations have produced remarkable performance. Sparse representation models often contain two stages: sparse coding and dictionary learning. In this paper, we propose a non-linear, non-negative sparse representation model: NNK-KSVD. In the sparse coding stage, a non-linear update rule is proposed to obtain the sparse matrix. In the dictionary learning stage, the proposed model extends the kernel KSVD by embedding non-negative sparse coding. The proposed non-negative kernel sparse representation model was evaluated on several public image datasets for the task of classification. Experimental results show that by exploiting the non-linear structure in images and utilizing the ‘additive’ nature of non-negative sparse coding, promising classification performance can be obtained. Moreover, the proposed sparse representation method was also evaluated on image retrieval tasks, where competitive results were obtained.

Introduction

In recent years, sparse representations of signals have become an important tool in computer vision. In many computer vision applications, such as image denoising, image super-resolution and object recognition, sparse representations have produced remarkable performance [1], [2], [3]. It has also been verified that sparse representation, or sparse coding, can achieve good outcomes in many image classification tasks [4], [5], [6]. The reason for the success of sparse coding is that a signal Y can be well represented as a linear combination of the atoms of a given dictionary D, weighted by a sparse vector x; namely, if x and D can be properly found, then Y can be well approximated as Y ≈ Dx.
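To make the approximation Y ≈ Dx concrete, the following sketch codes a signal against a dictionary with greedy orthogonal matching pursuit, one standard way to find x. The random dictionary, the sparsity level k and all variable names are illustrative assumptions, not the setup used in the paper.

    import numpy as np

    def omp(D, y, k):
        """Greedy orthogonal matching pursuit: find a k-sparse x with y ≈ D @ x."""
        residual = y.copy()
        support = []
        x = np.zeros(D.shape[1])
        for _ in range(k):
            # pick the atom most correlated with the current residual
            j = int(np.argmax(np.abs(D.T @ residual)))
            support.append(j)
            # least-squares fit of y on the atoms selected so far
            coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            residual = y - D[:, support] @ coef
        x[support] = coef
        return x

    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
    x_true = np.zeros(256)
    x_true[[3, 40, 100]] = [1.0, -0.5, 2.0]
    y = D @ x_true
    x = omp(D, y, k=3)
    print(np.linalg.norm(y - D @ x))      # near 0: y is well approximated by Dx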

There are two ways to obtain a dictionary for sparse representation: using an existing dictionary or learning one. Using an existing dictionary means choosing a pre-computed basis such as a wavelet or curvelet basis. However, in many computer vision tasks one needs to learn a dictionary from given sample images; a better image representation can then be expected from such a task-specific dictionary [7].

Many algorithms have been proposed to tackle the problem of dictionary learning; the method of optimal directions (MOD) [8] and the KSVD algorithm [9] are two of the best-known. The dictionaries learned by MOD and KSVD are linear representations of the data. However, in many computer vision applications one has to deal with non-linear distributions of the data, and a linear dictionary learning model then often leads to poor performance. Several non-linear dictionary learning methods have therefore been proposed. Among them, the recently proposed kernel KSVD (KKSVD) [10] has shown its ability to obtain better image descriptions than linear models. Kernel KSVD includes two steps: sparse coding and dictionary update. In the sparse coding stage, kernel orthogonal matching pursuit (KOMP) is used to seek the sparse codes. Once the sparse codes are obtained, a kernelized KSVD is used in the second stage for the dictionary update. It has been demonstrated that kernel KSVD can provide better image classification performance than the traditional linear dictionary learning models.
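For reference, the atom update at the heart of the linear KSVD [9], which kernel KSVD lifts into feature space, refits one dictionary column and its coefficients via a rank-1 SVD of the residual restricted to the signals that use that atom. A minimal numpy sketch of the linear version (variable names are illustrative):

    import numpy as np

    def ksvd_atom_update(Y, D, X, j):
        """Refit atom j of D and row j of X, as in one KSVD dictionary-update step."""
        users = np.nonzero(X[j, :])[0]      # signals whose sparse codes use atom j
        if users.size == 0:
            return D, X
        # residual with atom j's contribution removed, restricted to its users
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]                   # new atom: leading left singular vector
        X[j, users] = s[0] * Vt[0, :]       # matching coefficients for the users
        return D, X

Sweeping this update over all atoms, alternated with a sparse coding pass (OMP in the linear case, KOMP in the kernel case), gives the full KSVD or kernel KSVD loop.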

Although kernel KSVD can outperform its linear counterparts by introducing a non-linear learning procedure, the learned sparse vector and dictionary still contain both additive and subtractive interactions. From the perspective of biological modeling, the presence of both additive and subtractive elements in the sparse representations of signals is contrary to the non-negativity of neural firing rates [11], [12]. Moreover, negative and positive elements in the representations may cancel each other out, which is contrary to the intuitive notion of combining non-negative parts [13]. Therefore, many researchers have advocated non-negative sparse representations in vision-related applications [14], [15]. Non-negative sparse representations are based on the constraints that the input data Y, the dictionary D, and the sparse vector x are all non-negative.
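To illustrate what these constraints look like in practice, the sketch below computes non-negative sparse codes with the multiplicative update of Hoyer's non-negative sparse coding [13]; it assumes Y and D are elementwise non-negative, and the sparsity weight lam is an illustrative choice.

    import numpy as np

    def nn_sparse_codes(Y, D, lam=0.1, n_iter=300, seed=0):
        """Multiplicative updates for X >= 0 minimizing
        0.5 * ||Y - D @ X||_F**2 + lam * X.sum()."""
        rng = np.random.default_rng(seed)
        X = np.abs(rng.standard_normal((D.shape[1], Y.shape[1])))
        for _ in range(n_iter):
            # every factor is non-negative, so X stays non-negative throughout
            X *= (D.T @ Y) / (D.T @ D @ X + lam + 1e-12)
        return X

Because the update only rescales entries by non-negative factors, the learned codes can only add dictionary atoms together, never subtract them, which is exactly the ‘additive’ behavior discussed above.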

Non-negative sparse representation has been successfully applied in many computer vision applications, such as face recognition [14], [16], motion extraction [16], [17], and image classification and retrieval [18], [19]. Nevertheless, these non-negative sparse representations are all based on linear learning models, so their ability to capture non-linear distributions of data is limited.

Motivated by this drawback of existing non-negative sparse representation models, in this paper we propose a non-linear and non-negative sparse model, non-negative kernel KSVD (NNK-KSVD), which integrates the distinctive features of non-linear and non-negative models. In NNK-KSVD, non-negative constraints are embedded into kernel KSVD for both sparse coding and dictionary learning, and new update rules for sparse vector search and dictionary update are introduced under these constraints. The proposed NNK-KSVD model was tested on several benchmark image datasets for the tasks of classification and retrieval, and state-of-the-art results were obtained.
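The actual NNK-KSVD update rules appear in Section 3, which this preview truncates. Purely as a structural stand-in, the sketch below combines the kernel-dictionary parameterization D = Φ(Y)A used by kernel KSVD [10] with NMF-style multiplicative updates to keep both A and X non-negative; this is a hedged substitute scheme, not the paper's algorithm, and it assumes the Gram matrix K is elementwise non-negative (true, e.g., for an RBF kernel).

    import numpy as np

    def nn_kernel_sparse_model(K, n_atoms, lam=0.1, n_iter=300, seed=0):
        """Stand-in for non-negative kernel dictionary learning: minimize
        ||Phi(Y) - Phi(Y) @ A @ X||_F**2 + lam * X.sum() with A, X >= 0,
        where K = Phi(Y).T @ Phi(Y) is the kernel (Gram) matrix."""
        rng = np.random.default_rng(seed)
        n = K.shape[0]
        A = np.abs(rng.standard_normal((n, n_atoms)))   # dictionary coefficients
        X = np.abs(rng.standard_normal((n_atoms, n)))   # non-negative sparse codes
        eps = 1e-12
        for _ in range(n_iter):
            X *= (A.T @ K) / (A.T @ K @ A @ X + lam + eps)   # coding stage
            A *= (K @ X.T) / (K @ A @ X @ X.T + eps)         # dictionary stage
        return A, X

Note how the kernel trick enters: the objective and both updates depend on the data only through K, so any positive-definite kernel with non-negative entries can be plugged in.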

Note that this paper is an extension of our previous conference paper [20]. Compared with [20], this paper introduces more details of the proposed model and evaluates it on more image datasets to verify its effectiveness. In particular, the proposed model is also employed for image retrieval tasks, where competitive results were obtained as well.

The rest of this paper is organized as follows. In Section 2, some related work on non-negative sparse coding and kernel sparse coding is briefly reviewed. The proposed NNK-KSVD sparse model is introduced in Section 3. The experiments and the results are given in Section 4. Finally, we draw the conclusions in Section 5.

Section snippets

Related work

In this section, we will briefly review some related work in three aspects: (1) Image classification methods based on sparse image representations; (2) Sparse coding models with non-negative constraints; and (3) Non-linear sparse coding models and their applications.

One of the most widely used methods for image classification is Bag of Words (BoW) [21]. A compact image representation can be obtained through BoW. Moreover, BoW is robust to scale and rotation variations. BoW consists of three modules:
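The snippet is cut off here; in the usual BoW formulation the three modules are local feature extraction, codebook learning (typically k-means clustering of the descriptors) and codeword histogram encoding. A minimal sketch with random toy descriptors standing in for real local features such as SIFT:

    import numpy as np
    from sklearn.cluster import KMeans

    def bow_histogram(descriptors, kmeans):
        """Encode one image's local descriptors as a normalized codeword histogram."""
        words = kmeans.predict(descriptors)
        hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    rng = np.random.default_rng(0)
    all_desc = rng.standard_normal((1000, 32))      # pooled training descriptors
    kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(all_desc)  # codebook
    image_desc = rng.standard_normal((120, 32))     # descriptors of one image
    h = bow_histogram(image_desc, kmeans)
    print(h.shape)                                  # (50,): fixed-length representation

The resulting fixed-length histogram can then be fed to any standard classifier.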

The non-negative kernel KSVD

In this section, the proposed NNK-KSVD model is introduced. First, a brief review of non-negative sparse coding and the non-linear dictionary learning method kernel KSVD is given; then the proposed NNK-KSVD model is discussed.

Experiments and results

To evaluate the proposed sparse representation model, we first examined its effectiveness in image classification tasks on three public benchmark image datasets. In addition, since the proposed sparse model can be naturally used in image retrieval tasks, its performance in image retrieval was investigated as well.

Conclusion

In this paper, we propose a non-linear and non-negative sparse coding model, NNK-KSVD. The proposed model extends the kernel KSVD by embedding non-negative sparse coding. The proposed model was tested on several benchmark image datasets; experimental results show that by exploiting the non-linear structure in images and utilizing the ‘additive’ nature of non-negative sparse coding, better classification performance can be expected. Future work includes more theoretical analysis and

Yungang Zhang received the B.E. and M.S. degrees in computer science from Yunnan Normal University, Kunming, Yunnan, China, in 1998 and 2005, respectively, and the Ph.D. in computer science from the University of Liverpool, UK, in 2014. He is now with the Department of Computer Science, Yunnan Normal University. His research interests include image analysis and machine learning.

References (48)

  • M. Elad et al., Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. (2006)
  • J. Yang et al., Image super-resolution as sparse representation of raw image patches, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08) (2008)
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • J. Yang et al., Linear spatial pyramid matching using sparse coding for image classification, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09) (2009)
  • J. Huang et al., Supervised and projected sparse coding for image classification, Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI'13) (2013)
  • Y. Zhang et al., Learning structured low-rank representations for image classification, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
  • J. Mairal et al., Task-driven dictionary learning, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • K. Engan et al., Method of optimal directions for frame design (1999)
  • M. Aharon et al., K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. (2006)
  • H.V. Nguyen et al., Design of non-linear kernel dictionaries for object recognition, IEEE Trans. Image Process. (2013)
  • D.D. Lee et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
  • P.O. Hoyer et al., A multi-layer sparse coding network learns contour coding from natural images, Vis. Res. (2002)
  • P.O. Hoyer, Non-negative sparse coding, Proceedings of IEEE Workshop on Neural Networks for Signal Processing (2002)
  • P.O. Hoyer, Non-negative matrix factorization with sparseness constraints, J. Mach. Learn. Res. (2004)
  • R. Zass et al., Nonnegative sparse PCA, Advances in Neural Information Processing Systems (2007)
  • C. Vollmer et al., Sparse coding of human motion trajectories with non-negative matrix factorization, Neurocomputing (2014)
  • T. Guthier et al., Non-negative sparse coding for motion extraction, Proceedings of International Joint Conference on Neural Networks (IJCNN) (2013)
  • C. Zhang et al., Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition, Comput. Vis. Image Underst. (2014)
  • F. Zou et al., Nonnegative sparse coding induced hashing for image copy detection, Neurocomputing (2013)
  • Y. Zhang et al., Non-negative kernel sparse coding for image classification, Proceedings of the 2015 International Conference on Intelligence Science and Big Data Engineering. Image and Video Data Engineering, Part I, Lecture Notes in Computer Science (2015)
  • J. Wu et al., Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel, Proceedings of the International Conference on Computer Vision (2009)
  • J.C. van Gemert et al., Kernel codebooks for scene categorization, Proceedings of the European Conference on Computer Vision (2008)
  • O. Boiman et al., In defense of nearest-neighbor based image classification, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
  • J. Philbin et al., Object retrieval with large vocabularies and fast spatial matching, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007)


This project was funded by the Natural Science Foundation of China (grant 61462097).
