
Information Sciences

Volume 546, 6 February 2021, Pages 1014-1029

Dual robust regression for pattern classification

https://doi.org/10.1016/j.ins.2020.09.062

Abstract

Linear regression-based methods and their extensions have been widely used in pattern classification. These methods can be roughly divided into two categories: reconstruction error-based methods and discriminative methods. Reconstruction error-based methods determine the target label by learning representation coefficients and selecting the class with the minimal reconstruction error. Discriminative methods learn a projection matrix to predict the target label of an image. To combine the advantages of these two kinds of regression-based methods, this paper presents a dual robust regression framework (DualRR) for pattern classification. In the training stage, a double low-rank robust regression model (DLR) is proposed to learn the projection matrix. In DLR, motivated by low-rank robust regression, the data are modeled as the sum of a low-rank clean data matrix and a sparse noise matrix. A low-rank constraint is further imposed on the projection matrix to enhance its discriminative power. In the testing stage, the proposed framework employs a robust regression representation model to learn the optimal representation coefficients and obtain a reconstruction sample that approximates the test sample. The reconstruction sample is then classified using the projection matrix learned in the training stage. Extensive experiments conducted on six publicly available databases, namely, LFW, FRGC, CUHK Sketch, PolyU Palm, NUST-RF and Caltech 101, demonstrate the merits of the proposed model over state-of-the-art regression-based classification methods.

Introduction

Pattern classification is a core problem in the field of pattern recognition. It has drawn intensive interest and shown wide application value in practical pattern recognition systems. In recent decades, numerous pattern classification methods have been developed. Ferrari-Trecate et al. combined clustering, weighted least squares and a linear classifier to solve the identification of discrete-time hybrid systems [1]. Garatti et al. enhanced the classification performance for two kinds of leukaemia by using an unsupervised scheme without any knowledge about the pathology of patients [2]. A fractional grey wolf optimizer-based kernel weighted regression model was proposed for multiview face video super-resolution [3]. Su et al. proposed a robust self-weighted version of the seamless-L0 penalty estimator for linear regression, which can converge to the oracle estimator [4]. Huang et al. developed v-soft margin multitask learning logistic regression to solve the large-scale, high-dimensional document classification problem [5]. Based on the extreme learning machine (ELM) [6], a neural-response-based ELM was developed for image classification by using multifeature mapping and elastic-net regularization [7]. Liu et al. applied ELM to the symbolic data classification problem and evaluated some of its key properties, such as generalization ability, time complexity and noise resistance [8]. In particular, regression-based classification (RC) methods have attracted considerable attention from researchers. RC methods can reveal the intrinsic subspace structure of samples in a class under the assumption that the within-class samples lie in a low-dimensional subspace. RC methods can be roughly divided into two categories: reconstruction error-based methods and discriminative methods. Table 1 lists some representative RC methods.

It is necessary to refer back to the nearest neighbor (NN) classification method before reviewing the reconstruction error-based methods. As is well known, NN is one of the most fundamental classification methods. In NN, each sample is considered a point in a high-dimensional space. Given a test sample, NN computes the distances between the test sample and the training samples, and the classification label is determined by the nearest sample. The nearest feature line (NFL) method was developed based on NN [9]. In the high-dimensional space, NFL constructs a feature line between each pair of samples belonging to the same class and determines the classification label by finding the feature line closest to the test sample. Compared with NFL, the nearest feature plane (NFP) method employs at least three samples of the same class to construct feature planes in the high-dimensional space; the test sample is assigned to the class of the samples that form the nearest feature plane [10]. The nearest feature space (NFS) method extends NFP by generalizing the geometric concept of a plane to a space, thereby enlarging the prototype capacity. Different from NFP, NFS uses all samples of the same class to build the feature space. In summary, if NN is considered a 1-to-1 metric scheme, then NFL is a 1-to-2 metric scheme, NFP is a 1-to-3 metric scheme and NFS is a 1-to-n metric scheme, where n is the number of samples in each class.
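
To make the metric schemes concrete, the following NumPy sketch contrasts the 1-to-1 (NN) and 1-to-2 (NFL) distances. The function names are ours, not from the cited papers; the NFL distance is simply the point-to-line distance obtained by projecting the test sample onto the line through two same-class samples.

```python
import numpy as np

def nn_distances(x, X):
    """1-to-1 scheme: Euclidean distance from test sample x to every
    training sample (the rows of X)."""
    return np.linalg.norm(X - x, axis=1)

def nfl_distance(x, x1, x2):
    """1-to-2 scheme: distance from x to the feature line passing through
    the same-class samples x1 and x2."""
    d = x2 - x1
    t = np.dot(x - x1, d) / np.dot(d, d)  # foot of the perpendicular on the line
    foot = x1 + t * d
    return np.linalg.norm(x - foot)
```

NFP and NFS generalize the same projection idea to the affine span of three or of all same-class samples, respectively.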

The above methods determine the class label of a test sample by computing the Euclidean distance between the test sample and geometric structures constructed from the samples of each class in the high-dimensional space. However, these methods are sensitive to noise. To improve the robustness of the classification model, reconstruction error-based methods were developed. Different from NN-like methods, the linear regression-based classifier (LRC) uses the training samples of each class to represent the test sample and computes the reconstruction error for each class [11]. The class label of the test sample is determined by the minimal reconstruction error. Compared with LRC, the collaborative representation-based classifier (CRC) employs ridge regression to represent the test sample using all of the training samples [12] and uses the coefficients associated with each class to compute the class-wise reconstruction errors. Wright et al. presented the sparse representation-based classifier (SRC) inspired by compressed sensing [13]. SRC assumes that the test sample can be sparsely represented by a few similar samples and uses the L1 norm to constrain the representation coefficients. Xu et al. introduced a two-phase sparse representation method to further improve the accuracy and efficiency of SRC [20]. Shang et al. combined the advantages of sparse representation and local similarity preservation to propose an unsupervised feature selection method for the clustering task [21]. In reference [22], the authors concluded that the L1 norm is better than the L2 norm in the case of a large-scale dictionary. LRC, CRC and SRC apply the L2 norm to characterize the error term; however, some works have demonstrated that an L1-norm error term achieves better performance. Yang et al. concluded that the L2 norm is the best choice when the errors follow a Gaussian distribution, while the L1 norm is better when the errors follow a Laplacian distribution [23]. In practical applications, the errors often follow neither a Gaussian nor a Laplacian distribution. To overcome this problem, robust sparse representation-based classification methods have been developed [14], [15], [24]; their common feature is that they impose weights on image pixels to adjust the distribution of the error image. Yang et al. took advantage of the low-rank property and presented a nuclear norm-based matrix regression model to make full use of the 2D structure of images; the authors noted that the nuclear norm is the L1 norm of the singular values, which can be considered second-order sparsity [16]. Based on this work, Xie et al. proposed a bi-weighted matrix regression model to handle facial images with structural noise [25]. Qian et al. introduced a low-rank regularized error term into ridge regression by observing that the error images are approximately low-rank when handling facial images with contiguous occlusion [26], [27]. Subsequently, Michael et al. chose a tailed loss function to describe the error term in conjunction with low-rank regularization for robust facial image identification [28]. In addition, Xu et al. investigated the role of nonnegative representation in pattern classification and found that nonnegative representation can simultaneously enhance the representation power of homogeneous images and suppress that of heterogeneous images [29]. Several other works also aim to improve the robustness of the model [30], [31], [32], [33].
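
As an illustration of the reconstruction error-based decision rule, the following sketch implements the per-class least-squares representation of LRC and the ridge-regression variant of CRC. The variable names and the regularization default are our own choices; only the decision rule (minimal class-wise residual) follows [11], [12].

```python
import numpy as np

def lrc_predict(y, class_dicts):
    """LRC: represent the test sample y with each class's training samples
    (columns of Xc) and pick the class with the smallest residual."""
    best_label, best_err = None, np.inf
    for label, Xc in class_dicts.items():
        beta = np.linalg.lstsq(Xc, y, rcond=None)[0]   # per-class least squares
        err = np.linalg.norm(y - Xc @ beta)            # class-wise reconstruction error
        if err < best_err:
            best_label, best_err = label, err
    return best_label

def crc_predict(y, X, labels, lam=1e-3):
    """CRC: one ridge regression over the whole dictionary X (d x n), then
    class-wise residuals from the coefficients of each class; labels is a
    length-n array giving the class of each column of X."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    classes = np.unique(labels)
    errs = [np.linalg.norm(y - X[:, labels == c] @ beta[labels == c]) for c in classes]
    return classes[int(np.argmin(errs))]
```

SRC follows the same decision rule but replaces the ridge penalty with an L1 penalty on the coefficients.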

Discriminative methods mainly learn a projection matrix to transform the samples into a new representation space for classification. Least squares regression (LSR) is a basic regression-based discriminative technique and has been widely used in pattern classification. Naturally, partial least squares regression and weighted least squares regression were developed to improve the performance of conventional LSR. Xiang et al. presented a discriminative LSR (DLSR) for multiclass classification that enlarges the distances between different classes [17]. Compared with LSR and DLSR, retargeted LSR learns the regression targets directly from the data and guarantees that each sample is correctly classified with a large margin [34]. To further enhance the discriminative power of the projection matrix, Xiang et al. presented a low-rank linear regression model that learns a low-rank projection matrix for classification [18]. In reference [35], the authors showed that low-rank linear regression is equivalent to performing linear regression in the linear discriminant analysis subspace. Zhang et al. developed a compact and discriminative framework for classification by employing elastic-net regularization to explore the intrinsic structure of different classes [19]. Fang et al. presented a regularized label relaxation linear regression model for classification, which has more freedom to fit the labels and enlarges the margins between different classes [36]. In [37], the authors proposed a novel discriminative regression framework to solve the multiview feature learning problem. These methods project samples onto a subspace directly but fail to consider the outliers that arise in realistic datasets owing to occlusions and random noise. As is known, robust PCA (RPCA) and its extensions model the data as the sum of clean data (a low-rank matrix) and outliers (a sparse matrix). Inspired by RPCA, Huang et al. proposed the low-rank-based robust regression model (LR-RR) by combining RPCA and robust regression into a unified framework [38]. LR-RR can remove the noise in training samples and learn the projection matrix iteratively. In the testing stage, LR-RR uses RPCA to clean the test samples and then employs the learned projection matrix to obtain the class label.
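
The baseline discriminative scheme can be summarized in a few lines: learn a ridge-regularized projection matrix that maps samples to one-hot class indicators, then classify a test sample by its largest projected response. The sketch below assumes column-sample matrices and a hypothetical regularization value; it illustrates plain LSR, not the low-rank or label-relaxed variants cited above.

```python
import numpy as np

def fit_lsr(X, labels, lam=1e-2):
    """Plain LSR: X is d x n (samples as columns), labels is a length-n
    integer array. Returns W (d x c) mapping a sample to class scores."""
    classes = np.unique(labels)
    Y = (labels[None, :] == classes[:, None]).astype(float)  # one-hot targets (c x n)
    # Ridge-regularized normal equations: (X X^T + lam I) W = X Y^T
    W = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), X @ Y.T)
    return W, classes

def lsr_predict(W, classes, y):
    """Project the test sample and take the largest class response."""
    return classes[int(np.argmax(W.T @ y))]
```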

The problem with LR-RR is that it requires a batch of test samples to be processed at the same time; in other words, LR-RR exploits the intrinsic relationships between the test samples, which is difficult to realize in real-world applications. Moreover, LR-RR cannot guarantee robustness against noise when RPCA is not used in the testing stage. To solve these problems, we propose a dual robust regression model for classification (DualRR) that combines the advantages of reconstruction error-based methods and regression-based discriminative methods. In the training stage, a double low-rank robust regression model (DLR) is proposed to clean the data and learn a projection matrix with more discriminative power. In the testing stage, a robust regression representation model is used to reduce the influence of noise in the test samples. An overview of the proposed framework is shown in Fig. 1.

The main contributions of our model are summarized as follows:

  • We propose a novel framework called dual robust regression for pattern classification. Unlike previous regression-based methods, DualRR employs the discriminative model in the training stage to clean the data and learn the projection matrix. In the testing stage, the robust regression representation model is used to reconstruct the test sample and suppress noise, which enhances the robustness of the proposed model in various complicated cases.

  • The double low-rank-based robust regression model is proposed to obtain a projection matrix with more discriminative power. We use the augmented Lagrange multipliers algorithm to solve the model and establish the convergence of the optimization algorithm. Compared with LR-RR, DLR assumes that the projection matrix is low-rank; in this way, it can discover the low-rank structure and reduce redundant information.

  • The regression-based method is employed to reconstruct the test sample, and the learned projection matrix is then used to obtain the class label (a minimal sketch of this test stage follows this list). Our scheme does not require a batch of test samples to be processed together, as LR-RR does.
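
For concreteness, the following sketch shows one way the test stage could operate: an iteratively reweighted least squares (IRLS) representation stands in for the paper's robust regression representation model, and the reconstructed sample is classified with the learned projection matrix W. The IRLS surrogate, function names and iteration counts are our assumptions; the paper's exact objective and solver are given in Section 2.

```python
import numpy as np

def robust_represent(y, X, n_iter=20, eps=1e-6):
    """IRLS stand-in (our assumption) for the robust regression representation:
    re-weighting the residuals approximates an L1 loss, so noisy pixels of the
    test sample y contribute less to the representation over the dictionary X."""
    s = np.ones(X.shape[0])                       # per-pixel weights (sqrt scale)
    for _ in range(n_iter):
        beta = np.linalg.lstsq(X * s[:, None], s * y, rcond=None)[0]
        r = y - X @ beta                          # current residual image
        s = 1.0 / np.sqrt(np.abs(r) + eps)        # down-weight large residuals
    return X @ beta                               # reconstructed (denoised) sample

def dualrr_predict(y, X, W, classes):
    """Test stage: reconstruct the sample, then classify with the projection W
    learned in the training stage."""
    y_hat = robust_represent(y, X)
    return classes[int(np.argmax(W.T @ y_hat))]
```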

The remainder of this paper is organized as follows. Section 2 describes the proposed dual robust regression for classification. Section 3 provides the convergence and complexity analysis of the proposed optimization algorithm. Section 4 presents comparisons with related techniques. Section 5 presents experiments on publicly available databases, and Section 6 concludes the paper.


Dual robust regression-based classification

In this section, we first review low-rank robust regression (LR-RR) for classification with a supervised learning scheme. The proposed dual robust regression-based classification framework is then described, including the double low-rank robust regression model for the training stage and the robust regression representation model for the testing stage.

Comparison with related works

In this section, we compare the proposed model with related techniques and different types of classification methods as follows:

DualRR vs. LR-RR. In real-world applications, classical discriminative methods struggle to achieve good performance when the test images are corrupted by noise. To solve this problem, LR-RR borrows the idea of robust PCA to remove the noise from the test samples and then computes the decision results in conjunction with the projection matrix.
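
For reference, a minimal inexact-ALM sketch of the RPCA decomposition that LR-RR relies on is given below: it alternates singular value thresholding for the low-rank part with soft thresholding for the sparse part. The fixed penalty mu and the default lam = 1/sqrt(max(m, n)) follow common practice rather than the cited implementation.

```python
import numpy as np

def rpca_ialm(D, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Minimal inexact-ALM robust PCA: decompose D into a low-rank part L and
    a sparse part S subject to D = L + S. Keeping mu fixed is a simplification."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(D, 2)
    Y = np.zeros_like(D)                          # Lagrange multiplier
    S = np.zeros_like(D)
    for _ in range(n_iter):
        # Singular value thresholding updates the low-rank component
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Soft thresholding updates the sparse component
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        R = D - L - S                             # primal residual
        Y = Y + mu * R
        if np.linalg.norm(R) / max(np.linalg.norm(D), 1e-12) < tol:
            break
    return L, S
```

DualRR avoids this batch decomposition at test time by reconstructing each test sample individually, as described above.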

Experiments

In this section, we evaluate the performance of the proposed model on the LFW (facial image), FRGC (facial image), CUHK (face sketch), PolyU Palm, NUST-RF (facial image) and Caltech 101 databases. The details of the databases are listed in Table 3. For different tasks, we mainly compare our model with state-of-the-art regression-based classification methods, including the linear regression-based classifier (LRC), collaborative representation-based classifier (CRC) and sparse representation-based classifier (SRC), among others.

Conclusions

This paper proposes a novel dual robust regression model for pattern classification. Our model employs a double low-rank constraint to clean the training data and learn the projection mapping. The robust regression representation model is used to obtain a reconstruction sample that approximates the test sample and reduces noise in the testing stage. However, DualRR incurs a greater computational load when dealing with large-scale data because it must iteratively compute the singular value decomposition.

CRediT authorship contribution statement

Jianjun Qian: Writing - original draft, Conceptualization, Methodology, Software. Shumin Zhu: Software, Visualization, Investigation. Wai Keung Wong: Supervision, Writing - review & editing. Hengmin Zhang: Formal analysis. Zhihui Lai: Validation. Jian Yang: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Science Fund of China under Grant Nos. 61876083, U1713208 and 61906067, in part by the research Grant of Hong Kong Scholars Program and The Hong Kong Polytechnic University (Project Code: YZ2K).

References (50)

  • A.B. Deshmukh, N. Usha Rani, Fractional-grey wolf optimizer-based kernel weighted regression model for multi-view face video super-resolution
  • M. Su et al., A robust self-weighted SELO regression model, International Journal of Machine Learning and Cybernetics (2019)
  • C. Huang et al., v-soft margin multi-task learning logistic regression, International Journal of Machine Learning and Cybernetics (2019)
  • G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks
  • H. Li et al., Neural-response-based extreme learning machine for image classification, IEEE Transactions on Neural Networks and Learning Systems (2019)
  • J. Liu et al., An experimental study on symbolic extreme learning machine, International Journal of Machine Learning and Cybernetics (2019)
  • S.Z. Li et al., Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks (1999)
  • J.-T. Chien et al., Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • I. Naseem et al., Linear regression for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010)
  • L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: ICCV (2011)
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: CVPR (2011)
  • R. He et al., Maximum correntropy criterion for robust face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2011)
  • J. Yang et al., Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes, IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
  • S. Xiang et al., Discriminative least squares regression for multiclass classification and feature selection, IEEE Transactions on Neural Networks and Learning Systems (2012)