Dual robust regression for pattern classification
Introduction
Pattern classification is a core problem in the field of pattern recognition. It has drawn intensive interest and shown wide application value in practical pattern recognition systems. In recent decades, numerous pattern classification methods have been developed. Ferrari-Trecate et al. combined clustering, weighted least squares and a linear classifier to identify discrete-time hybrid systems [1]. Garatti et al. enhanced the classification performance for two kinds of leukaemia by using an unsupervised scheme without any knowledge about the pathology of the patients [2]. A fractional Grey Wolf optimizer-based kernel weighted regression model was proposed for multiview face video super-resolution [3]. Su et al. proposed a robust self-weighted version of the seamless-L0 (SELO) penalty estimator for linear regression, which can converge to the oracle estimator [4]. Huang et al. developed v-soft margin multitask learning logistic regression to solve the large-scale, high-dimensional document classification problem [5]. Based on the extreme learning machine (ELM) [6], the neural-response-based ELM was developed for image classification by using multifeature mapping and elastic-net regularization [7]. Liu et al. applied ELM to the symbolic data classification problem and evaluated some of its key properties, such as generalization ability, time complexity and noise resistance [8]. In particular, regression-based classification (RC) methods have attracted greater attention from researchers. RC methods can reveal the intrinsic subspace structure of the samples in a class under the assumption that the within-class samples lie in a low-dimensional subspace. RC methods can be roughly divided into two categories: reconstruction error-based methods and discriminative methods. Table 1 lists some representative RC methods.
It is necessary to refer back to the nearest neighbor (NN) classification method before reviewing the reconstruction error-based methods. As is well known, NN is one of the most fundamental classification methods. In NN, each sample can be considered a point in a high-dimensional space. Given a test sample, NN computes the distances between the test sample and the training samples, and the classification label is determined by searching for the nearest sample. The nearest feature line (NFL) method was developed based on NN [9]. In the high-dimensional space, NFL builds a feature line between every two samples belonging to the same class and determines the classification label by finding the feature line closest to the test sample. Compared with NFL, the nearest feature plane (NFP) method employs at least three samples of the same class to construct a feature plane in the high-dimensional space; the test sample belongs to the same class as the samples that form the nearest feature plane [10]. The nearest feature space (NFS) method is an extension of NFP, since it accommodates a larger prototype capacity by extending the geometrical concept of a plane to a space. Different from NFP, NFS uses all samples from the same class to build the feature space. In summary, if NN is considered a 1-to-1 metric scheme, then NFL is a 1-to-2 metric scheme, NFP is a 1-to-3 metric scheme and NFS is a 1-to-n metric scheme, where n is the number of samples in each class.
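The 1-to-1 and 1-to-2 metric schemes above can be sketched numerically. The following NumPy snippet (our illustration, not code from the paper) computes the NN distances and the point-to-feature-line distance used by NFL:

```python
import numpy as np

def nn_distance(x, X):
    """1-to-1 metric: distance from test point x to each training sample (rows of X)."""
    return np.linalg.norm(X - x, axis=1)

def nfl_distance(x, x1, x2):
    """1-to-2 metric: distance from x to the feature line through x1 and x2.
    The foot of the perpendicular is x1 + t*(x2 - x1), with t chosen by least squares."""
    d = x2 - x1
    t = np.dot(x - x1, d) / np.dot(d, d)
    proj = x1 + t * d
    return np.linalg.norm(x - proj)
```

NFL evaluates `nfl_distance` over all within-class sample pairs and keeps the minimum; NFP and NFS generalize the projection target from a line to a plane and a subspace, respectively.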
The above methods search for the class label of the test sample by computing the Euclidean distance between the test sample and a geometric structure constructed from the samples of the same class in the high-dimensional space. However, these methods are sensitive to noise. To improve the robustness of the classification model, reconstruction error-based methods were developed. Different from NN-like methods, the linear regression-based classifier (LRC) uses the training samples of each class to represent the test sample and computes the reconstruction error for each class [11]. The class label of the test sample is determined by the minimal reconstruction error. Compared with LRC, the collaborative representation-based classifier (CRC) employs ridge regression to represent the test sample using all of the training samples [12], and uses the coefficients associated with each class to compute the reconstruction error. Wright et al. presented the sparse representation-based classifier (SRC) inspired by compressed sensing [13]. SRC assumes that the test sample can be sparsely represented by a few similar samples; the l1 norm is used to constrain the representation coefficients. Xu et al. introduced a two-phase sparse representation method to further improve the accuracy and efficiency of SRC [20]. Shang et al. combined the advantages of sparse representation and local similarity preservation to propose an unsupervised feature selection method for the clustering task [21]. In reference [22], the authors concluded that the l1 norm is better than the l2 norm in the case of a large-scale dictionary. LRC, CRC and SRC apply the l2 norm to characterize the error term. However, some works have demonstrated that the l1 norm of the error term achieves better performance. Yang et al. concluded that the l2 norm is the best choice when the errors follow a Gaussian distribution, while the l1 norm is better when the errors follow a Laplacian distribution [23].
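As a concrete illustration of the reconstruction error-based idea, a minimal LRC-style classifier can be sketched as follows (our simplification of the scheme in [11]; the per-class least-squares fit and the l2 reconstruction error are the essential ingredients):

```python
import numpy as np

def lrc_predict(x, class_dicts):
    """LRC sketch: class_dicts maps label -> matrix X_c (d x n_c) whose
    columns are the training samples of that class. Each class represents
    x by least squares; the label with the smallest reconstruction error wins."""
    best_label, best_err = None, np.inf
    for label, Xc in class_dicts.items():
        beta, *_ = np.linalg.lstsq(Xc, x, rcond=None)   # per-class coefficients
        err = np.linalg.norm(x - Xc @ beta)             # l2 reconstruction error
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

CRC would instead solve one ridge-regularized system over the concatenated dictionary of all classes, and SRC would replace the least-squares fit with an l1-constrained sparse coding step.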
In practical applications, the error does not necessarily fit a Gaussian or a Laplacian distribution. To overcome this problem, robust sparse representation-based classification methods have been developed [14], [15], [24]. The common feature of these methods is that they impose weights on image pixels to adjust the distribution of the error image. Yang et al. took advantage of the low-rank property and presented a nuclear norm-based matrix regression model to make full use of the 2D structure of the image. The authors noted that the nuclear norm is equivalent to the l1 norm of the singular values, which can be considered a second-order sparsity [16]. Based on this work, Xie et al. proposed a bi-weighted matrix regression model to handle facial images with structural noises [25]. Qian et al. introduced a low-rank regularized error term into ridge regression by observing that the error images are approximately low-rank when handling facial images with contiguous occlusion [26], [27]. Subsequently, Michael et al. chose a heavy-tailed loss function to describe the error term in conjunction with low-rank regularization for robust facial image identification [28]. In addition, Xu et al. investigated the role of nonnegative representation in pattern classification and found that nonnegative representation can simultaneously enhance the representation power of homogeneous images and limit that of heterogeneous images [29]. There are also some works aiming to improve the robustness of the model [30], [31], [32], [33].
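The equivalence between the nuclear norm and the l1 norm of the singular values can be checked numerically, and its proximal operator, singular value thresholding, is the basic building block of the nuclear norm-based regression solvers cited above (the `svt` helper here is our illustrative sketch, not the authors' solver):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm.
    Shrinking each singular value by tau encourages low rank in the same way
    that soft-thresholding of entries encourages element-wise sparsity."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# the nuclear norm equals the l1 norm of the singular values
A = np.diag([3.0, 1.0])
assert np.isclose(np.linalg.norm(A, 'nuc'),
                  np.linalg.svd(A, compute_uv=False).sum())
```

Applying `svt` with a threshold between the two singular values collapses the smaller one to zero, which is exactly the "second-order sparsity" effect described above.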
Discriminative methods mainly learn a projection matrix to transform the samples into a new representation space for classification. Least squares regression (LSR) is a basic regression-based discriminative technique and has been widely used in pattern classification. Naturally, partial least squares regression and weighted least squares regression were developed to improve the performance of conventional LSR. Xiang et al. presented a discriminative LSR (DLSR) for multiclass classification that enlarges the distances between different classes [17]. Compared with LSR and DLSR, retargeted LSR learns the regression targets directly from the data and guarantees that each sample is correctly classified with a large margin [34]. To further enhance the discriminative power of the projection matrix, Xiang et al. presented a low-rank linear regression model and learned a low-rank projection matrix for classification [18]. In reference [35], the authors show that low-rank linear regression is equivalent to performing linear regression in the linear discriminant analysis subspace. Zhang et al. developed a compact and discriminative framework for classification by employing elastic-net regularization to explore the intrinsic structure of different classes [19]. Fang et al. presented a regularized label relaxation linear regression model for classification, which has more freedom to fit the labels and enlarges the margins between different classes [36]. In [37], the authors proposed a novel discriminative regression framework to solve the multiview feature learning problem. These methods project samples onto a subspace directly but fail to consider the outliers present in realistic datasets owing to occlusions and random noises. As is well known, robust PCA (RPCA) and its extensions model the data as the sum of clean data (a low-rank matrix) and outliers (a sparse matrix). Inspired by RPCA, Huang et al. proposed the low-rank-based robust regression model (LR-RR), which combines RPCA and robust regression into a unified framework [38]. LR-RR removes the noises in the training samples and learns the projection matrix iteratively. In the testing stage, LR-RR uses RPCA to clean the test samples and then employs the learned projection matrix to obtain the class label.
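The RPCA decomposition that LR-RR relies on can be sketched with a standard ADMM loop (a generic textbook sketch, not the solver of [38]; `lam`, `mu` and the iteration count are illustrative choices): alternate singular value thresholding for the low-rank part and entrywise soft-thresholding for the sparse part.

```python
import numpy as np

def soft(X, tau):
    """Entrywise soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(D, lam=None, mu=1.0, n_iter=100):
    """Minimal robust-PCA sketch: split D into a low-rank part L and a
    sparse part S so that D ~ L + S, via ADMM with a fixed penalty mu."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # standard RPCA weight
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                    # Lagrange multiplier
    for _ in range(n_iter):
        # L-step: singular value thresholding
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft-thresholding
        S = soft(D - L + Y / mu, lam / mu)
        # dual update
        Y = Y + mu * (D - L - S)
    return L, S
```

In LR-RR this decomposition is applied to a whole batch of test samples at once, which is exactly the requirement the proposed DualRR framework removes.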
The problem with LR-RR is that it requires a batch of test samples to be processed at the same time; in other words, LR-RR exploits the intrinsic relationships between the test samples, which is difficult to implement in real-world applications. Moreover, LR-RR cannot guarantee robustness against noises when RPCA is not used in the testing stage. To solve this problem, we propose a dual robust regression model for classification (DualRR), which combines the advantages of reconstruction error-based methods and regression-based discriminative methods. In the training stage, the double low-rank robust regression model (DLR) is proposed to clean the data and learn a projection matrix with more discriminative power. In the testing stage, the robust regression representation model is used to reduce the influence of noises in the test samples. An overview of the proposed framework is shown in Fig. 1.
The main contributions of our model are summarized as follows:
- We propose a novel framework called dual robust regression (DualRR) for pattern classification. Unlike previous regression-based methods, DualRR employs the discriminative model in the training stage to clean the data and learn the projection matrix. In the testing stage, the robust regression representation model is used to reconstruct the test sample and decrease noises, so that the robustness of the proposed model in dealing with various complicated cases is enhanced.
- The double low-rank-based robust regression model (DLR) is proposed to obtain a projection matrix with more discriminative power. We use the augmented Lagrange multipliers algorithm to solve the model and establish the convergence of the optimization algorithm. Compared with LR-RR, DLR assumes that the projection matrix is a low-rank matrix; in this way, it can discover the low-rank structure and reduce redundant information.
- The regression-based method is employed to reconstruct the test sample, and the learned projection matrix is then used to obtain the class label. Our scheme does not require a batch of test samples to be processed together as in LR-RR.
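To make the testing-stage idea concrete, here is a hypothetical sketch under our own assumptions (ridge regression stands in for the robust regression representation model, and a nearest-projected-class-mean rule stands in for the final decision; neither is claimed to match the paper's exact formulation):

```python
import numpy as np

def dualrr_test(x, X_train, y_train, W, lam=0.1):
    """Hypothetical DualRR testing-stage sketch.
    Step 1: suppress noise by reconstructing the test sample x as a
    ridge-regression combination of the training samples (columns of X_train).
    Step 2: project the reconstruction with the learned matrix W and assign
    the label of the nearest projected class mean."""
    # step 1: ridge reconstruction x_hat = X_train @ alpha
    G = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    alpha = np.linalg.solve(G, X_train.T @ x)
    x_hat = X_train @ alpha
    # step 2: classify in the projected space
    z = W @ x_hat
    labels = np.unique(y_train)
    means = {c: (W @ X_train[:, y_train == c]).mean(axis=1) for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(z - means[c]))
```

The key point the sketch captures is that each test sample is handled on its own: no batch of test samples and no second RPCA pass are needed at test time.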
The remainder of this paper is organized as follows. Section 2 describes the proposed dual robust regression for classification. Section 3 provides the convergence and complexity analysis of the proposed optimization algorithm. Section 4 presents comparisons with related technologies. Section 5 presents experiments on publicly available databases, and Section 6 concludes the paper.
Dual robust regression-based classification
In this section, we first review the low-rank robust regression (LR-RR) for classification with a supervised learning scheme. The proposed dual robust regression-based classification framework is then described, including the double low-rank robust regression model for the training stage and robust regression representation model for the testing stage.
Comparison with related works
In this section, we compare the proposed model with related techniques and different types of classification methods as follows:
DualRR vs. LR-RR. In real-world applications, the classical discriminative methods encounter difficulty in achieving improved performance when the test images are corrupted by noises. To solve this problem, LR-RR borrows the idea of robust PCA to remove the noises of testing samples and to then compute the decision results in conjunction with the projection matrix.
Experiments
In this section, we evaluate the performance of the proposed model on the LFW (facial image), FRGC (facial image), CUHK (face sketch), PolyU Palm, NUST-RF (facial image) and Caltech 101 databases. The details of the databases are listed in Table 3. For the different tasks, we mainly compare our model with state-of-the-art regression-based classification methods, including the linear regression-based classifier (LRC), the collaborative representation-based classifier (CRC) and the sparse representation-based classifier (SRC), among others.
Conclusions
This paper proposes a novel dual robust regression model for pattern classification. Our model employs double low-rank regularization to clean the training data and learn the projection mapping. The robust regression representation model is used to obtain a reconstructed sample that approximates the test sample and reduces the noise in the testing stage. However, DualRR incurs a greater computational load when dealing with large-scale data, because it must iteratively compute the singular value decomposition.
CRediT authorship contribution statement
Jianjun Qian: Writing - original draft, Conceptualization, Methodology, Software. Shumin Zhu: Software, Visualization, Investigation. Wai Keung Wong: Supervision, Writing - review & editing. Hengmin Zhang: Formal analysis. Zhihui Lai: Validation. Jian Yang: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Science Fund of China under Grant Nos. 61876083, U1713208 and 61906067, in part by the research Grant of Hong Kong Scholars Program and The Hong Kong Polytechnic University (Project Code: YZ2K).
References (50)
- et al., A clustering technique for the identification of piecewise affine systems, Automatica (2003)
- et al., Beyond sparsity: The role of l1-optimizer in pattern classification, Pattern Recognition (2012)
- et al., Bi-weighted robust matrix regression for face recognition, Neurocomputing (2017)
- et al., Robust nuclear norm regularized regression for face recognition with occlusion, Pattern Recognition (2015)
- et al., Sparse, collaborative, or nonnegative representation: Which helps pattern classification?, Pattern Recognition (2019)
- et al., Weighted sparse coding regularized nonconvex matrix regression for robust face recognition, Information Sciences (2017)
- et al., Online Schatten quasi-norm minimization for robust principal component analysis, Information Sciences (2019)
- et al., Adaptive-weighting discriminative regression for multi-view classification, Pattern Recognition (2019)
- et al., Aspect based fine-grained sentiment analysis for online reviews, Information Sciences (2019)
- et al., An unsupervised clustering approach for leukaemia classification based on DNA micro-arrays data, Intelligent Data Analysis (2007)
- A robust self-weighted SELO regression model, International Journal of Machine Learning & Cybernetics
- v-soft margin multi-task learning logistic regression, International Journal of Machine Learning and Cybernetics
- Neural-response-based extreme learning machine for image classification, IEEE Transactions on Neural Networks and Learning Systems
- An experimental study on symbolic extreme learning machine, International Journal of Machine Learning and Cybernetics
- Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks
- Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Linear regression for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Maximum correntropy criterion for robust face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Discriminative least squares regression for multiclass classification and feature selection, IEEE Transactions on Neural Networks and Learning Systems