
Information Sciences

Volume 546, 6 February 2021, Pages 1014-1029

Dual robust regression for pattern classification

https://doi.org/10.1016/j.ins.2020.09.062

Abstract

Linear regression-based methods and their extensions have been widely used in pattern classification. These methods can be roughly divided into two categories: reconstruction error-based methods and discriminative methods. Reconstruction error-based methods determine the target label by learning representation coefficients and selecting the class with the minimal reconstruction error. Discriminative methods learn a projection matrix to predict the target label of an image. To combine the advantages of these two kinds of regression-based methods, this paper presents a dual robust regression framework (DualRR) for pattern classification. In the training stage, a double low-rank robust regression model (DLR) is proposed to learn the projection matrix. In DLR, motivated by low-rank robust regression, the data are modeled as the sum of a low-rank clean data matrix and a sparse noise matrix. A low-rank constraint is further imposed on the projection matrix to enhance its discriminative power. In the testing stage, the proposed framework employs a robust regression representation model to learn the optimal representation coefficients and obtain a reconstruction sample that approximates the test sample. The reconstruction sample is then classified using the projection matrix learned in the training stage. Extensive experiments conducted on six publicly available databases, namely, LFW, FRGC, CUHK Sketch, PolyU Palm, NUST-RF and Caltech 101, demonstrate the merits of the proposed model over state-of-the-art regression-based classification methods.

Introduction

Pattern classification is a core problem in the field of pattern recognition. It has drawn intensive interest and shown wide application value in practical pattern recognition systems. In recent decades, numerous pattern classification methods have been developed. Ferrari-Trecate et al. combined clustering, weighted least squares and a linear classifier to solve the identification of discrete-time hybrid systems [1]. Garatti et al. enhanced the classification performance for two kinds of leukaemia by using an unsupervised scheme without any knowledge about the pathology of patients [2]. A fractional grey wolf optimizer-based kernel weighted regression model was proposed for multiview face video super-resolution [3]. Su et al. proposed a robust self-weighted version of the seamless-L0 penalty estimator for linear regression, which can converge to the oracle estimator [4]. Huang et al. developed v-soft margin multitask learning logistic regression to solve the large-scale, high-dimensional document classification problem [5]. Based on the extreme learning machine (ELM) [6], a neural-response-based ELM was developed for image classification by using multifeature mapping and elastic-net regularization [7]. Liu et al. applied ELM to the symbolic data classification problem and evaluated some of its key properties, such as generalization ability, time complexity and noise resistance [8]. In particular, regression-based classification (RC) methods have attracted considerable attention from researchers. RC methods can reveal the intrinsic subspace structure of samples in a class under the assumption that the within-class samples lie in a low-dimensional subspace. RC methods can be roughly divided into two categories: reconstruction error-based methods and discriminative methods. Table 1 lists some representative RC methods.

It is necessary to refer back to the nearest neighbor (NN) classification method before reviewing the reconstruction error-based methods. As is well known, NN is one of the most fundamental classification methods. In NN, each sample is considered a point in a high-dimensional space. Given a test sample, NN computes the distances between the test sample and the training samples, and the classification label is determined by the nearest sample. The nearest feature line (NFL) method was developed based on NN [9]. In the high-dimensional space, NFL constructs a feature line between each pair of samples belonging to the same class and determines the classification label by finding the feature line closest to the test sample. Compared with NFL, the nearest feature plane (NFP) method employs at least three samples of the same class to construct feature planes in the high-dimensional space; the test sample is assigned to the class of the samples that form the nearest feature plane [10]. The nearest feature space (NFS) method extends NFP by generalizing the geometric concept of a plane to a space, thereby enlarging the prototype capacity. Different from NFP, NFS uses all samples of the same class to build the feature space. In summary, if NN is considered a 1-to-1 metric scheme, then NFL is a 1-to-2 metric scheme, NFP is a 1-to-3 metric scheme and NFS is a 1-to-n metric scheme, where n is the number of samples in each class.
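
To make the metric schemes concrete, the following NumPy sketch contrasts the 1-to-1 (NN) and 1-to-2 (NFL) distances. The function names are ours, not from the cited papers; the NFL distance is simply the point-to-line distance obtained by projecting the test sample onto the line through two same-class samples.

```python
import numpy as np

def nn_distances(x, X):
    """1-to-1 scheme: Euclidean distance from test sample x to every
    training sample (the rows of X)."""
    return np.linalg.norm(X - x, axis=1)

def nfl_distance(x, x1, x2):
    """1-to-2 scheme: distance from x to the feature line passing through
    the same-class samples x1 and x2."""
    d = x2 - x1
    t = np.dot(x - x1, d) / np.dot(d, d)  # foot of the perpendicular on the line
    foot = x1 + t * d
    return np.linalg.norm(x - foot)
```

NFP and NFS generalize the same projection idea to the affine span of three or of all same-class samples, respectively.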

The above methods determine the class label of a test sample by computing the Euclidean distance between the test sample and geometric structures constructed from the samples of each class in the high-dimensional space. However, these methods are sensitive to noise. To improve the robustness of the classification model, reconstruction error-based methods were developed. Different from NN-like methods, the linear regression-based classifier (LRC) uses the training samples of each class to represent the test sample and computes the reconstruction error for each class [11]. The class label of the test sample is determined by the minimal reconstruction error. Compared with LRC, the collaborative representation-based classifier (CRC) employs ridge regression to represent the test sample using all of the training samples [12] and uses the coefficients associated with each class to compute the class-wise reconstruction errors. Wright et al. presented the sparse representation-based classifier (SRC) inspired by compressed sensing [13]. SRC assumes that the test sample can be sparsely represented by a few similar samples and uses the L1 norm to constrain the representation coefficients. Xu et al. introduced a two-phase sparse representation method to further improve the accuracy and efficiency of SRC [20]. Shang et al. combined the advantages of sparse representation and local similarity preservation to propose an unsupervised feature selection method for the clustering task [21]. In reference [22], the authors concluded that the L1 norm is better than the L2 norm in the case of a large-scale dictionary. LRC, CRC and SRC apply the L2 norm to characterize the error term; however, some works have demonstrated that an L1-norm error term achieves better performance. Yang et al. concluded that the L2 norm is the best choice when the errors follow a Gaussian distribution, while the L1 norm is better when the errors follow a Laplacian distribution [23]. In practical applications, the errors often follow neither a Gaussian nor a Laplacian distribution. To overcome this problem, robust sparse representation-based classification methods have been developed [14], [15], [24]; their common feature is that they impose weights on image pixels to adjust the distribution of the error image. Yang et al. took advantage of the low-rank property and presented a nuclear norm-based matrix regression model to make full use of the 2D structure of images; the authors noted that the nuclear norm is the L1 norm of the singular values, which can be considered second-order sparsity [16]. Based on this work, Xie et al. proposed a bi-weighted matrix regression model to handle facial images with structural noise [25]. Qian et al. introduced a low-rank regularized error term into ridge regression by observing that the error images are approximately low-rank when handling facial images with contiguous occlusion [26], [27]. Subsequently, Michael et al. chose a tailed loss function to describe the error term in conjunction with low-rank regularization for robust facial image identification [28]. In addition, Xu et al. investigated the role of nonnegative representation in pattern classification and found that nonnegative representation can simultaneously enhance the representation power of homogeneous images and suppress that of heterogeneous images [29]. Several other works also aim to improve the robustness of the model [30], [31], [32], [33].
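
As an illustration of the reconstruction error-based decision rule, the following sketch implements the per-class least-squares representation of LRC and the ridge-regression variant of CRC. The variable names and the regularization default are our own choices; only the decision rule (minimal class-wise residual) follows [11], [12].

```python
import numpy as np

def lrc_predict(y, class_dicts):
    """LRC: represent the test sample y with each class's training samples
    (columns of Xc) and pick the class with the smallest residual."""
    best_label, best_err = None, np.inf
    for label, Xc in class_dicts.items():
        beta = np.linalg.lstsq(Xc, y, rcond=None)[0]   # per-class least squares
        err = np.linalg.norm(y - Xc @ beta)            # class-wise reconstruction error
        if err < best_err:
            best_label, best_err = label, err
    return best_label

def crc_predict(y, X, labels, lam=1e-3):
    """CRC: one ridge regression over the whole dictionary X (d x n), then
    class-wise residuals from the coefficients of each class; labels is a
    length-n array giving the class of each column of X."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    classes = np.unique(labels)
    errs = [np.linalg.norm(y - X[:, labels == c] @ beta[labels == c]) for c in classes]
    return classes[int(np.argmin(errs))]
```

SRC follows the same decision rule but replaces the ridge penalty with an L1 penalty on the coefficients.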

Discriminative methods mainly learn a projection matrix to transform the samples into a new representation space for classification. Least squares regression (LSR) is a basic regression-based discriminative technique and has been widely used in pattern classification. Naturally, partial least squares regression and weighted least squares regression were developed to improve the performance of conventional LSR. Xiang et al. presented a discriminative LSR (DLSR) for multiclass classification that enlarges the distances between different classes [17]. Compared with LSR and DLSR, retargeted LSR learns the regression targets directly from the data and guarantees that each sample is correctly classified with a large margin [34]. To further enhance the discriminative power of the projection matrix, Xiang et al. presented a low-rank linear regression model that learns a low-rank projection matrix for classification [18]. In reference [35], the authors showed that low-rank linear regression is equivalent to performing linear regression in the linear discriminant analysis subspace. Zhang et al. developed a compact and discriminative framework for classification by employing elastic-net regularization to explore the intrinsic structure of different classes [19]. Fang et al. presented a regularized label relaxation linear regression model for classification, which has more freedom to fit the labels and enlarges the margins between different classes [36]. In [37], the authors proposed a novel discriminative regression framework to solve the multiview feature learning problem. These methods project samples onto a subspace directly but fail to consider the outliers that arise in realistic datasets owing to occlusions and random noise. As is known, robust PCA (RPCA) and its extensions model the data as the sum of clean data (a low-rank matrix) and outliers (a sparse matrix). Inspired by RPCA, Huang et al. proposed the low-rank-based robust regression model (LR-RR) by combining RPCA and robust regression into a unified framework [38]. LR-RR can remove the noise in training samples and learn the projection matrix iteratively. In the testing stage, LR-RR uses RPCA to clean the test samples and then employs the learned projection matrix to obtain the class label.
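
The baseline discriminative scheme can be summarized in a few lines: learn a ridge-regularized projection matrix that maps samples to one-hot class indicators, then classify a test sample by its largest projected response. The sketch below assumes column-sample matrices and a hypothetical regularization value; it illustrates plain LSR, not the low-rank or label-relaxed variants cited above.

```python
import numpy as np

def fit_lsr(X, labels, lam=1e-2):
    """Plain LSR: X is d x n (samples as columns), labels is a length-n
    integer array. Returns W (d x c) mapping a sample to class scores."""
    classes = np.unique(labels)
    Y = (labels[None, :] == classes[:, None]).astype(float)  # one-hot targets (c x n)
    # Ridge-regularized normal equations: (X X^T + lam I) W = X Y^T
    W = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), X @ Y.T)
    return W, classes

def lsr_predict(W, classes, y):
    """Project the test sample and take the largest class response."""
    return classes[int(np.argmax(W.T @ y))]
```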

The problem with LR-RR is that it requires a batch of test samples to be processed at the same time; in other words, LR-RR exploits the intrinsic relationships between the test samples, which is difficult to realize in real-world applications. Moreover, LR-RR cannot guarantee robustness against noise when RPCA is not used in the testing stage. To solve these problems, we propose a dual robust regression model for classification (DualRR) that combines the advantages of reconstruction error-based methods and regression-based discriminative methods. In the training stage, a double low-rank robust regression model (DLR) is proposed to clean the data and learn a projection matrix with more discriminative power. In the testing stage, a robust regression representation model is used to reduce the influence of noise in the test samples. An overview of the proposed framework is shown in Fig. 1.

The main contributions of our model are summarized as follows:

  • We propose a novel framework called dual robust regression for pattern classification. Unlike previous regression-based methods, DualRR employs the discriminative model in the training stage to clean the data and learn the projection matrix. In the testing stage, the robust regression representation model is used to reconstruct the test sample and suppress noise, which enhances the robustness of the proposed model in various complicated cases.

  • The double low-rank-based robust regression model is proposed to obtain a projection matrix with more discriminative power. We use the augmented Lagrange multipliers algorithm to solve the model and establish the convergence of the optimization algorithm. Compared with LR-RR, DLR assumes that the projection matrix is low-rank; in this way, it can discover the low-rank structure and reduce redundant information.

  • The regression-based method is employed to reconstruct the test sample, and the learned projection matrix is then used to obtain the class label (a minimal sketch of this test stage follows this list). Our scheme does not require a batch of test samples to be processed together, as LR-RR does.
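
For concreteness, the following sketch shows one way the test stage could operate: an iteratively reweighted least squares (IRLS) representation stands in for the paper's robust regression representation model, and the reconstructed sample is classified with the learned projection matrix W. The IRLS surrogate, function names and iteration counts are our assumptions; the paper's exact objective and solver are given in Section 2.

```python
import numpy as np

def robust_represent(y, X, n_iter=20, eps=1e-6):
    """IRLS stand-in (our assumption) for the robust regression representation:
    re-weighting the residuals approximates an L1 loss, so noisy pixels of the
    test sample y contribute less to the representation over the dictionary X."""
    s = np.ones(X.shape[0])                       # per-pixel weights (sqrt scale)
    for _ in range(n_iter):
        beta = np.linalg.lstsq(X * s[:, None], s * y, rcond=None)[0]
        r = y - X @ beta                          # current residual image
        s = 1.0 / np.sqrt(np.abs(r) + eps)        # down-weight large residuals
    return X @ beta                               # reconstructed (denoised) sample

def dualrr_predict(y, X, W, classes):
    """Test stage: reconstruct the sample, then classify with the projection W
    learned in the training stage."""
    y_hat = robust_represent(y, X)
    return classes[int(np.argmax(W.T @ y_hat))]
```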

The remainder of this paper is organized as follows. Section 2 describes the proposed dual robust regression for classification. Section 3 provides the convergence and complexity analysis of the proposed optimization algorithm. Section 4 presents comparisons with related techniques. Section 5 presents experiments on publicly available databases, and Section 6 concludes the paper.


Dual robust regression-based classification

In this section, we first review low-rank robust regression (LR-RR) for classification with a supervised learning scheme. The proposed dual robust regression-based classification framework is then described, including the double low-rank robust regression model for the training stage and the robust regression representation model for the testing stage.

Comparison with related works

In this section, we compare the proposed model with related techniques and different types of classification methods as follows:

DualRR vs. LR-RR. In real-world applications, classical discriminative methods struggle to achieve good performance when the test images are corrupted by noise. To solve this problem, LR-RR borrows the idea of robust PCA to remove the noise from the test samples and then computes the decision results in conjunction with the projection matrix.
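
For reference, a minimal inexact-ALM sketch of the RPCA decomposition that LR-RR relies on is given below: it alternates singular value thresholding for the low-rank part with soft thresholding for the sparse part. The fixed penalty mu and the default lam = 1/sqrt(max(m, n)) follow common practice rather than the cited implementation.

```python
import numpy as np

def rpca_ialm(D, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Minimal inexact-ALM robust PCA: decompose D into a low-rank part L and
    a sparse part S subject to D = L + S. Keeping mu fixed is a simplification."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(D, 2)
    Y = np.zeros_like(D)                          # Lagrange multiplier
    S = np.zeros_like(D)
    for _ in range(n_iter):
        # Singular value thresholding updates the low-rank component
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Soft thresholding updates the sparse component
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        R = D - L - S                             # primal residual
        Y = Y + mu * R
        if np.linalg.norm(R) / max(np.linalg.norm(D), 1e-12) < tol:
            break
    return L, S
```

DualRR avoids this batch decomposition at test time by reconstructing each test sample individually, as described above.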

Experiments

In this section, we evaluate the performance of the proposed model on the LFW (facial image), FRGC (facial image), CUHK (face sketch), PolyU Palm, NUST-RF (facial image) and Caltech 101 databases. The details of the databases are listed in Table 3. For different tasks, we mainly compare our model with state-of-the-art regression-based classification methods, including the linear regression-based classifier (LRC), collaborative representation-based classifier (CRC) and sparse representation-based classifier (SRC), among others.

Conclusions

This paper proposes a novel dual robust regression model for pattern classification. Our model employs a double low-rank constraint to clean the training data and learn the projection mapping. The robust regression representation model is used to obtain a reconstruction sample that approximates the test sample and reduces noise in the testing stage. However, DualRR incurs a greater computational load when dealing with large-scale data because it must iteratively compute the singular value decomposition.

CRediT authorship contribution statement

Jianjun Qian: Writing - original draft, Conceptualization, Methodology, Software. Shumin Zhu: Software, Visualization, Investigation. Wai Keung Wong: Supervision, Writing - review & editing. Hengmin Zhang: Formal analysis. Zhihui Lai: Validation. Jian Yang: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Science Fund of China under Grant Nos. 61876083, U1713208 and 61906067, in part by the research Grant of Hong Kong Scholars Program and The Hong Kong Polytechnic University (Project Code: YZ2K).

References (50)

  • A.B. Deshmukh, N. Usha Rani, Fractional-grey wolf optimizer-based kernel weighted regression model for multi-view face video super-resolution
  • M. Su et al., A robust self-weighted SELO regression model, International Journal of Machine Learning and Cybernetics (2019)
  • C. Huang et al., v-soft margin multi-task learning logistic regression, International Journal of Machine Learning and Cybernetics (2019)
  • G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks
  • H. Li et al., Neural-response-based extreme learning machine for image classification, IEEE Transactions on Neural Networks and Learning Systems (2019)
  • J. Liu et al., An experimental study on symbolic extreme learning machine, International Journal of Machine Learning and Cybernetics (2019)
  • S.Z. Li et al., Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks (1999)
  • J.-T. Chien et al., Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • I. Naseem et al., Linear regression for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010)
  • L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: ICCV (2011)
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: CVPR (2011)
  • R. He et al., Maximum correntropy criterion for robust face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2011)
  • J. Yang et al., Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes, IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
  • S. Xiang et al., Discriminative least squares regression for multiclass classification and feature selection, IEEE Transactions on Neural Networks and Learning Systems (2012)