
Pattern Recognition

Volume 123, March 2022, 108422

Neighborhood linear discriminant analysis

https://doi.org/10.1016/j.patcog.2021.108422

Highlights

  • The neighborhood linear discriminant analysis (nLDA) is proposed to address multimodality in LDA.

  • In nLDA, the scatters are defined on a neighborhood consisting of reverse nearest neighbors.

  • The within- and between-neighborhood scatters avoid estimating the subclasses in a multimodal class.

  • The nLDA performs significantly better than some existing discriminators, such as LDA, LFDA, ccLDA, LM-NNDA and l2,1-RLDA.

Abstract

Linear Discriminant Analysis (LDA) assumes that all samples from the same class are independently and identically distributed (i.i.d.). LDA may fail when this assumption does not hold. In particular, when a class contains several clusters (or subclasses), LDA cannot correctly depict the internal structure, because the scatter matrices that LDA relies on are defined at the class level. To mitigate this problem, this paper proposes a neighborhood linear discriminant analysis (nLDA) in which the scatter matrices are defined on a neighborhood consisting of reverse nearest neighbors. Thus, the new discriminator does not need the i.i.d. assumption. In addition, the neighborhood can be naturally regarded as the smallest subclass, and it is easier to obtain than a subclass because it does not require any clustering algorithm. The projection directions are sought so that the within-neighborhood scatter is as small as possible while, simultaneously, the between-neighborhood scatter is as large as possible. The experimental results show that nLDA performs significantly better than previous discriminators, such as LDA, LFDA, ccLDA, LM-NNDA, and l2,1-RLDA.

Introduction

As a widely used supervised dimensionality reduction method, linear discriminant analysis (LDA) seeks a linear combination of features that simultaneously maximizes the between-class scatter and minimizes the within-class scatter [1]. Therefore, in the projected space, samples from the same class appear as close as possible while samples from different classes appear as far apart as possible. In LDA, both the between-class scatter and the within-class scatter are defined at the class level, which implicitly assumes that the samples from the same class follow the same distribution. Otherwise, the class-level scatters are meaningless, and LDA may project the data into an undesired space when samples belonging to the same class come from a mixture of distributions (i.e., from several subclasses or clusters). This is often the case for real-world data, where a class can be multimodal, consisting of several separated subclasses or clusters [2].

Within-class multimodality widely exists in realistic scenarios. For instance, the CIFAR-100 dataset contains 20 superclasses, each of which contains 5 classes, so within-class multimodality appears in the superclasses. Similarly, in handwritten digit recognition, when we only identify whether a digit is odd or even, each of the two classes contains five subclasses. Fig. 1 shows a synthetic multimodal example. Each star is from class 1 while each plus is from class 2. Class 1 follows a two-Gaussian distribution (consisting of two separated subclasses or clusters). The dotted line is the projected subspace obtained by classic LDA, while the dash-dotted line is the desired subspace. The desired subspace in Fig. 1 can be obtained by a discriminator that considers within-class multimodality, such as LFDA [2]. Obviously, class 1 and class 2 are mixed together in the projected space of LDA; LDA fails on this synthetic dataset.
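To make this failure mode concrete, the following sketch builds a comparable two-class dataset in which class 1 is a mixture of two well-separated Gaussian clusters and class 2 lies between them, then fits classic LDA with scikit-learn. The means, scales, and sample sizes below are arbitrary illustrative choices, not the values behind Fig. 1.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Class 1 is multimodal: two separated Gaussian clusters along the x-axis.
# All means/scales are illustrative assumptions, not the paper's data.
cluster_a = rng.normal(loc=[-5.0, 0.0], scale=1.0, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 0.0], scale=1.0, size=(100, 2))
class1 = np.vstack([cluster_a, cluster_b])

# Class 2 is unimodal and only slightly offset along the y-axis.
class2 = rng.normal(loc=[0.0, 1.5], scale=1.0, size=(100, 2))

X = np.vstack([class1, class2])
y = np.array([1] * 200 + [2] * 100)

# Class-level scatters treat the gap between class 1's two clusters as
# within-class scatter, so LDA avoids the x-axis and picks a direction
# close to the y-axis, along which the two classes overlap heavily.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
print("learned LDA direction:", lda.scalings_.ravel())
```

Projecting onto the x-axis instead would separate the classes cleanly, which is the kind of subspace a multimodality-aware discriminator is expected to recover.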

LDA may not work on a dataset containing a multimodal class, because the scatter matrices used in LDA are defined at the class level without incorporating the meaningful internal structure of a class. A straightforward remedy is to define the scatter matrices at the cluster or subclass level instead of the class level by using a clustering algorithm [3]. However, there is usually no prior knowledge about the internal structure of a multimodal class, and it is difficult to determine the number of clusters (or subclasses). In this paper, we propose to define the scatter matrices according to neighborhood information; the within-neighborhood scatter and between-neighborhood scatter are introduced in place of the within-class scatter and between-class scatter. The motivation is that the neighborhood can be naturally regarded as the smallest subclass, which requires no prior knowledge about the internal structure of a class. The new discriminator is termed neighborhood linear discriminant analysis (nLDA). nLDA inherits the Fisher criterion and can be solved as a generalized eigenvalue problem; it is simple yet effective.
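To illustrate the idea, the sketch below assembles within- and between-neighborhood scatter matrices from a given neighborhood assignment. This is only one plausible construction under stated assumptions (each neighborhood's spread is measured around its reference sample, and neighborhood means are compared across classes); the paper's exact definitions appear in Section 3 and may differ in detail.

```python
import numpy as np

def neighborhood_scatters(X, y, neighborhoods):
    """Illustrative neighborhood-level scatters (assumed construction).

    neighborhoods[i] is an index array with the neighbors of sample i,
    e.g. its reverse nearest neighbors (see the next sketch).
    Assumptions made here:
      - within-neighborhood scatter: spread of each neighborhood around x_i;
      - between-neighborhood scatter: spread between the neighborhood means
        of sample pairs drawn from different classes.
    """
    n, d = X.shape
    Sw_n = np.zeros((d, d))
    Sb_n = np.zeros((d, d))
    nbr_mean = np.empty((n, d))
    for i in range(n):
        idx = np.asarray(neighborhoods[i], dtype=int)
        diffs = X[idx] - X[i]                     # neighbors around sample i
        Sw_n += diffs.T @ diffs
        nbr_mean[i] = X[np.append(idx, i)].mean(axis=0)
    for i in range(n):                            # O(n^2) loop, kept for clarity
        for j in range(n):
            if y[i] != y[j]:
                diff = (nbr_mean[i] - nbr_mean[j]).reshape(-1, 1)
                Sb_n += diff @ diff.T
    return Sw_n, Sb_n
```

With two such matrices in hand, projection directions can be obtained exactly as in classic LDA, by substituting them for Sb and Sw in the generalized eigenvalue problem.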

The neighborhood can be obtained by the Parzen window [4], k nearest neighbors [4], or reverse nearest neighbors [5]. Reverse nearest neighbors are used in this paper because they also underpin a state-of-the-art unsupervised outlier detection method that can exclude 'isolated points' from the training set [6], [7]. We expect that a sample and its reverse nearest neighbors should be as close as possible in the projected space, while, simultaneously, the reverse nearest neighbors of two samples belonging to different classes should be as far apart as possible. Since the scatters defined on the reverse-nearest-neighbor neighborhoods do not involve the entire class directly, the proposed discriminator can overcome the issue of a dataset containing multimodal classes.
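The reverse-nearest-neighbor neighborhoods themselves can be computed as below: sample j is a reverse nearest neighbor of sample i whenever i appears in j's list of k nearest neighbors. This is a plain scikit-learn sketch; the value of k and the distance metric are assumptions here and may differ from the paper's settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reverse_nearest_neighbors(X, k=5):
    """rnn[i] holds the indices of samples whose k-nearest-neighbor lists
    contain sample i, i.e. the reverse nearest neighbors of i."""
    n = X.shape[0]
    # k + 1 neighbors are queried because, on the training set itself,
    # each point is returned as its own nearest neighbor and is discarded.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, knn_idx = nn.kneighbors(X)      # shape (n, k + 1)
    knn_idx = knn_idx[:, 1:]           # drop the self-neighbor
    rnn = [[] for _ in range(n)]
    for j in range(n):
        for i in knn_idx[j]:
            rnn[i].append(j)           # j "votes" for each of its k neighbors
    return [np.asarray(r, dtype=int) for r in rnn]
```

A sample with an empty reverse-neighbor set behaves like an isolated point, which matches the outlier-detection view of reverse nearest neighbors cited above [6], [7].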

The rest of the paper is organized as follows. A brief review of related work is provided in Section 2. Section 3 presents the neighborhood linear discriminant analysis (nLDA). In Section 4, we compare nLDA with several previous discriminators. The validation on different datasets is conducted in Section 5. The last section concludes the paper.

Section snippets

Related work

As a successful supervised feature extraction method, LDA has been studied for several decades and widely used in many fields, such as person re-identification [8], hyperspectral image classification [9], and EEG signal analysis [10]. Dorfer et al. [11] proposed a deep linear discriminant analysis. Kan et al. [12] proposed a multi-view discriminant analysis. Hu et al. [13] incorporated multiple feedforward neural networks and a novel eigenvalue-based multi-view objective function into multi-view discriminant analysis. Belous

Neighborhood linear discriminant analysis

Before introducing the new discriminator, let us first recap linear discriminant analysis (LDA).

Let $\{X, Y\}$ be the training set consisting of pairs $\{x_i, y_i\}$, $i = 1, \ldots, n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{1, \ldots, C\}$; let $X_j$ denote the set of all samples in class $j$ and $n_j$, $j = 1, \ldots, C$, its size, with $\sum_{j=1}^{C} n_j = n$. The aim of LDA is to find the projection directions $\varphi$ that maximize
$$J_{\mathrm{LDA}}(\varphi) = \left|\frac{\varphi^{T} S_b \varphi}{\varphi^{T} S_w \varphi}\right|,$$
where $S_b$ and $S_w$ are the between-class scatter matrix and within-class scatter matrix, respectively, defined
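With the standard class-level definitions of $S_b$ and $S_w$ ($S_b$ measuring the spread of class means around the global mean, $S_w$ the spread of samples around their class means), the Fisher directions are the leading generalized eigenvectors of the pair $(S_b, S_w)$. Below is a minimal NumPy/SciPy sketch assuming those standard definitions; the small ridge added to $S_w$ is purely a numerical-stability convenience, not part of the formulation.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, n_components):
    """Classic LDA: maximize |phi^T Sb phi| / |phi^T Sw phi| by solving
    the generalized eigenproblem Sb phi = lambda Sw phi."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)                   # ridge keeps Sw positive definite
    eigvals, eigvecs = eigh(Sb, Sw)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]  # columns are the directions phi
```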

Connections to previous discriminators

In this section, we discuss the relationship between nLDA and several existing discriminators that can also handle multimodal classes. The first is local Fisher discriminant analysis (LFDA) [2], which uses manifold structure to depict the local structure of a multimodal class; many manifold-based discriminators stem from LFDA. The second is nonparametric discriminant analysis (NDA) [41], which uses k nearest neighbors to redefine the scatter matrices. Both LFDA and NDA inherit the Fisher

Experiments and simulations

In this section, we compare nLDA with several existing discriminators, including LDA, LFDA, LM-NNDA, ccLDA, and l2,1-RLDA. LFDA is a classic and widely used localized discriminator; recently, it was successfully applied to pedestrian re-identification [46]. LM-NNDA is a recent k-nearest-neighbor-based discriminator. ccLDA [31] utilizes the within- and between-cluster scatter matrices to regularize the within- and between-class scatter matrices in discriminant analysis to solve

Discussion and conclusions

Linear discriminant analysis (LDA) assumes that the samples from the same class are independent and identically distributed (i.i.d.). When a class in a dataset contains several clusters or subclasses, LDA may perform poorly on that dataset. This weakness stems from the fact that the scatter matrices in LDA are defined at the whole-class level. In this paper, we define the scatter matrices on neighborhoods instead. The neighborhood consists of reverse nearest neighbors, which can exclude

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to thank the handling associate editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was partially supported by the National Science Fund of China under Grant 91420201, Grant 61472187, Grant 61233011, Grant 61373063, and Grant 61602244, in part by the 973 Program under Grant 2014CB349303, and in part by the Program for Changjiang Scholars and Innovative Research Team in University.


References (52)

  • S. García et al.

    Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power

    Inf. Sci.

    (2010)
  • F. Zhu et al.

    On removing potential redundant constraints for SVOR learning

    Appl. Soft Comput.

    (2021)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugen.

    (1936)
  • M. Sugiyama

    Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis

    J. Mach. Learn. Res.

    (2007)
  • R.O. Duda et al.

    Pattern Classification

    (2000)
  • F. Korn et al.

    Influence sets based on reverse nearest neighbor queries

    Proceedings of the ACM SIGMOD International Conference on Management of Data

    (2000)
  • M. Radovanovic et al.

    Hubs in space: popular nearest neighbors in high-dimensional data

    J. Mach. Learn. Res.

    (2010)
  • M. Radovanovic et al.

    Reverse nearest neighbors in unsupervised distance-based outlier detection

    IEEE Trans. Knowl. Data Eng.

    (2015)
  • L.C.D. Nkengfack et al.

    EEG signals analysis for epileptic seizures detection using polynomial transforms, linear discriminant analysis and support vector machines

    Biomed. Signal Process. Control

    (2020)
  • M. Dorfer et al.

    Deep linear discriminant analysis

    CoRR

    (2015)
  • M. Kan et al.

    Multi-view discriminant analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • P. Hu et al.

    Multi-view linear discriminant analysis network

    IEEE Trans. Image Process.

    (2019)
  • Y.C.C. Alarcón et al.

    Imprecise Gaussian discriminant classification

    Pattern Recognit.

    (2020)
  • H. Wang et al.

    Fisher discriminant analysis with l1-norm

    IEEE Trans. Cybern.

    (2014)
  • F. Zhong et al.

    Linear discriminant analysis based on l1-norm maximization

    IEEE Trans. Image Process.

    (2013)
  • F. Nie et al.

    Towards robust discriminative projections learning via non-greedy l2,1-norm minmax

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2021)

    Fa Zhu graduated from the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, P. R. China, with a Ph.D. degree in 2019. He was a visiting Ph.D. student in the Centre for Artificial Intelligence (CAI) and the Faculty of Engineering and Information Technology, University of Technology Sydney, Australia. He is a lecturer in the College of Information Science and Technology, Nanjing Forestry University, Nanjing, P. R. China. His current research interests include pattern recognition, machine learning, and anomaly detection.

    Junbin Gao graduated from Huazhong University of Science and Technology (HUST), China, in 1982 with a B.Sc. degree in Computational Mathematics and obtained his Ph.D. from Dalian University of Technology, China, in 1991. He is a Professor in the Discipline of Business Analytics, University of Sydney Business School, University of Sydney, Australia. From 2001 to 2005, he was a senior lecturer and lecturer in Computer Science at the University of New England, Australia. From 1982 to 2001, he was an associate lecturer, lecturer, associate professor, and professor in the Department of Mathematics at HUST. From 2002 to 2015, he was a Professor in Computing Science in the School of Computing and Mathematics at Charles Sturt University, Australia. His main research interests include machine learning, data mining, Bayesian learning and inference, and image analysis.

    Jian Yang received the B.S. degree in mathematics from Xuzhou Normal University in 1995, the M.S. degree in applied mathematics from Changsha Railway University in 1998, and the Ph.D. degree from the Nanjing University of Science and Technology (NUST), on the subject of pattern recognition and intelligence systems, in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza, and in the same year, he was awarded the RyC Program Research Fellowship sponsored by the Spanish Ministry of Science and Technology. From 2004 to 2006, he was a postdoctoral fellow at the Biometrics Centre of Hong Kong Polytechnic University. From 2006 to 2007, he was a postdoctoral fellow at the Department of Computer Science of the New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 80 scientific papers in pattern recognition and computer vision. His research interests include pattern recognition, computer vision, and machine learning. Currently, he is an associate editor of Pattern Recognition Letters and IEEE Transactions on Neural Networks and Learning Systems.

    Ning Ye received the M.S. degree in Test Measurement Technology and Instruments from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2006, and the Ph.D. degree in Computer Application Technology from Southeast University, Nanjing, China. He is a full-time professor at the School of Information Science and Technology, Nanjing Forestry University, Nanjing, China. His research interests include machine learning, bioinformatics, and data mining.
