
Pattern Recognition

Volume 123, March 2022, 108422

Neighborhood linear discriminant analysis

https://doi.org/10.1016/j.patcog.2021.108422

Highlights

  • The neighborhood linear discriminant analysis (nLDA) is proposed to address multimodality in LDA.

  • In nLDA, the scatters are defined on a neighborhood consisting of reverse nearest neighbors.

  • The within- and between-neighborhood scatters avoid estimating the subclasses in a multimodal class.

  • The nLDA performs significantly better than some existing discriminators, such as LDA, LFDA, ccLDA, LM-NNDA and l2,1-RLDA.

Abstract

Linear Discriminant Analysis (LDA) assumes that all samples from the same class are independently and identically distributed (i.i.d.). LDA may fail when this assumption does not hold. In particular, when a class contains several clusters (or subclasses), LDA cannot correctly depict the internal structure, because the scatter matrices that LDA relies on are defined at the class level. To mitigate this problem, this paper proposes a neighborhood linear discriminant analysis (nLDA) in which the scatter matrices are defined on a neighborhood consisting of reverse nearest neighbors. Thus, the new discriminator does not need the i.i.d. assumption. In addition, the neighborhood can be naturally regarded as the smallest subclass, and it is easier to obtain than a subclass because it does not require any clustering algorithm. The projection directions are sought so that the within-neighborhood scatter is as small as possible while, simultaneously, the between-neighborhood scatter is as large as possible. The experimental results show that nLDA performs significantly better than previous discriminators, such as LDA, LFDA, ccLDA, LM-NNDA, and l2,1-RLDA.

Introduction

As a widely used supervised dimensionality reduction method, linear discriminant analysis (LDA) seeks a linear combination of features that simultaneously maximizes the between-class scatter and minimizes the within-class scatter [1]. Therefore, in the projected space, samples from the same class appear as close as possible while samples from different classes appear as far apart as possible. In LDA, both the between-class scatter and the within-class scatter are defined at the class level, which implicitly assumes that the samples from the same class follow the same distribution. Otherwise, the class-level scatters are meaningless, and LDA may project the data into an undesired space when samples belonging to the same class come from a mixture of distributions (i.e., from several subclasses or clusters). This is often the case for real-world data, where a class can be multimodal, consisting of several separated subclasses or clusters [2].

Within-class multimodality widely exists in realistic scenarios. For instance, the CIFAR-100 dataset contains 20 superclasses, each of which contains 5 classes, so within-class multimodality appears in the superclasses. Similarly, in handwritten digit recognition, when we only identify whether a digit is odd or even, each of the two classes contains five subclasses. Fig. 1 shows a synthetic multimodal example. Each star is from class 1 while each plus is from class 2. Class 1 follows a two-Gaussian distribution (consisting of two separated subclasses or clusters). The dotted line is the projected subspace obtained by classic LDA, while the dash-dotted line is the desired subspace. The desired subspace in Fig. 1 can be obtained by a discriminator that considers within-class multimodality, such as LFDA [2]. Obviously, class 1 and class 2 are mixed together in the projected space of LDA; LDA fails on this synthetic dataset.
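To make this failure mode concrete, the following sketch builds a comparable two-class dataset in which class 1 is a mixture of two well-separated Gaussian clusters and class 2 lies between them, then fits classic LDA with scikit-learn. The means, scales, and sample sizes below are arbitrary illustrative choices, not the values behind Fig. 1.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Class 1 is multimodal: two separated Gaussian clusters along the x-axis.
# All means/scales are illustrative assumptions, not the paper's data.
cluster_a = rng.normal(loc=[-5.0, 0.0], scale=1.0, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 0.0], scale=1.0, size=(100, 2))
class1 = np.vstack([cluster_a, cluster_b])

# Class 2 is unimodal and only slightly offset along the y-axis.
class2 = rng.normal(loc=[0.0, 1.5], scale=1.0, size=(100, 2))

X = np.vstack([class1, class2])
y = np.array([1] * 200 + [2] * 100)

# Class-level scatters treat the gap between class 1's two clusters as
# within-class scatter, so LDA avoids the x-axis and picks a direction
# close to the y-axis, along which the two classes overlap heavily.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
print("learned LDA direction:", lda.scalings_.ravel())
```

Projecting onto the x-axis instead would separate the classes cleanly, which is the kind of subspace a multimodality-aware discriminator is expected to recover.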

LDA may not work on a dataset containing a multimodal class, because the scatter matrices used in LDA are defined at the class level without incorporating the meaningful internal structure of a class. A straightforward remedy is to define the scatter matrices at the cluster or subclass level instead of the class level by using a clustering algorithm [3]. However, there is usually no prior knowledge about the internal structure of a multimodal class, and it is difficult to determine the number of clusters (or subclasses). In this paper, we propose to define the scatter matrices according to neighborhood information; the within-neighborhood scatter and between-neighborhood scatter are introduced in place of the within-class scatter and between-class scatter. The motivation is that the neighborhood can be naturally regarded as the smallest subclass, which requires no prior knowledge about the internal structure of a class. The new discriminator is termed neighborhood linear discriminant analysis (nLDA). nLDA inherits the Fisher criterion and can be solved as a generalized eigenvalue problem; it is simple yet effective.
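To illustrate the idea, the sketch below assembles within- and between-neighborhood scatter matrices from a given neighborhood assignment. This is only one plausible construction under stated assumptions (each neighborhood's spread is measured around its reference sample, and neighborhood means are compared across classes); the paper's exact definitions appear in Section 3 and may differ in detail.

```python
import numpy as np

def neighborhood_scatters(X, y, neighborhoods):
    """Illustrative neighborhood-level scatters (assumed construction).

    neighborhoods[i] is an index array with the neighbors of sample i,
    e.g. its reverse nearest neighbors (see the next sketch).
    Assumptions made here:
      - within-neighborhood scatter: spread of each neighborhood around x_i;
      - between-neighborhood scatter: spread between the neighborhood means
        of sample pairs drawn from different classes.
    """
    n, d = X.shape
    Sw_n = np.zeros((d, d))
    Sb_n = np.zeros((d, d))
    nbr_mean = np.empty((n, d))
    for i in range(n):
        idx = np.asarray(neighborhoods[i], dtype=int)
        diffs = X[idx] - X[i]                     # neighbors around sample i
        Sw_n += diffs.T @ diffs
        nbr_mean[i] = X[np.append(idx, i)].mean(axis=0)
    for i in range(n):                            # O(n^2) loop, kept for clarity
        for j in range(n):
            if y[i] != y[j]:
                diff = (nbr_mean[i] - nbr_mean[j]).reshape(-1, 1)
                Sb_n += diff @ diff.T
    return Sw_n, Sb_n
```

With two such matrices in hand, projection directions can be obtained exactly as in classic LDA, by substituting them for Sb and Sw in the generalized eigenvalue problem.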

The neighborhood can be obtained by the Parzen window [4], k nearest neighbors [4], or reverse nearest neighbors [5]. Reverse nearest neighbors are used in this paper because they also underpin a state-of-the-art unsupervised outlier detection method that can exclude 'isolated points' from the training set [6], [7]. We expect that a sample and its reverse nearest neighbors should be as close as possible in the projected space, while, simultaneously, the reverse nearest neighbors of two samples belonging to different classes should be as far apart as possible. Since the scatters defined on the reverse-nearest-neighbor neighborhoods do not involve the entire class directly, the proposed discriminator can overcome the issue of a dataset containing multimodal classes.
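The reverse-nearest-neighbor neighborhoods themselves can be computed as below: sample j is a reverse nearest neighbor of sample i whenever i appears in j's list of k nearest neighbors. This is a plain scikit-learn sketch; the value of k and the distance metric are assumptions here and may differ from the paper's settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reverse_nearest_neighbors(X, k=5):
    """rnn[i] holds the indices of samples whose k-nearest-neighbor lists
    contain sample i, i.e. the reverse nearest neighbors of i."""
    n = X.shape[0]
    # k + 1 neighbors are queried because, on the training set itself,
    # each point is returned as its own nearest neighbor and is discarded.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, knn_idx = nn.kneighbors(X)      # shape (n, k + 1)
    knn_idx = knn_idx[:, 1:]           # drop the self-neighbor
    rnn = [[] for _ in range(n)]
    for j in range(n):
        for i in knn_idx[j]:
            rnn[i].append(j)           # j "votes" for each of its k neighbors
    return [np.asarray(r, dtype=int) for r in rnn]
```

A sample with an empty reverse-neighbor set behaves like an isolated point, which matches the outlier-detection view of reverse nearest neighbors cited above [6], [7].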

The rest of the paper is organized as follows. A brief review of related work is provided in Section 2. Section 3 presents the neighborhood linear discriminant analysis (nLDA). In Section 4, we compare nLDA with several previous discriminators. The validation on different datasets is conducted in Section 5. The last section concludes the paper.

Section snippets

Related work

As a successful supervised feature extraction method, LDA has been studied for several decades and widely used in many fields, such as person re-identification [8], hyperspectral image classification [9], and EEG signal analysis [10]. Dorfer et al. [11] proposed a deep linear discriminant analysis. Kan et al. [12] proposed a multi-view discriminant analysis. Hu et al. [13] incorporated multiple feedforward neural networks and a novel eigenvalue-based multi-view objective function into multi-view discriminant analysis. Belous

Neighborhood linear discriminant analysis

Before introducing the new discriminator, let us first recap linear discriminant analysis (LDA).

Let $\{X, Y\}$ be the training set consisting of pairs $\{x_i, y_i\}$, $i = 1, \ldots, n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{1, \ldots, C\}$; let $X_j$ denote the set of all samples in class $j$ and $n_j$, $j = 1, \ldots, C$, its size, with $\sum_{j=1}^{C} n_j = n$. The aim of LDA is to find the projection directions $\varphi$ that maximize
$$J_{\mathrm{LDA}}(\varphi) = \left|\frac{\varphi^{T} S_b \varphi}{\varphi^{T} S_w \varphi}\right|,$$
where $S_b$ and $S_w$ are the between-class scatter matrix and within-class scatter matrix, respectively, defined
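With the standard class-level definitions of $S_b$ and $S_w$ ($S_b$ measuring the spread of class means around the global mean, $S_w$ the spread of samples around their class means), the Fisher directions are the leading generalized eigenvectors of the pair $(S_b, S_w)$. Below is a minimal NumPy/SciPy sketch assuming those standard definitions; the small ridge added to $S_w$ is purely a numerical-stability convenience, not part of the formulation.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, n_components):
    """Classic LDA: maximize |phi^T Sb phi| / |phi^T Sw phi| by solving
    the generalized eigenproblem Sb phi = lambda Sw phi."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)                   # ridge keeps Sw positive definite
    eigvals, eigvecs = eigh(Sb, Sw)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]  # columns are the directions phi
```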

Connections to previous discriminators

In this section, we discuss the relationship between nLDA and several existing discriminators that can also handle multimodal classes. The first is local Fisher discriminant analysis (LFDA) [2], which uses manifold structure to depict the local structure of a multimodal class; many manifold-based discriminators stem from LFDA. The second is nonparametric discriminant analysis (NDA) [41], which uses k nearest neighbors to redefine the scatter matrices. Both LFDA and NDA inherit the Fisher

Experiments and simulations

In this section, we compare nLDA with several existing discriminators, including LDA, LFDA, LM-NNDA, ccLDA, and l2,1-RLDA. LFDA is a classic and widely used localized discriminator; recently, it was successfully applied to pedestrian re-identification [46]. LM-NNDA is a recent k-nearest-neighbor-based discriminator. ccLDA [31] utilizes the within- and between-cluster scatter matrices to regularize the within- and between-class scatter matrices in discriminant analysis to solve

Discussion and conclusions

Linear discriminant analysis (LDA) assumes that the samples from the same class are independent and identically distributed (i.i.d.). When a class in a dataset contains several clusters or subclasses, LDA may perform poorly on that dataset. This weakness stems from the fact that the scatter matrices in LDA are defined at the whole-class level. In this paper, we define the scatter matrices on neighborhoods instead. The neighborhood consists of reverse nearest neighbors, which can exclude

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to thank the handling associate editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was partially supported by the National Science Fund of China under Grant 91420201, Grant 61472187, Grant 61233011, Grant 61373063, and Grant 61602244, in part by the 973 Program under Grant 2014CB349303, and in part by the Program for Changjiang Scholars and Innovative Research Team in University.


References (52)

  • S. García et al.

    Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power

    Inf. Sci.

    (2010)
  • F. Zhu et al.

    On removing potential redundant constraints for SVOR learning

    Appl. Soft Comput.

    (2021)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugen.

    (1936)
  • M. Sugiyama

    Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis

    J. Mach. Learn. Res.

    (2007)
  • R.O. Duda et al.

    Pattern Classification

    (2000)
  • F. Korn et al.

    Influence sets based on reverse nearest neighbor queries

    Proceedings of the ACM SIGMOD International Conference on Management of Data

    (2000)
  • M. Radovanovic et al.

    Hubs in space: popular nearest neighbors in high-dimensional data

    J. Mach. Learn. Res.

    (2010)
  • M. Radovanovic et al.

    Reverse nearest neighbors in unsupervised distance-based outlier detection

    IEEE Trans. Knowl. Data Eng.

    (2015)
  • L.C.D. Nkengfack et al.

    EEG signals analysis for epileptic seizures detection using polynomial transforms, linear discriminant analysis and support vector machines

    Biomed. Signal Process. Control

    (2020)
  • M. Dorfer et al.

    Deep linear discriminant analysis

    CoRR

    (2015)
  • M. Kan et al.

    Multi-view discriminant analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • P. Hu et al.

    Multi-view linear discriminant analysis network

    IEEE Trans. Image Process.

    (2019)
  • Y.C.C. Alarcón et al.

    Imprecise Gaussian discriminant classification

    Pattern Recognit.

    (2020)
  • H. Wang et al.

    Fisher discriminant analysis with l1-norm

    IEEE Trans. Cybern.

    (2014)
  • F. Zhong et al.

    Linear discriminant analysis based on l1-norm maximization

    IEEE Trans. Image Process.

    (2013)
  • F. Nie et al.

    Towards robust discriminative projections learning via non-greedy l2,1-norm minmax

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2021)

    Fa Zhu graduated from the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, P. R. China, with a Ph.D. degree in 2019. He was a visiting Ph.D. student in the Centre for Artificial Intelligence (CAI) and the Faculty of Engineering and Information Technology, University of Technology Sydney, Australia. He is a lecturer in the College of Information Science and Technology, Nanjing Forestry University, Nanjing, P. R. China. His current research interests include pattern recognition, machine learning, and anomaly detection.

    Junbin Gao graduated from Huazhong University of Science and Technology (HUST), China, in 1982 with a B.Sc. degree in Computational Mathematics and obtained his Ph.D. from Dalian University of Technology, China, in 1991. He is a Professor in the Discipline of Business Analytics, University of Sydney Business School, University of Sydney, Australia. From 2001 to 2005, he was a senior lecturer and lecturer in Computer Science at the University of New England, Australia. From 1982 to 2001, he was an associate lecturer, lecturer, associate professor, and professor in the Department of Mathematics at HUST. From 2002 to 2015, he was a Professor in Computing Science in the School of Computing and Mathematics at Charles Sturt University, Australia. His main research interests include machine learning, data mining, Bayesian learning and inference, and image analysis.

    Jian Yang received the B.S. degree in mathematics from Xuzhou Normal University in 1995, the M.S. degree in applied mathematics from Changsha Railway University in 1998, and the Ph.D. degree from the Nanjing University of Science and Technology (NUST), on the subject of pattern recognition and intelligence systems, in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza, and in the same year, he was awarded the RyC Program Research Fellowship sponsored by the Spanish Ministry of Science and Technology. From 2004 to 2006, he was a postdoctoral fellow at the Biometrics Centre of Hong Kong Polytechnic University. From 2006 to 2007, he was a postdoctoral fellow at the Department of Computer Science of the New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 80 scientific papers in pattern recognition and computer vision. His research interests include pattern recognition, computer vision, and machine learning. Currently, he is an associate editor of Pattern Recognition Letters and IEEE Transactions on Neural Networks and Learning Systems.

    Ning Ye received the M.S. degree in Test Measurement Technology and Instruments from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2006, and the Ph.D. degree in Computer Application Technology from Southeast University, Nanjing, China. He is a full-time professor at the School of Information Science and Technology, Nanjing Forestry University, Nanjing, China. His research interests include machine learning, bioinformatics, and data mining.
