Incremental granular relevance vector machine: A case study in multimodal biometrics

doi:10.1016/j.patcog.2015.11.013

Pattern Recognition

Volume 56, August 2016, Pages 63-76

https://doi.org/10.1016/j.patcog.2015.11.013

Highlights

• The proposed iGRVM incorporates incremental and granular learning in RVM.
• Experiments are performed on NIST BSSR1, CASIA-Iris-Distance V4, and Biosecure DS2 databases.
• Results illustrate that iGRVM can be a good alternative for biometric score classification.

Abstract

This paper focuses on extending the capabilities of the relevance vector machine (RVM), which is a probabilistic, sparse, and linearly parameterized classifier. It has been shown that RVM and support vector machine (SVM) have similar generalization performance, but RVM requires significantly fewer relevance vectors than SVM requires support vectors. However, RVM has certain limitations that restrict its application in several pattern recognition problems, including biometrics: (1) the training process is slow, (2) it is difficult to train with large training sets, and (3) it may not be suitable for handling large class imbalance. To address these limitations, we propose iGRVM, which incorporates incremental and granular learning in RVM. The proposed classifier is evaluated in the context of multimodal biometric score classification using the NIST BSSR1, CASIA-Iris-Distance V4, and Biosecure DS2 databases. The experimental analysis illustrates that the proposed classifier can be a good alternative for biometric score classification with faster testing time.

Introduction

Classifiers are an integral component of a pattern classification system. In order to determine the class of any query, the data is processed, a representation is computed, and the classifier classifies it into one of the classes. Before testing, the classifier learns a model using the given training data. For instance, in a biometric verification problem, there are two classes, genuine and imposter. The task is to match the probe image with the corresponding gallery image and determine whether the probe is a genuine match or imposter. Existing biometric recognition algorithms have used different classifiers such as linear threshold, Bayesian classification, and Support Vector Machine (SVM) [1].

For training an accurate classification model, it is generally assumed that sufficient and representative training data is available during the training stage. However, in real world applications, there are several challenges in ensuring the availability of good quality training data:

• There exists the possibility that the entire training data is not available simultaneously. For example, in the case of India's Aadhaar project [2] or the US-VISIT program [3], users are enrolled on a continuous basis. In such a scenario, training data is available only in an incremental manner. Training the classifiers in batch mode with every incremental update can be computationally expensive.
• Training databases can be highly unbalanced, where data from one class is overpopulated compared to the other class(es). In a biometric system that has n users in the database, each having m samples $(n \gg m)$, the number of genuine scores available for training is $nm(m-1)/2$, compared with $n(n-1)m^{2}/2$ impostor scores.
• Some classifiers are inherently computationally expensive. They perform well if the training set is small, but on large training data they may require significant computational time or become intractable.
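To make the imbalance in the second point concrete, the two counts can be evaluated for a small system. The snippet below is only an illustration of the formulas above; the function name and the values n = 1000 and m = 4 are arbitrary choices, not figures from the paper.

```python
def score_counts(n_users: int, m_samples: int) -> tuple[int, int]:
    """Return (genuine, impostor) score counts for n users with m samples each."""
    # Genuine scores: pairs of samples from the same user, n * C(m, 2).
    genuine = n_users * m_samples * (m_samples - 1) // 2
    # Impostor scores: pairs of samples from different users, C(n, 2) * m^2.
    impostor = n_users * (n_users - 1) * m_samples ** 2 // 2
    return genuine, impostor


genuine, impostor = score_counts(n_users=1000, m_samples=4)
print(genuine, impostor, impostor / genuine)  # 6000 7992000 1332.0
```

Even for this modest example, impostor scores outnumber genuine scores by more than three orders of magnitude, and the ratio grows roughly linearly with n.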

To address some of these challenges, researchers have proposed multiple solutions. The availability of sequential training data is addressed by incremental learning and online learning algorithms [4]. In incremental learning, classifiers are trained with new batches of data, as they arrive, while preserving the knowledge of previous learning. Some incremental learning approaches are incremental Principal Component Analysis (IPCA) [5], incremental learning of Bidirectional Principal Component Analysis [6], incremental Linear Discriminant Analysis (ILDA) [7], incremental Subclass Discriminant Analysis (ISDA) [8], and incremental and decremental SVM [9], [10].

In the literature, several researchers have also explored the challenge of class imbalance [11], [12]. Chawla et al. [13] have stated that the class imbalance problem is handled either by assigning distinct costs to training data [14], [15], [16] or by resampling the entire database [17]. The resampling approaches work by oversampling the minority class, under-sampling the majority class, or combining the under-sampling and oversampling approaches [18], [19]. When balancing class distributions, random under-sampling may lead to information loss, whereas random oversampling can increase the chances of overfitting. Tang et al. [20] have proposed an under-sampling approach using granular learning. Granular learning divides the data into granules, represented as classes, clusters, or subsets, and solves the problem in each information granule locally [21]. The challenge of large training databases for learning computationally expensive classifiers has also been addressed by granular computing approaches [22].
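The two basic resampling strategies mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names are hypothetical, and it assumes the majority class is at least as large as the minority class.

```python
import random


def random_undersample(majority, minority, seed=0):
    """Discard majority samples at random until both classes have equal size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))  # information in the rest is lost
    return kept, list(minority)


def random_oversample(majority, minority, seed=0):
    """Duplicate minority samples at random until both classes have equal size."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))  # exact copies
    return list(majority), list(minority) + extra


impostor_scores = [0.01 * i for i in range(50)]   # toy majority class
genuine_scores = [0.90, 0.92, 0.95, 0.97, 0.99]   # toy minority class

under_major, under_minor = random_undersample(impostor_scores, genuine_scores)
over_major, over_minor = random_oversample(impostor_scores, genuine_scores)
print(len(under_major), len(under_minor), len(over_major), len(over_minor))
```

The under-sampled set drops most of the majority class, which is the information loss noted above, while the over-sampled set contains repeated copies of the same few minority samples, which is what can encourage overfitting.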

Since the formulation of every classifier is different, the extension of an existing classifier that operates in batch mode to the corresponding incremental version is also different. In designing the incremental or granular variant of an existing classifier, it is important to ensure that the updated variants do not reduce the accuracy while reducing the training time or computational complexity. Therefore, researchers have proposed specific formulations for individual classifiers, such as SVM.

SVM has been shown to yield good results in several pattern classification problems, including biometrics. It avoids overfitting and leads to good generalization by finding the separating hyperplane that maximizes the margin width. The training data points used to represent the hyperplane are called support vectors. Several formulations have been proposed for online training of SVM and for addressing the class imbalance problem [10], [20], [22], [23]. However, SVM suffers from the following limitations [24]:

1. The number of support vectors required for classification is relatively large,
2. In classical SVM, there is a need to fine-tune the regularization parameter (C) during the training phase, and
3. The kernel function must satisfy the Mercer conditions [25].

Relevance vector machine (RVM) [24], on the other hand, is a probabilistic classifier that introduces a prior over each weight, governed by a set of hyperparameters. Like SVM, RVM is a sparse, linearly parameterized model, and it has been shown that its generalization performance is comparable to that of SVM with significantly fewer relevance vectors [24]. Another advantage of RVM is that it has very few parameters to optimize during training. Despite these advantages, RVM faces the following challenges, owing to which it has not been well explored, particularly in biometrics.
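For reference, the standard RVM classification model from [24] can be written as follows. The notation is an assumption introduced here for illustration, since this text does not define symbols: $N$ training samples $\mathbf{x}_i$, kernel $K$, weights $w_i$, and one hyperparameter $\alpha_i$ per weight.

```latex
\begin{align}
y(\mathbf{x}; \mathbf{w}) &= \sum_{i=1}^{N} w_i \, K(\mathbf{x}, \mathbf{x}_i) + w_0, \\
p(\mathbf{w} \mid \boldsymbol{\alpha}) &= \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right), \\
p(t = 1 \mid \mathbf{x}, \mathbf{w}) &= \sigma\!\left(y(\mathbf{x}; \mathbf{w})\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
\end{align}
```

During training, many $\alpha_i$ grow very large, which pins the corresponding $w_i$ at zero. The training samples whose weights remain non-zero are the relevance vectors, and this is the source of the model's sparsity.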

1. The native formulation of RVM requires expensive matrix inversion, which makes it difficult to train a conventional RVM on very large training databases. Further, the amount of memory required to store the product of basis functions also limits its use for considerably large training databases.
2. RVM is trained in batch mode, and if a new batch of data arrives, the classifier has to be re-trained with the new as well as the old data. This is not feasible for many real-time applications such as biometrics, where it may be required to continuously update the classifier to adjust to the changes (in data and template) that happen over time.
3. RVM may not be suitable for handling large class imbalance in the training data and may become biased towards the class with more training samples.
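The first challenge can be put in rough numbers. With one basis function per training sample, the matrix that the conventional formulation inverts has on the order of N × N entries, so dense storage grows as N² and inversion as N³. The snippet below only estimates the storage for a dense double-precision matrix; the function name and the sample sizes are illustrative assumptions, not values from the paper.

```python
def dense_matrix_gib(n_samples: int, bytes_per_value: int = 8) -> float:
    """Memory in GiB for a dense n_samples x n_samples matrix (float64 by default)."""
    return n_samples * n_samples * bytes_per_value / 2**30


for n in (1_000, 100_000, 1_000_000):
    # Storage grows quadratically: 100x more samples needs 10,000x more memory.
    print(f"N = {n:>9,}: {dense_matrix_gib(n):12.3f} GiB")
```

At N = 100,000 the matrix alone needs roughly 74.5 GiB, which is why training on the full set of impostor scores in one batch quickly becomes impractical.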

To address these challenges, in this paper, we propose an incremental granular RVM that can be trained with large unbalanced training data to perform efficient classification. As shown in Fig. 1, the learning process starts by considering batches of training data, which are divided into granules. An RVM is trained on each granule independently and the results are amalgamated to obtain a robust boundary for classification. For online learning, the knowledge from the previous training is carried forward to learn from the next batch of training data. The major contributions of this research are:

1. Incremental RVM (iRVM) is proposed which is scalable with new enrollments and also reduces the training time.
2. Granular RVM (GRVM) handles the class imbalance problem by training the classifier locally for each granule.
3. Incremental Granular RVM (iGRVM) combines the advantages of both incremental and granular learning into RVM.
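The overall control flow described above (batches split into granules, one model per granule, results combined, knowledge carried to the next batch) can be sketched structurally. This is not the paper's algorithm: the base learner `fit_stub` is a nearest-centroid stand-in rather than an RVM, and the granulation rule, the majority-vote combination, and the use of centroids as the carried-forward summary are all assumptions made for illustration. Every name below is hypothetical.

```python
from statistics import fmean

# A sample is (feature vector, label); label 1 = genuine, 0 = impostor.


def make_granules(samples, n_granules):
    """Assumed rule: split the majority class into chunks; each granule gets all minority samples."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Requires both classes present and n_granules <= len(majority).
    return [majority[i::n_granules] + minority for i in range(n_granules)]


def fit_stub(granule):
    """Stand-in for RVM training (NOT an RVM): one centroid per class."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in granule if y == label]
        model[label] = tuple(fmean(col) for col in zip(*rows))
    return model


def predict_one(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in model.items()}
    return min(dist, key=dist.get)


def predict(models, x):
    """Combine the per-granule models by majority vote (ties go to genuine)."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if 2 * votes >= len(models) else 0


def train_incremental(batches, n_granules):
    """Train batch by batch, carrying a compact summary of earlier batches forward."""
    models, carried = [], []
    for batch in batches:
        granules = make_granules(batch + carried, n_granules)
        models = [fit_stub(g) for g in granules]
        # Carried summary: each model's centroids, reused as labeled samples.
        carried = [(c, lab) for m in models for lab, c in m.items()]
    return models


def toy_batch(shift):
    genuine = [((0.90 + shift, 0.80), 1), ((0.80 + shift, 0.90), 1)]
    impostor = [((0.10 + 0.01 * i + shift, 0.20), 0) for i in range(10)]
    return genuine + impostor


models = train_incremental([toy_batch(0.0), toy_batch(0.02)], n_granules=3)
print(predict(models, (0.85, 0.85)), predict(models, (0.12, 0.20)))
```

The point of the structure is that no step ever touches more than one granule at a time, and old batches are represented only by a small carried-forward summary, so the cost of an update depends on the batch size rather than on the total amount of data seen so far.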

The proposed variant provides a good alternative to existing classifiers and overcomes the limitations of native RVM classifier. The performance of incremental granular RVM is evaluated using a case study in multimodal biometrics with two classes (genuine and imposter). The match scores obtained from different modalities, units and algorithms are normalized followed by incremental granular RVM classification. Experiments performed on three match score databases show that the proposed classifier is comparable to existing approaches in terms of classification performance and provides significant reduction in computational time.
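The normalization step mentioned above puts scores from different matchers on a common scale before classification. The text here does not say which normalization is used, so the snippet below assumes min-max normalization purely as an example; the function name and the toy scores are hypothetical.

```python
def min_max_normalize(scores):
    """Map raw match scores linearly onto [0, 1] (assumed normalization scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores identical, so no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


face_raw = [10, 20, 30]           # toy scores from one matcher
finger_raw = [0.2, 0.4, 1.0]      # toy scores from another matcher, different range

# Each comparison becomes one feature vector of normalized per-matcher scores.
vectors = list(zip(min_max_normalize(face_raw), min_max_normalize(finger_raw)))
print(vectors)
```

After this step, every comparison is described by one normalized score per modality, unit, or algorithm, and that vector is what a two-class (genuine versus impostor) classifier receives as input.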

Section snippets

Incremental granular relevance vector machine

This section describes the formulation of iGRVM for data classification. The proposed classifier is designed to incrementally update the learnt model and decision boundary for new batches of training data. Training RVM using data divided into granules may further boost the performance. Our hypothesis is that this unique combination of granulation and incremental learning when applied to RVM can improve the performance. The proposed variant of RVM is more focussed towards developing an adaptive

Case study: multimodal biometric match score classification

The formulations of the proposed algorithms are particularly helpful when the size of databases is large and they are unbalanced in terms of samples per class. Biometrics projects have both these characteristics. For instance, projects such as Aadhaar and US VISIT have millions of enrollments. The recognition pipeline of such projects involves four primary steps: (a) preprocessing, (b) segmentation, (c) feature representation, and (d) matching. iGRVM can be used for matching the extracted

Conclusions

The main contribution of this research is to propose incremental and granular learning in RVM and develop iGRVM classifier. The proposed classifier not only preserves the sparse property of original RVM classifier, but it is also scalable, faster and can be trained with unbalanced large training samples. The case study on multibiometric score classification is performed using the NIST BSSR1, CASIA-Iris-Distance V4, and Biosecure DS2 databases. Experimental results suggest that the proposed

Conflict of interest

None declared.

Hunny Mehrotra received M.Tech and Ph.D. degrees in Computer Science in 2010 and 2014 respectively from National Institute of Technology Rourkela, India. Her area of research includes biometrics, image processing, and computer vision. She has been conferred with various prestigious awards such as Google India Women in Engineering Award in 2010, Innovative Student Project Award in 2010 by INAE, and fellowship in 2012 from Department of Science and Technology under Women Scientist Scheme,

References (38)

C.-X. Ren et al., Incremental learning of bidirectional principal components for face recognition, Pattern Recognit. (2010)

R. Singh et al., Biometric classifier update using online learning: a case study in near infrared face verification, Image Vis. Comput. (2010)

R. Barandela et al., Strategies for learning in class imbalance problems, Pattern Recognit. (2003)

Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. (2007)

M.A. Tahir et al., Inverse random under sampling for class imbalance problem and its application to multi-label classification, Pattern Recognit. (2012)

S. Nikitidis et al., Multiplicative update rules for incremental training of multiclass support vector machines, Pattern Recognit. (2012)

R. Singh et al., Integrated multilevel image fusion and match score fusion of visible and infrared face images for robust face recognition, Pattern Recognit. (2008)

N. Poh et al., A multimodal biometric test bed for quality-dependent, cost-sensitive and client-specific score-level fusion algorithms, Pattern Recognit. (2010)

S. Bharadwaj et al., QFuse: online learning framework for adaptive biometric system, Pattern Recognit. (2015)

V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc.,...

Unique Identification Authority of India....

US-VISIT....

N. A. Syed, S. Huan, L. Kah, K. Sung, Incremental Learning with Support Vector Machines, in: Workshop on Support Vector...

H. Zhao et al., Incremental principal component analysis and its application for face recognition, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2006)

H. Zhao et al., Incremental linear discriminant analysis for face recognition, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2008)

H. Lamba, T. Dhamecha, M. Vatsa, R. Singh, Incremental subclass discriminant analysis: a case study in face...

G. Cauwenberghs et al., Incremental and decremental support vector machine learning, Adv. Neural Inf. Process. Syst. (2001)

H. He et al., Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. (2009)

N.V. Chawla et al., SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. (2002)


Richa Singh received the M.S. and Ph.D. degrees in computer science from West Virginia University, Morgantown, USA, in 2005 and 2008, respectively. She is currently an Associate Professor and the Kusum and TV Mohandas Pai Faculty Research Fellow with the Indraprastha Institute of Information Technology Delhi, India. Her research has been funded by the UIDAI and DeitY, India. Her areas of interest are biometrics, pattern recognition, and machine learning. She is a recipient of the FAST Award by DST, India. She is also an Editorial Board Member of Information Fusion (Elsevier), IEEE Access, and the EURASIP Journal of Image and Vision Processing (Springer). She is a member of the Computer Society and the Association for Computing Machinery. She has co-authored over 150 research papers and received several best paper and best poster awards in international conferences. She is a recipient of the NVIDIA Innovation Award 2015 and the Best Reviewer Award at the IAPR International Conference on Biometrics 2013. She serves as the General Co-Chair of the IEEE International Conference on Identity, Security and Behavior Analysis 2017 and PC Co-Chair of the IEEE International Conference on Biometrics: Theory, Applications, and Systems 2016.

Mayank Vatsa received the M.S. and Ph.D. degrees in computer science from West Virginia University, Morgantown, USA, in 2005 and 2008, respectively. He is currently an Associate Professor and AR Krishnaswamy Faculty Research Fellow with the Indraprastha Institute of Information Technology Delhi, India. His research has been funded by the UIDAI, DST, and DeitY. He has authored over 150 research papers and received several best paper and best poster awards and the NVIDIA Innovation Award 2015. His areas of interest are biometrics, image processing, machine learning, and information fusion. He is a recipient of the FAST Award by DST, India. He is a member of the Computer Society and the Association for Computing Machinery. He is the Vice President (Publications) of the IEEE Biometrics Council, an Area Editor of the IEEE Biometric Compendium, and an Associate Editor of the Information Fusion Journal (Elsevier) and the IEEE ACCESS. He was the Program Committee Co-Chair of the IAPR International Conference on Biometrics 2013 and the IEEE/IAPR International Joint Conference on Biometrics 2014. He also serves as the PC Co-Chair of IEEE International Conference on Identity, Security and Behavior Analysis 2017.

Banshidhar Majhi has been a Professor in the Department of Computer Science and Engineering, National Institute of Technology Rourkela, India, since 2006. He has 20 years of teaching and research experience. He has published several articles in refereed journals and international conferences. He has worked on several government-funded projects. His areas of interest include data structures, image processing, cryptography, biometrics, parallel processing, and soft computing.
