A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection
Introduction
Ensemble learning helps overcome three problems of single-classifier systems: the computational problem, which arises when the learning process of a weak classifier is imperfect; the statistical problem, which arises when the training data are too few to capture the entire hypothesis space; and the representational problem, which arises when the true target function cannot be represented by any hypothesis in the hypothesis space (Dietterich, 1997). Constructing good ensembles of classifiers has been an active area of research in supervised learning (Dietterich, 2000a).
It has been observed that the performance of ensemble learning depends heavily on the diversity among the individual classifiers of an ensemble. Polikar (2006) defines four ways to increase the diversity among the base classifiers: 1) by using different training data to train the base classifiers, 2) by using diverse training parameters, 3) by using different features for training the base classifiers, and 4) by combining different types of classifiers.
Multiple classifier systems (MCS), sometimes referred to as a committee of classifiers or a mixture of experts, have been exploited by various algorithms (Polikar, 2006). Bagging, boosting, stacking and random forest are the popular methods based on the MCS paradigm. Multiple variants of these ensemble methods have been proposed and used in the past, such as Ubagging (Liang & Cohn, 2013), AdaBoost (Freund & Schapire, 1997; Sun, Jia, & Li, 2011), AveBoost (Oza, 2003), conservative boosting (Kuncheva & Whitaker, 2002), GA-stacking (Ledezma, Aler, Sanchis, & Borrajo, 2010), and the cooperative ensemble learning system (CELS) (Liu & Yao, 1998).
Stacking (Wolpert, 1992) and bagging (Breiman, 1996) are two popular ensemble learning approaches applied in various real-world scenarios such as intrusion detection, spam classification and credit scoring (du Jardin, 2018; Papouskova & Hajek, 2019; Porwik, Doroz, & Wrobel, 2019; Ruano-Ordás, Yevseyeva, Fernandes, Méndez, & Emmerich, 2019; Syarif, Zaluska, Prugel-Bennett, & Wills, 2012; Zhang & Mahadevan, 2019).
Stacking uses a meta-classifier to fuse the ensemble outputs, whereas bagging commonly combines them by voting or weighted majority voting. The two approaches also differ in how they obtain diversity: stacking applies heterogeneous classifiers to the same training set, whereas bagging applies the same base classifier to different training sets (Bian & Wang, 2007). However, because these training sets are bootstrapped from a single dataset, they are not entirely disjoint from one another, which results in low diversity (Banfield, Hall, Bowyer, & Kegelmeyer, 2005).
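The overlap between bootstrapped training sets can be checked with a small experiment. The following is a minimal sketch, not taken from the paper: the function names, the dataset size and the seed are illustrative assumptions. It draws two bags of instance indices with replacement and measures how much of the original data each bag covers and how much the two bags share.

```python
import random


def bootstrap_bag(n_instances, rng):
    """Draw one bag of instance indices by sampling with replacement."""
    return [rng.randrange(n_instances) for _ in range(n_instances)]


def jaccard(a, b):
    """Share of distinct instances that two bags have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


rng = random.Random(0)   # fixed seed, chosen only for repeatability
N = 1000                 # hypothetical training-set size

bag_a = bootstrap_bag(N, rng)
bag_b = bootstrap_bag(N, rng)

# Fraction of the original instances that appear at least once in bag_a.
unique_fraction = len(set(bag_a)) / N
# Overlap between the two bags, measured on distinct instances.
overlap = jaccard(bag_a, bag_b)

print(f"distinct instances in one bag: {unique_fraction:.3f}")
print(f"Jaccard overlap of two bags:   {overlap:.3f}")
```

For large N, a bag of size N is expected to contain about 1 - 1/e (roughly 63%) of the distinct instances, so any two bags necessarily share a large part of their content.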
Several modified versions of popular ensemble learning approaches have been proposed in the past (Cheplygina, Tax, & Loog, 2016; Ditzler, LaBarck, Ritchie, Rosen, & Polikar, 2018; Ting & Witten, 1997), but to the best of our knowledge the adaptiveness of the algorithm towards the dataset has not yet been explored.
Ensemble learning-based approaches have been used in the past for spoof fingerprint detection, where the decisions of multiple base classifiers are integrated to classify an image as “live” or “spoof” (Ding & Ross, 2016; Kho, Lee, Choi, & Kim, 2019). Although ensemble learning is well known for this particular application, to the best of our knowledge, stacking has not been used for spoof fingerprint detection. We claim that for such applications, instead of using base classifiers in a straightforward way, it is crucial to adapt to the features of the dataset and to adjust the learning model accordingly.
Merz (1999) argues that having a disjoint set of classifiers is advantageous in ensemble learning, as it yields weakly correlated predictions. This motivated us to maintain the diversity of the ensemble by dividing the original training set into multiple subsets using clustering. In this way, we are able to generate a diverse set of classifiers by considering the features extracted from the live and spoof fingerprint images of the dataset.
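The idea of splitting the training set into disjoint subsets by clustering can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: it uses a plain Lloyd-style k-means written in pure Python, and the name kmeans_partition, the toy feature vectors and the value of k are all hypothetical. Each resulting subset could then serve as the training data for one member of the ensemble.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_partition(features, k, n_iter=20, seed=0):
    """Return k lists of instance indices, one list per cluster."""
    rng = random.Random(seed)
    dim = len(features[0])
    # Initialise the centroids with k distinct training instances.
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]
    for _ in range(n_iter):
        # Assignment step: every instance joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, point in enumerate(features):
            nearest = min(range(k),
                          key=lambda c: squared_distance(point, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(features[i][d] for i in members) / len(members)
                                for d in range(dim)]
    return clusters


# Toy 2-D feature vectors standing in for extracted image features.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
            (9.0, 0.1), (9.2, 0.0), (8.9, 0.3)]

subsets = kmeans_partition(features, k=3)
for number, members in enumerate(subsets):
    print(f"subset {number}: instances {members}")
```

Because every instance is assigned to exactly one cluster, the subsets are disjoint by construction, in contrast to bootstrapped bags.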
Fingerprint recognition models are vulnerable to attacks by spoof fingerprints made from moulds of substances such as silicone, wood glue, latex and gelatin. Therefore, liveness detection must be performed before fingerprint recognition to ensure that fabricated moulds are not used for authentication. Examples of spoof fingerprints generated using these substances are shown in Fig. 1.
Local Binary Patterns (LBP) is an efficient way to describe the texture of an image: each pixel is labelled with a binary code obtained by thresholding its neighbouring pixels (Jia, Yang, Cao, Zang, Zhang, Dai, Zhu, & Tian, 2014; Nanni & Lumini, 2008). LBP uses the value of the central pixel as the threshold and assigns a binary value to each neighbouring pixel accordingly. The LBP value of the pixel is then calculated by summing the element-wise products of these binary values and their weights. LBP histograms are robust to grayscale variations, which makes them suitable for spoof fingerprint detection, as they can accommodate fingerprints with skin distortions, different skin qualities, and dry, moist or dirty skin.
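The LBP computation described above can be illustrated on a single 3 × 3 patch. The sketch below assumes the common basic variant: eight neighbours, weights that are powers of two, and a neighbour counted as 1 when it is greater than or equal to the centre. The neighbour ordering, the function names and the pixel values are illustrative choices, not details taken from the paper.

```python
from collections import Counter

# Neighbour offsets (row, column), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """LBP value of pixel (r, c): threshold neighbours at the centre value."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code += 1 << bit  # weight of this neighbour is 2**bit
    return code


def lbp_histogram(image):
    """Count the LBP codes of all pixels that have a full neighbourhood."""
    rows, cols = len(image), len(image[0])
    return Counter(lbp_code(image, r, c)
                   for r in range(1, rows - 1)
                   for c in range(1, cols - 1))


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]

# Neighbours >= 6: top-left, bottom-right, bottom, bottom-left, left,
# i.e. bits 0, 4, 5, 6, 7, so the code is 1 + 16 + 32 + 64 + 128 = 241.
code = lbp_code(patch, 1, 1)

# Adding a constant brightness offset leaves the code unchanged.
brighter = [[value + 50 for value in row] for row in patch]
code_brighter = lbp_code(brighter, 1, 1)

print(code, code_brighter, lbp_histogram(patch))
```

The second call shows why the descriptor tolerates grayscale variation: only the ordering of each neighbour relative to the centre matters, not the absolute intensities.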
Our contributions are as follows:
- We explore the behaviour of stacking and bagging with various base classifiers on the spoof fingerprint detection problem.
- We emphasize that learning algorithms must be adaptive towards the properties inherent in the dataset.
- We establish that diversity among the ensemble of classifiers can be achieved by clustering the original training set and forming subsets of it.
- We propose adaptive models of stacking and bagging for spoof fingerprint detection and show their competitiveness on class-balanced and class-imbalanced datasets.
Stacking
Stacking (Wolpert, 1992) is an ensemble learning approach that combines the predictions of multiple base classifiers generated by different learning algorithms. These classifiers are trained on the same training data $D_{Train}$, which contains examples of the form $(x_i, y_i)$, where $x_i$ is the input vector and $y_i$ is the class label associated with it.
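A compact sketch of this scheme is given below. It is a toy illustration under several assumptions that do not come from the paper: the two base learning algorithms (one-nearest-neighbour and nearest class centroid), the lookup-table meta-classifier, the use of leave-one-out predictions to build the meta-level training set, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_1nn(data):
    """Learning algorithm 1: one-nearest-neighbour."""
    return lambda x: min(data, key=lambda ex: sq_dist(ex[0], x))[1]


def fit_centroid(data):
    """Learning algorithm 2: nearest class centroid."""
    groups = defaultdict(list)
    for x, y in data:
        groups[y].append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)]
                 for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(centroids[y], x))


def fit_stacking(data, learners):
    # Meta-level training set: leave-one-out predictions of every base
    # learner, paired with the true label of the held-out example.
    votes = defaultdict(Counter)
    for i, (x, y) in enumerate(data):
        held_out = data[:i] + data[i + 1:]
        pattern = tuple(fit(held_out)(x) for fit in learners)
        votes[pattern][y] += 1
    # Meta-classifier: most frequent true label per prediction pattern.
    meta_table = {pattern: counts.most_common(1)[0][0]
                  for pattern, counts in votes.items()}
    base = [fit(data) for fit in learners]  # refit on the full training data

    def predict(x_q):
        pattern = tuple(clf(x_q) for clf in base)
        # Unseen pattern: fall back to a majority vote of the base outputs.
        return meta_table.get(pattern, Counter(pattern).most_common(1)[0][0])

    return predict


d_train = [((0.1, 0.2), "live"), ((0.3, 0.1), "live"), ((0.2, 0.4), "live"),
           ((0.9, 0.8), "spoof"), ((0.8, 1.0), "spoof"), ((1.0, 0.9), "spoof")]

stacked = fit_stacking(d_train, [fit_1nn, fit_centroid])
print(stacked((0.15, 0.2)), stacked((0.95, 0.9)))
```

Building the meta-level data from held-out predictions keeps the meta-classifier from simply trusting whichever base classifier memorises the training set best.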
In the first phase, base classifiers make predictions for the query instance $x_q$. In the second phase, the
Bagging
Bagging (Breiman, 1996) is a method of generating multiple versions of a base classifier from bootstrapped replicates of the training data and aggregating them into a single predictor. The performance of bagging improves when it is used with an unstable learner, i.e. a learner whose predictions change significantly when the training set is perturbed.
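The procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: the base learner (a single-threshold decision stump on a one-dimensional feature), majority voting as the aggregation rule, the number of bags and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(data):
    """Base learner: the single-threshold rule with the fewest errors."""
    best = None
    for threshold, _ in data:
        for low, high in (("live", "spoof"), ("spoof", "live")):
            errors = sum((low if x <= threshold else high) != y
                         for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def fit_bagging(data, n_bags, seed=0):
    rng = random.Random(seed)
    size = len(data)
    # Each bag holds as many examples as the training set, drawn from it
    # with replacement.
    bags = [[rng.choice(data) for _ in range(size)] for _ in range(n_bags)]
    classifiers = [fit_stump(bag) for bag in bags]

    def predict(x_q):
        # Aggregate the individual predictions by majority vote.
        votes = Counter(clf(x_q) for clf in classifiers)
        return votes.most_common(1)[0][0]

    return predict


# Toy data: one feature value per example, with its class label.
d_train = [(0.10, "live"), (0.20, "live"), (0.30, "live"), (0.35, "live"),
           (0.70, "spoof"), (0.80, "spoof"), (0.90, "spoof"), (0.95, "spoof")]

bagged = fit_bagging(d_train, n_bags=11)
print(bagged(0.05), bagged(0.85))
```

An odd number of bags is used here only to avoid ties in a two-class vote.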
Let the size of the original training set $D_{Train}$ be $N$. Our task is to generate $n$ bags of size $N$ each by sampling $D_{Train}$ with replacement. These $n$ bags of instances
Spoof fingerprint detection
The application we consider in this paper is spoof fingerprint detection, which is important in forensics and information security (Ding & Ross, 2016; Kho, Lee, Choi, & Kim, 2019; Nogueira, de Alencar Lotufo, & Campos Machado, 2016; Rattani, Scheirer, & Ross, 2015). Machine learning methods for spoof/liveness detection are usually grouped into two categories: dynamic-feature-based methods and static-feature-based methods (Marasco & Ross, 2014). Dynamic features are identified as skin
Experimental setup
We use the Python Weka wrapper to access the clustering and classification functionality of Weka (Hall et al., 2009). All the original datasets have been randomized and split in an 80:20 ratio into a training set $D_{Train}$ and a validation set $D_{Valid}$, so that the validation set remains disjoint from the training set. We use Simple-kMeans (Arthur & Vassilvitskii, 2007) as our clustering algorithm, which performs reasonably well on the chosen datasets with k = 3. We encourage the readers to experiment with various values
Conclusions
In this study, we explore the behaviour of various ensemble learning approaches to spoof fingerprint detection. We propose A-Stacking and A-Bagging: the adaptive versions of ensemble learning approaches Stacking and Bagging, respectively. We hypothesize that the learning algorithms must take into consideration the similarity inherently present in the data. By doing so, the experts can be made adaptive towards the task associated with the dataset.
To maintain diversity among the ensemble, we
Author contribution statements
Ravindranath Chowdary C defined the problem statement and Shivang Agarwal worked on the problem under the supervision of Ravindranath Chowdary C. This work was done as part of the PhD programme of Shivang Agarwal which started in 2017. This work is our original work and is currently not submitted anywhere else.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (56)
- Ensemble diversity measures and their application to thinning. Information Fusion (2005)
- Machine learning models and bankruptcy prediction. Expert Systems with Applications (2017)
- An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications (2012)
- Enhanced artificial intelligence for ensemble approach to predicting high performance concrete compressive strength. Construction and Building Materials (2013)
- A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences (1997)
- Local contrast phase descriptor for fingerprint liveness detection. Pattern Recognition (2015)
- Failure pattern-based ensembles applied to bankruptcy forecasting. Decision Support Systems (2018)
- Multi-scale local binary pattern with filters for spoof fingerprint detection. Information Sciences (2014)
- An incremental learning method for spoof fingerprint detection. Expert Systems with Applications (2019)
- Neural network ensemble with probabilistic fusion and its application to gait recognition. Neurocomputing (2009)
- Local binary patterns for a hybrid fingerprint matcher. Pattern Recognition
- Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing
- Two-stage consumer credit risk modelling using heterogeneous ensemble learning. Decision Support Systems
- An ensemble learning approach to lip-based biometric verification, with a dynamic selection of classifiers. Expert Systems with Applications
- Estimation and decision fusion: A survey. Neurocomputing
- Application of bagging, boosting and stacking to intrusion detection
- A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications
- Stacked generalization. Neural Networks
- A novel heterogeneous ensemble credit scoring model based on bstacking approach. Expert Systems with Applications
- LivDet 2011 - fingerprint liveness detection competition 2011
- k-means++: the advantages of careful seeding
- On diversity and accuracy of homogeneous and heterogeneous ensembles. International Journal of Hybrid Intelligent Systems
- Bagging predictors. Machine Learning
- Random forests. Machine Learning
- Ridge estimators in logistic regression. Applied Statistics
- Dissimilarity-based ensembles for multiple instance learning. IEEE Transactions on Neural Networks and Learning Systems
- Fingerprint Spoof Buster: Use of Minutiae-Centered Patches. IEEE Transactions on Information Forensics and Security
- Machine-learning research: four current directions. AI Magazine