Abstract
The performance of any biometric recognition system depends heavily on finding a suitable feature representation space in which observations from different classes are well separated. Unfortunately, finding such a representation is a challenging problem that has attracted considerable interest in the machine learning and computer vision communities. In this paper we present a comprehensive overview of existing feature representation techniques. We introduce simple and clear taxonomies and explain the prominent techniques. The overview is intended to guide the neophyte and to provide researchers with state-of-the-art approaches, in order to help advance research in biometrics.
Acknowledgments
This publication was made possible using a grant from the Qatar National Research Fund through National Priority Research Program (NPRP) # NPRP 8-140-2-065. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the Qatar National Research Fund or Qatar University.
Cite this article
Rida, I., Al-Maadeed, N., Al-Maadeed, S. et al. A comprehensive overview of feature representation for biometric recognition. Multimed Tools Appl 79, 4867–4890 (2020). https://doi.org/10.1007/s11042-018-6808-5
DOI: https://doi.org/10.1007/s11042-018-6808-5