Abstract
Extracting relevant information from, and reducing the dimensionality of, the original input features is an active research area in machine learning and data analysis. Logistic regression (LR) is a well-known classification method that has been used widely in many applications of data mining, machine learning, and bioinformatics. However, its performance suffers from multicollinearity among its predictors and from feature redundancy. ℓ1-regularization and feature extraction methods are commonly used to improve the performance of logistic regression under multicollinearity and overfitting, and to reduce computational complexity by discarding less relevant or redundant features. These methods include principal component analysis, kernel principal component analysis, and independent component analysis. Recently, ℓ1-regularized logistic regression has received much attention as a promising method for feature selection in classification tasks, so a systematic comparison with these existing methods is needed. In this paper, we assess the performance of the aforementioned feature extraction methods on LR, and of ℓ1-regularized logistic regression, using different statistical measures. A variety of performance metrics is used: accuracy, sensitivity, specificity, precision, the area under the receiver operating characteristic curve, and receiver operating characteristic analysis. This study is distinguished by its inclusion of a comprehensive statistical analysis.
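The comparison the abstract describes can be sketched as follows; this is a minimal illustration, not the paper's actual experimental setup. It assumes scikit-learn, a synthetic dataset, and arbitrary choices of component count, kernel, and regularization strength: dimensionality is reduced with PCA, kernel PCA, or ICA before fitting logistic regression, and the results are compared against ℓ1-regularized logistic regression on the raw features, scored by AUC.

```python
# Hedged sketch of the comparison pipeline (illustrative parameters only).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with deliberately redundant (collinear) features.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           n_redundant=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three feature-extraction front ends, each keeping 10 components.
reducers = {
    "PCA":  PCA(n_components=10),
    "KPCA": KernelPCA(n_components=10, kernel="rbf"),
    "ICA":  FastICA(n_components=10, random_state=0, max_iter=1000),
}

auc = {}
for name, reducer in reducers.items():
    pipe = make_pipeline(StandardScaler(), reducer,
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_tr, y_tr)
    auc[name] = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])

# L1-regularized logistic regression performs feature selection implicitly
# by driving some coefficients to exactly zero.
l1 = make_pipeline(StandardScaler(),
                   LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
l1.fit(X_tr, y_tr)
auc["L1-LR"] = roc_auc_score(y_te, l1.predict_proba(X_te)[:, 1])

for name, score in sorted(auc.items(), key=lambda kv: -kv[1]):
    print(f"{name:6s} AUC = {score:.3f}")
```

The same loop extends naturally to the other metrics the paper reports (accuracy, sensitivity, specificity, precision) by swapping the scoring function.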
Acknowledgments
This work was supported by a grant from Hebei University, Baoding, Hebei, P. R. China. I wish to thank the PhD students of the Departments of Computer Science and Mathematics for their encouragement, useful discussions, and interest. This work was completed at Hebei University during my PhD studies.
Cite this article
Musa, A.B. A comparison of ℓ1-regularizion, PCA, KPCA and ICA for dimensionality reduction in logistic regression. Int. J. Mach. Learn. & Cyber. 5, 861–873 (2014). https://doi.org/10.1007/s13042-013-0171-7