
A comparison of ℓ1-regularization, PCA, KPCA and ICA for dimensionality reduction in logistic regression

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Extracting relevant information from, and reducing the dimensionality of, the original input features is an active research area in machine learning and data analysis. Logistic regression (LR) is a well-known classification method that has been used widely in data mining, machine learning, and bioinformatics. However, its performance is degraded by multicollinearity among its predictors and by feature redundancy. ℓ1-regularization and feature extraction methods are commonly used to improve the performance of logistic regression under multicollinearity and overfitting, and to reduce computational cost by discarding less relevant or redundant features. These feature extraction methods include principal component analysis (PCA), kernel principal component analysis (KPCA), and independent component analysis (ICA). Recently, ℓ1-regularized logistic regression has received much attention as a promising method for feature selection in classification tasks, so there is a clear need to compare it with these existing methods. In this paper, we assess the performance of the aforementioned feature extraction methods combined with LR, and of ℓ1-regularized logistic regression, using several performance metrics: accuracy, sensitivity, specificity, precision, the area under the receiver operating characteristic (ROC) curve, and ROC analysis. This study is distinguished by its inclusion of a comprehensive statistical analysis.
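To make the comparison concrete, the following is a minimal Python sketch of the kind of experiment the abstract describes, not the paper's actual experimental code: four scikit-learn pipelines (ℓ1-regularized LR, and plain LR preceded by PCA, KPCA, or FastICA), each scored by accuracy and area under the ROC curve. The synthetic data set, the number of retained components, and the regularization strength C are illustrative assumptions.

```python
# Hypothetical sketch: l1-regularized LR vs. PCA/KPCA/ICA + LR.
# Data set, component counts, and C are illustrative choices,
# not the settings used in the paper.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary task with redundant (collinear) features,
# mimicking the multicollinearity setting the paper targets.
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=10, n_redundant=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "l1-LR":   make_pipeline(StandardScaler(),
                             LogisticRegression(penalty="l1",
                                                solver="liblinear", C=1.0)),
    "PCA+LR":  make_pipeline(StandardScaler(), PCA(n_components=10),
                             LogisticRegression(max_iter=1000)),
    "KPCA+LR": make_pipeline(StandardScaler(),
                             KernelPCA(n_components=10, kernel="rbf"),
                             LogisticRegression(max_iter=1000)),
    "ICA+LR":  make_pipeline(StandardScaler(),
                             FastICA(n_components=10, max_iter=1000,
                                     random_state=0),
                             LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # scores for the ROC/AUC
    print(f"{name}: accuracy={accuracy_score(y_te, model.predict(X_te)):.3f}, "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```

The remaining metrics the paper reports (sensitivity, specificity, precision) can be derived from sklearn.metrics.confusion_matrix inside the same loop.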



Acknowledgments

This work was supported by a grant from Hebei University, Baoding, Hebei, P. R. China. I wish to thank the PhD students of the Departments of Computer Science and Mathematics for their encouragement, useful discussions, and interest. This work was completed at Hebei University during my PhD studies.

Author information


Correspondence to Abdallah Bashir Musa.


About this article

Cite this article

Musa, A.B. A comparison of ℓ1-regularization, PCA, KPCA and ICA for dimensionality reduction in logistic regression. Int. J. Mach. Learn. & Cyber. 5, 861–873 (2014). https://doi.org/10.1007/s13042-013-0171-7
