Parallel regressions for variable selection using GPU

Computing

Abstract

This paper proposes a parallel regression formulation to reduce the computational time of variable selection algorithms. The proposed strategy can be used with several forward algorithms to select uncorrelated variables that contribute to better predictive capability of the model. We demonstrate the method using the Successive Projections Algorithm (SPA), an iterative forward technique that minimizes multicollinearity. SPA is traditionally used for variable selection in the context of multivariate calibration. However, because an inverse matrix must be calculated each time a new variable is inserted during model calibration, the algorithm may become computationally impractical as the matrix size increases. To address this limitation, this paper proposes a new strategy called Parallel Regressions (PR). The PR strategy was implemented in SPA to avoid the matrix inverse calculation of the original SPA and thereby increase the computational performance of the algorithm. The implementation, called SPA-PR-CUDA, uses the Compute Unified Device Architecture (CUDA) parallel computing platform to exploit a Graphics Processing Unit (GPU). We evaluated it in a case study involving a large data set of spectral variables. SPA-PR-CUDA achieved 37 times better performance than a traditional SPA implementation. Additionally, a comparison with traditional algorithms showed that SPA-PR-CUDA may be a more viable choice for obtaining a model with reduced prediction error.
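To make the idea of selecting weakly correlated variables concrete, the projection phase of the classical SPA, as it is commonly described in the literature, can be sketched as follows. This is a minimal pure-Python illustration and not the SPA-PR-CUDA implementation: the function name `spa_projections`, its parameters, and the toy matrix `X` are assumptions made for this example, and the regression and validation steps that evaluate each candidate subset are omitted.

```python
def spa_projections(X, start, n_select):
    """Pick n_select column indices of X by successive orthogonal projections.

    X is a list of rows (samples x variables); start is the index of the
    initial column. Assumes the selected columns are never all-zero.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Column-major working copy, so each variable is one vector.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]          # last selected (already projected) column
        ref_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove from column j its component along the reference column.
            scale = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - scale * b for a, b in zip(cols[j], ref)]
            # Keep the column with the largest remaining (squared) norm.
            norm_sq = sum(v * v for v in cols[j])
            if norm_sq > best_norm:
                best, best_norm = j, norm_sq
        selected.append(best)
    return selected


# Toy data: column 1 is almost collinear with column 0, so it is skipped.
X = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.1, 0.0, 1.0],
]
chain = spa_projections(X, start=0, n_select=3)
print(chain)  # [0, 2, 3]
```

In this toy case the near-duplicate of the starting column retains almost no norm after projection, so the chain moves on to the columns that carry new information.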

Notes

  1. SPA is a variable-selection technique that has attracted increasing interest in the community in the past 10 years [14].

  2. A kernel is a function executed on the device by each thread. Threads are organized into blocks [23].
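The relationship between kernels, threads and blocks described in note 2 can be illustrated with a small sequential emulation. This is a sketch in plain Python, not CUDA code: the helper `launch` and the kernel `scale_kernel` are hypothetical names introduced here, and the loop runs the "threads" one after another, whereas a GPU would run them concurrently. The global index follows the usual CUDA convention `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D kernel launch: one call per thread."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    # Global index of this thread across all blocks.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain more threads than remaining elements.
    if i < len(data):
        out[i] = data[i] * factor


data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)
block_dim = 4
# Ceiling division so that every element is covered by some thread.
grid_dim = (len(data) + block_dim - 1) // block_dim
launch(scale_kernel, grid_dim, block_dim, data, 2.0, out)
print(grid_dim, out)  # 2 [2.0, 4.0, 6.0, 8.0, 10.0]
```

Each call of the kernel handles one element, which is why the same function body can be applied to the whole array by many threads at once.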

References

  1. Lavine BK, Workman J (2005) Chemometrics: past, present, and future. ACS symposium series, vol 894. Oxford University Press, Vandœuvre-lès-Nancy, pp 1–13. doi:10.1021/bk-2005-0894.ch001

  2. Chau F-T, Liang Y-Z, Gao J, Shao X-G (2004) Chemometrics from basic to wavelet transform. Wiley, Hoboken

  3. Tang Y, Liang Y, Fang K-T (2003) Data mining in chemometrics: sub-structures learning via peak combinations searching in mass spectra. J Data Sci 1:481–496

  4. Lawson CL, Hanson RJ (1974) Solving least squares problems. SIAM, Philadelphia

  5. Cortina JM (1994) Interaction, nonlinearity, and multicollinearity: implications for multiple regression. J Manag 19:915–922

  6. Montgomery DC, Peck EA, Vining GG (2012) Introduction to Linear Regression Analysis, Wiley Series in Probability and Statistics

  7. Estienne F (2003) New trends in multivariate analysis and calibration, Laboratorium voor Farmaceutische en Biomedische Analyse

  8. Skoog DA, Leary JJ (2002) Princípios de Análise Instrumental. Artmed Editora S.A, Porto Alegre

  9. Martens H (1991) Multivariate calibration. Wiley, New York

  10. Soares AS, Lima TW, Soares FA, Coelho CJ, Delbem AC (2014) Mutation-based compact genetic algorithm for spectroscopy variable selection in determining protein concentration in wheat grain. Electron Lett 50:932–934

  11. Paula LCM, Soares AS, Soares TW, Martins WS, Filho ARG, Coelho CJ (2013) Partial parallelization of the successive projections algorithm using compute unified device architecture. In: International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, pp 737–741

  12. Paula LCM, Soares AS, Soares TW, Delbem ACB, Coelho CJ, Filho ARG (2014) Parallelization of a modified firefly algorithm using GPU for variable selection in a multivariate calibration problem. Int J Natl Comput Res 4:31–42

  13. Soares AS, Galvão Filho AR, Galvão RKH, Araújo MCU (2010) Multi-core computation in chemometrics: case studies of voltammetric and NIR spectrometric analyses. J Braz Chem Soc 21:1626–1634

  14. Soares SFC, Gomes AA, Araújo MC, Galvão RK, Filho ARG (2013) The successive projections algorithm. TrAC Trends Anal Chem 42:84–98

  15. Marreto PD, Zimer AM, Faria RC, Mascaro LH, Pereira EC, Fragoso WD, Lemos SG (2014) Multivariate linear regression with variable selection by a successive projections algorithm applied to the analysis of anodic stripping voltammetry data. Electrochim Acta 127:68–78

  16. Moreira ED, Pontes MJ, Galvão RK, Araújo MC (2009) Near infrared reflectance spectrometry classification of cigarettes using the successive projections algorithm for variable selection. Talanta 79:5. doi:10.1016/j.talanta.2009.05.031

  17. Tang G, Huang Y, Tian K, Song X, Yan H, Hu J, Xionga Y, Min S (2014) A new spectral variable selection pattern using competitive adaptive reweighted sampling combined with successive projections algorithm. Analyst. doi:10.1039/C4AN00837E

  18. Pontes MJC, Galvão RKH, Araújo MCU, Moreira PNT, Neto ODP, José GE, Saldanha TCB (2005) The successive projections algorithm for spectral variable selection in classification problems. Chemom Intell Lab Syst 78(1–2):11–18

  19. Soares AS, Galvão Filho AR, Galvão RKH, Araújo MCU (2010) Improving the computational efficiency of the successive projections algorithm by using a sequential regression implementation: a case study involving NIR spectrometric analysis of wheat samples. J Braz Chem Soc 21:760–763

  20. Makridakis SG, Hibon M (1995) Evaluating accuracy (or error) measures. INSEAD working paper. INSEAD, Fontainebleau

  21. Araújo MCU, Saldanha TC, Galvão RK, Yoneyama T (2001) The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemom Intell Lab Syst 57:65–73

  22. Gusnanto A, Pawitan Y, Huang J (2003) Variable selection in random calibration of near-infrared instruments: ridge regression and partial least squares regression settings. J Chemom 17:174–185

  23. CUDA™ (2013) NVIDIA CUDA C Programming Guide, version 5.0. NVIDIA Corporation

  24. Bradstreet RB (1965) The Kjeldahl method for organic nitrogen. Academic Press Inc., New York

  25. Paula LCM, Soares AS, Delbem ACB, Lima TW, Coelho CJ, Filho ARG (2014) A GPU-based implementation of the firefly algorithm for variable selection in multivariate calibration problems. PLoS ONE 9:e114145

Acknowledgments

The authors thank the research agencies CAPES, CNPq, FAPESP and FAPEG for supporting this study. This is also a contribution of the National Institute of Advanced Analytical Science and Technology (INCTAA) (CNPq proc. no. 573894/2008-6 and FAPESP proc. no. 2008/57808-1).

Corresponding author

Correspondence to Anderson S. Soares.

About this article

Cite this article

de Paula, L.C.M., Soares, A.S., Soares, T.W.L. et al. Parallel regressions for variable selection using GPU. Computing 99, 219–234 (2017). https://doi.org/10.1007/s00607-016-0487-8

  • DOI: https://doi.org/10.1007/s00607-016-0487-8