Abstract
This paper introduces a parallel version of the machine-learning-based feature selection algorithm known as greedy regularized least-squares (greedy RLS). The aim of such machine learning methods is to build accurate predictive models from complex datasets. Greedy RLS is an efficient implementation of the greedy forward feature selection procedure with regularized least-squares, capable of selecting the most predictive features from large datasets. Through matrix algebra shortcuts, it has previously been shown to perform feature selection in only a fraction of the time required by traditional implementations. In this paper, the algorithm is adapted for efficient parallel feature selection so that the method scales to modern clusters. To demonstrate its effectiveness in practice, we applied it to a sample genome-wide association study, as well as a number of other high-dimensional datasets, scaling the method to up to 128 cores.
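The abstract describes greedy forward selection with regularized least-squares, where candidate features are ranked by a leave-one-out (LOO) criterion computed through matrix algebra shortcuts rather than by refitting the model n times. The sketch below is a naive, unoptimized illustration of that idea, not the authors' implementation: it uses the standard hat-matrix identity for ridge LOO residuals, e_i^loo = e_i / (1 - H_ii), and refits each candidate set from scratch, whereas greedy RLS (and its parallel version) avoids this via incremental matrix updates and distributes the candidate evaluations across cores. All function names here are our own.

```python
import numpy as np

def ridge_loo_mse(X, y, lam=1.0):
    """Leave-one-out MSE of ridge regression from a single fit.

    Uses the hat-matrix shortcut e_i^loo = e_i / (1 - H_ii),
    where H = X (X^T X + lam I)^{-1} X^T, instead of n refits.
    """
    n, d = X.shape
    G = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(G, X.T)            # hat matrix
    residuals = y - H @ y                      # in-sample residuals
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals ** 2))

def greedy_forward_selection(X, y, k, lam=1.0):
    """Greedily add, one at a time, the feature whose inclusion
    minimizes the leave-one-out MSE of the ridge model."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = {j: ridge_loo_mse(X[:, selected + [j]], y, lam)
                  for j in remaining}           # embarrassingly parallel step
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The inner loop over candidate features is the natural target for parallelization: each candidate's LOO score can be evaluated independently, which is consistent with the cluster-scaling approach the abstract describes.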
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Okser, S., Airola, A., Aittokallio, T., Salakoski, T., Pahikkala, T. (2013). Parallel Feature Selection for Regularized Least-Squares. In: Manninen, P., Öster, P. (eds) Applied Parallel and Scientific Computing. PARA 2012. Lecture Notes in Computer Science, vol 7782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36803-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36802-8
Online ISBN: 978-3-642-36803-5