A Comparison of Dimensionality Reduction Techniques in Virtual Screening

Pasupa, Kitsuchart

doi:10.1007/978-3-642-38610-7_28


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7895)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing


Abstract

Most screening methods struggle with the high dimensionality of the data in virtual screening tasks. One of the most commonly used techniques for reducing high-dimensional data is principal component analysis (PCA). PCA and its variants have been introduced and re-introduced to solve problems in particular real-world applications. In this paper, PCA and four of its variants are compared and analyzed together in a virtual screening task using a fingerprint representation, one of the most regularly used descriptors in virtual screening. These methods have not previously been compared and studied together on high-dimensional, binary-valued data. The results show that the PCA variants are superior to PCA on the most heterogeneous classes and competitive with PCA on the homogeneous classes. Supervised PCA is found to be the best technique and is competitive with Fisher discriminant analysis. It should be noted that Fisher discriminant analysis uses all the provided information, while Supervised PCA uses only a few components.
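As a rough illustration of the baseline the abstract refers to, the following sketch applies plain PCA to binary fingerprint vectors using NumPy. It is not the paper's implementation: the helper name `pca_project`, the synthetic random fingerprints, the 1024-bit length, the 10% bit density, and the choice of 10 components are all assumptions made for the example. None of the four PCA variants is shown.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (plain PCA via SVD).

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each fingerprint bit (column)
    # Thin SVD: rows of Vt are principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # low-dimensional representation
    explained_variance = (s[:n_components] ** 2) / (X.shape[0] - 1)
    return scores, components, explained_variance


# Synthetic stand-in for molecular fingerprints (assumed sizes, random bits).
rng = np.random.default_rng(0)
fingerprints = (rng.random((200, 1024)) < 0.1).astype(np.uint8)

scores, components, explained_variance = pca_project(fingerprints, n_components=10)
print(scores.shape)
```

The reduced `scores` matrix could then be passed to any ranking or classification step. Centring binary data and minimising squared error is exactly the assumption that binary-specific PCA variants revisit, which is one reason such a comparison is of interest.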



Author information

Authors and Affiliations

Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa


Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada


Cite this paper

Pasupa, K. (2013). A Comparison of Dimensionality Reduction Techniques in Virtual Screening. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_28


DOI: https://doi.org/10.1007/978-3-642-38610-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38609-1
Online ISBN: 978-3-642-38610-7
eBook Packages: Computer Science, Computer Science (R0)
