A Comparison of Dimensionality Reduction Techniques in Virtual Screening

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7895)

Abstract

Most screening methods struggle with the high dimensionality of the data in virtual screening tasks. One of the most commonly used techniques for reducing high-dimensional data is principal component analysis (PCA). PCA and its variants have been introduced and re-introduced to solve problems in particular real-world applications. In this paper, PCA and four of its variants are compared and analyzed together on a virtual screening task using the fingerprint representation, one of the most regularly used descriptors in virtual screening. These methods have not previously been compared and studied together on high-dimensional, binary-valued data. The results show that the PCA variants are superior to PCA on the most heterogeneous classes, while they are competitive with PCA on the homogeneous classes. Supervised PCA is found to be the best technique and is competitive with Fisher discriminant analysis. It should be noted that Fisher discriminant analysis uses all the provided information, while Supervised PCA uses only a few components.
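As a rough illustration of the baseline technique named in the abstract, the sketch below applies plain PCA to a binary fingerprint matrix and keeps only a few components. It is not the paper's code: the function name `pca_project`, the random toy fingerprints, and the choice of 64 bits and 3 components are all assumptions made for the example, and none of the four PCA variants is implemented here.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto its top principal components (plain PCA)."""
    X = np.asarray(X, dtype=float)
    # Center each feature (fingerprint bit) at zero mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Low-dimensional coordinates of each sample.
    scores = Xc @ components.T
    # Fraction of the total variance captured by each kept component.
    explained = (S[:n_components] ** 2) / np.sum(S ** 2)
    return scores, components, explained


# Hypothetical toy data: 40 molecules, 64-bit binary fingerprints.
rng = np.random.default_rng(0)
fingerprints = (rng.random((40, 64)) < 0.3).astype(int)

scores, components, explained = pca_project(fingerprints, n_components=3)
print(scores.shape)
print(explained)
```

The reduced `scores` matrix would then be passed to whatever ranking or classification step the screening pipeline uses. Real fingerprints are far wider than this toy matrix, which is the setting the abstract is concerned with.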

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pasupa, K. (2013). A Comparison of Dimensionality Reduction Techniques in Virtual Screening. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_28

  • DOI: https://doi.org/10.1007/978-3-642-38610-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38609-1

  • Online ISBN: 978-3-642-38610-7

  • eBook Packages: Computer Science, Computer Science (R0)
