Abstract
Current practice in Quantitative Structure-Activity Relationship (QSAR) methods usually involves generating a large number of chemical descriptors and then pruning them with variable selection techniques. Variable selection reduces dimensionality effectively but may discard valuable information. This paper introduces Locally Linear Embedding (LLE), a local non-linear dimensionality reduction technique that can statistically discover a low-dimensional representation of the chemical data. LLE is shown to create more stable representations than other non-linear dimensionality reduction algorithms and to be capable of capturing non-linearity in chemical data.
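To make the technique named in the abstract concrete, the following is a minimal NumPy sketch of the standard LLE procedure as published by Roweis and Saul (2000): find each point's nearest neighbours, solve for the weights that best reconstruct the point from those neighbours, then take the bottom eigenvectors of the resulting cost matrix as the embedding. It is not the authors' implementation. The function name `lle`, its parameters, the regularisation constant, and the synthetic "descriptor" matrix are all assumptions made for illustration; no real QSAR data is used.

```python
import numpy as np


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Illustrative LLE sketch (hypothetical helper, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Step 1: k nearest neighbours by Euclidean distance, excluding the point itself.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)
    neighbors = np.argsort(sq_dists, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights for each point, constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]           # neighbours centred on x_i
        C = Z @ Z.T                           # local Gram matrix
        trace = np.trace(C)
        if trace <= 0:
            trace = 1.0
        # Regularise: C is singular when n_neighbors exceeds the input dimension.
        C = C + reg * trace * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()

    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    # Skip the first eigenvector, which corresponds to the near-zero eigenvalue.
    return eigvecs[:, 1:n_components + 1], W


# Synthetic stand-in for a descriptor matrix: 150 "molecules", 4 "descriptors"
# that depend non-linearly on a single latent variable t (assumed data).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 3.0, size=150)
X = np.column_stack([np.cos(2 * t), np.sin(2 * t), t, t ** 2])
X += 0.01 * rng.normal(size=X.shape)

Y, W = lle(X, n_neighbors=10, n_components=2)
```

Here `Y` holds the two-dimensional coordinates of each row of `X`, and `W` holds the neighbour weights. For practical work, a maintained implementation such as `sklearn.manifold.LocallyLinearEmbedding` is likely a better starting point than this sketch.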
Cite this article
L’Heureux, PJ., Carreau, J., Bengio, Y. et al. Locally Linear Embedding for dimensionality reduction in QSAR. J Comput Aided Mol Des 18, 475–482 (2004). https://doi.org/10.1007/s10822-004-5319-9