Statistical variation in progressive scrambling

Journal of Computer-Aided Molecular Design

Abstract

The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. However, both methods may be overly optimistic for data sets that contain redundant observations. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and the errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, non-parametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis, the deprecated predictivity (\(Q_{\mathrm{s}}^{\ast 2}\)) and the standard error of prediction (\(\mathrm{SDEP}_{\mathrm{s}}^{\ast}\)), that correct for the effect of introduced perturbation. We also explore the statistical behavior of the adjusted values (\(Q_{0}^{\ast 2}\) and \(\mathrm{SDEP}_{0}^{\ast}\)) and of the sensitivity to perturbation (\(dq^{2}/dr_{yy'}^{2}\)). All three statistics are shown to be robust for stable PLS models, both in the stochastic component of their determination and in their variation due to the sampling effects involved in training set selection.
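The two conventional checks named in the abstract, cross-validation and response randomization, can be illustrated with a minimal sketch. It is not progressive scrambling itself, whose procedure the abstract does not spell out, and it substitutes a single-descriptor least-squares fit for PLS so that it needs only the standard library. The function names (`fit_line`, `loo_q2`, `randomized_q2`), the synthetic data, and the choice of leave-one-out \(q^{2} = 1 - \mathrm{PRESS}/\mathrm{SS}\) are all assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor (stand-in for PLS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    n = len(y)
    mean_y = sum(y) / n
    ss = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict it.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / ss


def randomized_q2(x, y, n_trials=50, seed=0):
    """Response randomization: recompute q2 after shuffling the responses."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = y[:]
        rng.shuffle(shuffled)
        results.append(loo_q2(x, shuffled))
    return results


# Hypothetical synthetic data: a linear trend plus Gaussian noise.
data_rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 1.0) for xi in x]

q2_true = loo_q2(x, y)
q2_scrambled = randomized_q2(x, y)
print("q2 (original responses):", round(q2_true, 3))
print("mean q2 (shuffled responses):", round(sum(q2_scrambled) / len(q2_scrambled), 3))
```

A model that captures real signal should give a cross-validated q2 well above the values obtained after shuffling the responses. As the abstract cautions, both checks can look better than they should when the data set contains redundant observations, which is the gap that progressive scrambling is intended to address.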

References

  • Snedecor, G.W. and Cochran, W.G., Statistical Methods, 10th ed., Iowa State Press, Ames, IA, 1989.

  • Martens, H. and Næs, T., Multivariate Calibration, Wiley, Chichester, UK, 1989.

  • Wold, S., Johansson, E. and Cocchi, M., In Kubinyi, H. (Ed.), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, The Netherlands, 1993, pp. 523–550.

  • Zupan, J. and Gasteiger, J., Neural Networks in Chemistry and Drug Design, 2nd ed., Wiley-VCH, Weinheim, Germany, 1999.

  • Wold, S. and Eriksson, L., In van de Waterbeemd, H. (Ed.), Chemometric Methods in Molecular Design, VCH, Weinheim, Germany, 1995, pp. 309–318.

  • Tropsha, A., Gramatica, P. and Gombar, V.K., QSAR Comb. Sci., 22 (2003) 69.

  • Golbraikh, A. and Tropsha, A., J. Mol. Graph. Model., 20 (2002) 269.

  • Clark, R.D., J. Comput.-Aided Mol. Des., 17 (2003) 265.

  • Hawkins, D.M., Basak, S.C. and Mills, D., J. Chem. Inf. Comput. Sci., 43 (2003) 579.

  • Baumann, K., von Korff, M. and Albert, H., J. Chemom., 16 (2002) 351.

  • Hawkins, D.M., J. Chem. Inf. Comput. Sci., 44 (2004) 1.

  • Heritage, T.W. and Lowis, D.R., In Parrill, A.L. and Reddy, M.R. (Eds.), Rational Drug Design: Novel Methodology and Practical Applications, ACS Symposium Series 719, American Chemical Society, Washington, DC, 1999, pp. 212–225.

  • Clark, R.D., Sprous, D.G. and Leonard, J.M., In Höltje, H.-D. and Sippl, W. (Eds.), Rational Approaches to Drug Design, Prous Science, Barcelona, Spain, 2001, pp. 475–485.

  • Kireev, D.B., Chrétien, J.R., Grierson, D.S. and Monneret, C., J. Med. Chem., 40 (1997) 4257.

  • Luco, J.M. and Ferretti, F.H., J. Chem. Inf. Comput. Sci., 37 (1997) 392.

  • HQSAR™ is distributed by Tripos, Inc., St. Louis, MO; www.tripos.com.

  • Cramer III, R.D., Patterson, D.E. and Bunce, J.D., J. Am. Chem. Soc., 110 (1988) 5959.

  • Cramer III, R.D., DePriest, S.A., Patterson, D.E. and Hecht, P., In Kubinyi, H. (Ed.), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, The Netherlands, 1993, pp. 443–485.

  • Chavatte, P., Yous, S., Marot, C., Baurin, N. and Lesieur, D., J. Med. Chem., 44 (2001) 3223.

  • van der Voet, H., J. Chemom., 13 (1999) 195.

  • Kalivas, J.H., Forrester, J.B. and Seipel, H.A., J. Comput.-Aided Mol. Des., 18 (2004) 537.

  • In fact, Equation 5 in Ref. 13 includes a typographical error, with sSDEP′ substituted for s.

  • Clark, M., Cramer III, R.D., Jones, D.M., Patterson, D.E. and Simeroth, P.E., Tetrahedron Comput. Methodol., 3 (1990) 47.

  • Advanced CoMFA® and SYBYL® are distributed by Tripos, Inc., St. Louis, MO; www.tripos.com.

  • Bush, B.L. and Nachbar Jr., R.B., J. Comput.-Aided Mol. Des., 7 (1993) 587.

  • Given that the most statistically powerful model will always be the one based on all available observations [Refs. 9–11].

  • Otto, M., Chemometrics, Wiley-VCH, Weinheim, Germany, 1999.

  • A full factorial design includes two observations for each first-order factor, each of which is a partial replicate of its complement in the descriptor space (see Ref. 27).

Author information

Corresponding author

Correspondence to Robert D. Clark.

About this article

Cite this article

Clark, R.D., Fox, P.C. Statistical variation in progressive scrambling. J Comput Aided Mol Des 18, 563–576 (2004). https://doi.org/10.1007/s10822-004-4077-z

  • DOI: https://doi.org/10.1007/s10822-004-4077-z
