
Megavariate analysis of hierarchical QSAR data

Journal of Computer-Aided Molecular Design

Abstract

Multivariate PCA and PLS models involving many variables are often difficult to interpret, because plots and lists of loadings, coefficients, VIPs, etc., rapidly become cluttered and hard to survey. There may then be a strong temptation to eliminate variables to obtain a smaller data set. Such a reduction of variables, however, often removes information and makes the modelling less reliable: model interpretation may become misleading and predictive power may deteriorate.

A better alternative is usually to partition the variables into blocks of logically related variables and apply hierarchical data analysis. Each block may be analyzed by PCA or PLS; this modelling forms the base level of the hierarchical set-up, where in-depth information is extracted for the different blocks. The score vectors formed on the base level, here called 'super variables', may then be linked together in new matrices on the top level, where the overall relationships between the X- and the Y-data are investigated.

In this paper the basic principles of hierarchical modelling by means of PCA and PLS are reviewed. One objective of the paper is to disseminate this concept to a broader QSAR audience. The hierarchical methods are used to analyze a set of 10 haloalkanes for which K = 30 chemical descriptors and M = 255 biological responses have been gathered. Owing to the complexity of the biological data, the responses are subdivided into four blocks. All the modelling steps on the base level and the top level are reported, and the final QSAR model is interpreted thoroughly.
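
The two-level set-up described above can be illustrated with a minimal sketch. It is not the authors' implementation (the paper's reference list points to SIMCA-P for that), and several details are assumptions made only for illustration: the data are random numbers with the same dimensions as the haloalkane set, the four response blocks are taken to be of roughly equal size, two score vectors are kept per block, and only the first top-level PLS component is computed, using the dominant singular vector of the cross-product matrix. The function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(block, n_components):
    """Score vectors of a block after mean-centring and unit-variance scaling."""
    centred = block - block.mean(axis=0)
    std = centred.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    scaled = centred / std
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # T = U * S

# Synthetic stand-in data (assumption): 10 compounds, K = 30, M = 255.
rng = np.random.default_rng(0)
n_obs = 10
X = rng.normal(size=(n_obs, 30))
Y = rng.normal(size=(n_obs, 255))

# Base level: one PCA per block. The equal-sized split is an assumption;
# the real blocks group logically related responses.
y_blocks = np.array_split(Y, 4, axis=1)
n_scores = 2                                  # scores kept per block (assumption)
x_super = pca_scores(X, n_scores)
y_super = np.hstack([pca_scores(b, n_scores) for b in y_blocks])

# Top level: first PLS component linking the X and Y super variables.
# The first PLS weight vector is the dominant left singular vector of
# the cross-product matrix x_super' y_super.
xs = x_super - x_super.mean(axis=0)
ys = y_super - y_super.mean(axis=0)
w, _, _ = np.linalg.svd(xs.T @ ys, full_matrices=False)
w1 = w[:, 0]                                  # top-level X weights
t1 = xs @ w1                                  # top-level X score vector

print(x_super.shape, y_super.shape, t1.shape)
```

Each column of `x_super` and `y_super` is one super variable, so the top-level model works with 2 and 8 columns instead of 30 and 255. Interpretation then proceeds from the top-level weights back to the base-level loadings of the blocks that matter.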




Cite this article

Eriksson, L., Johansson, E., Lindgren, F. et al. Megavariate analysis of hierarchical QSAR data. J Comput Aided Mol Des 16, 711–726 (2002). https://doi.org/10.1023/A:1022450725545
