Abstract
Multivariate PCA and PLS models involving many variables are often difficult to interpret, because plots and lists of loadings, coefficients, VIPs, etc., rapidly become cluttered and hard to survey. There may then be a strong temptation to eliminate variables to obtain a smaller data set. Such a reduction of variables, however, often removes information and makes the modelling efforts less reliable. Model interpretation may be misleading and predictive power may deteriorate.
A better alternative is usually to partition the variables into blocks of logically related variables and apply hierarchical data analysis. Each block may be analyzed separately by PCA or PLS. This modelling forms the base-level of the hierarchical modelling set-up. On the base-level, in-depth information is extracted for the different blocks. The score vectors formed on the base-level, here called 'super variables', may be linked together in new matrices on the top-level. On the top-level, superficial relationships between the X- and the Y-data are investigated.
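The two-level set-up described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it uses PCA on both levels (a top-level PLS between X and Y super variables is not shown), the function names `pca_scores` and `hierarchical_pca` are hypothetical, the data are random, the block sizes are made up so that they sum to 255, and autoscaling of the super variables before the top-level model is an assumption.

```python
import numpy as np

def pca_scores(block, n_components):
    """Return the first n_components PCA score vectors of an autoscaled block."""
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    scaled = centered / std
    # SVD of the scaled block: scores are U * S
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def hierarchical_pca(blocks, base_components, top_components):
    """Base-level PCA per block, then top-level PCA on the stacked scores."""
    # Base level: each block is summarized by a few score vectors
    super_vars = np.hstack([pca_scores(b, base_components) for b in blocks])
    # Top level: the 'super variables' form a new, much smaller matrix
    # (assumption: they are autoscaled again inside pca_scores)
    top_scores = pca_scores(super_vars, top_components)
    return super_vars, top_scores

# Hypothetical data: 10 observations, four blocks with 60+70+65+60 = 255 variables
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(10, k)) for k in (60, 70, 65, 60)]
super_vars, top_scores = hierarchical_pca(blocks, base_components=2, top_components=2)
print(super_vars.shape, top_scores.shape)
```

With two components per block, the 255 original variables are condensed into 8 super variables, so the top-level loadings show how the blocks relate to one another, while the base-level models retain the detail within each block.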
In this paper, the basic principles of hierarchical modelling by means of PCA and PLS are reviewed. One objective of the paper is to disseminate this concept to a broader QSAR audience. The hierarchical methods are used to analyze a set of 10 haloalkanes for which K = 30 chemical descriptors and M = 255 biological responses have been gathered. Because of the complexity of the biological data, they are subdivided into four blocks. All the modelling steps on the base-level and the top-level are reported and the final QSAR model is interpreted thoroughly.
Cite this article
Eriksson, L., Johansson, E., Lindgren, F. et al. Megavariate analysis of hierarchical QSAR data. J Comput Aided Mol Des 16, 711–726 (2002). https://doi.org/10.1023/A:1022450725545
DOI: https://doi.org/10.1023/A:1022450725545