Abstract
Non-linear effects in multi-linear structure–property (QSPR) models are sometimes included by using descriptors transformed by mathematical functions such as the square root or logarithm. Less commonly, products of two descriptors are used to account for cross dependencies. As described here, simple division of descriptors by chemical sample size (e.g., molecular weight, length, area or volume) creates size-intensive descriptors (alternatively, intrinsic descriptors) that are independent of the size of the chemical sample described, weakly correlated with the original descriptor, and important contributors to the best QSPR models. In our automated QSPRs, size-intensive descriptors in competition with their extensive descriptors are frequently selected as the best descriptors in the models with the highest r². Examples of QSPR models that use size-intensive descriptors are given, the lack of correlation of descriptors with their size-intensive version is demonstrated, and their physical significance is discussed.
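The division described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in the article: the helper names `size_intensive` and `pearson_r` are hypothetical, and the five molecules (methanol, ethanol, 1-propanol, ethylene glycol, glycerol) with oxygen count as the extensive descriptor are an assumed toy data set chosen only to show the mechanics. The last lines reproduce the trivial case of molecular weight divided by molecular weight, which yields a constant and therefore carries no information.

```python
from math import sqrt


def size_intensive(descriptor, size):
    """Divide each descriptor value by the matching size measure."""
    if len(descriptor) != len(size):
        raise ValueError("descriptor and size must have the same length")
    return [d / s for d, s in zip(descriptor, size)]


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant descriptor has no defined correlation
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Assumed toy data: methanol, ethanol, 1-propanol, ethylene glycol, glycerol.
mol_weight = [32.04, 46.07, 60.10, 62.07, 92.09]  # size measure, g/mol
oxygen_count = [1, 1, 1, 2, 3]                    # extensive descriptor

# Size-intensive version: oxygen atoms per unit of molecular weight.
oxygen_per_weight = size_intensive(oxygen_count, mol_weight)

# Correlation of the original descriptor with its size-intensive version.
r = pearson_r(oxygen_count, oxygen_per_weight)
print("r(extensive, size-intensive) =", r)

# Trivial case: molecular weight divided by itself is constant.
trivial = size_intensive(mol_weight, mol_weight)
print("MW / MW =", trivial)
print("r(MW, MW / MW) =", pearson_r(mol_weight, trivial))
```

Returning `None` for a constant input is a design choice of this sketch: it makes the information-free case explicit instead of raising a division-by-zero error.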



Notes
A trivial example of a useless size-intensive descriptor is molecular weight divided by molecular weight, which is intensive but devoid of all information.
Cite this article
Purvis, G.D. Size-intensive descriptors. J Comput Aided Mol Des 22, 461–468 (2008). https://doi.org/10.1007/s10822-008-9209-4