Elsevier

Computers & Chemistry

Volume 24, Issue 2, March 2000, Pages 171-179
Use of artificial neural networks to predict the gas chromatographic retention index data of alkylbenzenes on carbowax-20M

https://doi.org/10.1016/S0097-8485(99)00058-3

Abstract

Quantitative structure–activity relationships (QSARs) quantify the connection between the structure and properties of molecules and allow properties to be predicted from structural parameters. Models relating structure to the retention index of alkylbenzenes were constructed with a multilayer neural network trained by the extended delta-bar-delta (EDBD) algorithm. The 165 sets of data belong to 129 alkylbenzenes measured at different temperatures on carbowax-20M. We propose a new method that describes the structure of each alkylbenzene with a simple set of six numeric codes derived from its molecular formula. The six codes and the temperature were used as input parameters to predict the retention indices. The effect of different orders of structural coding was investigated, and the network architecture and training times were optimized. The optimum ANNs gave excellent predictions. In addition, multiple linear regression (MLR) and nonlinear multivariate regression were applied. Our results show that, based on the structural numeric codes, ANNs predict the retention index data of alkylbenzenes more accurately than regression analysis.

Introduction

Quantitative structure–activity relationships (QSARs) quantify the connection between the structure and properties of molecules and allow properties to be predicted from structural parameters. The retention of a solute in gas–liquid chromatography (GLC) is determined by the different kinds of interaction between the stationary phase and the solute molecules. The kind of interaction depends on the structure and properties of both the stationary phase and the solute. The retention index of a solute is an important parameter in QSAR studies (Kaliszan, 1987). Typically, either empirical physicochemical parameters or non-empirical structural descriptors are used to obtain quantitative or semi-quantitative relationships that allow the retention behavior of an individual solute in a given system to be predicted. A number of theoretical and empirical models for predicting the retention data of solutes have been published; a simple physical parameter often employed in them is the solubility parameter (Zhang and Hu, 1992, Sun et al., 1994).

One of the central problems of QSAR studies is how to represent the molecular structure. To study the correlation between numeric values of the physico-chemical properties of chemical compounds and their structure, the molecular structure must be transformed into a mathematical, i.e. computer-readable, form (Cherqaoui and Villemin, 1994, Kranz et al., 1996). Because all alkylbenzenes share the same phenyl group, in this paper we describe the structure of each alkylbenzene by a set of six numeric codes that depend on its substituents. Compared with other methods (Pompe et al., 1997, Sutter et al., 1997), the description of alkylbenzenes used in this paper is simple, effective, and easy to use because it does not involve complicated quantum chemical or mathematical calculations.

Once a uniform structure representation has been defined, artificial neural networks (ANNs) are a promising model for solving QSAR problems, and they are particularly useful where it is difficult to specify an exact mathematical model that describes a given structure–activity relationship. It has already been shown that ANNs can be used for diverse applications such as prediction of copolymer composition drift (Ni and Hunkeler, 1997), prediction of chemical shifts in 13C NMR spectra (Ivanciuc et al., 1996), and multivariate calibration (Chan et al., 1997).

Prediction of gas chromatographic retention index data of simple organic compounds has been reported (Bruchmann et al., 1993, Pompe et al., 1997, Sutter et al., 1997). Compared with that work, the input parameters applied in this paper are more straightforward and easier to use.

We have previously predicted the retention indices of compounds from their physico-chemical parameters (Yan et al., 1998, Zhang et al., 1999). In contrast to that earlier work, the aim here is to predict the retention index data of alkylbenzenes at different temperatures from a simple set of six numeric codes determined by the molecular formula. No physico-chemical descriptors were used, so the retention index of a new molecule can be predicted from its structure alone. The six codes and the temperature were used as input parameters to predict the retention indices.

The effectiveness of the topological structural encoding was also confirmed by both linear and nonlinear multivariable regression analysis.

Four models were investigated: two ANN models with different orders of structural coding, a multiple linear regression model, and a nonlinear multivariable regression model. The predictions of the ANN models were compared with those of the two regression models. The results show that the ANN models are better than regression analysis for calculating the retention index data of alkylbenzenes.
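As a rough illustration of the regression baseline mentioned above, the following Python sketch fits a multiple linear regression with an intercept on seven inputs (six structural codes plus temperature) by ordinary least squares. The data are random stand-ins generated for the example, not values from the paper, and the function names `fit_mlr`, `predict_mlr`, and `rmse` are hypothetical. The exact regression form used by the authors is not given in this excerpt.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., b7]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to new input rows."""
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (assumption): 165 rows of 7 inputs each.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 4.0, size=(165, 7))
demo_coef = np.array([1000.0, 50.0, 20.0, 10.0, 5.0, 30.0, 15.0, -2.0])
y_demo = demo_coef[0] + X_demo @ demo_coef[1:] + rng.normal(0.0, 1.0, size=165)

coef = fit_mlr(X_demo, y_demo)
print("fit RMSE on synthetic data:", rmse(y_demo, predict_mlr(coef, X_demo)))
```

The same `rmse` helper could be applied to the predictions of any of the four models so that they are compared on a common error measure.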

Section snippets

Brief description of neural networks

A neural network model is composed of a large number of simple processing elements (PEs), or neuron nodes, organized into a sequence of layers (see Fig. 1).

The first layer is the input layer with one node for each variable or feature of the data. The last layer is the output layer consisting of one node for each variable to be investigated. In between are a series of one or more hidden layer(s) consisting of a number of nodes, which are responsible for learning. Nodes in any layer are fully or

ANNs structure optimization

The neural network has seven input nodes (the six numeric codes and the temperature T), one hidden layer of y nodes, and a single output node. Such an ANN may be designated a 7-y-1 net to indicate the number of nodes in the input, hidden and output layers, respectively. The neural network methodology has several empirically determined parameters. These include:

  • when to stop training (i.e. the number of epochs or the convergence criterion)

  • the number of hidden units

  • learning rate and momentum term
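The 7-y-1 layout and the tunable parameters listed above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid hidden activation, the linear output, the weight initialization, and the plain gradient-descent update are all assumptions (the paper trains with the EDBD algorithm, which adapts learning rates and momentum and is not reproduced here). The names `init_network`, `forward`, and `train_step` are hypothetical.

```python
import numpy as np

def init_network(n_hidden, n_inputs=7, seed=0):
    """Create small random weights for a 7-y-1 network (y = n_hidden)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Forward pass: sigmoid hidden layer, linear output (assumed activations)."""
    hidden = 1.0 / (1.0 + np.exp(-(X @ params["W1"] + params["b1"])))
    return hidden @ params["W2"] + params["b2"], hidden

def train_step(params, X, y, learning_rate=0.1):
    """One plain gradient-descent step (not EDBD); returns the loss before it."""
    pred, hidden = forward(params, X)
    err = pred - y.reshape(-1, 1)
    n = X.shape[0]
    # Gradients of half the mean squared error, via backpropagation.
    grad_W2 = hidden.T @ err / n
    grad_b2 = err.mean(axis=0)
    delta_hidden = (err @ params["W2"].T) * hidden * (1.0 - hidden)
    grad_W1 = X.T @ delta_hidden / n
    grad_b1 = delta_hidden.mean(axis=0)
    params["W1"] -= learning_rate * grad_W1
    params["b1"] -= learning_rate * grad_b1
    params["W2"] -= learning_rate * grad_W2
    params["b2"] -= learning_rate * grad_b2
    return float((err ** 2).mean())

# One input row: six structural codes followed by a placeholder temperature
# (the scaled value 1.0 is an assumption made for this example).
x = np.array([[3.707, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]])
net = init_network(n_hidden=5)
prediction, _ = forward(net, x)
print(prediction.shape)
```

In this sketch the number of hidden units is the `n_hidden` argument, the learning rate is an argument of `train_step`, and a stopping rule would be a limit on the number of calls to `train_step` or a threshold on the loss it returns.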

Comparison of ANN models with different orders of structural coding

In the above study, the order of structural coding for every compound follows its standard nomenclature. Because the coding order strongly affects prediction performance, a different coding order was also examined. In this model, the structural coding runs from the largest substituent to the smallest. For instance, for the compound 1,4-dimethyl-2-sec-butylbenzene in Fig. 2, the numeric code is 3.707, 1, 0, 0, 1, 0. According to this coding rule, all the compounds in Table 1 were encoded

Conclusion

We have presented an ANN QSAR model for estimating the gas chromatographic retention data of alkylbenzenes at different temperatures on carbowax-20M. The description vector used to code the alkylbenzenes is very simple. The encoding scheme used here may be applied to other classes of chemical compounds if they share the same parent structure and the same types of substituents, provided these can easily be encoded numerically.

Although predicted results are slightly worse than reported in previous studies where

Acknowledgements

This project was supported by the National Natural Science Foundation of China and the Major Items Foundation of Planning Committee of Gansu Province.

References (17)

  • A. Bruchmann et al., Anal. Chim. Acta (1993)
  • H. Ni et al., Polymer (1997)
  • M. Pompe et al., Anal. Chim. Acta (1997)
  • J.M. Sutter et al., Anal. Chim. Acta (1997)
  • J.U. Thomsen et al., J. Magn. Reson. (1989)
  • A.X. Yan et al., Comput. Chem. (1998)
  • R.S. Zhang et al., Chemom. Intell. Lab. Syst. (1999)
  • H. Chan et al., Anal. Chem. (1997)
There are more references available in the full text version of this article.
