Predicting the efficiency of UAG translational stop signal through studies of physicochemical properties of its composite mono- and dinucleotides

https://doi.org/10.1016/j.compbiolchem.2004.05.003

Abstract

In this study, we explored the problem of predicting UAG stop-codon readthrough efficiency. The reported nucleotide sequences were first converted into physicochemical property vectors before being presented to a machine learning algorithm. Two sets of physicochemical properties were applied: one for mononucleosides (in terms of steric bulk, hydrophobicity and electronics) and another for dinucleotides. To the best of our knowledge, this is the first report of how dinucleotides are converted into principal components derived from NMR chemical shift data. A few efficiency prediction models were then derived, and the mononucleoside- and dinucleotide-based models were compared. In the derived models, the coefficients of these property-based predictors lend themselves to biophysical interpretation, an advantage that is demonstrated in this study via a prediction model based on the steric bulk factor. Although quite simple, the steric bulk factor model explained well the effect of sequence variations surrounding the amber stop codon and the tRNA bearing the UCCU anticodon. We further proposed new alternatives at positions −1 and +4 of a UAG stop codon sequence to enhance the readthrough efficiency. This research may contribute to a better understanding of readthrough mechanisms and may also help in studying the normal translation termination process.

Introduction

Translation termination, a crucial step in maintaining the life of organisms, is signalled by one of three stop codons (UAG [amber], UAA [ochre], or UGA [opal]), and its efficiency is largely dependent on the context of the codon (Bjornsson et al., 1996, Bossi, 1983, Mottagui-Tabar et al., 1994, Poole et al., 1998, Stormo et al., 1986). In readthrough, however, a stop codon is misread as a sense codon, which results in the synthesis of an extended polypeptide. Studies on termination contexts in different cells (see review, Bertram et al., 2001) indicate that the nucleotides immediately before and after the stop codon (defined as −1 and +4) are non-random. Intrinsic physicochemical properties of nucleotides are expected to play a key role in this “programmed translational error”, since only normal interactions between the mRNA and components of the translational machinery are involved and no specific gene products have been implicated (Cassan and Rousset, 2001). Little progress, however, has been made in determining the readthrough efficiency from first principles. A ribosome alone contains millions of atoms, which puts it beyond the scope of a molecular dynamics simulation even on the fastest supercomputers. Most of the reported theory-related readthrough models are therefore based on statistical analysis of the biological bases (A, U, G, and C) and do not link to the molecular physics of nucleotides. In the study by Stormo et al. (1986), multiple regressions were performed using Miller and Albertini’s (1983) 42 sequence data sets together with an additional 43rd sequence from Bossi (1983) (Table 1), which led to a few models showing which bases were important for efficiency. Stormo et al. (1986) as well as others (Major et al., 1996, Poole et al., 1998) found that the nucleotides at +4 (A or G), +5 and −1 (A) play a key role in the readthrough efficiency, but no physicochemical properties were put forward to explain the phenomenon of suppression, owing to the limitations of a mathematical approach that uses biological bases as the model elements.

Recently, a number of materials research projects were successfully carried out using correlation techniques (Heng et al., 1999, Jin et al., 2000, Wu and Heng, 1999, Wu et al., 1999, Wu et al., 2002) to predict a bulk material property from the fundamental atomic properties of the constituent elements. These models may lead to approximated physics of otherwise very complicated compound formation mechanisms since all atomic properties are well known (or can be easily computed by well established quantum mechanics). Without correlation approaches, it is very difficult if not impossible in many cases to derive the mechanisms from first principles methodology alone.

Similar efforts were reported to link sequence activity to physicochemical properties of its composite nucleotides (Jonsson et al., 1993, Sjostrom et al., 1986), in which principal component analysis was performed on a data set of 21 experimentally determined and calculated nucleoside properties (Sandberg and Sjöström, 1996). Four statistically significant components, or principal properties (P), were extracted, which described 68.4% of the variance in the data. Since the principal properties are condensed descriptors from the original property data set, each of the four Ps can be related to physicochemical properties of the nucleotides: P1 relates to the steric bulk, P2 to the hydrophobicity, P3 to the electronic properties, and P4 to the electronic/hydrophobic properties. A typical steric bulk property (P1) is the heat of formation: −91, −127, −176 and −230 kcal/mol for nucleosides A, G, C and U, respectively. Other useful steric bulk properties include molecular weight and total molecular surface area. Hydrophobicity (P2) is well represented by the logarithm of the octanol/water partition coefficient: −2.83, −2.88, −3.08 and −3.85 for U, A, C, and G, respectively. Electronic properties (P3) may be obtained from the energy of the lowest unoccupied molecular orbital: −0.538, −0.29, −0.064 and −0.032 eV for G, A, U and C, respectively. G has the strongest electron affinity while C has the weakest. Based on these four Ps and the developed sequence correlation model, Sandberg and Sjöström (1996) proposed and experimentally verified a new sequence with high activity of gene expression. They also explained the physical basis for the nucleotides that are favoured at certain sequence positions.
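To make the idea of property-based encoding concrete, the following minimal Python sketch maps a short sequence onto a numeric vector. It uses the three raw nucleoside properties quoted above (heat of formation, log P and LUMO energy) as stand-ins for P1 to P3. The actual principal-property scores of Sandberg and Sjöström are not reproduced in this text, so the raw values, the function name `encode_context` and the two-base example are illustrative assumptions, not the descriptors used in the models.

```python
from typing import List

# Raw nucleoside properties quoted in the text, used here only as
# stand-ins for the principal properties P1-P3 (an assumption).
HEAT_OF_FORMATION = {"A": -91.0, "G": -127.0, "C": -176.0, "U": -230.0}  # kcal/mol
LOG_P = {"U": -2.83, "A": -2.88, "C": -3.08, "G": -3.85}  # octanol/water
LUMO_EV = {"G": -0.538, "A": -0.29, "U": -0.064, "C": -0.032}  # eV

PROPERTY_TABLES = (HEAT_OF_FORMATION, LOG_P, LUMO_EV)


def encode_context(sequence: str) -> List[float]:
    """Concatenate the three property values of each base in the sequence."""
    vector: List[float] = []
    for base in sequence.upper():
        if base not in HEAT_OF_FORMATION:
            raise ValueError(f"unexpected base: {base!r}")
        vector.extend(table[base] for table in PROPERTY_TABLES)
    return vector


# Hypothetical example: an A at position -1 and a G at position +4.
features = encode_context("AG")
print(features)
```

Each base contributes three numbers, so a context of n positions yields a vector of length 3n that can be passed to a regression method in place of base identities.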

In the present study, the four Ps proposed by Sandberg and Sjöström are used to develop mononucleoside-based readthrough models. For dinucleotide-based readthrough models, we constructed principal properties (pps) for dinucleotides using the same procedure as for the mononucleosides. A challenge in constructing pps is finding sufficient dinucleotide data in the literature, since a property can only be used if data are available for all 16 different dinucleotides. After an extensive literature search, we found seven proton chemical shift (NMR) parameters, 1′, 2′, 2″, 3′, 4′, 5′ and 5″ (Cheng and Sarma, 1977), for all 16 different dinucleotides under five different experimental conditions. Therefore, we had a total of 560 (16 × 7 × 5) data points, which were used in this study to derive the principal properties for dinucleotides. These data are for deoxyribodinucleotides rather than the ribodinucleotides that are directly involved in readthrough, because we could not find any experimental information on the 16 ribodinucleoside monophosphates. The difference between thymine and uracil is, however, limited to the C5 position (–CH3 group in thymine and –H in uracil), where only one proton NMR signal is affected. The only other site affected is in the sugar group. Thus, the 16 deoxyribodinucleoside monophosphates may still reveal the intrinsic intermolecular interactions of the 16 ribodinucleoside monophosphates.
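The dimensions above imply a data matrix with 16 rows (dinucleotides) and 35 columns (7 proton shifts × 5 conditions). The sketch below shows, in Python with NumPy, how principal properties could be extracted from such a matrix by autoscaling followed by a singular value decomposition. The chemical shift values of Cheng and Sarma (1977) are not reproduced here, so the matrix is filled with random placeholder numbers. The function name `principal_properties`, the autoscaling step and the default of three components are assumptions for illustration, not the APEX procedure itself.

```python
import itertools

import numpy as np


def principal_properties(data: np.ndarray, n_components: int = 3):
    """Return PCA scores and explained-variance fractions for autoscaled data."""
    centered = data - data.mean(axis=0)
    scaled = centered / data.std(axis=0, ddof=1)  # unit variance per column
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# The 16 deoxyribodinucleotides in a fixed order (AA, AC, ..., TT).
dinucleotides = ["".join(pair) for pair in itertools.product("ACGT", repeat=2)]

# Placeholder data only: 7 proton shifts x 5 conditions = 35 columns.
rng = np.random.default_rng(0)
shifts = rng.normal(size=(16, 7 * 5))

scores, explained = principal_properties(shifts)
pp_table = dict(zip(dinucleotides, scores))  # dinucleotide -> score vector
print(scores.shape)  # (16, 3)
```

With real data, each row of `scores` would give the principal-property values of one dinucleotide, and `explained` would indicate how much of the variance in the 35 shift measurements each component captures.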

In this work, the three pps were used to construct dinucleotide based readthrough models. We compare the two groups of models and try to link the observed readthrough efficiency to physical aspects of the nucleotides. Lastly, a new steric bulk model is proposed to explain some observed effects of sequence variations surrounding the amber stop codon and the tRNA with UCCU anticodon.

Section snippets

APEX (advanced process expert)

An in-house pattern recognition software tool, APEX (Jin et al., 1999, Wu et al., 2004, Wu et al., 2002), is used to derive correlation models. Interested readers may repeat the model development by applying MATLAB (Hunt et al., 2001) and the flow charts of APEX (Jin et al., 1999, Wu et al., 2002), which involve data preprocessing, collinearity checking, feature reduction and pattern recognition. Principal component analysis (PCA), partial least squares regression (PLSR),

Results

This article is organised in four parts. First, correlation models based on mononucleosides will be presented, followed by construction of the three principal components from dinucleotide properties. Next, we will present the correlation models based on dinucleotides and lastly we will discuss the results.

Discussion

In the literature, the observed readthrough efficiency is determined using an empirical rule for tetranucleotides, as the fourth base controls the efficiency of termination (Bonetti et al., 1995, Poole et al., 1995), or by some near-cognate tRNA (Yarus and Curran, 1992). The nature of the flanking 3′ base was first shown to be important in experiments with lacI-lacZ fusions in Escherichia coli, whose data (Miller and Albertini, 1983) have been used in this paper. The suppressor tRNA performed

Acknowledgements

The APEX software was developed with a grant from the exploratory funding of the National Science and Technology Board of Singapore. We would like to thank Dr. Michael B. Sullivan for critical proofreading of the English and for useful suggestions.

References (39)

  • A. Björnsson et al. Structure of the C-terminal end of the nascent peptide influences translation termination. EMBO J. (1996)
  • C.M. Brown et al. The signal for the termination of protein synthesis in prokaryotes. Nucleic Acids Res. (1990)
  • M. Cassan et al. UAG readthrough in mammalian cells: effect of upstream and downstream stop codon contexts reveal different signals. BMC Mol. Biol. (2001)
  • D.M. Cheng et al. Intimate details of the conformational characteristics of deoxyribodinucleoside monophosphates in aqueous solution. J. Am. Chem. Soc. (1977)
  • W.J. Craigen et al. Recent advances in peptide chain termination. Mol. Microbiol. (1990)
  • D.-J.G. Crawford et al. Indirect regulation of translational termination efficiency at highly expressed genes and recoding sites by the factor recycling function of Escherichia coli release factor RF3. EMBO J. (1999)
  • D.V. Freistroffer et al. Release factor RF3 in E. coli accelerates the dissociation of release factors RF1 and RF2 from the ribosome in a GTP-dependent manner. EMBO J. (1997)
  • G. Grentzmann et al. Localization and characterization of the gene encoding release factor RF3 in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. (1994)
  • Hunt, B.R., Lipsman, R.L., Rosenberg, J.M., 2001. A Guide to MATLAB: for Beginners and Experienced Users. Cambridge...