Research article
Automated prediction of three-way junction topological families in RNA secondary structures

https://doi.org/10.1016/j.compbiolchem.2011.11.001

Abstract

We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure: the sequence and the Watson–Crick pairings. The parameters of the algorithm have been determined on a data set of 33 three-way junctions whose 3D conformation is known. We applied the algorithm to 53 other junctions and compared the predictions to the real shape of those junctions. We show that the correct answer is selected out of nine possible configurations 64% of the time. Additionally, these results are noticeably improved if homology information is used. The resulting software, Cartaj, is available online and downloadable (with source) at: http://cartaj.lri.fr.

Graphical abstract

The prediction workflow. A given RNA three-way junction can have three stackings; for each stacking, the junction can be in three families (A, B or C), depending on the angle of the third helix. This gives nine configurations. We compute a score for each configuration, and the configuration with the best score is our prediction.
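
The workflow above can be sketched in a few lines of Python. This is only an illustration of the enumerate-score-select structure, not the Cartaj implementation: the helix labels `H1`, `H2`, `H3`, the function names, and the toy scoring function are hypothetical, and we assume that a higher score is better.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical helix labels; each stacking names the two coaxially stacked
# helices (P1, P2), and the remaining helix plays the role of P3.
STACKINGS = (("H1", "H2"), ("H2", "H3"), ("H3", "H1"))
FAMILIES = ("A", "B", "C")

Configuration = Tuple[Tuple[str, str], str]


def enumerate_configurations() -> List[Configuration]:
    """Return the 3 stackings x 3 families = 9 candidate configurations."""
    return list(product(STACKINGS, FAMILIES))


def rank_configurations(score: Callable[[Configuration], float]) -> List[Configuration]:
    """Sort the nine configurations from best to worst score (assumed: higher is better)."""
    return sorted(enumerate_configurations(), key=score, reverse=True)


def predict(score: Callable[[Configuration], float]) -> Configuration:
    """The prediction is the configuration with the best score."""
    return rank_configurations(score)[0]


def toy_score(config: Configuration) -> float:
    """Placeholder score for illustration only; a real score would be
    computed from the sequence and the Watson-Crick pairings."""
    stacking, family = config
    return 1.0 if stacking == ("H1", "H2") and family == "A" else 0.0


if __name__ == "__main__":
    print(len(enumerate_configurations()))
    print(predict(toy_score))
```

Returning the full ranking rather than only the top entry makes it possible to check whether the correct configuration appears among the first few candidates.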


Highlights

► We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure.
► We show that the correct answer is selected out of nine possible configurations 64% of the time. Additionally, these results are noticeably improved if homology information is used.
► This work may have important applications in the field of three-dimensional modelling of RNA molecules.
► The resulting software, Cartaj, is available online and downloadable (with source) at: http://cartaj.lri.fr.

Introduction

RNA molecules fold into complex three-dimensional structures in a hierarchical and modular way, with recurring autonomous building blocks being packed together to form the molecule. These modules are also hierarchical: high level modules, like RNA junctions, are made of smaller, lower level modules, in this case of Watson–Crick helices linked together by single strands.

Knowledge of the shape of the lower level modules can give us insight into the shape of the higher level ones, leading to an approximation of the shape of the molecule that can be refined in subsequent steps. Since Watson–Crick helices have a well-defined shape, RNA junctions are the next obvious target (Bindewald et al., 2008; Lescoute et al., 2005; Lescoute and Westhof, 2006; Laing and Schlick, 2009). Notably, Lescoute and Westhof (2006) have shown that the three-way junctions where two helices are approximately stacked can be divided into three families, A, B and C, according to the position of the third helix (P3) relative to the two other helices that are stacked together (P1 and P2). Fig. 1 shows a schematic drawing of each of the families.

The topology of each of the families is notably due to the different non-Watson–Crick interactions that occur within the helices, and between the helices and the other nucleotides of the junction. After a thorough examination of 33 junctions whose three-dimensional structure was known, Lescoute and Westhof gave some hints towards predicting the family of a junction, given its secondary structure.

In this paper, we propose a method for automatically predicting the topological family of any given three-way junction, using only information from the sequence and the deduced secondary structure (only Watson–Crick interactions). We also show that the accuracy of the prediction is noticeably improved if homology information is given in addition, that is, a set of sequences that are homologous to the input sequence. We evaluate the accuracy of our method on a set of 86 junctions from the structural databases, and we compare it to other possible approaches.

Section snippets

Data

We distinguished the following three data sets:

    LW:

    The 33 junctions from Lescoute and Westhof (2006).

    FR3D:

    In order to test our predictions, we automatically extracted the three-way junctions from all molecules in the non-redundant FR3D Database (Sarver et al., 2008). We found 86 junctions, of which 53 were new junctions (“new” being defined here as having a secondary structure different from that of the junctions in LW). Details on the extraction process are given in the next section.

    ALL:

    This

Prediction result

For the three data sets described in Section 2.1, we computed how many times we predicted the correct answer in the first position, or in the first three positions among the nine possible ones (see Table 2). Randomly selecting a configuration would produce the correct answer in the first position in 11% of the cases, and in the first three positions in 33% of the cases. We predict the correct configuration in the first position 64% of the time, and in the first three positions 87% of the time.
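
The chance levels quoted above follow from the nine equally likely configurations: 1/9 ≈ 11% for the first position and 3/9 ≈ 33% for the first three. The sketch below shows one way such first-position and first-three accuracies could be computed from ranked predictions. The function names and the toy data are hypothetical and do not reproduce the paper's data or code.

```python
from typing import Hashable, Sequence

N_CONFIGURATIONS = 9  # three stackings x three families


def top_k_accuracy(
    rankings: Sequence[Sequence[Hashable]],
    truths: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of junctions whose true configuration is among the first k
    entries of its ranked prediction list."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings:
        raise ValueError("at least one junction is required")
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return hits / len(rankings)


def random_baseline(k: int, n: int = N_CONFIGURATIONS) -> float:
    """Probability that a uniformly random ranking of n configurations
    places the correct one among the first k."""
    return k / n


if __name__ == "__main__":
    # Toy rankings over three made-up labels, for illustration only.
    toy_rankings = [["a", "b", "c"], ["b", "a", "c"]]
    toy_truths = ["a", "a"]
    print(top_k_accuracy(toy_rankings, toy_truths, k=1))
    print(random_baseline(1), random_baseline(3))
```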

Conclusion

We described an automated method for predicting the topological family of three-way RNA junctions from their secondary structure only. We showed that this approach works well on single junctions, and is improved either by having sequence alignments for that junction or, even better, a set of homologous 2D structures deduced from crystallographic data. Among the three topological families, family B, the rarest one, is the least well predicted by our method, showing that new criteria have to be

Acknowledgments

We warmly thank Julie Bernauer and Fabrice Jossinet for fruitful discussions. We also thank Christian Cadéré and Vincent Reinhard for their help at an early stage of the work. This research was supported in part by the Digiteo project PASAPAS, by the ANR project AMIS ARN ANR-09-BLAN-0160, and by the UniverSud Paris project PASAPRES.

References (11)

  • Laing C. et al. Analysis of four-way junctions in RNA structures. Journal of Molecular Biology (2009)
  • Bindewald E. et al. RNAJunction: a database of RNA junctions and kissing loops for three-dimensional structural analysis and nanodesign. Nucleic Acids Research (2008)
  • Cannone J.J. et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BioMed Central Bioinformatics (2002)
  • Lescoute A. et al. Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Research (2005)
  • Lescoute A. et al. Topology of three-way junctions in folded RNAs. RNA (2006)
There are more references available in the full text version of this article.

