Research article
Automated prediction of three-way junction topological families in RNA secondary structures
Graphical abstract
The prediction workflow. A given RNA three-way junction can have three stackings; for each stacking, the junction can be in three families (A, B or C), depending on the angle of the third helix. This gives nine configurations. We compute a score for each configuration, and the configuration with the best score is our prediction.
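The workflow in this caption can be illustrated with a minimal sketch: enumerate the nine (stacking, family) configurations, score each one, and rank them. The helix labels `H1`, `H2`, `H3`, the function names, and `toy_score` are hypothetical placeholders and not the scoring used by Cartaj. The sketch also assumes that a higher score is better.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical labels for the three helices of a junction (assumption).
STACKINGS = [("H1", "H2"), ("H2", "H3"), ("H3", "H1")]  # which two helices stack
FAMILIES = ["A", "B", "C"]  # position of the third helix

Configuration = Tuple[Tuple[str, str], str]


def enumerate_configurations() -> List[Configuration]:
    """Return the 3 stackings x 3 families = 9 candidate configurations."""
    return list(product(STACKINGS, FAMILIES))


def rank_configurations(score: Callable[[Configuration], float]) -> List[Configuration]:
    """Sort all configurations from best to worst (higher score = better, by assumption)."""
    return sorted(enumerate_configurations(), key=score, reverse=True)


def toy_score(config: Configuration) -> float:
    """Placeholder score with made-up weights, for illustration only."""
    stacking, family = config
    stacking_weight = {("H1", "H2"): 1.0, ("H2", "H3"): 0.5, ("H3", "H1"): 0.0}
    family_weight = {"A": 0.2, "B": 0.0, "C": 0.3}
    return stacking_weight[stacking] + family_weight[family]


ranked = rank_configurations(toy_score)
prediction = ranked[0]  # the best-scoring configuration is the prediction
```

Keeping the full ranking, rather than only the top entry, makes it possible to check later whether the correct configuration appears among the first few candidates.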
Highlights
► We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure.
► We show that the correct answer is selected out of nine possible configurations 64% of the time. Additionally, these results are noticeably improved if homology information is used.
► This work may have important applications in the field of three-dimensional modelling of RNA molecules.
► The resulting software, Cartaj, is available online and downloadable (with source) at: http://cartaj.lri.fr.
Introduction
RNA molecules fold into complex three-dimensional structures in a hierarchical and modular way, with recurring autonomous building blocks being packed together to form the molecule. These modules are also hierarchical: high level modules, like RNA junctions, are made of smaller, lower level modules, in this case of Watson–Crick helices linked together by single strands.
Knowledge of the shape of the lower level modules can give us insight into the shape of the higher level ones, leading to an approximation of the shape of the molecule that can be refined in subsequent steps. Since Watson–Crick helices have a well-defined shape, RNA junctions are the next obvious target (Bindewald et al., 2008, Lescoute et al., 2005, Lescoute and Westhof, 2006, Laing and Schlick, 2009). Notably, Lescoute and Westhof (2006) showed that the three-way junctions in which two helices are approximately stacked can be divided into three families, A, B and C, according to the position of the third helix (P3) relative to the two helices that are stacked together (P1 and P2). Fig. 1 shows a schematic drawing of each of the families.
The topology of each family is due notably to the different non-Watson–Crick interactions that occur within the helices, and between the helices and the other nucleotides of the junction. After a thorough examination of 33 junctions whose three-dimensional structure was known, Lescoute and Westhof gave some hints towards predicting the family of a junction, given its secondary structure.
In this paper, we propose a method for automatically predicting the topological family of any given three-way junction, using only the sequence and the deduced secondary structure (Watson–Crick interactions only). We also show that the accuracy of the prediction is noticeably improved if homology information is given in addition, that is, a set of sequences that are homologous to the input sequence. We evaluate the accuracy of our method on a set of 86 junctions from the structural databases, and we compare it to other possible approaches.
Data
We distinguished the following three data sets:
- LW:
The 33 junctions from Lescoute and Westhof (2006).
- FR3D:
In order to test our predictions, we automatically extracted the three-way junctions from all molecules in the non-redundant FR3D Database (Sarver et al., 2008). We found 86 junctions, 53 of which were new, “new” being defined here as having a secondary structure different from that of the junctions in LW. Details on the extraction process are given in the next section.
- ALL:
This
Prediction result
For the three data sets described in Section 2.1, we computed how many times we predicted the correct answer in the first position, or in the first three positions among the nine possible ones (see Table 2). Randomly selecting a configuration would produce the correct answer in the first position in 11% of the cases, and in the first three positions in 33% of the cases. We predict the correct configuration in the first position 64% of the time, and in the first three positions 87% of the time.
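The random baseline and the "first position" / "first three positions" measures quoted above can be reproduced with a short sketch. The function names and the toy rankings below are illustrative assumptions. They are not data or code from the paper.

```python
from fractions import Fraction
from typing import Hashable, Sequence

N_CONFIGURATIONS = 9  # 3 stackings x 3 families


def random_baseline(k: int, n: int = N_CONFIGURATIONS) -> Fraction:
    """Chance that a uniformly random ordering places the correct answer in the top k."""
    return Fraction(k, n)


def top_k_accuracy(
    rankings: Sequence[Sequence[Hashable]],
    truths: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of junctions whose true configuration is among the first k ranked ones."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Baselines: 1/9 (about 11%) for the first position, 3/9 (about 33%) for the first three.
baseline_top1 = random_baseline(1)
baseline_top3 = random_baseline(3)

# Toy example with made-up rankings for four junctions (configurations as plain labels).
toy_rankings = [["x", "y", "z"], ["y", "x", "z"], ["z", "y", "x"], ["x", "z", "y"]]
toy_truths = ["x", "x", "x", "q"]
toy_top1 = top_k_accuracy(toy_rankings, toy_truths, k=1)
toy_top3 = top_k_accuracy(toy_rankings, toy_truths, k=3)
```

Comparing the reported 64% and 87% against these baselines of 1/9 and 3/9 is what shows that the prediction does substantially better than chance.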
Conclusion
We described an automated method for predicting the topological family of three-way RNA junctions from their secondary structure only. We showed that this approach works well on single junctions, and is improved either by having sequence alignments for that junction or, even better, a set of homologous 2D structures deduced from crystallographic data. Among the three topological families, family B, the rarest one, is the least well predicted by our method, showing that new criteria have to be
Acknowledgments
We warmly thank Julie Bernauer and Fabrice Jossinet for fruitful discussions. We also thank Christian Cadéré and Vincent Reinhard for their help at an early stage of the work. This research was supported in part by the Digiteo project PASAPAS, by the ANR project AMIS ARN ANR-09-BLAN-0160, and by the UniverSud Paris project PASAPRES.
References (11)
- et al. Analysis of four-way junctions in RNA structures. Journal of Molecular Biology (2009)
- et al. RNAJunction: a database of RNA junctions and kissing loops for three-dimensional structural analysis and nanodesign. Nucleic Acids Research (2008)
- et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BioMed Central Bioinformatics (2002)
- et al. Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Research (2005)
- et al. Topology of three-way junctions in folded RNAs. RNA (2006)