
An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences

Journal of Computer Science and Technology

Abstract

Several experiments and observations have shown that small, distinct local structural features in RNA molecules are correlated with their biological function, for example in post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to share the same biological function could therefore provide substantial information about which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even when limited to secondary structure. The main difficulties are that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or for selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming stem-loop structures. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of noncoding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
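
To make the idea of a pattern with a "symmetric layout" concrete, the following is a minimal brute-force sketch in Python. It is not the affix-tree algorithm of the paper. It only illustrates how a stem-loop candidate can be grown outwards from a loop, one base pair at a time, and how shapes shared by all sequences of a set can be collected. The function names, the parameters (`min_stem`, `min_loop`, `max_loop`), the acceptance of G-U wobble pairs, and the example sequences are all assumptions made for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=3, min_loop=3, max_loop=8):
    """Return (start, end, stem_length, loop_length) for each hairpin candidate.

    For every possible loop, the stem is grown outwards one base pair at a
    time, so the pattern is extended to the left and to the right together.
    Positions are 0-based and inclusive.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # last base before the loop
            right = loop_start + loop_len  # first base after the loop
            stem = 0
            while left >= 0 and right < n and (seq[left], seq[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, right - 1, stem, loop_len))
    return hits


def shared_shapes(seqs, **kwargs):
    """Return the (stem_length, loop_length) shapes found in every sequence."""
    shape_sets = [
        {(stem, loop) for _, _, stem, loop in find_stem_loops(s, **kwargs)}
        for s in seqs
    ]
    return set.intersection(*shape_sets) if shape_sets else set()


print(find_stem_loops("GGGAAAACCC"))                      # [(0, 9, 3, 4)]
print(shared_shapes(["GGGAAAACCC", "UUCCCAAAAGGGUU"]))    # {(3, 4)}
```

This sketch rescans every sequence for every loop position and only compares exact stem and loop lengths. An index such as the affix tree described above is meant to avoid that repeated work, and the user-defined degree of similarity mentioned in the abstract would replace the exact shape comparison used here.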



Author information

Corresponding author

Correspondence to Giulio Pavesi.

Additional information

Giulio Pavesi is a postdoctoral researcher at the Department of Computer Science, Systems and Communication of the University of Milano-Bicocca. His research interests are mainly focused on bioinformatics and discrete models of complex systems.

Giancarlo Mauri is a full professor of computer science at the University of Milano-Bicocca. His research interests are mainly in the area of theoretical computer science, including formal languages and automata, computational complexity, computational learning theory, neural networks, cellular automata and models of concurrent/parallel computing, and bioinformatics.

Graziano Pesole is a full professor of Molecular Biology at the University of Milan, Italy. His research interests include bioinformatics and molecular evolution.


About this article

Cite this article

Pavesi, G., Mauri, G. & Pesole, G. An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. J. Comput. Sci. & Technol. 19, 2–12 (2004). https://doi.org/10.1007/BF02944781


  • DOI: https://doi.org/10.1007/BF02944781
