Abstract
In this paper we propose a novel H/ACA box snoRNA gene mining algorithm based on ensemble learning and a special secondary structure prediction algorithm. We make three contributions to improve current mining methods: enriching the negative training set, using ensemble classifiers to handle class-imbalanced data, and developing a special secondary structure prediction algorithm that extracts high-quality features. The performance of the learning method is validated by cross-validation, and the mining method is validated by experiments on genome data.
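The abstract does not say which ensemble scheme is used for the class-imbalanced data. One common approach is to train each ensemble member on all positive examples plus an equally sized random subsample of the negatives, then combine the members by majority vote (a balanced-bagging or EasyEnsemble-style scheme). The sketch below assumes that scheme, numeric feature vectors, and a toy nearest-centroid base learner. The names `balanced_ensemble` and `train_centroid` are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def train_centroid(xs: List[Vector], ys: List[int]) -> Classifier:
    """Hypothetical toy base learner: assign x to the nearer class centroid."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]

    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    cp, cn = mean(pos), mean(neg)
    return lambda x: 1 if dist2(x, cp) <= dist2(x, cn) else 0


def balanced_ensemble(xs: List[Vector], ys: List[int], n_members: int = 5,
                      seed: int = 0,
                      base: Callable[..., Classifier] = train_centroid) -> Classifier:
    """Undersampling ensemble (assumed scheme): each member sees all positives
    and a random negative subsample of the same size; prediction is a majority vote."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(ys) if y == 1]
    neg_idx = [i for i, y in enumerate(ys) if y == 0]
    if not pos_idx or not neg_idx:
        raise ValueError("both classes must be present")

    members: List[Classifier] = []
    for _ in range(n_members):
        # Undersample the majority (negative) class to balance this member's training set.
        sample = rng.sample(neg_idx, min(len(pos_idx), len(neg_idx)))
        idx = pos_idx + sample
        members.append(base([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x: Vector) -> int:
        votes = sum(m(x) for m in members)
        # Strict majority of positive votes; an odd n_members avoids ties.
        return 1 if 2 * votes > len(members) else 0

    return predict
```

Each member sees a different negative subsample, so the ensemble as a whole still uses much of the negative data while every individual classifier trains on balanced classes. In a real pipeline the toy base learner would be replaced by a stronger classifier over the extracted sequence and structure features.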
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zou, Q., Guo, M., Wang, C., Han, Y., Li, W. (2009). Novel H/ACA Box snoRNA Mining and Secondary Structure Prediction Algorithms. In: Wen, P., Li, Y., Polkowski, L., Yao, Y., Tsumoto, S., Wang, G. (eds) Rough Sets and Knowledge Technology. RSKT 2009. Lecture Notes in Computer Science, vol 5589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02962-2_68
DOI: https://doi.org/10.1007/978-3-642-02962-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02961-5
Online ISBN: 978-3-642-02962-2
eBook Packages: Computer Science, Computer Science (R0)