Elsevier

Computers & Chemistry

Volume 20, Issue 1, March 1996, Pages 123-133
Statistical analysis of GeneMark performance by cross-validation

https://doi.org/10.1016/S0097-8485(96)80014-3

Abstract

We have explored the performance of the GeneMark gene identification method using cross-validation over learning samples of E. coli DNA sequences. The computations gave more accurate estimates of the error rates than previous results, which were obtained when the sample of non-coding regions was derived from GenBank sequences in which many true coding regions were unannotated. The components of the error rate have been classified and delineated. The method was shown to perform differently on class I, II and III genes. The most frequent errors come from misinterpreting the coding potential of the complementary sequence in the same frame. The effects of stop codons present in alternative frames were also studied to better understand the main factors contributing to GeneMark performance.
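As a minimal sketch of the two sequence-level notions the abstract mentions (the complementary strand and stop codons in alternative reading frames), the following Python fragment lists stop-codon positions in all six frames of a DNA string. It is an illustration only and is not part of GeneMark; the function names, the standard stop-codon set (TAA, TAG, TGA), and the example sequence are assumptions introduced here.

```python
# Illustrative sketch only; this is NOT GeneMark code.
# Assumes the standard stop codons and an unambiguous A/C/G/T alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_codon_positions(seq, frame):
    """Return 0-based start offsets of stop codons read in `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def stops_in_six_frames(seq):
    """Map (strand, frame) to stop-codon offsets on that strand.

    Offsets for the "-" strand are relative to the reverse complement.
    """
    rc = reverse_complement(seq)
    result = {}
    for frame in range(3):
        result[("+", frame)] = stop_codon_positions(seq, frame)
        result[("-", frame)] = stop_codon_positions(rc, frame)
    return result


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to exercise the functions.
    for key, offsets in stops_in_six_frames("ATGAAATAGCC").items():
        print(key, offsets)
```

A frame with no stop codons over a long stretch is an open reading frame on that strand, which is one reason a region on the complementary strand can appear to carry coding potential.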

