Tests of Automatic Annotation Using KOG Proteins and ESTs from 4 Eukariotic Organisms

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3594)

Abstract

BLAST homology searches have been widely used to assign function to novel sequences. Secondary databases such as KOG can be used for this purpose because their sequences carry a functional classification. We devised an experiment in which public ESTs from four eukaryotic organisms whose protein sequences are present in the KOG database are classified into functional KOG categories using tBLASTn. First, we assigned the ESTs from one organism to KTL (KOG, TWOG and LSE) proteins; then we searched the database depleted of that organism's proteins to simulate a novel transcriptome. The data show that classification was correct (assignment equals annotation) in 87.2%, 96.8%, 92.0% and 88.7% of cases for A. thaliana (Ath), C. elegans (Cel), D. melanogaster (Dme) and H. sapiens (Hsa), respectively. We estimated amino-acid identity cutoffs for use with tBLASTn for all organisms. These cutoffs trim the same number of events as BLASTn does, in order to minimize false positives caused by sequence errors. The cutoffs found were 80%, 78%, 78% and 84% for Hsa, Dme, Cel and Ath, respectively. We then evaluated our system by comparing the KTL categories of the assigned ESTs with the KTL categories into which the ESTs were classified without the organism's KTL proteins. Moreover, we show the annotation potential of the KOG database and of the ESTs used. Supplementary Information can be found at: http://www.biodados.icb.ufmg.br
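The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the best tBLASTn hit per EST has already been extracted into a simple record. The `Hit` type, the function names and the choice to score only ESTs that still have a hit in the depleted database are all hypothetical. Only the cutoff values come from the abstract.

```python
from typing import Dict, NamedTuple, Optional


class Hit(NamedTuple):
    """Best tBLASTn hit for one EST (hypothetical record layout)."""
    category: str    # functional category of the matched KTL protein
    identity: float  # percent amino-acid identity of the alignment


# Amino-acid identity cutoffs (percent) reported in the abstract.
IDENTITY_CUTOFF = {"Hsa": 80.0, "Dme": 78.0, "Cel": 78.0, "Ath": 84.0}


def assign_annotation(self_hits: Dict[str, Hit], organism: str) -> Dict[str, str]:
    """Step 1: keep ESTs whose best hit against the full database
    meets the organism's identity cutoff; the hit's category becomes
    the EST's reference annotation."""
    cutoff = IDENTITY_CUTOFF[organism]
    return {est: hit.category
            for est, hit in self_hits.items()
            if hit.identity >= cutoff}


def classification_accuracy(annotation: Dict[str, str],
                            depleted_hits: Dict[str, Hit]) -> Optional[float]:
    """Step 2: compare each annotated EST with the category it receives
    when the organism's own proteins are removed from the database.
    Assumption: ESTs with no hit in the depleted database are skipped."""
    compared = [est for est in annotation if est in depleted_hits]
    if not compared:
        return None
    correct = sum(1 for est in compared
                  if depleted_hits[est].category == annotation[est])
    return correct / len(compared)
```

With real data, `self_hits` and `depleted_hits` would be built from two tBLASTn runs of the same ESTs, one against the full KTL protein set and one against the set without the query organism's proteins.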

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Alvarenga Mudado, M., Bravo-Neto, E., Ortega, J.M. (2005). Tests of Automatic Annotation Using KOG Proteins and ESTs from 4 Eukariotic Organisms. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_15

  • DOI: https://doi.org/10.1007/11532323_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28008-8

  • Online ISBN: 978-3-540-31861-3

  • eBook Packages: Computer Science, Computer Science (R0)