Abstract
BLAST homology searches are widely used to assign function to novel sequences. Secondary databases such as KOG can serve this purpose because their sequences carry a functional classification. We devised an experiment in which public ESTs from four eukaryotic organisms whose protein sequences are present in the KOG database are classified into functional KOG categories using tBLASTn. First we assigned the ESTs from one organism to KTL (KOG, TWOG and LSEs) proteins; then we searched the database depleted of that organism's proteins to simulate a novel transcriptome. The classification was correct (assignment equals annotation) in 87.2%, 96.8%, 92.0% and 88.7% of cases for A. thaliana (Ath), C. elegans (Cel), D. melanogaster (Dme) and H. sapiens (Hsa), respectively. We also estimated, for each organism, an amino-acid identity cutoff for use with tBLASTn. These cutoffs discard the same number of hits as a BLASTn search would, in order to minimize false positives caused by sequence errors. The cutoffs found were 80%, 78%, 78% and 84% for Hsa, Dme, Cel and Ath, respectively. We then evaluated our system by comparing the KTL categories of the assigned ESTs with the KTL categories into which the same ESTs were classified when the organism's KTL proteins were absent. Moreover, we show the annotation potential of the KOG database and of the ESTs used. Supplementary information can be found at: http://www.biodados.icb.ufmg.br
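The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function names `best_hits` and `fraction_correct`, the use of bit score to pick the best hit, and the choice to apply the identity cutoff only to the reference assignment are all assumptions made for illustration. The sketch takes tBLASTn hits that have already been parsed, assigns each EST to its best protein hit with and without the source organism's proteins, and reports the fraction of ESTs whose functional category agrees.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """One parsed tBLASTn hit (hypothetical record layout)."""
    est_id: str
    protein_id: str
    organism: str    # organism code of the protein, e.g. "Hsa"
    identity: float  # percent amino-acid identity
    bitscore: float


def best_hits(hits: Iterable[Hit],
              min_identity: float = 0.0,
              exclude_organism: Optional[str] = None) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per EST.

    Hits below min_identity are dropped, and hits to proteins of
    exclude_organism are ignored to mimic a depleted database.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue
        if hit.organism == exclude_organism:
            continue
        current = best.get(hit.est_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.est_id] = hit
    return best


def fraction_correct(reference: Dict[str, Hit],
                     simulated: Dict[str, Hit],
                     category_of: Dict[str, str]) -> float:
    """Fraction of ESTs present in both assignments whose category agrees."""
    shared = [est for est in reference if est in simulated]
    if not shared:
        return 0.0
    agree = sum(
        category_of[reference[est].protein_id]
        == category_of[simulated[est].protein_id]
        for est in shared
    )
    return agree / len(shared)
```

For a human EST set, one would call `best_hits(hits, min_identity=80.0)` to obtain the reference assignment and `best_hits(hits, exclude_organism="Hsa")` to simulate a novel transcriptome, then pass both to `fraction_correct` together with a protein-to-category mapping.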
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Alvarenga Mudado, M., Bravo-Neto, E., Ortega, J.M. (2005). Tests of Automatic Annotation Using KOG Proteins and ESTs from 4 Eukariotic Organisms. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_15
DOI: https://doi.org/10.1007/11532323_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28008-8
Online ISBN: 978-3-540-31861-3
eBook Packages: Computer Science (R0)