Abstract
This work compares protein information on a genomic scale. The main goal is to improve the quality and interpretation of biological data, as well as our understanding of biological systems and their interactions. Stringent comparisons were obtained by applying the Smith-Waterman algorithm pairwise to all predicted proteins encoded in both completely sequenced and unfinished genomes available in the public RefSeq database. The comparisons were run on a computational grid, and the complete result set exceeds 900 GB. Database design is therefore a critical step in storing and managing the comparison results. This paper describes conceptual design issues in creating a database that represents a data set of protein sequence cross-comparisons. We show that our conceptual schema and its relational mapping enable users to extract relevant information through queries ranging from simple to complex, including queries that integrate distinct data sources.
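The sketch below illustrates the two ideas in the abstract: scoring every pair of proteins with the Smith-Waterman local-alignment recurrence, and mapping those pairwise results onto relational tables. It is a minimal sketch under stated assumptions, not the authors' pipeline or schema. The scoring is a toy scheme (+2 match, -1 mismatch, -2 per gap position) rather than a protein substitution matrix with affine gaps, and the `protein` and `alignment` tables, the `store_all_pairs` helper, and its `min_score` threshold are all hypothetical.

```python
import itertools
import sqlite3


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b (Smith-Waterman, linear gaps).

    Toy scoring (assumption): +match for identical residues, +mismatch
    otherwise, +gap per gap position. Protein tools normally use a
    substitution matrix such as BLOSUM62 and affine gap penalties.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # row i of H; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = cur[j - 1] + gap
            # Flooring at zero is what makes the alignment local.
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


def store_all_pairs(conn: sqlite3.Connection, proteins: dict[str, str],
                    min_score: int = 0) -> None:
    """Align every unordered protein pair and keep scores above min_score.

    Hypothetical tables and threshold, used only to illustrate mapping
    pairwise comparison results onto relations.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS protein ("
                 "id TEXT PRIMARY KEY, sequence TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS alignment ("
                 "query_id TEXT NOT NULL REFERENCES protein(id), "
                 "subject_id TEXT NOT NULL REFERENCES protein(id), "
                 "score INTEGER NOT NULL, "
                 "PRIMARY KEY (query_id, subject_id))")
    conn.executemany("INSERT INTO protein (id, sequence) VALUES (?, ?)",
                     proteins.items())
    # n proteins give n * (n - 1) / 2 unordered pairs: quadratic growth.
    for (qid, qseq), (sid, sseq) in itertools.combinations(proteins.items(), 2):
        score = smith_waterman_score(qseq, sseq)
        if score > min_score:
            conn.execute("INSERT INTO alignment (query_id, subject_id, score) "
                         "VALUES (?, ?, ?)", (qid, sid, score))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_all_pairs(conn, {"P1": "MKTAY", "P2": "MKTAY", "P3": "GGGGY"})
    for row in conn.execute("SELECT query_id, subject_id, score "
                            "FROM alignment ORDER BY score DESC, query_id"):
        print(row)
```

Because the number of pairs grows quadratically with the number of proteins, even modest per-pair records add up quickly at genome scale, which is why the storage design matters as much as the alignment step itself.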
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lifschitz, S. et al. (2012). Design and Implementation of ProteinWorldDB. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_13
DOI: https://doi.org/10.1007/978-3-642-31927-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31926-6
Online ISBN: 978-3-642-31927-3
eBook Packages: Computer Science (R0)