Abstract
The rapid growth of life science databases demands the fusion of knowledge from heterogeneous sources to answer complex biological questions. However, discrepancies in nomenclature, differing schemas, and incompatible formats result in a significant lack of interoperability among biological databases. Data preparation is therefore a key prerequisite for biological database mining, and integrating diverse biological molecular databases is essential for coping with their heterogeneity and supporting efficient data mining. Inconsistency among the databases, however, remains a key obstacle to data integration. This paper proposes a framework that uses ontologies to detect inconsistency in biological databases. A numeric estimate measures the degree of inconsistency and identifies those databases that are appropriate for further mining applications. This helps improve database quality and supports accurate and efficient mining of biological databases.
Additional information
Responsible editors: Shichao Zhang and M. J. Zaki.
About this article
Cite this article
Chen, Q., Chen, YP.P. & Zhang, C. Detecting inconsistency in biological molecular databases using ontologies. Data Min Knowl Disc 15, 275–296 (2007). https://doi.org/10.1007/s10618-007-0071-0
DOI: https://doi.org/10.1007/s10618-007-0071-0