Detecting inconsistency in biological molecular databases using ontologies

Data Mining and Knowledge Discovery

Abstract

The rapid growth of life science databases demands the fusion of knowledge from heterogeneous sources to answer complex biological questions. However, discrepancies in nomenclature, differing schemas and incompatible formats result in a significant lack of interoperability among biological databases. Data preparation is therefore a key prerequisite for biological database mining. Integrating diverse biological molecular databases is essential to cope with this heterogeneity and to ensure efficient data mining, but inconsistency among the databases is a key obstacle to integration. This paper proposes a framework that uses ontologies to detect inconsistency in biological databases. A numeric estimate measures the inconsistency and identifies those databases that are appropriate for further mining applications. This helps to enhance the quality of the databases and to ensure accurate and efficient mining of biological databases.

Author information

Corresponding author

Correspondence to Qingfeng Chen.

Additional information

Responsible editors: Shichao Zhang and M. J. Zaki.

About this article

Cite this article

Chen, Q., Chen, YP.P. & Zhang, C. Detecting inconsistency in biological molecular databases using ontologies. Data Min Knowl Disc 15, 275–296 (2007). https://doi.org/10.1007/s10618-007-0071-0

  • DOI: https://doi.org/10.1007/s10618-007-0071-0
