
Measuring Data Completeness for Microbial Genomics Database

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7802)

Abstract

Poor-quality data, such as data with missing values or records, has negative consequences in many application domains. Completeness is an important aspect of data quality, and one completeness problem is that of individuals missing from a data set, where individuals are the real-world entities whose information is recorded. Completeness studies, however, have so far said little about how missing individuals are assessed. In this paper, we propose the notion of population-based completeness (PBC), which addresses the missing-individuals problem. Our aim is to investigate what is required to measure PBC and to identify what is needed to support PBC measurements in practice. We explore the need for PBC in microbial genomics, using real sample data sets retrieved from a microbial database called Comprehensive Microbial Resources (CMR).
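The abstract does not state how PBC is computed. One simple reading, offered here only as an illustrative sketch, treats it as the fraction of a reference population whose individuals appear in the data set. The function name, the ratio itself, and the genome identifiers below are assumptions made for illustration; they are not taken from the paper or from CMR.

```python
def population_based_completeness(dataset_ids, reference_population_ids):
    """Return the fraction of the reference population present in the data set.

    Assumption: completeness is modelled as |D ∩ RP| / |RP|, where D is the
    set of individuals in the data set and RP is the reference population.
    """
    reference = set(reference_population_ids)
    if not reference:
        raise ValueError("reference population must not be empty")
    # Duplicates and individuals outside the reference population are ignored.
    present = set(dataset_ids) & reference
    return len(present) / len(reference)


# Hypothetical identifiers, for illustration only.
reference_population = {"genome_A", "genome_B", "genome_C", "genome_D"}
dataset = ["genome_A", "genome_C", "genome_C", "genome_X"]

score = population_based_completeness(dataset, reference_population)
missing = sorted(reference_population - set(dataset))

print(f"PBC = {score:.2f}")
print("Missing individuals:", missing)
```

Under this reading, the measurement depends on having a trustworthy reference population to compare against, which is in line with the abstract's question of what is needed to support PBC measurements in practice.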




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Emran, N.A., Embury, S., Missier, P., Isa, M.N.M., Muda, A.K. (2013). Measuring Data Completeness for Microbial Genomics Database. In: Selamat, A., Nguyen, N.T., Haron, H. (eds) Intelligent Information and Database Systems. ACIIDS 2013. Lecture Notes in Computer Science, vol 7802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36546-1_20

  • DOI: https://doi.org/10.1007/978-3-642-36546-1_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36545-4

  • Online ISBN: 978-3-642-36546-1

  • eBook Packages: Computer Science (R0)
