A Flexible Denormalization Technique for Data Analysis above a Deeply-Structured Relational Database: Biomedical Applications

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9043)

Abstract

Relational databases are sometimes used to store biomedical and patient data in large clinical or international projects. Such data are inherently deeply structured, and records for individual patients contain a varying number of variables. When ad hoc access to data subsets is needed, standard database access tools do not allow for rapid command prototyping and variable selection to create flat data tables. In the context of Thalamoss, an international research project on β-thalassemia, we developed and experimented with an interactive variable selection method addressing these needs. Our newly developed Python library sqlAutoDenorm.py automatically generates SQL commands to denormalize a subset of database tables and their relevant records, effectively generating a flat table from arbitrarily structured data. The denormalization process can be controlled by a small number of user-tunable parameters. Python and R/Bioconductor are used for any subsequent data processing steps, including visualization, and Weka is used for machine learning on the generated data.
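The idea of generating a denormalizing SQL command automatically can be illustrated with a minimal Python sketch. It is not the sqlAutoDenorm.py API, which the abstract does not describe. The schema, the toy records, and the functions `join_path` and `denormalize_sql` are all hypothetical, and the use of breadth-first search to find the shortest chain of foreign-key links is an assumption made for illustration.

```python
import sqlite3
from collections import deque

# Hypothetical toy schema: (child_table, fk_column, parent_table, parent_key)
FOREIGN_KEYS = [
    ("sample", "patient_id", "patient", "id"),
    ("measurement", "sample_id", "sample", "id"),
]

def join_path(start, goal, fks):
    """Breadth-first search for the shortest chain of foreign-key links."""
    adj = {}  # undirected adjacency list; each edge carries its join condition
    for child, fk, parent, pk in fks:
        cond = f"{child}.{fk} = {parent}.{pk}"
        adj.setdefault(child, []).append((parent, cond))
        adj.setdefault(parent, []).append((child, cond))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    raise ValueError(f"no foreign-key path from {start} to {goal}")

def denormalize_sql(root, columns, fks):
    """Build one SELECT that flattens the chosen "table.column" names around root."""
    joined, joins = {root}, []
    for col in columns:
        table = col.split(".")[0]
        for nxt, cond in join_path(root, table, fks):
            if nxt not in joined:  # join each intermediate table only once
                joined.add(nxt)
                joins.append(f"LEFT JOIN {nxt} ON {cond}")
    select = ", ".join(f'{c} AS "{c.replace(".", "_")}"' for c in columns)
    return f"SELECT {select} FROM {root} " + " ".join(joins)

# Made-up records, used only to show the shape of the flat result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE sample (id INTEGER PRIMARY KEY, patient_id INTEGER, tissue TEXT);
CREATE TABLE measurement (id INTEGER PRIMARY KEY, sample_id INTEGER,
                          name TEXT, value REAL);
INSERT INTO patient VALUES (1, 'P1'), (2, 'P2');
INSERT INTO sample VALUES (10, 1, 'blood');
INSERT INTO measurement VALUES (100, 10, 'marker_a', 2.5), (101, 10, 'marker_b', 3.1);
""")

sql = denormalize_sql(
    "patient", ["patient.code", "measurement.name", "measurement.value"], FOREIGN_KEYS
)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

In this sketch the user selects only the variables of interest; the intermediate `sample` table is joined automatically because it lies on the path between `patient` and `measurement`. The `LEFT JOIN` keeps patients that have no measurements, which is one way to handle records with a varying number of variables.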

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Štefanič, S., Lexa, M. (2015). A Flexible Denormalization Technique for Data Analysis above a Deeply-Structured Relational Database: Biomedical Applications. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_12

  • DOI: https://doi.org/10.1007/978-3-319-16483-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16482-3

  • Online ISBN: 978-3-319-16483-0

  • eBook Packages: Computer Science, Computer Science (R0)
