A Flexible Denormalization Technique for Data Analysis above a Deeply-Structured Relational Database: Biomedical Applications

Štefanič, Stanislav; Lexa, Matej

doi:10.1007/978-3-319-16483-0_12

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9043)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Relational databases are sometimes used to store biomedical and patient data in large clinical or international projects. Such data is inherently deeply structured, and records for individual patients contain varying numbers of variables. When ad hoc access to data subsets is needed, standard database access tools do not allow rapid command prototyping and variable selection to create flat data tables. In the context of Thalamoss, an international research project on β-thalassemia, we developed and experimented with an interactive variable selection method that addresses these needs. Our newly developed Python library sqlAutoDenorm.py automatically generates SQL commands to denormalize a subset of database tables and their relevant records, effectively generating a flat table from arbitrarily structured data. The denormalization process can be controlled by a small number of user-tunable parameters. Python and R/Bioconductor are used for subsequent data processing steps, including visualization, and Weka is used for machine learning on the generated data.
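The idea of generating a denormalizing SQL command from a selection of variables can be illustrated with a minimal sketch. It is not the sqlAutoDenorm.py API, which the abstract does not describe. The schema, the toy records, and the function names (`build_graph`, `join_path`, `denormalize_sql`) are all hypothetical. The sketch assumes the foreign keys are known in advance, and it uses a plain breadth-first search as a stand-in for whatever path-finding the library actually uses to connect tables.

```python
import sqlite3
from collections import deque

# Hypothetical schema: (child_table, child_column, parent_table, parent_column).
FOREIGN_KEYS = [
    ("visit", "patient_id", "patient", "id"),
    ("measurement", "visit_id", "visit", "id"),
]

def build_graph(foreign_keys):
    """Undirected adjacency map: table -> list of (neighbour, join condition)."""
    graph = {}
    for child, ccol, parent, pcol in foreign_keys:
        cond = f"{child}.{ccol} = {parent}.{pcol}"
        graph.setdefault(child, []).append((parent, cond))
        graph.setdefault(parent, []).append((child, cond))
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for neighbour, cond in graph.get(table, []):
            if neighbour not in came_from:
                came_from[neighbour] = (table, cond)
                queue.append(neighbour)
    if goal not in came_from:
        raise ValueError(f"no join path from {start} to {goal}")
    steps, node = [], goal
    while came_from[node] is not None:
        prev, cond = came_from[node]
        steps.append((node, cond))
        node = prev
    return list(reversed(steps))

def denormalize_sql(graph, root, columns):
    """Build one SELECT that flattens the chosen 'table.column' variables."""
    joins = {}  # dicts keep insertion order, so joins stay in path order
    for col in columns:
        table = col.split(".")[0]
        if table != root:
            for joined, cond in join_path(graph, root, table):
                joins.setdefault(joined, cond)
    select = ", ".join(f'{c} AS "{c}"' for c in columns)
    sql = f"SELECT {select} FROM {root}"
    for joined, cond in joins.items():
        sql += f" LEFT JOIN {joined} ON {cond}"
    return sql

# Toy in-memory database with made-up records, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT);
    CREATE TABLE measurement (id INTEGER PRIMARY KEY, visit_id INTEGER,
                              variable TEXT, reading REAL);
    INSERT INTO patient VALUES (1, 'P001'), (2, 'P002');
    INSERT INTO visit VALUES (10, 1, '2014-01-15');
    INSERT INTO measurement VALUES (100, 10, 'HbF', 4.2), (101, 10, 'HbA2', 5.1);
""")

graph = build_graph(FOREIGN_KEYS)
sql = denormalize_sql(
    graph, "patient",
    ["patient.name", "measurement.variable", "measurement.reading"],
)
rows = conn.execute(sql).fetchall()
```

The sketch uses LEFT JOIN so that patients with no visits or measurements still appear in the flat table, with NULL in the missing columns. That is one way to handle records with a varying number of variables. The result is in long format (one row per measurement), so pivoting to one row per patient would be a further step.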

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Stanislav Štefanič & Matej Lexa

Editor information

Editors and Affiliations

Dpto. de Arquitectura y Tecnología de Computadores (ATC), E.T.S. de Ingenierías en Informática y Telecomunicación, CITIC-UGR, Universidad de Granada, c/ Periodista Daniel Saucedo Aranda s/n, 18071, Granada, Spain
Francisco Ortuño
E.T.S. Ingenierías Informática y de Telecomunicación, Dpto. Arquitectura y Tecnología de Computadores, CITIC-UGR, Universidad de Granada, C Periodista Rafael Gómez Montero, 18071, Granada, Spain
Ignacio Rojas

About this paper

Cite this paper

Štefanič, S., Lexa, M. (2015). A Flexible Denormalization Technique for Data Analysis above a Deeply-Structured Relational Database: Biomedical Applications. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_12

DOI: https://doi.org/10.1007/978-3-319-16483-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science, Computer Science (R0)