Abstract
Relational databases are sometimes used to store biomedical and patient data in large clinical or international projects. Such data is inherently deeply structured: records for individual patients contain a varying number of variables. When ad hoc access to data subsets is needed, standard database access tools do not allow for rapid command prototyping and variable selection to create flat data tables. In the context of Thalamoss, an international research project on β-thalassemia, we developed and experimented with an interactive variable selection method that addresses these needs. Our newly developed Python library sqlAutoDenorm.py automatically generates SQL commands to denormalize a subset of database tables and their relevant records, effectively generating a flat table from arbitrarily structured data. The denormalization process can be controlled by a small number of user-tunable parameters. Python and R/Bioconductor are used for any subsequent data processing steps, including visualization, and Weka is used for machine learning on the generated data.
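To make the idea of automatic denormalization concrete, the following is a minimal sketch and not the sqlAutoDenorm.py interface, which the abstract does not describe. It assumes an SQLite database with declared foreign keys and uses a breadth-first search over the foreign-key graph to find a chain of joins from a root table to each selected variable. The tables `patient`, `sample`, and `measurement`, and the helper names `foreign_key_graph`, `join_path`, and `denormalize_sql`, are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import deque


def foreign_key_graph(conn):
    """Build an undirected graph of tables linked by declared foreign keys."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    graph = {table: [] for table in tables}
    for child in tables:
        for fk in conn.execute(f'PRAGMA foreign_key_list("{child}")'):
            # PRAGMA columns: id, seq, table, from, to, ...
            parent, child_col, parent_col = fk[2], fk[3], fk[4]
            graph[child].append((parent, child_col, parent_col))
            graph[parent].append((child, parent_col, child_col))
    return graph


def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for neighbour, own_col, other_col in graph[table]:
            if neighbour not in previous:
                previous[neighbour] = (table, own_col, other_col)
                queue.append(neighbour)
    if goal not in previous:
        raise ValueError(f"no foreign-key path from {start} to {goal}")
    path, node = [], goal
    while previous[node] is not None:
        table, own_col, other_col = previous[node]
        path.append((table, own_col, node, other_col))
        node = table
    return list(reversed(path))


def denormalize_sql(graph, root, columns):
    """Generate one SELECT that flattens the chosen (table, column) pairs."""
    joined, joins = {root}, []
    for table, _ in columns:
        for left, left_col, right, right_col in join_path(graph, root, table):
            if right not in joined:
                joined.add(right)
                joins.append(
                    f"LEFT JOIN {right} ON {left}.{left_col} = {right}.{right_col}")
    select = ", ".join(f"{t}.{c} AS {t}_{c}" for t, c in columns)
    return f"SELECT {select} FROM {root} " + " ".join(joins)


# Hypothetical three-level schema: patient -> sample -> measurement.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample (id INTEGER PRIMARY KEY,
                         patient_id INTEGER REFERENCES patient(id));
    CREATE TABLE measurement (id INTEGER PRIMARY KEY,
                              sample_id INTEGER REFERENCES sample(id),
                              hb REAL);
    INSERT INTO patient VALUES (1, 'A'), (2, 'B');
    INSERT INTO sample VALUES (10, 1);
    INSERT INTO measurement VALUES (100, 10, 9.5);
""")

graph = foreign_key_graph(conn)
sql = denormalize_sql(graph, "patient",
                      [("patient", "name"), ("measurement", "hb")])
rows = conn.execute(sql).fetchall()
```

The sketch uses `LEFT JOIN` so that patients without any recorded measurement still appear in the flat table with empty values, which is one plausible way to handle records with a varying number of variables. Whether the actual library makes the same choice is not stated in the abstract.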
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Štefanič, S., Lexa, M. (2015). A Flexible Denormalization Technique for Data Analysis above a Deeply-Structured Relational Database: Biomedical Applications. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_12
DOI: https://doi.org/10.1007/978-3-319-16483-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science (R0)