Abstract
Linking biobank data, such as molecular profiles, with clinical phenotypes is of great importance in epidemiological and predictive studies. A comprehensive overview of the data sources that can be combined to increase the power of a study is a key factor in study design. Clinical data stored in health registries and biobank data from research projects are commonly held in different database systems and governed by separate organizations, which makes integration challenging and hampers biomedical investigations. Here we describe the integration of prostate cancer data from a clinical health registry with data from a biobank, and its provisioning in the SAIL availability system. We demonstrate the implications of using the actual raw data, data transformed to availability data, and availability data that has been subjected to anonymization techniques to reduce the risk of re-identification. Our results show that an availability system such as SAIL, with integrated clinical and biobank data, can be a valuable tool for planning new studies and finding interesting subsets to investigate further. We also show that an availability system can deliver useful insights even when the data has been subjected to anonymization techniques.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Spjuth, O., Heikkinen, J., Litton, JE., Palmgren, J., Krestyaninova, M. (2014). Data Integration between Swedish National Clinical Health Registries and Biobanks Using an Availability System. In: Galhardas, H., Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2014. Lecture Notes in Computer Science, vol 8574. Springer, Cham. https://doi.org/10.1007/978-3-319-08590-6_3
DOI: https://doi.org/10.1007/978-3-319-08590-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08589-0
Online ISBN: 978-3-319-08590-6
eBook Packages: Computer Science (R0)