Data Integration between Swedish National Clinical Health Registries and Biobanks Using an Availability System

  • Conference paper
Data Integration in the Life Sciences (DILS 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8574)

Abstract

Linking biobank data, such as molecular profiles, with clinical phenotypes is of great importance in epidemiological and predictive studies. A comprehensive overview of the data sources that can be combined to power a study is a key factor in its design. Clinical data stored in health registries and biobank data in research projects are commonly provisioned in different database systems and governed by separate organizations, which makes integration challenging and hampers biomedical investigations. Here we describe the integration of data on prostate cancer from a clinical health registry with data from a biobank, and its provisioning in the SAIL availability system. We demonstrate the implications of using the actual raw data, data transformed to availability data, and availability data that has been subjected to anonymization techniques to reduce the risk of re-identification. Our results show that an availability system such as SAIL with integrated clinical and biobank data can be a valuable tool for planning new studies and finding interesting subsets to investigate further. We also show that an availability system can deliver useful insights even when the data has been subjected to anonymization techniques.
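
The distinction the abstract draws between raw data, availability data, and anonymized availability data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the sample records, and the helper functions are hypothetical and are not taken from SAIL or from the registry, and small-cell suppression is used here only as a simple stand-in for the anonymization techniques the paper evaluates.

```python
from collections import Counter

# Hypothetical raw records; identifiers, fields, and values are invented.
raw_records = [
    {"id": "p1", "psa": 6.2, "gleason": 7, "dna_sample": "tube-17"},
    {"id": "p2", "psa": None, "gleason": 6, "dna_sample": None},
    {"id": "p3", "psa": 11.0, "gleason": None, "dna_sample": "tube-42"},
    {"id": "p4", "psa": 4.8, "gleason": 7, "dna_sample": "tube-51"},
]

FIELDS = ("psa", "gleason", "dna_sample")


def to_availability(record):
    """Replace each value with a flag stating only whether a value exists."""
    return {field: record[field] is not None for field in FIELDS}


def availability_counts(records, min_cell_size=2):
    """Count individuals per availability pattern, suppressing small cells.

    A pattern shared by fewer than `min_cell_size` individuals is reported
    as None instead of an exact count, so rare combinations are hidden.
    """
    patterns = Counter(
        tuple(to_availability(r)[f] for f in FIELDS) for r in records
    )
    return {
        pattern: (count if count >= min_cell_size else None)
        for pattern, count in patterns.items()
    }


counts = availability_counts(raw_records)
for pattern, count in counts.items():
    print(dict(zip(FIELDS, pattern)), "->", count)
```

In this sketch a planner can still see how many individuals have all three items available, while patterns held by a single individual are suppressed. The threshold `min_cell_size` is an arbitrary choice for the example, not a value from the paper.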

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Spjuth, O., Heikkinen, J., Litton, JE., Palmgren, J., Krestyaninova, M. (2014). Data Integration between Swedish National Clinical Health Registries and Biobanks Using an Availability System. In: Galhardas, H., Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2014. Lecture Notes in Computer Science, vol 8574. Springer, Cham. https://doi.org/10.1007/978-3-319-08590-6_3

  • DOI: https://doi.org/10.1007/978-3-319-08590-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08589-0

  • Online ISBN: 978-3-319-08590-6

  • eBook Packages: Computer Science, Computer Science (R0)
