Abstract

Medical research requires biological material and data of documented, trustworthy quality to deliver relevant and reproducible results. The quality management of biological samples for medical research has received considerable attention in recent years, resulting in well-documented and audited standard operating procedures and in standards for documenting various quality characteristics. Similar efforts are needed to establish systems, policies, and procedures that assure well-documented quality characteristics of data and metadata. We review the typical data and metadata quality characteristics and point to precise definitions of these properties. We present and discuss the requirements for managing these qualities and propose a process, with the necessary activities, for biobanks to establish such a holistic system for data quality management. Biobanks act as data producers, data providers, data mediators, and data repositories; they deal with data from various sources, and the personal health data they hold is highly sensitive. This complexity makes them a most interesting use case for data quality management that supports both known and unknown future demands.

This work has been supported by the Austrian Bundesministerium für Bildung, Wissenschaft und Forschung within the project BBMRI.AT (GZ 10.470/0010-V/3c/2018).

Author information

Corresponding author

Correspondence to Johann Eder.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Eder, J., Shekhovtsov, V.A. (2022). Managing the Quality of Data and Metadata for Biobanks. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore. https://doi.org/10.1007/978-981-19-8069-5_4

  • DOI: https://doi.org/10.1007/978-981-19-8069-5_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8068-8

  • Online ISBN: 978-981-19-8069-5

  • eBook Packages: Computer Science, Computer Science (R0)
