Big genetic data and its big data protection challenges
Introduction
The use of genetic data in research has been undergoing a fundamental shift. Researchers are no longer restricted to working with relatively small samples of individual genomes (for example DNA relating to a gene known to affect disease aetiology) but now work with various markers scattered across the entire genome. This type of data is used in various areas of research, including efforts to discover new disease variants or to increase understanding of evolutionary processes. The fields of bioinformatics and computational genetics have evolved, inter alia, to allow researchers to focus on detailed ‘high-depth’ sequencing of the entire genome of individuals, made possible by advances in genome sequencing technology and computing power. These advances mean that an individual's genome can be sequenced relatively quickly and cheaply (costing less than an MRI scan in a local hospital). Powerful software has furthermore been developed to analyse such genome-wide sequences (GWSs). The research potential of such techniques has been complemented by the ability to share and combine GWS data with a range of potentially complementary data sets (e.g. electronic health records). These developments have ushered in a world of ‘big data genomics’ where researchers carry out complex data mining operations on the entire genomes of individuals and groups of individuals.
Whilst these developments promise to permit great leaps forward in our understanding of the human genome and its relationship to various important issues (not least to human disease), they also pose new risks in terms of privacy-related harms. These include harms not only to the individuals providing the genetic samples in question but even to those who may be related to them.1 Complying with laws relating to privacy, and in particular to data protection, will therefore be a serious issue for researchers conducting research on large samples of genetic data. This article aims to illustrate a number of these issues, highlighting some of the major challenges that the data protection framework poses for researchers active in the use of big genetic data.2 It will focus on compliance with the EU’s new General Data Protection Regulation (GDPR), which comes into effect across the EU from May 2018. In doing so, this paper will use several prominent examples from documented research practice in the area of computational genetics. The authors will illustrate how common practices in this area may be difficult to reconcile with the key pillars of data protection, including the need to have a valid legal ground for processing personal data, the need to respect data processing principles and the need to facilitate data protection rights. As this paper suggests, such burdens may mean that compliance with the EU’s data protection regime (including under the GDPR) may not only be cumbersome but may, in many cases, be difficult even to envisage given the aims of big genetic data processing for research.
Section 2 of this paper will briefly introduce the concept of ‘big genetic data’ and discuss how researchers can use it. Sections 3 and 4 will look at how, given the nature of modern computational genetics, genetic data used in research is likely not only to be of a personal nature (i.e. rarely anonymous) but also to be categorised as ‘sensitive’ or ‘special’ data. Section 5 will look at how the need to respect data processing principles will present difficulties for researchers involved in computational genetics. Section 6 will look at the issue of data protection impact assessments, something that will be obligatory (and potentially onerous) for many forms of research given the sensitive (or special) nature of genetic data. Section 7 will analyse how the need to facilitate data subject rights may create major obstacles for researchers involved in the use of big genetic data. The issues surrounding the use of both consent and the scientific research exception as a legal base for processing will be discussed in Sections 8 and 9 respectively. The requirements of each may mean that on many occasions the latter is more suitable, though, as Section 9 discusses, researchers (including in areas of computational genetics) may have difficulty convincing ethics committees of this, presenting further problems for research in this area.
Big genetic data and its use in research
Genetic data originates from human tissue or other biological samples. These range from blood, saliva and urine samples taken from individuals to tissues taken from cadavers in ancient DNA studies to soil, water and rock samples in environmental DNA studies.3
It is becoming easier to link genetic data to specific individuals
Personal data is data that can likely be linked to an identifiable individual. Data that cannot be linked to an individual is not personal data and is not governed by the EU data protection framework.9 Consequently, those involved in processing such data will not have to comply with its requirements. Where possible, researchers have in the past tended to claim that genetic data was not personal data in order to avoid the need for compliance with data protection regulations. This
Personal genetic data is always sensitive data
Personal data that is sensitive in nature attracts a higher regulatory burden than non-sensitive data. The legal situation concerning genetic data is in a state of flux. This is because the GDPR explicitly describes genetic data as ‘special’ (i.e. sensitive) data.32 This was not the case with Directive 95/46/EC, which did not define what genetic data was or what legal value it had. The Article 29 Working Party opinion on genetic data33
Data processing principles cannot be consented away
The data protection principles contained within the data protection framework are of crucial importance given that, in general, they must be adhered to in all cases of processing of personal data.45 It is not possible, for example, for individuals to consent away the need to adhere to the data protection principles. Requirements such as accuracy, purpose
The need for an impact assessment
One of the novel requirements of the GDPR is the need to perform a ‘Data Protection Impact Assessment’ (DPIA) in a number of circumstances where the proposed processing may “represent a high risk to the rights and freedoms of natural persons”.59 The GDPR does not exhaustively describe all the situations where a data protection impact assessment is required but does describe certain occasions where it shall be required, including situations that require “processing on a large
The need to facilitate data subject rights
Data subject rights allow data subjects to ensure that their data is being processed both fairly and lawfully and, in a number of situations, to exercise a level of autonomy over the processing of their personal data.65
Researchers have a choice of legal base
A sine qua non for the processing of personal data is the existence of a legal basis for processing given its context and purpose. As with its predecessor, the GDPR sets out an (expanded) number of potential legal bases that can be used to justify the processing of personal data.82
An alternative to consent as legal basis
In addition to ‘explicit consent’, another potentially relevant legal base is where such processing may be in the “public interest”.101 This provision has thus far been used by Member States in their transposition of Directive 95/46/EC (and in other legislation) to permit processing of sensitive data for a range of purposes, including for scientific research.102
The critical role of ethics bodies
Despite the clear existence of a legal ground for the processing of sensitive data for research purposes that does not require consent, regulatory authorities and ethics bodies have, in many cases, been reluctant to use this option, preferring to insist that researchers obtain consent or use anonymised data.119
Conclusion
Computational genetics is undergoing a revolution. A number of developments have fuelled this revolution. Chief amongst these is the increasing ability to produce GWSs rapidly and at low cost. These can be mined repeatedly because of increases in computing power. The possibility of accessing and sharing various forms of potentially compatible information throughout the online-connected world has not only allowed for more research opportunities but also changed the way we view genetic data in