Systematic Literature Review on the Anonymization of High Dimensional Streaming Datasets for Health Data Sharing

https://doi.org/10.1016/j.procs.2015.08.353
Under a Creative Commons license
open access

Abstract

One of the biggest challenges to health data sharing is regulations that prohibit the transmission and distribution of Personal Health Information (PHI), even among collaborating organizations. This impedes research and reduces the utility of these datasets. Anonymization can address this issue by hiding PHI while maintaining the analytical utility of the data. Much research has focused on data that is static, independent and complete. Unfortunately, this is not typical of health data. Instead of static, independent tables, health data typically resides in relational databases with multiple high-dimensional tables that are transactional and constantly changing. Data recipients usually receive multiple versions of the database over time. This study reviews literature on anonymization methodologies for large, fast-changing, high-dimensional datasets, especially health data. Relevant papers are analyzed, categorized and compared in terms of scope and contributions. Finally, we use the details extracted from our analysis to outline possible research directions for developing a realistic anonymization framework for health data sharing.

Keywords

privacy
personal health information
high-dimensional datasets
streaming databases
data sharing

Peer-review under responsibility of the Program Chairs.