DOI: 10.1145/3173574.3173900
research-article

Balancing Privacy and Information Disclosure in Interactive Record Linkage with Visual Masking

Published: 21 April 2018

Abstract

Effective use of data involving personal or sensitive information often requires different people to have access to that information, which significantly reduces the privacy of those whose data is stored and increases the risk of identity theft, data leaks, or social engineering attacks. Our research studies the tradeoffs between privacy and utility of personal information for human decision making. Using a record-linkage scenario, this paper presents a controlled study of how varying degrees of information availability influence the ability to use personal information effectively. We compared the quality of human decision-making using a visual interface that controls the amount of personal information available and uses visual markup to highlight data discrepancies. With this interface, study participants who viewed only 30% of data content had decision quality similar to those who had full (100%) access. The results demonstrate that it is possible to greatly limit the amount of personal information available to human decision makers without negatively affecting utility or human effectiveness. However, the findings also show there is a limit to how much data can be hidden before the quality of judgment in decisions involving person-level data suffers. Despite the reduced accuracy with extreme data hiding, the study demonstrates that with proper interface design, many correct decisions can be made even with legally de-identified data that is fully masked (74.5% accuracy with fully masked data compared to 84.1% with full access). Thus, when legal requirements only allow for de-identified data access, use of a well-designed interface can significantly improve data utility.
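To make the idea of masking with discrepancy markup concrete, the following is a minimal Python sketch, not the interface evaluated in the paper. It assumes a simple position-by-position comparison of two field values; the names `mask_pair`, `disclosed_fraction`, `MASK`, and `GAP` are hypothetical, and the sketch further assumes that field values never contain the mask or gap symbols themselves.

```python
from itertools import zip_longest
from typing import Tuple

MASK = "*"  # hypothetical symbol shown in place of a hidden character
GAP = "_"   # hypothetical symbol for a position missing in the shorter value


def mask_pair(a: str, b: str) -> Tuple[str, str]:
    """Hide positions where two values agree; reveal positions where they differ."""
    out_a, out_b = [], []
    for ca, cb in zip_longest(a, b, fillvalue=None):
        if ca == cb:
            # Agreement carries no discrepancy information, so hide it.
            out_a.append(MASK)
            out_b.append(MASK)
        else:
            # Disagreement is what the reviewer needs to see.
            out_a.append(GAP if ca is None else ca)
            out_b.append(GAP if cb is None else cb)
    return "".join(out_a), "".join(out_b)


def disclosed_fraction(original: str, masked: str) -> float:
    """Share of the original characters that remain visible after masking."""
    if not original:
        return 0.0
    shown = sum(1 for o, m in zip(original, masked) if m == o and m != MASK)
    return shown / len(original)


if __name__ == "__main__":
    left, right = mask_pair("Jonathan", "Jonathon")
    print(left, right)                           # ******a* ******o*
    print(disclosed_fraction("Jonathan", left))  # 0.125
```

In this sketch a pair of identical values is fully masked, while a one-character discrepancy discloses only that character, which is one simple way to show less than the full record while still surfacing the differences a linkage decision depends on.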

Supplementary Material

suppl.mov (pn3007-file3.mp4)
Supplemental video

      Published In

      CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
      April 2018
      8489 pages
      ISBN: 9781450356206
      DOI: 10.1145/3173574
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Badges

      • Honorable Mention

      Author Tags

      1. human-computer interaction
      2. information privacy
      3. privacy-preserving interactive record linkage
      4. privacy-preserving visualization

      Funding Sources

      • Patient Centered Outcomes Research Institute (PCORI)

      Conference

      CHI '18

      Acceptance Rates

      CHI '18 Paper Acceptance Rate 666 of 2,590 submissions, 26%;
      Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

      Cited By

      • (2024) Advancing Privacy Research: A Novel Realistic Persona-Based Dataset. 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS), 1720-1725. DOI: 10.1109/ICETSIS61505.2024.10459555. Online publication date: 28-Jan-2024
      • (2024) Pattern Masking for Dictionary Matching: Theory and Practice. Algorithmica 86:6, 1948-1978. DOI: 10.1007/s00453-024-01213-8. Online publication date: 1-Jun-2024
      • (2023) Co-Speculating on Dark Scenarios and Unintended Consequences of a Ubiquitous(ly) Augmented Reality. Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2392-2407. DOI: 10.1145/3563657.3596073. Online publication date: 10-Jul-2023
      • (2021) US Privacy Laws Go Against Public Preferences and Impede Public Health and Research: Survey Study. Journal of Medical Internet Research 23:7, e25266. DOI: 10.2196/25266. Online publication date: 5-Jul-2021
      • (2020) Communicating With Patients About Software for Enhancing Privacy in Secondary Database Research Involving Record Linkage: Delphi Study. Journal of Medical Internet Research 22:12, e20783. DOI: 10.2196/20783. Online publication date: 15-Dec-2020
      • (2020) CIDACS-RL: a novel indexing search and scoring-based record linkage system for huge datasets with high accuracy and scalability. BMC Medical Informatics and Decision Making 20:1. DOI: 10.1186/s12911-020-01285-w. Online publication date: 9-Nov-2020
      • (2020) Privacy‐Preserving Data Visualization: Reflections on the State of the Art and Research Opportunities. Computer Graphics Forum 39:3, 675-692. DOI: 10.1111/cgf.14032. Online publication date: 18-Jul-2020
      • (2019) Enhancing privacy through an interactive on-demand incremental information disclosure interface. Proceedings of the Fifteenth USENIX Conference on Usable Privacy and Security, 175-189. DOI: 10.5555/3361476.3361489. Online publication date: 12-Aug-2019
      • (2019) PRIMAT. Proceedings of the VLDB Endowment 12:12, 1826-1829. DOI: 10.14778/3352063.3352076. Online publication date: 1-Aug-2019
      • (2019) Patient perspectives on the linkage of health data for research: Insights from an online patient community questionnaire. International Journal of Medical Informatics 127, 9-17. DOI: 10.1016/j.ijmedinf.2019.04.003. Online publication date: Jul-2019
