Plausible Deniability

  • Conference paper
  • First Online:
Privacy in Statistical Databases (PSD 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12276)

Included in the conference series: Privacy in Statistical Databases

Abstract

From the perspective of responsible data release, simulation is a useful tool for estimating risk from adversaries with an unknown amount of identified auxiliary information. We present a simple approach to simulating attacks on sampled datasets, along with an implementation, and demonstrate how a data steward might use it to evaluate the privacy risk of releasing data gathered about students in the University of California system.


Notes

  1.

    The project repository is https://codeberg.org/bavajadas.de.benadam/PrivacySim.

  2.

    We use the terms ‘attacker’ and ‘intruder’ interchangeably throughout.

  3.

    In communications anonymity, a similar point is made by observing that “usability is a security property”: increased adoption of an anonymity system increases the total set of individuals that an attacker must individuate (as well as those individuals’ diversity, though that is a separate point). See Serjantov et al. (2003), and the discussion of “degree of anonymity” in Berthold et al. (2001).
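
    As a toy illustration (ours, not from the cited works), the entropy-based “degree of anonymity” can be computed directly; it is maximal when the attacker’s distribution over candidates is uniform, and a larger population raises the ceiling log2(N):

        import math

        def degree_of_anonymity(probs):
            # Normalized Shannon entropy of the attacker's distribution over
            # candidate individuals; 1.0 means the attacker has learned nothing.
            h = -sum(p * math.log2(p) for p in probs if p > 0)
            h_max = math.log2(len(probs))
            return h / h_max if h_max > 0 else 0.0

        print(degree_of_anonymity([0.25] * 4))            # 1.0: uniform over 4 users
        print(degree_of_anonymity([0.7, 0.1, 0.1, 0.1]))  # ~0.68: partial knowledge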

  4.

    The insights in this paper can also be extended to attacks that seek attribute disclosure without unique reidentification of a single research subject.

  5.

    We could add to this model the harm that could arise from incorrect matches that the attacker and others presume to be accurate, but as attention to the likelihood of false matches increases, this problem should diminish. (Put differently, every form of deidentification runs the risk that a careless or fraudulent intruder might claim to have reidentified a research subject when the chance that they have actually done so is very low.)

  6.

    This interpretation might be generalized further to include cases in which groups of one kind or another share low plausible deniability.

  7.

    The assumptions need not be uniform, either. A data steward can assign a greater likelihood of accessible auxiliary information to a data subject who is likely to be targeted, such as a Governor, or to members of a vulnerable group who face special harms from a successful attack.
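
    A minimal sketch of such a non-uniform assignment (hypothetical field names, not the repository’s API):

        def aux_info_probability(record, base_rate=0.05):
            # Probability that an attacker holds identified auxiliary information
            # about this subject; raised for likely targets and for members of
            # vulnerable groups who face special harms.
            p = base_rate
            if record.get("public_figure"):
                p = max(p, 0.90)
            if record.get("vulnerable_group"):
                p = max(p, 0.50)
            return p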

  8.

    Rocher et al. (2019) estimated that there was a 23% chance the match was wrong. Ironically, this is the same study that the New York Times described under the misleading headline “Your Data Were ‘Anonymized’? These Scientists Can Still Identify You.”

  9.

    Nayak et al. (2016) assess the problem with formal privacy measures, like “differential privacy,” concluding that “… for developing practical disclosure control goals, it is essential for the agency to consider intruders with limited prior information about their target units.” Elliot et al. note that “many authors have commented that this environment is inherently difficult—if not impossible—to understand and therefore directly assessing risk is itself impossible. This in turn has led to bad decision-making about data sharing (a strange mixture of over-caution and imprudence which is driven more often than not by the personality of the decision-maker rather than by rational processes).”

  10.

    More technically, sampling alone could never meet differential privacy standards, because any microdata release that does not involve perturbation or the creation of synthetic data will violate the differential privacy guarantee.
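
    To spell out the standard reasoning (our illustration, not the paper’s): ε-differential privacy requires, for every pair of neighboring datasets D, D′ and every output set S,

        \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].

    An unperturbed subsample can output the exact record of an individual x when x ∈ D but never when D′ = D ∖ {x}, so the probability ratio is unbounded and the inequality fails for every finite ε.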

  11.

    We allow simulation of noise levels in both the released data and the auxiliary data, though this is not explicit in the simulation steps below. See the code repository, above at note 1.
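
    A minimal sketch of what such noise injection could look like (assumed interface; the repository’s actual implementation may differ):

        import random

        def add_noise(records, fields, rate, value_pool):
            # With probability `rate`, replace each listed field's value with a
            # random plausible alternative; applied to both the released sample
            # and the simulated auxiliary data before matching.
            noisy = []
            for rec in records:
                rec = dict(rec)
                for f in fields:
                    if random.random() < rate:
                        rec[f] = random.choice(value_pool[f])
                noisy.append(rec)
            return noisy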

  12.

    This decision would be similar to the judgments required to distinguish quasi-identifiers from non-identifiers when implementing k-anonymity.

  13.

    A slightly more sophisticated version of our methodology would include all matches and sample uniformly to decide which records from the released data to match with which from the auxiliary data.
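
    A minimal sketch of that refinement (hypothetical helper names): gather every released record agreeing with an auxiliary record on the chosen quasi-identifiers, then sample one uniformly rather than taking the first.

        import random

        def match_uniform(aux_rec, released, quasi_ids):
            # All released records consistent with the auxiliary record.
            candidates = [r for r in released
                          if all(r[q] == aux_rec[q] for q in quasi_ids)]
            # Break ties uniformly at random instead of taking the first match.
            return random.choice(candidates) if candidates else None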

  14.

    The data dictionary can be found in the data directory of our repository. See above at note 1.

  15.

    See https://www.census.gov/programs-surveys/acs/data/pums.html for the PUMS data, and https://mimic.physionet.org/ for the MIMIC-III data.

  16.

    There were 1,620 parameter settings. For each iteration at a given setting, steps 1–9 above are performed. The full set of simulation runs is computationally intensive, so there are two implementations of the simulation code: one is designed to run serially, and is suitable for small, slow runs on a single laptop; the other is designed to run in parallel on a high-performance computing (HPC) cluster. The cluster we used had some specific features, such as use of the PBS scheduler, but minor modifications should allow the code to be used on a variety of HPC setups. See the experimental_actors branch of the repository referenced above at note 1.
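
    As a sketch of how such a sweep can be enumerated (hypothetical parameter names and values; the actual 1,620-setting grid is defined in the repository), each setting can be run serially or dispatched as one task in a PBS job array:

        from itertools import product

        GRID = {
            "sampling_rate":  [0.01, 0.05, 0.10],
            "noise_rate":     [0.00, 0.10, 0.20],
            "aux_info_share": [0.25, 0.50, 1.00],
        }

        settings = [dict(zip(GRID, vals)) for vals in product(*GRID.values())]
        for i, s in enumerate(settings):
            print(f"run {i}: {s}")  # replace with the per-setting simulation (steps 1-9)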

References

  • Abowd, J.: Formal Privacy Methods for the 2020 Census. 2020 Census Program Memorandum Series: 2020.07 (2020)

  • Barth-Jones, D.: The Debate Over ‘Re-identification’ of Health Information: What Do We Risk? Health Affairs (2012a)

  • Barth-Jones, D.: The ‘Re-identification’ of Governor William Weld’s Medical Information: A Critical Re-examination of Health Data Identification Risks and Privacy Protections, Then and Now. Draft (2012b). https://fpf.org/wp-content/uploads/The-Re-identification-of-Governor-Welds-Medical-Information-Daniel-Barth-Jones.pdf

  • Berthold, O., Pfitzmann, A., Standtke, R.: The disadvantages of free MIX routes and how to overcome them. In: Federrath, H. (ed.) Designing Privacy Enhancing Technologies. LNCS, vol. 2009, pp. 30–45. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44702-4_3

  • Bhaskar, R., Bhowmick, A., Goyal, V., Laxman, S., Thakurta, A.: Noiseless database privacy. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 215–232. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_12

  • Christensen, G., Miguel, E.: Transparency, Reproducibility, and the Credibility of Economics Research. NBER Working Paper No. 22989 (2016)

  • de Montjoye, Y.-A., Radaelli, L., Singh, V.K.: Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347, 536–539 (2015)

  • Domingo-Ferrer, J., Muralidhar, K.: New directions in anonymization: permutation paradigm, verifiability by subjects and intruders, transparency to users. Inf. Sci. 337, 11–24 (2016)

  • Dwork, C., Smith, A.: Differential privacy for statistics: what we know and what we want to learn. J. Priv. Confidentiality 1, 135–139 (2009)

  • Elliot, M., Domingo-Ferrer, J.: The future of statistical disclosure control. arXiv preprint arXiv:1812.09204 (2018)

  • Federal Committee on Statistical Methodology: Statistical Policy Working Paper 22: Report on Statistical Disclosure Limitation Methodology (2nd version). Office of Management and Budget, Executive Office of the President (2005)

  • Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security (2012)

  • Nayak, T., Zhang, C., You, J.: Measuring Identification Risk in Microdata Release and Its Control by Post-Randomization. Center for Disclosure Avoidance Research, U.S. Census Bureau Research Report Series #2016-02 (2016)

  • Ohm, P.: Broken Promises of Privacy, 57 UCLA L. Rev. 1701, 1719 (2010)

  • Ramachandran, A., Singh, L., Porter, E., Nagle, F.: Exploring re-identification risks in public domains. In: Tenth Annual International Conference on Privacy, Security and Trust, pp. 35–42 (2012)

  • Rocher, L., Hendrickx, J.M., De Montjoye, Y.-A.: Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10, 1–9 (2019)

  • Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13, 1010–1027 (2001)

  • Serjantov, A., Dingledine, R., Syverson, P.: From a trickle to a flood: active attacks on several mix types. In: Petitcolas, F.A.P. (ed.) IH 2002. LNCS, vol. 2578, pp. 36–52. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36415-3_3

  • Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 571–588 (2002)

Author information

Correspondence to David Sidi.

Electronic supplementary material

Supplementary material 1 (XLS 37 kb)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sidi, D., Bambauer, J. (2020). Plausible Deniability. In: Domingo-Ferrer, J., Muralidhar, K. (eds) Privacy in Statistical Databases. PSD 2020. Lecture Notes in Computer Science, vol 12276. Springer, Cham. https://doi.org/10.1007/978-3-030-57521-2_7

  • DOI: https://doi.org/10.1007/978-3-030-57521-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57520-5

  • Online ISBN: 978-3-030-57521-2

  • eBook Packages: Computer Science, Computer Science (R0)
