A Quantitative Study of Attribute Based Correlation in Micro-databases and Its Effects on Privacy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11547)

Abstract

Preserving the privacy of respondents in publicly released micro-databases is an active area of research, since an adversary can mine sensitive information about those respondents from the released data. This paper establishes a working model for quantitatively estimating the attribute-based correlation present among multiple micro-databases. We introduce an information-theoretic metric, termed Correlation Degree \(\rho\), which estimates the amount of correlated information present between two micro-databases and assigns a cumulative score in the range [0, 1]. The design of the metric rests on the observation that correlation among multiple datasets arises from the presence of both overlapping and implicitly dependent attributes. We also establish a functional association between \(\rho\) and the general notion of privacy during an adversarial linking attack. Finally, we empirically validate our work by estimating \(\rho\) and the resulting privacy loss for the Adult micro-database under two well-established privacy preservation models.
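The abstract does not state how \(\rho\) is defined, so the following is only a rough illustration of the underlying idea that shared attributes and implicitly dependent attributes both carry correlated information. It is a minimal sketch, assuming that the attributes of both micro-databases can be observed together for the same respondents, and it substitutes normalized mutual information for the authors' metric. The function names, the max-then-average aggregation, and the toy records are all hypothetical; this is not the paper's Correlation Degree.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """I(X;Y) / min(H(X), H(Y)), clamped to [0, 1]; 0.0 if either column is constant."""
    denom = min(entropy(xs), entropy(ys))
    if denom <= 0:
        return 0.0
    return max(0.0, min(1.0, mutual_information(xs, ys) / denom))


def column(records, name):
    """Extract one attribute as a list of values, one per record."""
    return [r[name] for r in records]


def correlation_score(records, attrs_a, attrs_b):
    """Hypothetical [0, 1] score of how much dataset B's attributes reveal about A's.

    Overlapping attributes count as fully correlated (1.0); every other attribute
    of B contributes its strongest normalized MI with any attribute of A.
    The per-attribute scores are then averaged.
    """
    scores = []
    for b in attrs_b:
        if b in attrs_a:
            scores.append(1.0)  # shared attribute: information is fully overlapping
        else:
            scores.append(max(normalized_mi(column(records, a), column(records, b))
                              for a in attrs_a))
    return sum(scores) / len(scores)


# Toy joint view of the respondents (hypothetical data): city is a
# deterministic function of zip, while sex is independent of both zip and age.
records = [
    {"zip": "z1", "city": "A", "age": 30, "sex": "M"},
    {"zip": "z1", "city": "A", "age": 40, "sex": "F"},
    {"zip": "z2", "city": "B", "age": 30, "sex": "F"},
    {"zip": "z2", "city": "B", "age": 40, "sex": "M"},
]

print(correlation_score(records, ["zip", "age"], ["city", "sex"]))  # 0.5
print(correlation_score(records, ["zip", "age"], ["city", "age"]))  # 1.0
```

Taking the maximum per attribute reflects the intuition that a single strongly dependent attribute is enough to leak information. A plain average or a sum of pairwise terms would be equally plausible choices, and the paper's actual \(\rho\) may differ from all of them.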

Author information

Corresponding author

Correspondence to Debanjan Sadhya.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sadhya, D., Chakraborty, B. (2019). A Quantitative Study of Attribute Based Correlation in Micro-databases and Its Effects on Privacy. In: Jang-Jaccard, J., Guo, F. (eds) Information Security and Privacy. ACISP 2019. Lecture Notes in Computer Science, vol 11547. Springer, Cham. https://doi.org/10.1007/978-3-030-21548-4_37

  • DOI: https://doi.org/10.1007/978-3-030-21548-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21547-7

  • Online ISBN: 978-3-030-21548-4

  • eBook Packages: Computer Science, Computer Science (R0)
