Perturbative Data Protection of Multivariate Nominal Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9867)

Abstract

Many of the potentially sensitive personal data produced and compiled in electronic sources are nominal and multi-attribute (e.g., personal interests, healthcare diagnoses, or commercial transactions). For such data, which are discrete, finite and non-ordinal, privacy-protection methods should mask original values to prevent disclosure while preserving the underlying semantics of the nominal attributes and any correlation between them. In this paper we tackle this challenge by proposing a semantically grounded version of numerical correlated noise addition that, by relying on structured knowledge sources (ontologies), can perturb and mask multivariate nominal attributes while reasonably preserving their semantics and correlations.
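
For orientation, the numerical technique that the abstract builds on can be sketched as follows. This is a minimal illustration of classic correlated noise addition for numerical microdata, in which the noise covariance is proportional to the data covariance, so the masked data keep approximately the same correlation structure. It is not the semantic method proposed in the paper, which operates on nominal values through ontologies and is not reproduced here. The function name `add_correlated_noise`, the parameter `alpha` and the synthetic dataset are assumptions made for this example.

```python
import numpy as np


def add_correlated_noise(X, alpha=0.1, rng=None):
    """Mask numerical records X (n_records x n_attributes).

    Hypothetical helper: draws noise from N(0, alpha * Sigma), where Sigma
    is the sample covariance of X, and adds it to the original values.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    # Sample covariance of the attributes (columns are attributes).
    sigma = np.atleast_2d(np.cov(X, rowvar=False))
    # Noise shares the correlation structure of the data, scaled by alpha.
    noise = rng.multivariate_normal(
        mean=np.zeros(X.shape[1]), cov=alpha * sigma, size=X.shape[0]
    )
    return X + noise


# Synthetic two-attribute dataset with correlation 0.8 (illustrative only).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
Z = add_correlated_noise(X, alpha=0.2, rng=rng)

corr_original = np.corrcoef(X, rowvar=False)[0, 1]
corr_masked = np.corrcoef(Z, rowvar=False)[0, 1]
print(f"correlation before masking: {corr_original:.3f}")
print(f"correlation after masking:  {corr_masked:.3f}")
```

Because the noise covariance is a scaled copy of the data covariance, the masked covariance is roughly (1 + alpha) times the original, which inflates the variances but leaves the correlations close to their original values. Larger values of `alpha` give stronger masking at the cost of more distortion.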

Acknowledgements

This work was supported by the EU Commission under the H2020 project “CLARUS”, by the Spanish Government through projects TIN2014-57364-C2-R “SmartGlacis”, TIN2011-27076-C03-01 “Co-Privacy” and TIN2015-70054-REDC “Red de excelencia Consolider ARES”, and by the Government of Catalonia under grant 2014 SGR 537. M. Batet is supported by a postdoctoral grant from the Ministry of Economy and Competitiveness (MINECO) (FPDI-2013-16589).

Corresponding author

Correspondence to David Sánchez.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rodriguez-Garcia, M., Sánchez, D., Batet, M. (2016). Perturbative Data Protection of Multivariate Nominal Datasets. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_8

  • DOI: https://doi.org/10.1007/978-3-319-45381-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45380-4

  • Online ISBN: 978-3-319-45381-1

  • eBook Packages: Computer Science, Computer Science (R0)
