Towards a Toolkit for Utility and Privacy-Preserving Transformation of Semi-structured Data Using Data Pseudonymization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10436)

Abstract

We present a flexibly configurable toolkit for the automatic pseudonymization of datasets while retaining certain utility. The toolkit can be used to pseudonymize data in order to preserve the privacy of data owners during data processing and to meet the requirements of the new European General Data Protection Regulation. We define several possible utility requirements and the corresponding utility options that a pseudonym can offer. Based on these, we define a policy language for producing machine-readable utility policies. The utility policies configure the toolkit to produce a pseudonymized dataset that offers the specified utility options. Here, we follow a confidentiality-by-default principle: only the data mentioned in the policy is transformed and included in the pseudonymized dataset, and all remaining data is kept confidential. This stands in contrast to common pseudonymization techniques, which replace only the personal or sensitive data of a dataset with pseudonyms while keeping all other information in plaintext. If applied appropriately, our approach yields pseudonymized datasets that include less information that can be misused to infer personal information about the individuals to whom the data belong.
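
The confidentiality-by-default principle described in the abstract can be illustrated with a minimal Python sketch. It is not the toolkit or its policy language: the policy layout, the option names `equality` and `range_bucket`, and the helper functions are hypothetical and chosen only for illustration. Fields that the policy does not mention are dropped from the output.

```python
import hashlib
import hmac

# Hypothetical policy: field name -> utility option the output must offer.
# The option names are illustrative and not taken from the paper.
POLICY = {
    "user_id": "equality",      # equal plaintexts yield equal pseudonyms
    "age": "range_bucket",      # only a coarse 10-year bucket is disclosed
}


def equality_pseudonym(value, key):
    """Keyed deterministic pseudonym (HMAC-SHA256) that supports equality tests."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def range_bucket(value, width=10):
    """Generalise a number to the lower bound of its bucket."""
    return (int(value) // width) * width


def pseudonymize(record, policy, key):
    """Transform only the fields named in the policy; omit everything else."""
    out = {}
    for field, option in policy.items():
        if field not in record:
            continue
        if option == "equality":
            out[field] = equality_pseudonym(record[field], key)
        elif option == "range_bucket":
            out[field] = range_bucket(record[field])
        else:
            raise ValueError(f"unknown utility option: {option}")
    return out


KEY = b"demo-key-not-for-production"  # assumption: a secret held by the data owner
a = pseudonymize({"user_id": "alice", "age": 34, "diagnosis": "flu"}, POLICY, KEY)
b = pseudonymize({"user_id": "alice", "age": 37, "diagnosis": "cold"}, POLICY, KEY)
```

In this sketch, `a` and `b` can still be linked through their `user_id` pseudonyms and compared by age bucket, while the `diagnosis` field, which the policy does not mention, never reaches the output.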

Notes

  1. https://net.cs.uni-bonn.de/wg/itsec/staff/saffija-kasem-madani/appendix/.

  2. Bundesamt für Sicherheit in der Informationstechnik: German Federal Office for Information Security.

Author information

Correspondence to Saffija Kasem-Madani or Michael Meier.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kasem-Madani, S., Meier, M., Wehner, M. (2017). Towards a Toolkit for Utility and Privacy-Preserving Transformation of Semi-structured Data Using Data Pseudonymization. In: Garcia-Alfaro, J., Navarro-Arribas, G., Hartenstein, H., Herrera-Joancomartí, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM 2017, CBT 2017. Lecture Notes in Computer Science, vol 10436. Springer, Cham. https://doi.org/10.1007/978-3-319-67816-0_10

  • DOI: https://doi.org/10.1007/978-3-319-67816-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67815-3

  • Online ISBN: 978-3-319-67816-0

  • eBook Packages: Computer Science, Computer Science (R0)
