Skip to main content

Data Anonymization for Privacy Aware Machine Learning

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11943))

Abstract

The increase of data leaks, attacks, and other ransom-ware in the last few years have pointed out concerns about data security and privacy. All this has negatively affected the sharing and publication of data. To address these many limitations, innovative techniques are needed for protecting data. Especially, when used in machine learning based-data models. In this context, differential privacy is one of the most effective approaches to preserve privacy. However, the scope of differential privacy applications is very limited (e. g. numerical and structured data). Therefore, in this study, we aim to investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two Principal Component Analysis based differential privacy algorithms. The effectiveness was demonstrated through the application of three machine learning models to both anonymized and primary data. Their performances were thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides a high anonymization quality at the expense of a high time-consuming, while the DPCA method preserves more utility and faster time computing. We show the possibility to combine a neural network text representation approach with differential privacy methods. We also highlighted that it is well within reach to anonymize real-world measurements data from satellites sensors for an anomaly detection task. We believe that our study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing and privacy preserving.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Albrecht, J.P.: How the GDPR will change the world. Eur. Data Prot. L. Rev. 2, 287 (2016)

    Article  Google Scholar 

  2. Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI International (1998)

    Google Scholar 

  3. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd International Conference on Data Engineering (ICDE 2006), pp. 24–24. IEEE (2006)

    Google Scholar 

  4. Wong, R.C.-W., Li, J., Fu, A.W.-C., Wang, K.: (\(\alpha \), k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 754–759. ACM (2006)

    Google Scholar 

  5. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115. IEEE (2007)

    Google Scholar 

  6. Zhang, Q., Koudas, N., Srivastava, D., Yu, T.: Aggregate query answering on anonymized tables. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 116–125. IEEE (2007)

    Google Scholar 

  7. Martin, D.J., Kifer, D., Machanavajjhala, A., Gehrke, J., Halpern, J.Y.: Worst-case background knowledge for privacy-preserving data publishing. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 126–135. IEEE (2007)

    Google Scholar 

  8. Dwork, C.: Differential privacy. In: van Tilborg, H.C.A., Jajodia, S. (eds.) Encyclopedia of Cryptography and Security, pp. 338–340. Springer, Boston (2011). https://doi.org/10.1007/978-1-4419-5906-5

    Chapter  Google Scholar 

  9. Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–502. ACM (2010)

    Google Scholar 

  10. Mohammed, N., Chen, R., Fung, B., Yu, P.S.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–501. ACM (2011)

    Google Scholar 

  11. Sarwate, A.D., Chaudhuri, K.: Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Process. Mag. 30(5), 86–94 (2013)

    Article  Google Scholar 

  12. Fernandes, N., Dras, M., McIver, A.: Generalised differential privacy for text document processing. In: Nielson, F., Sands, D. (eds.) POST 2019. LNCS, vol. 11426, pp. 123–148. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17138-4_6

    Chapter  Google Scholar 

  13. Fernandes, N., Dras, M., McIver, A.: Processing text for privacy: an information flow perspective. In: Havelund, K., Peleska, J., Roscoe, B., de Vink, E. (eds.) FM 2018. LNCS, vol. 10951, pp. 3–21. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95582-7_1

    Chapter  Google Scholar 

  14. Zhang, X., Hamm, J., Reiter, M.K., Zhang, Y.: Statistical privacy for streaming traffic. In: Proceedings of the ISOC Network and Distributed System Security Symposium (2019)

    Google Scholar 

  15. Beaulieu-Jones, B.K., et al.: Privacy-preserving generative deep neural networks support clinical data sharing, p. 159756. BioRxiv (2018)

    Google Scholar 

  16. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. J. Mach. Learn. Res. 12(Mar), 1069–1109 (2011)

    MathSciNet  MATH  Google Scholar 

  17. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Null, pp. 94–103. IEEE (2007)

    Google Scholar 

  18. Chaudhuri, K., Sarwate, A.D., Sinha, K.: A near-optimal algorithm for differentially-private principal components. J. Mach. Learn. Res. 14(1), 2905–2943 (2013)

    MathSciNet  MATH  Google Scholar 

  19. Jiang, X., Ji, Z., Wang, S., Mohammed, N., Cheng, S., Ohno-Machado, L.: Differential-private data publishing through component analysis. Trans. Data Priv. 6(1), 19 (2013)

    MathSciNet  Google Scholar 

  20. Goldberg, Y., Levy, O.: word2vec Explained: deriving Mikolov et al’.s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)

  21. Kiros, R., et al.: Skip-thought vectors. In: Advances in Neural Information Processing Systems, pp. 3294–3302 (2015)

    Google Scholar 

  22. Zhu, Y., et al.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (2015)

    Google Scholar 

  23. Hoff, P.D.: Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. J. Comput. Graph. Stat. 18(2), 438–456 (2009)

    Article  MathSciNet  Google Scholar 

  24. Tianqi, C., Carlos, G.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, New York, NY, USA, pp. 785–794. ACM (2016)

    Google Scholar 

  25. Gardner, M.W., Dorling, S.R.: Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos. Environ. 32(14–15), 2627–2636 (1998)

    Article  Google Scholar 

  26. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

    Article  Google Scholar 

  27. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)

    Google Scholar 

  28. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 142–150. Association for Computational Linguistics (2011)

    Google Scholar 

Download references

Acknowledgments

This work is supported by Scalian.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Nizar Jaidan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Jaidan, D.N., Carrere, M., Chemli, Z., Poisvert, R. (2019). Data Anonymization for Privacy Aware Machine Learning. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science(), vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_60

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_60

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics