Privacy Preserving by Removing Sensitive Data from Documents with Fully Convolutional Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13589)

Abstract

We present a new approach to anonymizing personal data in text files. The approach uses neural networks to analyze whole sentences. Contrary to other currently proposed methods, the presented work analyzes the context of a text fragment, which enables the detection of sensitive information not only on the basis of specific words but also on the basis of “understanding” the context, as in phrases such as “mayor of Paris” or “son of the CEO” of a specific company. We present a proprietary solution that uses convolutional networks connected with glial cells, which enables the selection of the optimal size of the CNN structure.
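The idea of context-based detection with a convolution over tokens can be illustrated with a minimal sketch. It is not the authors' network: the features, the lexicon `ROLE_WORDS`, the weights in `KERNEL` and `BIAS`, and the helper names are all hypothetical and hand-set rather than trained, and the glial-cell mechanism for sizing the CNN is not reproduced. The sketch only shows how a width-3 filter sliding over a sentence can flag a phrase such as "mayor of Paris" from its context while leaving the bare word "Paris" untouched.

```python
import math

# Hypothetical toy lexicon of role/relation words (assumption, not from the paper).
ROLE_WORDS = {"mayor", "son", "daughter", "ceo"}

# Hand-set width-3 filter (assumption): fires on "<role word> of <Capitalised>".
KERNEL = [
    [2.0, 0.0, 0.0],  # left neighbour should be a role word
    [0.0, 2.0, 0.0],  # centre token should be "of"
    [0.0, 0.0, 2.0],  # right neighbour should be capitalised
]
BIAS = -5.0


def featurize(token):
    """Map a token to a tiny hand-made feature vector (stand-in for a learned embedding)."""
    return [
        1.0 if token.lower() in ROLE_WORDS else 0.0,
        1.0 if token.lower() == "of" else 0.0,
        1.0 if token[:1].isupper() else 0.0,
    ]


def conv1d_same(features, kernel, bias):
    """1-D convolution along the token axis with zero padding (output length == input length)."""
    width = len(kernel)
    pad = width // 2
    zero = [0.0] * len(features[0])
    padded = [zero] * pad + features + [zero] * pad
    out = []
    for i in range(len(features)):
        window = padded[i:i + width]
        score = bias
        for w_row, x_row in zip(kernel, window):
            score += sum(w * x for w, x in zip(w_row, x_row))
        out.append(score)
    return out


def redact(text, threshold=0.5):
    """Mask every token inside the receptive field of a position where the filter fires."""
    tokens = text.split()
    if not tokens:
        return text
    logits = conv1d_same([featurize(t) for t in tokens], KERNEL, BIAS)
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    pad = len(KERNEL) // 2
    mask = [False] * len(tokens)
    for i, p in enumerate(probs):
        if p > threshold:
            for j in range(max(0, i - pad), min(len(tokens), i + pad + 1)):
                mask[j] = True
    return " ".join("[REDACTED]" if m else t for t, m in zip(tokens, mask))


print(redact("The mayor of Paris signed the contract"))
print(redact("Paris is lovely in spring"))
```

Because the filter is applied at every position with the same weights, the same code handles sentences of any length, which is the property a fully convolutional design relies on. A trained model would learn many such filters over real embeddings instead of the three hand-made features used here.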

The work was supported by The National Centre for Research and Development (NCBR), the project no POIR.01.01.01-00-1431/19.

Author information

Corresponding author

Correspondence to Rafał Scherer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Korytkowski, M., Nowak, J., Scherer, R., Wei, W. (2023). Privacy Preserving by Removing Sensitive Data from Documents with Fully Convolutional Networks. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2022. Lecture Notes in Computer Science, vol 13589. Springer, Cham. https://doi.org/10.1007/978-3-031-23480-4_23

  • DOI: https://doi.org/10.1007/978-3-031-23480-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23479-8

  • Online ISBN: 978-3-031-23480-4

  • eBook Packages: Computer Science, Computer Science (R0)
