Privacy Preserving by Removing Sensitive Data from Documents with Fully Convolutional Networks

Korytkowski, Marcin; Nowak, Jakub; Scherer, Rafał; Wei, Wei

doi:10.1007/978-3-031-23480-4_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13589)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

We present a new approach to anonymizing personal data in text files. The approach uses neural networks to analyze sentences. Unlike other currently proposed methods, it analyzes the context of a text fragment, so sensitive information is detected not only from specific words but also from an “understanding” of the context, as in “mayor of Paris” or “son of the CEO” of a specific company. We present a proprietary solution that uses convolutional networks connected with glial cells, which enables selection of the optimal size of the CNN structure.
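
To make the idea of context-based detection concrete, the following is a minimal sketch of a one-dimensional convolution that scores each token from its neighbours. It is not the authors' architecture: the feature set, the `ROLE_WORDS` vocabulary, the hand-set `KERNEL` weights, and the threshold are all assumptions chosen for illustration, and the glial-cell mechanism from the abstract is not modelled. A real network would learn the filter weights from data.

```python
from typing import List

# Hypothetical toy vocabulary of role words (an assumption, not from the paper).
ROLE_WORDS = {"mayor", "ceo", "son"}


def featurize(token: str) -> List[float]:
    """Map a token to three hand-picked binary features (illustrative only)."""
    return [
        1.0 if token.lower() in ROLE_WORDS else 0.0,  # role word
        1.0 if token.lower() == "of" else 0.0,        # preposition "of"
        1.0 if token[:1].isupper() else 0.0,          # capitalised token
    ]


# One convolution filter of width 3. KERNEL[k][f] weights feature f of the
# token at offset k - 1 (previous, current, next). The values are set by hand
# here; a trained network would learn them.
KERNEL = [
    [0.0, 0.0, 0.0],  # previous token: ignored in this toy filter
    [1.0, 0.0, 0.0],  # current token: is it a role word?
    [0.0, 1.0, 0.0],  # next token: is it "of"?
]


def conv1d_scores(tokens: List[str]) -> List[float]:
    """Slide the filter over the sequence with zero padding.

    The output has one score per input token, so the same filter works on
    sequences of any length, which is the property that convolution-only
    models rely on.
    """
    feats = [featurize(t) for t in tokens]
    pad = [0.0, 0.0, 0.0]
    padded = [pad] + feats + [pad]
    scores = []
    for i in range(len(tokens)):
        window = padded[i:i + 3]
        scores.append(sum(w * x
                          for kernel_row, feat_row in zip(KERNEL, window)
                          for w, x in zip(kernel_row, feat_row)))
    return scores


def flag_sensitive(tokens: List[str], threshold: float = 1.5) -> List[bool]:
    """Flag tokens whose context score reaches the (assumed) threshold."""
    return [score >= threshold for score in conv1d_scores(tokens)]
```

With these toy weights, the token "mayor" is flagged in "the mayor of Paris" but not in "the mayor resigned", because the score depends on the following token as well as on the word itself.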

This work was supported by the National Centre for Research and Development (NCBR), project no. POIR.01.01.01-00-1431/19.



Author information

Authors and Affiliations

Czestochowa University of Technology, al. Armii Krajowej 36, Czestochowa, Poland
Marcin Korytkowski, Jakub Nowak & Rafał Scherer
Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China
Wei Wei


Corresponding author

Correspondence to Rafał Scherer.

Editor information

Editors and Affiliations

Systems Research Institute of the Polish Academy of Sciences, Warsaw, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada


About this paper

Cite this paper

Korytkowski, M., Nowak, J., Scherer, R., Wei, W. (2023). Privacy Preserving by Removing Sensitive Data from Documents with Fully Convolutional Networks. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2022. Lecture Notes in Computer Science, vol 13589. Springer, Cham. https://doi.org/10.1007/978-3-031-23480-4_23


DOI: https://doi.org/10.1007/978-3-031-23480-4_23
Published: 24 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23479-8
Online ISBN: 978-3-031-23480-4
eBook Packages: Computer Science, Computer Science (R0)
