Abstract
We present a new approach to anonymizing personal data in text files. The approach uses neural networks to analyze sentences. Unlike other currently proposed methods, the presented work analyzes the context of a text fragment, so that sensitive information is detected not only from specific words but also from an “understanding” of the context, as in phrases such as “mayor of Paris” or “son of the CEO” of a specific company. We present a proprietary solution using convolutional networks connected with glial cells, which enables the selection of the optimal size of the CNN structure.
The work was supported by The National Centre for Research and Development (NCBR), project no. POIR.01.01.01-00-1431/19.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Korytkowski, M., Nowak, J., Scherer, R., Wei, W. (2023). Privacy Preserving by Removing Sensitive Data from Documents with Fully Convolutional Networks. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2022. Lecture Notes in Computer Science, vol 13589. Springer, Cham. https://doi.org/10.1007/978-3-031-23480-4_23
DOI: https://doi.org/10.1007/978-3-031-23480-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23479-8
Online ISBN: 978-3-031-23480-4
eBook Packages: Computer Science (R0)