On the Declassification of Confidential Documents

Abril, Daniel; Navarro-Arribas, Guillermo; Torra, Vicenç

doi:10.1007/978-3-642-22589-5_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6820)

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence

Abstract

We introduce the anonymization of unstructured documents as a foundation for the automatic declassification of confidential documents. Starting from established ideas and methods in data privacy, we outline the main issues in anonymizing unstructured documents and propose using named entity recognition techniques from natural language processing and information extraction to identify the entities in a document that need to be protected.
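
The idea of locating entities in free text and then protecting them can be illustrated with a minimal sketch. It is not the method of the paper: the recognizer here is a toy dictionary lookup standing in for a trained named entity recognition model, and replacing each entity with its category label is only one possible protection step. The names `GAZETTEER`, `find_entities`, and `anonymize`, as well as the sample entities, are hypothetical.

```python
import re

# Hypothetical gazetteer standing in for a trained NER model (assumption).
GAZETTEER = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "LOCATION": ["Barcelona", "Lyon"],
    "ORGANIZATION": ["Acme Corp"],
}


def find_entities(text):
    """Return (start, end, label) spans for every gazetteer match, sorted by position."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                spans.append((match.start(), match.end(), label))
    return sorted(spans)


def anonymize(text):
    """Replace each recognized entity with its category label in brackets."""
    pieces = []
    cursor = 0
    for start, end, label in find_entities(text):
        if start < cursor:
            # Skip a match that overlaps an entity already replaced.
            continue
        pieces.append(text[cursor:start])
        pieces.append("[" + label + "]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(anonymize("Alice Smith met Bob Jones at Acme Corp in Barcelona."))
```

In a realistic setting the dictionary lookup would be replaced by a statistical or neural recognizer, and the bracketed labels by whatever masking or generalization the declassification policy requires.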

Author information

Authors and Affiliations

Institut d’Investigació en Intel·ligència Artificial (IIIA), Consejo Superior de Investigaciones Científicas (CSIC), Spain
Daniel Abril & Vicenç Torra
Dep. Enginyeria de la Informació i de les Comunicacions (DEIC), Universitat Autònoma de Barcelona (UAB), Spain
Guillermo Navarro-Arribas

Editor information

Editors and Affiliations

Artificial Intelligence Research Institute (IIIA) Spanish National Research Council (CSIC), IIIA-CSIC, Campus Universitat Autonoma de Barcelona, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra
Toho Gakuen, 3-1-10, Naka, Kunitachi, 184-0004, Tokyo, Japan
Yasuo Narukawa
School of Computer, National University of Defense Technology, 410073, Changsha, China
Jianping Yin
Department of Network Engineering, National University of Defense Technology, Yanwachi Street 137, 410073, Changsha, Hunan, China
Jun Long

About this paper

Cite this paper

Abril, D., Navarro-Arribas, G., Torra, V. (2011). On the Declassification of Confidential Documents. In: Torra, V., Narukawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science (LNAI), vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_22

DOI: https://doi.org/10.1007/978-3-642-22589-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22588-8
Online ISBN: 978-3-642-22589-5
eBook Packages: Computer Science, Computer Science (R0)
