Abstract
We introduce the anonymization of unstructured documents as a basis for the automatic declassification of confidential documents. Starting from established ideas and methods in data privacy, we present the main issues in unstructured document anonymization and propose using named entity recognition techniques, drawn from natural language processing and information extraction, to identify the entities in a document that need to be protected.
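As a rough illustration of the idea of detecting entities and then protecting them, the following minimal sketch uses a small hand-written gazetteer as a stand-in for a trained named entity recognizer. The gazetteer contents, the function names `find_entities` and `redact`, and the choice of replacing each entity with its category label are all assumptions made for this example; they are not taken from the chapter.

```python
import re

# Hypothetical gazetteer: a toy stand-in for a trained NER model.
GAZETTEER = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "LOCATION": ["Barcelona"],
    "ORGANIZATION": ["Acme Corp"],
}


def find_entities(text):
    """Return (start, end, label) spans for gazetteer matches, sorted by start."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text):
    """Replace each detected entity with its category label."""
    pieces, pos = [], 0
    for start, end, label in find_entities(text):
        if start < pos:
            # Skip a match that overlaps an entity already replaced.
            continue
        pieces.append(text[pos:start])
        pieces.append(f"[{label}]")
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


doc = "Alice Smith met Bob Jones at Acme Corp in Barcelona."
print(redact(doc))
# [PERSON] met [PERSON] at [ORGANIZATION] in [LOCATION].
```

In practice the gazetteer lookup would be replaced by a statistical or rule-based recognizer, and the masking step by whichever protection method the declassification policy calls for.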
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abril, D., Navarro-Arribas, G., Torra, V. (2011). On the Declassification of Confidential Documents. In: Torra, V., Narukawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science, vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_22
DOI: https://doi.org/10.1007/978-3-642-22589-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22588-8
Online ISBN: 978-3-642-22589-5
eBook Packages: Computer Science, Computer Science (R0)