
On the Declassification of Confidential Documents

  • Conference paper
Modeling Decision for Artificial Intelligence (MDAI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6820)

Abstract

We introduce the anonymization of unstructured documents as a foundation for the automatic declassification of confidential documents. Building on known ideas and methods from data privacy, we present the main issues of unstructured document anonymization. We propose using named entity recognition (NER) techniques from natural language processing and information extraction to identify the entities in a document that need to be protected.
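The abstract describes a two-step pipeline: first recognize the named entities in a document, then protect them. The minimal sketch below illustrates that detect-then-sanitize idea. It is not the authors' method.

- A hand-written gazetteer and two regular expressions stand in for a trained NER model.
- Each detected entity is replaced by its type label, a simple form of generalization.
- All names (`GAZETTEER`, `PATTERNS`, `find_entities`, `sanitize`) and the sample entities are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a trained NER model:
# maps exact surface strings to entity types.
GAZETTEER: Dict[str, str] = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Barcelona": "LOCATION",
}

# Hypothetical pattern-based recognizers for entities with a regular shape.
PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
]


def find_entities(text: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, type) spans of detected entities."""
    spans: List[Tuple[int, int, str]] = []
    for surface, label in GAZETTEER.items():
        for m in re.finditer(re.escape(surface), text):
            spans.append((m.start(), m.end(), label))
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Sort by start position, longest span first on ties, then drop overlaps.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result: List[Tuple[int, int, str]] = []
    last_end = -1
    for start, end, label in spans:
        if start >= last_end:
            result.append((start, end, label))
            last_end = end
    return result


def sanitize(text: str) -> str:
    """Replace every detected entity with a bracketed type placeholder."""
    out: List[str] = []
    pos = 0
    for start, end, label in find_entities(text):
        out.append(text[pos:start])
        out.append(f"[{label}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    doc = "On 2010-05-14 Alice Smith (alice@acme.example) met Acme Corp staff in Barcelona."
    print(sanitize(doc))
    # prints: On [DATE] [PERSON] ([EMAIL]) met [ORGANIZATION] staff in [LOCATION].
```

Replacing an entity by its type hides the identity but keeps the document readable, which matters for declassification. A real system would swap the gazetteer for a statistical NER tagger. It could also apply other data-privacy protections, such as generalizing to a broader category, instead of full replacement.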




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abril, D., Navarro-Arribas, G., Torra, V. (2011). On the Declassification of Confidential Documents. In: Torra, V., Narukawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science, vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_22


  • DOI: https://doi.org/10.1007/978-3-642-22589-5_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22588-8

  • Online ISBN: 978-3-642-22589-5

  • eBook Packages: Computer Science (R0)
