Skip to main content

A Survey of Inference Control Methods for Privacy-Preserving Data Mining

  • Chapter
Privacy-Preserving Data Mining

Part of the book series: Advances in Database Systems ((ADBS,volume 34))

Inference control in databases, also known as Statistical Disclosure Control (SDC), is about protecting data so they can be published without revealing confidential information that can be linked to specific individuals among those to which the data correspond. This is an important application in several areas, such as official statistics, health statistics, e-commerce (sharing of consumer data), etc. Since data protection ultimately means data modification, the challenge for SDC is to achieve protection with minimum loss of the accuracy sought by database users. In this chapter, we survey the current state of the art in SDC methods for protecting individual data (microdata). We discuss several information loss and disclosure risk measures and analyze several ways of combining them to assess the performance of the various methods. Last but not least, topics which need more research in the area are identified and possible directions hinted.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. J. M. Abowd and S. D. Woodcock. Disclosure limitation in longitudinal linked tables. In P. Doyle, J. I. Lane, J. J. Theeuwes, and L. V. Zayatz, editors, Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pages 215–278, Amsterdam, 2001. North-Holland.

    Google Scholar 

  2. J. M. Abowd and S. D. Woodcock. Multiply-imputing confidential characteristics and file links in longitudinal linked data. In J. Domingo-Ferrer and V. Torra, editors, Privacy in Statistical Databases, volume 3050 of Lecture Notes in Computer Science, pages 290–297, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  3. N. R. Adam and J. C. Wortmann. Security-control for statistical databases: a comparative study. ACM Computing Surveys, 21(4):515–556, 1989.

    Article  Google Scholar 

  4. C. C. Aggarwal and P. S. Yu. A condensation approach to privacy preserving data mining. In E. Bertino, S. Christodoulakis, D. Plexousakis, V. Christophides, M. Koubarakis, K. Böhm, E. Ferrari, editors, Advances in Database Technology - EDBT 2004, vol. 2992 of Lecture Notes in Computer Science, pages 183-199, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  5. R. Brand. Microdata protection through noise addition. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 97–116, Berlin Heidelberg, 2002. Springer.

    Google Scholar 

  6. R. Brand. Tests of the applicability of sullivan’s algorithm to synthetic data and real business data in official statistics, 2002. European Project IST-2000-25069 CASC, Deliverable 1.1-D1, http://neon.vb.cbs.nl/casc.

  7. J. Burridge. Information preserving statistical obfuscation. Statistics and Computing, 13:321–327, 2003.

    Article  MathSciNet  Google Scholar 

  8. CASC. Computational aspects of statistical confidentiality, 2004. European project IST-2000-25069 CASC, 5th FP, 2001-2004, http://neon.vb.cbs.nl/casc.

  9. F. Y. Chin and G. Ozsoyoglu. Auditing and inference control in statistical databases. IEEE Transactions on Software Engineering, SE-8:574–582, 1982.

    Article  MathSciNet  Google Scholar 

  10. L. H. Cox and J. J. Kim. Effects of rounding on the quality and confidentiality of statistical data. In J. Domingo-Ferrer and L. Franconi, editors, Privacy in Statistical Databases-PSD 2006, volume 4302 of Lecture Notes in Computer Science, pages 48–56, Berlin Heidelberg, 2006.

    Google Scholar 

  11. T. Dalenius and S. P. Reiss. Data-swapping: a technique for disclosure control (extended abstract). In Proc. of the ASA Section on Survey Research Methods, pages 191–194, Washington DC, 1978. American Statistical Association.

    Google Scholar 

  12. R. Dandekar, M. Cohen, and N. Kirkendall. Sensitive micro data protection using latin hypercube sampling technique. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 245–253, Berlin Heidelberg, 2002. Springer.

    Google Scholar 

  13. R. Dandekar, J. Domingo-Ferrer, and F. Sebé. Lhs-based hybrid microdata vs rank swapping and microaggregation for numeric microdata protection. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 153–162, Berlin Heidelberg, 2002. Springer.

    Google Scholar 

  14. P.-P. de Wolf. Risk, utility and pram. In J. Domingo-Ferrer and L. Franconi, editors, Privacy in Statistical Databases-PSD 2006, volume 4302 of Lecture Notes in Computer Science, pages 189–204, Berlin Heidelberg, 2006.

    Google Scholar 

  15. D. Defays and P. Nanopoulos. Panels of enterprises and confidentiality: the small aggregates method. In Proc. of 92 Symposium on Design and Analysis of Longitudinal Surveys, pages 195–204, Ottawa, 1993. Statistics Canada.

    Google Scholar 

  16. A. G. DeWaal and L. C. R. J. Willenborg. Global recodings and local suppressions in microdata sets. In Proceedings of Statistics Canada Symposium’95, pages 121–132, Ottawa, 1995. Statistics Canada.

    Google Scholar 

  17. J. Domingo-Ferrer and J. M. Mateo-Sanz. On resampling for statistical confidentiality in contingency tables. Computers & Mathematics with Applications, 38:13–32, 1999.

    Article  MATH  MathSciNet  Google Scholar 

  18. J. Domingo-Ferrer and J. M. Mateo-Sanz. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, 14(1):189–201, 2002.

    Article  Google Scholar 

  19. J. Domingo-Ferrer, J. M. Mateo-Sanz, and V. Torra. Comparing sdc methods for microdata on the basis of information loss and disclosure risk. In Pre-proceedings of ETK-NTTS’2001 (vol. 2), pages 807–826, Luxemburg, 2001. Eurostat.

    Google Scholar 

  20. J. Domingo-Ferrer, F. Sebé, and A. Solanas. A polynomial-time approximation to optimal multivariate microaggregation. Computers & Mathematics with Applications, 2007. (To appear).

    Google Scholar 

  21. J. Domingo-Ferrer and V. Torra. A quantitative comparison of disclosure control methods for microdata. In P. Doyle, J. I. Lane, J. J. M. Theeuwes, and L. Zayatz, editors, Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pages 111–134, Amsterdam, 2001. North-Holland. http://vneumann.etse.urv.es/publications/bcpi.

  22. J. Domingo-Ferrer and V. Torra. Algorithmic data mining against privacy protection methods for statistical databases. manuscript, 2004.

    Google Scholar 

  23. J. Domingo-Ferrer and V. Torra. Ordinal, continuous and heterogenerous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, 11(2):195–212, 2005.

    Article  MathSciNet  Google Scholar 

  24. G. T. Duncan, S. E. Fienberg, R. Krishnan, R. Padman, and S. F. Roehrig. Disclosure limitation methods and information loss for tabular data. In P. Doyle, J. I. Lane, J. J. Theeuwes, and L. V. Zayatz, editors, Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pages 135–166, Amsterdam, 2001. North-Holland.

    Google Scholar 

  25. G. T. Duncan, S. A. Keller-McNulty, and S. L Stokes. Disclosure risk vs. data utility: The r-u confidentiality map, 2001.

    Google Scholar 

  26. G. T. Duncan and S. Mukherjee. Optimal disclosure limitation strategy in statistical databases: deterring tracker attacks through additive noise. Journal of the American Statistical Association, 95:720–729, 2000.

    Article  Google Scholar 

  27. G. T. Duncan and R. W. Pearson. Enhancing access to microdata while protecting confidentiality: prospects for the future. Statistical Science, 6:219–239, 1991.

    Article  Google Scholar 

  28. E.U.Privacy. European privacy regulations, 2004. http://europa.eu.int/ comm/internal_market/privacy/law_en.htm.

  29. I. P. Fellegi and A. B. Sunter. A theory for record linkage. Journal of the American Statistical Association, 64(328):1183–1210, 1969.

    Article  Google Scholar 

  30. S. E. Fienberg. A radical proposal for the provision of micro-data samples and the preservation of confidentiality. Technical Report 611, Carnegie Mellon University Department of Statistics, 1994.

    Google Scholar 

  31. S. E. Fienberg, U. E. Makov, and R. J. Steele. Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics, 14(4):485–502, 1998.

    Google Scholar 

  32. S. E. Fienberg and J. McIntyre. Data swapping: variations on a theme by dalenius and reiss. In J. Domingo-Ferrer and V. Torra, editors, Privacy in Statistical Databases, volume 3050 of Lecture Notes in Computer Science, pages 14–29, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  33. A. Florian. An efficient sampling scheme: updated latin hypercube sampling. Probabilistic Engineering Mechanics, 7(2):123–130, 1992.

    Article  MathSciNet  Google Scholar 

  34. L. Franconi and J. Stander. A model based method for disclosure limitation of business microdata. Journal of the Royal Statistical Society D - Statistician, 51:1–11, 2002.

    Article  MathSciNet  Google Scholar 

  35. R. Garfinkel, R. Gopal, and D. Rice. New approaches to disclosure limitation while answering queries to a database: protecting numerical confidential data against insider threat based on data and algorithms, 2004. Manuscript. Available at http://www-eio.upc.es/seminar/04/garfinkel.pdf.

  36. S. Giessing. Survey on methods for tabular data protection in argus. In J. Domingo-Ferrer and V. Torra, editors, Privacy in Statistical Databases, volume 3050 of Lecture Notes in Computer Science, pages 1–13, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  37. R. Gopal, R. Garfinkel, and P. Goes. Confidentiality via camouflage: the cvc approach to disclosure limitation when answering queries to databases. Operations Research, 50:501–516, 2002.

    Article  MATH  MathSciNet  Google Scholar 

  38. R. Gopal, P. Goes, and R. Garfinkel. Interval protection of confidential information in a database. INFORMS Journal on Computing, 10:309–322, 1998.

    Article  MATH  MathSciNet  Google Scholar 

  39. J. M. Gouweleeuw, P. Kooiman, L. C. R. J. Willenborg, and P.-P. DeWolf. Post randomisation for statistical disclosure control: Theory and implementation, 1997. Research paper no. 9731 (Voorburg: Statistics Netherlands).

    Google Scholar 

  40. B. Greenberg. Rank swapping for ordinal data, 1987. Washington, DC: U. S. Bureau of the Census (unpublished manuscript).

    Google Scholar 

  41. S. L. Hansen and S. Mukherjee. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering, 15(4):1043–1044, 2003.

    Article  Google Scholar 

  42. G. R. Heer. A bootstrap procedure to preserve statistical confidentiality in contingency tables. In D. Lievesley, editor, Proc. of the International Seminar on Statistical Confidentiality, pages 261–271, Luxemburg, 1993. Office for Official Publications of the European Communities.

    Google Scholar 

  43. HIPAA. Health insurance portability and accountability act, 2004. http://www.hhs.gov/ocr/hipaa/.

  44. A. Hundepool, A. Van de Wetering, R. Ramaswamy, L. Franconi, A. Capobianchi, P.-P. DeWolf, J. Domingo-Ferrer, V. Torra, R. Brand, and S. Giessing. μ-ARGUS version 4.0 Software and User’s Manual. Statistics Netherlands, Voorburg NL, may 2005. http://neon.vb.cbs.nl/casc.

  45. A. Hundepool, J. Domingo-Ferrer, L. Franconi, S. Giessing, R. Lenz, J. Longhurst, E. Schulte-Nordholt, G. Seri, and P.-P. DeWolf. Handbook on Statistical Disclosure Control (version 1.0). Eurostat (CENEX SDC Project Deliverable), 2006.

    Google Scholar 

  46. D. E. Huntington and C. S. Lyrintzis. Improvements to and limitations of latin hypercube sampling. Probabilistic Engineering Mechanics, 13(4):245–253, 1998.

    Article  Google Scholar 

  47. A. B. Kennickell. Multiple imputation and disclosure control: the case of the 1995 survey of consumer finances. In Record Linkage Techniques, pages 248–267, Washington DC, 1999. National Academy Press.

    Google Scholar 

  48. A. B. Kennickell. Multiple imputation and disclosure protection: the case of the 1995 survey of consumer finances. In J. Domingo-Ferrer, editor, Statistical Data Protection, pages 248–267, Luxemburg, 1999. Office for Official Publications of the European Communities.

    Google Scholar 

  49. J. J. Kim. A method for limiting disclosure in microdata based on random noise and transformation. In Proceedings of the Section on Survey Research Methods, pages 303–308, Alexandria VA, 1986. American Statistical Association.

    Google Scholar 

  50. M. Laszlo and S. Mukherjee. Minimum spanning tree partitioning algorithm for microaggregation. IEEE Transactions on Knowledge and Data Engineering, 17(7):902–911, 2005.

    Article  Google Scholar 

  51. J. M. Mateo-Sanz and J. Domingo-Ferrer. A method for data-oriented multivariate microaggregation. In J. Domingo-Ferrer, editor, Statistical Data Protection, pages 89–99, Luxemburg, 1999. Office for Official Publications of the European Communities.

    Google Scholar 

  52. A. Meyerson and R. Williams. General k-anonymization is hard. Technical Report 03-113, Carnegie Mellon School of Computer Science (USA), 2003.

    Google Scholar 

  53. R. Moore. Controlled data swapping techniques for masking public use microdata sets, 1996. U. S. Bureau of the Census, Washington, DC, (unpublished manuscript).

    Google Scholar 

  54. K. Muralidhar, D. Batra, and P. J. Kirs. Accessibility, security and accuracy in statistical databases: the case for the multiplicative fixed data perturbation approach. Management Science, 41:1549–1564, 1995.

    Article  MATH  Google Scholar 

  55. A. Oganian and J. Domingo-Ferrer. On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Comission for Europe, 18(4):345–354, 2001.

    Google Scholar 

  56. S. Polettini, L. Franconi, and J. Stander. Model based disclosure protection. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 83–96, Berlin Heidelberg, 2002. Springer.

    Google Scholar 

  57. T. J. Raghunathan, J. P. Reiter, and D. Rubin. Multiple imputation for statistical disclosure limitation. Journal of Official Statistics, 19(1):1–16, 2003.

    Google Scholar 

  58. S. P. Reiss. Practical data-swapping: the first steps. ACM Transactions on Database Systems, 9:20–37, 1984.

    Article  MATH  Google Scholar 

  59. S. P. Reiss, M. J. Post, and T. Dalenius. Non-reversible privacy transformations. In Proceedings of the ACM Symposium on Principles of Database Systems, pages 139–146, Los Angeles, CA, 1982. ACM.

    Chapter  Google Scholar 

  60. J. P. Reiter. Satisfying disclosure restrictions with synthetic data sets. Journal of Official Statistics, 18(4):531–544, 2002.

    Google Scholar 

  61. J. P. Reiter. Inference for partially synthetic, public use microdata sets. Survey Methodology, 29:181–188, 2003.

    Google Scholar 

  62. J. P. Reiter. Using cart to generate partially synthetic public use microdata, 2003. Duke University working paper.

    Google Scholar 

  63. J. P. Reiter. Releasing multiply-imputed, synthetic public use microdata: An illustration and empirical study. Journal of the Royal Statistical Society, Series A, 168:185–205, 2005.

    Article  MATH  MathSciNet  Google Scholar 

  64. J. P. Reiter. Significance tests for multi-component estimands from multiply-imputed, synthetic microdata. Journal of Statistical Planning and Inference, 131(2):365–377, 2005.

    Article  MATH  MathSciNet  Google Scholar 

  65. D. B. Rubin. Discussion of statistical disclosure limitation. Journal of Official Statistics, 9(2):461–468, 1993.

    Google Scholar 

  66. P. Samarati. Protecting respondents’ identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001.

    Article  Google Scholar 

  67. P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI International, 1998.

    Google Scholar 

  68. G. Sande. Exact and approximate methods for data directed microaggregation in one or more dimensions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):459–476, 2002.

    Article  MATH  MathSciNet  Google Scholar 

  69. J. Schlörer. Disclosure from statistical databases: quantitative aspects of trackers. ACM Transactions on Database Systems, 5:467–492, 1980.

    Article  MATH  Google Scholar 

  70. F. Sebé, J. Domingo-Ferrer, J. M. Mateo-Sanz, and V. Torra. Post-masking optimization of the tradeoff between information loss and disclosure risk in masked microdata sets. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 163–171, Berlin Heidelberg, 2002. Springer.

    Google Scholar 

  71. A. C. Singh, F. Yu, and G. H. Dunteman. Massc: A new data mask for limiting statistical information loss and disclosure. In H. Linden, J. Riecan, and L. Belsby, editors, Work Session on Statistical Data Confidentiality 2003, Monographs in Official Statistics, pages 373–394, Luxemburg, 2004. Eurostat.

    Google Scholar 

  72. G. R. Sullivan. The Use of Added Error to Avoid Disclosure in Microdata Releases. PhD thesis, Iowa State University, 1989.

    Google Scholar 

  73. L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge Based Systems, 10(5):571–588, 2002.

    Article  MATH  MathSciNet  Google Scholar 

  74. L. Sweeney. k-anonimity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge Based Systems, 10(5):557–570, 2002.

    Article  MATH  MathSciNet  Google Scholar 

  75. V. Torra. Microaggregation for categorical variables: a median based approach. In J. Domingo-Ferrer and V. Torra, editors, Privacy in Statistical Databases, volume 3050 of Lecture Notes in Computer Science, pages 162–174, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  76. J. F. Traub, Y. Yemini, and H. Wozniakowski. The statistical security of a statistical database. ACM Transactions on Database Systems, 9:672–679, 1984.

    Article  Google Scholar 

  77. U.S.Privacy. U. s. privacy regulations, 2004. http://www.media-awareness.ca/english/issues/privacy/us_legislation_privacy.cfm.

  78. L. Willenborg and T. DeWaal. Statistical Disclosure Control in Practice. Springer-Verlag, New York, 1996.

    MATH  Google Scholar 

  79. L. Willenborg and T. DeWaal. Elements of Statistical Disclosure Control. Springer-Verlag, New York, 2001.

    MATH  Google Scholar 

  80. W. E. Winkler. Re-identification methods for masked microdata. In J. Domingo-Ferrer and V. Torra, editors, Privacy in Statistical Databases, volume 3050 of Lecture Notes in Computer Science, pages 216–230, Berlin Heidelberg, 2004. Springer.

    Google Scholar 

  81. W. E. Yancey, W. E. Winkler, and R. H. Creecy. Disclosure risk assessment in perturbative microdata protection. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 135–152, Berlin Heidelberg, 2002. Springer.

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Domingo-Ferrer, J. (2008). A Survey of Inference Control Methods for Privacy-Preserving Data Mining. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_3

Download citation

  • DOI: https://doi.org/10.1007/978-0-387-70992-5_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-70991-8

  • Online ISBN: 978-0-387-70992-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics