Inference control in databases, also known as Statistical Disclosure Control (SDC), is about protecting data so that they can be published without revealing confidential information that can be linked to the specific individuals to whom the data correspond. This is an important application in several areas, such as official statistics, health statistics, and e-commerce (sharing of consumer data). Since data protection ultimately means data modification, the challenge for SDC is to achieve protection with minimum loss of the accuracy sought by database users. In this chapter, we survey the current state of the art in SDC methods for protecting individual data (microdata). We discuss several information loss and disclosure risk measures and analyze several ways of combining them to assess the performance of the various methods. Last but not least, topics in the area that need more research are identified and possible directions are hinted at.
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Domingo-Ferrer, J. (2008). A Survey of Inference Control Methods for Privacy-Preserving Data Mining. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_3
DOI: https://doi.org/10.1007/978-0-387-70992-5_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-70991-8
Online ISBN: 978-0-387-70992-5
eBook Packages: Computer Science (R0)