Skip to main content
Log in

Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues

  • Published:
Statistics and Computing Aims and scope Submit manuscript

Abstract

Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals and algorithms such as iterative proportional fitting, as well as a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Bishop Y.M.M., Fienberg S.E., and Holland P.W. 1975. Discrete Multivariate Analyses: Theory and Practice. MIT Press, Cambridge, MA.

    Google Scholar 

  • Buzzigoli L. and Giusti A. 1999.Analgorithm to calculate the lower and upper bounds of the elements of an array given its marginals. In: Statistical Data Protection (SDP'98) Proceedings, Luxembourg, Eurostat. pp. 131–147.

    Google Scholar 

  • Cormen T.H., Leiserson C.E., and Rivest R.L. 1990. Introduction to Algorithms. MIT Press/McGraw-Hill.

  • Dobra A. 2002. Statistical Tools for Disclosure Limitation in Multiway Contingency Tables. PhD thesis, Department of Statistics, Carnegie Mellon University.

  • Dobra A. and FienbergS.E. 2000. Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proc. Nat. Acad. Sciences 97: 11885–11892.

    Google Scholar 

  • Dobra A. and Fienberg S.E. 2001. Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation. Statist. J. United Nations ECE 18: 363–371, 2001. Presented at the 2nd Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, 14-16 March, Skopje, Macedonia.

    Google Scholar 

  • Dobra A., Karr A.F., Fienberg S.E., and Sanil A.P. 2002. Software systems for tabular data releases. Int. J. Uncertainty, Fuzziness and Knowledge Based Systems 10(5): 529–544.

    Google Scholar 

  • Fienberg S.E. 1999. Fréchet and bonferroni bounds for multi-way tables of counts with applications to dis-closure limitation. In: Statistical Data Protection (SDP'98) Proceedings, Luxembourg, Eurostat. pp. 115–129.

  • Harinarayan V., Rajaraman A., and Ullman, J.D. 1996. Implementing data cubes efficiently. In: Proceedings of theACMSIGMODInternational Conference on Management of Data, Vol. 25(2) of ACM SIGMOD Record, June 4-6, ACM Press, New York, pp. 205–216.

    Google Scholar 

  • Karr A.F., Dobra A., and Sanil A.P. 2003. Table servers protect confidentiality in tabular data releases. Comm. ACM 46(1): 57–58.

    Google Scholar 

  • Knuth D.E. 1997. Fundamental Algorithms, Vol. 1 of The Art of Computer Programming, 3rd edn. Addison-Wesley, Reading, Massachusetts.

    Google Scholar 

  • Lauritzen S.L. 1996. Graphical Models. Clarendon Press, Oxford, UK.

    Google Scholar 

  • Madigan D. and York J. 1995. Bayesian graphical models for discrete data. Internat. Statis. Rev. 63: 215–232.

    Google Scholar 

  • Moore A.W. and Lee M.S. 1998. Cached sufficient statistics for efficient machine learning with large datasets. J. Artificial Intell. Res. 8: 67–91.

    Google Scholar 

  • Bjarne Stroustrup. 1997. The C++ Programming Language, 3rd edn. Addison-Wesley, Reading, MA.

    Google Scholar 

  • Sun Microsystems. Java programming language. http://java.sun.com, 2002.

  • Willenborg L.C.R.J. and de Waal T. 2001. Elements of Statistical Disclosure Control. Springer-Verlag, New York.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dobra, A., Karr, A.F. & Sanil, A.P. Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues. Statistics and Computing 13, 363–370 (2003). https://doi.org/10.1023/A:1025671023941

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1025671023941

Navigation