Abstract
Statistical methods for disclosure limitation (or control) increasingly combine tools from statistics and operations research. For data summarized and released as a contingency table, some methods have focused on evaluating bounds on cell entries in k-way tables given sets of marginal totals. Less attention has been paid to evaluating disclosure risk given other summaries such as conditional probabilities, that is, tables of rates derived from the observed contingency table. Narrow intervals, especially for cells with low counts, could pose a privacy risk. In this paper we derive closed-form solutions for the linear relaxation bounds on the cell counts of a two-way contingency table given observed conditional probabilities. We also compute the corresponding sharp integer bounds via integer programming and show that the widths of the two sets of bounds can differ greatly, suggesting that the linear relaxation is often an unacceptable shortcut for estimating the sharp bounds and the disclosure risk.
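The gap between relaxed and sharp integer bounds can be illustrated with a small brute-force sketch. This is not the closed-form solution or the integer program from the paper. It assumes a hypothetical 2×2 table, a known grand total, exactly released conditionals P(row | column), and a simple relaxation in which column totals may be any non-negative reals summing to the grand total. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def conditionals_given_column(table):
    """Exact P(row | column) for a table of counts with no empty column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[Fraction(n, t) for n, t in zip(row, col_totals)] for row in table]


def feasible_tables(cond, total):
    """All integer tables with grand total `total` that match `cond` exactly."""
    # Each column total must be a positive multiple of the least common
    # multiple of that column's denominators, or some cell is non-integer.
    steps = []
    for col in zip(*cond):
        step = 1
        for p in col:
            step = step * p.denominator // gcd(step, p.denominator)
        steps.append(step)
    tables = []
    for col_totals in product(*(range(s, total + 1, s) for s in steps)):
        if sum(col_totals) == total:
            tables.append([[int(p * t) for p, t in zip(row, col_totals)]
                           for row in cond])
    return tables


def integer_bounds(cond, total):
    """Sharp (min, max) per cell, found by enumerating every feasible table."""
    tables = feasible_tables(cond, total)
    rows, cols = len(cond), len(cond[0])
    return [[(min(t[i][j] for t in tables), max(t[i][j] for t in tables))
             for j in range(cols)] for i in range(rows)]


def relaxed_bounds(cond, total):
    """Assumed relaxation: real column totals >= 0 summing to `total`.

    With at least two columns, any single column total can range over
    [0, total], so cell (i, j) ranges over [0, cond[i][j] * total].
    """
    return [[(Fraction(0), p * total) for p in row] for row in cond]


observed = [[1, 3], [2, 4]]  # hypothetical counts, grand total 10
cond = conditionals_given_column(observed)
print(integer_bounds(cond, 10))
print(relaxed_bounds(cond, 10))
```

In this example the integer constraints admit only the observed table, so every sharp interval collapses to a single value, while the relaxed interval for the first cell runs from 0 to 10/3. Enumeration is only practical for very small tables; it stands in here for the integer programming step.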
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smucker, B., Slavković, A.B. (2008). Cell Bounds in Two-Way Contingency Tables Based on Conditional Frequencies. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_6
DOI: https://doi.org/10.1007/978-3-540-87471-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87470-6
Online ISBN: 978-3-540-87471-3
eBook Packages: Computer Science, Computer Science (R0)