Source Data Perturbation and consistent sets of safe tables

Statistics and Computing

Abstract

When tables are generated from a data file, their release should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (cell suppression or cell perturbation), but this may create inconsistencies with other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a single, fixed weight factor to each respondent (record) in the microdata file. Normally this weight factor equals 1 for every record and is not explicitly included in the microdata file. Upon tabulation, each respondent's contribution is multiplied by that respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level. Note, however, that the original microdata are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the statistical disclosure control (SDC) paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized. The paper indicates how this can be done. Moreover, it is shown that the SDP approach is well suited to data warehouses, as the weights can conveniently be stored in the fact tables. The data can then still be accessed, sliced and diced up to a certain level of detail, and the tables generated from the data warehouse are mutually consistent and safe.
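The tabulation step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the variable names (`region`, `sector`, `turnover`) and the weight values are hypothetical, and the weights are simply made up rather than chosen by the optimization the paper refers to. The sketch only shows why a single fixed weight per record yields mutually consistent tables: every table is a sum of the same weighted contributions.

```python
from collections import defaultdict

# Hypothetical microdata: (region, sector, turnover, weight).
# A weight of 1.0 leaves a record unperturbed. The other weights are
# invented for illustration; they are NOT derived from any safety rule
# or information-loss criterion.
RECORDS = [
    ("North", "Retail", 100.0, 1.0),
    ("North", "Retail", 900.0, 0.875),
    ("North", "Mining", 250.0, 1.0),
    ("South", "Retail", 400.0, 1.125),
    ("South", "Mining", 50.0, 1.0),
]

# Position of each (hypothetical) classifying variable in a record.
FIELDS = {"region": 0, "sector": 1}


def tabulate(records, by):
    """Sum weight * turnover over the cells spanned by the `by` variables."""
    positions = [FIELDS[name] for name in by]
    table = defaultdict(float)
    for rec in records:
        cell = tuple(rec[p] for p in positions)
        turnover, weight = rec[2], rec[3]
        table[cell] += weight * turnover
    return dict(table)


# Three tables built from the same weighted microdata.
by_region = tabulate(RECORDS, ["region"])
by_sector = tabulate(RECORDS, ["sector"])
by_both = tabulate(RECORDS, ["region", "sector"])

# Collapsing the two-way table over sector gives the region marginal back.
region_from_both = defaultdict(float)
for (region, _sector), value in by_both.items():
    region_from_both[(region,)] += value
```

Because the weight is attached to the record rather than to a table cell, the grand total and the marginals agree across all three tables, which is not guaranteed when each table is perturbed separately.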



Cite this article

Cuppen, M., Willenborg, L. Source Data Perturbation and consistent sets of safe tables. Statistics and Computing 13, 355–362 (2003). https://doi.org/10.1023/A:1025619007103
