Abstract
When tables are generated from a data file, their release should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (cell suppression or cell perturbation), but this may create inconsistencies with other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a single, fixed weight factor to each respondent (record) in the microdata file. Normally this weight factor equals 1 for every record and is not explicitly stored in the microdata file. Upon tabulation, each contribution of a respondent is multiplied by that respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level rather than at the table level. Note, however, that the original microdata values are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the Statistical Disclosure Control (SDC) paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized. The paper shows how such weights can be chosen. Moreover, it is shown that the SDP approach is well suited to data warehouses, as the weights can conveniently be stored in the fact tables. The data can then still be accessed, sliced and diced up to a certain level of detail, and tables generated from the data warehouse are mutually consistent and safe.
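The weighted tabulation step can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical microdata set with one numeric variable (turnover) and an added weight column; the weight values are arbitrary and are not produced by the weight-selection method the paper describes. It shows why tables generated from the same weighted records stay consistent: a detailed table, summed over one dimension, reproduces a less detailed table that was generated separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical microdata: (region, sector, turnover, weight).
# The original values are never edited; only the weight column is added.
# A weight of 1.0 means "unperturbed"; the other weights are arbitrary
# illustrative values, not weights chosen by the paper's method.
Record = Tuple[str, str, float, float]

MICRODATA = [
    ("North", "Retail", 120.0, 1.0),
    ("North", "Retail", 900.0, 1.2),    # largest contributor in its cell, weighted up
    ("North", "Industry", 300.0, 1.0),
    ("South", "Retail", 200.0, 1.0),
    ("South", "Industry", 450.0, 0.9),  # weighted down
    ("South", "Industry", 80.0, 1.0),
]


def tabulate(records: Iterable[Record], dims: Tuple[int, ...]) -> Dict[Tuple[str, ...], float]:
    """Weighted turnover totals grouped by the given dimension indices.

    Each record contributes value * weight to its cell, so every table
    built from the same records uses the same perturbation.
    """
    table: Dict[Tuple[str, ...], float] = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        table[key] += rec[2] * rec[3]
    return dict(table)


by_region_sector = tabulate(MICRODATA, dims=(0, 1))
by_region = tabulate(MICRODATA, dims=(0,))

# Consistency check: summing the detailed table over sector reproduces
# the separately generated regional table.
rolled_up: Dict[Tuple[str, ...], float] = defaultdict(float)
for (region, _sector), total in by_region_sector.items():
    rolled_up[(region,)] += total

for key, total in by_region.items():
    assert abs(rolled_up[key] - total) < 1e-9

print(by_region)
```

In a data warehouse setting the weight column would sit in the fact table, so the same roll-up consistency holds for any slice-and-dice query that aggregates weighted contributions.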
Cite this article
Cuppen, M., Willenborg, L. Source Data Perturbation and consistent sets of safe tables. Statistics and Computing 13, 355–362 (2003). https://doi.org/10.1023/A:1025619007103