ABSTRACT
This paper looks at the real-world problem of statistical disclosure control. National Statistics Agencies are required to publish detailed statistics while simultaneously guaranteeing the confidentiality of the contributors. When published statistical tables contain magnitude data, such as turnover or health statistics, the preferred method is to suppress the values of cells that may reveal confidential information. However, suppressing these 'primary' cells alone will not guarantee protection, due to the presence of margin (row/column) totals from which suppressed values can be recalculated; other 'secondary' cells must therefore also be suppressed. A previously developed algorithm that hybridizes linear programming with a genetic algorithm has been shown to protect tables with up to 40,000 cells; however, Statistical Agencies are often required to protect tables with over 100,000 cells.
The algorithm's performance depended heavily on the choice of mutation operator, so this dependency was removed first. The algorithm was unable to protect larger tables because of the time its fitness function (a linear program) takes to execute, so a series of modifications was applied. These modifications significantly reduced the execution time, which in turn greatly extends the capabilities of the hybrid algorithm, to the point that it can now protect tables with up to one million cells.
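To illustrate why primary suppression alone is insufficient, the following is a minimal sketch, not the paper's algorithm. It assumes a small two-dimensional table with published row and column totals; the function name `recover_suppressed` and the example values are hypothetical. It shows only exact recovery by subtraction from margin totals, and does not model the linear-programming checks that the hybrid algorithm uses as its fitness function.

```python
from typing import List, Optional

Table = List[List[Optional[float]]]


def recover_suppressed(interior: Table,
                       row_totals: List[float],
                       col_totals: List[float]) -> Table:
    """Fill in any suppressed cell (None) that is the only unknown in its
    row or column, by subtracting the known cells from the published total.
    Repeats until no further cell can be recovered."""
    cells = [row[:] for row in interior]  # work on a copy
    n_rows, n_cols = len(cells), len(col_totals)
    changed = True
    while changed:
        changed = False
        # Rows with exactly one suppressed cell give that cell away.
        for r in range(n_rows):
            missing = [c for c in range(n_cols) if cells[r][c] is None]
            if len(missing) == 1:
                known = sum(v for v in cells[r] if v is not None)
                cells[r][missing[0]] = row_totals[r] - known
                changed = True
        # Columns with exactly one suppressed cell do the same.
        for c in range(n_cols):
            missing = [r for r in range(n_rows) if cells[r][c] is None]
            if len(missing) == 1:
                known = sum(cells[r][c] for r in range(n_rows)
                            if cells[r][c] is not None)
                cells[missing[0]][c] = col_totals[c] - known
                changed = True
    return cells


# Hypothetical 3x3 table; the true value of cell (0, 0) is 10.
row_totals = [60, 150, 240]
col_totals = [120, 150, 180]

# Only the primary cell is suppressed: the row total reveals it.
primary_only = [[None, 20, 30],
                [40, 50, 60],
                [70, 80, 90]]

# Three secondary cells are suppressed as well, so every affected
# row and column contains two unknowns.
with_secondary = [[None, None, 30],
                  [None, None, 60],
                  [70, 80, 90]]

attacked_primary = recover_suppressed(primary_only, row_totals, col_totals)
attacked_secondary = recover_suppressed(with_secondary, row_totals, col_totals)
```

In the first table the attacker recovers the suppressed value exactly; in the second, simple subtraction no longer works. Whether the remaining uncertainty is wide enough to count as protected is a separate question that this sketch does not address.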
Index Terms
- Scaling up a hybrid genetic linear programming algorithm for statistical disclosure control