An evolutionary approach to enhance data privacy

  • Original Paper
  • Published in: Soft Computing

Abstract

Dissemination of data with sensitive information about individuals has an implicit risk of unauthorized disclosure. Perturbative masking methods propose the distortion of the original data sets before publication, tackling a difficult tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). In this paper, we describe how information loss and disclosure risk measures can be integrated within an evolutionary algorithm to seek new and enhanced masking protections for continuous microdata. The proposed technique constitutes a hybrid approach that combines state-of-the-art protection methods with an evolutionary algorithm optimization. We also provide experimental results using three data sets in order to illustrate and empirically evaluate the application of this technique.
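
The idea of driving an evolutionary search with information loss and disclosure risk can be sketched as follows. This is a minimal illustration and not the method of the paper: the toy measures `information_loss` and `disclosure_risk`, the equal-weight `score`, the noise-addition seeding, and the mutation and crossover operators are all assumptions chosen to keep the example short. Lower scores are taken to be better.

```python
import random


def information_loss(original, masked):
    """Toy IL (assumption): mean absolute deviation between the two files."""
    cells = len(original) * len(original[0])
    total = sum(abs(o - m)
                for row_o, row_m in zip(original, masked)
                for o, m in zip(row_o, row_m))
    return total / cells


def disclosure_risk(original, masked):
    """Toy DR (assumption): share of masked records whose nearest original
    record is the one they were derived from (distance-based linkage)."""
    hits = 0
    for i, row_m in enumerate(masked):
        dists = [sum((a - b) ** 2 for a, b in zip(row_m, row_o))
                 for row_o in original]
        if dists.index(min(dists)) == i:
            hits += 1
    return hits / len(masked)


def score(original, masked, scale):
    """Hypothetical fitness: equal-weight mix of normalised IL and DR."""
    il = min(1.0, information_loss(original, masked) / scale)
    return 0.5 * il + 0.5 * disclosure_risk(original, masked)


def add_noise(data, sigma, rng):
    """Stand-in for an existing masking method used to seed the population."""
    return [[v + rng.gauss(0, sigma) for v in row] for row in data]


def mutate(masked, sigma, rng):
    """Perturb one randomly chosen cell of a copy of the masked file."""
    child = [row[:] for row in masked]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[0]))
    child[i][j] += rng.gauss(0, sigma)
    return child


def crossover(a, b, rng):
    """Take the first records from one parent and the rest from the other."""
    cut = rng.randrange(1, len(a))
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def evolve(original, pop_size=10, generations=50, sigma=1.0, seed=0):
    rng = random.Random(seed)
    # Value range of the file, used only to normalise the toy IL into [0, 1].
    scale = max(max(r) for r in original) - min(min(r) for r in original)

    def fitness(masked):
        return score(original, masked, scale)

    population = [add_noise(original, sigma, rng) for _ in range(pop_size)]
    initial_best = min(fitness(m) for m in population)
    for _ in range(generations):
        if rng.random() < 0.5:
            child = mutate(rng.choice(population), sigma, rng)
        else:
            child = crossover(rng.choice(population),
                              rng.choice(population), rng)
        # Elitist replacement: the child only displaces the worst individual.
        worst = max(population, key=fitness)
        if fitness(child) < fitness(worst):
            population[population.index(worst)] = child
    best = min(population, key=fitness)
    return best, fitness(best), initial_best


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 1.0], [5.0, 6.0],
            [6.0, 5.0], [9.0, 9.0], [10.0, 8.0]]
    _, best_score, start_score = evolve(data)
    print(f"best initial score {start_score:.3f}, after search {best_score:.3f}")
```

Because a child replaces only the current worst individual, the best score in the population cannot get worse from one generation to the next. A realistic setting would replace the toy measures with established information loss and disclosure risk measures and seed the population with files produced by actual protection methods.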

Notes

  1. The history of these algorithms goes back to the early 1950s and is associated with several scientists who worked completely independently of one another (Michalewicz and Fogel 2004). Each procedure was slightly different, and they received names such as evolutionary computation (Back et al. 2000), genetic algorithms (Holland 1975) or evolution strategies (Rechenberg 1970; Schwefel 1981). Over time, the different approaches borrowed, exchanged and modified ideas. The term evolutionary algorithm then emerged to describe any of these algorithms, and it is the denomination we follow in this paper.

References

  • Agrawal R, Srikant R (2000) Privacy preserving data mining. In: Proceedings of the ACM SIGMOD conference on management of data, pp 439–450

  • Back T, Fogel DB, Michalewicz Z (eds) (2000) Evolutionary computation. Advanced algorithms and operations, vol 2. Institute of Physics Publishing, Bristol

  • Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. In: IEEE proceedings of the 21st international conference on data engineering, ICDE, pp 217–228

  • Brand R, Domingo-Ferrer J, Mateo-Sanz JM (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. Unscheduled Deliverable, European Project IST–2000–25069 CASC

  • Caruana RA, Schaffer JD (1988) Representation and hidden bias: Gray vs. binary coding for genetic algorithms. In: Proceedings of the 5th international conference on machine learning, Morgan Kaufmann, Los Altos, pp 153–161

  • Defays D, Anwar MN (1995) Micro-aggregation: a generic method. In: Proceedings of the 2nd international symposium on statistical confidentiality, pp 69–78

  • Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204

  • Dick G (2005) A comparison of localised and global niching methods. In: Proceedings of the 17th annual colloquium of the spatial information research centre, pp 91–101

  • Domingo-Ferrer J, Mateo-Sanz JM (2002) Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans Knowl Data Eng 14(1):189–201

  • Domingo-Ferrer J, Torra V (2001) A quantitative comparison of disclosure control methods for microdata. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 6. Elsevier, Amsterdam, pp 111–133

  • Domingo-Ferrer J, Torra V (2004) Disclosure risk assessment in statistical data protection. J Comput Appl Math 164:285–293

  • Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212

  • Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: New techniques and technologies for statistics: exchange of technology and know-how, ETK-NTTS’2001. Creta, Hersonissos, pp 807–826

  • Duncan GT, Fienberg SE, Krishnan R, Padman R, Roehrig SF (2001a) Disclosure limitation methods and information loss for tabular data. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 7. Elsevier, Amsterdam, pp 135–166

  • Duncan GT, Keller-McNulty SA, Stokes SL (2001b) Disclosure risk vs. data utility: the R-U confidentiality map. Technical report 121, National Institute of Statistical Sciences, NISS, North Carolina

  • Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press (2nd edn, MIT Press, 1992)

  • Iyengar VS (2002) Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288

  • Jiménez J, Torra V (2009a) JPEG-based microdata protection methods. Technical reports IIIA–TR–2009–06, IIIA-CSIC

  • Jiménez J, Torra V (2009b) Utility and risk of JPEG–based continuous microdata protection methods. In: IEEE Proceedings of the 4th international conference on availability, reliability and security, ARES

  • Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911

  • LeFevre KR (2007) Anonymity in data publishing and distribution. PhD thesis, University of Wisconsin, Madison

  • Mahfoud SW (1992) Crowding and preselection revisited. Technical report 92004, Illinois Genetic Algorithms Laboratory (IlliGAL), University of Illinois, also in Parallel Problem Solving From Nature, PPSN, 2:27–36

  • Mateo-Sanz JM, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193

  • Michalewicz Z, Fogel DB (2004) How to solve it: modern heuristics, 2nd edn. Springer, Berlin

  • Moore RA Jr (1996) Controlled data-swapping techniques for masking public use microdata sets. Research report, RR 96-04, Statistical Research Division Report Series, US Bureau of the Census

  • Nin J, Herranz J, Torra V (2008a) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412

  • Nin J, Herranz J, Torra V (2008b) Rethinking rank swapping to decrease disclosure risk. Data Knowl Eng 64(1):346–364

  • Rechenberg I (1970) Evolutionsstrategie: optimierung technischer systeme nach prinzipien der biologischen information. PhD thesis, Technical University of Berlin, reprinted by Fromman Verlag, Freiburg, Germany, 1973

  • Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  • Schaffer JD, Caruana R, Eshelman LJ, Das R (1989) A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Schaffer JD (ed) ICGA, Morgan Kaufmann, pp 51–60

  • Schwefel HP (1981) Numerical optimization of computer models (Tr. from German to English). Wiley, Chichester

  • Sebé F, Domingo-Ferrer J, Mateo JM, Torra V (2002) Post-masking optimization of the tradeoff between information loss and disclosure risk in masked microdata sets. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 163–171

  • Solanas A (2008) Privacy protection with genetic algorithms. In: Yang A, Shan Y, Bui LT (eds) Success in evolutionary computation, Studies in computational intelligence series. Springer, Berlin, pp 215–239

  • Willenborg L, de Waal T (1996) Statistical disclosure control in practice. Springer, Berlin

  • Yancey WE, Winkler WE, Creecy RH (2002) Disclosure risk assessment in perturbative microdata protection. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 135–152

Acknowledgments

Partial support by Generalitat de Catalunya (AGAUR, 2009 SGR 7) and by the Spanish MEC (projects ARES CONSOLIDER INGENIO 2010 CSD2007 00004 and e-AEGIS TSI2007 65406-C03-01/02) is acknowledged.

Author information

Corresponding author

Correspondence to Vicenç Torra.

About this article

Cite this article

Jiménez, J., Marés, J. & Torra, V. An evolutionary approach to enhance data privacy. Soft Comput 15, 1301–1311 (2011). https://doi.org/10.1007/s00500-010-0672-1

  • DOI: https://doi.org/10.1007/s00500-010-0672-1
