Privacy in Data Mining

Summary

In this chapter we describe the main tools for privacy in data mining. We present an overview of the tools for protecting data, and then we focus on protection procedures. Information loss and disclosure risk measures are also described.

Acknowledgements

Part of the research described in this chapter is supported by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02).

Author information

Correspondence to Vicenç Torra.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Torra, V. (2009). Privacy in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_35

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_35

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science, Computer Science (R0)
