Privacy in Data Mining

Torra, Vicenç

doi:10.1007/978-0-387-09823-4_35

Chapter
First Online: 01 January 2010

Summary

In this chapter we describe the main tools for privacy in data mining. We present an overview of the tools for protecting data, and then we focus on protection procedures. Information loss and disclosure risk measures are also described.

Acknowledgements

Part of the research described in this chapter is supported by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02).

Author information

Authors and Affiliations

IIIA - CSIC, Campus UAB s/n, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra

Corresponding author

Correspondence to Vicenç Torra.

Editor information

Editors and Affiliations

Dept. Industrial Engineering, Tel Aviv University, Ramat Aviv, 69978, Israel
Oded Maimon
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach

About this chapter

Cite this chapter

Torra, V. (2009). Privacy in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_35

DOI: https://doi.org/10.1007/978-0-387-09823-4_35
Published: 07 July 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)
