Summary
In this chapter we describe the main tools for privacy in data mining. We present an overview of the tools for protecting data, and then we focus on protection procedures. Information loss and disclosure risk measures are also described.
Acknowledgements
Part of the research described in this chapter is supported by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02).
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Torra, V. (2009). Privacy in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_35
DOI: https://doi.org/10.1007/978-0-387-09823-4_35
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)