A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining

Chen, Keke; Liu, Ling

doi:10.1007/978-0-387-70992-5_7


Part of the book series: Advances in Database Systems (ADBS, volume 34)


The major challenge of data perturbation is to balance the level of privacy guarantee against the level of data utility, two requirements that are commonly regarded as conflicting in privacy-preserving data mining systems and applications. Multiplicative perturbation algorithms aim to improve data privacy while maintaining the desired level of data utility by selectively preserving task- and model-specific information during the perturbation process. Because this information is preserved, a set of “transformation-invariant data mining models” can be applied directly to the perturbed data and still achieve the required model accuracy. A multiplicative perturbation algorithm can often find multiple data transformations that preserve the required data utility, so the next major challenge is to find a good transformation that provides a satisfactory level of privacy guarantee. In this chapter, we review three representative multiplicative perturbation methods (rotation perturbation, projection perturbation, and geometric perturbation) and discuss the technical issues and research challenges. We first describe the task- and model-specific information for a class of data mining models and the transformations that can (approximately) preserve it. We then discuss the design of appropriate privacy evaluation models for multiplicative perturbations and give an overview of how the privacy evaluation model is used to measure the level of privacy guarantee under different types of attacks.
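As an illustration of the kind of information such transformations can preserve, the following minimal Python sketch multiplies each record by a random orthogonal matrix and adds a translation vector, then checks that pairwise Euclidean distances are unchanged. It is not taken from the chapter. The function names (`random_orthogonal`, `perturb`, `distance`), the Gram-Schmidt construction, the toy data, and the omission of any noise component are all assumptions made for this example. The matrix produced is orthogonal but may include a reflection, which does not affect distances.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d orthogonal matrix by Gram-Schmidt on Gaussian vectors.

    Illustrative construction only; the result may contain a reflection.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows already accepted.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip an (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def perturb(record, R, t):
    """Return R * record + t for one record (a list of d floats)."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, record)) + t_i
            for row, t_i in zip(R, t)]


def distance(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
d = 4
n = 5
data = [[rng.uniform(0, 1) for _ in range(d)] for _ in range(n)]  # toy records
R = random_orthogonal(d, rng)             # hypothetical perturbation matrix
t = [rng.uniform(0, 1) for _ in range(d)]  # hypothetical translation vector
perturbed = [perturb(x, R, t) for x in data]

# Largest change in any pairwise distance caused by the perturbation.
max_gap = max(
    abs(distance(data[i], data[j]) - distance(perturbed[i], perturbed[j]))
    for i in range(n) for j in range(i + 1, n)
)
print("largest change in pairwise distance:", max_gap)
```

Models that depend only on pairwise distances would behave the same on `perturbed` as on `data` in this sketch, up to floating-point error. How hard it is to recover `data` from `perturbed` is a separate question, which is what a privacy evaluation model is meant to address.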



Author information

Authors and Affiliations

College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Keke Chen
College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Ling Liu


Editor information

Editors and Affiliations

IBM Thomas J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
Charu C. Aggarwal
Department of Computer Science, University of Illinois at Chicago, 854 South Morgan Street, 60607-7053, Chicago, IL, USA
Philip S. Yu

About this chapter

Cite this chapter

Chen, K., Liu, L. (2008). A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_7


DOI: https://doi.org/10.1007/978-0-387-70992-5_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-70991-8
Online ISBN: 978-0-387-70992-5
eBook Packages: Computer Science, Computer Science (R0)
