Singular value decomposition based data distortion strategy for privacy protection

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Privacy preservation is a major concern when data mining techniques are applied to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset, and the degree of privacy protection. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method preserves privacy well while maintaining the utility of the datasets.
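
The abstract does not spell out the procedure, so the following is only a rough sketch of what a sparsified SVD distortion could look like, not the authors' implementation. It assumes that the data matrix is approximated by its top-k singular triplets and that entries of the singular vectors with magnitude below a threshold are set to zero before the distorted matrix is rebuilt. The function names, the rank `k`, the threshold `eps`, and the relative Frobenius-norm difference used as a distortion measure are all illustrative assumptions. The SVD is computed by plain power iteration so that the example runs without third-party packages; a numerical library routine would normally be used instead.

```python
import math
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))


def truncated_svd(A, k, iters=500, seed=0):
    """Top-k singular triplets (sigma, u, v) of A, found by power iteration
    on A^T A with deflation. Assumes k <= rank(A)."""
    rng = random.Random(seed)
    At = [list(col) for col in zip(*A)]
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(At))]
        for _ in range(iters):
            # Project out the right singular vectors found so far.
            for _s, _u, pv in triplets:
                dot = sum(a * b for a, b in zip(v, pv))
                v = [a - dot * b for a, b in zip(v, pv)]
            w = matvec(At, matvec(A, v))
            nw = norm(w)
            v = [x / nw for x in w]
        Av = matvec(A, v)
        sigma = norm(Av)
        triplets.append((sigma, [x / sigma for x in Av], v))
    return triplets


def sparsify(vec, eps):
    """Zero out entries whose magnitude is below eps (assumed rule)."""
    return [x if abs(x) >= eps else 0.0 for x in vec]


def sparsified_svd_distort(A, k, eps):
    """Rebuild A from its top-k triplets after sparsifying u and v."""
    m, n = len(A), len(A[0])
    D = [[0.0] * n for _ in range(m)]
    for sigma, u, v in truncated_svd(A, k):
        u, v = sparsify(u, eps), sparsify(v, eps)
        for i in range(m):
            for j in range(n):
                D[i][j] += sigma * u[i] * v[j]
    return D


def relative_difference(A, D):
    """Frobenius norm of (A - D) divided by the Frobenius norm of A."""
    num = sum((a - d) ** 2 for ra, rd in zip(A, D) for a, d in zip(ra, rd))
    den = sum(a * a for row in A for a in row)
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Hypothetical 4-record, 3-attribute dataset.
    data = [[5, 1, 0], [1, 4, 1], [0, 1, 3], [2, 0, 1]]
    distorted = sparsified_svd_distort(data, k=2, eps=0.1)
    print(relative_difference(data, distorted))
```

In this sketch, a smaller `k` or a larger `eps` moves the distorted matrix further from the original, which is the kind of trade-off between privacy and utility that the abstract's metrics are meant to quantify.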

Author information

Corresponding author

Correspondence to Jun Zhang.

Additional information

Shuting Xu received her PhD in Computer Science from the University of Kentucky in 2005. Dr. Xu is presently an Assistant Professor in the Department of Computer Information Systems at Virginia State University. Her research interests include data mining and information retrieval, database systems, and parallel and distributed computing.

Jun Zhang received a PhD from The George Washington University in 1997. He is an Associate Professor of Computer Science and Director of the Laboratory for High Performance Scientific Computing & Computer Simulation and the Laboratory for Computational Medical Imaging & Data Analysis at the University of Kentucky. His research interests include computational neuroinformatics, data mining and information retrieval, large scale parallel and scientific computing, numerical simulation, and iterative and preconditioning techniques for large scale matrix computation. Dr. Zhang is an associate editor and on the editorial boards of four international journals in computer simulation and computational mathematics, and is on the program committees of a few international conferences. His research work has been funded by the U.S. National Science Foundation and the Department of Energy. He is a recipient of the U.S. National Science Foundation CAREER Award and several other awards.

Dianwei Han received an M.E. degree from Beijing Institute of Technology, Beijing, China, in 1995. From 1995 to 1998, he worked in a Hitachi company (BHH) in Beijing, China. He received an MS degree from Lamar University, USA, in 2003. He is currently a PhD student in the Department of Computer Science, University of Kentucky, USA. His research interests include data mining and information retrieval, computational medical imaging analysis, and artificial intelligence.

Jie Wang received a master's degree in Industrial Automation from Beijing University of Chemical Technology in 1996. She is currently a PhD student and a member of the Laboratory for High Performance Computing and Computer Simulation in the Department of Computer Science at the University of Kentucky, USA. Her research interests include data mining and knowledge discovery, information filtering and retrieval, inter-organizational collaboration mechanisms, and intelligent e-Technology.

About this article

Cite this article

Xu, S., Zhang, J., Han, D. et al. Singular value decomposition based data distortion strategy for privacy protection. Knowl Inf Syst 10, 383–397 (2006). https://doi.org/10.1007/s10115-006-0001-2

  • DOI: https://doi.org/10.1007/s10115-006-0001-2
