Abstract
Data analysis and mining grow more powerful as data sizes increase rapidly, and publishing data for researchers is becoming more valuable. This process raises an important problem: privacy protection. In recent decades, many methods for protecting privacy in data publishing have been studied. One important class is based on matrix decompositions. These methods use matrix decompositions to identify information that is not critical to the analysis task and remove it from the data to protect privacy. This paper improves on this class of methods and presents a new privacy-protection algorithm based on non-negative matrix factorization and singular value decomposition. The basic idea is that using several kinds of decomposition allows the data to be analyzed from different directions, and therefore more comprehensively, so the algorithm may find more non-critical information and perform better. Experiments confirm this idea: the new method achieves better results than traditional methods that use only one kind of decomposition. Our method provides a stronger guarantee of privacy protection while maintaining data quality.
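As a rough illustration of decomposition-based perturbation, the sketch below applies a truncated singular value decomposition and then a non-negative matrix factorization to a data matrix, and publishes the low-rank reconstruction instead of the original values. The abstract does not say how the two decompositions are combined, so the sequential scheme here is only an assumption, not the authors' algorithm. The function names (`truncated_svd`, `nmf`, `perturb`), the ranks `k_svd` and `k_nmf`, and the random test matrix are all hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation of A built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Rank-k NMF of a non-negative matrix using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def perturb(A, k_svd, k_nmf):
    """Assumed combination: SVD truncation followed by NMF reconstruction."""
    # Clip small negative entries, since NMF requires non-negative input.
    A_svd = np.clip(truncated_svd(A, k_svd), 0.0, None)
    W, H = nmf(A_svd, k_nmf)
    return W @ H

# Hypothetical non-negative data matrix: 20 records, 6 attributes.
rng = np.random.default_rng(42)
A = rng.random((20, 6))
A_pert = perturb(A, k_svd=4, k_nmf=3)

# Relative distortion between the original and the perturbed data.
rel_change = np.linalg.norm(A - A_pert) / np.linalg.norm(A)
```

In a sketch like this, lower ranks discard more of the original values (stronger distortion), while higher ranks preserve more structure for the downstream mining task; how to balance the two is exactly what a method of this kind has to evaluate experimentally.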
Acknowledgements
This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JQ6078), the basic research fund of Chang’an University (0009–2014G6114024), and the National Natural Science Foundation of China (NSFC) (61601059). The breast cancer database (the WBC data set in this paper) was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison.
Cite this article
Li, G., Xue, R. A New Privacy-Preserving Data Mining Method Using Non-negative Matrix Factorization and Singular Value Decomposition. Wireless Pers Commun 102, 1799–1808 (2018). https://doi.org/10.1007/s11277-017-5237-5
DOI: https://doi.org/10.1007/s11277-017-5237-5