Article type: Research Article
Authors: Kong, Yuting [a, b, c] | Qian, Yurong [a, b, c, *] | Tan, Fuxiang [a, b, c] | Bai, Lu [a, b, c] | Shao, Jinxin [a, b, c] | Ma, Tinghuai [d] | Tereshchenko, Sergei Nikolayevich [e]
Affiliations: [a] School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China | [b] Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China | [c] Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China | [d] Nanjing University of Information Science & Technology, Nanjing, China | [e] Novosibirsk State University of Economics and Management (NSUEM), Russia
Correspondence: [*] Corresponding author. Yurong Qian, E-mail: [email protected].
Abstract: Data clustering is widely applied across industries and can support enterprise service optimization. However, when the data to be analyzed contains users’ personal information, the data holder’s clustering analysis may expose that information. The differential privacy k-means algorithm is a clustering method built on differential privacy protection that addresses privacy disclosure during data clustering. It adds Laplace noise, controlled by the privacy parameter ɛ, to the cluster centers to protect sensitive user information in the original data and in the clustering results, but the added noise reduces the utility of the clustering. To balance the availability and privacy of differential privacy k-means, existing improvements focus on selecting the initial cluster centers or on handling outliers, and do not consider that each data dimension contributes to the clustering to a different degree. This paper therefore proposes CVDP k-means, a differential privacy k-means clustering algorithm based on the coefficient of variation. The CVDP scheme first removes outliers from the original data using data density, and then uses the coefficient of variation to design a weighted similarity measure between data points and an initial centroid selection method. Experimental results show that CVDP k-means achieves some improvement in availability, performance, and privacy.
Keywords: Differential privacy, differential privacy k-means clustering, coefficient of variation, CVDP k-means
DOI: 10.3233/JIFS-213564
Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 5, pp. 6027-6045, 2022
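The two ideas named in the abstract, Laplace noise on the cluster centers and per-dimension weighting by the coefficient of variation, can be illustrated with a minimal Python sketch. This is not the authors' CVDP k-means: the function names, the normalization of the weights to sum to 1, the noisy-sum and noisy-count form of the centroid update, the assumption that every coordinate has been scaled to [0, 1], and the use of ɛ as the budget for a single update are all assumptions made for illustration. Outlier removal and initial centroid selection are omitted.

```python
import math
import random
from typing import List, Sequence


def cv_weights(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension weights proportional to the coefficient of variation (std / |mean|)."""
    n, dims = len(data), len(data[0])
    cvs = []
    for j in range(dims):
        column = [row[j] for row in data]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        cvs.append(std / abs(mean) if mean != 0 else 0.0)
    total = sum(cvs)
    # Assumption: fall back to uniform weights when no dimension varies.
    return [cv / total for cv in cvs] if total > 0 else [1.0 / dims] * dims


def weighted_distance(p: Sequence[float], q: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with each squared difference scaled by its dimension weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(points: Sequence[Sequence[float]], epsilon: float,
                   rng: random.Random) -> List[float]:
    """One noisy centroid update for a cluster whose coordinates lie in [0, 1].

    Assumption: adding or removing one point changes the count by 1 and each
    coordinate sum by at most 1, so the L1 sensitivity is dims + 1 and the
    Laplace scale is (dims + 1) / epsilon for this single update.
    """
    dims = len(points[0])
    scale = (dims + 1) / epsilon
    # Clamping is post-processing of the noisy values and keeps the result usable.
    noisy_count = max(len(points) + laplace_noise(scale, rng), 1.0)
    centroid = []
    for j in range(dims):
        noisy_sum = sum(p[j] for p in points) + laplace_noise(scale, rng)
        centroid.append(min(1.0, max(0.0, noisy_sum / noisy_count)))
    return centroid
```

A smaller ɛ gives a larger noise scale and therefore stronger privacy at the cost of less accurate centers, which is the availability and privacy trade-off the abstract refers to. In a full k-means loop the total budget would also have to be divided across iterations, which this sketch leaves out.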