CVDP k-means clustering algorithm for differential privacy based on coefficient of variation

Kong, Yuting; Qian, Yurong; Tan, Fuxiang; Bai, Lu; Shao, Jinxin; Ma, Tinghuai; Tereshchenko, Sergei Nikolayevich

doi:10.3233/JIFS-213564

Article type: Research Article

Authors: Kong, Yuting^{a; b; c} | Qian, Yurong^{a; b; c; *} | Tan, Fuxiang^{a; b; c} | Bai, Lu^{a; b; c} | Shao, Jinxin^{a; b; c} | Ma, Tinghuai^d | Tereshchenko, Sergei Nikolayevich^e

Affiliations: [a] School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China | [b] Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China | [c] Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China | [d] Nanjing University of Information Science & Technology, Nanjing, China | [e] Novosibirsk State University of Economics and Management (NSUEM), Russia

Correspondence: [*] Corresponding author. Yurong Qian, E-mail: [email protected].

Abstract: Data clustering is widely applied across many fields and can help enterprises optimize their services. However, when the original data to be analyzed contains users' personal private information, the clustering performed by the data holder may expose that information. The differential privacy k-means algorithm is a clustering method built on differential privacy protection, and it addresses privacy disclosure during data clustering. In this algorithm, Laplace noise calibrated by the privacy parameter ε is added to the cluster centers to protect users' sensitive information and the clustering results, but the added noise reduces the utility of the clustering. To balance the availability and privacy of differential privacy k-means, existing improvements have focused on selecting the initial cluster centers or on handling outliers, but they do not consider that different data dimensions contribute differently to the clustering. This paper therefore proposes CVDP k-means, a differential privacy k-means clustering algorithm based on the coefficient of variation. The CVDP scheme first eliminates outliers in the original data using data density, and then uses the coefficient of variation to design a weighted similarity measure between data points and a method for selecting the initial centroids. Experimental results show that the CVDP k-means algorithm achieves some improvements in availability, performance, and privacy.
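The abstract combines two ideas: weighting each dimension by its coefficient of variation (standard deviation divided by mean) when measuring similarity, and adding Laplace noise to cluster centers to obtain differential privacy. The following is a minimal Python sketch of that general idea, not the authors' CVDP implementation. It assumes every feature is scaled to [0, 1], splits ε evenly across a fixed number of iterations, and adds Laplace noise to per-cluster counts and sums in a Lloyd-style loop (joint L1 sensitivity d + 1). The function names, the budget split, and the random initial centroids are hypothetical choices for illustration. The paper's density-based outlier removal and CV-based initial centroid selection are not reproduced, and the CV weights here are computed from the raw data without any privacy accounting.

```python
import math
import random
from statistics import mean, pstdev


def cv_weights(data):
    """Hypothetical helper: per-dimension weights proportional to the
    coefficient of variation (population std / |mean|), normalised to sum to 1.
    Note: computed from raw data, so this step is NOT itself private."""
    dims = len(data[0])
    cvs = []
    for j in range(dims):
        col = [row[j] for row in data]
        mu = mean(col)
        cvs.append(pstdev(col) / abs(mu) if mu != 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        return [1.0 / dims] * dims  # fall back to equal weights
    return [c / total for c in cvs]


def weighted_distance(x, y, w):
    """CV-weighted Euclidean distance between points x and y."""
    return math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, y)))


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans(data, k, epsilon, iterations=5, seed=None):
    """Hypothetical sketch of Lloyd-style DP k-means with CV-weighted assignment.
    Assumes every feature is already scaled to [0, 1]."""
    rng = random.Random(seed)
    d = len(data[0])
    w = cv_weights(data)
    # Assumption: initial centroids drawn uniformly from the domain, not the data.
    centroids = [[rng.random() for _ in range(d)] for _ in range(k)]
    # Assumption: budget split evenly over iterations (sequential composition);
    # one record moves a cluster's count by 1 and its sum by at most d (L1).
    scale = (d + 1) * iterations / epsilon
    for _ in range(iterations):
        counts = [0] * k
        sums = [[0.0] * d for _ in range(k)]
        for x in data:
            # Assign each point to the nearest centroid under the weighted distance.
            c = min(range(k), key=lambda i: weighted_distance(x, centroids[i], w))
            counts[c] += 1
            for j in range(d):
                sums[c][j] += x[j]
        # Clusters are disjoint, so noising each one uses the same budget (parallel composition).
        for i in range(k):
            noisy_count = max(1.0, counts[i] + laplace(scale, rng))
            centroids[i] = [
                min(1.0, max(0.0, (sums[i][j] + laplace(scale, rng)) / noisy_count))
                for j in range(d)
            ]
    return centroids, w
```

For example, `dp_kmeans(points, k=2, epsilon=1.0, iterations=3, seed=0)` returns two noisy centroids and the CV weights. A smaller ε raises the noise scale (d + 1)·T/ε, which illustrates the utility/privacy trade-off the abstract describes.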

Keywords: Differential privacy, differential privacy k-means clustering, coefficient of variation, CVDP k-means

Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 5, pp. 6027-6045, 2022

Published: 22 September 2022
