Heavy Hitter Identification Over Large-Domain Set-Valued Data With Local Differential Privacy | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Set-valued data are widely used to represent information in the real world, such as individual daily behaviors, items in shopping carts, and web browsing history. By collecting set-valued data and identifying heavy hitters, service providers (i.e., the collector) can learn the usage preferences of customers (i.e., users) and use the learned information to improve the quality of their services. However, collecting raw data brings privacy risks to users. Recently, local differential privacy (LDP) has emerged as a rigorous privacy framework for collecting private user data. Many LDP schemes have been designed to identify heavy hitters, but most of them do not scale to large data domains because of their high computation cost. In this paper, we propose PemSet, an LDP framework that efficiently identifies heavy hitters from set-valued data with a large domain. In PemSet, users focus on the prefix of each item (i.e., the first few bits of the item's binary representation), and perturb and report only prefixes to reduce computation cost. Because different items can share the same prefix, the reported set-valued data may be a multiset, i.e., a set that contains multiple copies of the same item. We therefore design four LDP protocols, MOLH, MOLH-S, MPCKV, and MWheel, to estimate item frequencies in the multiset setting, and compare their performance under the PemSet framework experimentally. Experimental results demonstrate that MOLH performs best in the high-privacy regime, i.e., $\epsilon < 1$, while MWheel obtains the highest utility when the privacy budget is large, i.e., $\epsilon \geqslant 1$.
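To illustrate why reporting prefixes turns a set into a multiset, here is a minimal Python sketch. It is not the PemSet implementation: the function names, the 4-bit domain, and the 2-bit prefix length are assumptions chosen for illustration, and the LDP perturbation step is omitted entirely.

```python
from collections import Counter


def prefix(item: int, domain_bits: int, prefix_bits: int) -> str:
    """Return the first `prefix_bits` bits of `item`, written as a
    zero-padded `domain_bits`-bit binary string (hypothetical helper)."""
    if not 0 <= item < 2 ** domain_bits:
        raise ValueError("item lies outside the domain")
    if not 0 < prefix_bits <= domain_bits:
        raise ValueError("prefix_bits must be in 1..domain_bits")
    return format(item, f"0{domain_bits}b")[:prefix_bits]


def prefix_multiset(items: set[int], domain_bits: int, prefix_bits: int) -> Counter:
    """Map each distinct item to its prefix and count how often each prefix occurs.
    No perturbation is applied here; this only shows the multiset effect."""
    return Counter(prefix(x, domain_bits, prefix_bits) for x in items)


# Illustrative user set over an assumed 4-bit domain:
# 4 = 0100, 5 = 0101, 12 = 1100
user_items = {4, 5, 12}
prefixes = prefix_multiset(user_items, domain_bits=4, prefix_bits=2)
print(prefixes)
```

Items 4 and 5 share the 2-bit prefix `01`, so the prefix `01` appears twice even though the original set has no duplicates. This is the situation that a frequency estimator in the multiset setting has to handle.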
Page(s): 414 - 426
Date of Publication: 16 October 2023