ABSTRACT
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned but privacy-unaware AOL search log release. Since then, a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks, and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy lies the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: through a collection of experiments on a real search log, we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published. In addition, we select one application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.
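The abstract does not spell out the mechanism, but the general shape of a perturb-and-threshold release of this kind can be sketched as follows. This is a minimal illustration, assuming Laplace noise calibrated to a privacy parameter epsilon and a release threshold; the function names, parameters, and details are illustrative, not the paper's exact algorithm.

```python
import math
import random

def sample_laplace(scale):
    """Draw one sample from a zero-mean Laplace distribution
    via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_counts(counts, epsilon=1.0, threshold=10.0):
    """Add Laplace noise of scale 1/epsilon to each raw count and
    release only the items whose noisy count clears the threshold,
    so rare (potentially identifying) queries are suppressed."""
    released = {}
    for item, count in counts.items():
        noisy = count + sample_laplace(1.0 / epsilon)
        if noisy >= threshold:
            released[item] = noisy
    return released
```

The thresholding step is what keeps rare, user-identifying queries out of the output: a query issued by only one or two users almost never survives, while popular queries are released with counts close to their true values.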
Index Terms
- Releasing search queries and clicks privately