WWW Conference Proceedings, DOI: 10.1145/1526709.1526733
research-article

Releasing search queries and clicks privately

Published: 20 April 2009

ABSTRACT

The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned but privacy-unaware AOL search log release. Since then, a series of ad hoc techniques has been proposed in the literature, though none is known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks, and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: through a collection of experiments on a real search log, we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.

