research-article

Privacy-preserving query log mining for business confidentiality protection

Published: 20 July 2010

Abstract

We introduce the concern of protecting the confidentiality of business information when search engine query logs and data derived from them are published. We study business confidentiality as the protection of nonpublic data of institutions, such as companies, and of people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we transfer this problem into the field of privacy-preserving data mining. We characterize the possible adversaries interested in disclosing confidential Web site data and the attack strategies they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We perform an experimental evaluation to estimate the utility that remains in the log after our anonymization techniques are applied. Our experimental results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.
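The abstract does not spell out the anonymization heuristics themselves. The sketch below is purely illustrative and is not the authors' method. It shows one generic heuristic of this kind: it suppresses every (query, clicked site) pair issued by fewer than k distinct users, so that a rare query leading to a particular Web site cannot single out that site's confidential interests. It then reports the fraction of records retained as a crude proxy for the log's remaining utility. The record layout (`LogRecord`), the threshold `k`, and the function names are hypothetical assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical query-log entry: who searched, what, and which site was clicked."""
    user_id: str
    query: str
    clicked_site: str


def suppress_rare_pairs(log: list[LogRecord], k: int) -> list[LogRecord]:
    """Illustrative heuristic (assumed, not from the paper): keep only records whose
    (query, clicked_site) pair was issued by at least k distinct users."""
    users_per_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for rec in log:
        users_per_pair[(rec.query, rec.clicked_site)].add(rec.user_id)
    return [
        rec for rec in log
        if len(users_per_pair[(rec.query, rec.clicked_site)]) >= k
    ]


def retained_utility(original: list[LogRecord], anonymized: list[LogRecord]) -> float:
    """Crude utility proxy: fraction of original records that survive anonymization."""
    return len(anonymized) / len(original) if original else 1.0


if __name__ == "__main__":
    demo = [
        LogRecord("u1", "cheap flights", "site-a.example"),
        LogRecord("u2", "cheap flights", "site-a.example"),
        LogRecord("u3", "cheap flights", "site-a.example"),
        LogRecord("u4", "supplier x contract", "site-b.example"),
    ]
    kept = suppress_rare_pairs(demo, k=3)
    print(retained_utility(demo, kept))  # 0.75: the single rare pair is suppressed
```

Raising `k` strengthens protection against this kind of exposure but discards more of the log. That is the same privacy/utility trade-off the abstract's experimental evaluation measures, here reduced to a single number under the stated assumptions.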



          Published in: ACM Transactions on the Web, Volume 4, Issue 3, July 2010 (166 pages)
          ISSN: 1559-1131; EISSN: 1559-114X
          DOI: 10.1145/1806916

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher: Association for Computing Machinery, New York, NY, United States

          Publication History

          • Published: 20 July 2010
          • Accepted: 1 April 2010
          • Revised: 1 July 2008
          • Received: 1 December 2007


          Qualifiers

          • Research
          • Refereed
