Privacy-preserving query log mining for business confidentiality protection

Published: 20 July 2010

Abstract

We introduce the problem of protecting confidential business information when search engine query logs and data derived from them are published. We define business confidentiality as the protection of nonpublic data from institutions, such as companies and people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we cast the problem in terms of privacy-preserving data mining. We characterize the adversaries who might seek to disclose confidential Web site data and the attack strategies they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We then evaluate experimentally how much utility the log retains after our anonymization techniques are applied. Our results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.


Published In

ACM Transactions on the Web  Volume 4, Issue 3
July 2010
166 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/1806916
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 July 2010
Accepted: 01 April 2010
Revised: 01 July 2008
Received: 01 December 2007
Published in TWEB Volume 4, Issue 3

Author Tags

  1. Privacy preservation
  2. Web sites
  3. queries
  4. query log publication

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2022) The Perception of Business Ethics in the Public and Private Sectors: a Study of Portuguese Social Representations. Trends in Psychology 31:4 (823-844). DOI: 10.1007/s43076-022-00173-8. Online publication date: 31-Mar-2022
  • (2019) Knowledge-based scheme to create privacy-preserving but semantically-related queries for web search engines. Information Sciences: an International Journal 218 (17-30). DOI: 10.1016/j.ins.2012.06.025. Online publication date: 6-Jan-2019
  • (2018) More than modelling and hiding. Data Mining and Knowledge Discovery 24:3 (697-737). DOI: 10.1007/s10618-012-0254-1. Online publication date: 26-Dec-2018
  • (2018) Web log analysis. Data Mining and Knowledge Discovery 24:3 (663-696). DOI: 10.1007/s10618-011-0228-8. Online publication date: 26-Dec-2018
  • (2017) UML 2.0 based framework for the development of secure web application. International Journal of Information Technology 9:1 (101-109). DOI: 10.1007/s41870-017-0001-3. Online publication date: 22-Feb-2017
  • (2016) On The Reuse of Past Searches in Information Retrieval. Business Intelligence (1117-1137). DOI: 10.4018/978-1-4666-9562-7.ch057. Online publication date: 2016
  • (2014) Advanced Research on Data Privacy in the ARES Project. Advanced Research in Data Privacy (3-14). DOI: 10.1007/978-3-319-09885-2_1. Online publication date: 22-Aug-2014
  • (2014) Data privacy. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4:4 (269-280). DOI: 10.1002/widm.1129. Online publication date: 2-Jun-2014
  • (2013) Utility preserving query log anonymization via semantic microaggregation. Information Sciences 242 (49-63). DOI: 10.1016/j.ins.2013.04.020. Online publication date: Sep-2013
  • (2012) User k-anonymity for privacy preserving data mining of query logs. Information Processing and Management: an International Journal 48:3 (476-487). DOI: 10.1016/j.ipm.2011.01.004. Online publication date: 1-May-2012
