research-article

Privacy-preserving query log mining for business confidentiality protection

Authors:

Barbara Poblete,

Myra Spiliopoulou,

Ricardo Baeza-Yates

ACM Transactions on the Web (TWEB), Volume 4, Issue 3

Article No.: 10, Pages 1 - 26

https://doi.org/10.1145/1806916.1806919

Published: 20 July 2010

Abstract

We introduce the concern of protecting confidential business information when search engine query logs and data derived from them are published. We study business confidentiality as the protection of nonpublic data belonging to institutions, such as companies and people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we transfer the problem into the field of privacy-preserving data mining. We characterize the possible adversaries interested in disclosing confidential Web site data and the attack strategies they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We perform an experimental evaluation to estimate the remaining utility of the log after our anonymization techniques are applied. Our experimental results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.


Published In

ACM Transactions on the Web Volume 4, Issue 3

July 2010

166 pages

ISSN:1559-1131

EISSN:1559-114X

DOI:10.1145/1806916

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 July 2010

Accepted: 01 April 2010

Revised: 01 July 2008

Received: 01 December 2007


Qualifiers

Research-article
Research
Refereed

