Abstract
Query log anonymization has become an important challenge. A query log contains the search history of users, together with the results they selected and the position of those results in the ranking. These data are used to provide personalized re-ranking of results and to study trends. However, query logs can disclose sensitive information about users. Hence, query logs must undergo an anonymization process that guarantees that: (a) no sensitive information can be linked to an identity; and (b) analysis of the anonymized data produces results similar to those obtained from the original data, i.e. data distortion is minimized. The latest anonymization approaches use microaggregation, a statistical disclosure control technique that provides a level of privacy comparable to \(k\)-anonymity while attempting to minimize data distortion. We propose a new method that uses search results to optimize microaggregation, providing greater data reliability than existing methods.
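To make the idea of microaggregation concrete, the following is a minimal sketch of the generic technique on a single numeric attribute. It is not the method proposed in the paper, which groups query logs using search results. The function name `microaggregate`, the fixed-size grouping strategy, and the toy data are assumptions made purely for illustration: records are sorted, partitioned into groups of at least \(k\) members, and each value is replaced by its group centroid, so every published value is shared by at least \(k\) records.

```python
from statistics import mean


def microaggregate(values, k):
    """Toy univariate microaggregation (illustrative assumption, not the paper's method).

    Sorts the records, splits them into consecutive groups of at least k
    members, and replaces each value with the mean of its group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k records to form one group")

    # Indices of the records ordered by value, so similar records are grouped.
    order = sorted(range(n), key=lambda i: values[i])
    result = [None] * n

    start = 0
    while start < n:
        end = start + k
        # If fewer than k records would remain, fold them into this group.
        if n - end < k:
            end = n
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
        start = end
    return result


# Hypothetical attribute: number of queries issued by each of seven users.
queries_per_user = [12, 1, 3, 10, 2, 11, 13]
anonymized = microaggregate(queries_per_user, k=3)
print(anonymized)
```

Larger values of `k` give stronger protection but move each record further from its original value, which is the distortion that the abstract says microaggregation tries to minimize.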
Notes
1. Note that whether a piece of information is private depends on the user's beliefs: one user may consider her religion a public matter, whereas another may consider it private. Determining which information is private is outside the scope of this paper. For this reason, we treat all information as equally important and private, as is done in [3].
References
Richardson, M.: Learning about the world through long-term query logs. ACM Trans. Web 2, 1–27 (2008)
Xiong, L., Agichtein, E.: Towards privacy-preserving query log publishing. In: Amitay, E., Murray, C.G., Teevan, J. (eds) Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)
He, Y., Naughton, J.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endowment 2(1), 934–945 (2009)
Adar, E.: User 4XXXXX9: anonymizing query logs. In: Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)
Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy (1997)
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.: Utility-based anonymization for privacy preservation with less information loss. SIGKDD Explor. Newsl. 8(2), 21–30 (2006)
Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 92 Symposium on Design and Analysis of Longitudinal Surveys, Statistics Canada, pp. 195–204 (1993)
Hong, Y., He, X., Vaidya, J., Adam, N., Atluri, V.: Effective anonymization of query logs. In: CIKM ’09: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pp. 1465–1468 (2009)
Navarro-Arribas, G., Torra, V., Erola, A., Castellà-Roca, J.: User k-anonymity for privacy preserving data mining of query logs. Inf. Process. Manage. 48(3), 476–487 (2012)
Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 127–137. Springer, Heidelberg (2010)
Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs using the open directory project. SORT-Stat. Oper. Res. Trans. 35(Special issue), 25–40 (2011)
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14, 189–201 (2002)
Domingo-Ferrer, J., Sebé, F., Solanas, A.: A polynomial-time approximation to optimal multivariate microaggregation. Comput. Math. Appl. 55(4), 714–732 (2008)
Cooper, A.: A survey of query log privacy-enhancing techniques from a policy perspective. ACM Trans. Web 2(4), 1–27 (2008)
Korolova, A., Kenthapadi, K., Mishra, N., Ntoulas, A.: Releasing search queries and clicks privately. In: WWW ’09: Proceedings of the 18th International Conference on World Wide Web, pp. 171–180 (2009)
Poblete, B., Spiliopoulou, M., Baeza-Yates, R.: Website privacy preservation for query log publishing. In: Bonchi, F., Malin, B., Saygın, Y. (eds.) PInKDD 2007. LNCS, vol. 4890, pp. 80–96. Springer, Heidelberg (2008)
Miller, G.: WordNet - About Us. WordNet. Princeton University, Princeton (2009)
ODP. Open directory project (2011)
Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A.: Semantic annotation of biomedical literature using Google. ICCSA 3, 327–337 (2005)
Gligorov, R., Aleksovski, Z., ten Kate, W., van Harmelen, F.: Using Google distance to weight approximate ontology matches. In: Proceedings of the WWW-07, pp. 767–776. ACM Press (2007)
iprospect.com, inc, iProspect Blended Search Results Study. http://www.iProspect.com (2009)
Acknowledgements
This work was partly supported by the European Commission under FP7 project Inter-Trust, by the Spanish Ministry of Science and Innovation (through projects eAEGIS TSI2007-65406-C03-01, CO-PRIVACY TIN2011-27076-C03-01, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, Audit Transparency Voting Process IPT-430000-2010-31 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia (under grant 2009 SGR 1135).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Erola, A., Castellà-Roca, J. (2014). Using Search Results to Microaggregate Query Logs Semantically. In: Garcia-Alfaro, J., Lioudakis, G., Cuppens-Boulahia, N., Foley, S., Fitzgerald, W. (eds) Data Privacy Management and Autonomous Spontaneous Security. DPM 2013, SETOP 2013. Lecture Notes in Computer Science, vol 8247. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54568-9_10
DOI: https://doi.org/10.1007/978-3-642-54568-9_10
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54567-2
Online ISBN: 978-3-642-54568-9
eBook Packages: Computer Science, Computer Science (R0)