Abstract
We introduce the problem of protecting the confidentiality of business information when search engine query logs and derived data are published. We study business confidentiality, understood as the protection of nonpublic data belonging to institutions, such as companies, and to people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we cast the problem in the field of privacy-preserving data mining. We characterize the possible adversaries interested in disclosing confidential Web site data and the attack strategies that they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We perform an experimental evaluation to estimate the remaining utility of the log after our anonymization techniques are applied. Our experimental results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.
Index Terms
- Privacy-preserving query log mining for business confidentiality protection