Abstract
Search engine server logs store traces of the queries submitted by users, including the queries themselves and the Web pages selected from their answers. The same holds for Web site logs, where queries and subsequent actions are recorded from search engine referrers or from an internal search box. In this paper we present two applications based on analyzing and clustering queries. The first suggests changes to improve the text and structure of a Web site; the second boosts relevance ranking and recommends queries in search engines.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R. (2005). Applications of Web Query Mining. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_2
DOI: https://doi.org/10.1007/978-3-540-31865-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science (R0)