DOI: 10.1145/3404835.3463025
short-paper

Identifying Queries in Instant Search Logs

Published: 11 July 2021

Abstract

Query logs of search engines with instant search functionality are challenging for log analysis, since the log entries represent interactions at the keystroke level, rather than at the query level. To enable log analyses at the query level, a user's logged sequence of keystroke-level interactions needs to be mapped to distinct queries. This problem bears strong parallels to session detection in "standard" query logs (i.e., forming groups of subsequent queries on the same topic), but there are salient differences. In this paper, we present a new approach to identifying interactions belonging to the same query in instant query logs. In an experimental comparison, our new approach achieves an F2 score of 0.93 compared to only 0.83 of a state-of-the-art cascading method for query log session detection.
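To make the task and the reported measure concrete, the following is a minimal sketch and not the approach proposed in the paper, which this page does not describe. It assumes that each log entry is simply the text of the search box after a keystroke, and it groups entries with a naive prefix heuristic (the hypothetical helper `group_by_prefix`). The function `f_beta` implements the standard F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 2, recall counts more than precision. The sample log is invented for illustration.

```python
from typing import List


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def group_by_prefix(entries: List[str]) -> List[List[str]]:
    """Naive illustrative heuristic (an assumption, not the paper's method):
    an entry continues the current query if it extends or shortens the
    previous entry, i.e. one of the two strings is a prefix of the other."""
    groups: List[List[str]] = []
    for text in entries:
        prev = groups[-1][-1] if groups else None
        if prev is not None and (text.startswith(prev) or prev.startswith(text)):
            groups[-1].append(text)
        else:
            groups.append([text])
    return groups


# Invented keystroke-level log of one user: two queries typed in sequence.
log = ["in", "ins", "inst", "instant", "instant s", "qu", "que", "query"]
groups = group_by_prefix(log)
for group in groups:
    print(group[-1], "<-", len(group), "interactions")

# With beta = 2, high recall is rewarded more than high precision.
print(round(f_beta(0.5, 1.0), 3), round(f_beta(1.0, 0.5), 3))
```

A heuristic like this breaks down as soon as a user edits the middle of a query or pastes a new one that happens to share a prefix, which is one reason why the mapping is treated as a problem of its own.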

Supplementary Material

MP4 File (sigir21-presentation.mp4)

Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2021
2998 pages
ISBN: 9781450380379
DOI: 10.1145/3404835

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. instant search
  2. netspeak
  3. query identification
  4. word search engine

Qualifiers

  • Short-paper

Conference

SIGIR '21
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
