research-article

Learning to extract cross-session search tasks

Authors:

Ming-Wei Chang,

Wei Chu

WWW '13: Proceedings of the 22nd international conference on World Wide Web

Pages 1353 - 1364

https://doi.org/10.1145/2488388.2488507

Published: 13 May 2013

Abstract

Search tasks, each comprising a series of search queries that serve the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session and depends heavily on human annotations for supervised classification model learning. In this work, we target the identification of long-term, or cross-session, search tasks (those transcending session boundaries) by investigating inter-query dependencies learned from users' searching behaviors. A semi-supervised clustering model is proposed based on the latent structural SVM framework, and a set of effective automatic annotation rules is proposed as weak supervision to relieve the burden of manual annotation. Experimental results based on a large-scale search log collected from Bing.com confirm the effectiveness of the proposed model in identifying cross-session search tasks and the utility of the introduced weak supervision signals. Our learned model enables a more comprehensive understanding of users' search behaviors via search logs and facilitates the development of dedicated search-engine support for long-term tasks.
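
To make the distinction between a session and a cross-session task concrete, the following is a minimal illustrative sketch, not the model proposed in the paper. It assumes a 30-minute inactivity timeout for session boundaries and replaces the learned latent structural SVM with a hand-set rule: each query links to its most similar earlier query by Jaccard term overlap, or starts a new task. The names `split_sessions`, `link_tasks`, `SESSION_TIMEOUT`, `LINK_THRESHOLD`, and the toy log are all hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap, not from the paper
LINK_THRESHOLD = 0.15                    # assumed similarity cutoff, not from the paper


def jaccard(a, b):
    """Term-overlap similarity between two query strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def split_sessions(log):
    """Give each (timestamp, query) a session id; a long gap starts a new session."""
    ids, current = [], 0
    for i, (ts, _) in enumerate(log):
        if i > 0 and ts - log[i - 1][0] > SESSION_TIMEOUT:
            current += 1
        ids.append(current)
    return ids


def link_tasks(log):
    """Greedy linking: join the task of the most similar earlier query, else open a new task."""
    task_ids, next_task = [], 0
    for i, (_, query) in enumerate(log):
        best_j, best_sim = None, 0.0
        for j in range(i):
            sim = jaccard(query, log[j][1])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= LINK_THRESHOLD:
            task_ids.append(task_ids[best_j])
        else:
            task_ids.append(next_task)
            next_task += 1
    return task_ids


# Toy log for one user: the travel task resumes a day later, in a new session.
log = [
    (datetime(2013, 5, 13, 9, 0), "cheap flights rio"),
    (datetime(2013, 5, 13, 9, 5), "rio hotels copacabana"),
    (datetime(2013, 5, 13, 9, 7), "python csv parsing"),
    (datetime(2013, 5, 14, 20, 0), "flights rio may"),
]

print(split_sessions(log))  # [0, 0, 0, 1]
print(link_tasks(log))      # [0, 0, 1, 0]
```

In this toy log the first, second, and fourth queries end up in the same task even though the fourth falls in a different session, while the third query shares a session with the first two but belongs to a different task. The paper's model learns such inter-query links from data and weak supervision rather than from a fixed threshold.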

Index Terms

Learning to extract cross-session search tasks
1. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN:9781450320351

DOI:10.1145/2488388

General Chairs:
Daniel Schwabe (PUC-Rio - Brazil),
Virgílio Almeida (UFMG - Brazil),
Hartmut Glaser (CGI.br - Brazil)

Program Chairs:
Ricardo Baeza-Yates (Yahoo! Labs - Spain & Chile),
Sue Moon (KAIST - South Korea)

Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Núcleo de Informação e Coordenação do Ponto BR
CGIBR: Comitê Gestor da Internet no Brasil

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW '13

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 73
Total Downloads: 386
Downloads (Last 12 months): 12
Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Citations

Cited By

Ates N, Yaslan Y (2025). Search task extraction using k-contour based recurrent deep graph clustering. Engineering Applications of Artificial Intelligence 139 (109501). Online publication date: Jan-2025
https://doi.org/10.1016/j.engappai.2024.109501
Chen H, Dou Z, Mao J (2025). Session-Level Normalization and Click-Through Data Enhancement for Session-Based Evaluation. Big Data (15-33). Online publication date: 24-Jan-2025
https://doi.org/10.1007/978-981-96-1024-2_2
Chen H, Dou Z, Zhu Y, Wen J (2024). Query-Oriented Data Augmentation for Session Search. IEEE Transactions on Knowledge and Data Engineering 36:11 (6877-6888). Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1109/TKDE.2024.3419131
Shah C, White R, Thomas P, Mitra B, Sarkar S, Belkin N (2023). Taking Search to Task. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval (1-13). Online publication date: 19-Mar-2023
https://dl.acm.org/doi/10.1145/3576840.3578288
Ma S, Chen C, Mao J, Tian Q, Jiang X, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Session Search with Pre-trained Graph Classification Model. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (953-962). Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591766
Chen H, Dou Z, Zhu Q, Zuo X, Wen J (2023). Integrating Representation and Interaction for Context-Aware Document Ranking. ACM Transactions on Information Systems 41:1 (1-23). Online publication date: 10-Jan-2023
https://dl.acm.org/doi/10.1145/3529955
Tian Y, Zhou K, Pelleg D (2023). Characterization and Prediction of Mobile Tasks. ACM Transactions on Information Systems 41:1 (1-39). Online publication date: 9-Jan-2023
https://dl.acm.org/doi/10.1145/3522711
Bashir S, Khattak A (2023). Private Web Search Using Proxy-Query Based Query Obfuscation Scheme. IEEE Access 11 (3607-3625). Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3235000
Garigliotti D, Balog K, Hose K, Bjerva J (2023). Recommending tasks based on search queries and missions. Natural Language Engineering (1-25). Online publication date: 17-May-2023
https://doi.org/10.1017/S1351324923000219
Li P, Zhang B, Zhang Y (2022). Extracting Searching as Learning Tasks Based on IBRT Approach. Applied Sciences 12:12 (5879). Online publication date: 9-Jun-2022
https://doi.org/10.3390/app12125879