DOI: 10.1145/2488388.2488507
research-article

Learning to extract cross-session search tasks

Published: 13 May 2013

Abstract

Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session and has depended heavily on human annotations for learning supervised classification models. In this work, we target the identification of long-term, or cross-session, search tasks (those that transcend session boundaries) by investigating inter-query dependencies learned from users' search behavior. We propose a semi-supervised clustering model based on the latent structural SVM framework, together with a set of effective automatic annotation rules that serve as weak supervision and relieve the burden of manual annotation. Experimental results on a large-scale search log collected from Bing.com confirm the effectiveness of the proposed model in identifying cross-session search tasks, as well as the utility of the introduced weak supervision signals. Our learned model enables a more comprehensive understanding of users' search behavior from search logs and facilitates the development of dedicated search-engine support for long-term tasks.
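
The abstract does not spell out the model, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It scores query pairs with a linear function w·φ(a, b), the scoring form a structural SVM learns. It then decodes tasks greedily: each query links to its best-scoring earlier query from any session, or opens a new task when no score clears a threshold. The features, the hand-set weights, the threshold, and the `weak_label` rule are all hypothetical stand-ins for the learned model and for the paper's automatic annotation rules.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    text: str
    session_id: int
    clicks: set = field(default_factory=set)  # clicked URLs (hypothetical field)


def features(a: Query, b: Query) -> list:
    """Hypothetical pairwise features phi(a, b) for two queries."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    term_jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    shared_click = 1.0 if a.clicks & b.clicks else 0.0
    same_session = 1.0 if a.session_id == b.session_id else 0.0
    return [term_jaccard, shared_click, same_session]


def score(w: list, a: Query, b: Query) -> float:
    """Linear compatibility w . phi(a, b)."""
    return sum(wi * fi for wi, fi in zip(w, features(a, b)))


def weak_label(a: Query, b: Query):
    """Hypothetical automatic annotation rule standing in for weak supervision.
    Returns True (same task), False (different tasks), or None (abstain)."""
    if a.clicks & b.clicks:
        return True  # a shared clicked URL suggests the same information need
    no_overlap = not set(a.text.lower().split()) & set(b.text.lower().split())
    if no_overlap and a.session_id != b.session_id:
        return False  # no shared terms and different sessions
    return None


def best_link_decode(queries: list, w: list, threshold: float) -> list:
    """Greedy best-link clustering: each query joins the task of its
    highest-scoring earlier query (from any session), or starts a new task
    if no score exceeds `threshold`. Returns one task id per query."""
    task_of = []
    next_task = 0
    for i, q in enumerate(queries):
        best_j, best_s = None, threshold
        for j in range(i):
            s = score(w, queries[j], q)
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:
            task_of.append(next_task)
            next_task += 1
        else:
            task_of.append(task_of[best_j])
    return task_of


if __name__ == "__main__":
    log = [
        Query("cheap flights to rio", 1, {"kayak.com"}),
        Query("weather tomorrow", 1),
        Query("rio de janeiro hotels", 2),
        Query("flights to rio may", 3, {"kayak.com"}),
    ]
    w = [2.0, 1.5, 0.3]  # hypothetical weights; a real model would learn these
    print(best_link_decode(log, w, threshold=0.5))  # → [0, 1, 2, 0]
    print(weak_label(log[0], log[3]))  # → True
```

Because a link may point to a query in an earlier session, the fourth query (session 3) rejoins the task begun in session 1, which is the cross-session behavior the abstract targets. Restricting `j` to the current session would reduce this to a session-bound baseline. In a latent structural SVM setting the link choices would be latent variables and `w` would be trained, for example on pairs labeled by rules like `weak_label`, rather than set by hand.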

    Published In

    ACM Other conferences
    WWW '13: Proceedings of the 22nd international conference on World Wide Web
    May 2013
    1628 pages
    ISBN:9781450320351
    DOI:10.1145/2488388

    Sponsors

    • NICBR: Núcleo de Informação e Coordenação do Ponto BR
    • CGIBR: Comitê Gestor da Internet no Brasil

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-session search task
    2. query log mining
    3. semi-supervised clustering
    4. weak supervision

    Conference

    WWW '13
    WWW '13: 22nd International World Wide Web Conference
    May 13 - 17, 2013
    Rio de Janeiro, Brazil

    Acceptance Rates

    WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 12
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 17 Feb 2025

    Cited By

    • (2025) Search task extraction using k-contour based recurrent deep graph clustering. Engineering Applications of Artificial Intelligence 139 (109501). DOI: 10.1016/j.engappai.2024.109501. Online publication date: Jan-2025.
    • (2025) Session-Level Normalization and Click-Through Data Enhancement for Session-Based Evaluation. Big Data, pp. 15-33. DOI: 10.1007/978-981-96-1024-2_2. Online publication date: 24-Jan-2025.
    • (2024) Query-Oriented Data Augmentation for Session Search. IEEE Transactions on Knowledge and Data Engineering 36:11, pp. 6877-6888. DOI: 10.1109/TKDE.2024.3419131. Online publication date: 1-Nov-2024.
    • (2023) Taking Search to Task. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, pp. 1-13. DOI: 10.1145/3576840.3578288. Online publication date: 19-Mar-2023.
    • (2023) Session Search with Pre-trained Graph Classification Model. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 953-962. DOI: 10.1145/3539618.3591766. Online publication date: 19-Jul-2023.
    • (2023) Integrating Representation and Interaction for Context-Aware Document Ranking. ACM Transactions on Information Systems 41:1, pp. 1-23. DOI: 10.1145/3529955. Online publication date: 10-Jan-2023.
    • (2023) Characterization and Prediction of Mobile Tasks. ACM Transactions on Information Systems 41:1, pp. 1-39. DOI: 10.1145/3522711. Online publication date: 9-Jan-2023.
    • (2023) Private Web Search Using Proxy-Query Based Query Obfuscation Scheme. IEEE Access 11, pp. 3607-3625. DOI: 10.1109/ACCESS.2023.3235000. Online publication date: 2023.
    • (2023) Recommending tasks based on search queries and missions. Natural Language Engineering, pp. 1-25. DOI: 10.1017/S1351324923000219. Online publication date: 17-May-2023.
    • (2022) Extracting Searching as Learning Tasks Based on IBRT Approach. Applied Sciences 12:12 (5879). DOI: 10.3390/app12125879. Online publication date: 9-Jun-2022.