DOI: 10.1145/3578337.3605139
research-article

Retrieving Webpages Using Online Discussions

Published: 09 August 2023

ABSTRACT

Online discussions are a ubiquitous aspect of everyday life. An Internet user who interacts with an online discussion may benefit from seeing hyperlinks to webpages relevant to the discussion because the relevant webpages can provide added context, act as citations for background sources, or condense information so that conversations can proceed seamlessly at a high level. In this paper, we propose and study a new task of retrieving relevant webpages given an online discussion. We frame the task as a novel retrieval problem where we treat a sequence of comments in an online discussion as a query and use such a query to retrieve relevant webpages. We construct a new data set using Reddit, an online discussion forum, to study this new problem. We explore and evaluate multiple representative retrieval methods to examine their effectiveness for solving this new problem. We also propose to leverage the comments that contain hyperlinks as training data to enable supervised learning and further improve retrieval performance. We find that results using modern retrieval methods are promising and that leveraging comments with hyperlinks as training data can further improve performance. We release our data set and code to enable additional research in this direction.
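
The idea of using comments that contain hyperlinks as training data can be illustrated with a minimal sketch. This is not the authors' released code: the function name `build_training_pairs`, the URL pattern, and the choice to form the query from all preceding comments plus the linking comment with its URLs stripped are assumptions made for illustration only.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified URL pattern; real comment data would need more care.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def build_training_pairs(thread: List[str]) -> List[Tuple[str, str]]:
    """Turn one discussion thread (comments in order) into (query, url) pairs.

    Assumption: the query for a linked webpage is the text of all comments up
    to and including the comment that posted the link, with URLs removed from
    that final comment so the target does not leak into the query.
    """
    pairs: List[Tuple[str, str]] = []
    for i, comment in enumerate(thread):
        urls = URL_PATTERN.findall(comment)
        if not urls:
            continue  # comments without hyperlinks yield no supervision
        # Strip URLs from the linking comment and normalize whitespace.
        cleaned = " ".join(URL_PATTERN.sub(" ", comment).split())
        context = thread[:i] + [cleaned]
        query = " ".join(part for part in context if part)
        for url in urls:
            pairs.append((query, url))
    return pairs
```

Each resulting pair could then serve as a positive example for a supervised retriever, with the comment sequence acting as the query and the linked webpage as the relevant document.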

Published in

ICTIR '23: Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval
August 2023
300 pages
ISBN: 9798400700736
DOI: 10.1145/3578337

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

ICTIR '23 paper acceptance rate: 30 of 73 submissions, 41%
Overall acceptance rate: 209 of 482 submissions, 43%
