research-article

Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation

Authors:

Xiuqiang He

WWW '21: Proceedings of the Web Conference 2021

Pages 756 - 767

https://doi.org/10.1145/3442381.3449943

Published: 03 June 2021

Abstract

Relevance measures the relation between a query and a document along several dimensions, e.g., semantic similarity, topical relatedness, cognitive relevance (relations in the aspect of knowledge), usefulness, timeliness, and utility. However, existing retrieval models mainly focus on semantic similarity and cognitive relevance while ignoring the other dimensions along which relevance can be modeled. Topical relatedness, an important dimension of relevance, is not well studied in existing neural information retrieval. In this paper, we propose a Topic Enhanced Knowledge-aware retrieval Model (TEKM) that jointly learns semantic similarity, knowledge relevance, and topical relatedness to estimate the relevance between a query and a document. We first construct a neural topic model to learn topical information and generate topic embeddings of a query. We then combine the topic embeddings with a knowledge-aware retrieval model to estimate the different dimensions of relevance. Specifically, we exploit kernel pooling to soft-match topic embeddings with word and entity embeddings in a unified embedding space, which yields fine-grained topical relatedness. The whole model is trained in an end-to-end manner. Experiments on a large-scale, publicly available benchmark dataset show that TEKM outperforms existing retrieval models. Further analysis shows how topical relatedness is modeled to improve traditional retrieval models that rely on semantic similarity and knowledge relevance.
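
The kernel-pooling soft match mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give TEKM's exact formulation, so the code below assumes the commonly used Gaussian-kernel form of kernel pooling (cosine similarities binned by RBF kernels, then log-summed). The function name kernel_pooling, the kernel means mus, the width sigma, and the random embeddings are all hypothetical choices made for illustration, not values from the paper.

```python
import numpy as np


def kernel_pooling(topic_emb, token_emb, mus, sigma=0.1):
    """Soft-match topic embeddings against word/entity embeddings.

    topic_emb: (T, d) array of topic embeddings (assumed input).
    token_emb: (N, d) array of word or entity embeddings (assumed input).
    mus:       (K,) array of kernel means in [-1, 1] (assumed hyperparameter).
    Returns a (K,) feature vector, one soft-match score per kernel.
    """
    # L2-normalise rows so that the dot product is cosine similarity.
    t = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
    w = token_emb / np.linalg.norm(token_emb, axis=1, keepdims=True)
    sim = t @ w.T  # (T, N) similarity matrix

    # Apply K Gaussian kernels to every similarity value: (T, N, K).
    diff = sim[:, :, None] - mus[None, None, :]
    kernels = np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

    # Sum over tokens to get a soft match count per topic and kernel: (T, K).
    soft_counts = kernels.sum(axis=1)

    # Log-scale (clipped to avoid log(0)) and sum over topics: (K,).
    return np.log(np.clip(soft_counts, 1e-10, None)).sum(axis=0)


# Hypothetical toy inputs: 3 topics, 5 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
topics = rng.normal(size=(3, 8))
tokens = rng.normal(size=(5, 8))
mus = np.linspace(-0.9, 0.9, 10)
features = kernel_pooling(topics, tokens, mus)
```

In a full ranking model, a feature vector like this would typically be fed to a learned scoring layer together with the word-level and entity-level matching features; how TEKM combines them is not specified in the abstract.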

Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation
1. Information systems
  1. Information retrieval

Published In

WWW '21: Proceedings of the Web Conference 2021

April 2021

4054 pages

ISBN:9781450383127

DOI:10.1145/3442381

Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21

Sponsor:

SIGWEB

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
