DOI: 10.1145/3442381.3449943
research-article

Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation

Published: 03 June 2021

Abstract

Relevance measures the relation between a query and a document and spans several dimensions, e.g., semantic similarity, topical relatedness, cognitive relevance (relations in terms of knowledge), usefulness, timeliness, and utility. However, existing retrieval models mainly focus on semantic similarity and cognitive relevance while ignoring the other dimensions. Topical relatedness, an important dimension of relevance, is not well studied in existing neural information retrieval. In this paper, we propose a Topic Enhanced Knowledge-aware retrieval Model (TEKM) that jointly learns semantic similarity, knowledge relevance, and topical relatedness to estimate the relevance between a query and a document. We first construct a neural topic model to learn topical information and generate topic embeddings of a query. We then combine the topic embeddings with a knowledge-aware retrieval model to estimate different dimensions of relevance. Specifically, we exploit kernel pooling to soft-match topic embeddings with words and entities in a unified embedding space to generate fine-grained topical relatedness. The whole model is trained in an end-to-end manner. Experiments on a large-scale publicly available benchmark dataset show that TEKM outperforms existing retrieval models. Further analysis also shows how topical relatedness is modeled to improve traditional retrieval models based on semantic similarity and knowledge relevance.
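
The abstract names kernel pooling as the soft-matching step but does not give its formulation. The following is a minimal sketch of kernel pooling as it is commonly defined for kernel-based neural ranking (Gaussian kernels over a cosine-similarity matrix, with log-sum pooling), not the TEKM implementation itself. The function names, the kernel centers `mus`, the shared width `sigma`, and the toy vectors are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kernel_pooling(query_vecs, doc_vecs, mus, sigma=0.1):
    """Return one soft-match feature per kernel.

    query_vecs: embeddings on the query side (here, hypothetically,
                topic embeddings of the query).
    doc_vecs:   embeddings on the document side (words or entities).
    mus:        kernel centers in [-1, 1]; sigma: shared kernel width.
    Both mus and sigma are assumed hyperparameters.
    """
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            # Soft count of document items whose similarity to q is near mu.
            soft_tf = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2.0 * sigma ** 2))
                for d in doc_vecs
            )
            # Clamp before the log so an empty kernel does not give -inf.
            total += math.log(max(soft_tf, 1e-10))
        features.append(total)
    return features


# Toy example: two topic embeddings matched against three document items.
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
features = kernel_pooling(topic_vecs, doc_vecs, mus=[1.0, 0.5, 0.0])
```

Each entry of `features` summarizes how many query-document pairs fall near one similarity level; in a ranking model these features would typically feed a learned scoring layer, which is omitted here.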

Cited By

  • (2024) A systematic review of multidimensional relevance estimation in information retrieval. WIREs Data Mining and Knowledge Discovery 14:5. DOI: 10.1002/widm.1541. Online publication date: 7-May-2024
  • (2023) Unifying Token- and Span-level Supervisions for Few-shot Sequence Labeling. ACM Transactions on Information Systems 42:1 (1-27). DOI: 10.1145/3610403. Online publication date: 21-Aug-2023
  • (2023) Fast and Multi-aspect Mining of Complex Time-stamped Event Streams. Proceedings of the ACM Web Conference 2023 (1638-1649). DOI: 10.1145/3543507.3583370. Online publication date: 30-Apr-2023

    Published In

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN: 9781450383127
    DOI: 10.1145/3442381

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Kernel pooling
    2. Knowledge graph
    3. Neural IR
    4. Neural topic model

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '21
    WWW '21: The Web Conference 2021
    April 19 - 23, 2021
    Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (Last 12 months): 21
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 27 Feb 2025
