DOI: 10.1145/3477495.3531986
research-article
Open Access

H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search

Published: 07 July 2022

ABSTRACT

Pre-trained language models (PLMs), such as BERT and ERNIE, have achieved outstanding performance in many natural language understanding tasks. Recently, PLM-based information retrieval models, e.g., MORES, PROP, and ColBERT, have also been investigated and have shown state-of-the-art effectiveness. However, most PLM-based rankers focus only on a single level of relevance matching (e.g., the character level) and ignore information at other granularities (e.g., words and phrases), which easily leads to ambiguous query understanding and inaccurate matching in web search.

In this paper, we aim to improve the state-of-the-art PLM ERNIE for web search by modeling multi-granularity context information with awareness of word importance in queries and documents. In particular, we propose a novel H-ERNIE framework, which includes a query-document analysis component and a hierarchical ranking component. The query-document analysis component consists of several individual modules, such as word segmentation, word importance analysis, and word tightness analysis, which generate the variables needed for ranking. Based on these variables, importance-aware correspondences at multiple levels are fed into the ranking model. The hierarchical ranking model includes a multi-layer transformer module that learns character-level representations, a word-level matching module, and a phrase-level matching module that takes word importance into account. Each of these modules models query-document matching from a different perspective, and the levels communicate with one another to achieve accurate overall matching. We discuss the time complexity of the proposed framework and show that it can be implemented efficiently in real applications. Offline and online experiments on both public data sets and a commercial search engine demonstrate the effectiveness of the proposed H-ERNIE framework.
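The abstract describes the hierarchical ranking component only at a high level. As a rough illustration of the multi-granularity idea, and not the authors' implementation (which builds on ERNIE's transformer layers), the pure-Python sketch below pools character-level vectors into words and into tightness-merged phrases, then scores each level with importance-weighted max-similarity matching. Everything here is an assumption: the function names, mean pooling, cosine max-sim matching, the 0.5 tightness threshold, and the fixed mixing weights are hypothetical, and the inputs (character vectors, word spans, importance and tightness scores) merely stand in for the outputs of the query-document analysis component.

```python
# Hypothetical sketch of multi-granularity matching; NOT H-ERNIE's actual code.
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # [start, end) offsets into a character sequence


def mean_pool(vecs: Sequence[Vector], spans: Sequence[Span]) -> List[Vector]:
    """Average the character vectors inside each span (one vector per word/phrase)."""
    return [[sum(col) / (end - start) for col in zip(*vecs[start:end])]
            for start, end in spans]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_tightness(n_words: int, tightness: Sequence[float],
                       threshold: float = 0.5) -> List[List[int]]:
    """Group adjacent word indices into phrases.

    tightness[i] is an assumed score for how tightly word i binds to
    word i + 1; neighbours scoring above `threshold` share a phrase.
    """
    if n_words == 0:
        return []
    groups = [[0]]
    for i in range(1, n_words):
        if tightness[i - 1] > threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def weighted_max_sim(q_units: Sequence[Vector], d_units: Sequence[Vector],
                     weights: Sequence[float]) -> float:
    """Importance-weighted mean of each query unit's best cosine match in the doc."""
    total = sum(weights)
    if total == 0 or not d_units:
        return 0.0
    best = [max(cosine(q, d) for d in d_units) for q in q_units]
    return sum(w * b for w, b in zip(weights, best)) / total


def multi_granularity_score(q_chars, d_chars, q_words, d_words, q_importance,
                            q_tight, d_tight, mix=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Blend character-, word- and phrase-level matching into one score."""
    # Character level: every query character counts equally.
    char_score = weighted_max_sim(q_chars, d_chars, [1.0] * len(q_chars))

    # Word level: pool characters into words, weight by query word importance.
    word_score = weighted_max_sim(mean_pool(q_chars, q_words),
                                  mean_pool(d_chars, d_words), q_importance)

    # Phrase level: merge tightly bound words; a phrase's weight is the sum
    # of its member words' importances.
    def to_phrases(words, tight):
        groups = group_by_tightness(len(words), tight)
        return groups, [(words[g[0]][0], words[g[-1]][1]) for g in groups]

    q_groups, q_phrases = to_phrases(q_words, q_tight)
    _, d_phrases = to_phrases(d_words, d_tight)
    q_weights = [sum(q_importance[i] for i in g) for g in q_groups]
    phrase_score = weighted_max_sim(mean_pool(q_chars, q_phrases),
                                    mean_pool(d_chars, d_phrases), q_weights)

    w_char, w_word, w_phrase = mix
    return w_char * char_score + w_word * word_score + w_phrase * phrase_score


# Toy example: a 3-character query segmented into words [0, 2) and [2, 3).
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [(0, 2), (2, 3)]
same = multi_granularity_score(query, query, words, words, [0.8, 0.2], [0.9], [0.9])
other_doc = [[0.0, 1.0]] * 3
diff = multi_granularity_score(query, other_doc, words, words, [0.8, 0.2], [0.9], [0.9])
print(same, diff)
```

Matching a query against itself gives a score of (approximately) 1.0 at every level, while the mismatched document scores lower. Max-similarity pooling is only one simple choice for comparing units; the paper's learned modules may combine the levels quite differently.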


Supplemental Material: SIGIR22-fp0563.mp4 (mp4, 24.9 MB)


Published in
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
