H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search

Authors: Xiaokai Chu, Jiashu Zhao, Lixin Zou, Dawei Yin

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1478 - 1489

https://doi.org/10.1145/3477495.3531986

Published: 07 July 2022

Abstract

Pre-trained language models (PLMs), such as BERT and ERNIE, have achieved outstanding performance on many natural language understanding tasks. Recently, PLM-based information retrieval models, e.g., MORES, PROP, and ColBERT, have also been investigated and have shown state-of-the-art effectiveness. However, most PLM-based rankers focus on relevance matching at a single level (e.g., the character level) and ignore information at other granularities (e.g., words and phrases), which easily leads to ambiguous query understanding and inaccurate matching in web search.

In this paper, we aim to improve the state-of-the-art PLM ERNIE for web search by modeling multi-granularity context information with awareness of word importance in queries and documents. In particular, we propose a novel H-ERNIE framework, which consists of a query-document analysis component and a hierarchical ranking component. The query-document analysis component comprises several individual modules, such as word segmentation, word importance analysis, and word tightness analysis, which generate the necessary variables. Based on these variables, importance-aware multi-level correspondences are passed to the ranking model. The hierarchical ranking model includes a multi-layer transformer module that learns character-level representations, a word-level matching module, and a phrase-level matching module with word importance. Each of these modules models query-document matching from a different perspective, and the levels communicate with one another to achieve accurate overall matching. We discuss the time complexity of the proposed framework and show that it can be implemented efficiently in real applications. Offline and online experiments on public data sets and a commercial search engine demonstrate the effectiveness of the proposed H-ERNIE framework.
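To make the idea of multi-granularity representations concrete, the following minimal Python sketch pools character-level vectors into word-level vectors using word segmentation spans, and then pools word vectors into phrase-level vectors weighted by word importance. It is an illustration only, not the H-ERNIE architecture: the abstract does not specify the pooling operations, so mean pooling, importance-weighted averaging, the function names (`mean_pool`, `weighted_pool`, `build_levels`), and the assumption that phrase spans come from word tightness analysis are all hypothetical choices made here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # half-open [start, end) index range


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def weighted_pool(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of vectors; weights are normalized by their sum."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def build_levels(
    char_vecs: Sequence[Vector],
    word_spans: Sequence[Span],        # character index ranges, one per word
    word_importance: Sequence[float],  # one positive weight per word
    phrase_spans: Sequence[Span],      # word index ranges, one per phrase
) -> Tuple[List[Vector], List[Vector]]:
    """Derive word-level and phrase-level vectors from character-level ones."""
    # Word level: plain average of the characters inside each word (assumed).
    words = [mean_pool(char_vecs[s:e]) for s, e in word_spans]
    # Phrase level: importance-weighted average of the words inside each
    # phrase (assumed), so important words dominate the phrase vector.
    phrases = [weighted_pool(words[s:e], word_importance[s:e])
               for s, e in phrase_spans]
    return words, phrases


# Toy input: four characters with 2-dimensional vectors, forming two words
# that are grouped into a single phrase.
char_vecs = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
words, phrases = build_levels(
    char_vecs,
    word_spans=[(0, 2), (2, 4)],
    word_importance=[3.0, 1.0],
    phrase_spans=[(0, 2)],
)
print(words)    # [[2.0, 0.0], [0.0, 3.0]]
print(phrases)  # [[1.5, 0.75]]
```

In the toy example the first word carries three times the importance of the second, so the phrase vector sits closer to the first word's vector. A ranker could then compare query and document representations at each of the three levels.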

Supplementary Material

MP4 File (SIGIR22-fp0563.mp4)

Recently, information retrieval models based on pre-trained language models (PLMs) have been investigated and have shown state-of-the-art effectiveness. However, most PLM-based rankers focus on relevance matching at a single level (e.g., the character level) and ignore information at other granularities (e.g., words and phrases), which easily leads to ambiguous query understanding and inaccurate matching in web search. This video introduces a hierarchical ERNIE-based model, H-ERNIE, which models multi-granularity context relevance in queries and documents. Offline and online experiments on public datasets and a commercial search engine demonstrate the effectiveness of the proposed model.


