DOI: 10.1145/3583780.3614930

Improving Query Correction Using Pre-train Language Model In Search Engines

Published: 21 October 2023

Abstract

Query correction is the task of automatically detecting and correcting errors in the queries users type into a search engine. Misspelled queries can lead to user dissatisfaction and churn. However, correcting a user query accurately is not easy: one major challenge is that a correction model must be capable of high-level language comprehension. Recently, pre-trained language models (PLMs) have been successfully applied to text correction tasks, but little work has addressed query correction. Moreover, it is nontrivial to apply these PLMs directly to query correction in large-scale search systems because of two challenges: 1) Expensive deployment: serving such a model requires expensive computation. 2) Lack of domain knowledge: a neural correction model needs massive training data to reach its full potential.
To this end, we introduce KSTEM, a Knowledge-based Sequence To Edit Model for Chinese query correction. KSTEM transforms the sequence generation task into sequence tagging by mapping corrections onto five edit operations: KEEP, REPLACE, SWAP, DELETE, and INSERT, which reduces computational complexity. Additionally, KSTEM adopts a 2D position encoding composed of the internal and external order of the words. To compensate for the lack of domain knowledge, we propose a task-specific training paradigm for query correction, consisting of edit-strategy-based pre-training, user-click-based post-pre-training, and human-label-based fine-tuning. Finally, we deploy KSTEM in an industrial search system. Extensive offline and online experiments show that KSTEM significantly improves query correction performance. We hope that our experience will benefit frontier researchers.
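
The abstract does not spell out KSTEM's exact tag semantics or its 2D position scheme, so the following is a minimal, hypothetical sketch of the two ideas: applying per-character edit tags (KEEP, REPLACE, SWAP, DELETE, INSERT) to reconstruct a corrected query, and deriving (external word order, internal character order) position pairs. The function names, the tag format, and the assumptions that SWAP exchanges a character with its right neighbour and INSERT adds a character after the current position are illustrative, not taken from the paper.

```python
# Hypothetical illustration of an edit-tag formulation; KSTEM's actual
# tag semantics and position encoding may differ.

def apply_edit_tags(chars, tags):
    """Reconstruct a corrected query from per-character edit tags.

    tags[i] is a tuple:
      ("KEEP",)        keep chars[i]
      ("DELETE",)      drop chars[i]
      ("REPLACE", c)   emit c instead of chars[i]
      ("INSERT", c)    keep chars[i] and insert c after it (assumed)
      ("SWAP",)        swap chars[i] with chars[i + 1]     (assumed)
    """
    out, i = [], 0
    while i < len(chars):
        op, *arg = tags[i]
        if op == "KEEP":
            out.append(chars[i])
        elif op == "REPLACE":
            out.append(arg[0])
        elif op == "INSERT":
            out.extend([chars[i], arg[0]])
        elif op == "SWAP":
            out.extend([chars[i + 1], chars[i]])
            i += 1  # the swapped partner has been consumed
        # DELETE emits nothing
        i += 1
    return "".join(out)


def two_d_positions(query_words):
    """(external word index, internal character index) for every character,
    one plausible reading of 'internal and external order of the words'."""
    return [(w_idx, c_idx)
            for w_idx, word in enumerate(query_words)
            for c_idx, _ in enumerate(word)]


if __name__ == "__main__":
    src = list("serach enginee")                          # misspelled query
    tags = ([("KEEP",)] * 2 + [("SWAP",), ("KEEP",)]      # "serach" -> "search"
            + [("KEEP",)] * 9 + [("DELETE",)])            # "enginee" -> "engine"
    print(apply_edit_tags(src, tags))                     # -> "search engine"
    print(two_d_positions(["search", "engine"])[:4])      # -> [(0, 0), (0, 1), (0, 2), (0, 3)]
```

Tagging each input position with one of a handful of edit operations keeps the output length tied to the input length, which is where the computational saving over free-form sequence generation comes from.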

Supplementary Material

MP4 File (full0992-video.mp4)
This video introduces KSTEM, a Knowledge-based Sequence To Edit Model for Chinese query correction, together with its task-specific training paradigm.


Cited By

  • Improving Search Query Accuracy for Specialized Websites Through Intelligent Text Correction and Reconstruction Models. Information 15(11), 683 (2024). DOI: 10.3390/info15110683
  • Span Confusion is All You Need for Chinese Spelling Correction. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4208-4212 (2024). DOI: 10.1145/3627673.3679996
  • Local Attention Augmentation for Chinese Spelling Correction. In Computational Science – ICCS 2024, 438-452 (2024). DOI: 10.1007/978-3-031-63759-9_44


Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. pre-train language model
  2. query correction
  3. search engines
  4. text tagging

Qualifiers

  • Research-article

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

