Query Error Correction Algorithm Based on Fusion Sequence to Sequence Model

Duan, Jianyong; Ji, Tianxiao; Wu, Mingli; Wang, Hao

doi:10.1007/978-3-030-28374-2_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Query error correction is important for improving user satisfaction and the quality of query results. Traditional methods mostly use a pipeline that corrects errors step by step; they rely heavily on manually annotated corpora and have difficulty taking global effects into account. In this paper, we present a character-based, end-to-end Sequence to Sequence (Seq2Seq) method with an attention mechanism that also incorporates a neural network language model trained on unlabeled corpora. The method models the different error types in query error correction in a unified way and effectively overcomes the shortcomings of traditional methods. Experiments show that it captures long-distance knowledge effectively when correcting errors, and that with the Simple Recurrent Unit (SRU) it performs as well as with Long Short-Term Memory (LSTM) while significantly reducing processing time, which is very important in query error correction tasks.
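
The abstract does not say how the language model is combined with the Seq2Seq decoder. As one illustrative possibility only, the sketch below assumes shallow fusion, in which the language model's log-probability is added, with a weight, to the Seq2Seq log-probability of each candidate character at a decoding step. The function names, the `lm_weight` parameter, the floor value, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical floor for characters the language model has not scored.
LOG_FLOOR = math.log(1e-9)


def fuse_scores(seq2seq_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-character scores: log p_seq2seq + lm_weight * log p_lm."""
    fused = {}
    for char, log_p in seq2seq_log_probs.items():
        fused[char] = log_p + lm_weight * lm_log_probs.get(char, LOG_FLOOR)
    return fused


def pick_next_char(seq2seq_log_probs, lm_log_probs, lm_weight=0.3):
    """Greedy choice of the next output character under the fused score."""
    fused = fuse_scores(seq2seq_log_probs, lm_log_probs, lm_weight)
    return max(fused, key=fused.get)


# Toy (made-up) distributions over the next character for one decoding step.
seq2seq = {"e": math.log(0.45), "r": math.log(0.50), "a": math.log(0.05)}
lm = {"e": math.log(0.90), "r": math.log(0.05), "a": math.log(0.05)}

without_lm = pick_next_char(seq2seq, lm, lm_weight=0.0)
with_lm = pick_next_char(seq2seq, lm, lm_weight=0.3)
```

In this toy step the Seq2Seq scores alone slightly prefer one character, and the weighted language-model term shifts the choice to another. Setting `lm_weight` to zero recovers the plain Seq2Seq decision, which makes the language model's contribution easy to isolate.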

This work was supported by the National Natural Science Foundation of China (61672040) and the North China University of Technology Startup Fund.

Author information

Authors and Affiliations

College of Computer Science and Technology, North China University of Technology, Beijing, China
Jianyong Duan, Tianxiao Ji, Mingli Wu & Hao Wang
Beijing Urban Governance Research Center, Beijing, China
Jianyong Duan
Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing, 100144, China
Jianyong Duan, Tianxiao Ji, Mingli Wu & Hao Wang

Corresponding author

Correspondence to Jianyong Duan.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
University of Pau and Pays de l'Adour, Pau, France
Richard Chbeir
University of Pau and Pays de l'Adour, Pau, France
Ernesto Exposito
University of Pau and Pays de l'Adour, Pau, France
Philippe Aniorté
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

Cite this paper

Duan, J., Ji, T., Wu, M., Wang, H. (2019). Query Error Correction Algorithm Based on Fusion Sequence to Sequence Model. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_2

DOI: https://doi.org/10.1007/978-3-030-28374-2_2
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28373-5
Online ISBN: 978-3-030-28374-2
eBook Packages: Computer Science, Computer Science (R0)
