
Query Error Correction Algorithm Based on Fusion Sequence to Sequence Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Abstract

Query error correction is important for improving both user satisfaction and the quality of query results. Traditional query error correction methods mostly work as a pipeline that corrects errors step by step. They rely heavily on manually annotated corpora and have difficulty taking global effects into account. In this paper, we present a character-based, end-to-end Sequence to Sequence (Seq2Seq) method with an attention mechanism that also incorporates a neural network language model trained on unlabeled corpora. The method models the different error types in query error correction in a unified way and effectively overcomes these shortcomings of traditional methods. Experiments show that it captures long-distance knowledge effectively when correcting errors, and that with the Simple Recurrent Unit (SRU) it can perform as well as Long Short-Term Memory (LSTM) while significantly reducing processing time, which is very important in query error correction tasks.
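The abstract does not say how the language model is combined with the Seq2Seq decoder. As an illustration only, the sketch below uses shallow fusion, one common way to do this: at each decoding step a weighted language-model log-probability is added to the Seq2Seq log-probability for every candidate character. This technique is swapped in for illustration and is not necessarily the authors' fusion method. The function names, the `lm_weight` parameter, and the toy probability tables are all hypothetical.

```python
import math


def fuse_scores(seq2seq_logp, lm_logp, lm_weight=0.3):
    """Shallow fusion (illustrative assumption, not the paper's exact method).

    Both arguments map a candidate next character to a log-probability.
    Characters missing from the LM table fall back to a small floor value.
    """
    floor = math.log(1e-6)
    return {
        ch: logp + lm_weight * lm_logp.get(ch, floor)
        for ch, logp in seq2seq_logp.items()
    }


def pick_next_char(seq2seq_logp, lm_logp, lm_weight=0.3):
    """Return the candidate character with the highest fused score."""
    fused = fuse_scores(seq2seq_logp, lm_logp, lm_weight)
    return max(fused, key=fused.get)


# Toy, made-up distributions for a single decoding step.
# The Seq2Seq model slightly prefers "d"; the language model strongly prefers "g".
seq2seq_logp = {"g": math.log(0.45), "d": math.log(0.55)}
lm_logp = {"g": math.log(0.9), "d": math.log(0.1)}

print(pick_next_char(seq2seq_logp, lm_logp, lm_weight=0.0))  # Seq2Seq only
print(pick_next_char(seq2seq_logp, lm_logp, lm_weight=0.5))  # with LM fusion
```

With `lm_weight=0.0` the choice follows the Seq2Seq scores alone. Raising the weight lets the language model, which can be trained on unlabeled text, override a weak Seq2Seq preference.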

This work was supported by the National Natural Science Foundation of China (61672040) and the North China University of Technology Startup Fund.



Author information

Corresponding author

Correspondence to Jianyong Duan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Duan, J., Ji, T., Wu, M., Wang, H. (2019). Query Error Correction Algorithm Based on Fusion Sequence to Sequence Model. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_2

  • DOI: https://doi.org/10.1007/978-3-030-28374-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28373-5

  • Online ISBN: 978-3-030-28374-2

  • eBook Packages: Computer Science, Computer Science (R0)
