Word Embedding-Based Reformulation for Long Queries in Information Search

  • Conference paper
Web Information Systems and Applications (WISA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12432)

Abstract

Long queries are often more challenging than short queries for information search engines to answer well. In this paper, we present a word embedding-based approach to reformulating them. First, short queries or concepts are extracted from the original query. Then, with the help of a trained word embedding model, all of the query elements go through a series of reformulation operations, including deletion, substitution, and addition of terms, so as to obtain more effective query representations. Finally, all the reformulated elements are linearly combined with the original query. Experiments on three TREC collections show that the proposed method improves retrieval performance on average and is especially effective for long queries. It also compares favorably with several state-of-the-art baseline methods.
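The pipeline described in the abstract (reformulate by deletion, substitution, and addition, then combine linearly with the original query) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy embedding table, the centroid-based selection rules, the helper names, and the weight `lam` are all assumptions made for illustration, since the abstract does not specify how each operation chooses its terms or how the combination is weighted.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would load a trained model instead.
EMB = {
    "cheap":  [0.9, 0.1, 0.0],
    "budget": [0.8, 0.2, 0.0],
    "hotel":  [0.1, 0.9, 0.1],
    "hostel": [0.2, 0.8, 0.1],
    "paris":  [0.0, 0.2, 0.9],
    "please": [0.3, 0.3, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(terms):
    """Component-wise mean of the embeddings of the given terms."""
    vecs = [EMB[t] for t in terms]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def delete_term(terms):
    """Deletion (assumed rule): drop the term least similar to the query centroid."""
    c = centroid(terms)
    weakest = min(terms, key=lambda t: cosine(EMB[t], c))
    return [t for t in terms if t != weakest]

def substitute_term(terms, target):
    """Substitution (assumed rule): swap `target` for its nearest out-of-query neighbour."""
    candidates = [w for w in EMB if w not in terms]
    best = max(candidates, key=lambda w: cosine(EMB[target], EMB[w]))
    return [best if t == target else t for t in terms]

def add_term(terms):
    """Addition (assumed rule): append the out-of-query word closest to the centroid."""
    c = centroid(terms)
    candidates = [w for w in EMB if w not in terms]
    return terms + [max(candidates, key=lambda w: cosine(EMB[w], c))]

def combine(original, reformulations, lam=0.6):
    """Linear combination: mass `lam` on the original query, the rest
    split equally across reformulations and then across their terms."""
    weights = {}
    for t in original:
        weights[t] = weights.get(t, 0.0) + lam / len(original)
    share = (1.0 - lam) / len(reformulations)
    for r in reformulations:
        for t in r:
            weights[t] = weights.get(t, 0.0) + share / len(r)
    return weights

query = ["cheap", "hotel", "paris", "please"]
reformulated = [delete_term(query), substitute_term(query, "hotel"), add_term(query)]
weighted_query = combine(query, reformulated)
```

The resulting `weighted_query` maps each term to a weight, and the weights sum to 1, so it could be handed to any retrieval model that accepts weighted query terms. Terms that survive in several reformulations accumulate more weight than terms that appear only once.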

Notes

  1. https://searchengineland.com/20-googles-limits-may-not-know-exist-281387, retrieved on 21 January, 2020.

  2. https://nlp.stanford.edu/projects/glove/.

  3. http://blog.conceptnet.io/posts/2016/conceptnet-numberbatch-a-new-name-for-the-best-word-embeddings-you-can-download/.

  4. http://plg.uwaterloo.ca/~gvcormac/clueweb09spam.

  5. http://www.lemurproject.org.

  6. https://pypi.org/project/MontyLingua.

Author information

Corresponding author

Correspondence to Shengli Wu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yan, W., Wang, Y., Huang, C., Wu, S. (2020). Word Embedding-Based Reformulation for Long Queries in Information Search. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science, vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_19

  • DOI: https://doi.org/10.1007/978-3-030-60029-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60028-0

  • Online ISBN: 978-3-030-60029-7

  • eBook Packages: Computer Science (R0)
