Word Embedding-Based Reformulation for Long Queries in Information Search

Yan, Wei; Wang, Yarong; Huang, Chunlan; Wu, Shengli

doi:10.1007/978-3-030-60029-7_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12432)

Included in the following conference series:

International Conference on Web Information Systems and Applications

Abstract

Long queries are often more challenging than short queries for search engines to answer well. In this paper, we present a word embedding-based approach to reformulating them. First, short queries or concepts are extracted from the original query. Then, with the help of a trained word embedding model, all of the query elements go through a series of reformulation operations, including deletion, substitution, and addition of terms, to obtain more effective query representations. Finally, all the reformulated elements are linearly combined with the original query. Experiments on three TREC collections show that the proposed method improves retrieval performance on average and is especially effective for long queries. It also compares favorably with several state-of-the-art baseline methods.
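
The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy two-dimensional vectors in `EMB`, the centroid-based selection rules, the helper names (`delete_term`, `substitute_term`, `add_term`, `combine`), and the choice to apply the linear combination to retrieval scores with a weight `lam`. A real system would load a trained embedding model and use the authors' own selection criteria.

```python
import math

# Toy 2-d embeddings, invented purely for illustration (not from the paper).
EMB = {
    "cheap":   (0.9, 0.1),
    "budget":  (0.85, 0.2),
    "flights": (0.1, 0.9),
    "airfare": (0.2, 0.95),
    "please":  (-0.7, -0.7),
}

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(terms):
    """Component-wise mean of the embeddings of the given terms."""
    vecs = [EMB[t] for t in terms]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def delete_term(terms):
    """Deletion (assumed rule): drop the term least similar to the query centroid."""
    c = centroid(terms)
    worst = min(terms, key=lambda t: cosine(EMB[t], c))
    return [t for t in terms if t != worst]

def substitute_term(terms, target):
    """Substitution (assumed rule): swap `target` for its nearest neighbour
    among vocabulary words not already in the query."""
    candidates = [w for w in EMB if w not in terms]
    repl = max(candidates, key=lambda w: cosine(EMB[w], EMB[target]))
    return [repl if t == target else t for t in terms]

def add_term(terms):
    """Addition (assumed rule): append the unused word closest to the centroid."""
    c = centroid(terms)
    candidates = [w for w in EMB if w not in terms]
    best = max(candidates, key=lambda w: cosine(EMB[w], c))
    return terms + [best]

def combine(original_score, reformulated_scores, lam=0.5):
    """Linear combination (assumed form): weight the original query's score
    against the mean score of the reformulated queries."""
    avg = sum(reformulated_scores) / len(reformulated_scores)
    return lam * original_score + (1 - lam) * avg

if __name__ == "__main__":
    query = ["cheap", "flights", "please"]
    shorter = delete_term(query)
    print(shorter)
    print(substitute_term(shorter, "flights"))
    print(add_term(shorter))
    print(combine(0.4, [0.6, 0.8]))
```

Under these assumptions, each operation yields one candidate query, and `combine` stands in for whatever weighting the retrieval model applies when the reformulated elements are merged with the original query.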

Notes

1. https://searchengineland.com/20-googles-limits-may-not-know-exist-281387, retrieved on 21 January 2020.
2. https://nlp.stanford.edu/projects/glove/
3. http://blog.conceptnet.io/posts/2016/conceptnet-numberbatch-a-new-name-for-the-best-word-embeddings-you-can-download/
4. http://plg.uwaterloo.ca/~gvcormac/clueweb09spam
5. http://www.lemurproject.org
6. https://pypi.org/project/MontyLingua

Author information

Authors and Affiliations

School of Computer Science, Jiangsu University, Zhenjiang, 212013, China
Wei Yan, Yarong Wang, Chunlan Huang & Shengli Wu

Corresponding author

Correspondence to Shengli Wu.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
The University of New South Wales, Sydney, NSW, Australia
Xuemin Lin
Rensselaer Polytechnic Institute, Troy, NY, USA
James Hendler
Wuhan University, Wuhan, China
Wei Song
Hohai University, Nanjing, China
Zhuoming Xu
Fuzhou University, Fuzhou, China
Genggeng Liu

Cite this paper

Yan, W., Wang, Y., Huang, C., Wu, S. (2020). Word Embedding-Based Reformulation for Long Queries in Information Search. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science, vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_19

DOI: https://doi.org/10.1007/978-3-030-60029-7_19
Published: 22 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60028-0
Online ISBN: 978-3-030-60029-7
eBook Packages: Computer Science, Computer Science (R0)
