Abstract
It has been observed that long queries are often more challenging than short ones for information search engines, which tend to return poorer results for them. In this paper, we present a word embedding-based approach to reformulating long queries. First, short queries or concepts are extracted from the original query. Then, with the help of a trained word embedding model, all of the query elements go through a series of reformulation operations, including deletion, substitution, and addition of terms, in order to obtain more effective query representations. Finally, all the reformulated elements are linearly combined with the original query. Experiments are conducted on three TREC collections, and the results show that the proposed method improves retrieval performance on average and is especially effective for long queries. It also compares favorably with several state-of-the-art baseline methods.
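The abstract names three embedding-driven operations (deletion, substitution, addition) followed by a linear combination with the original query, but it does not give the scoring rules. The following is a minimal Python sketch under stated assumptions, not the authors' method. Toy three-dimensional vectors stand in for the trained word embedding model. Cosine similarity to the query centroid drives deletion and addition, and nearest-neighbour swaps drive substitution. The final query is a weighted mixture with a hypothetical weight `lam = 0.6`. All function names, thresholds, and vectors are illustrative. The concept-extraction step is not modelled; the input is treated as a single query element.

```python
import math
from collections import defaultdict

# Hypothetical toy vectors standing in for a trained word embedding model
# (e.g. word2vec); the values are invented purely for illustration.
EMB = {
    "cheap": [0.9, 0.1, 0.0],
    "low": [0.8, 0.2, 0.1],
    "cost": [0.7, 0.3, 0.0],
    "flights": [0.1, 0.9, 0.2],
    "airfare": [0.2, 0.8, 0.3],
    "paris": [0.0, 0.2, 0.9],
    "france": [0.1, 0.3, 0.8],
    "please": [-0.5, -0.4, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(terms):
    """Mean vector of the in-vocabulary terms, or None if there are none."""
    vecs = [EMB[t] for t in terms if t in EMB]
    if not vecs:
        return None
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]


def delete_terms(terms, threshold=0.2):
    """Deletion (assumed rule): drop terms weakly related to the query centroid; OOV terms are kept."""
    c = centroid(terms)
    if c is None:
        return list(terms)
    return [t for t in terms if t not in EMB or cosine(EMB[t], c) >= threshold]


def substitute(terms, min_sim=0.9):
    """Substitution (assumed rule): one variant per term, swapping it for its nearest neighbour."""
    variants = []
    for i, t in enumerate(terms):
        if t not in EMB:
            continue
        candidates = [w for w in EMB if w not in terms]
        if not candidates:
            continue
        best = max(candidates, key=lambda w: cosine(EMB[t], EMB[w]))
        if cosine(EMB[t], EMB[best]) >= min_sim:
            variants.append(terms[:i] + [best] + terms[i + 1:])
    return variants


def add_terms(terms, k=1):
    """Addition (assumed rule): append the k vocabulary words closest to the query centroid."""
    c = centroid(terms)
    if c is None:
        return list(terms)
    ranked = sorted((w for w in EMB if w not in terms),
                    key=lambda w: cosine(EMB[w], c), reverse=True)
    return list(terms) + ranked[:k]


def distribution(terms):
    """Uniform weights over the distinct terms of one query representation."""
    uniq = list(dict.fromkeys(terms))
    return {t: 1.0 / len(uniq) for t in uniq} if uniq else {}


def combine(original, elements, lam=0.6):
    """Linear combination: lam * original + (1 - lam) * mean of reformulated elements."""
    if not elements:
        return distribution(original)
    weights = defaultdict(float)
    for t, w in distribution(original).items():
        weights[t] += lam * w
    for elem in elements:
        for t, w in distribution(elem).items():
            weights[t] += (1 - lam) * w / len(elements)
    return dict(weights)


original = ["cheap", "flights", "paris", "please"]
shortened = delete_terms(original)
elements = [shortened] + substitute(shortened) + [add_terms(shortened)]
weights = combine(original, elements)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {w:.3f}")
```

In this toy run, deletion removes `please` from every reformulated element. The term still keeps a share of the weight through the original query, which is why interpolating with `lam` is safer than replacing the query outright. Because each element's weights sum to one, the combined weights also sum to one.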
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yan, W., Wang, Y., Huang, C., Wu, S. (2020). Word Embedding-Based Reformulation for Long Queries in Information Search. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science, vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_19
DOI: https://doi.org/10.1007/978-3-030-60029-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60028-0
Online ISBN: 978-3-030-60029-7
eBook Packages: Computer Science, Computer Science (R0)