Expanding Queries with Maximum Likelihood Estimators and Language Models

Karras, Christos; Karras, Aristeidis; Theodorakopoulos, Leonidas; Giannoukou, Ioanna; Sioutas, Spyros

doi:10.1007/978-3-031-14054-9_20

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1431)

Included in the following conference series:

The International Conference on Innovations in Computing Research

Abstract

Language modelling techniques are gaining wide adoption in state-of-the-art information retrieval methods. An accurate language model makes it possible to address retrieval over a very large corpus of texts. To accomplish this, these techniques begin by estimating a probabilistic language model for each document in the collection; these models are then used to rank the documents by relevance to a query. One difficulty that this family of methods faces is a shortage of data (data sparsity). As a result, smoothing methods that adjust the maximum likelihood estimator are required to compensate for the resulting imprecision. This paper highlights that this approach surpasses established approaches, such as tf-idf, for ranking documents by relevance. Finally, we examine various ideas related to query expansion using such methods.
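
As an illustration of the ranking scheme the abstract describes, the following is a minimal sketch of query-likelihood scoring with a smoothed maximum likelihood estimator. The abstract does not name a specific smoothing method, so Jelinek-Mercer (linear) interpolation with the collection model is assumed here: p(t|d) = (1 - lam) * p_ml(t|d) + lam * p(t|C). The function names, the whitespace tokenizer, and the value lam = 0.5 are hypothetical choices for the example, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Log p(query | document) under a unigram model with
    Jelinek-Mercer smoothing (assumed smoothing method)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in tokenize(query):
        # Maximum likelihood estimate from the document alone.
        p_ml = doc_counts[term] / doc_len if doc_len else 0.0
        # Background estimate from the whole collection.
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_ml + lam * p_coll
        if p == 0.0:
            # A single unseen term zeroes the whole query probability.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document indices sorted from most to least likely."""
    tokenized = [tokenize(d) for d in docs]
    coll_counts = Counter(t for toks in tokenized for t in toks)
    coll_len = sum(coll_counts.values())
    scores = [
        query_log_likelihood(query, toks, coll_counts, coll_len, lam)
        for toks in tokenized
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "stock markets fell sharply",
]
print(rank("cat mat", docs))
```

With lam = 0 the score reduces to the unsmoothed maximum likelihood estimator, and any document that lacks a single query term receives a probability of zero. This is the sparsity problem that smoothing is meant to address.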

Author information

Authors and Affiliations

Computer Engineering and Informatics Department, University of Patras, 26504, Patras, Greece
Christos Karras, Aristeidis Karras & Spyros Sioutas
Department of Management Science and Technology, University of Patras, 26334, Patras, Greece
Leonidas Theodorakopoulos & Ioanna Giannoukou

Corresponding author

Correspondence to Christos Karras.

Editor information

Editors and Affiliations

University of Detroit Mercy, Detroit, MI, USA
Kevin Daimi
Kent Institute Australia, Sydney, NSW, Australia
Abeer Al Sadoon

About this paper

Cite this paper

Karras, C., Karras, A., Theodorakopoulos, L., Giannoukou, I., Sioutas, S. (2022). Expanding Queries with Maximum Likelihood Estimators and Language Models. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_20

DOI: https://doi.org/10.1007/978-3-031-14054-9_20
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics (R0)
