Abstract
Language modelling techniques are increasingly adopted in state-of-the-art information retrieval methods. An accurate language model makes it possible to address retrieval over a very large corpus of texts. To accomplish this, these techniques begin by estimating a probabilistic language model for each document in the collection, which is then used to rank the documents by relevance to a query. One of the difficulties this family of methods faces is data sparsity. As a result, smoothing methods that adjust the maximum likelihood estimator are required to compensate for the resulting imprecision. This paper highlights that the use of such smoothed language models surpasses established approaches, such as tf-idf, for producing rankings of documents sorted by relevance. Finally, we examine various ideas related to query expansion using such methods.
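As an illustration of the approach summarised above, the following minimal sketch ranks documents by the smoothed likelihood that each document's language model generates the query. The abstract does not say which smoothing method is used, so linear interpolation with a collection model (Jelinek-Mercer smoothing) is assumed here purely for illustration. The function name `rank_by_query_likelihood`, the parameter `lam`, and the whitespace tokenisation are hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, docs, lam=0.5):
    """Rank documents by log P(query | document model).

    Assumption: Jelinek-Mercer smoothing, i.e. the per-document maximum
    likelihood estimate is interpolated with a collection-wide estimate.
    `lam` is the weight given to the collection model.
    """
    doc_tokens = [doc.lower().split() for doc in docs]
    # Collection-wide term counts, used as the background model.
    collection = Counter(tok for toks in doc_tokens for tok in toks)
    collection_len = sum(collection.values())

    scores = []
    for idx, toks in enumerate(doc_tokens):
        counts = Counter(toks)
        log_p = 0.0
        for term in query.lower().split():
            # Maximum likelihood estimate from this document alone.
            p_ml = counts[term] / len(toks) if toks else 0.0
            # Background estimate from the whole collection.
            p_coll = collection[term] / collection_len if collection_len else 0.0
            p = (1 - lam) * p_ml + lam * p_coll
            if p == 0.0:
                # The term has zero probability under the model.
                log_p = float("-inf")
                break
            log_p += math.log(p)
        scores.append((idx, log_p))
    # Highest log-likelihood first.
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = ["the cat sat on the mat", "dogs chase cats", "the dog barked"]
ranking = rank_by_query_likelihood("cat mat", docs)
```

With `lam=0` the score reduces to the unsmoothed maximum likelihood estimator, and any document missing a single query term receives a probability of zero. This is the sparsity problem that smoothing is meant to address.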
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Karras, C., Karras, A., Theodorakopoulos, L., Giannoukou, I., Sioutas, S. (2022). Expanding Queries with Maximum Likelihood Estimators and Language Models. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_20
DOI: https://doi.org/10.1007/978-3-031-14054-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics (R0)