Abstract
The exponential growth of academic literature presents significant challenges for researchers attempting to find relevant information. Traditional keyword-based retrieval systems often fail to address issues such as synonyms, homonyms, and semantic nuances, leading to suboptimal search results. This paper introduces a novel system called IntelliSMART (Intelligent Semantic Machine-Assisted Research Tool), which leverages large language models (LLMs) and advanced semantic processing techniques to improve the retrieval of academic literature. Our approach integrates query rewriting, embedding generation, efficient indexing, and complex article retrieval mechanisms to provide highly accurate and contextually relevant results that align with the user’s intent. The IntelliSMART system features a user-friendly front end that facilitates intuitive query input, along with a robust back end for handling user queries, generating embeddings, indexing extensive collections of academic papers, and efficiently retrieving the most relevant documents. The proposed system shows significant improvements over conventional methods, highlighting its potential to transform the search experience in academic research.
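The four stages named in the abstract (query rewriting, embedding generation, indexing, retrieval) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the IntelliSMART implementation: the `SYNONYMS` table stands in for LLM-based query rewriting, the term-count vectors stand in for LLM embeddings, the list-based index stands in for an efficient vector index, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical synonym table; a stand-in for LLM-based query rewriting.
SYNONYMS = {"papers": ["articles"], "search": ["retrieval"]}


def tokenize(text):
    """Lowercase the text and split it on whitespace, commas, and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def rewrite_query(query):
    """Append known synonyms to the query (toy stand-in for an LLM rewriter)."""
    tokens = tokenize(query)
    extra = [s for t in tokens for s in SYNONYMS.get(t, [])]
    return " ".join(tokens + extra)


def embed(text):
    """Toy embedding: an L2-normalised term-count vector stored as a dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def build_index(docs):
    """Embed every document once; a real system would use a vector index."""
    return [(doc_id, embed(text)) for doc_id, text in docs.items()]


def retrieve(query, index, k=2):
    """Return up to k document ids ranked by cosine similarity to the query."""
    q = embed(rewrite_query(query))
    scored = []
    for doc_id, vec in index:
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(w * vec.get(term, 0.0) for term, w in q.items())
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


# Hypothetical three-document collection.
docs = {
    "a": "dense retrieval of scientific articles",
    "b": "protein folding simulation",
    "c": "keyword search engines",
}
index = build_index(docs)
print(retrieve("search papers", index))  # ['a', 'c']
```

In this toy collection the raw query "search papers" shares a term only with document "c"; the rewritten query also matches "a" through the added synonyms, which is the kind of vocabulary mismatch the abstract attributes to keyword-based retrieval.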
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khatri, A., Egierski, N., Pochamreddy, A., Alhamadani, A., Sarkar, S., Lu, CT. (2025). IntelliSMART: Intelligent Semantic Machine-Assisted Research Tool. In: Aiello, L.M., Chakraborty, T., Gaito, S. (eds) Social Networks Analysis and Mining. ASONAM 2024. Lecture Notes in Computer Science, vol 15214. Springer, Cham. https://doi.org/10.1007/978-3-031-78554-2_12
DOI: https://doi.org/10.1007/978-3-031-78554-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78553-5
Online ISBN: 978-3-031-78554-2
eBook Packages: Computer Science, Computer Science (R0)