DOI: 10.1145/3681780.3697252
Research article

Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research

Published: 04 November 2024

Abstract

Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science, especially in high-impact journals such as those in the Nature Portfolio. However, traditional methods, which rely on keyword searches and basic NLP techniques, often fail to uncover valuable insights that are not explicitly stated in article titles or keywords. Because these approaches cannot perform semantic search or contextual understanding, they are limited in their ability to classify topics and characterize studies. In this paper, we address these limitations by leveraging generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis. We developed a technical workflow that integrates a vector database, Sentence Transformers, a Gaussian Mixture Model (GMM), a retrieval agent, and Large Language Models (LLMs) to enable contextual search, topic ranking, and characterization of research using customized prompt templates. A pilot study analyzing 223 urban science-related articles published in Nature Communications over the past decade highlights the effectiveness of our approach in generating insightful summary statistics on the quality, scope, and characteristics of papers in high-impact journals. This study introduces a new paradigm for enhancing bibliometric analysis and knowledge retrieval in urban research, positioning AI agents as a powerful tool for advancing research evaluation and understanding.
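The retrieval step of a workflow like the one described above can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation. It assumes an in-memory list in place of a vector database and uses a bag-of-words counter as a stand-in for Sentence Transformer embeddings (so it only matches shared words and captures no real semantics). Abstracts are ranked by cosine similarity and the top-k hits are inserted into a prompt template. The names `embed`, `cosine`, `VectorStore`, `PROMPT_TEMPLATE`, and `build_prompt` are illustrative assumptions; the GMM topic step and the LLM call are omitted.

```python
import math
import re
from collections import Counter


# Hypothetical stand-in for a Sentence Transformer: a bag-of-words vector.
# It only rewards shared words; a real sentence embedding would also match
# paraphrases and related concepts.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory substitute for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, str, Counter]] = []  # (doc_id, text, vector)

    def add(self, doc_id: str, text: str) -> None:
        self.items.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        """Return the k most similar (score, doc_id, text) triples, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.items]
        scored.sort(key=lambda hit: (-hit[0], hit[1]))  # ties broken by doc_id
        return scored[:k]


# Hypothetical prompt template; the paper's customized templates are not shown here.
PROMPT_TEMPLATE = (
    "You are characterizing urban-science articles.\n"
    "Question: {question}\n"
    "Relevant abstracts:\n{context}\n"
    "Answer using only the abstracts above."
)


def build_prompt(store: VectorStore, question: str, k: int = 2) -> str:
    """Retrieve the top-k abstracts and insert them into the template."""
    hits = store.search(question, k)
    context = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in hits)
    return PROMPT_TEMPLATE.format(question=question, context=context)


if __name__ == "__main__":
    store = VectorStore()
    # Toy abstracts for illustration only (not from the paper's corpus).
    store.add("A1", "Traffic congestion and travel demand modeling in dense cities")
    store.add("A2", "Urban heat islands and green space cooling effects")
    store.add("A3", "Street network data and traffic flow prediction with deep learning")
    print(build_prompt(store, "traffic congestion in cities"))
```

In a fuller version, `embed` could be replaced by a pretrained Sentence Transformer (for example through the `encode` method of the `sentence-transformers` library), and the string returned by `build_prompt` would be sent to an LLM. Both substitutions are assumptions about how such a pipeline might be wired, not details taken from the paper.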




Published In

UrbanAI '24: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI
October 2024
68 pages
ISBN: 9798400711565
DOI: 10.1145/3681780
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery
New York, NY, United States



Author Tags

1. Bibliometrics Analysis
2. Large Language Models
3. Retrieval-Augmented Generation
4. Transformers

Qualifiers

• Research-article
• Research
• Refereed limited


Conference

SIGSPATIAL '24

Acceptance Rates

UrbanAI '24 paper acceptance rate: 9 of 12 submissions (75%)
Overall acceptance rate: 9 of 12 submissions (75%)


Article Metrics

• Total citations: 0
• Total downloads: 129
• Downloads (last 12 months): 129
• Downloads (last 6 weeks): 34

Reflects downloads up to 02 Mar 2025.
