Abstract
This study focuses on keyword generation for customer feedback analysis in the tourism sector, covering tour and hotel services provided by Setur. A dataset of 1000 customer surveys from 2020–2022 was built by annotating keywords in responses to open-ended questions. The research compares the efficacy of fine-tuning Pre-trained Language Models (PLMs) against prompting Large Language Models (LLMs) for keyword generation. The study reviews traditional statistical methods, such as TF-IDF and KP-Miner, and contemporary unsupervised approaches, such as YAKE, PageRank, and TextRank. Additionally, it explores supervised methodologies, including KEA and sequence-to-sequence models, alongside increasingly popular pre-trained language models such as T5, BART, GPT-3.5, GPT-4, and Gemini. In our experimental approach, the multilingual variants of the T5 and BART models, namely MT5 and MBART, were fine-tuned following prior studies. The comparison was extended to the GPT-3.5, GPT-4, and Gemini models, using diverse prompt styles and introducing few-shot examples to observe changes in performance. Semantic similarity between generated keywords and the source text, text length metrics, and inter-keyword semantic similarity are reported.
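As an illustration of the few-shot prompting described above, the following is a minimal sketch of how demonstration pairs and a target review can be assembled into a keyword-generation prompt. The prompt wording, the `build_keyword_prompt` helper, and the demonstration pair are illustrative assumptions, not the paper's actual prompts.

```python
def build_keyword_prompt(examples, review, n_keywords=5):
    """Assemble a few-shot prompt asking an LLM to generate keywords
    for a customer-feedback text. `examples` is a list of
    (feedback_text, keyword_list) demonstration pairs."""
    lines = [
        f"Generate up to {n_keywords} keywords that summarize the "
        "customer feedback below. Return them comma-separated."
    ]
    for text, keywords in examples:
        lines.append(f"Feedback: {text}")
        lines.append(f"Keywords: {', '.join(keywords)}")
    # The target review is appended last, with an open "Keywords:" slot
    # for the model to complete.
    lines.append(f"Feedback: {review}")
    lines.append("Keywords:")
    return "\n".join(lines)

demo = [("The hotel room was spotless and the staff were friendly.",
         ["cleanliness", "staff friendliness"])]
prompt = build_keyword_prompt(demo, "The tour guide was late and rushed us.")
print(prompt)
```

Adding or removing pairs in `demo` is how the zero-shot versus few-shot conditions mentioned above would differ; the completed prompt is then sent to the LLM of choice.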
In addition to MBART and MT5, the Turkish language models TRBART and TRT5 were employed. The study aims not only to contribute insights into customer feedback analysis but also to serve as a benchmark for comparing the efficiency of PLM fine-tuning and LLM prompting for keyword generation from textual data.
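The semantic-similarity evaluations mentioned above can be pictured with plain cosine similarity over embedding vectors. The sketch below assumes keyword and source-text embeddings are already available as numeric vectors; the `keyword_coherence` helper is an illustrative assumption, since the paper's exact embedding model and metric definitions are not given here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def keyword_coherence(vectors):
    """Mean pairwise similarity among keyword embeddings: high values
    suggest semantically redundant keywords, low values diverse ones."""
    pairs = [(i, j) for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    return sum(cosine_similarity(vectors[i], vectors[j])
               for i, j in pairs) / len(pairs)
```

Keyword-to-source similarity would use `cosine_similarity` between each keyword embedding and the source-text embedding, while `keyword_coherence` captures the inter-keyword similarity dimension.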
References
Achiam, J., et al.: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
Ayan, E.T., Arslan, R., Zengin, M.S., Duru, H.A., Salman, S., Bardak, B.: Turkish keyphrase extraction from web pages with BERT. In: 2021 29th Signal Processing and Communications Applications Conference (SIU), pp. 1–4. IEEE (2021)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A.: Yake! Keyword extraction from single documents using multiple local features. Inf. Sci. 509, 257–289 (2020)
dbmdz: bert-base-turkish-cased. https://huggingface.co/dbmdz/bert-base-turkish-cased
El-Beltagy, S.R., Rafea, A.: Kp-miner: a keyphrase extraction system for English and Arabic documents. Inf. Syst. 34(1), 132–144 (2009)
facebook: mbart-large-cc25. https://huggingface.co/facebook/mbart-large-cc25
Glazkova, A., Morozov, D.: Applying transformer-based text summarization for keyphrase generation. Lobachevskii J. Math. 44(1), 123–136 (2023)
Google: mt5-base. https://huggingface.co/google/mt5-base
Grootendorst, M.: KeyBERT: minimal keyword extraction with BERT (2020). https://doi.org/10.5281/zenodo.4461265
Kim, Y.J., Kim, H.S.: The impact of hotel customer experience on customer satisfaction through online reviews. Sustainability 14(2), 848 (2022)
Kulkarni, M., Mahata, D., Arora, R., Bhowmik, R.: Learning rich representation of keyphrases from text. arXiv preprint arXiv:2112.08547 (2021)
Lee, C.K.H., Tse, Y.K., Zhang, M., Ma, J.: Analysing online reviews to investigate customer behaviour in the sharing economy: the case of airBNB. Inf. Technol. People 33(3), 945–961 (2020)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
Liu, Y., et al.: Multilingual denoising pre-training for neural machine translation. Trans. Assoc. Comput. Linguist. 8, 726–742 (2020)
Maragheh, R.Y., et al.: LLM-take: theme-aware keyword extraction using large language models. In: 2023 IEEE International Conference on Big Data (BigData), pp. 4318–4324. IEEE (2023)
Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., Chi, Y.: Deep keyphrase generation. arXiv preprint arXiv:1704.06879 (2017)
Mihalcea, R., Tarau, P.: Textrank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
mukayese: transformer-turkish-summarization. https://huggingface.co/mukayese/transformer-turkish-summarization
Park, S., Kim, H.M.: Improving the accuracy and diversity of feature extraction from online reviews using keyword embedding and two clustering methods. In: International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, vol. 84003, p. V11AT11A020. American Society of Mechanical Engineers (2020)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(1), 5485–5551 (2020)
Safaya, A., Kurtuluş, E., Göktoğan, A., Yuret, D.: Mukayese: Turkish NLP strikes back. arXiv preprint arXiv:2203.01215 (2022)
Skoghäll, T., Öhman, D.: Summarization and keyword extraction on customer feedback data: comparing different unsupervised methods for extracting trends and insight from text (2022)
Song, M., et al.: Is ChatGPT a good keyphrase generator? A preliminary study. arXiv preprint arXiv:2303.13001 (2023)
Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
Gemini Team, et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)
Turkish-NLP: t5-efficient-base-turkish. https://huggingface.co/Turkish-NLP/t5-efficient-base-turkish
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: Kea: practical automatic keyphrase extraction. In: Proceedings of the fourth ACM Conference on Digital Libraries, pp. 254–255 (1999)
Wu, D., Ahmad, W.U., Chang, K.W.: Pre-trained language models for keyphrase generation: a thorough empirical study. arXiv preprint arXiv:2212.10233 (2022)
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)
Copyright information
© 2024 IFIP International Federation for Information Processing
Cite this paper
Er, A., Diri, B., Yöndem, M.T. (2024). LLM Prompting Versus Fine-Tuning PLMs: A Comparative Study on Keyword Generation from Customer Feedback. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Avlonitis, M., Papaleonidas, A. (eds) Artificial Intelligence Applications and Innovations. AIAI 2024. IFIP Advances in Information and Communication Technology, vol 712. Springer, Cham. https://doi.org/10.1007/978-3-031-63215-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63214-3
Online ISBN: 978-3-031-63215-0