Abstract
This article describes a methodology for automatically generating answers to Stack Overflow questions using GPT-Neo. Specifically, it details the construction of the dataset and the selection of suitable samples for the experiments. Generation quality is compared across topics obtained by topic modeling of question titles and tags. Training models without accounting for the structure and themes of the texts can be difficult, so we investigate whether topic modeling of questions helps to address this problem. As part of the experiments, GPT-Neo is fine-tuned separately on each topic.
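The abstract outlines the pipeline only at a high level. As an illustration, the Python sketch below shows one plausible realization: LDA topic modeling over concatenated question titles and tags, followed by a separate GPT-Neo fine-tuning run per topic. This is a minimal sketch under assumptions, not the paper's implementation; the toy corpus, the EleutherAI/gpt-neo-125M checkpoint, and all hyperparameters are illustrative stand-ins, and in the paper's setting each topic's training corpus would consist of full question and answer texts rather than titles alone.

```python
# Minimal sketch: LDA topic split, then per-topic GPT-Neo fine-tuning.
# Data, checkpoint, and hyperparameters are illustrative, not the paper's.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Toy corpus: each entry concatenates a question title with its tags.
questions = [
    "How to merge two dicts in a single expression? python dictionary",
    "Segmentation fault when freeing a linked list. c pointers memory",
    "Group a DataFrame by multiple columns. python pandas groupby",
    "Undefined behavior with uninitialized variables. c compiler",
]

# Step 1: topic modeling over titles + tags, one dominant topic per question.
X = CountVectorizer(stop_words="english").fit_transform(questions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
dominant_topic = lda.fit_transform(X).argmax(axis=1)

# Step 2: fine-tune a separate GPT-Neo model on each topic's texts.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo ships without a pad token
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM labels

for topic in range(lda.n_components):
    subset = [q for q, t in zip(questions, dominant_topic) if t == topic]
    if not subset:
        continue  # skip topics that received no questions
    encodings = tokenizer(subset, truncation=True, max_length=512)
    train_set = [{"input_ids": ids} for ids in encodings["input_ids"]]
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
    args = TrainingArguments(output_dir=f"gpt-neo-topic-{topic}",
                             num_train_epochs=1,
                             per_device_train_batch_size=2)
    Trainer(model=model, args=args, train_dataset=train_set,
            data_collator=collator).train()
```

Splitting the corpus this way lets each fine-tuned model specialize in the vocabulary and answer patterns of a single topic, which is the effect the comparisons in the paper evaluate.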
Acknowledgments
This research is supported by the Russian Science Foundation and the Saint Petersburg Science Foundation, grant No. 23-28-10069 “Forecasting social well-being in order to optimize the functioning of the urban digital services ecosystem in St. Petersburg” (https://rscf.ru/project/23-28-10069/).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rvanova, L., Kovalchuk, S. (2023). Automatic Structuring of Topics for Natural Language Generation in Community Question Answering in Programming Domain. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2023. Lecture Notes in Computer Science, vol. 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_33
DOI: https://doi.org/10.1007/978-3-031-36021-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36020-6
Online ISBN: 978-3-031-36021-3
eBook Packages: Computer Science, Computer Science (R0)