Grouping related stack overflow comments for software developer recommendation

Sheth, Viral; Damevski, Kostadin

doi:10.1007/s10515-022-00339-9

Published: 29 April 2022

Volume 29, article number 40, (2022)

Automated Software Engineering

Abstract

Stack Overflow is a question-and-answer forum widely used by developers all over the world. Contributors share their knowledge on this platform not only in the form of answers, but also as comments on those answers. With millions of developer-contributed comments, the valuable knowledge they contain remains difficult for readers to locate. Moreover, Stack Overflow’s comment hiding mechanism, which shows only the five most highly voted comments and hides the rest, leads to wealth condensation: comments that are already visible are the ones that keep attracting votes. Recently, researchers have observed that Stack Overflow’s comment display mechanism hides important and relevant comments and makes it difficult for readers to understand the conversational context, as many visible comments are related to hidden ones. In this paper, we propose a set of features and a machine learning-based technique to identify the relatedness of pairs of comments. Further, we extend pairwise relatedness to comment clustering, since clusters give readers the entire context of a set of comments that form a single conversational thread. We evaluate our methods against several baselines and show that they provide strong improvements, although the problem in general is made difficult by the short text and narrow topic of discussion in the comments.
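The two-step idea in the abstract, scoring pairs of comments for relatedness and then grouping related pairs into threads, can be illustrated with a minimal sketch. This is not the authors' method: the features and the learned model are not given in this preview, so the score below is a hypothetical stand-in (token overlap plus an explicit @mention check), and the threshold value, the dictionary fields `author` and `text`, and the connected-components grouping are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase word tokens; a crude stand-in for real text features."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def relatedness(a, b):
    """Hypothetical pairwise score in [0, 1].

    Returns 1.0 if either comment @mentions the other's author,
    otherwise the Jaccard overlap of the two token sets.
    """
    if (f"@{a['author']}".lower() in b["text"].lower()
            or f"@{b['author']}".lower() in a["text"].lower()):
        return 1.0
    ta, tb = tokens(a["text"]), tokens(b["text"])
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster(comments, threshold=0.3):
    """Group comment indices into connected components of related pairs.

    The threshold is an assumed value, not one taken from the paper.
    """
    parent = list(range(len(comments)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if relatedness(comments[i], comments[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Made-up comments, for illustration only.
    sample = [
        {"author": "alice", "text": "This fails on Python 2 because print is a statement."},
        {"author": "bob", "text": "@alice good catch, updated the answer."},
        {"author": "carol", "text": "Is there a faster way using numpy?"},
        {"author": "dave", "text": "numpy vectorization is the faster way here."},
    ]
    print(cluster(sample))
```

In a learned variant, `relatedness` would be replaced by a trained classifier's probability for the pair. The short, topically narrow text that the abstract mentions is exactly what makes a plain overlap score like this one unreliable.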



Author information

Authors and Affiliations

Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
Viral Sheth & Kostadin Damevski

Corresponding author

Correspondence to Viral Sheth.


About this article

Cite this article

Sheth, V., Damevski, K. Grouping related stack overflow comments for software developer recommendation. Autom Softw Eng 29, 40 (2022). https://doi.org/10.1007/s10515-022-00339-9

Received: 15 July 2021
Accepted: 01 April 2022
Published: 29 April 2022
DOI: https://doi.org/10.1007/s10515-022-00339-9
