
Grouping related stack overflow comments for software developer recommendation

Automated Software Engineering

Abstract

Stack Overflow is a question-and-answer forum widely used by developers all over the world. Contributors share their knowledge on this platform not only in the form of answers, but also as comments on those answers. With millions of developer-contributed comments, the valuable knowledge contained within them remains difficult for readers to locate. Moreover, Stack Overflow’s comment hiding mechanism, which shows only the five most highly voted comments and hides the rest, leads to wealth condensation. Recently, researchers have observed that Stack Overflow’s comment display mechanism hides important and relevant comments and makes it difficult for readers to understand the conversational context, as many comments are related to other hidden comments. In this paper, we propose a set of features and a machine learning-based technique to identify the relatedness of pairs of comments. Further, we extend pairwise relatedness to comment clustering, since clusters give readers the entire context of a set of comments that form a single conversational thread. We evaluate our methods against several baselines and show that they provide strong improvements, although the problem in general is made difficult by the short text and narrow topic of discussion in the comments.
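
As a rough illustration of the two-stage idea in the abstract (score pairs of comments for relatedness, then group related comments into threads), the following minimal sketch may help. It is not the method of the paper: the abstract does not list the proposed features or the learning algorithm, so the sketch substitutes a simple Jaccard token overlap for the learned relatedness model and groups comments by connected components. The function names, the tokenizer, and the 0.2 threshold are all hypothetical choices made for this example.

```python
import re
from itertools import combinations


def tokens(comment):
    """Lowercased word tokens of a comment (hypothetical, very simple tokenizer)."""
    return set(re.findall(r"[a-z0-9_@]+", comment.lower()))


def relatedness(a, b):
    """Jaccard token overlap in [0, 1].

    ASSUMPTION: a stand-in for a learned pairwise relatedness model;
    the paper's actual features are not given in the abstract.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_comments(comments, threshold=0.2):
    """Group comment indices whose pairwise score reaches the threshold.

    Uses union-find so that chains of related comments end up in the
    same cluster (connected components). The threshold is an assumed value.
    """
    parent = list(range(len(comments)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if relatedness(comments[i], comments[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    example = [
        "This fails on Python 3 because print is a function",
        "Use print() as a function in Python 3",
        "Thanks, the regex works now",
    ]
    print(cluster_comments(example))
```

In this toy input the first two comments share enough vocabulary to be grouped, while the third stays alone. Token overlap is a weak signal for short, narrowly focused comments, which is the difficulty the abstract points out.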



Author information

Corresponding author

Correspondence to Viral Sheth.


About this article


Cite this article

Sheth, V., Damevski, K. Grouping related stack overflow comments for software developer recommendation. Autom Softw Eng 29, 40 (2022). https://doi.org/10.1007/s10515-022-00339-9


  • DOI: https://doi.org/10.1007/s10515-022-00339-9
