Abstract
Stack Overflow is a question-and-answer forum used by developers all over the world. Contributors share their knowledge on this platform not only in the form of answers, but also as comments on those answers. With millions of developer-contributed comments, the valuable knowledge contained within them remains difficult for readers to locate. Moreover, Stack Overflow’s comment hiding mechanism, which shows only the five most highly voted comments and hides the rest, leads to wealth condensation, in which already-visible comments continue to accumulate votes while hidden ones go unnoticed. Recently, researchers have observed that Stack Overflow’s comment display mechanism hides important and relevant comments and makes it difficult for readers to follow the conversational context, since many comments relate to other, hidden comments. In this paper, we propose a set of features and a machine learning-based technique to identify whether pairs of comments are related. We then extend pairwise relatedness to comment clustering because clusters give readers the entire context of a set of comments that form a single conversational thread. We evaluate our methods against several baselines and show that they provide strong improvements, although the problem in general is made difficult by the short text and narrow topic of discussion in the comments.
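The step from pairwise relatedness to clusters can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the `relatedness` function below is a hypothetical stand-in (plain Jaccard token overlap) for the learned pairwise model, the `threshold` value is an arbitrary assumption, and clusters are formed simply by linking every pair whose score meets the threshold and taking connected components.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word tokens of a comment (hypothetical stand-in feature)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def relatedness(a, b):
    """Jaccard token overlap in [0, 1]; a placeholder for a learned pairwise model."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_comments(comments, threshold=0.2):
    """Group comment indices by linking each pair scoring at or above threshold."""
    parent = list(range(len(comments)))

    def find(i):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if relatedness(comments[i], comments[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    comments = [
        "This fails on Python 3 because print is a function",
        "Use print as a function in Python 3 and it works",
        "Does the regex handle unicode input?",
        "The regex does not handle unicode input without the re.UNICODE flag",
    ]
    for group in cluster_comments(comments):
        print([comments[i] for i in group])
```

In this toy input the two `print` comments end up in one group and the two regex comments in another. Connected components are the simplest way to turn pairwise decisions into threads, but a single spurious link merges two otherwise unrelated groups, so a real system would likely need a more robust clustering step.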
Sheth, V., Damevski, K. Grouping related stack overflow comments for software developer recommendation. Autom Softw Eng 29, 40 (2022). https://doi.org/10.1007/s10515-022-00339-9
DOI: https://doi.org/10.1007/s10515-022-00339-9