Abstract
Users of Open Source Software (OSS) projects discuss a diverse range of topics online. The content of a post often corresponds to one or more context-sensitive content types, e.g. a suggestion for a solution, a request for further clarification, or an indication that a proposed solution did not work. Detecting content types can benefit software developers in several ways. For instance, content types can serve as indicators that summarise the content of messages. These indicators can be exploited in a developer-centric knowledge mining platform, allowing developers and project managers to create action alerts about new bugs reported outside of a bug tracker. They can also be combined with other metrics to assess the quality of an OSS project. We present a multi-label classifier that classifies messages exchanged on OSS communication channels, together with detailed evaluation results. We experimented with two state-of-the-art multi-label classification approaches, HOMER (Hierarchy Of Multilabel classifiER) and RAkEL (RAndom k-labELsets), as these met the technical requirements of the CROSSMINER project. We used a manually annotated threaded corpus of posts from newsgroup discussions, bug tracking systems and forums related to Eclipse projects. The results are promising and suggest that the task merits further and deeper research.
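To make the idea of random k-labelsets concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes three hypothetical content-type labels taken from the examples above, four invented training posts, and a toy nearest-neighbour base learner on word overlap. Each model is a label powerset classifier restricted to k labels, and a label is returned when more than half of the models that cover it vote for it.

```python
import random
from collections import Counter
from itertools import combinations

def tokens(text):
    """Lower-cased whitespace tokens as a set (toy feature extraction)."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

class LabelPowerset:
    """Treats every combination of its k labels as a single class.
    The base learner is a 1-nearest-neighbour rule, an assumption made
    only to keep this sketch dependency-free."""
    def __init__(self, labels):
        self.labels = tuple(labels)

    def fit(self, texts, labelsets):
        own = set(self.labels)
        self.examples = [(tokens(t), frozenset(y) & own)
                         for t, y in zip(texts, labelsets)]
        return self

    def predict(self, text):
        words = tokens(text)
        nearest = max(self.examples, key=lambda ex: jaccard(words, ex[0]))
        return nearest[1]

def rakel_fit(texts, labelsets, all_labels, k=2, m=3, seed=0):
    """Trains m label powerset models on distinct random k-labelsets."""
    candidates = list(combinations(sorted(all_labels), k))
    chosen = random.Random(seed).sample(candidates, min(m, len(candidates)))
    return [LabelPowerset(subset).fit(texts, labelsets) for subset in chosen]

def rakel_predict(models, text, threshold=0.5):
    """Averages per-label votes over the models that cover each label."""
    votes, seen = Counter(), Counter()
    for model in models:
        predicted = model.predict(text)
        for label in model.labels:
            seen[label] += 1
            votes[label] += label in predicted
    return {label for label in seen if votes[label] / seen[label] > threshold}

# Hypothetical labels and posts, invented for illustration only.
LABELS = {"suggestion", "clarification_request", "solution_failed"}
posts = [
    "have you tried clearing the workspace cache",
    "can you post the full stack trace",
    "that did not work the error is still there",
    "that did not work can you post your config",
]
annotations = [
    {"suggestion"},
    {"clarification_request"},
    {"solution_failed"},
    {"solution_failed", "clarification_request"},
]

models = rakel_fit(posts, annotations, LABELS, k=2, m=3)
print(rakel_predict(models, "that did not work can you post the log"))
```

With three labels and k = 2 there are only three possible labelsets, so all of them are used here and every label is covered by two models. A real system would replace the toy base learner with a trained classifier and tune k, the number of models and the threshold.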
Notes
1. Labels that have not been assigned to any dataset instances, e.g. 8.5, have been omitted.
References
Mooney, R.J., Roy, L.: Content-based book recommending using learning for text categorization. In: Proceedings of DL, pp. 195–204. ACM, New York (2000)
Bhatia, S., Mitra, P.: Classifying user messages for managing web forum data. In: WebDB, pp. 13–18 (2012)
Xia, X., Feng, Y., Lo, D., Chen, Z., Wang, X.: Towards more accurate multi-label software behavior learning. In: Proceedings of CSMR-WCRE, pp. 134–143 (2014)
Bagnato, A., Barmpis, K., Bessis, N., Cabrera-Diego, L.A., Di Rocco, J., Di Ruscio, D., Gergely, T., Hansen, S., Kolovos, D., Krief, P., Korkontzelos, I., Laurière, S., Lopez de la Fuente, J.M., Maló, P., Paige, R.F., Spinellis, D., Thomas, C., Vinju, J.: Developer-centric knowledge mining from large open-source software repositories (CROSSMINER). In: Proceedings of STAFF, Marburg, Germany, pp. 375–384. Springer (2018)
Keivanloo, I., Forbes, C., Hmood, A., Erfani, M., Neal, C., Peristerakis, G., Rilling, J.: A linked data platform for mining software repositories. In: Proceedings of MSR, pp. 32–35 (2012)
Bavota, G., Ciemniewska, A., Chulani, I., De Nigro, A., Di Penta, M., Galletti, D., Galoppini, R., Gordon, T.F., Kedziora, P., Lener, I., Torelli, F., Pratola, R., Pukacki, J., Rebahi, Y., Villalonga, S.G.: The market for open source: an intelligent virtual open source marketplace. In: Proceedings of CSMR-WCRE, pp. 399–402 (2014)
van Deursen, A., Mesbah, A., Cornelissen, B., Zaidman, A., Pinzger, M., Guzzi, A.: Adinda: a knowledgeable, browser-based IDE. In: Proceedings of ICSE, vol. 2, pp. 203–206 (2010)
Di Ruscio, D., Kolovos, D., Matragkas, N., Korkontzelos, I., Vinju, J.: OSSMETER: a software measurement platform for automatically analysing open source software projects. In: Proceedings of ESEC/FSE (2015)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of MMD, Antwerp, Belgium, pp. 53–59 (2008)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Random k-labelsets for multilabel classification. IEEE Trans. Knowl. Data Eng. 23(7), 1079–1089 (2011)
Korkontzelos, Y., Thompson, P., Ananiadou, S.: Identifying content types of messages related to open source software projects. In: Proceedings of LREC 2016, pp. 1837–1844. European Language Resources Association (ELRA), Portorož (2016)
Palau, R.M., Moens, M.F.: Argumentation mining: the detection, classification and structuring of arguments in text. In: Proceedings of BNAIC, pp. 351–352 (2009)
Mann, W.C., Taboada, M.: Rhetorical structure theory: looking back and moving ahead. Discourse Stud. 8(3), 423–460 (2006)
Bacchelli, A., Dal Sasso, T., D’Ambros, M., Lanza, M.: Content classification of development emails. In: Proceedings of ICSE, pp. 375–385 (2012)
Pascarella, L., Bruntink, M., Bacchelli, A.: Classifying code comments in Java software systems. Empir. Softw. Eng. 24(3), 1499–1537 (2019)
Alfaro, C., Cano-Montero, J., Gómez, J., Moguerza, J.M., Ortega, F.: A multi-stage method for content classification and opinion mining on weblog comments. Ann. Oper. Res. 236(1), 197–213 (2016)
Zhou, B., Xia, X., Lo, D., Tian, C., Wang, X.: Towards more accurate content categorization of API discussions. In: Proceedings of ICPC, pp. 95–105. ACM, New York (2014)
Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., Vlahavas, I.: Mulan: a Java library for multi-label learning. J. Mach. Learn. Res. 12, 2411–2414 (2011)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of EACL, Valencia, Spain, vol. 2, pp. 427–431 (2017)
Mockus, J.B., Mockus, L.J.: Bayesian approach to global optimization and application to multiobjective and constrained problems. J. Optim. Theory Appl. 70(1), 157–172 (1991)
Acknowledgement
This research work is part of the CROSSMINER Project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 732223.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Campbell, D., Cabrera-Diego, L.A., Korkontzelos, Y. (2020). What is the Message About? Automatic Multi-label Classification of Open Source Repository Messages into Content Types. In: Hassanien, AE., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol 1153. Springer, Cham. https://doi.org/10.1007/978-3-030-44289-7_49
DOI: https://doi.org/10.1007/978-3-030-44289-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44288-0
Online ISBN: 978-3-030-44289-7
eBook Packages: Intelligent Technologies and Robotics (R0)