What is the Message About? Automatic Multi-label Classification of Open Source Repository Messages into Content Types

  • Conference paper
Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020) (AICV 2020)

Abstract

Users of Open Source Software (OSS) projects discuss a diverse range of topics online. The content of a post often corresponds to one or more context-sensitive content types, e.g. a suggestion for a solution, a request for further clarification, or an indication that a proposed solution did not work. Detecting content types can benefit software developers in several ways. For instance, content types can serve as indicators that summarise the content of messages. These indicators can be exploited in a developer-centric knowledge mining platform, allowing developers and project managers to create action alerts for new bugs reported outside a bug tracker, or they can be combined with other metrics to assess the quality of an OSS project. We present a multi-label classifier that classifies messages exchanged on OSS communication channels, together with detailed evaluation results. We experimented with two state-of-the-art multi-label classification approaches, HOMER (Hierarchy Of Multilabel classifiER) and RAkEL (RAndom k-labELsets), as these met the technical requirements of the CROSSMINER project. We used a manually annotated, threaded corpus of posts from newsgroup discussions, bug tracking systems and forums related to Eclipse projects. The results are promising and indicate the potential for novel and deeper research on this task.
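To make the multi-label setting concrete, the following is a minimal sketch of the RAkEL idea: train several label-powerset classifiers, each on a random subset of k labels, and combine their per-label votes. It is not the authors' implementation. The toy messages, the three label names (taken loosely from the content-type examples in the abstract), the 1-nearest-neighbour base learner and all function names are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def tokens(text):
    """Lower-cased bag of words, represented as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class NearestNeighbourLP:
    """Label-powerset learner restricted to one labelset (assumed 1-NN base learner)."""

    def __init__(self, labelset):
        self.labelset = frozenset(labelset)
        self.examples = []

    def fit(self, texts, label_sets):
        # Each training target is the subset of this model's labels present in the message.
        self.examples = [
            (tokens(t), frozenset(y) & self.labelset)
            for t, y in zip(texts, label_sets)
        ]
        return self

    def predict(self, text):
        query = tokens(text)
        best = max(self.examples, key=lambda ex: jaccard(query, ex[0]))
        return best[1]


def train_rakel(texts, label_sets, k=2, n_models=6, seed=0):
    """Train n_models label-powerset learners, each on a random k-labelset."""
    rng = random.Random(seed)
    labels = sorted(set().union(*label_sets))
    return [
        NearestNeighbourLP(rng.sample(labels, k)).fit(texts, label_sets)
        for _ in range(n_models)
    ]


def predict_rakel(models, text, threshold=0.5):
    """Average the per-label votes of all models that cover the label."""
    votes = defaultdict(int)
    seen = defaultdict(int)
    for model in models:
        predicted = model.predict(text)
        for label in model.labelset:
            seen[label] += 1
            votes[label] += label in predicted
    return {label for label in seen if votes[label] / seen[label] > threshold}


# Hypothetical toy corpus: each message carries one or more content types.
texts = [
    "try clearing the workspace cache and restart",
    "which version of the plugin are you using",
    "i tried that but the error is still there what else can i check",
]
label_sets = [
    {"suggest_solution"},
    {"request_clarification"},
    {"solution_failed", "request_clarification"},
]

models = train_rakel(texts, label_sets)
print(predict_rakel(models, "the error is still there after restart"))
```

Because each model sees only k labels, a label is predicted only if at least one sampled labelset covers it; with few models and many labels, increasing `n_models` makes full coverage more likely.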

Notes

  1. Labels that have not been assigned to any dataset instances, e.g. 8.5, have been omitted.

Acknowledgement

This research work is part of the CROSSMINER Project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 732223.

Corresponding author

Correspondence to Yannis Korkontzelos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Campbell, D., Cabrera-Diego, L.A., Korkontzelos, Y. (2020). What is the Message About? Automatic Multi-label Classification of Open Source Repository Messages into Content Types. In: Hassanien, AE., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol 1153. Springer, Cham. https://doi.org/10.1007/978-3-030-44289-7_49
