Automated Code Review Comment Classification to Improve Modern Code Reviews

  • Conference paper
Software Quality: The Next Big Thing in Software Engineering and Quality (SWQD 2022)

Abstract

Modern Code Reviews (MCRs) are a widely used quality assurance mechanism in continuous integration and deployment. Unfortunately, in medium and large projects, the number of changes that need to be integrated, and consequently the number of comments triggered during MCRs, can be overwhelming. There is therefore a need to quickly recognize which comments concern issues that require prompt attention, in order to guide the focus of code authors, reviewers, and quality managers. The goal of this study is to design a method for the automated classification of review comments so that the needed changes can be identified faster and with higher accuracy. We conducted a Design Science Research study on three open-source systems. We designed a method (CommentBERT) for the automated classification of code-review comments based on the BERT (Bidirectional Encoder Representations from Transformers) language model and a new taxonomy of comments. When applied to 2,672 comments from the Wireshark, Mono Framework, and Open Network Automation Platform (ONAP) projects, the method achieved accuracy, measured using the Matthews Correlation Coefficient, of 0.46–0.82 (Wireshark), 0.12–0.8 (ONAP), and 0.48–0.85 (Mono). Based on these results, we conclude that the proposed method seems promising and could potentially be used to build machine-learning-based tools to support MCRs, provided that a sufficient number of historical code-review comments is available to train the model.
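For readers unfamiliar with the Matthews Correlation Coefficient (MCC) used to report the results above, the following is a minimal sketch of its standard binary definition. It is not taken from the paper: the function names and the toy labels are hypothetical, and the abstract does not say how the scores were aggregated per comment category, so treat it only as an illustration of the metric itself.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def matthews_corrcoef(tp, fp, fn, tn):
    """Standard binary MCC; ranges from -1 (inverted) to +1 (perfect)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy data: does a comment belong to a given category (1) or not (0)?
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} MCC={mcc:.2f}")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred when the categories are imbalanced, as comment categories in review data typically are.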

Notes

  1. https://github.com/mochodek/bertcomments

  2. The input length of 128 tokens corresponds to the 98th percentile of the comment-length distribution in the dataset of code reviews under study.

  3. https://www.wireshark.org/docs/wsdg_html/

Corresponding author

Correspondence to Miroslaw Ochodek.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Ochodek, M., Staron, M., Meding, W., Söder, O. (2022). Automated Code Review Comment Classification to Improve Modern Code Reviews. In: Mendez, D., Wimmer, M., Winkler, D., Biffl, S., Bergsmann, J. (eds) Software Quality: The Next Big Thing in Software Engineering and Quality. SWQD 2022. Lecture Notes in Business Information Processing, vol 439. Springer, Cham. https://doi.org/10.1007/978-3-031-04115-0_3

  • DOI: https://doi.org/10.1007/978-3-031-04115-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04114-3

  • Online ISBN: 978-3-031-04115-0

  • eBook Packages: Computer Science, Computer Science (R0)
