Does BERT Understand Code? – An Exploratory Study on the Detection of Architectural Tactics in Code

Keim, Jan; Kaplan, Angelika; Koziolek, Anne; Mirakhorli, Mehdi

doi:10.1007/978-3-030-58923-3_15

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12292)

Included in the following conference series:

European Conference on Software Architecture

Abstract

Quality-driven design decisions are often addressed by using architectural tactics, which are reusable solution options for certain quality concerns. Creating traceability links for these tactics is useful but costly. Automating the creation of these links can help reduce costs but is challenging, as simple structural analyses yield only limited results. Transfer-learning approaches using language models like BERT are a recent trend in the field of natural language processing. These approaches yield state-of-the-art results for tasks like text classification. In this paper, we experiment with treating the detection of architectural tactics in code as a text classification problem. We present an approach to detect architectural tactics in code by fine-tuning BERT. A 10-fold cross-validation shows promising results with an average $F_1$-score of 90%, which is on a par with state-of-the-art approaches. We additionally apply our approach to a case study, where its results show promise but fall behind the state of the art. Therefore, we discuss our approach and look at potential reasons as well as downsides and future work.
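
The evaluation protocol named in the abstract, a 10-fold cross-validation with an averaged $F_1$ score, can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the keyword classifier `train_keyword_model` is a hypothetical stand-in for a fine-tuned BERT model, and the toy samples, the "heartbeat" label, and all function names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[str], int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; fold i tests on indices i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    pairs = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        pairs.append((train, folds[i]))
    return pairs


def train_keyword_model(texts: Sequence[str], labels: Sequence[int]) -> Classifier:
    """Hypothetical stand-in for fine-tuning: learn tokens seen only in positive samples."""
    pos = {tok for t, y in zip(texts, labels) if y == 1 for tok in t.split()}
    neg = {tok for t, y in zip(texts, labels) if y == 0 for tok in t.split()}
    cues = pos - neg
    return lambda text: int(any(tok in cues for tok in text.split()))


def cross_validate(
    texts: Sequence[str],
    labels: Sequence[int],
    train_fn: Callable[[Sequence[str], Sequence[int]], Classifier],
    k: int = 10,
) -> float:
    """Train on k-1 folds, score F1 on the held-out fold, and average over all k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        model = train_fn([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        preds = [model(texts[i]) for i in test_idx]
        scores.append(f1_score([labels[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


# Invented toy data: label 1 marks a class that (hypothetically) implements a heartbeat tactic.
samples = [f"class Heartbeat{i} {{ void ping ( ) }}" for i in range(10)]
samples += [f"class Util{i} {{ String trim ( ) }}" for i in range(10)]
labels = [1] * 10 + [0] * 10

mean_f1 = cross_validate(samples, labels, train_keyword_model, k=10)
print(f"mean F1 over 10 folds: {mean_f1:.2f}")
```

Averaging the per-fold scores, rather than pooling all predictions, matches the "average $F_1$-score" wording of the abstract. In a realistic setup, `train_fn` would be replaced by a routine that fine-tunes a pre-trained model on the training folds.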

Author information

Authors and Affiliations

Karlsruhe Institute of Technology, Karlsruhe, Germany
Jan Keim, Angelika Kaplan & Anne Koziolek
Rochester Institute of Technology, 134 Lomb Memorial Drive, Rochester, NY, 14623-5608, USA
Mehdi Mirakhorli

Corresponding author

Correspondence to Jan Keim.

Editor information

Editors and Affiliations

Koninklijke Philips N.V., Eindhoven, The Netherlands
Anton Jansen
VU Amsterdam, Amsterdam, The Netherlands
Ivano Malavolta
University of L’Aquila, L’Aquila, Italy
Henry Muccini
Carnegie Mellon University, Pittsburgh, PA, USA
Ipek Ozkaya
University of Applied Sciences of Eastern Switzerland, Rapperswil, Switzerland
Olaf Zimmermann

About this paper

Cite this paper

Keim, J., Kaplan, A., Koziolek, A., Mirakhorli, M. (2020). Does BERT Understand Code? – An Exploratory Study on the Detection of Architectural Tactics in Code. In: Jansen, A., Malavolta, I., Muccini, H., Ozkaya, I., Zimmermann, O. (eds) Software Architecture. ECSA 2020. Lecture Notes in Computer Science, vol 12292. Springer, Cham. https://doi.org/10.1007/978-3-030-58923-3_15

DOI: https://doi.org/10.1007/978-3-030-58923-3_15
Published: 08 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58922-6
Online ISBN: 978-3-030-58923-3
eBook Packages: Computer Science, Computer Science (R0)
