Abstract
Given the wide adoption of the agile software development paradigm, where efficient collaboration and effective maintenance are of utmost importance, and the widespread (re)use of software residing in code hosting platforms, the need to produce high-quality code is evident. A prerequisite for acceptable software reusability and maintainability is the use of idiomatic code, built on syntactic fragments that recur frequently across software projects and are characterized by high quality. In this work, we propose a methodology that harnesses data from the most popular GitHub repositories to automatically identify reusable and maintainable code idioms by grouping code blocks that have similar structural and semantic information. We also apply the same methodology at the single-project level, in an attempt to identify blocks of code that recur frequently across the files of a team. Preliminary evaluation indicates that our approach can identify commonly used, reusable and maintainable code idioms and code blocks that can be given to developers as actionable recommendations.
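To make the idea of grouping structurally similar code blocks concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes Python source files, uses the standard `ast` module, and substitutes exact matching of AST shapes for a proper similarity measure and clustering step. The names `BLOCK_TYPES`, `structural_signature` and `group_blocks` are hypothetical and introduced only for illustration.

```python
import ast
from collections import defaultdict

# Assumption: only these compound statements are treated as candidate blocks.
BLOCK_TYPES = (ast.For, ast.While, ast.If, ast.With, ast.Try)


def structural_signature(node):
    """Return a nested tuple of AST node type names.

    Identifier names and literal values are not part of the signature,
    so two blocks that differ only in naming get the same signature.
    """
    children = tuple(structural_signature(c) for c in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


def group_blocks(sources):
    """Group blocks from several files by identical structural signature.

    `sources` maps a file name to its source text. Only signatures that
    occur more than once are returned, as (file name, line number) lists.
    """
    groups = defaultdict(list)
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, BLOCK_TYPES):
                groups[structural_signature(node)].append((name, node.lineno))
    return {sig: locs for sig, locs in groups.items() if len(locs) > 1}


if __name__ == "__main__":
    sources = {
        "a.py": "for item in items:\n    total += item\n",
        "b.py": "for row in rows:\n    count += row\nif count > 3:\n    print(count)\n",
    }
    for locations in group_blocks(sources).values():
        print(sorted(locations))
```

In this toy input, the two `for` loops share one signature and form a group, while the single `if` block is discarded because it does not recur. Exact matching is the simplest possible stand-in; a tolerant tree similarity measure together with the semantic information mentioned above would be needed to group blocks that are similar rather than structurally identical.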
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Karanikiotis, T., Symeonidis, A.L. (2023). Towards Extracting Reusable and Maintainable Code Snippets. In: Fill, HG., van Sinderen, M., Maciaszek, L.A. (eds) Software Technologies. ICSOFT 2022. Communications in Computer and Information Science, vol 1859. Springer, Cham. https://doi.org/10.1007/978-3-031-37231-5_9
DOI: https://doi.org/10.1007/978-3-031-37231-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37230-8
Online ISBN: 978-3-031-37231-5
eBook Packages: Computer Science, Computer Science (R0)