Abstract
Software modularisation is a central problem in software engineering that seeks to optimise the arrangement of software components based on their interrelationships. Despite extensive investigation in this domain, particularly with evolutionary computation, research emphasis has shifted towards solution design and convergence analysis rather than new methodologies. The primary objective is to attain efficient solutions within a practical timeframe. Recent research suggests that the starting position in the search space has minimal influence, since methods tend to converge on similar local optima. This paper examines this phenomenon in depth, applying graph partitioning techniques to dependency graphs to generate seeds for the initial clustering arrangement. Our empirical findings challenge this conventional view, underscoring the importance of seed selection in software modularisation for improving overall outcomes.
Notes
- 1.
- 2. A gold standard represents the theoretical best solution for a given problem; for real-world datasets it is seldom known.
- 3. Values in bold in Table 1 signify the highest average final fitness.
- 4. Values in bold in Table 2 signify the highest average final fitness.
- 5. Values in bold in Table 3 signify the shortest average convergence point.
- 6. Values in bold in Table 4 signify the shortest runtime in milliseconds.
- 7. Values in bold in Table 5 signify the highest Weighted Kappa agreement.
Appendix A
This appendix details the Module Dependency Graphs (MDGs) of the software systems used in our experiments. For each software system, we report the following statistics:
1. ID: Each software system is assigned a unique identifier. We do not use the actual names of the software systems because our collection is sourced randomly from GitHub. These names vary widely, and we wish to maintain professionalism and avoid potentially inappropriate names and software tools.
2. Nodes: Also known as vertices, these give the number of software components (classes) within each MDG.
3. Edges: The number of relationships between software components.
4. Clustering Coefficient: The extent to which nodes tend to cluster together. A high score indicates strong cohesion, while a low score indicates a higher level of coupling. We report this statistic because these software systems exhibit remarkably low coefficients, indicating high coupling and a deficient initial modular structure. This suggests scope for investigating how software structure changes over time, especially in open-source software systems (Tables 6 and 7).
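The statistics described above can be computed directly from an MDG's edge list. The following is a minimal sketch, not the authors' tooling: the function name `mdg_statistics` and the example graph are hypothetical, and it assumes that edges are counted as distinct directed dependencies while the clustering coefficient is the average of the local coefficients on the undirected version of the graph. The paper may use a different definition.

```python
from itertools import combinations


def mdg_statistics(edges):
    """Return (node count, edge count, average clustering coefficient).

    `edges` is an iterable of (source, target) dependency pairs.
    Assumptions (not taken from the paper): edges are counted as distinct
    directed pairs, and the coefficient ignores edge direction.
    """
    neighbours = {}   # node -> set of adjacent nodes, direction ignored
    directed = set()  # distinct directed dependencies
    for src, dst in edges:
        neighbours.setdefault(src, set())
        neighbours.setdefault(dst, set())
        directed.add((src, dst))
        if src != dst:  # self-dependencies cannot form triangles
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    local = []
    for adjacent in neighbours.values():
        k = len(adjacent)
        if k < 2:
            local.append(0.0)  # fewer than two neighbours: no triangle possible
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbours[a])
        local.append(2.0 * links / (k * (k - 1)))

    average = sum(local) / len(local) if local else 0.0
    return len(neighbours), len(directed), average


if __name__ == "__main__":
    # Hypothetical four-class system: A, B and C depend on each other in a cycle.
    example = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    nodes, edge_count, coefficient = mdg_statistics(example)
    print(f"nodes={nodes} edges={edge_count} clustering={coefficient:.3f}")
```

A coefficient near zero means that the neighbours of a class rarely depend on each other, which matches the low values reported for these systems.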
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mann, A., Swift, S., Arzoky, M. (2024). Applying Graph Partitioning-Based Seeding Strategies to Software Modularisation. In: Smith, S., Correia, J., Cintrano, C. (eds) Applications of Evolutionary Computation. EvoApplications 2024. Lecture Notes in Computer Science, vol 14634. Springer, Cham. https://doi.org/10.1007/978-3-031-56852-7_16
DOI: https://doi.org/10.1007/978-3-031-56852-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56851-0
Online ISBN: 978-3-031-56852-7
eBook Packages: Computer Science, Computer Science (R0)