Applying Graph Partitioning-Based Seeding Strategies to Software Modularisation

Mann, Ashley; Swift, Stephen; Arzoky, Mahir

doi:10.1007/978-3-031-56852-7_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14634)

Included in the following conference series:

International Conference on the Applications of Evolutionary Computation (Part of EvoStar)

Abstract

Software modularisation is a pivotal facet within software engineering, seeking to optimise the arrangement of software components based on their interrelationships. Despite extensive investigations in this domain, particularly concerning evolutionary computation, the research emphasis has transitioned towards solution design and convergence analysis rather than pioneering methodologies. The primary objective is to attain efficient solutions within a pragmatic timeframe. Recent research posits that initial positions in the search space wield minimal influence, given the prevalent trend of methods converging upon akin local optima. This paper delves into this phenomenon comprehensively, employing graph partitioning techniques on dependency graphs to generate initial clustering arrangement seeds. Our empirical discoveries challenge conventional insight, underscoring the pivotal role of seed selection in software modularisation to enhance overall outcomes.
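As an illustration of how a graph partitioning technique can turn a dependency graph into an initial clustering seed, the following is a minimal sketch of spectral bisection using the Fiedler vector of the graph Laplacian. The abstract does not state which partitioning techniques the paper uses, so the choice of spectral bisection, the function name `spectral_bisection_seed`, the treatment of dependencies as undirected and unweighted, and the toy graph are all assumptions made for this example. It relies on NumPy and assumes a connected graph with at least two nodes.

```python
import numpy as np


def spectral_bisection_seed(num_nodes, edges):
    """Split a dependency graph into two clusters using the Fiedler vector.

    Hypothetical helper: nodes are integers 0..num_nodes-1 and edges are
    (source, target) pairs, treated here as undirected and unweighted.
    Returns one cluster label (0 or 1) per node, with node 0 in cluster 0.
    """
    adjacency = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        if u != v:  # ignore self-dependencies
            adjacency[u, v] = 1.0
            adjacency[v, u] = 1.0

    # Graph Laplacian: degree matrix minus adjacency matrix.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 holds the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]

    # Partition by sign; relabel so the result does not depend on the
    # arbitrary sign of the eigenvector.
    side = fiedler >= 0
    return [0 if s == side[0] else 1 for s in side]


# Toy graph (an assumption): two triangles joined by a single dependency.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
seed = spectral_bisection_seed(6, toy_edges)
print(seed)
```

A seed of this kind could then be handed to a search method as its starting clustering arrangement instead of a random one; applying the bisection recursively to each side would give seeds with more than two clusters.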

Notes

1. Weighted Kappa is employed to assess the similarity of clustering arrangements and is applied in Sects. 5.3, 6, and 7.
2. A gold standard represents the theoretical best solution for a given problem; in real-world datasets it is seldom known.
3. Values formatted bold in Table 1 signify the highest average final fitness.
4. Values formatted bold in Table 2 signify the highest average final fitness.
5. Values formatted bold in Table 3 signify the shortest average convergence point.
6. Values formatted bold in Table 4 signify the shortest runtime in milliseconds.
7. Values formatted bold in Table 5 signify the highest Weighted Kappa agreement.

Author information

Authors and Affiliations

Brunel University London, UB8 3PH, Uxbridge, UK
Ashley Mann, Stephen Swift & Mahir Arzoky

Corresponding author

Correspondence to Ashley Mann.

Editor information

Editors and Affiliations

University of York, York, UK
Stephen Smith
University of Coimbra, Coimbra, Portugal
João Correia
University of Málaga, Málaga, Spain
Christian Cintrano

Appendix A

This appendix details the software systems whose Module Dependency Graphs (MDGs) were used in our experiments. For each software system, we report the following statistics:

1. ID: Each software system is assigned a unique identifier. We do not use the actual names of the software systems because our collection is sourced randomly from GitHub. These names vary widely, and we intend to maintain professionalism and steer clear of potentially inappropriate names and software tools.
2. Nodes: Also known as vertices, these signify the number of software components (classes) within an MDG.
3. Edges: The number of relationships between software components.
4. Clustering Coefficient: The extent to which nodes tend to cluster. A high score indicates strong cohesion, while a low score indicates a higher level of coupling. We present this statistic because these software systems exhibit remarkably low coefficients, indicating a high level of coupling and a lack of initial modular structure. There is potential here to investigate how software structure changes over time, especially in open-source software systems (Tables 6 and 7).
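The statistics listed above can be computed directly from an edge list. The sketch below is illustrative only: the appendix does not say which definition of clustering coefficient was used, so this example assumes the average local clustering coefficient on an undirected, unweighted view of the graph. The function name `mdg_statistics` and the sample component names are hypothetical, and components that take part in no relationship are not counted because they never appear in the edge list.

```python
from itertools import combinations


def mdg_statistics(edges):
    """Return (node count, edge count, average local clustering coefficient).

    Hypothetical helper: edges are (component, component) pairs, treated
    as undirected; duplicate pairs and self-dependencies are ignored.
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    node_count = len(neighbours)
    # Each undirected edge appears in two neighbour sets.
    edge_count = sum(len(adj) for adj in neighbours.values()) // 2

    total = 0.0
    for adj in neighbours.values():
        k = len(adj)
        if k < 2:
            continue  # a node with fewer than two neighbours contributes 0
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(adj, 2) if b in neighbours[a])
        total += 2.0 * links / (k * (k - 1))

    coefficient = total / node_count if node_count else 0.0
    return node_count, edge_count, coefficient


# Sample graph (an assumption): a triangle A-B-C with D depending on C.
sample_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
nodes, edge_total, coefficient = mdg_statistics(sample_edges)
print(nodes, edge_total, round(coefficient, 4))
```

Under this definition, a coefficient near zero means that the neighbours of a typical component are rarely connected to each other.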

Table 6. “Big 5” Software MDG Statistics

Table 7. “Small 50” Open-Source Software MDG Statistics

Cite this paper

Mann, A., Swift, S., Arzoky, M. (2024). Applying Graph Partitioning-Based Seeding Strategies to Software Modularisation. In: Smith, S., Correia, J., Cintrano, C. (eds) Applications of Evolutionary Computation. EvoApplications 2024. Lecture Notes in Computer Science, vol 14634. Springer, Cham. https://doi.org/10.1007/978-3-031-56852-7_16

DOI: https://doi.org/10.1007/978-3-031-56852-7_16
Published: 21 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56851-0
Online ISBN: 978-3-031-56852-7
eBook Packages: Computer Science, Computer Science (R0)
