Abstract
Software modularization is a technique used to divide a software system into independent modules (packages) that are expected to be cohesive and loosely coupled. However, as software systems evolve over time to meet new requirements, their modularizations become complex and gradually lose their quality. Restoring that quality by automatically optimizing the distribution of classes across packages, a task known as remodularization, is challenging. To alleviate this issue, we introduce a new approach that optimizes software modularization by moving classes to more suitable packages. In addition to improving design quality and preserving semantic coherence, our approach treats refactoring effort as an objective in its own right while optimizing software modularization. We adapt the Elitist Non-dominated Sorting Genetic Algorithm (NSGA-II) of Deb et al. to find the best sequence of refactorings, one that 1) maximizes structural quality, 2) maximizes the semantic cohesiveness of packages (evaluated by a semantic measure based on WordNet), and 3) minimizes refactoring effort. We report the results of an evaluation of our approach on open-source projects and show that it produces a coherent and useful sequence of recommended refactorings, both in terms of quality metrics and from developers' point of view.
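The abstract describes the search only at a high level. The Python sketch below illustrates, under loudly stated assumptions, how a candidate solution (a sequence of move-class refactorings) could be scored on three objectives and ranked with the fast non-dominated sorting step of NSGA-II (Deb et al.). Everything here is a hypothetical simplification, not the paper's method: the toy system, `structural_quality` (fraction of dependencies kept inside one package), `semantic_cohesion` (a CamelCase name-token Jaccard overlap standing in for the WordNet-based measure), and effort approximated as the number of moves. Selection, crossover, mutation, and crowding distance are omitted.

```python
import re
from itertools import combinations

# Hypothetical toy system (not from the paper): class -> package, plus dependencies.
PACKAGES = {"Order": "core", "Invoice": "core", "OrderView": "ui", "InvoiceView": "ui"}
DEPENDENCIES = [("OrderView", "Order"), ("InvoiceView", "Invoice"), ("Invoice", "Order")]


def apply_moves(packages, moves):
    """Apply a sequence of move-class refactorings given as (class, target_package)."""
    result = dict(packages)
    for cls, target in moves:
        result[cls] = target
    return result


def structural_quality(packages):
    """Stand-in metric: fraction of dependencies that stay inside a single package."""
    internal = sum(packages[src] == packages[dst] for src, dst in DEPENDENCIES)
    return internal / len(DEPENDENCIES)


def tokens(name):
    """Split a CamelCase class name into lower-case words."""
    return {word.lower() for word in re.findall(r"[A-Z][a-z]*", name)}


def semantic_cohesion(packages):
    """Stand-in for the WordNet-based measure: mean Jaccard overlap of name
    tokens over all pairs of classes that share a package."""
    scores = []
    for a, b in combinations(packages, 2):
        if packages[a] == packages[b]:
            ta, tb = tokens(a), tokens(b)
            scores.append(len(ta & tb) / len(ta | tb))
    return sum(scores) / len(scores) if scores else 0.0


def objectives(moves):
    """Objective vector to minimize: (-structure, -semantics, effort).
    Effort is approximated here (an assumption) by the number of moves."""
    after = apply_moves(PACKAGES, moves)
    return (-structural_quality(after), -semantic_cohesion(after), len(moves))


def dominates(u, v):
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def fast_non_dominated_sort(points):
    """Rank objective vectors into Pareto fronts, as in NSGA-II's sorting step."""
    beats = [[] for _ in points]   # beats[p]: indices that p dominates
    beaten_by = [0] * len(points)  # beaten_by[p]: how many points dominate p
    fronts = [[]]
    for p, u in enumerate(points):
        for q, v in enumerate(points):
            if dominates(u, v):
                beats[p].append(q)
            elif dominates(v, u):
                beaten_by[p] += 1
        if beaten_by[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in beats[p]:
                beaten_by[q] -= 1
                if beaten_by[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical candidate refactoring sequences (one solution each).
CANDIDATES = [
    [],
    [("OrderView", "core")],
    [("Invoice", "ui")],
    [("OrderView", "core"), ("InvoiceView", "core")],
    [("Invoice", "billing"), ("InvoiceView", "billing"), ("OrderView", "core")],
    [("Invoice", "ui"), ("Invoice", "core")],  # wasteful: ends where it started
]
FRONTS = fast_non_dominated_sort([objectives(m) for m in CANDIDATES])

if __name__ == "__main__":
    for rank, front in enumerate(FRONTS):
        print(rank, [CANDIDATES[i] for i in front])
```

In this toy setting, the wasteful sequence that moves `Invoice` away and back lands in the second front, because the untouched design reaches the same structure and semantics at zero effort. The other five sequences are mutually non-dominated, which is the kind of trade-off set that treating effort as a separate objective is meant to expose.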
References
Lehman M M. On understanding laws, evolution, and conservation in the large-program life cycle. Journal of Systems and Software, 1984, 1: 213-221.
Eick S G, Graves T L, Karr A F, Marron J S, Mockus A. Does code decay? Assessing the evidence from change management data. IEEE Transactions on Software Engineering, 2001, 27(1): 1-12.
Lanza M, Marinescu R. Object-oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer-Verlag Berlin Heidelberg, 2006.
Fowler M, Beck K, Brant J, Opdyke W, Roberts D. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
Harman M, Hierons R M, Proctor M. A new representation and crossover operator for search-based optimization of software modularization. In Proc. the 4th Annual Conference on Genetic and Evolutionary Computation, July 2002, pp.1351-1358.
Mitchell B S, Mancoridis S. On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering, 2006, 32(3): 193-208.
Seng O, Bauer M, Biehl M, Pache G. Search-based improvement of subsystem decompositions. In Proc. the 7th Annual Conference on Genetic and Evolutionary Computation, June 2005, pp.1045-1051.
Bavota G, de Lucia A, Marcus A, Oliveto R. Software remodularization based on structural and semantic metrics. In Proc. the 17th Working Conference on Reverse Engineering, October 2010, pp.195-204.
Harman M, Tratt L. Pareto optimal search based refactoring at the design level. In Proc. the 9th Annual Conference on Genetic and Evolutionary Computation, July 2007, pp.1106-1113.
Bavota G, Carnevale F, de Lucia A, di Penta M, Oliveto R. Putting the developer in-the-loop: An interactive GA for software re-modularization. In Proc. the 4th International Symposium on Search Based Software Engineering, September 2012, pp.75-89.
Bavota G, de Lucia A, Marcus A, Oliveto R. Using structural and semantic measures to improve software modularization. Empirical Software Engineering, 2013, 18(5): 901-932.
Bavota G, Gethers M, Oliveto R, Poshyvanyk D, de Lucia A. Improving software modularization via automated analysis of latent topics and dependencies. ACM Transactions on Software Engineering and Methodology, 2014, 23(1): Article No. 4.
Mkaouer M W, Kessentini M, Shaout A, Koligheu P, Bechikh S, Deb K, Ouni A. Many-objective software remodularization using NSGA-III. ACM Transactions on Software Engineering and Methodology, 2015, 24(3): Article No. 17.
Abdeen H, Ducasse S, Sahraoui H, Alloui I. Automatic package coupling and cycle minimization. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.103-112.
Palomba F, Tufano M, Bavota G, Oliveto R, Marcus A, Poshyvanyk D, de Lucia A. Extract package refactoring in ARIES. In Proc. the 37th IEEE/ACM International Conference on Software Engineering, Volume 2, May 2015, pp.669-672.
Doval D, Mancoridis S, Mitchell B S. Automatic clustering of software systems using a genetic algorithm. In Proc. the 9th International Workshop on Software Technology and Engineering Practice, September 1999, pp.73-81.
Paixao M, Harman M, Zhang Y, Yu Y. An empirical study of cohesion and coupling: Balancing optimization and disruption. IEEE Transactions on Evolutionary Computation, 2018, 22(3): 394-414.
Ouni A, Kessentini M, Sahraoui H, Inoue K, Deb K. Multicriteria code refactoring using search-based software engineering: An industrial case study. ACM Transactions on Software Engineering and Methodology, 2016, 25(3): Article No. 23.
Maqbool O, Babri H. Hierarchical clustering for software architecture recovery. IEEE Transactions on Software Engineering, 2007, 33(11): 759-780.
Candela I, Bavota G, Russo B, Oliveto R. Using cohesion and coupling for software remodularization: Is it enough? ACM Transactions on Software Engineering and Methodology, 2016, 25(3): Article No. 24.
Corazza A, di Martino S, Maggio V, Scanniello G. Investigating the use of lexical information for software system clustering. In Proc. the 15th European Conference on Software Maintenance and Reengineering, March 2011, pp.35-44.
Hall M, Khojaye M A, Walkinshaw N, McMinn P. Establishing the source code disruption caused by automated remodularisation tools. In Proc. the IEEE International Conference on Software Maintenance and Evolution, September 2014, pp.466-470.
Abdeen H, Sahraoui H, Shata O, Anquetil N, Ducasse S. Towards automatically improving package structure while respecting original design decisions. In Proc. the 20th Working Conference on Reverse Engineering, October 2013, pp.212-221.
Ouni A, Kessentini M, Sahraoui H, Boukadoum M. Maintainability defects detection and correction: A multiobjective approach. Automated Software Engineering, 2013, 20(1): 47-79.
Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182-197.
Praditwong K, Harman M, Yao X. Software module clustering as a multi-objective search problem. IEEE Transactions on Software Engineering, 2011, 37(2): 264-282.
Vallée-Rai R, Gagnon E, Hendren L, Lam P, Pominville P, Sundaresan V. Optimizing Java bytecode using the soot framework: Is it feasible? In Proc. the 9th International Conference on Compiler Construction, March 2000, pp.18-34.
Farrugia A. Vertex-partitioning into fixed additive induced-hereditary properties is NP-hard. The Electronic Journal of Combinatorics, 2004, 11(1): R46.
Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. the 10th International Conference Research on Computational Linguistics, March 1997, pp.19-33.
Brooks R. Towards a theory of the comprehension of computer programs. International Journal of Man-Machine Studies, 1983, 18(6): 543-554.
Merlo E, McAdam I, de Mori R. Feed-forward and recurrent neural networks for source code informal information analysis. Journal of Software Maintenance: Research and Practice, 2003, 15(4): 205-244.
Caprile C, Tonella P. Nomen est omen: Analyzing the language of function identifiers. In Proc. the 6th Working Conference on Reverse Engineering, October 1999, pp.112-122.
Lawrie D, Morrell C, Feild H, Binkley D. What’s in a name? A study of identifiers. In Proc. the 14th IEEE International Conference on Program Comprehension, June 2006, pp.3-12.
Poshyvanyk D, Marcus A. The conceptual coupling metrics for object-oriented systems. In Proc. the 22nd IEEE International Conference on Software Maintenance, September 2006, pp.469-478.
Gethers M, Poshyvanyk D. Using relational topic models to capture coupling among classes in object-oriented software systems. In Proc. the 26th IEEE International Conference on Software Maintenance, September 2010, pp.1-10.
Arnaoudova V, Eshkevari L M, di Penta M, Oliveto R, Antoniol G, Guéhéneuc Y G. REPENT: Analyzing the nature of identifier renamings. IEEE Transactions on Software Engineering, 2014, 40(5): 502-532.
Arnaoudova V, di Penta M, Antoniol G. Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 2016, 21(1): 104-158.
Seco N, Veale T, Hayes J. An intrinsic information content metric for semantic similarity in WordNet. In Proc. the 16th European Conference on Artificial Intelligence, August 2004, pp.1089-1090.
Budanitsky A, Hirst G. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proc. Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, Volume 2, June 2001, pp.24-29.
Lin D. An information-theoretic definition of similarity. In Proc. the 15th International Conference on Machine Learning, July 1998, pp.296-304.
Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In Proc. the 14th International Joint Conference on Artificial Intelligence, August 1995, pp.448-453.
Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 2014, 18(4): 577-601.
Wen Z, Tzerpos V. An effectiveness measure for software clustering algorithms. In Proc. the 12th IEEE International Workshop on Program Comprehension, July 2004, pp.194-203.
Kuhn A, Ducasse S, Gîrba T. Semantic clustering: Identifying topics in source code. Information & Software Technology, 2007, 49(3): 230-243.
Sahraoui H A, Godin R, Miceli T. Can metrics help to bridge the gap between the improvement of OO design quality and its automation? In Proc. the 8th International Conference on Software Maintenance, October 2000, pp.154-162.
Kessentini M, Mahaouachi R, Ghedira K. What you like in design use to correct bad-smells. Software Quality Journal, 2013, 21(4): 551-571.
Bavota G, Oliveto R, Gethers M, Poshyvanyk D, de Lucia A. Methodbook: Recommending move method refactorings via relational topic models. IEEE Transactions on Software Engineering, 2014, 40(7): 671-694.
Tsantalis N, Chatzigeorgiou A. Identification of move method refactoring opportunities. IEEE Transactions on Software Engineering, 2009, 35(3): 347-367.
Oliveto R, Gethers M, Bavota G, Poshyvanyk D, de Lucia A. Identifying method friendships to remove the feature envy bad smell: NIER track. In Proc. the 33rd International Conference on Software Engineering, May 2011, pp.820-823.
Lee J, Lee D, Kim D K, Park S. A semantic-based approach for detecting and decomposing god classes. arXiv: 1204.1967, 2012. https://arxiv.org/pdf/1204.1967.pdf, Sept. 2018.
Electronic supplementary material: ESM 1 (PDF, 299 kB)
Cite this article
Mahouachi, R. Search-Based Cost-Effective Software Remodularization. J. Comput. Sci. Technol. 33, 1320–1336 (2018). https://doi.org/10.1007/s11390-018-1892-6