
An experimental evaluation of the importance of randomness in hill climbing searches applied to software engineering problems

Empirical Software Engineering

Abstract

Random number generators are a core component of heuristic search algorithms. They are used to build candidate solutions and to reduce bias while transforming these solutions during the search. Despite their usefulness, random numbers also have drawbacks: one cannot guarantee that the search covers all portions of the search space, and an algorithm must be run many times to statistically assess its behavior. This work aims to determine whether deterministic quasi-random sequences can be used as an alternative to pseudo-random numbers in feeding “randomness” into Hill Climbing searches addressing Software Engineering problems. We designed and executed three experimental studies in which a Hill Climbing search was used to find solutions for two Software Engineering problems: software module clustering and requirement selection. The algorithm was executed using both pseudo-random numbers and three distinct quasi-random sequences (Faure, Halton, and Sobol). The software clustering problem was evaluated for 32 real-world instances, and the requirement selection problem was addressed using 15 instances reused from previous research. The experimental studies were chained so that as few experimental factors as possible varied between any given study and its subsequent one. Results found by searches powered by distinct quasi-random sequences were compared, on a per-instance basis, to those produced by the pseudo-random search. The comparison evaluated search efficiency (processing time required to run the search) and effectiveness (quality of results produced by the search). Contrary to previous findings observed in the context of other heuristic search algorithms, we found evidence that quasi-random sequences cannot regularly outperform pseudo-random numbers in Hill Climbing searches. Detailed statistical analysis is provided to support the evidence favoring pseudo-random numbers.
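
To make the comparison described in the abstract concrete, the following is a minimal sketch of a Hill Climbing search whose source of "randomness" can be swapped between a pseudo-random generator and a deterministic quasi-random sequence. It is not the implementation used in the study. The one-dimensional Halton generator, the function names (`halton`, `pseudo_random`, `hill_climb`), the first-improvement strategy without restarts, and the toy requirement-selection instance (`VALUES`, `COSTS`, `BUDGET`) are all assumptions introduced for illustration.

```python
import random
from itertools import count
from typing import Callable, Iterator, List, Tuple


def halton(base: int) -> Iterator[float]:
    """Yield the one-dimensional Halton (van der Corput) sequence in (0, 1)."""
    for i in count(1):
        fraction, value, n = 1.0, 0.0, i
        while n > 0:
            fraction /= base
            value += fraction * (n % base)
            n //= base
        yield value


def pseudo_random(seed: int) -> Iterator[float]:
    """Yield pseudo-random numbers in [0, 1) from Python's Mersenne Twister."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def hill_climb(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    source: Iterator[float],
    budget: int,
) -> Tuple[List[int], float]:
    """First-improvement hill climbing over bit strings (illustrative only).

    Every stochastic decision draws from `source`, so the same search can be
    driven by pseudo-random numbers or by a quasi-random sequence.
    """
    # Build the initial candidate solution from the number source.
    current = [1 if next(source) < 0.5 else 0 for _ in range(n_bits)]
    best = fitness(current)
    evaluations = 1
    while evaluations < budget:
        # Pick one position to flip; clamp to guard against rounding.
        i = min(int(next(source) * n_bits), n_bits - 1)
        current[i] ^= 1
        candidate = fitness(current)
        evaluations += 1
        if candidate > best:
            best = candidate      # keep the improving move
        else:
            current[i] ^= 1       # undo a non-improving move
    return current, best


# Hypothetical toy requirement-selection instance (not taken from the paper).
VALUES = [10, 4, 7, 3, 8, 6]
COSTS = [5, 2, 4, 1, 6, 3]
BUDGET = 10


def toy_fitness(selection: List[int]) -> float:
    """Total value of the selected requirements, penalizing budget overruns."""
    cost = sum(c for c, s in zip(COSTS, selection) if s)
    value = sum(v for v, s in zip(VALUES, selection) if s)
    return value - 100 * max(0, cost - BUDGET)


if __name__ == "__main__":
    for name, source in [("halton", halton(2)), ("pseudo", pseudo_random(42))]:
        solution, score = hill_climb(toy_fitness, len(VALUES), source, 200)
        print(name, solution, score)
```

In this sketch the quasi-random run is fully deterministic, so repeating it yields the same result, whereas the pseudo-random run depends on the seed and would normally be repeated with several seeds before drawing statistical conclusions.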


Notes

  1. Considering instance size and the number of restarts for the search process. See Section 3 for details.

  2. Available at GitHub: https://github.com/marciobarros/SBSEHub

  3. Available at GitHub: https://github.com/marciobarros/SBSEHub


Acknowledgments

The author would like to express his gratitude to FAPERJ, CAPES, and CNPq, the research agencies which financially supported this project. He would also like to express his gratitude to the SSBSE reviewers, who gave precious insight and ideas on ways to improve this work, and Adriana Alvim, who (correctly) insisted that it was possible to implement significant improvements to reduce the algorithm’s running time.

Author information

Corresponding author

Correspondence to Márcio de O. Barros.

Additional information

Communicated by: Gordon Fraser and Jerffeson Teixeira de Souza

About this article

Cite this article

de O. Barros, M. An experimental evaluation of the importance of randomness in hill climbing searches applied to software engineering problems. Empir Software Eng 19, 1423–1465 (2014). https://doi.org/10.1007/s10664-013-9294-4

  • DOI: https://doi.org/10.1007/s10664-013-9294-4
