Abstract
Random number generators are a core component of heuristic search algorithms. They are used to build candidate solutions and to reduce bias while transforming these solutions during the search. Despite their usefulness, random numbers also have drawbacks: one cannot guarantee that the search covers all portions of the search space, and an algorithm must be run many times to statistically assess its behavior. This work investigates whether deterministic quasi-random sequences can be used as an alternative to pseudo-random numbers to feed “randomness” into Hill Climbing searches that address Software Engineering problems. We designed and executed three experimental studies in which a Hill Climbing search was used to find solutions for two Software Engineering problems: software module clustering and requirement selection. The algorithm was executed using both pseudo-random numbers and three distinct quasi-random sequences (Faure, Halton, and Sobol). The software clustering problem was evaluated on 32 real-world instances, and the requirement selection problem was addressed using 15 instances reused from previous research. The experimental studies were chained so that as few experimental factors as possible varied between any given study and the next one. Results found by searches powered by the distinct quasi-random sequences were compared, instance by instance, to those produced by the pseudo-random search. The comparison evaluated search efficiency (the processing time required to run the search) and effectiveness (the quality of the results produced by the search). Contrary to previous findings observed in the context of other heuristic search algorithms, we found evidence that quasi-random sequences cannot consistently outperform pseudo-random numbers in Hill Climbing searches. Detailed statistical analysis is provided to support the evidence favoring pseudo-random numbers.
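To make the idea of swapping the number source concrete, the sketch below feeds the same search from either a Halton sequence or a pseudo-random generator. It is only an illustration under stated assumptions, not the implementation used in the studies: the bit-flip hill climber, the toy fitness function `count_selected`, the single one-dimensional Halton stream in base 2, and all parameter values are hypothetical choices made for brevity.

```python
import random
from typing import Callable, Iterator, List, Tuple


def halton(base: int) -> Iterator[float]:
    """Yield the one-dimensional Halton (van der Corput) sequence for `base`."""
    index = 1
    while True:
        value, fraction, n = 0.0, 1.0 / base, index
        # Mirror the base-`base` digits of `index` around the radix point.
        while n > 0:
            value += (n % base) * fraction
            n //= base
            fraction /= base
        yield value
        index += 1


def hill_climb(
    fitness: Callable[[List[bool]], float],
    size: int,
    next_number: Callable[[], float],
    budget: int,
) -> Tuple[List[bool], float]:
    """Bit-flip hill climbing; every 'random' decision is drawn from next_number()."""
    # Assumption: solutions are bit vectors, built from the number source.
    solution = [next_number() < 0.5 for _ in range(size)]
    best = fitness(solution)
    for _ in range(budget):
        # Map a number in [0, 1) to a position to flip.
        position = min(int(next_number() * size), size - 1)
        solution[position] = not solution[position]
        candidate = fitness(solution)
        if candidate > best:
            best = candidate  # keep the improving move
        else:
            solution[position] = not solution[position]  # undo the move
    return solution, best


def count_selected(solution: List[bool]) -> float:
    """Hypothetical toy fitness: the number of selected items."""
    return float(sum(solution))


# Quasi-random search: deterministic, so repeating it gives the same result.
quasi = halton(2)
quasi_result = hill_climb(count_selected, 16, lambda: next(quasi), 200)

# Pseudo-random search: the outcome depends on the seed.
pseudo = random.Random(42)
pseudo_result = hill_climb(count_selected, 16, pseudo.random, 200)
```

Because the search only sees a function that returns numbers in [0, 1), the two sources can be exchanged without touching the search itself, which is what allows a like-for-like comparison of efficiency and effectiveness.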
Notes
Considering instance size and the number of restarts for the search process. See Section 3 for details.
Available at GitHub: https://github.com/marciobarros/SBSEHub
Acknowledgments
The author would like to express his gratitude to FAPERJ, CAPES, and CNPq, the research agencies which financially supported this project. He would also like to express his gratitude to the SSBSE reviewers, who gave precious insight and ideas on ways to improve this work, and Adriana Alvim, who (correctly) insisted that it was possible to implement significant improvements to reduce the algorithm’s running time.
Additional information
Communicated by: Gordon Fraser and Jerffeson Teixeira de Souza
Cite this article
de O. Barros, M. An experimental evaluation of the importance of randomness in hill climbing searches applied to software engineering problems. Empir Software Eng 19, 1423–1465 (2014). https://doi.org/10.1007/s10664-013-9294-4
DOI: https://doi.org/10.1007/s10664-013-9294-4