Abstract
Building parsers is essential to the development of many tools, from software maintenance tools to business-specific programmable environments with a command-line interface. Although grammars are available for many programming languages, they are often of little practical use, because dialects and variants not covered by the standard grammars are widespread. Writing a grammar by hand is feasible, but it is tedious and error-prone and requires skills that are not always available. Grammar inference is a possible, though challenging, approach to obtaining suitable grammars from program examples. However, inference from scratch poses serious scalability issues and tends to produce grammars that are correct but meaningless: hard to understand and hard to use for building tools. This paper describes an approach, based on genetic algorithms, for evolving existing grammars towards target (dialect) grammars by inferring changes from examples written in the dialect. Results obtained from experiments on inferring C dialect rules show that the algorithm is able to successfully evolve the grammar. Inspections indicated that the changes automatically made to the grammar during its evolution preserved its meaningfulness and were comparable to what a developer could have done by hand.
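To make the idea of evolving an existing grammar towards a dialect more concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. Everything in it is an assumption made for illustration: the toy declaration grammar `BASE`, the token examples, the fitness weights, and the helper names `derive`, `accepts`, `fitness`, `mutate` and `evolve`. It also uses only mutation and elitist selection (no crossover), which is simpler than a full genetic algorithm. Rules of the base grammar are never removed, mirroring the goal of keeping the evolved grammar meaningful.

```python
import random

# Toy base grammar (hypothetical): nonterminal -> list of alternatives.
# Any symbol that is not a key of the grammar is treated as a terminal token.
BASE = {
    "stmt": [("type", "ID", ";")],
    "type": [("int",), ("char",)],
}

# Assumed examples: token sequences the dialect should accept or reject.
POSITIVES = [["int", "ID", ";"], ["__int64", "ID", ";"],
             ["unsigned", "long", "ID", ";"]]
NEGATIVES = [["ID", ";"], ["int", ";"]]


def derive(grammar, symbol, tokens, pos, depth=0):
    """Return the set of positions reachable after matching `symbol` at `pos`."""
    if depth > 20:  # guard against runaway recursion
        return set()
    if symbol not in grammar:  # terminal: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    ends = set()
    for alt in grammar[symbol]:
        positions = {pos}
        for sym in alt:
            positions = {e for p in positions
                         for e in derive(grammar, sym, tokens, p, depth + 1)}
            if not positions:
                break
        ends |= positions
    return ends


def accepts(grammar, tokens, start="stmt"):
    """True if the whole token sequence can be derived from `start`."""
    return len(tokens) in derive(grammar, start, tokens, 0)


def fitness(grammar, positives=POSITIVES, negatives=NEGATIVES):
    """Reward accepted positives, penalise accepted negatives and added rules."""
    hits = sum(accepts(grammar, ex) for ex in positives)
    false_hits = sum(accepts(grammar, ex) for ex in negatives)
    extra = sum(len(grammar[nt]) - len(BASE[nt]) for nt in grammar)
    return (hits - false_hits) / len(positives) - 0.01 * extra


def mutate(grammar, vocab, rng):
    """Add a short terminal-only alternative, or drop a previously added one."""
    child = {nt: list(alts) for nt, alts in grammar.items()}
    nt = rng.choice(sorted(child))
    added = [alt for alt in child[nt] if alt not in BASE[nt]]
    if added and rng.random() < 0.3:
        child[nt].remove(rng.choice(added))  # base rules are never removed
    else:
        alt = tuple(rng.choice(vocab) for _ in range(rng.randint(1, 2)))
        if alt not in child[nt]:
            child[nt].append(alt)
    return child


def evolve(generations=40, pop_size=20, seed=0):
    """Mutation plus elitist selection, starting from copies of BASE."""
    rng = random.Random(seed)
    vocab = sorted({tok for ex in POSITIVES for tok in ex})
    population = [{nt: list(alts) for nt, alts in BASE.items()}
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]  # the best grammars survive unchanged
        population = elite + [mutate(rng.choice(elite), vocab, rng)
                              for _ in range(pop_size - len(elite))]
    return max(population, key=fitness)
```

Calling `evolve()` returns a grammar in the same dictionary form as `BASE`. Because the best individuals are carried over unchanged in each generation, its fitness can never fall below that of the base grammar, and any improvement comes from a small number of added alternatives that a developer can inspect by hand.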
Cite this article
Di Penta, M., Lombardi, P., Taneja, K. et al. Search-based inference of dialect grammars. Soft Comput 12, 51–66 (2008). https://doi.org/10.1007/s00500-007-0216-5
DOI: https://doi.org/10.1007/s00500-007-0216-5