Search-based inference of dialect grammars

  • Focus
  • Soft Computing

Abstract

Building parsers is an essential task in developing many tools, from software maintenance tools to any kind of business-specific programmable environment with a command-line interface. Although grammars for many programming languages are available, they are very often almost useless because of the wide diffusion of dialects and variants not covered by standard grammars. Writing a grammar by hand is clearly feasible; however, it can be a tedious and error-prone task that requires skills that are not always available. Grammar inference is a possible, if challenging, approach for obtaining suitable grammars from program examples. However, inference from scratch poses serious scalability issues and tends to produce grammars that are correct but meaningless: hard to understand and hard to use for building tools. This paper describes an approach, based on genetic algorithms, for evolving existing grammars towards target (dialect) grammars by inferring changes from examples written in the dialect. Results from experiments on inferring C dialect rules show that the algorithm is able to successfully evolve the grammar. Inspections indicated that the changes automatically made to the grammar during its evolution preserved its meaningfulness and were comparable to what a developer could have done by hand.
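The idea of evolving an existing grammar towards a dialect can be illustrated with a small, hypothetical Python sketch. It is not the authors' implementation; their grammar representation, genetic operators, and fitness function are not reproduced here. Under the assumptions of this sketch, a grammar is a dict mapping each nonterminal to a list of non-empty alternatives, an Earley recognizer checks which dialect examples parse, and fitness is the fraction of examples accepted minus a small penalty on grammar size (an assumed way of discouraging bloated, less meaningful grammars). The mutation and crossover operators, `BASE`, `DIALECT`, and every function name are invented for illustration.

```python
import random

START = "S"

# Toy example (hypothetical): the base grammar accepts `int`/`char`
# declarations, while the dialect examples also use `long`.
BASE = {
    "S": [("D",), ("D", "S")],
    "D": [("T", "id", ";")],
    "T": [("int",), ("char",)],
}
DIALECT = [
    ["int", "id", ";"],
    ["long", "id", ";"],
    ["char", "id", ";", "long", "id", ";"],
]


def recognizes(grammar, tokens):
    """Earley recognizer: True if `tokens` derives from START.

    Symbols that are not keys of `grammar` are terminals; alternatives are
    never empty, so a completed item always began at an earlier position.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    # An item is (lhs, rhs, dot, origin).
    chart[0] = {(START, rhs, 0, 0) for rhs in grammar[START]}
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new_items = []
            if dot < len(rhs) and rhs[dot] in grammar:  # predict
                new_items = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dot < len(rhs):  # scan a terminal
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance items that were waiting for `lhs`
                new_items = [(x, r, d + 1, o) for (x, r, d, o) in chart[origin]
                             if d < len(r) and r[d] == lhs]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(x == START and d == len(r) and o == 0
               for (x, r, d, o) in chart[-1])


def parse_rate(grammar, examples):
    """Fraction of example token sequences accepted by the grammar."""
    return sum(recognizes(grammar, ex) for ex in examples) / len(examples)


def fitness(grammar, examples, size_penalty=0.01):
    """Reward parsed dialect examples; penalize growth to keep edits small."""
    size = sum(len(alt) for alts in grammar.values() for alt in alts)
    return parse_rate(grammar, examples) - size_penalty * size


def mutate(grammar, vocabulary, rng):
    """Return a copy of `grammar` with one random rule edit."""
    child = {nt: list(alts) for nt, alts in grammar.items()}
    alts = child[rng.choice(sorted(child))]
    op = rng.random()
    if op < 0.5:  # copy an alternative, replacing one symbol
        alt = list(rng.choice(alts))
        alt[rng.randrange(len(alt))] = rng.choice(vocabulary + sorted(child))
        if tuple(alt) not in alts:
            alts.append(tuple(alt))
    elif op < 0.8:  # add a one-token alternative taken from the examples
        alt = (rng.choice(vocabulary),)
        if alt not in alts:
            alts.append(alt)
    elif len(alts) > 1:  # drop an alternative, never the last one
        alts.pop(rng.randrange(len(alts)))
    return child


def evolve(base, examples, generations=60, pop_size=30, seed=0):
    """Elitist GA seeded with `base`: crossover per nonterminal, then mutation."""
    rng = random.Random(seed)
    vocabulary = sorted({tok for ex in examples for tok in ex})

    def score(g):
        return fitness(g, examples)

    population = [base] + [mutate(base, vocabulary, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 3]  # best grammars survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each nonterminal's rules come from one parent.
            child = {nt: list(rng.choice([a[nt], b[nt]])) for nt in a}
            children.append(mutate(child, vocabulary, rng))
        population = elite + children
    return max(population, key=score)
```

In this toy setting, `evolve(BASE, DIALECT)` starts from a grammar that only knows `int` and `char` declarations and searches for rule edits that also accept the `long` examples. Because the best grammars are carried over unchanged between generations, the returned grammar never scores lower than `BASE`. Seeding the population with the existing grammar, rather than with random grammars, is the design choice that reflects the abstract's point about the scalability problems of inference from scratch.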

Author information

Correspondence to Massimiliano Di Penta.

About this article

Cite this article

Di Penta, M., Lombardi, P., Taneja, K. et al. Search-based inference of dialect grammars. Soft Comput 12, 51–66 (2008). https://doi.org/10.1007/s00500-007-0216-5

  • DOI: https://doi.org/10.1007/s00500-007-0216-5
