DOI: 10.1145/1953355.1953380
research-article

Gramin: a system for incremental learning of programming language grammars

Published: 24 February 2011

Abstract

Major software vendors now offer software maintenance as a service. Bug detection and removal are major tasks in the maintenance phase of the software life cycle, and vendors often rely on analysis tools built in-house for software fault localization. A grammar of the programming language is an absolute requirement for building such tools, especially for platforms built by third parties. However, the grammar may not be publicly available, and grammars obtained from language reference manuals are often incomplete because of language evolution. Manually fixing such a grammar is cumbersome and requires special expertise.
Grammar inference techniques have traditionally been used in computational linguistics to infer grammars from the strings of a language, but they have not been successful at learning complex, industry-standard programming language grammars. In this paper we present Gramin, a programming language grammar inference system that infers a grammar from sample programs. Gramin employs various optimizations to make grammar inference practical in the domain of programming languages. We demonstrate the effectiveness of Gramin by inferring a programming language grammar that is used in industry and is not available in the public domain.
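The abstract does not spell out Gramin's algorithm. As a hedged illustration of the general idea of inferring missing grammar rules from sample programs, the sketch below uses a plain generate-and-test loop: it assumes the grammar is in Chomsky normal form, checks each sample program with the CYK membership algorithm, and adds the first candidate rule that makes every sample parse. The functions `cyk_accepts` and `learn_missing_rule`, the toy grammar, the "string literal" dialect, and the candidate list are all hypothetical and are not Gramin's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: a grammar in Chomsky normal form (CNF).
# Each nonterminal maps to a list of bodies; a 1-tuple is a terminal
# (e.g. ("num",)) and a 2-tuple is a pair of nonterminals (e.g. ("ID", "RHS")).
Grammar = Dict[str, List[Tuple[str, ...]]]
Rule = Tuple[str, Tuple[str, ...]]


def cyk_accepts(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    """CYK membership test: can `start` derive the token sequence?"""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for head, bodies in grammar.items():
            if (tok,) in bodies:
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]


def learn_missing_rule(grammar: Grammar, start: str,
                       samples: List[List[str]],
                       candidates: List[Rule]) -> Optional[Rule]:
    """Generate-and-test: return the first candidate rule whose addition
    makes every sample program parse, or None if no candidate does."""
    for head, body in candidates:
        trial = {h: list(bs) for h, bs in grammar.items()}  # copy; input stays intact
        trial.setdefault(head, []).append(body)
        if all(cyk_accepts(trial, start, s) for s in samples):
            return head, body
    return None


# Toy assignment-statement grammar, STMT -> id "=" EXPR ";", written in CNF.
BASE: Grammar = {
    "STMT": [("ASSIGN", "SEMI")],
    "ASSIGN": [("ID", "RHS")],
    "RHS": [("EQ", "EXPR")],
    "EXPR": [("num",), ("id",)],
    "ID": [("id",)],
    "EQ": [("=",)],
    "SEMI": [(";",)],
}

# Token streams from a hypothetical dialect that also allows string literals.
SAMPLES = [["id", "=", "num", ";"], ["id", "=", "str", ";"]]

# Naive candidate space: attach the unknown token to each existing nonterminal.
CANDIDATES: List[Rule] = [(head, ("str",)) for head in BASE]

if __name__ == "__main__":
    print(learn_missing_rule(BASE, "STMT", SAMPLES, CANDIDATES))
    # prints ('EXPR', ('str',))
```

Real programming language grammars are far larger, and the candidate space is rarely a short list of single-terminal rules, so exhaustively testing candidates like this would presumably not scale without the kind of optimizations the abstract alludes to. The sketch only shows the "accept every sample program" check at the core of such a loop.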

Cited By

  • (2023) Symbolic encoding of LL(1) parsing and its applications. Formal Methods in System Design, 61(2-3):338-379. DOI: 10.1007/s10703-023-00420-3. Online publication date: 22 Jun 2023.
  • (2021) Automatic grammar repair. Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering, pages 126-142. DOI: 10.1145/3486608.3486910. Online publication date: 17 Oct 2021.
  • (2011) A framework for analyzing programs written in proprietary languages. Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, pages 289-300. DOI: 10.1145/2048147.2048223. Online publication date: 22 Oct 2011.

    Information

    Published In

    ACM Other conferences
    ISEC '11: Proceedings of the 4th India Software Engineering Conference
    February 2011
    229 pages
    ISBN:9781450305594
    DOI:10.1145/1953355
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Sponsors

    • Computer Society of India

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. grammar induction
    2. grammar inference
    3. grammar learning
    4. incremental learning
    5. programming language grammar inference

    Conference

    ISEC '11: India Software Engineering Conference
    February 24-27, 2011
    Thiruvananthapuram, Kerala, India

    Acceptance Rates

    Overall Acceptance Rate 76 of 315 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 13 Feb 2025