Research article
DOI: 10.1145/3640310.3674089

ModelMate: A recommender for textual modeling languages based on pre-trained language models

Published: 22 September 2024

Abstract

Current DSL environments lack the smart editing facilities needed to enhance modeler productivity, and they cannot keep pace with the AI-based features of modern integrated development environments. In this paper, we propose an approach to address this shortcoming through a recommender system specifically tailored for textual DSLs, based on fine-tuning pre-trained language models. We identify three main tasks: identifier suggestion, line completion, and block completion, all of which we implement over the same fine-tuned model, and we propose a workflow to apply these tasks to any textual DSL. We have evaluated our approach with different pre-trained models for three DSLs: Emfatic, Xtext, and a DSL to specify domain entities, showing that the system performs well and provides accurate suggestions. A comparison against existing approaches on the feature name recommendation task shows that our system outperforms the alternatives. Moreover, we measure the inference time of our approach and obtain low latencies, which makes the system adequate for live assistance. Finally, we contribute a concrete recommender, named ModelMate, which implements the training, evaluation, and inference steps of the workflow and provides integration into Eclipse-based textual editors.
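
The abstract describes fine-tuning a single pre-trained language model and serving three completion tasks from it. The sketch below illustrates one way such a pipeline could look with off-the-shelf tooling; it is not the authors' implementation. The base model (gpt2), the corpus layout (dsl_corpus/*.txt), the hyperparameters, and the per-task stop markers are all illustrative assumptions.

# Minimal sketch (illustrative, not the paper's code): fine-tune a small
# pre-trained causal LM on textual DSL files, then serve identifier
# suggestion, line completion, and block completion from the same model
# by varying only the decoding stop condition.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

BASE_MODEL = "gpt2"  # assumption: any small pre-trained causal LM

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Assumption: training samples extracted from Emfatic/Xtext/entity-DSL
# sources, stored as plain text under dsl_corpus/.
dataset = load_dataset("text", data_files={"train": "dsl_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="modelmate-ft", num_train_epochs=3),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()

def complete(prefix, stop):
    # Greedy decoding, cut at the first task-specific stop marker: e.g.
    # " " for identifier suggestion, "\n" for line completion, "}" for
    # block completion (markers are per-DSL assumptions).
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=64, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    new_text = tokenizer.decode(out[0][ids.shape[1]:])
    return new_text.split(stop)[0]

# e.g. complete("class Person {\n  attr String ", " ")  # identifier task

Sharing one fine-tuned model across the three tasks, as the abstract indicates, means only the stop condition changes per task, which helps keep the memory footprint of live editor assistance low.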


Published In

MODELS '24: Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems
September 2024
311 pages
ISBN: 9798400705045
DOI: 10.1145/3640310

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. Machine learning
  2. Meta-modeling
  3. Model-Driven Engineering
  4. Recommendation

Acceptance Rates

MODELS '24 Paper Acceptance Rate: 26 of 124 submissions, 21%
Overall Acceptance Rate: 144 of 506 submissions, 28%
