
SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration

Published in Automated Software Engineering.

Abstract

In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in sprint planning, providing a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets of User Stories with attached Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advances in Large Language Models (LLMs) have showcased remarkable text-generation capabilities without the need for supervised fine-tuning, and thus offer the potential to overcome this challenge. Motivated by this, we propose SimAC, a framework that leverages LLMs to simulate agile collaboration across three role groups: requirement analysts, quality analysts, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm for GAC. Because ground truths are unavailable, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. This study also provides case studies illustrating SimAC's effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.
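As a minimal illustrative sketch (not the paper's implementation; all prompt texts and helper names here are hypothetical), the create-update-update paradigm can be read as one drafting pass by a requirement-analyst role followed by two revision passes by the other roles:

```python
# Hypothetical sketch of SimAC's create-update-update loop. The real
# framework's prompts and LLM configuration are described in the paper;
# call_llm is a stand-in so the pipeline runs without an API key.

ROLE_PROMPTS = {
    "requirement_analyst": "As a requirement analyst, draft acceptance "
                           "criteria for this user story:\n{story}",
    "quality_analyst": "As a quality analyst, revise these acceptance "
                       "criteria for testability:\n{criteria}",
    "others": "As another project stakeholder, refine these acceptance "
              "criteria:\n{criteria}",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., gpt-3.5-turbo).
    Here it simply echoes the prompt's final line."""
    return prompt.splitlines()[-1]

def simac(story: str) -> str:
    # Create: the requirement-analyst role drafts the initial US-AC.
    criteria = call_llm(ROLE_PROMPTS["requirement_analyst"].format(story=story))
    # Update, update: the remaining roles revise the draft in sequence.
    for role in ("quality_analyst", "others"):
        criteria = call_llm(ROLE_PROMPTS[role].format(criteria=criteria))
    return criteria
```

In a real deployment, `call_llm` would wrap an actual model invocation, and each role's output would be parsed back into a criteria list before the next role's prompt is built.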


Figures 1–12 appear in the full article.


Notes

  1. https://www.iiba.org/career-resources/a-business-analysis-professionals-foundation-for-success/agile-extension/

  2. https://github.com/liyishu0308/SimAC.git

  3. https://www.altexsoft.com/blog/acceptance-criteria-purposes-formats-and-best-practices/

  4. https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api

  5. https://ai.meta.com/

  6. https://platform.openai.com/docs/models/gpt-3-5

  7. https://huggingface.co/openlm-research

  8. https://spacy.io/usage/linguistic-features

  9. https://xp123.com/articles/invest-in-good-stories-and-smart-tasks/

  10. https://github.com/nsf-open/nsf/tree/master

  11. https://blog.zooniverse.org/


Acknowledgements

This work is supported in part by the General Research Fund of the Research Grants Council of Hong Kong and the research funds of the City University of Hong Kong (6000796, 9229109, 9229098, 9220103, 9229029).

Author information

Authors and Affiliations

Authors

Contributions

Yishu Li, Jacky Keung, and Zhen Yang designed and conducted the main experiments. Yishu Li, Zhen Yang, and Jacky Keung wrote the main manuscript text. Yishu Li and Zhen Yang designed the questionnaires. Yishu Li and Xiaoxue Ma manually matched the LLM solutions against the gold standard. Xiaoxue Ma and Jingyu Zhang conducted comparative experiments. Xiaoxue Ma, Jingyu Zhang, and Shuo Liu consolidated the human evaluation results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhen Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Instruction of questionnaire

1.1 Background introduction

This questionnaire assesses the effectiveness of different models in generating acceptance criteria to elaborate the given user stories. A User Story with its associated Acceptance Criteria is marked as "US-AC".

Please score each US-AC on the six attributes of the INVEST evaluation criteria. Each attribute is scored from 0 to 4.

  • 0 = US-AC fulfilled the X attribute poorly.

  • 1 = US-AC fulfilled the X attribute unsatisfactorily.

  • 2 = US-AC fulfilled the X attribute to some extent.

  • 3 = US-AC fulfilled the X attribute satisfactorily.

  • 4 = US-AC fulfilled the X attribute very satisfactorily.

INVEST evaluation attributes:

  • Attribute 1 (Independent): The US-AC should be as independent as possible. It can be implemented and delivered without being dependent on other stories.

  • Attribute 2 (Negotiable): The US-AC is not an explicit contract and should leave space for discussion. It can be rewritten depending on business, market, technical, or any other type of requirement by team members.

  • Attribute 3 (Valuable): The US-AC must deliver value to the stakeholders by clearly specifying solutions.

  • Attribute 4 (Estimable): The US-AC should be estimable. It is well enough understood by team members to be able to determine its relative size (in development and cost).

  • Attribute 5 (Small): The US-AC should not be so big that it becomes impossible to plan, task, and prioritize with a reasonable level of accuracy.

  • Attribute 6 (Testable): The US-AC must provide the necessary information to make test development possible.
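As a minimal sketch of the scoring scheme above (the averaging rule is an assumption, not taken from the paper), a per-US-AC satisfaction score can be computed as the mean of the six 0–4 attribute scores:

```python
# Hypothetical helper: aggregate one US-AC's INVEST attribute scores
# (each 0-4, per the questionnaire) into a single satisfaction score,
# assumed here to be the plain mean across the six attributes.

ATTRIBUTES = ("Independent", "Negotiable", "Valuable",
              "Estimable", "Small", "Testable")

def invest_score(scores: dict) -> float:
    """Return the mean of the six attribute scores for one US-AC."""
    for attr in ATTRIBUTES:
        if not 0 <= scores[attr] <= 4:
            raise ValueError(f"{attr} score must be in 0..4")
    return sum(scores[a] for a in ATTRIBUTES) / len(ATTRIBUTES)
```

For example, a US-AC rated 3 on five attributes and 4 on one would score 19/6 ≈ 3.17, in the same range as the 3.21/4 satisfaction score reported in the abstract.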

The generations from different models are randomly ordered, and the US-ACs are randomly picked from three projects (g11, g24, g28). Brief descriptions of the three projects:

  • g11 - New version of the NSF website: redesign and content discovery.

  • g24 - Institutional data repository for the University of Bath.

  • g28 - Citizen science platform that allows anyone to help in research tasks.

1.2 Actions

Please fill in the Tab Scores (Columns D to H) in ScoreSheets.xlsx.

  • Column "Group No." - The number of projects.

  • Column "US No." - The number of given user stories.

  • Column "Model #" - The model number of the generated acceptance criteria.

  • Column "Independent" - The score of Independent. (0 to 4)

  • Column "Negotiable" - The score of Negotiable. (0 to 4)

  • Column "Valuable" - The score of Valuable. (0 to 4)

  • Column "Estimable" - The score of Estimable. (0 to 4)

  • Column "Small" - The score of Small. (0 to 4)

  • Column "Testable" - The score of Testable. (0 to 4)

Please fill in the Tab Information (Column B) in ScoreSheets.xlsx. Column A - Description of your information. Column B - Details of your information.
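To illustrate how the filled score sheets might be consolidated (a hypothetical sketch; the study's actual ScoreSheets.xlsx processing is not shown in this excerpt), each model's mean score can be computed across all rated US-ACs and attributes:

```python
# Hypothetical consolidation of rater score rows: each row holds a
# "Model #" identifier plus the six 0-4 INVEST attribute scores.

from collections import defaultdict
from statistics import mean

ATTRS = ("Independent", "Negotiable", "Valuable",
         "Estimable", "Small", "Testable")

def mean_score_per_model(rows: list) -> dict:
    """Return each model's mean score over all rows and attributes."""
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["Model #"]].extend(row[a] for a in ATTRS)
    return {model: mean(scores) for model, scores in by_model.items()}
```

In practice the rows would be read from the Scores tab of ScoreSheets.xlsx (e.g., with a spreadsheet library) before being passed to this function.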

Sample of US-AC for evaluation

1.1 User story

As an NSF employee, I want to have editorial-planning recommendations so that I can understand how they might impact an eventual CMS / future web presence.

1.2 Acceptance criteria: (generated from Model #1)

  1. The editorial-planning tool should be available to employees with access to the current CMS or future web presence.

  2. The tool should provide recommendations for content that is likely to attract the most traffic or engagement.

  3. The recommendations should be based on data insights and analysis.

  4. The tool should be user-friendly and intuitive, enabling employees to easily input editorial content and receive recommendations.

  5. Recommendations should include keywords or search terms that are likely to generate traffic.

1.3 Acceptance criteria: (generated from Model #2)

  1. The editorial planning recommendations should be based on data-driven analysis, not just assumptions or opinions.

  2. The CMS should provide a dashboard or interface for displaying the editorial planning recommendations for easy access by NSF employees.

  3. The editorial planning recommendations should be displayed in an easy-to-understand format that presents the relevant information and makes the recommendations actionable.

  4. The recommendations should address any potential SEO considerations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Li, Y., Keung, J., Yang, Z. et al. SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration. Autom Softw Eng 31, 55 (2024). https://doi.org/10.1007/s10515-024-00448-7
