
SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration

Published in Automated Software Engineering.

Abstract

In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in sprint planning, providing a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets of User Stories with attached Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advances in Large Language Models (LLMs) have showcased remarkable text-generation capabilities without the need for supervised fine-tuning, and thus offer the potential to overcome this challenge. Motivated by this, we propose SimAC, a framework that leverages LLMs to simulate agile collaboration across three role groups: requirement analysts, quality analysts, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm for GAC. Because ground truths are unavailable, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. This study also provides case studies illustrating SimAC's effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.
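As a minimal illustrative sketch (not the paper's implementation; all prompt texts and helper names here are hypothetical), the create-update-update paradigm can be read as one drafting pass by a requirement-analyst role followed by two revision passes by the other roles:

```python
# Hypothetical sketch of SimAC's create-update-update loop. The real
# framework's prompts and LLM configuration are described in the paper;
# call_llm is a stand-in so the pipeline runs without an API key.

ROLE_PROMPTS = {
    "requirement_analyst": "As a requirement analyst, draft acceptance "
                           "criteria for this user story:\n{story}",
    "quality_analyst": "As a quality analyst, revise these acceptance "
                       "criteria for testability:\n{criteria}",
    "others": "As another project stakeholder, refine these acceptance "
              "criteria:\n{criteria}",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., gpt-3.5-turbo).
    Here it simply echoes the prompt's final line."""
    return prompt.splitlines()[-1]

def simac(story: str) -> str:
    # Create: the requirement-analyst role drafts the initial US-AC.
    criteria = call_llm(ROLE_PROMPTS["requirement_analyst"].format(story=story))
    # Update, update: the remaining roles revise the draft in sequence.
    for role in ("quality_analyst", "others"):
        criteria = call_llm(ROLE_PROMPTS[role].format(criteria=criteria))
    return criteria
```

In a real deployment, `call_llm` would wrap an actual model invocation, and each role's output would be parsed back into a criteria list before the next role's prompt is built.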


Figures 1–12 appear in the full article.


Notes

  1. https://www.iiba.org/career-resources/a-business-analysis-professionals-foundation-for-success/agile-extension/

  2. https://github.com/liyishu0308/SimAC.git

  3. https://www.altexsoft.com/blog/acceptance-criteria-purposes-formats-and-best-practices/

  4. https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api

  5. https://ai.meta.com/

  6. https://platform.openai.com/docs/models/gpt-3-5

  7. https://huggingface.co/openlm-research

  8. https://spacy.io/usage/linguistic-features

  9. https://xp123.com/articles/invest-in-good-stories-and-smart-tasks/

  10. https://github.com/nsf-open/nsf/tree/master

  11. https://blog.zooniverse.org/


Acknowledgements

This work is supported in part by the General Research Fund of the Research Grants Council of Hong Kong and the research funds of the City University of Hong Kong (6000796, 9229109, 9229098, 9220103, 9229029).

Author information

Authors and Affiliations

Authors

Contributions

Yishu Li, Jacky Keung, and Zhen Yang designed and conducted the main experiments. Yishu Li, Zhen Yang, and Jacky Keung wrote the main manuscript text. Yishu Li and Zhen Yang designed the questionnaires. Yishu Li and Xiaoxue Ma manually matched the LLM solutions against the gold standard. Xiaoxue Ma and Jingyu Zhang conducted comparative experiments. Xiaoxue Ma, Jingyu Zhang, and Shuo Liu consolidated the human evaluation results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhen Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Instruction of questionnaire

1.1 Background introduction

This questionnaire assesses the effectiveness of different models in generating acceptance criteria to elaborate the given user stories. A User Story with its associated Acceptance Criteria is marked as "US-AC".

Please score each US-AC on the six attributes of the INVEST evaluation criteria. Each attribute is scored from 0 to 4.

  • 0 = US-AC fulfilled the X attribute poorly.

  • 1 = US-AC fulfilled the X attribute unsatisfactorily.

  • 2 = US-AC fulfilled the X attribute to some extent.

  • 3 = US-AC fulfilled the X attribute satisfactorily.

  • 4 = US-AC fulfilled the X attribute very satisfactorily.

INVEST evaluation attributes:

  • Attribute 1 (Independent): The US-AC should be as independent as possible. It can be implemented and delivered without being dependent on other stories.

  • Attribute 2 (Negotiable): The US-AC is not an explicit contract and should leave space for discussion. It can be rewritten depending on business, market, technical, or any other type of requirement by team members.

  • Attribute 3 (Valuable): The US-AC must deliver value to the stakeholders by clearly specifying solutions.

  • Attribute 4 (Estimable): The US-AC should be estimable. It is well enough understood by team members to be able to determine its relative size (in development and cost).

  • Attribute 5 (Small): The US-AC should not be so big that it becomes impossible to plan, task, and prioritize with a reasonable level of accuracy.

  • Attribute 6 (Testable): The US-AC must provide the necessary information to make test development possible.
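As a minimal sketch of the scoring scheme above (the averaging rule is an assumption, not taken from the paper), a per-US-AC satisfaction score can be computed as the mean of the six 0–4 attribute scores:

```python
# Hypothetical helper: aggregate one US-AC's INVEST attribute scores
# (each 0-4, per the questionnaire) into a single satisfaction score,
# assumed here to be the plain mean across the six attributes.

ATTRIBUTES = ("Independent", "Negotiable", "Valuable",
              "Estimable", "Small", "Testable")

def invest_score(scores: dict) -> float:
    """Return the mean of the six attribute scores for one US-AC."""
    for attr in ATTRIBUTES:
        if not 0 <= scores[attr] <= 4:
            raise ValueError(f"{attr} score must be in 0..4")
    return sum(scores[a] for a in ATTRIBUTES) / len(ATTRIBUTES)
```

For example, a US-AC rated 3 on five attributes and 4 on one would score 19/6 ≈ 3.17, in the same range as the 3.21/4 satisfaction score reported in the abstract.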

The generations from different models are randomly ordered, and the US-ACs are randomly picked from three projects (g11, g24, g28). Brief descriptions of the three projects:

  • g11 - New version of the NSF website: redesign and content discovery.

  • g24 - Institutional data repository for the University of Bath.

  • g28 - Citizen science platform that allows anyone to help in research tasks.

1.2 Actions

Please fill in the Tab Scores (Columns D to H) in ScoreSheets.xlsx.

  • Column "Group No." - The number of projects.

  • Column "US No." - The number of given user stories.

  • Column "Model #" - The model number of the generated acceptance criteria.

  • Column "Independent" - The score of Independent. (0 to 4)

  • Column "Negotiable" - The score of Negotiable. (0 to 4)

  • Column "Valuable" - The score of Valuable. (0 to 4)

  • Column "Estimable" - The score of Estimable. (0 to 4)

  • Column "Small" - The score of Small. (0 to 4)

  • Column "Testable" - The score of Testable. (0 to 4)

Please fill in the Tab Information (Column B) in ScoreSheets.xlsx. Column A - Description of your information. Column B - Details of your information.
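To illustrate how the filled score sheets might be consolidated (a hypothetical sketch; the study's actual ScoreSheets.xlsx processing is not shown in this excerpt), each model's mean score can be computed across all rated US-ACs and attributes:

```python
# Hypothetical consolidation of rater score rows: each row holds a
# "Model #" identifier plus the six 0-4 INVEST attribute scores.

from collections import defaultdict
from statistics import mean

ATTRS = ("Independent", "Negotiable", "Valuable",
         "Estimable", "Small", "Testable")

def mean_score_per_model(rows: list) -> dict:
    """Return each model's mean score over all rows and attributes."""
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["Model #"]].extend(row[a] for a in ATTRS)
    return {model: mean(scores) for model, scores in by_model.items()}
```

In practice the rows would be read from the Scores tab of ScoreSheets.xlsx (e.g., with a spreadsheet library) before being passed to this function.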

Sample of US-AC for evaluation

1.1 User story

As an NSF employee, I want to have editorial-planning recommendations so that I can understand how they might impact an eventual CMS / future web presence.

1.2 Acceptance criteria: (generated from Model #1)

  1. The editorial-planning tool should be available to employees with access to the current CMS or future web presence.

  2. The tool should provide recommendations for content that is likely to attract the most traffic or engagement.

  3. The recommendations should be based on data insights and analysis.

  4. The tool should be user-friendly and intuitive, enabling employees to easily input editorial content and receive recommendations.

  5. Recommendations should include keywords or search terms that are likely to generate traffic.

1.3 Acceptance criteria: (generated from Model #2)

  1. The editorial planning recommendations should be based on data-driven analysis, not just assumptions or opinions.

  2. The CMS should provide a dashboard or interface for displaying the editorial planning recommendations for easy access by NSF employees.

  3. The editorial planning recommendations should be displayed in an easy-to-understand format that presents the relevant information and makes the recommendations actionable.

  4. The recommendations should address any potential SEO considerations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Li, Y., Keung, J., Yang, Z. et al. SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration. Autom Softw Eng 31, 55 (2024). https://doi.org/10.1007/s10515-024-00448-7
