User story clustering in agile development: a framework and an empirical study

  • Research Article
  • Frontiers of Computer Science

Abstract

Agile development aims to deliver software rapidly while embracing the continuous evolution of user requirements throughout the development process. User stories are the primary means of requirements collection and elicitation in agile development. A project can involve a large number of user stories, which should be clustered into groups according to their functional similarity to support systematic requirements analysis, effective mapping to developed features, and efficient maintenance. Nevertheless, user story clustering is currently conducted mainly by hand, which is time-consuming and subject to human bias. In this paper, we propose a novel approach that clusters user stories automatically using natural language processing. Specifically, the sentence patterns of each component in a user story are first analysed and determined, so that the critical structure in the representative tasks can be automatically extracted based on the user story meta-model. The similarity between user stories is then calculated and used to generate a connected graph, which serves as the basis for automatic user story clustering. We evaluate the approach on thirteen datasets and compare it against ten baseline techniques. Experimental results show that our clustering approach achieves higher accuracy, recall, and F1-score than these baselines. The results demonstrate that the proposed approach can significantly improve the efficacy of user story clustering and thus enhance the overall performance of agile development. The study also highlights promising research directions for more accurate requirements elicitation.
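
The pipeline outlined in the abstract (extract the structure of each story, compute pairwise similarity, build a graph, and read clusters off its connected components) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the sentence patterns, the similarity measure, or any threshold, so the "As a ..., I want ..., so that ..." regular expression, the Jaccard token overlap, the threshold of 0.3, and the names `parse_story`, `jaccard`, and `cluster_stories` are all assumptions chosen for illustration.

```python
import re
from itertools import combinations

# Assumed template ("As a <role>, I want <action> so that <benefit>").
# The paper's actual sentence patterns are not given in the abstract.
STORY_PATTERN = re.compile(
    r"as an? (?P<role>.+?),? i want (?:to )?(?P<action>.+?)"
    r"(?:,? so that (?P<benefit>.+))?$",
    re.IGNORECASE,
)


def parse_story(text):
    """Split a user story into role, action, and optional benefit."""
    match = STORY_PATTERN.match(text.strip().rstrip("."))
    if match is None:
        # Fall back to treating the whole text as the action.
        return {"role": "", "action": text.strip(), "benefit": ""}
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the paper's similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_stories(stories, threshold=0.3):
    """Link stories whose actions are similar, then return connected components."""
    actions = [parse_story(story)["action"] for story in stories]
    parent = list(range(len(stories)))  # union-find forest over story indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # An edge exists when similarity reaches the (assumed) threshold.
    for i, j in combinations(range(len(stories)), 2):
        if jaccard(actions[i], actions[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(stories)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


stories = [
    "As a user, I want to reset my password so that I can regain access.",
    "As a user, I want to change my password so that my account stays secure.",
    "As an admin, I want to export sales reports so that I can share them.",
    "As a manager, I want to view sales reports.",
]
print(cluster_stories(stories))  # [[0, 1], [2, 3]]
```

Comparing only the extracted action, and not the full sentence, keeps the shared template words ("As a", "I want") from inflating the similarity of unrelated stories. A semantic similarity measure would likely be needed in practice, since token overlap misses synonyms.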

Acknowledgements

We thank the anonymous reviewers for their thoughtful comments. This work was sponsored by the National Natural Science Foundation of China (Grant Nos. 62192731, 62192730, 62162051), the Australian Research Council Discovery Project (DP210102447), and the Fundamental Research Funds for the Central Universities (BLX202003).

Author information

Corresponding authors

Correspondence to Bo Yang or Zhi Jin.

Additional information

Bo Yang received his PhD degree in computer software and theory from Beihang University, China. He is an associate professor at the School of Information Science and Technology, Beijing Forestry University, China. His research interests include deep learning, software testing, software fault localization, and software requirements analysis. He is a member of CCF.

Xiuyin Ma received the BEng degree from the North China University of Technology, China. Her research interests include software requirements analysis and software testing.

Chunhui Wang received her PhD degree in computer science from the School of Electronics Engineering and Computer Science, Peking University, China in 2020. She is currently an associate professor at the School of Computer Science, Inner Mongolia Normal University, China. Her research interests include requirements engineering and collective-intelligence-based software engineering. She is a member of CCF.

Haoran Guo received the BEng degree from the North China University of Technology, China. His research interests include software fault localization and software testing.

Huai Liu received his PhD degree in software engineering from Swinburne University of Technology, Australia. He is a senior lecturer in the Department of Computing Technologies, Swinburne University of Technology, Australia. He has worked as a lecturer at Victoria University and as a research fellow at RMIT University, Australia. His current research interests include software testing, cloud computing, and end-user software engineering.

Zhi Jin obtained her BSc from Zhejiang University, China in 1984 and her PhD from National University of Defense Technology, China in 1992. She is a professor in the School of Computer Science, Peking University (PKU), China and has served as the Deputy Director of High-Confidence Software Technologies (PKU), Ministry of Education, China since 2009. Her research interests include requirements engineering, knowledge engineering, and knowledge-based software engineering.

Cite this article

Yang, B., Ma, X., Wang, C. et al. User story clustering in agile development: a framework and an empirical study. Front. Comput. Sci. 17, 176213 (2023). https://doi.org/10.1007/s11704-022-8262-9

  • DOI: https://doi.org/10.1007/s11704-022-8262-9
