
An impact-driven approach to predict user stories instability

Original Article · Requirements Engineering

Abstract

A common way to describe requirements in Agile software development is through user stories, which are short descriptions of desired functionality. Nevertheless, there are no widely accepted quantitative metrics to evaluate user stories. We propose a novel metric to evaluate user stories called instability, which measures the number of changes made to a user story after it was assigned to a developer to be implemented in the near future. A user story with a high instability score suggests that it was not detailed and coherent enough to be implemented. The instability of a user story can be automatically extracted from industry-standard issue tracking systems such as Jira by performing retrospective analysis over user stories that were fully implemented. We propose a method for creating prediction models that can identify user stories that will have high instability even before they have been assigned to a developer. Our method works by applying a machine learning algorithm on implemented user stories, considering only features that are available before a user story is assigned to a developer. We evaluate our prediction models on several open-source projects and one commercial project and show that they outperform baseline prediction models.


Notes

  1. http://www.atlassian.com/software/jira/.

  2. That is, the data for each project were split into training and test sets. The training part was used to create a prediction model that was then evaluated on the test part of the data for that project (see the sketch after these notes).

  3. https://issues.alfresco.com/jira.

  4. https://issues.jboss.org.

  5. https://jira.lsstcorp.org.

  6. https://jira.spring.io.

  7. USIs labeled as “defects” or “bug reports” are not included.

  8. The textual description of a USI was collected from the “summary”, “description”, and “acceptance criteria” fields in JIRA.

  9. Performing a k-fold cross-validation would require, for some folds, evaluating on data that was collected before the data used for training. This is problematic, especially since the features in the “Personalized Metrics” family rely on analyzing the instability of USIs completed in the past.

  10. Note that we have also repeated our experiments using a k-fold cross-validation, and the results obtained were similar to those reported here.

  11. See the definition of AUC ROC in Sect. 5.2.

  12. The code is written in Python and includes a detailed README file with documentation and step-by-step instructions. This code can be used to reproduce our experiments, train and evaluate instability prediction models on other datasets, and explore other features and algorithms over our dataset. Our dataset is given as an exported SQL server dump file.
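As notes 2 and 9 explain, each project's data were split chronologically rather than with standard k-fold cross-validation. The following is a minimal sketch of such a split, assuming a project's USIs are held in a pandas DataFrame with a created timestamp column; the column name and the 80/20 ratio are illustrative assumptions, not the paper's exact configuration.

```python
import pandas as pd

def chronological_split(usis: pd.DataFrame, train_fraction: float = 0.8):
    """Split one project's USIs so that the model is trained only on USIs
    created before the USIs used for evaluation (hypothetical sketch)."""
    ordered = usis.sort_values("created")
    cutoff = int(len(ordered) * train_fraction)
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]
```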


Acknowledgements

This research was supported by the Ministry of Science & Technology, Israel, and by the Israeli Science Foundation Grant #210/17 to Roni Stern.

Author information


Corresponding author

Correspondence to Arnon Sturm.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: List of features

The features used for our instability prediction models are listed below. For each feature, we briefly explain the rationale for using it and, when needed, how it is computed; an illustrative sketch of how such features could be computed follows each feature group.

1.1 Simple text processing features

  • Text length The number of characters in the USI text. Short text may suggest missing details, and very long text may indicate an excessively detailed USI that causes confusion.

  • Number of question marks in the text Question marks may indicate that the USI writer is not sure about the requirement, and therefore the probability of change increases.

  • Number of headlines in the text In some projects, most USIs contain headlines that provide some structure to the USI text. In such cases, missing headlines may indicate missing information.

  • Has an acceptance criteria headline An “Acceptance Criteria” headline exists in many projects. Since the acceptance criteria are an important part of the USI, the absence of this headline may indicate missing information.

  • Number of sentences in the text A single long sentence can be hard to understand, as can many short sentences.

  • Number of words in the text Similar to the text length feature, too few words may suggest missing details, and too many words may cause confusion.

  • Average number of words in a sentence in the text Similar to the number of sentences feature, sentences that are, on average, too short or too long may affect the readability of the text.

  • Does the text contain a URL This binary feature indicates whether the USI description contains a URL. A URL can add information that is missing from the USI text, and thus affect the USI instability.

  • Does the text contain source code This binary feature indicates whether the USI description contains source code. As with the previous feature, source code can add information that may be missing from the text.

  • Does the text contain the Connextra user story template This binary feature indicates whether the USI description follows the well-known Connextra template (“As a (role) I want (something) so that (benefit)”). We conjecture that stories written following this template are of higher quality and possibly lower instability.

  • Does the text contain the words “TODO”/“TBD”/“Please” This binary feature indicates whether the USI description contains one of the words “TODO”, “TBD”, or “Please”. These words imply that something is missing or unclear in the text, which may indicate that it will be edited later (exhibiting instability).

  • Number of stop-words the text contains A sentence without stop words can be hard to understand. On the other hand, a sentence with too many stop words may suggest low writing quality and uncertainty.

  • Number of nouns/adjectives/adverbs/pronouns in the text This feature may provide some insight into the USI description structure. For example, a USI missing a verb may be hard to understand.

  • Is the text field empty This binary feature indicates whether the USI description was empty. A missing text field may suggest a lack of information and thus be correlated with instability.
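A minimal sketch of how several of the simple text features above could be computed with plain Python string handling and regular expressions. The specific heuristics (lines ending with a colon count as headlines, a “{code}” marker counts as source code, the small stop-word list) are illustrative assumptions, not the paper's implementation.

```python
import re

STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "that", "is"}  # small illustrative list
CONNEXTRA = re.compile(r"as an? .+ i want .+ so that .+", re.IGNORECASE | re.DOTALL)

def simple_text_features(text: str) -> dict:
    """Compute a subset of the simple text-processing features for one USI description."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    headlines = [line for line in text.splitlines() if line.strip().endswith(":")]
    return {
        "text_length": len(text),
        "num_question_marks": text.count("?"),
        "num_headlines": len(headlines),
        "has_acceptance_criteria_headline": "acceptance criteria" in text.lower(),
        "num_sentences": len(sentences),
        "num_words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "contains_url": bool(re.search(r"https?://\S+", text)),
        "contains_source_code": "{code}" in text,  # Jira wiki code-block marker (assumed heuristic)
        "matches_connextra_template": bool(CONNEXTRA.search(text)),
        "contains_todo_tbd_please": bool(re.search(r"\b(todo|tbd|please)\b", text, re.IGNORECASE)),
        "num_stop_words": sum(1 for w in words if w.lower() in STOP_WORDS),
        "is_empty": not text.strip(),
    }
```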

1.2 Advanced text-based features

  • USI Vector Translates the USI text into a fixed-size vector of numbers, which serves as the resulting features. See more details in the main body of the text.

  • Topic Model We train a topic model for each project and add a feature for each topic. See more details in the main body of the text.
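A sketch of one possible way to derive both feature sets with the gensim library; the library choice, the vector size, and the number of topics are assumptions for illustration rather than the paper's exact setup.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_text_models(usi_texts, vector_size=50, num_topics=10):
    """Train a document-embedding model and a per-project topic model."""
    tokenized = [text.lower().split() for text in usi_texts]
    # USI Vector: a fixed-size embedding per USI description.
    d2v = Doc2Vec([TaggedDocument(words, [i]) for i, words in enumerate(tokenized)],
                  vector_size=vector_size, min_count=2, epochs=40)
    # Topic model: one feature per topic (the topic's weight for the USI).
    dictionary = Dictionary(tokenized)
    corpus = [dictionary.doc2bow(words) for words in tokenized]
    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    return d2v, lda, dictionary

def advanced_text_features(text, d2v, lda, dictionary):
    """Concatenate the USI embedding with the topic weights for one USI."""
    tokens = text.lower().split()
    embedding = d2v.infer_vector(tokens)
    topics = dict(lda.get_document_topics(dictionary.doc2bow(tokens), minimum_probability=0.0))
    return list(embedding) + [topics.get(t, 0.0) for t in range(lda.num_topics)]
```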

1.3 Process metrics


  • Number of changes in the text before entering a sprint Many changes to the USI before the sprint may point to a problem with the text or an unclear requirement, and may lead to more changes during the sprint.

  • Number of comments before entering a sprint The number of comments this USI had before entering the sprint. If a USI has many comments, it may suggest an unclear requirement.

  • Number of changes in story points before entering a sprint Changes in the number of story points may indicate an unclear requirement and hence lead to USI instability.

  • Number of story points when entering a sprint The development team's estimated effort may indirectly influence the USI instability.

  • USI priority when entering a sprint The priority may influence stability, e.g., if the USI is urgent, it may be written in a hasty and less rigorous manner.

  • Time until the USI entered a sprint The rationale for this feature is that if a USI stays in the backlog for a long time, there was more time to write it in a complete and stable manner. On the other hand, a long stay may suggest that the USI is of lower importance, and hence it may have been written in a sloppy manner and eventually exhibit instability.

  • The number of issue links The number of issue links of several link types (such as duplicate and block). A high number of dependencies may affect the USI instability.
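A sketch of how these process metrics could be derived once a USI's change history has been exported from the issue tracker. The record layout ("changes", "comments", "story_points", "priority", "created", "links") is a hypothetical pre-parsed representation, not the paper's actual data model.

```python
from datetime import datetime

def process_metrics(usi: dict, sprint_entry: datetime) -> dict:
    """Compute process metrics for one USI from a hypothetical pre-parsed record.

    usi is assumed to hold 'changes' (list of {'field', 'timestamp'}),
    'comments' (list of timestamps), 'story_points', 'priority', 'created',
    and 'links' (list of linked issue keys).
    """
    before_sprint = [c for c in usi["changes"] if c["timestamp"] < sprint_entry]
    return {
        "text_changes_before_sprint": sum(
            1 for c in before_sprint if c["field"] in ("summary", "description")),
        "comments_before_sprint": sum(1 for t in usi["comments"] if t < sprint_entry),
        "story_point_changes_before_sprint": sum(
            1 for c in before_sprint if c["field"] == "story points"),
        "story_points_at_sprint_entry": usi["story_points"],
        "priority_at_sprint_entry": usi["priority"],
        "days_until_sprint_entry": (sprint_entry - usi["created"]).days,
        "num_issue_links": len(usi["links"]),
    }
```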

1.4 Personalized metrics

  • Number of USIs The number of USIs that the author wrote before the current USI. The idea behind this feature is that we expect a person with limited experience in writing USIs to write USIs of limited quality, whereas we expect a skilled person to write high-quality USIs with a lower probability of being changed.

  • Ratio of unstable USIs in the past The ratio of unstable USIs among all the USIs that the author wrote before the current USI. We expect the probability that a USI will be changed to increase when the author's ratio of unstable USIs is high.
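A sketch of the two personalized metrics, assuming a hypothetical list of the author's earlier USIs, ordered by creation time, each carrying an "unstable" flag derived from its retrospective instability score.

```python
def personalized_metrics(author_history: list) -> dict:
    """author_history: the author's USIs written before the current one,
    each a dict with an 'unstable' boolean (hypothetical schema)."""
    n = len(author_history)
    unstable = sum(1 for usi in author_history if usi["unstable"])
    return {
        "author_num_previous_usis": n,
        "author_unstable_ratio": unstable / n if n else 0.0,
    }
```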

Appendix 2: Confusion matrices for instability prediction models

For completeness, Table 9 provides the TN, FP, FN, TP, and accuracy results obtained by the evaluated prediction models in each project. We show results for the 5-instability and 20-instability prediction tasks in the columns \(k=5\) and \(k=20\), respectively. A sketch of how these values can be computed from model predictions is given after the table.

Table 9 The TN, FP, FN, TP, and accuracy results for all the projects and prediction models in our evaluation
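For readers reproducing these values from their own predictions, a minimal sketch using scikit-learn (an assumed library choice, not necessarily the one used in the study):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def table_row(y_true, y_pred):
    """Return the TN, FP, FN, TP counts and accuracy for one project and model."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp,
            "accuracy": accuracy_score(y_true, y_pred)}
```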


Cite this article

Levy, Y., Stern, R., Sturm, A. et al. An impact-driven approach to predict user stories instability. Requirements Eng 27, 231–248 (2022). https://doi.org/10.1007/s00766-022-00372-w
