Abstract
A common way to describe requirements in Agile software development is through user stories, which are short descriptions of desired functionality. Nevertheless, there are no widely accepted quantitative metrics to evaluate user stories. We propose a novel metric to evaluate user stories called instability, which measures the number of changes made to a user story after it was assigned to a developer to be implemented in the near future. A user story with a high instability score suggests that it was not detailed and coherent enough to be implemented. The instability of a user story can be automatically extracted from industry-standard issue tracking systems such as Jira by performing retrospective analysis over user stories that were fully implemented. We propose a method for creating prediction models that can identify user stories that will have high instability even before they have been assigned to a developer. Our method works by applying a machine learning algorithm on implemented user stories, considering only features that are available before a user story is assigned to a developer. We evaluate our prediction models on several open-source projects and one commercial project and show that they outperform baseline prediction models.
Notes
That is, the data for each project were split into train and test sets. The train set was used to create a prediction model that was then evaluated on the test set for that project.
USIs labeled as “defects” or “bug reports” are not included.
The textual description of a USI was collected from the “summary”, “description”, and “acceptance criteria” fields in Jira.
Performing a k-fold cross-validation would, for some folds, require evaluating on data collected before the data used for training. This is problematic, especially since the features in the “Personalized Metrics” family rely on analyzing the instability of past USIs.
Note that we have also repeated our experiments using a k-fold cross-validation, and the results obtained were similar to those reported here.
See the definition of AUC ROC in Sect. 5.2.
The code is written in Python and includes a detailed README file with step-by-step instructions. This code can be used to reproduce our experiments, train and evaluate instability prediction models on other datasets, and explore other features and algorithms over our dataset. Our dataset is given as an exported SQL server dump file.
Acknowledgements
This research was supported by the Ministry of Science & Technology, Israel, and by the Israeli Science Foundation Grant #210/17 to Roni Stern.
Appendices
Appendix 1: List of features
The features used by our instability prediction models are listed below. For each feature, we briefly explain the rationale for using it and, when needed, how it is computed.
1.1 Simple text processing features
- Text length: The number of characters in the USI text. Short text may suggest missing details, and very long text may indicate an excessively detailed USI that causes confusion.
- Number of question marks in the text: Question marks may indicate that the USI writer is unsure about the requirement, increasing the probability of later changes.
- Number of headlines in the text: In some projects, most USIs contain headlines that give structure to the USI text. In such cases, missing headlines may indicate missing information.
- Has an acceptance criteria headline: An “Acceptance Criteria” headline exists in many projects. Since the acceptance criteria are an important part of a USI, their absence may indicate missing information.
- Number of sentences in the text: One long sentence can be hard to understand, as can many short sentences.
- Number of words in the text: Similar to the text length feature, too few words may suggest missing details, and too many words may cause confusion.
- Average number of words in a sentence: Similar to the number-of-sentences feature, sentences that are too short or too long, on average, may hurt the readability of the text.
- Does the text contain a URL: This binary feature indicates whether the USI description contains a URL. A URL can add information that is missing from the USI text and thus affect the USI instability.
- Does the text contain source code: This binary feature indicates whether the USI description contains source code. As with the previous feature, source code can add information that is missing from the text.
- Does the text contain the Connextra user story template: This binary feature indicates whether the USI description follows the well-known Connextra template (“As a (role) I want (something) so that (benefit)”). We conjecture that stories written following this template are of higher quality and thus exhibit lower instability.
- Does the text contain the words “TODO”/“TBD”/“Please”: This binary feature indicates whether the USI description contains one of the words “TODO”, “TBD”, or “Please”. These words imply that something is missing or unclear in the text, which may mean it will be edited later (exhibiting instability).
- Number of stop words in the text: A sentence without stop words can be hard to understand. On the other hand, a sentence with too many stop words may suggest low writing quality and uncertainty.
- Number of nouns/adjectives/adverbs/pronouns in the text: This feature may provide some insight into the structure of the USI description. For example, a USI missing a verb may be hard to understand.
- Is the text field empty: This binary feature indicates whether the USI description is empty. A missing text field may suggest a lack of information and thus be correlated with instability.
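Most of the simple text-processing features above reduce to plain string handling and regular expressions. The sketch below is a minimal illustration of a representative subset, not the paper's actual implementation; the function name, key names, and regular expressions are our own:

```python
import re

# Hypothetical patterns: a loose Connextra check and a simple URL check.
CONNEXTRA = re.compile(r"as an? .+ i want .+ so that .+", re.IGNORECASE | re.DOTALL)
URL = re.compile(r"https?://\S+")
UNCERTAIN_WORDS = ("todo", "tbd", "please")

def simple_text_features(text):
    """Compute a subset of the simple text-processing features for a USI description."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return {
        "text_length": len(text),
        "num_question_marks": text.count("?"),
        "num_words": len(words),
        "num_sentences": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "contains_url": bool(URL.search(text)),
        "matches_connextra": bool(CONNEXTRA.search(text)),
        "contains_uncertain_word": any(w in text.lower() for w in UNCERTAIN_WORDS),
        "is_empty": len(text.strip()) == 0,
    }
```

The headline count and part-of-speech features would additionally need the project's headline conventions and a POS tagger, so they are omitted here.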
1.2 Advanced text-based features
- USI vector: Translates each USI's text into a fixed-size vector of numbers, which form the resulting features. See the main body of the text for details.
- Topic model: We train a topic model for each project and add a feature for each topic. See the main body of the text for details.
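The exact tooling behind these features is described in the main text, not here. As a hedged stand-in for the per-project topic model, the sketch below derives one feature per topic using scikit-learn's `CountVectorizer` and `LatentDirichletAllocation`; the paper may well have used a different library, and the USI-vector feature is not shown:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_features(docs, n_topics=3, seed=0):
    """Fit a topic model on one project's USI texts and return, for each USI,
    one feature per topic: the topic's weight in that document."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # Rows of the result sum to 1: each row is a document's topic distribution.
    return lda.fit_transform(counts)

# Toy per-project corpus (invented examples, not from the paper's dataset).
docs = [
    "as a user i want to log in so that i can see my dashboard",
    "fix the database connection pool timeout configuration",
    "as an admin i want to export reports so that i can audit usage",
]
weights = topic_features(docs)
```

Each row of `weights` would then be appended to the corresponding USI's feature vector.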
1.3 Process metrics
- Number of changes in the text before entering a sprint: Many changes to a USI before the sprint may point to a problem with the text or an unclear requirement and can lead to more changes during the sprint.
- Number of comments before entering a sprint: The number of comments the USI received before entering the sprint. Many comments may suggest an unclear requirement.
- Number of changes in story points before entering a sprint: Changes in the number of story points may indicate an unclear requirement and hence lead to USI instability.
- Number of story points when entering a sprint: The development team's effort estimate may have an indirect influence on the USI instability.
- USI priority when entering a sprint: The priority may influence stability, e.g., an urgent USI may be written in a hasty and less rigorous manner.
- Time until the USI entered a sprint: If a USI stays in the backlog for a long time, there is more time for it to be written in a complete and stable manner. On the other hand, a long stay may suggest that the USI is less important, and hence it may have been written sloppily and eventually exhibit instability.
- Number of issue links: The number of issue links of several types (such as “duplicates” and “blocks”). A high number of dependencies may affect the USI instability.
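Several of these process metrics amount to filtering a Jira-style change history for records that predate the USI's sprint entry. The sketch below is illustrative only: the record shapes (`(timestamp, field)` tuples) and field names (`"description"`, `"story_points"`) are hypothetical simplifications, not Jira's actual changelog schema:

```python
from datetime import datetime

def process_metrics(changelog, comment_times, sprint_entry):
    """Compute a few process-metric features for a USI.

    changelog: list of (timestamp, field) change records for the USI.
    comment_times: list of timestamps at which comments were added.
    sprint_entry: the time at which the USI entered a sprint.
    """
    before = [(t, f) for (t, f) in changelog if t < sprint_entry]
    return {
        "text_changes_before_sprint": sum(1 for _, f in before if f == "description"),
        "story_point_changes_before_sprint": sum(1 for _, f in before if f == "story_points"),
        "comments_before_sprint": sum(1 for t in comment_times if t < sprint_entry),
    }
```

Changes after `sprint_entry` are deliberately excluded: post-entry changes are what the instability label counts, so using them as features would leak the label.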
1.4 Personalized metrics
- Number of USIs: The number of USIs that the author wrote before the current USI. The idea behind this feature is that we expect a person with limited experience in writing USIs to write USIs of limited quality, whereas we expect a skilled person to write high-quality USIs with a lower probability of change.
- Ratio of unstable USIs in the past: The ratio of unstable USIs among all the USIs that the author wrote before the current one. We expect the probability that a USI will change to increase when the author's ratio of unstable USIs is high.
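Both personalized metrics are simple aggregations over the author's earlier USIs. A minimal sketch, assuming each past USI is summarized by its instability score (number of post-assignment changes); the threshold of 5 mirrors the 5-instability task, but the function and parameter names are our own:

```python
def personalized_metrics(author_history, threshold=5):
    """Compute the two personalized-metric features for a USI's author.

    author_history: chronological list of instability scores of USIs the
    author wrote before the current USI. A past USI counts as unstable
    if its score is at least `threshold`.
    """
    n = len(author_history)
    unstable = sum(1 for score in author_history if score >= threshold)
    return {
        "num_past_usis": n,
        "unstable_ratio": unstable / n if n else 0.0,
    }
```

Because the history is restricted to USIs written before the current one, these features respect the chronological train/test split described in the notes.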
Appendix 2: Confusion matrices for instability prediction models
For completeness, Table 9 provides the TN, FP, FN, TP, and accuracy results obtained by the evaluated prediction models in each project. We show results for the 5-instability and 20-instability prediction tasks in the columns \(k=5\) and \(k=20\), respectively.
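The quantities in Table 9 follow the standard binary-classification definitions. As a small self-contained sketch (not the paper's evaluation code), where label 1 means the USI's instability is at least \(k\):

```python
def confusion_and_accuracy(y_true, y_pred):
    """Compute TN, FP, FN, TP and accuracy for binary instability predictions."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    return tn, fp, fn, tp, accuracy
```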
Cite this article
Levy, Y., Stern, R., Sturm, A. et al. An impact-driven approach to predict user stories instability. Requirements Eng 27, 231–248 (2022). https://doi.org/10.1007/s00766-022-00372-w