Abstract
Context
Recent research has highlighted that multiple stakeholders are involved in requirement gathering in agile software development. This leads to an increased number of duplicate user stories in the agile product backlog.
Objective
The objective of this paper is to evaluate the existing techniques for identifying and eliminating duplicate user stories from the agile product backlog, and to address their gaps with a newly proposed clustering algorithm.
Method
An agile user story is expressed as a function of input and output parameters. Multiple user stories with a similar set of input parameters are therefore most likely duplicates, which causes redundancy. The proposed algorithm clusters user stories with similar sets of input parameters over several iterations and then removes the identified duplicates from the agile product backlog. This paper also introduces the concept of mass clustering, which means clustering a number of user stories in a single run.
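The idea of grouping stories by their input parameters can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: the Jaccard similarity measure, the single-link grouping, the threshold values, and all names (jaccard, cluster_stories, deduplicate, the US1 to US4 stories) are assumptions made only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two parameter sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_stories(stories: Dict[str, FrozenSet[str]],
                    threshold: float = 0.8) -> List[List[str]]:
    """Group story ids whose input-parameter sets are similar.

    Assumption: single-link grouping via union-find, chosen for brevity.
    """
    parent = {sid: sid for sid in stories}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of stories that meets the similarity threshold.
    for a, b in combinations(stories, 2):
        if jaccard(stories[a], stories[b]) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for sid in stories:
        clusters.setdefault(find(sid), []).append(sid)
    return list(clusters.values())


def deduplicate(stories: Dict[str, FrozenSet[str]],
                threshold: float = 0.8) -> List[str]:
    """Keep the first story of each cluster; treat the rest as duplicates."""
    return [cluster[0] for cluster in cluster_stories(stories, threshold)]


# Hypothetical backlog: each story is reduced to its set of input parameters.
backlog = {
    "US1": frozenset({"username", "password"}),
    "US2": frozenset({"password", "username"}),
    "US3": frozenset({"order_id", "payment_method"}),
    "US4": frozenset({"order_id", "payment_method", "coupon"}),
}
kept = deduplicate(backlog, threshold=0.6)
```

With the threshold at 0.6, US2 is grouped with US1 and US4 with US3, so two stories remain. Raising the threshold to 0.8 keeps US4 as a separate story, which shows how strongly the outcome depends on the chosen similarity cut-off.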
Results
Experimental results show that the proposed model handles small and large releases, ranging from 100 to 1000 user stories, with similar efficiency. The proposed clustering algorithm outperformed the other clustering algorithms and resulted in a 37% decrease in the agile product backlog by eliminating duplicate user stories. The experimental results were obtained from the logs of the MATLAB tool. However, the algorithm is generic in nature and can be implemented using R, Python or SAS programming tools. The algorithm employs proven matrix operations.
Conclusion
The proposed clustering algorithm overcomes the limitations of existing user story management methods and clearly outperforms the other clustering algorithms it was compared with. Finally, this paper gives recommendations on using the proposed clustering algorithm during agile release planning to eliminate duplicate user stories from the agile product backlog.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
About this article
Cite this article
Sharma, S., Kumar, D. Product backlog optimization technique in agile software development using clustering algorithm. Multimed Tools Appl 82, 46695–46715 (2023). https://doi.org/10.1007/s11042-023-15406-w
DOI: https://doi.org/10.1007/s11042-023-15406-w