Abstract
Poorly defined requirements can adversely affect system cost and performance in government acquisition programs. This risk can be mitigated by writing requirements statements that are clear, unambiguous, and of high linguistic quality. This paper introduces a statistical model that uses requirements quality factors to predict system operational performance. Four classification techniques, Logistic Regression, Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor, are explored to develop the predictive model, which is built from empirical data on current major acquisition programs within the federal government. Operational Requirements Documents and Operational Test Reports serve, respectively, as the sources for the system requirements statements and the accompanying operational test results used in model development. A commercial off-the-shelf requirements quality analysis tool computes the linguistic quality metrics used as model inputs. After model construction, the model's predictive value is confirmed through a sensitivity analysis, cross-validation of the data, and an overfitting analysis. Lastly, Receiver Operating Characteristic (ROC) curves are examined to identify the best-performing model. In all, the results establish that requirements quality is indeed predictive of end-system operational performance, and the resulting statistical model can guide requirements development based on the likelihood of successful operational performance.
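The evaluation pipeline the abstract describes, fitting the four classifier families and comparing them by cross-validated ROC AUC, can be sketched as follows. This is an illustrative sketch only, not the authors' code: the synthetic features stand in for the requirements quality metrics, and the binary label stands in for the operational test outcome drawn from the real ORD/OTR data.

```python
# Sketch: compare the four classifier families from the paper on
# synthetic "requirements quality metric" features, scoring each with
# 5-fold cross-validated ROC AUC (hypothetical data, not the study's).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Stand-in dataset: rows are programs, columns are quality metrics,
# label is pass/fail on operational testing.
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=4, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(probability=True, random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    # Cross-validation guards against overfitting, mirroring the
    # paper's validation steps; ROC AUC ranks the candidate models.
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

The same loop structure extends naturally to the sensitivity analysis the paper performs, by perturbing individual quality-metric columns and re-scoring.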
Dargan, J.L., Wasek, J.S. & Campos-Nanez, E. Systems performance prediction using requirements quality attributes classification. Requirements Eng 21, 553–572 (2016). https://doi.org/10.1007/s00766-015-0232-4