Abstract
Risk management is one of the ten knowledge areas discussed in the Project Management Body of Knowledge (PMBOK), a guide whose practices are intended to increase the chances of project success. Research on applying risk management to software projects has grown steadily in recent years, particularly work that uses machine learning techniques to estimate the risk level of a project's risk factors before development begins, with the goal of improving the likelihood of project success. This paper presents the results of applying machine learning techniques to risk assessment in software projects. A Python application was developed and, using Scikit-learn, two machine learning models were created to predict risk impact and risk likelihood levels on a scale of 1 to 3. Both models were trained on software project risk data shared by a partner company of this project.
Different algorithms were tested to compare the results of high-performance but non-interpretable algorithms (e.g., Support Vector Machine) with those of interpretable algorithms (e.g., Random Forest), whose performance tends to be lower than that of their non-interpretable counterparts. Support Vector Machine and Naive Bayes were the best-performing algorithms: Support Vector Machine reached an accuracy of 69% in predicting impact levels, and Naive Bayes reached an accuracy of 63% in predicting likelihood levels. However, the results on other evaluation metrics (e.g., AUC, Precision) show the potential of the approach presented in this use case.
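The comparison described above can be illustrated with a minimal Scikit-learn sketch. It is not the authors' application: the partner company's risk data is not available here, so `make_classification` generates synthetic stand-in data, and the helper names (`make_synthetic_risks`, `candidate_models`, `compare`), the hyperparameters, and the choice of `GaussianNB` as the Naive Bayes variant are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_risks(seed):
    """Hypothetical stand-in for the risk data: 8 features, levels 1 to 3."""
    X, y = make_classification(
        n_samples=300,
        n_features=8,
        n_informative=4,
        n_classes=3,
        random_state=seed,
    )
    return X, y + 1  # shift labels 0..2 to the 1..3 scale used in the paper


def candidate_models():
    """One candidate per algorithm family named in the abstract (assumed settings)."""
    return {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0)),
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }


def compare(X, y, cv=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidate_models().items()
    }


# Two separate models, one per target, mirroring the impact/likelihood split.
results = {}
for target, seed in (("impact", 0), ("likelihood", 1)):
    X, y = make_synthetic_risks(seed)
    results[target] = compare(X, y)
    for name, accuracy in results[target].items():
        print(f"{target:10s} {name:13s} accuracy={accuracy:.2f}")
```

Because the data is synthetic, the printed accuracies say nothing about the 69% and 63% figures reported above; the sketch only shows the shape of a per-target, per-algorithm comparison.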
This article is a result of the project PROMESSA - NORTE-01-0247-FEDER-039887, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sousa, A., Faria, J.P., Mendes-Moreira, J., Gomes, D., Henriques, P.C., Graça, R. (2021). Applying Machine Learning to Risk Assessment in Software Projects. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_7
DOI: https://doi.org/10.1007/978-3-030-93733-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93732-4
Online ISBN: 978-3-030-93733-1
eBook Packages: Computer Science, Computer Science (R0)