Applying Machine Learning to Risk Assessment in Software Projects

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

Risk management is one of the ten knowledge areas described in the Project Management Body of Knowledge (PMBOK), a guide intended to increase the chances of project success. Research on risk management in software projects has grown steadily in recent years, particularly work that applies machine learning techniques to identify the risk level of a project's risk factors before development begins, with the goal of improving the likelihood of project success. This paper presents the results of applying machine learning techniques to risk assessment in software projects. A Python application was developed and, using Scikit-learn, two machine learning models were created to predict risk impact and likelihood levels on a scale of 1 to 3. Both models were trained on software project risk data shared by a partner company of this project.

Different algorithms were tested to compare the results of high-performance but non-interpretable algorithms (e.g., Support Vector Machine) with those of interpretable algorithms (e.g., Random Forest), whose performance tends to be lower than that of their non-interpretable counterparts. Support Vector Machine and Naive Bayes were the best-performing algorithms: Support Vector Machine reached an accuracy of 69% in predicting impact levels, and Naive Bayes reached an accuracy of 63% in predicting likelihood levels. Although these accuracies are moderate, the results on other evaluation metrics (e.g., AUC, Precision) show the potential of the approach presented in this use case.
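The setup described in the abstract (one classifier per target, each predicting a level from 1 to 3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic because the partner company's dataset is not public, the feature count, the Gaussian variant of Naive Bayes, the RBF kernel, and the 70/30 split are all hypothetical choices, and the helper names `make_synthetic_risks`, `train_and_evaluate`, and `run_sketch` are invented for this example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def make_synthetic_risks(n_samples=300, n_features=6, seed=0):
    """Hypothetical stand-in for real risk records (assumption, not paper data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))

    def to_levels(score):
        # Bin a continuous score into three roughly equal groups: 1, 2, 3.
        low, high = np.quantile(score, [1 / 3, 2 / 3])
        return np.digitize(score, [low, high]) + 1

    noise = lambda: rng.normal(scale=0.5, size=n_samples)
    impact = to_levels(X[:, 0] + 0.5 * X[:, 1] + noise())
    likelihood = to_levels(X[:, 2] - X[:, 3] + noise())
    return X, impact, likelihood


def train_and_evaluate(model, X, y, seed=0):
    """Fit on a stratified 70/30 split and report two of the metrics named above."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_precision": precision_score(
            y_test, y_pred, average="macro", zero_division=0
        ),
    }


def run_sketch(seed=0):
    X, impact, likelihood = make_synthetic_risks(seed=seed)
    # One model per target; the pairing mirrors the best performers in the abstract.
    return {
        "impact": train_and_evaluate(SVC(kernel="rbf"), X, impact, seed),
        "likelihood": train_and_evaluate(GaussianNB(), X, likelihood, seed),
    }


results = run_sketch()
for target, metrics in results.items():
    print(target, {name: round(value, 3) for name, value in metrics.items()})
```

Because the data here is randomly generated, the printed scores say nothing about the 69% and 63% accuracies reported in the abstract; the sketch only shows the shape of a two-model pipeline and how accuracy and macro-averaged precision could be computed for a three-level target.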

This article is a result of the project PROMESSA - NORTE-01-0247-FEDER-039887, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).

Author information

Corresponding author

Correspondence to André Sousa.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sousa, A., Faria, J.P., Mendes-Moreira, J., Gomes, D., Henriques, P.C., Graça, R. (2021). Applying Machine Learning to Risk Assessment in Software Projects. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_7

  • DOI: https://doi.org/10.1007/978-3-030-93733-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93732-4

  • Online ISBN: 978-3-030-93733-1

  • eBook Packages: Computer Science (R0)
