Elsevier

Decision Support Systems

Volume 56, December 2013, Pages 439-449

Software project risk analysis using Bayesian networks with causality constraints

https://doi.org/10.1016/j.dss.2012.11.001

Abstract

Software development involves many risks, and risk management has become one of its key activities. Bayesian networks (BNs) have been explored as a tool for various risk management practices, including the risk management of software development projects. However, much of the present research on software risk analysis focuses on finding correlations between risk factors and project outcomes. Software project failures often result from insufficient and ineffective risk management. To achieve proper and effective risk control, risk planning should be based on risk causality, which provides more risk information for decision making. In this study, we propose a model using BNs with causality constraints (BNCC) for risk analysis of software development projects. Through unrestricted automatic causality learning from data collected on 302 software projects, we demonstrated that the proposed model can not only discover causalities that accord with expert knowledge but also predict better than other algorithms, such as logistic regression, C4.5, Naïve Bayes, and general BNs. This research presents the first causal discovery framework for risk causality analysis of software projects and develops a model using BNCC for application in software project risk management.

Introduction

The software industry has become one of the fastest-growing industries. The global software market is estimated to have a value of US$330 billion in 2014, an increase of 36.1% since 2009 (US$242.4 billion) [43]. However, software development remains a high-risk activity. The “CHAOS Summary 2009” from the Standish Group reported that the success rate of global (mainly U.S. and European) software projects is only 32% [55]. Much previous research has shown that the most important problem in software engineering is risk management, whereas technical issues are only secondary. For example, the Standish Group's report “EXTREME CHAOS” [54] summarized the recipe for software project success, the CHAOS 10, most of which are non-technical factors. Risk management is critical to project management; it is one of the 9 knowledge areas in project management as defined in the Project Management Body of Knowledge (PMBOK) [42] and one of the 25 key process areas as defined in the Capability Maturity Model Integration (CMMI) [9]. McConnell believes that risk management requires only 5% of the total project budget to obtain a 50–70% chance of avoiding time overrun [31]. These reasons highlight the urgency and feasibility of software project risk management.

In current practice, subjective analysis or expert judgment is one of the methods often used in project risk management [15]. It is based on the experience of an expert and is thus inevitably human-intensive and opaque [16]; it also generally lacks repeatability, as experience is not readily shared among different teams within an organization [35]. Therefore, it is crucial to develop intelligent modeling techniques that can provide more objective, repeatable, and visible decision-making support for risk management. Among the existing intelligent modeling techniques, the Bayesian network (BN) has attracted much attention (e.g., [1], [16], [28]) owing to its excellent ability to represent and reason with uncertainty.

Most research on software project risk analysis focuses on discovering correlations between risk factors and project outcomes [13], [24], [60]. At present, studies on BN-based risk analysis of software projects construct the network in one of two ways: (1) experts manually specify the network to reflect their knowledge [14], [16], or (2) the network is learned automatically from observational data [27]. Because the manual method is not based on observational data, it will certainly contain the experts' subjective bias. The existing automatic methods for BN structure learning cannot distinguish correlation from causality; for instance, the orientation of an edge does not necessarily indicate which risk should be controlled to change another risk. However, this limitation of existing algorithms is usually neglected, and the resulting models are not suitable for direct risk control.
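To illustrate why an edge orientation learned from observational data alone need not be causal, the following minimal Python sketch builds two two-node networks, A → B and B → A, that encode exactly the same joint distribution. The variables and all probabilities are made up for illustration and are not taken from the study's data.

```python
# Network 1: A -> B, with illustrative (made-up) parameters.
P_A = 0.3                                # P(A = True)
P_B_GIVEN_A = {True: 0.8, False: 0.2}    # P(B = True | A)


def joint_a_to_b(a, b):
    """P(A=a, B=b) factorized as P(A) * P(B | A)."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    return pa * pb


# Network 2: B -> A, with parameters derived from network 1 by Bayes' rule.
P_B = joint_a_to_b(True, True) + joint_a_to_b(False, True)
P_A_GIVEN_B = {
    True: joint_a_to_b(True, True) / P_B,
    False: joint_a_to_b(True, False) / (1 - P_B),
}


def joint_b_to_a(a, b):
    """P(A=a, B=b) factorized as P(B) * P(A | B)."""
    pb = P_B if b else 1 - P_B
    pa = P_A_GIVEN_B[b] if a else 1 - P_A_GIVEN_B[b]
    return pb * pa


# Largest disagreement between the two factorizations over all four outcomes.
MAX_DIFF = max(
    abs(joint_a_to_b(a, b) - joint_b_to_a(a, b))
    for a in (True, False)
    for b in (True, False)
)
```

Because both orientations fit any data set equally well, a score computed from the data cannot prefer one over the other; additional constraints are needed to decide which variable to intervene on.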

Software project practitioners have long complained about the difficulty of determining the real and direct risks that should guide the allocation of time and resources. Causality, rather than correlation, is therefore of greater interest to industry experts in software project risk planning because it can identify the causal factors that directly affect project outcomes. For example, the risk of “project involving the use of new technology” may be correlated with “immature technology” because a new technology is probably underdeveloped owing to its unidentified bugs. Nevertheless, a new technology is not necessarily an immature one. Whether we can mitigate the former risk by focusing only on the latter is not certain, and vice versa. In practice, we are advised to reduce the risks of using a new technology by conducting pilot investigations, preparing alternative technologies, and training team members. The National Aeronautics and Space Administration (NASA) considers that risk planning should first “make sure that the consequences and the sources of the risk are known” and “plan important risks first” [45]. The Software Engineering Institute of Carnegie Mellon University (CMU/SEI) requires the risk analysis process to satisfy the goal of “determining the source of risk”, i.e., “the root causes of the risk” [18]. Hence, in risk planning, analyses of the consequences and sources of risk are very important.

In this paper, we propose a novel framework for software project risk management using BNs with causality constraints (BNCC). Our primary objective is to perform a causality analysis between risk factors and project outcomes to achieve more effective risk control. Specifically, the analysis involves (1) introducing a new modeling framework for risk causality analysis to discover new causal relationships and validate existing ones (i.e., practical and/or academic expert knowledge) between risk factors and project outcomes based on historical data; and (2) constructing an empirical BN software project risk analysis model based on the framework, which can be readily used in risk planning.

Compared with other modeling algorithms such as C4.5 and Naïve Bayes, the proposed BNCC-based model has the following advantages: (1) strong interpretability — the constructed BN combines data with expert knowledge, depicts causal relationships between variables, and helps obtain better project outcomes or higher probability of project success; and (2) acceptable predictive accuracy — the final model in this study has better predictive power compared with other modeling algorithms, making the model suitable for capturing the statistical relationships between risk factors and project outcomes.

This study makes two important contributions. First, it proposes the first causal discovery framework for risk management of software projects, which builds an empirical model from real data and incorporates the causal discovery technique and expert knowledge. This risk modeling framework can be widely applied to other related domains. Second, it provides a BNCC model for risk analysis based on data from real industry software projects. The network has strong interpretability and can provide explicit knowledge (causal relationships between risk factors and project outcomes) of software projects. Subsequently, such knowledge can help in conducting effective risk analysis and further risk planning, which will result in a better implementation of software project risk management.

This paper is organized as follows. Section 2 provides a review of related literature. Section 3 describes the proposed risk model and the modeling concept. Section 4 presents the experimental results. Finally, Section 5 concludes and discusses limitations of the study.

Risk management of software projects

Risk management was first introduced to software project management by Boehm [3] and Charette [6]. According to the “IEEE Standard for Software Project Management Plans” [22], a software project is defined as a series of technical and managerial work activities that should meet the terms and conditions listed in the project agreement. A successful software project usually means that the project can be completed within the budget and given time, and meet the customers' demand for high-quality and

Causality in BNs

BNs [10], which are based on graph theory and probability theory, are a widely accepted tool for visualizing uncertain knowledge and performing efficient reasoning, given the variables and their joint probability distribution. A BN consists of two parts: a directed acyclic graph (DAG), which indicates the conditional (in)dependence relationships among the variables (Fig. 1), and a set of conditional probability tables (CPTs), which represent the conditional probability distributions among the variables.
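As a minimal illustration of these two parts, the following Python sketch encodes a three-node network and computes a joint probability as the product of each node's CPT entry given its parents. The variable names and all probabilities are hypothetical and are not taken from the study's data or from Fig. 1.

```python
from itertools import product

# Hypothetical DAG: each node maps to the tuple of its parents.
# Structure: new_technology -> schedule_overrun <- staff_turnover
PARENTS = {
    "new_technology": (),
    "staff_turnover": (),
    "schedule_overrun": ("new_technology", "staff_turnover"),
}

# Hypothetical CPTs: P(node = True | parent values). Numbers are illustrative.
CPT = {
    "new_technology": {(): 0.3},
    "staff_turnover": {(): 0.2},
    "schedule_overrun": {
        (False, False): 0.1,
        (False, True): 0.5,
        (True, False): 0.4,
        (True, True): 0.8,
    },
}


def joint_probability(assignment):
    """P(assignment) as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_true = CPT[node][parent_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def marginal(node, value=True):
    """P(node = value), obtained by summing the joint over all assignments."""
    names = list(PARENTS)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[node] == value:
            total += joint_probability(assignment)
    return total
```

Brute-force enumeration is used here only because the example is tiny; practical BN tools rely on more efficient inference algorithms.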

The faithfulness condition of Bayesian graphical theory assumes that a

Software project risk model

Measurements of risks in software projects have been studied since the 1980s, and various risk classification frameworks, dimensions, and models have been proposed. Some of these have been used for special software projects, such as e-commerce [36] and customer relationship management systems [44]. However, most of the existing studies [2], [4], [46], [52] lack a comprehensive and systematic framework. Boehm [4] developed a top 10 risk identification checklist. Although these risk

Discussion

Some issues deserve further discussion, with regard to the following aspects.

  1) Why not conduct an intervention experiment? In general, an intervention experiment is more effective than inference based on observational data, but it entails high costs. It is more appropriate and easier to perform if the goal is to discover causality at the level of a software module or code, because manipulating a module or modifying some lines of code is easy. In contrast, if the goal is to discover risk causalities of a

Conclusions and limitations

Discovering causality between risk factors and project outcomes is important for better risk analysis and risk planning. This study proposes a V-structure discovery algorithm and establishes a BN with causality constraints. The proposed risk modeling framework is a completely new approach, suitable for solving similar risk management problems in other fields. We also provide an application case of software project risk analysis and control.
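The study's V-structure discovery algorithm is not reproduced in this excerpt. As a hedged illustration of the general idea only, the Python sketch below applies the standard collider-orientation rule from constraint-based structure learning: for an unshielded triple X - Z - Y (X and Y not adjacent), orient X → Z ← Y when Z is absent from the set that separated X and Y. The skeleton, the separating sets, and the risk names are hypothetical inputs, assumed to come from conditional independence tests run beforehand.

```python
from itertools import combinations


def find_v_structures(skeleton, sepsets):
    """Return sorted (x, z, y) triples to be oriented as x -> z <- y.

    skeleton: dict mapping each node to the set of its neighbours (undirected).
    sepsets:  dict mapping frozenset({x, y}) to the set of variables found to
              make x and y conditionally independent (hypothetical input).
    """
    colliders = []
    for z, neighbours in skeleton.items():
        for x, y in combinations(sorted(neighbours), 2):
            if y in skeleton[x]:
                continue  # shielded triple: x and y are adjacent
            # Missing sepsets are treated as empty, so callers should supply
            # one for every non-adjacent pair.
            if z not in sepsets.get(frozenset((x, y)), set()):
                colliders.append((x, z, y))
    return sorted(colliders)


# Hypothetical skeleton over four illustrative risk variables.
SKELETON = {
    "requirements_change": {"schedule_overrun"},
    "staff_turnover": {"schedule_overrun"},
    "schedule_overrun": {"requirements_change", "staff_turnover", "project_failure"},
    "project_failure": {"schedule_overrun"},
}
SEPSETS = {
    frozenset(("requirements_change", "staff_turnover")): set(),
    frozenset(("requirements_change", "project_failure")): {"schedule_overrun"},
    frozenset(("staff_turnover", "project_failure")): {"schedule_overrun"},
}
V_STRUCTURES = find_v_structures(SKELETON, SEPSETS)
```

In this made-up example the only collider is at schedule_overrun, with requirements_change and staff_turnover as its parents; the edge to project_failure stays unoriented by this rule alone.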

A large sample data was

Acknowledgements

This research was partly supported by the National Natural Science Foundation of China (71271061, 70801020 and 61100148), the Science and Technology Planning Project of Guangdong Province, China (2010B010600034), the Business Intelligence Key Team of Guangdong University of Foreign Studies (TD1202), and the Natural Science Foundation of Guangdong Province (S2011040004804).

References (63)

  • J. Jiang et al., Software development risks to project effectiveness, Journal of Systems and Software (2000)
  • E.J.M. Lauría et al., A Bayesian belief network for IT implementation decision support, Decision Support Systems (2006)
  • E. Lauría et al., A methodology for developing Bayesian networks: an application to information technology (IT) implementation, European Journal of Operational Research (2007)
  • E. Lee et al., Large engineering project risk management using a Bayesian belief network, Expert Systems with Applications (2009)
  • M. Moreno García et al., An association rule mining method for estimating the impact of project management policies on software quality, development time and effort, Expert Systems with Applications (2008)
  • S. Nadkarni et al., A causal mapping approach to constructing Bayesian networks, Decision Support Systems (2004)
  • E. Ngai et al., Fuzzy decision support system for risk analysis in e-commerce development, Decision Support Systems (2005)
  • J. Pearl et al., A theory of inferred causation, Studies in Logic and the Foundations of Mathematics (1995)
  • T. Roh et al., The priority factor model for customer relationship management system success, Expert Systems with Applications (2005)
  • I. Stamelos et al., On the use of Bayesian belief networks for the prediction of software productivity, Information and Software Technology (2003)
  • F. Ülengin et al., An integrated transportation decision support system for transportation policy decisions: the case of Turkey, Transportation Research Part A: Policy and Practice (2007)
  • L. Uusitalo, Advantages and challenges of Bayesian networks in environmental modelling, Ecological Modelling (2007)
  • M.A.J. van Gerven et al., A generic qualitative characterization of independence of causal influence, International Journal of Approximate Reasoning (2008)
  • L. Wallace et al., Understanding software project risk: a cluster analysis, Information Management (2004)
  • Z. Xu et al., Application of fuzzy expert systems in assessing operational risk of software, Information and Software Technology (2003)
  • H. Barki et al., Toward an assessment of software development risk, Journal of Management Information Systems (1993)
  • B.W. Boehm, Software Risk Management (1989)
  • B.W. Boehm, Software risk management: principles and practices, IEEE Software (1991)
  • R. Charette, Software Engineering: Risk Analysis and Management (1989)
  • C. Chow et al., Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory (2002)
  • CMMI Product Team, Capability Maturity Model Integration (CMMI SM) Version 1.1, CMMI for Systems Engineering, Software Engineering, Integrated Product and Process Development, and Supplier Sourcing (CMMI-SE/SW/IPPD/SS, V1.1) (2002)

    Dr. Yong Hu is currently an Associate Professor and Chair in the Department of E-commerce, and Director of Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-Sen University. He received his B.Sc in Computer Science, M.Phil and Ph.D. in Management Information Systems from Sun Yat-Sen University. His research interests are in the areas of business intelligence, quantitative investment, software project risk management, e-commerce and decision support systems. He has published works in a number of journals and conferences such as DSS, ESWA and IEEE ICDM. Dr. Hu's research is supported by the National Natural Science Foundation, the Science and Technology Planning Project of Guangdong Province.

    Xiangzhou Zhang is a Ph.D. student at Sun Yat-sen University and works as an assistant researcher in the Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-sen University. He received his B.S. degree in Computer Science from Sun Yat-Sen University and his M.S. degree in Management from Guangdong University of Foreign Studies. His research interests include data mining, quantitative investment, software project risk management, and business intelligence.

    Prof. Eric Ngai is a Professor in the Department of Management and Marketing at The Hong Kong Polytechnic University. His current research interests are in the areas of E-commerce, Supply Chain Management, Decision Support Systems and RFID Technology and Applications. He has published papers in a number of international journals including MIS Quarterly, Journal of Operations Management, Decision Support Systems, IEEE Transactions on Systems, Man and Cybernetics, Information & Management, Production & Operations Management, and others. He is an Associate Editor of European Journal of Information Systems and serves on the editorial boards of three international journals. Prof. Ngai has attained an h-index of 22 and received 1490 citations (ISI Web of Science).

    Dr. Ruichu Cai received his B.S. in Applied Mathematics and Ph.D. in Computer Science from South China University of Technology in 2005 and 2010, respectively. He is currently an Assistant Professor in the Department of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China, and the State Key Laboratory for Novel Software Technology, Nanjing University, P.R. China. He was a visiting student at the National University of Singapore in 2007–2009. His research interests cover a variety of topics including data mining, decision support systems, causal inference, association rule mining, and feature selection. He has published in a number of journals and conferences, such as Pattern Recognition, IEEE TKDE, and SIGMOD. Dr. Cai's research is supported by the National Natural Science Foundation.

    Dr. Mei Liu is currently an Assistant Professor in the Department of Computer Science at New Jersey Institute of Technology. She received her Ph.D. degree in computer science from the University of Kansas, Lawrence, USA, and completed her postdoctoral training as an NIH-NLM research fellow in the Department of Biomedical Informatics at Vanderbilt University, Nashville, USA. Her research interests include data mining, machine learning, text mining, decision support systems, quantitative investment, and medical informatics. She has published a number of papers in conferences and journals such as Bioinformatics, JAMIA, ESWA, EURASIP Journal on Applied Signal Processing, BMC Bioinformatics, PLoS ONE, and IEEE ICDM.
