Software project risk analysis using Bayesian networks with causality constraints

doi:10.1016/j.dss.2012.11.001

Decision Support Systems

Volume 56, December 2013, Pages 439-449

https://doi.org/10.1016/j.dss.2012.11.001

Abstract

Software development involves many risks, and risk management has become one of its key activities. Bayesian networks (BNs) have been explored as a tool for various risk management practices, including the risk management of software development projects. However, much of the present research on software risk analysis focuses on finding correlations between risk factors and project outcomes. Software project failures are often a result of insufficient and ineffective risk management. To obtain proper and effective risk control, risk planning should be based on risk causality, which can provide more risk information for decision making. In this study, we propose a model using BNs with causality constraints (BNCC) for risk analysis of software development projects. Through unrestricted automatic causality learning from data collected on 302 software projects, we demonstrate that the proposed model can not only discover causalities in accordance with expert knowledge but also perform better in prediction than other algorithms, such as logistic regression, C4.5, Naïve Bayes, and general BNs. This research presents the first causal discovery framework for risk causality analysis of software projects and develops a model using BNCC for application in software project risk management.

Introduction

The software industry has become one of the fastest-growing industries. The global software market is estimated to have a value of US$330 billion in 2014, an increase of 36.1% since 2009 (US$242.4 billion) [43]. However, software development is still a high-risk activity. The “CHAOS Summary 2009” from the Standish Group reported that the success rate of global (mainly U.S. and European) software projects is only 32% [55]. Much previous research has shown that the most important problem in software engineering is risk management, whereas technical issues are only secondary. For example, the Standish Group's report “EXTREME CHAOS” [54] summarized the recipe for software project success, that is, the CHAOS 10, most of which are non-technical factors. Risk management is critical to project management; it is one of the nine knowledge areas in project management as defined in the Project Management Body of Knowledge (PMBOK) [42] and one of the 25 key process areas as defined in the Capability Maturity Model Integration (CMMI) [9]. McConnell believes that devoting only 5% of the total project budget to risk management is enough to obtain a 50–70% chance of avoiding time overrun [31]. These reasons highlight the urgency and feasibility of software project risk management.

In current practice, subjective analysis or expert judgment is one of the methods often used in project risk management [15]. It is based on the experience of an expert and is thus inevitably human-intensive and obscure [16]; likewise, it generally lacks repeatability because experience is not readily shared among different teams within an organization [35]. Therefore, it is crucial to develop intelligent modeling techniques that can provide more objective, repeatable, and visible decision-making support for risk management. Among the existing intelligent modeling techniques, the Bayesian network (BN) has attracted much attention [1], [16], [28] owing to its excellent ability to represent and reason with uncertainties.

Most research on software project risk analysis focuses on the discovery of correlations between risk factors and project outcomes [13], [24], [60]. At present, studies on BN-based risk analysis of software projects construct the network in one of two ways: (1) experts manually specify the network to reflect expert knowledge [14], [16], or (2) the network is learned automatically from observational data [27]. Because the manual method is not based on observational data, it inevitably carries the experts' subjective bias. The existing automatic methods for BN learning cannot distinguish correlation from causality. For instance, the orientation of an edge does not necessarily indicate which risk should be controlled to change another risk. However, this limitation of existing algorithms is usually neglected, so such models are not suitable for direct risk control.
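The point about edge orientation can be made concrete with a small sketch. It assumes a made-up joint distribution over two binary risk factors A and B (the numbers and the helper names `fit_two_node_bn` and `implied_joint` are illustrative, not from the study) and fits both the network A → B and the network B → A. Both reproduce the observed joint distribution exactly, so goodness of fit on observational data alone cannot tell which risk to control.

```python
# Hypothetical joint distribution P(A, B) over two binary risk factors.
# Keys are (a, b); the numbers are made up for illustration.
JOINT_AB = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}


def fit_two_node_bn(joint, parent):
    """Fit the network parent -> child exactly to `joint`.

    `parent` is the index (0 for A, 1 for B) of the parent variable.
    Returns (P(parent), P(child | parent)).
    """
    child = 1 - parent
    p_parent = {v: 0.0 for v in (0, 1)}
    for key, p in joint.items():
        p_parent[key[parent]] += p
    p_child_given_parent = {
        (key[child], key[parent]): p / p_parent[key[parent]]
        for key, p in joint.items()
    }
    return p_parent, p_child_given_parent


def implied_joint(p_parent, p_child_given_parent, parent):
    """Rebuild P(A, B) from the fitted network parent -> child."""
    child = 1 - parent
    rebuilt = {}
    for a in (0, 1):
        for b in (0, 1):
            key = (a, b)
            rebuilt[key] = (
                p_parent[key[parent]]
                * p_child_given_parent[(key[child], key[parent])]
            )
    return rebuilt


# Fit both orientations and compare the distributions they imply.
a_to_b = implied_joint(*fit_two_node_bn(JOINT_AB, parent=0), parent=0)
b_to_a = implied_joint(*fit_two_node_bn(JOINT_AB, parent=1), parent=1)

for key in JOINT_AB:
    print(key, round(a_to_b[key], 6), round(b_to_a[key], 6))
```

The two orientations are indistinguishable here because they belong to the same Markov equivalence class; additional structure or assumptions are needed before an edge can be read causally.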

Software project practitioners have long complained about the difficulty of determining the real and direct risks to guide the allocation of time and resources. Thus causality, rather than correlation, is of greater interest to industry experts in software project risk planning because it can determine the causal factors that directly affect project outcomes. For example, the risk of “project involving the use of new technology” may be correlated with “immature technology” because a new technology is probably underdeveloped owing to its unidentified bugs. Nevertheless, a new technology is not necessarily an immature technology. Whether we can mitigate the former risk by focusing only on the latter is not certain, and vice versa. In practice, the advice for reducing the risks of using a new technology is to refer to pilot investigations, prepare alternative technology, and train team members. The National Aeronautics and Space Administration (NASA) considers that risk planning should first “make sure that the consequences and the sources of the risk are known” and “plan important risks first” [45]. The Software Engineering Institute of Carnegie Mellon University (CMU/SEI) requires the risk analysis process to satisfy the goal of “determining the source of risk”, i.e., “the root causes of the risk” [18]. Hence, in risk planning, analyses of the consequences and risk sources are very important.

In this paper, we propose a novel framework for software project risk management using BNs with causality constraints (BNCC). Our primary objective is to perform a causality analysis between risk factors and project outcomes to achieve more effective risk control. Specifically, the analysis involves (1) introducing a new modeling framework for risk causality analysis to discover new causal relationships and validate existing ones (i.e., practical and/or academic expert knowledge) between risk factors and project outcomes based on historical data; and (2) constructing an empirical BN software project risk analysis model based on the framework, which can be readily used in risk planning.

Compared with other modeling algorithms such as C4.5 and Naïve Bayes, the proposed BNCC-based model has the following advantages: (1) strong interpretability — the constructed BN combines data with expert knowledge, depicts causal relationships between variables, and helps obtain better project outcomes or higher probability of project success; and (2) acceptable predictive accuracy — the final model in this study has better predictive power compared with other modeling algorithms, making the model suitable for capturing the statistical relationships between risk factors and project outcomes.

This study makes two important contributions. First, it proposes the first causal discovery framework for risk management of software projects, which builds an empirical model from real data and incorporates the causal discovery technique and expert knowledge. This risk modeling framework can be widely applied to other related domains. Second, it provides a BNCC model for risk analysis based on data from real industry software projects. The network has strong interpretability and can provide explicit knowledge (causal relationships between risk factors and project outcomes) of software projects. Subsequently, such knowledge can help in conducting effective risk analysis and further risk planning, which will result in a better implementation of software project risk management.

This paper is organized as follows. Section 2 provides a review of related literature. Section 3 describes the proposed risk model and the modeling concept. Section 4 presents the experimental results. Finally, Section 5 concludes and discusses limitations of the study.

Section snippets

Risk management of software projects

Risk management was first introduced to software project management by Boehm [3] and Charette [6]. According to the “IEEE Standard for Software Project Management Plans” [22], a software project is defined as a series of technical and managerial work activities that should meet the terms and conditions listed in the project agreement. A successful software project usually means that the project is completed within the budget and the given time and meets the customers' demand for high-quality and

Causality in BNs

BNs [10], which are based on graph and probability theories, are a widely accepted tool that can visualize uncertain knowledge and perform efficient reasoning, given the variables and their joint probability distributions. A BN consists of two parts: a directed acyclic graph (DAG), which indicates the conditional (in)dependence relationships among the variables (Fig. 1), and a set of conditional probability tables (CPTs), which represent the conditional probability distributions of the variables.
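As a minimal illustration of these two parts, the sketch below encodes a three-node DAG and its CPTs as plain Python dictionaries and evaluates the joint probability through the standard factorization P(x1, ..., xn) = ∏ P(xi | parents(xi)). The node names and every probability value are hypothetical and are not taken from the study's data; inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# DAG: each node maps to the tuple of its parents.
# Hypothetical structure: two risk factors are parents of the project outcome.
parents = {
    "NewTech": (),
    "StaffTurnover": (),
    "Failure": ("NewTech", "StaffTurnover"),
}

# CPTs: for each node, map a tuple of parent values to P(node = True | parents).
# All numbers are made up for illustration.
cpt = {
    "NewTech": {(): 0.3},
    "StaffTurnover": {(): 0.2},
    "Failure": {
        (False, False): 0.1,
        (False, True): 0.4,
        (True, False): 0.5,
        (True, True): 0.8,
    },
}


def joint(assignment):
    """P(assignment) as the product of P(node | parents) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query = True | evidence) by summing the joint over all assignments."""
    names = list(parents)
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in evidence.items()):
            pj = joint(a)
            denominator += pj
            if a[query]:
                numerator += pj
    return numerator / denominator


# Prior probability of failure, then the diagnostic query after observing failure.
p_failure = posterior("Failure", {})
p_newtech_given_failure = posterior("NewTech", {"Failure": True})
print(round(p_failure, 4), round(p_newtech_given_failure, 4))
```

With these made-up numbers, the prior probability of failure is 0.28, and observing a failure raises the probability that new technology was involved from 0.3 to 0.6.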

The faithfulness condition of Bayesian graphical theory assumes that a

Software project risk model

Measurements of risks in software projects have been studied since the 1980s, and various risk classification frameworks, dimensions, and models have been proposed. Some of these have been used for special software projects, such as e-commerce [36] and customer relationship management systems [44], among others. However, most of the existing studies [2], [4], [46], [52] lack a comprehensive and systematic framework. Boehm [4] developed a top 10 risk identification checklist. Although these risk

Discussion

Some issues deserve further discussion, with regard to the following aspects.

1) Why not conduct an intervention experiment? In general, an intervention experiment is more effective than inference based on observational data, but it entails high costs. It is more appropriate and easier to perform if the goal is to discover causality at the level of a software module or code, because manipulating a module or modifying some lines of code is easy. In contrast, if the goal is to discover risk causalities of a

Conclusions and limitations

To perform better risk analysis and risk planning, it is important to discover the causality between risk factors and project outcomes. This study proposes a V-structure discovery algorithm and establishes a BN with causality constraints. The proposed risk modeling framework is a new approach that is also suitable for similar risk management problems in other fields. We also provide an application case of software project risk analysis and control.
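The snippet above does not reproduce the V-structure discovery algorithm itself, so the following is only a generic sketch of the statistical signature that makes a V-structure (a collider X → Z ← Y) identifiable from observational data: X and Y are marginally independent but become dependent once Z is known. The distribution parameters, the function names, and the threshold `eps` are all hypothetical, and exact probabilities are used in place of the statistical independence tests a real algorithm would run on samples.

```python
from itertools import product
from math import log

# Hypothetical collider X -> Z <- Y with made-up parameters.
P_X = 0.3                      # P(X = 1)
P_Y = 0.4                      # P(Y = 1)
P_Z = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.95}  # P(Z = 1 | X, Y)


def collider_joint():
    """Exact joint table P(x, y, z) for the collider defined above."""
    table = {}
    for x, y, z in product((0, 1), repeat=3):
        px = P_X if x else 1 - P_X
        py = P_Y if y else 1 - P_Y
        pz = P_Z[(x, y)] if z else 1 - P_Z[(x, y)]
        table[(x, y, z)] = px * py * pz
    return table


def marginalize(table, keep):
    """Sum the joint table down to the variable positions listed in `keep`."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def mutual_information(table):
    """I(X; Y) in nats; zero exactly when X and Y are independent."""
    pxy = marginalize(table, (0, 1))
    px = marginalize(table, (0,))
    py = marginalize(table, (1,))
    return sum(
        p * log(p / (px[(x,)] * py[(y,)]))
        for (x, y), p in pxy.items() if p > 0
    )


def conditional_mutual_information(table):
    """I(X; Y | Z) in nats; zero exactly when X and Y are independent given Z."""
    pz = marginalize(table, (2,))
    pxz = marginalize(table, (0, 2))
    pyz = marginalize(table, (1, 2))
    return sum(
        p * log(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), p in table.items() if p > 0
    )


def looks_like_v_structure(table, eps=1e-6):
    """Collider signature: independent marginally, dependent given Z."""
    return (
        abs(mutual_information(table)) < eps
        and conditional_mutual_information(table) > eps
    )


table = collider_joint()
print(looks_like_v_structure(table))
```

Because a chain (X → Z → Y) or a fork (X ← Z → Y) shows the opposite pattern, dependence that vanishes given Z, the collider is the one three-node pattern whose edge directions are determined by independence relations alone.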

A large sample data was

Acknowledgements

This research was partly supported by the National Natural Science Foundation of China (71271061, 70801020 and 61100148), the Science and Technology Planning Project of Guangdong Province, China (2010B010600034), the Business Intelligence Key Team of Guangdong University of Foreign Studies (TD1202), and the Natural Science Foundation of Guangdong Province (S2011040004804).

References (63)

C. Bai. Bayesian network based software reliability prediction with an operational profile. Journal of Systems and Software (2005)
R. Cai et al. BASSUM: a Bayesian semi-supervised method for classification feature selection. Pattern Recognition (2011)
J. Cheng et al. Learning Bayesian networks from data: an information-theory based approach. Artificial Intelligence (2002)
L. de Campos et al. Bayesian network learning algorithms using structural restrictions. International Journal of Approximate Reasoning (2007)
J. Drew Procaccino et al. Case study: factors for early prediction of software development success. Information and Software Technology (2002)
S. Du et al. Attention-shaping tools, expertise, and perceived control in IT project risk assessment. Decision Support Systems (2007)
C. Fan et al. BBN-based software project risk management. Journal of Systems and Software (2004)
C. Fang et al. A simulation-based risk network model for decision support in project risk management. Decision Support Systems (2012)
W. Han et al. An empirical analysis of risk components and performance on software projects. Journal of Systems and Software (2007)
S. Huang et al. Exploring the relationship between software project duration and risk exposure: a cluster analysis. Information & Management (2008)
J. Jiang et al. Software development risks to project effectiveness. Journal of Systems and Software (2000)
E.J.M. Lauría et al. A Bayesian belief network for IT implementation decision support. Decision Support Systems (2006)
E. Lauría et al. A methodology for developing Bayesian networks: an application to information technology (IT) implementation. European Journal of Operational Research (2007)
E. Lee et al. Large engineering project risk management using a Bayesian belief network. Expert Systems with Applications (2009)
M. Moreno García et al. An association rule mining method for estimating the impact of project management policies on software quality, development time and effort. Expert Systems with Applications (2008)
S. Nadkarni et al. A causal mapping approach to constructing Bayesian networks. Decision Support Systems (2004)
E. Ngai et al. Fuzzy decision support system for risk analysis in e-commerce development. Decision Support Systems (2005)
J. Pearl et al. A theory of inferred causation. Studies in Logic and the Foundations of Mathematics (1995)
T. Roh et al. The priority factor model for customer relationship management system success. Expert Systems with Applications (2005)
I. Stamelos et al. On the use of Bayesian belief networks for the prediction of software productivity. Information and Software Technology (2003)
F. Ülengin et al. An integrated transportation decision support system for transportation policy decisions: the case of Turkey. Transportation Research Part A: Policy and Practice (2007)
L. Uusitalo. Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling (2007)
M.A.J. van Gerven et al. A generic qualitative characterization of independence of causal influence. International Journal of Approximate Reasoning (2008)
L. Wallace et al. Understanding software project risk: a cluster analysis. Information & Management (2004)
Z. Xu et al. Application of fuzzy expert systems in assessing operational risk of software. Information and Software Technology (2003)
H. Barki et al. Toward an assessment of software development risk. Journal of Management Information Systems (1993)
B.W. Boehm. Software Risk Management (1989)
B.W. Boehm. Software risk management: principles and practices. IEEE Software (1991)
R. Charette. Software Engineering: Risk Analysis and Management (1989)
C. Chow et al. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory (2002)
CMMI Product Team. Capability Maturity Model Integration (CMMI℠) Version 1.1, CMMI for Systems Engineering, Software Engineering, Integrated Product and Process Development, and Supplier Sourcing (CMMI-SE/SW/IPPD/SS, V1.1) (2002)


Dr. Yong Hu is currently an Associate Professor and Chair in the Department of E-commerce, and Director of the Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-sen University. He received his B.Sc. in Computer Science and his M.Phil. and Ph.D. in Management Information Systems from Sun Yat-sen University. His research interests are in the areas of business intelligence, quantitative investment, software project risk management, e-commerce and decision support systems. He has published works in a number of journals and conferences such as DSS, ESWA and IEEE ICDM. Dr. Hu's research is supported by the National Natural Science Foundation and the Science and Technology Planning Project of Guangdong Province.

Xiangzhou Zhang is a Ph.D. student at Sun Yat-sen University and works as an assistant researcher in the Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-sen University. He received his B.S. degree in Computer Science from Sun Yat-sen University and his M.S. degree in Management from Guangdong University of Foreign Studies. His research interests include data mining, quantitative investment, software project risk management, and business intelligence.

Prof. Eric Ngai is a Professor in the Department of Management and Marketing at The Hong Kong Polytechnic University. His current research interests are in the areas of E-commerce, Supply Chain Management, Decision Support Systems and RFID Technology and Applications. He has published papers in a number of international journals including MIS Quarterly, Journal of Operations Management, Decision Support Systems, IEEE Transactions on Systems, Man and Cybernetics, Information & Management, Production & Operations Management, and others. He is an Associate Editor of European Journal of Information Systems and serves on the editorial boards of three international journals. Prof. Ngai has attained an h-index of 22 and received 1490 citations according to ISI Web of Science.

Dr. Ruichu Cai received his B.S. in Applied Mathematics and Ph.D. in Computer Science from South China University of Technology in 2005 and 2010, respectively. He is currently an Assistant Professor in the Department of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China and the State Key Laboratory for Novel Software Technology, Nanjing University, P.R. China. He was a visiting student at the National University of Singapore in 2007–2009. His research interests cover a variety of topics including data mining, decision support systems, causal inference, association rule mining, and feature selection. He has published in a number of journals and conferences, such as Pattern Recognition, IEEE TKDE, and SIGMOD. Dr. Cai's research is supported by the National Natural Science Foundation.

Dr. Mei Liu is currently an Assistant Professor in the Department of Computer Science at New Jersey Institute of Technology. She received her Ph.D. degree in computer science from the University of Kansas, Lawrence, USA and completed her postdoctoral training as an NIH-NLM research fellow in the Department of Biomedical Informatics at Vanderbilt University, Nashville, USA. Her research interests include data mining, machine learning, text mining, decision support systems, quantitative investment, and medical informatics. She has published a number of papers in conferences and journals such as Bioinformatics, JAMIA, ESWA, EURASIP Journal on Applied Signal Processing, BMC Bioinformatics, PLoS ONE, and IEEE ICDM.
