Software project risk analysis using Bayesian networks with causality constraints
Introduction
The software industry has become one of the fastest-growing industries. The global software market is estimated to reach US$330 billion in 2014, an increase of 36.1% over 2009 (US$242.4 billion) [43]. However, software development is still a high-risk activity. The “CHAOS Summary 2009” from the Standish Group reported that the success rate of global (mainly U.S. and European) software projects is only 32% [55]. Much previous research has shown that the most important problem in software engineering is risk management, whereas technical issues are only secondary. For example, the Standish Group's report “EXTREME CHAOS” [54] summarized the recipe for software project success, namely the CHAOS 10, most of which are non-technical factors. Risk management is critical to project management: it is one of the nine knowledge areas of project management defined in the Project Management Body of Knowledge (PMBOK) [42] and one of the 25 key process areas defined in the Capability Maturity Model Integration (CMMI) [9]. McConnell estimates that risk management requires only 5% of the total project budget to obtain a 50–70% chance of avoiding a time overrun [31]. These figures highlight both the urgency and the feasibility of software project risk management.
In current practice, subjective analysis or expert judgment is one of the methods most often used in project risk management [15]. Because it relies on the experience of individual experts, it is inevitably human-intensive and opaque [16]; it also generally lacks repeatability, as experience is not readily shared among different teams within an organization [35]. It is therefore crucial to develop intelligent modeling techniques that provide more objective, repeatable, and transparent decision-making support for risk management. Among the existing intelligent modeling techniques, the Bayesian network (BN) has attracted much attention (e.g., [1], [16], [28]) because of its strong ability to represent and reason with uncertainty.
Most research on software project risk analysis focuses on discovering correlations between risk factors and project outcomes [13], [24], [60]. At present, studies on BN-based risk analysis of software projects construct the network in one of two ways: (1) experts specify the network manually to reflect expert knowledge [14], [16], or (2) the network is learned automatically from observational data [27]. Because the manual method is not based on observational data, it is inevitably subject to expert bias. The existing automatic BN structure-learning methods, on the other hand, cannot distinguish correlation from causality: the orientation of a learned edge does not necessarily indicate which risk should be controlled to change another. This limitation of existing algorithms is, however, usually neglected, and models built in this way are not suitable for direct risk control.
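The point that an edge orientation learned from observational data need not reflect causation can be illustrated with a minimal sketch, assuming two hypothetical binary risk indicators and invented counts (the variable names, the data, and the `log_likelihood` helper are illustrative only, not taken from this study). Both orientations A -> B and B -> A reproduce the empirical joint distribution exactly, so they attain the same maximum log-likelihood; because they also have the same number of free parameters (three), a penalized score such as BIC cannot separate them either.

```python
import math
from collections import Counter

# Hypothetical observations of two binary risk indicators (1 = present):
#   A = "project uses new technology", B = "technology is immature".
# The counts are invented purely for illustration.
data = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 15 + [(0, 0)] * 45


def log_likelihood(pairs, parent_first: bool) -> float:
    """Maximum-likelihood log-likelihood of a two-node BN.

    parent_first=True fits A -> B, i.e. P(A) * P(B | A);
    parent_first=False fits B -> A, i.e. P(B) * P(A | B).
    """
    if not parent_first:
        pairs = [(b, a) for a, b in pairs]  # swap roles so the parent comes first
    n = len(pairs)
    parent_counts = Counter(p for p, _ in pairs)
    joint_counts = Counter(pairs)
    ll = 0.0
    for (p, _child), k in joint_counts.items():
        p_parent = parent_counts[p] / n   # MLE of P(parent = p)
        p_child = k / parent_counts[p]    # MLE of P(child | parent = p)
        ll += k * math.log(p_parent * p_child)
    return ll


ll_ab = log_likelihood(data, parent_first=True)
ll_ba = log_likelihood(data, parent_first=False)
print(f"A -> B: {ll_ab:.6f}")
print(f"B -> A: {ll_ba:.6f}")
```

Any score-based learner given only these data must break the tie arbitrarily, which is why additional information (expert knowledge, or V-structures involving a third variable) is needed before such an edge can be read causally.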
Software project practitioners have long complained about the difficulty of identifying the real and direct risks that should guide the allocation of time and resources. Causality, rather than correlation, is therefore of greater interest to industry experts in software project risk planning, because it identifies the causal factors that directly affect project outcomes. For example, the risk “project involves the use of new technology” may be correlated with “immature technology”, because a new technology is likely to be underdeveloped owing to as-yet-unidentified bugs. Nevertheless, a new technology is not necessarily an immature technology, and it is not certain whether the former risk can be mitigated by focusing only on the latter, or vice versa. In practice, practitioners are advised to reduce the risks of using a new technology through pilot investigations, preparing alternative technologies, and training team members. The National Aeronautics and Space Administration (NASA) considers that risk planning should first “make sure that the consequences and the sources of the risk are known” and “plan important risks first” [45]. The Software Engineering Institute of Carnegie Mellon University (CMU/SEI) requires the risk analysis process to satisfy the goal of “determining the source of risk”, i.e., “the root causes of the risk” [18]. Hence, analyzing the consequences and sources of risk is essential in risk planning.
In this paper, we propose a novel framework for software project risk management using BNs with causality constraints (BNCC). Our primary objective is to perform a causality analysis between risk factors and project outcomes to achieve more effective risk control. Specifically, the analysis involves (1) introducing a new modeling framework for risk causality analysis to discover new causal relationships and validate existing ones (i.e., practical and/or academic expert knowledge) between risk factors and project outcomes based on historical data; and (2) constructing an empirical BN software project risk analysis model based on the framework, which can be readily used in risk planning.
Compared with other modeling algorithms such as C4.5 and Naïve Bayes, the proposed BNCC-based model has the following advantages: (1) strong interpretability — the constructed BN combines data with expert knowledge, depicts causal relationships between variables, and helps obtain better project outcomes or higher probability of project success; and (2) acceptable predictive accuracy — the final model in this study has better predictive power compared with other modeling algorithms, making the model suitable for capturing the statistical relationships between risk factors and project outcomes.
This study makes two important contributions. First, it proposes the first causal discovery framework for risk management of software projects, which builds an empirical model from real data and incorporates the causal discovery technique and expert knowledge. This risk modeling framework can be widely applied to other related domains. Second, it provides a BNCC model for risk analysis based on data from real industry software projects. The network has strong interpretability and can provide explicit knowledge (causal relationships between risk factors and project outcomes) of software projects. Subsequently, such knowledge can help in conducting effective risk analysis and further risk planning, which will result in a better implementation of software project risk management.
This paper is organized as follows. Section 2 provides a review of related literature. Section 3 describes the proposed risk model and the modeling concept. Section 4 presents the experimental results. Finally, Section 5 concludes and discusses limitations of the study.
Risk management of software projects
Risk management was first introduced to software project management by Boehm [3] and Charette [6]. According to the “IEEE Standard for Software Project Management Plans” [22], a software project is defined as a series of technical and managerial work activities that should meet the terms and conditions listed in the project agreement. A successful software project usually means that the project is completed within budget and on time, and meets the customers' demand for high-quality and
Causality in BNs
BNs [10], based on graph theory and probability theory, are a widely accepted tool for visualizing uncertain knowledge and performing efficient reasoning over a set of variables and their joint probability distribution. A BN consists of two parts: a directed acyclic graph (DAG), which encodes the conditional (in)dependence relationships among the variables (Fig. 1), and a set of conditional probability tables (CPTs), which specify the conditional probability distribution of each variable given its parents.
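The factorization encoded by a BN can be made concrete with a small sketch. Assuming a hypothetical three-node network T -> F <- S with invented CPTs (the variables, probabilities, and the `joint`/`query` helpers are illustrative only and are not the model of this study), the joint distribution factorizes as P(T, S, F) = P(T) P(S) P(F | T, S), and any posterior can be obtained by summing the joint over the assignments consistent with the evidence. Brute-force enumeration is exponential in the number of variables and is used here only for clarity; practical BN tools rely on more efficient inference algorithms.

```python
from itertools import product

# Hypothetical CPTs (all numbers invented for illustration).
# DAG: T -> F <- S, where
#   T = project uses new technology, S = team lacks experience,
#   F = project fails to meet its targets. All variables are binary (1 = yes).
P_T = {1: 0.3, 0: 0.7}
P_S = {1: 0.4, 0: 0.6}
# P(F = 1 | T = t, S = s), indexed by (t, s)
P_F1 = {(1, 1): 0.8, (1, 0): 0.5, (0, 1): 0.4, (0, 0): 0.1}


def joint(t: int, s: int, f: int) -> float:
    """Chain-rule factorization: P(T, S, F) = P(T) * P(S) * P(F | T, S)."""
    p_f = P_F1[(t, s)] if f == 1 else 1.0 - P_F1[(t, s)]
    return P_T[t] * P_S[s] * p_f


def query(target: str, value: int, evidence: dict) -> float:
    """P(target = value | evidence) by brute-force enumeration of all assignments."""
    num = den = 0.0
    for t, s, f in product((0, 1), repeat=3):
        assignment = {"T": t, "S": s, "F": f}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments inconsistent with the evidence
        p = joint(t, s, f)
        den += p
        if assignment[target] == value:
            num += p
    return num / den


print("P(F=1)            =", round(query("F", 1, {}), 4))
print("P(T=1 | F=1)      =", round(query("T", 1, {"F": 1}), 4))
print("P(T=1 | F=1, S=1) =", round(query("T", 1, {"F": 1, "S": 1}), 4))
```

With these invented numbers, observing a failure raises the probability that new technology was used from 0.3 to about 0.55, and additionally observing an inexperienced team lowers it again to about 0.46, the "explaining away" effect typical of a node with two parents.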
The faithfulness condition of Bayesian graphical theory assumes that a
Software project risk model
Measurements of risks in software projects have been studied since the 1980s, and various risk classification frameworks, dimensions, and models have been proposed. Some of these have been applied to specific kinds of software projects, such as e-commerce [36] and customer relationship management systems [44]. However, most of the existing studies [2], [4], [46], [52] lack a comprehensive and systematic framework. Boehm [4] developed a top 10 risk identification checklist. Although these risk
Discussion
Several issues deserve further discussion.
1) Why not conduct intervention experiments? In general, an intervention experiment is more effective than inference from observational data, but it entails high costs. It is more appropriate and easier to perform when the goal is to discover causality at the level of software modules or code, because manipulating a module or modifying a few lines of code is easy. In contrast, if the goal is to discover risk causalities of a
Conclusions and limitations
To support better risk analysis and risk planning, it is important to discover the causal relationships between risk factors and project outcomes. This study proposes a V-structure discovery algorithm and uses it to establish a BN with causality constraints. The proposed risk modeling framework is a new approach that is suitable for solving similar risk management problems in other fields. We also provide an application case of software project risk analysis and control.
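The V-structure X -> Z <- Y is the pattern that lets causal discovery methods orient edges from observational data: under the faithfulness assumption, X and Y are marginally independent but become dependent once Z is conditioned on. The sketch below is not the V-structure discovery algorithm proposed in this study; it only demonstrates, for a hypothetical collider A -> C <- B with invented probabilities, the independence signature such an algorithm looks for, using exact mutual information and conditional mutual information (in nats) instead of a statistical test on samples. The helper names `marginal`, `mutual_info_ab`, and `cond_mutual_info_ab_given_c` are illustrative.

```python
import math
from itertools import product

# Hypothetical collider (V-structure) A -> C <- B over binary variables.
# A and B are generated independently; C depends on both. Numbers are invented.
p_a1, p_b1 = 0.3, 0.4
p_c1 = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1}  # P(C=1 | A, B)

# Exact joint distribution P(A, B, C) = P(A) * P(B) * P(C | A, B), keyed by (a, b, c).
joint = {}
for a, b, c in product((0, 1), repeat=3):
    pa = p_a1 if a else 1 - p_a1
    pb = p_b1 if b else 1 - p_b1
    pc = p_c1[(a, b)] if c else 1 - p_c1[(a, b)]
    joint[(a, b, c)] = pa * pb * pc


def marginal(keep):
    """Marginalize the joint onto the variable positions listed in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def mutual_info_ab() -> float:
    """I(A; B) = sum p(a,b) log[p(a,b) / (p(a) p(b))]."""
    p_ab, p_a, p_b = marginal((0, 1)), marginal((0,)), marginal((1,))
    return sum(p * math.log(p / (p_a[(a,)] * p_b[(b,)]))
               for (a, b), p in p_ab.items() if p > 0)


def cond_mutual_info_ab_given_c() -> float:
    """I(A; B | C) = sum p(a,b,c) log[p(a,b,c) p(c) / (p(a,c) p(b,c))]."""
    p_c, p_ac, p_bc = marginal((2,)), marginal((0, 2)), marginal((1, 2))
    return sum(p * math.log(p * p_c[(c,)] / (p_ac[(a, c)] * p_bc[(b, c)]))
               for (a, b, c), p in joint.items() if p > 0)


print(f"I(A;B)   = {mutual_info_ab():.6f}")               # ~0: marginally independent
print(f"I(A;B|C) = {cond_mutual_info_ab_given_c():.6f}")  # > 0: dependent given C
```

With finite samples, these exact quantities would be replaced by a conditional-independence test (for example, a G-test or chi-squared test) at a chosen significance level; a pair that is independent without C but dependent given C is then oriented as A -> C <- B.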
A large sample data was
Acknowledgements
This research was partly supported by the National Natural Science Foundation of China (71271061, 70801020 and 61100148), the Science and Technology Planning Project of Guangdong Province, China (2010B010600034), the Business Intelligence Key Team of Guangdong University of Foreign Studies (TD1202), and the Natural Science Foundation of Guangdong Province (S2011040004804).
References
Bayesian network based software reliability prediction with an operational profile. Journal of Systems and Software (2005).
BASSUM: a Bayesian semi-supervised method for classification feature selection. Pattern Recognition (2011).
Learning Bayesian networks from data: an information-theory based approach. Artificial Intelligence (2002).
Bayesian network learning algorithms using structural restrictions. International Journal of Approximate Reasoning (2007).
Case study: factors for early prediction of software development success. Information and Software Technology (2002).
Attention-shaping tools, expertise, and perceived control in IT project risk assessment. Decision Support Systems (2007).
BBN-based software project risk management. Journal of Systems and Software (2004).
A simulation-based risk network model for decision support in project risk management. Decision Support Systems (2012).
An empirical analysis of risk components and performance on software projects. Journal of Systems and Software (2007).
Exploring the relationship between software project duration and risk exposure: a cluster analysis. Information & Management (2008).
Software development risks to project effectiveness. Journal of Systems and Software.
A Bayesian belief network for IT implementation decision support. Decision Support Systems.
A methodology for developing Bayesian networks: an application to information technology (IT) implementation. European Journal of Operational Research.
Large engineering project risk management using a Bayesian belief network. Expert Systems with Applications.
An association rule mining method for estimating the impact of project management policies on software quality, development time and effort. Expert Systems with Applications.
A causal mapping approach to constructing Bayesian networks. Decision Support Systems.
Fuzzy decision support system for risk analysis in e-commerce development. Decision Support Systems.
A theory of inferred causation. Studies in Logic and the Foundations of Mathematics.
The priority factor model for customer relationship management system success. Expert Systems with Applications.
On the use of Bayesian belief networks for the prediction of software productivity. Information and Software Technology.
An integrated transportation decision support system for transportation policy decisions: the case of Turkey. Transportation Research Part A: Policy and Practice.
Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling.
A generic qualitative characterization of independence of causal influence. International Journal of Approximate Reasoning.
Understanding software project risk: a cluster analysis. Information & Management.
Application of fuzzy expert systems in assessing operational risk of software. Information and Software Technology.
Toward an assessment of software development risk. Journal of Management Information Systems.
Software Risk Management.
Software risk management: principles and practices. IEEE Software.
Software Engineering: Risk Analysis and Management.
Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.
Capability Maturity Model Integration (CMMI SM) Version 1.1, CMMI for Systems Engineering, Software Engineering, Integrated Product and Process Development, and Supplier Sourcing (CMMI-SE/SW/IPPD/SS, V1.1).
Dr. Yong Hu is currently an Associate Professor and Chair of the Department of E-commerce, and Director of the Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-Sen University. He received his B.Sc. in Computer Science and his M.Phil. and Ph.D. in Management Information Systems from Sun Yat-Sen University. His research interests are in the areas of business intelligence, quantitative investment, software project risk management, e-commerce, and decision support systems. He has published in a number of journals and conferences such as DSS, ESWA, and IEEE ICDM. Dr. Hu's research is supported by the National Natural Science Foundation and the Science and Technology Planning Project of Guangdong Province.
Xiangzhou Zhang is a Ph.D. student at Sun Yat-sen University and works as an assistant researcher at the Institute of Business Intelligence and Knowledge Discovery at the Guangdong University of Foreign Studies and Sun Yat-sen University. He received his B.S. degree in Computer Science from Sun Yat-Sen University and his M.S. degree in Management from Guangdong University of Foreign Studies. His research interests include data mining, quantitative investment, software project risk management, and business intelligence.
Prof. Eric Ngai is a Professor in the Department of Management and Marketing at The Hong Kong Polytechnic University. His current research interests are in the areas of E-commerce, Supply Chain Management, Decision Support Systems, and RFID Technology and Applications. He has published papers in a number of international journals including MIS Quarterly, Journal of Operations Management, Decision Support Systems, IEEE Transactions on Systems, Man and Cybernetics, Information & Management, Production & Operations Management, and others. He is an Associate Editor of the European Journal of Information Systems and serves on the editorial boards of three international journals. Prof. Ngai has an h-index of 22 and 1490 citations (ISI Web of Science).
Dr. Ruichu Cai received his B.S. in Applied Mathematics and his Ph.D. in Computer Science from South China University of Technology in 2005 and 2010, respectively. He is currently an Assistant Professor in the Department of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China, and at the State Key Laboratory for Novel Software Technology, Nanjing University, P.R. China. He was a visiting student at the National University of Singapore from 2007 to 2009. His research interests cover a variety of topics, including data mining, decision support systems, causal inference, association rule mining, and feature selection. He has published in a number of journals and conferences, such as Pattern Recognition, IEEE TKDE, and SIGMOD. Dr. Cai's research is supported by the National Natural Science Foundation.
Dr. Mei Liu is currently an Assistant Professor in the Department of Computer Science at New Jersey Institute of Technology. She received her Ph.D. degree in computer science from the University of Kansas, Lawrence, USA, and completed her postdoctoral training as an NIH-NLM research fellow in the Department of Biomedical Informatics at Vanderbilt University, Nashville, USA. Her research interests include data mining, machine learning, text mining, decision support systems, quantitative investment, and medical informatics. She has published a number of papers in conferences and journals such as Bioinformatics, JAMIA, ESWA, EURASIP Journal on Applied Signal Processing, BMC Bioinformatics, PLoS ONE, and IEEE ICDM.