Feasibility and effort estimation models for medium and small size information mining projects
Introduction
Organizational management needs a large amount of information for its decision-making process and for the generation of strategic plans [1]. Valid, useful, non-trivial information of this kind is normally referred to as knowledge [2]. This knowledge is implicit in the data repositories available in the organization, and it can be extracted using the synthesis tools provided by Data Mining [3]. Data Mining focuses on the technology to be applied (i.e. tools and algorithms), while Information Mining focuses on which tasks and procedures must be carried out to accomplish the project goals. In [4], the term “Information Mining” refers to the sub-discipline of Information Systems that studies, proposes and develops the processes, methods, techniques and methodologies needed to run this kind of project successfully. Consequently, Data Mining can be seen as closer to programming tasks, while Information Mining is closer to Software Engineering activities.
The processes, methods, techniques and tools that come from Software Engineering cannot be used in Information Mining projects because of differences in goals and practical aspects between these two kinds of projects [5]. This means that ad-hoc processes, methods, techniques, tools and methodologies should be developed that take into account the main features of Information Mining projects. At present, the methodologies most commonly used for Information Mining projects are CRISP-DM [6], SEMMA [7] and P3TQ [8]. These methodologies are regarded as proven by the community, but they have shortcomings when defining the phases related to project management [4]:
- project management elements are mixed with the knowledge discovery process;
- they do not indicate which methods should be used for project monitoring, verification and measurement; and
- the characteristics of projects carried out within Small and Medium-sized Enterprises (SMEs) are not analyzed.
Moreover, studies of Information Mining projects have found that not all projects are completed successfully [9] and that a significant percentage of them fail [10]. In 2000, 85% of projects failed to achieve their goals [11]; in other words, only 15 out of every 100 projects were completed successfully. After nine years of work, the community had reduced this failure rate to approximately 50% [12]. The community is therefore moving in the right direction, but some project elements still need to be improved.
In this context, this paper presents two ad-hoc models to be applied in Small and Medium-sized Enterprises: one for assessing project feasibility and another for estimating the resources and time required to carry out the project. Both models are intended to be applied at the beginning of the project. The article is structured as follows: first, we describe the main problem (Section 2); then we present the proposed models (Section 3) and the validation results (Section 4). Finally, Section 5 presents the main conclusions.
Analysis of project failures
Most Software Engineering projects can be considered (at least) partial failures, because few projects meet all of their cost, schedule, quality or requirements objectives [13]. Among challenged or cancelled projects, the average final cost was 189% over budget, the average final duration was 222% over schedule, and the projects delivered on average only 61% of the originally specified features [14]. According to a survey carried out by the Standish Group [15], the top 10 reasons causing failure of software
Proposed models
This section presents two ad-hoc models intended to be used at the beginning of an Information Mining project carried out within a Small or Medium-sized Enterprise (SME). First, the characteristics of SME projects are summarized (Section 3.1); then the model for assessing project feasibility is presented (Section 3.2); finally, the model for estimating the resources and time required to perform the project is described (Section 3.3).
These models have been specified based on actual
Validation of the proposed models
In this section, the models proposed in Section 3 are validated using data collected from 37 information mining projects. To perform this validation, the values calculated by each model for a project are compared with the real values obtained from an appraisal by researchers who are considered experts in the domain.
As a result, the validation confirms that the feasibility model (Section 4.1) and the effort estimation model (Section 4.2) can be reliably used in SME projects.
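The comparison procedure itself is not detailed in this section. As one possible illustration, the sketch below assumes that the model-versus-expert comparison is summarized with two accuracy metrics that are common in the effort-estimation literature, MMRE (mean magnitude of relative error) and PRED(25) (share of projects whose relative error is at most 25%), plus a simple agreement rate for the feasibility verdicts. These metrics, the `ProjectComparison` record and the `validate` function are hypothetical choices for illustration, not necessarily what the authors used, and the toy values are invented rather than taken from the 37 projects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProjectComparison:
    """One validated project: model output vs. expert appraisal (hypothetical schema)."""
    name: str
    model_effort: float    # effort calculated by the estimation model (e.g. person-months)
    expert_effort: float   # "real" effort taken from the expert appraisal
    model_feasible: bool   # verdict produced by the feasibility model
    expert_feasible: bool  # verdict given by the experts


def magnitude_of_relative_error(estimate: float, actual: float) -> float:
    """MRE = |actual - estimate| / actual (standard effort-estimation definition)."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimate) / actual


def validate(projects: list[ProjectComparison], level: float = 0.25) -> dict[str, float]:
    """Summarize how closely model outputs match expert values (assumed metrics)."""
    if not projects:
        raise ValueError("need at least one project")
    n = len(projects)
    mres = [magnitude_of_relative_error(p.model_effort, p.expert_effort) for p in projects]
    return {
        # Mean of the per-project relative errors: lower is better.
        "MMRE": sum(mres) / n,
        # Fraction of projects whose relative error is within `level`: higher is better.
        "PRED": sum(1 for m in mres if m <= level) / n,
        # Fraction of projects where model and experts reach the same feasibility verdict.
        "feasibility_agreement": sum(
            1 for p in projects if p.model_feasible == p.expert_feasible
        ) / n,
    }


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    toy = [
        ProjectComparison("A", 4.0, 5.0, True, True),
        ProjectComparison("B", 6.0, 6.0, True, False),
        ProjectComparison("C", 3.0, 2.0, False, False),
    ]
    print(validate(toy))
```

With these toy values, two of the three projects fall within 25% relative error and two of the three feasibility verdicts agree, so both PRED(25) and the agreement rate are 2/3.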
Conclusions
The term “data mining” is strongly linked to the database concept and goes back to the definition of pattern discovery algorithms on large databases. However, there are now lines of research in fields such as text mining, image mining, data stream mining and web mining, among others. In this context, the authors consider it more appropriate to use “information mining” as a generic term covering any of the aforementioned types of mining. Information mining is, then, a sub-discipline of
Acknowledgment
The research reported in this paper has been partially funded by Research Projects 33A105, 33B102 and 33A167 within Universidad Nacional de Lanús, Research Project 40B133 within Universidad Nacional de Río Negro, and Research Project UTI1867 within UTN in Buenos Aires.
References (40)
- et al., A cost model to estimate the effort of data mining projects (DMCoMo), Inf. Syst. (2008).
- et al., Decision analysis of data mining project based on Bayesian risk, Expert Syst. Appl. (2009).
- BI’s Promised Land, Intell. Enterp. (2003).
- The wisdom hierarchy: representations of the DIKW hierarchy, J. Inf. Sci. (2007).
- et al., Business intelligence.
- et al., Towards an information mining engineering, in: Software Engineering, Methods, Modeling and Teaching (2011).
- P. Chapman, J. Clinton, R. Keber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, CRISP-DM 1.0 Step by step BI guide....
- SAS Enterprise Miner: “SEMMA”, 2008...
- Business Modeling and Business Intelligence (2003).
- et al., Building, Using, and Managing the Data Warehouse, Data Warehousing Institute, Prentice-Hall PTR (1997).
- The Business Value of Data Warehouses – Opportunities, Pitfalls and Future Directions, “Tutorial report”, Summer school of DM.
- Major causes of software project failures, CrossTalk: J. Def. Softw. Eng. (1998).
- Why software fails, IEEE Spectr.
- Software Requirements.
- Requirements elicitation in data mining for business intelligence projects.
- Software Engineering: A Practitioner’s Approach.
- Software Cost Estimation with COCOMO II.