Elsevier

Information Systems

Volume 47, January 2015, Pages 1-14

Feasibility and effort estimation models for medium and small size information mining projects

https://doi.org/10.1016/j.is.2014.06.004

Highlights

  • We are interested in the Information Systems of Small and Medium-sized Enterprises (SMEs).

  • Our research focuses on managing Information Mining Projects (IMP) in SMEs.

  • We propose a model for assessing IMP feasibility in SMEs.

  • We propose a model for estimating resources required to carry out IMP in SMEs.

  • The proposed models are validated on real IMP projects.

Abstract

Information mining is a sub-discipline of Information Systems that provides the non-trivial knowledge needed for decision-making inside an organization. Although information mining projects have different features from Software Engineering projects, they share some of their problems. Two of these problems stand out: unmanaged risks and inaccurate estimation of the resources needed to complete the project. In this context, this paper presents two ad-hoc models to be applied in Small and Medium-sized Enterprises: one for assessing project feasibility and the other for estimating the resources and time required to carry out the project. Both models should be applied at the beginning of the project.

Introduction

An organization's management needs a large amount of information for its decision-making process and for the generation of strategic plans [1]. This valid, useful and non-trivial information is normally referred to as knowledge [2]. This knowledge is implicit in the data repositories available in the organization, and it can be extracted using the synthesis tools provided by Data Mining [3]. Data Mining focuses on the technology to be applied (i.e. tools and algorithms), while Information Mining focuses on which tasks and procedures must be carried out to achieve the project goals. In [4], the term “Information Mining” refers to the sub-discipline of Information Systems that studies, proposes and develops the processes, methods, techniques and methodologies needed to run this kind of project successfully. Consequently, Data Mining can be said to be close to programming tasks, while Information Mining is close to Software Engineering activities.

The processes, methods, techniques and tools that come from Software Engineering cannot be used in Information Mining projects because the two kinds of projects differ in their goals and practical aspects [5]. Therefore, ad-hoc processes, methods, techniques, tools and methodologies should be developed that take the main features of Information Mining projects into account. The methodologies most commonly used for Information Mining projects are CRISP-DM [6], SEMMA [7] and P3TQ [8]. These methodologies are considered proven by the community, but they have problems when it comes to defining the phases related to project management [4]:

  • project management elements are mixed with the knowledge discovery process,

  • they do not indicate the methods to be used for project monitoring, verification and measurement, and

  • the characteristics of projects carried out within Small and Medium-sized Enterprises (SMEs) are not analyzed.

Moreover, studies of Information Mining projects have found that not all projects are completed successfully [9] and that a significant percentage of them fail [10]. In 2000, 85% of projects failed to achieve their goals [11]; in other words, only 15 out of every 100 projects were completed successfully. Nine years later, the community had reduced this failure rate to approximately 50% [12]. This suggests that the community is on the right track, but some project elements still need improvement. In this context, this paper presents two ad-hoc models to be applied in Small and Medium-sized Enterprises: one for assessing project feasibility and the other for estimating the resources and time required to carry out the project. Both models should be applied at the beginning of the project. The article is structured as follows: first, we describe the main problem (Section 2), then we present the proposed models (Section 3) and the validation results (Section 4). Finally, Section 5 presents the main conclusions.

Section snippets

Project failures’ analysis

Most Software Engineering projects can be considered (at least) partial failures, because few projects meet all of their cost, schedule, quality and requirements objectives [13]. Among challenged or canceled projects, the final cost was on average 189% over budget, the final time was on average 222% over schedule, and the delivered product contained on average only 61% of the originally specified features [14]. According to a survey carried out by the Standish Group [15], the top 10 reasons causing failure of software

Proposed models

This section presents two ad-hoc models intended to be used at the beginning of an Information Mining project carried out within Small and Medium-sized Enterprises (SMEs). First, the characteristics of SME projects are summarized (Section 3.1); then the model for assessing project feasibility is presented (Section 3.2); and, finally, the model for estimating the resources and time required to perform the project is described (Section 3.3).

These models have been specified based on actual

Proposed models’ validation

In this section, the models proposed in Section 3 are validated using data collected from 37 information mining projects. For this validation, the project values calculated by each model are compared with the real values obtained from an appraisal by researchers who are considered experts in the domain.

As a result, it is possible to confirm that the feasibility model (Section 4.1) and the effort estimation model (Section 4.2) are reliable for use in SME projects.
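The snippet above does not state which accuracy measure was used to compare the calculated values with the expert appraisal. As a hedged illustration only, the sketch below assumes two measures commonly used to evaluate effort estimation models: the mean magnitude of relative error (MMRE) and PRED(25), the share of projects whose relative error is at most 25%. The function names and the sample values are hypothetical; they are neither the paper's data nor its validation procedure.

```python
from statistics import mean


def magnitude_of_relative_error(actual: float, estimated: float) -> float:
    """MRE = |actual - estimated| / |actual| for one project (actual must be non-zero)."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(actual - estimated) / abs(actual)


def _check_pairs(actuals, estimates):
    """Hypothetical helper: require one estimate per expert value, at least one project."""
    if len(actuals) != len(estimates) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")


def mmre(actuals, estimates) -> float:
    """Mean magnitude of relative error over all (expert, model) pairs."""
    _check_pairs(actuals, estimates)
    return mean(magnitude_of_relative_error(a, e) for a, e in zip(actuals, estimates))


def pred(actuals, estimates, threshold: float = 0.25) -> float:
    """Fraction of projects whose MRE is within `threshold` (PRED(25) by default)."""
    _check_pairs(actuals, estimates)
    hits = sum(
        1
        for a, e in zip(actuals, estimates)
        if magnitude_of_relative_error(a, e) <= threshold
    )
    return hits / len(actuals)


if __name__ == "__main__":
    # Hypothetical effort values (e.g. person-months) for four projects; NOT data from the paper.
    expert = [2.0, 4.0, 5.0, 8.0]  # values from the (assumed) expert appraisal
    model = [2.5, 3.8, 5.0, 5.0]   # values calculated by an (assumed) estimation model
    print(f"MMRE = {mmre(expert, model):.3f}")
    print(f"PRED(25) = {pred(expert, model):.2f}")
```

For the 37 projects described above, `expert` and `model` would each hold one value per project; under this assumption, a lower MMRE and a higher PRED(25) would indicate closer agreement between a model and the experts.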

Conclusions

The term “data mining” is strongly linked to the database concept and goes back to the definition of pattern discovery algorithms for large databases. Today, however, there are lines of research in fields such as text mining, image mining, data stream mining and web mining, among others. In this context, the authors consider it more appropriate to use “information mining” as a generic term covering any of the aforementioned types of mining. Thus, information mining is a sub-discipline of

Acknowledgment

The research reported in this paper has been partially funded by Research Projects 33A105, 33B102 and 33A167 within Universidad Nacional de Lanús, Research Project 40B133 within Universidad Nacional de Río Negro, and Research Project UTI1867 within UTN in Buenos Aires.

References (40)

  • O. Marbán et al.

    A cost model to estimate the effort of data mining projects (DMCoMo)

    Inf. Syst.

    (2008)
  • G. Nie et al.

    Decision analysis of data mining project based on Bayesian risk

    Expert Syst. Appl.

    (2009)
  • E. Thomsen

    BI’s Promised Land

    Intell. Enterp.

    (2003)
  • J. Rowley

    The wisdom hierarchy: representations of the DIKW hierarchy

    J. Inf. Sci.

    (2007)
  • S. Negash et al.

    Business intelligence

  • R. García-Martínez et al.

    Towards an information mining engineering, in: Software Engineering, Methods, Modeling and Teaching

    (2011)
  • P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth

    CRISP-DM 1.0 Step by step BI guide...
  • SAS Enterprise Miner: “SEMMA”, 2008...
  • D. Pyle

    Business Modeling and Business Intelligence

    (2003)
  • H.A. Edelstein et al.

    Building, Using, and Managing the Data Warehouse

    Data Warehousing Institute. Prentice-Hall PTR

    (1997)
  • M. Strand

    The Business Value of Data Warehouses – Opportunities, Pitfalls and Future Directions

    (2000)
  • U.M. Fayyad

    “Tutorial report”. Summer school of DM

    (2000)
  • O. Marbán, G. Mariscal, J. Segovia

    A data mining & knowledge discovery process model. Data Mining and Knowledge...
  • L.J. May

    Major causes of software project failures

    CrossTalk: J. Def. Softw. Eng.

    (2009)
  • R.N. Charette

    Why software fails

    IEEE Spectr.

    (2005)
  • Standish Group. “CHAOS Summary Report 2009”...
  • K. Wiegers

    Software Requirements

    (2003)
  • P. Britos et al.

    Requirements elicitation in data mining for business intelligence projects

  • R. Pressman

    Software Engineering: A Practitioner’s Approach

    (2004)
  • B. Boehm et al.

    Software Cost Estimation with COCOMO II

    (2000)