Efficiency evaluation of data warehouse operations

https://doi.org/10.1016/j.dss.2007.10.011

Abstract

We evaluate an efficiency model for data warehouse operations using data from USA and non-USA-based (mostly Korean) organizations. The analysis indicates wide dispersions in operational efficiency, industry and region differences, large differences in labor budgets between efficient and inefficient firms, few organizations efficient in both refresh processing and query production, and difficulty of providing some variables. Follow-up interviews provide insights about the value of efficiency comparisons of information technology organizations and suggestions to improve the model. Using this analysis, we propose a framework containing data warehouse characteristics and firm characteristics to explain IT operational efficiency at the subfirm level.

Introduction

Data warehouse, a term coined by William Inmon in 1990, refers to a central data repository where data from operational databases and other sources are integrated, cleaned, and archived to support decision-making. A data warehouse provides management with convenient access to large volumes of internal and external data. Because of the potential benefits, most medium to large organizations operate data warehouses. Many of these organizations have operated data warehouses for five years or more with continuing development to increase the size and scope of the data warehouses.

As data warehouse technology and deployment mature, efficient operation becomes a priority. Due to complex data requests, large volumes of data, incompatibilities among data sources, and other complicating factors, operating a data warehouse may involve high costs for complex hardware/software architectures and significant labor support. To measure and improve the efficiency of delivering information products, organizations should compare their performance with that of peer organizations.

In this paper, we present a model to evaluate organizations operating significant data warehouses. The emphasis in our model is to measure relative efficiency of an information technology (IT) organization providing a significant information service, usually for internal usage. The model contains salient variables to evaluate refresh processing and query production, two major operations for data warehouses. The variables in the model include traditional resource consumption (labor usage and computing budgets), system usage measures (users and queries), data quality measures (timeliness and availability), and a size measure (amount of change data). To assess the model, we analyze a data set using Data Envelopment Analysis (DEA), comparing efficiencies for significant subsets of the model. In addition to the formal analysis, we propose a framework to explain the efficiency of information technology operations and provide anecdotal evidence about the difficulties of the evaluation.

The results of this study have important implications for quantitative evaluation of IT service organizations, particularly those providing complex data products. Efficiency models for complex data products support evaluation of tradeoffs between costs and data quality levels. Although there are numerous studies of information technology at the firm level, this paper emphasizes subfirm efficiency. As far as we are aware, this paper proposes the first efficiency model for data warehouse operations along with analysis of a data set and a framework to explain subfirm operations.

This paper is organized as follows. Section 2 summarizes related work about data warehouse operations, evaluations of data warehouse success, and efficiency evaluations of information technology. Section 3 presents the efficiency model and data collection details. Section 4 analyzes the efficiency results and discusses follow-up interviews with selected organizations about the impact of the results. Section 5 concludes the study.

Section snippets

Related work

Data warehouse operations have been studied in detail since the early 1990s primarily with a focus on the refresh process. Although most of the research involves algorithms, some of the research involves the representation and control of the refresh process. Bouzeghoub et al. [5] developed a flexible workflow model to represent the refresh process. Their conceptual model provides a framework to understand the details of refresh processes including constraints on data sources and data

Measuring efficiency of data warehouse operations

The emphasis in our model is to measure relative efficiency of an IT organization providing a significant information service, usually for internal usage. Efficiency is defined as the ratio of outputs to inputs. IT organizations are classified as efficient or inefficient relative to the peer IT organizations in the data set. A statement that an IT organization is efficient means that no other IT organizations are more efficient considering the combination of inputs and outputs in the data set.
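The DEA approach used in the analysis can be illustrated with the input-oriented CCR model of Charnes, Cooper, and Rhodes [cited above as Cooper et al., 1978]: each organization's efficiency score is the smallest factor by which its inputs could be scaled while a convex combination of peer organizations still matches its outputs. The following is a minimal sketch, not the authors' implementation; the four organizations, their input values (labor FTEs, computing budget), and the single output (queries served) are hypothetical illustrations of the model's variable categories.

```python
# Input-oriented CCR DEA model, solved as a linear program with scipy.
# A score of 1.0 marks an organization as efficient relative to its peers.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Efficiency of DMU o. X: inputs (m x n DMUs), Y: outputs (s x n DMUs)."""
    m, n = X.shape
    s = Y.shape[0]
    # Decision variables: [theta, lambda_1 .. lambda_n]; minimize theta.
    c = np.concatenate(([1.0], np.zeros(n)))
    # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
    A_in = np.hstack((-X[:, [o]], X))
    b_in = np.zeros(m)
    # Output constraints: sum_j lambda_j * y_rj >= y_ro  (negated for <=)
    A_out = np.hstack((np.zeros((s, 1)), -Y))
    b_out = -Y[:, o]
    res = linprog(c,
                  A_ub=np.vstack((A_in, A_out)),
                  b_ub=np.concatenate((b_in, b_out)),
                  bounds=[(None, None)] + [(0, None)] * n,
                  method="highs")
    return res.x[0]

# Hypothetical data: 4 IT organizations.
# Inputs: labor FTEs and computing budget; output: queries served.
X = np.array([[5.0, 8.0, 6.0, 10.0],
              [100.0, 150.0, 90.0, 200.0]])
Y = np.array([[2000.0, 3000.0, 2500.0, 3000.0]])
scores = [ccr_efficiency(X, Y, o) for o in range(X.shape[1])]
```

In this toy data set only the third organization is efficient (score 1.0); the others receive scores below 1.0 indicating how far their inputs exceed what an efficient peer mix would need, which mirrors the paper's finding of wide dispersions in operational efficiency.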

Analysis of operational efficiency

Data was solicited from three groups in the second half of 2005. The Center for Information Technology Innovation (CITI) at the University of Colorado at Denver is a group of Chief Information Officers from organizations with significant operations in the Denver, Colorado area. Surveys were sent to this group in June 2005 with a response rate of 10 out of 20 organizations.1 The BI Network (//www.B-EYE-Network.com

Conclusion

We presented an efficiency model for evaluating data warehouse operations and evaluated the relative efficiencies of a data set of organizations operating significant data warehouses. The efficiency model supports evaluation of refresh and query production processes for a collection of data warehouses managed by an IT organization. The variables in the models include traditional resource consumption (labor usage and computing budgets), system usage measures (number of queries, number of users,


References (30)

  • M. Breunig et al.

    LOF: Identifying Density-Based Local Outliers

    Proceedings of the ACM SIGMOD International Conference on Management of Data

    (2000)

  • Y. Chen et al.

    Measuring information technology's indirect impact on firm performance

    Information Technology and Management

    (2004)
  • W. Cooper et al.

    Measuring the efficiency of decision making units

    European Journal of Operational Research

    (1978)
  • W. Cooper et al.

    Data Envelopment Analysis — A Comprehensive Text with Models, Applications, References and DEA-Solver Software

    (2000)
  • B. Haley et al.

    The benefits of data warehousing at Whirlpool

    Annals of Cases on Information Technology Applications and Management in Organizations

    (1999)

    Michael V. Mannino is an associate professor in the Business School of the University of Colorado Denver. Previously he was on the faculty at the University of Florida, University of Texas at Austin, and University of Washington. He has been active in research in database management, knowledge representation, and organizational impacts of technology. He has articles published in major journals of the IEEE (Transactions on Knowledge and Data Engineering and Transactions on Software Engineering), ACM (Communications of the ACM and Computing Surveys), and INFORMS (Informs Journal on Computing and Information Systems Research). His research efforts have produced several popular survey and tutorial articles as well as many papers describing original research. He is the author of the textbook, Database Design, Application Development, and Administration, published by Irwin McGraw-Hill.

    Sa Neung Hong is an Associate Professor in the Faculty of Business at the University of Seoul, Korea. He received his Ph.D. degree from the University of Texas, Austin. His research has appeared in Decision Support Systems and ORSA Journal on Computing. His research interests include information management, service computing, very large scale systems, IT governance, and ITSM. He has worked as the CIO of a major bank in Korea and actively provides consulting services on IT management and the development of large information systems to various Korean financial institutions.

    Injun Choi received the Ph.D. degree in Management Information Systems from the University of Texas at Austin in 1991. He is currently a full professor in the department of industrial and management engineering at the Pohang University of Science and Technology. His research interests include workflow and business process management, knowledge management, object-oriented modeling and reasoning, and database systems.
