Guest editorial: Data warehousing
Introduction
Information is one of an organization's most valuable assets; used properly, it supports intelligent decision-making that can significantly improve how the organization functions. Data warehousing [1] is a recent technology that allows information to be accessed easily and efficiently for decision-making activities. The data warehouse stores information of interest to the enterprise from multiple data sources and presents it to the end user in an integrated manner. This approach [2] to data integration and query processing over distributed data sources pays rich dividends when it translates into calculated decisions backed by sound analysis. On-line analytical processing (OLAP) tools provide an environment for decision-making and business modeling activities by supporting ad-hoc queries over the information stored in the warehouse.
Warehouse data are typically modeled multidimensionally, and new data models and data structures that support the multidimensional data model (MDDM) effectively need to be developed. The multidimensional data model has proved to be the most suitable for OLAP applications [3]. The conceptual multidimensional data model can be physically realized in two ways: (1) by using trusted relational databases (star schema/snowflake schema [4]) or (2) by using specialized multidimensional databases. One issue that has not yet been fully addressed is how to define a set of constraints on this model that can be exploited in every area of warehouse implementation. We believe that such cube constraints will facilitate the conceptual schema design of a data warehouse and allow us to provide triggers in a multidimensional scenario. The constraint set can also be used to solve existing problems such as view maintenance [5]. It will enable us to define a complete algebra for the multidimensional data model that is independent of the underlying schema definitions, and it will allow new attributes to evolve in the multidimensional scenario. Some of these constraints can also be used to map the multidimensional world into the relational world.
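The relational realization mentioned above (a star schema) can be illustrated with a small sketch. The table names, columns, and sample rows below are hypothetical and chosen only for illustration; the sketch assumes one fact table (`sales`) joined to two dimension tables (`product`, `store`) and uses Python's standard `sqlite3` module.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables describe the axes of the cube.
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
        -- The fact table holds one measure per (product, store) event.
        CREATE TABLE sales (
            product_id INTEGER REFERENCES product(product_id),
            store_id INTEGER REFERENCES store(store_id),
            amount INTEGER
        );
        INSERT INTO product VALUES (1, 'books'), (2, 'music');
        INSERT INTO store VALUES (1, 'east'), (2, 'west');
        INSERT INTO sales VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5), (1, 1, 7);
        """
    )
    return conn


def rollup_by_category_and_region(conn):
    """Aggregate the measure along two dimensions, as an OLAP roll-up would."""
    return conn.execute(
        """
        SELECT p.category, s.region, SUM(f.amount)
        FROM sales AS f
        JOIN product AS p ON p.product_id = f.product_id
        JOIN store AS s ON s.store_id = f.store_id
        GROUP BY p.category, s.region
        ORDER BY p.category, s.region
        """
    ).fetchall()


if __name__ == "__main__":
    for row in rollup_by_category_and_region(build_star_schema()):
        print(row)
```

Each cell of the conceptual cube corresponds to one group in the query result, so the multidimensional view is recovered from purely relational tables.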
Warehouse data can be seen as a set of materialized views over source data. Precomputing queries as materialized views allows them to be answered quickly, but the number of views materialized at the warehouse needs to be controlled; otherwise the result is a phenomenon known as data explosion. Deciding which views to materialize is called the view selection problem, and many heuristics for it have been proposed in the literature [6].
An important challenge in data warehousing is how to maintain the materialized views. When the data at any source change, the materialized views at the data warehouse need to be updated accordingly. The process of keeping the views up-to-date in response to changes in the source data is referred to as view maintenance [5], [7]. For reasons of efficiency, the warehouse data are often refreshed using incremental techniques rather than by recomputing the views from scratch. There has been significant research on view maintenance in data warehousing environments, and the problem has branched into a number of sub-problems such as self-maintenance, consistency maintenance, update filtering and on-line view maintenance.
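The difference between incremental maintenance and recomputation from scratch can be sketched for a simple aggregate view. This is a minimal illustration under stated assumptions, not a method from any of the cited papers: the view stores a `(sum, count)` pair per group, source changes arrive as batches of inserted and deleted `(key, amount)` rows, and the function names are hypothetical.

```python
def recompute(rows):
    """Build the view from scratch: group key -> (sum, count)."""
    view = {}
    for key, amount in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + amount, count + 1)
    return view


def apply_delta(view, inserted=(), deleted=()):
    """Incrementally maintain the view using only the changed rows.

    Assumes every deleted row was previously present in the source.
    """
    new_view = dict(view)
    # Apply insertions first so group counts never dip below zero mid-batch.
    for sign, batch in ((1, inserted), (-1, deleted)):
        for key, amount in batch:
            total, count = new_view.get(key, (0, 0))
            total += sign * amount
            count += sign
            if count == 0:
                # The last row of the group is gone, so drop the group.
                new_view.pop(key, None)
            else:
                new_view[key] = (total, count)
    return new_view


if __name__ == "__main__":
    source = [("east", 10), ("east", 5), ("west", 7)]
    view = recompute(source)
    view = apply_delta(view, inserted=[("west", 3), ("north", 4)], deleted=[("east", 5)])
    print(view)
```

Storing the count next to the sum is what lets deletions be handled without consulting the source rows again, which is one simple case of a view that can be maintained from the deltas alone.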
There are many other areas of active research in data warehousing, such as integrating active rules into warehouse data, multiple query optimization using views, update filtering, on-line view maintenance, fragmentation of multidimensional databases, parallel processing, the summarizability problem, data expiry, and instance-based data mining. This special issue addresses some of the problems listed above.
Papers in this special issue
Based on the reviews, we selected four papers for this special issue. The first paper, by Dimitri Theodoratos et al., deals with the view selection problem in designing a global data warehouse. A global data warehouse integrates data from multiple databases, which can be considered as materialized views. Given a set of select-project-join (SPJ) queries to be satisfied by the DW, the paper provides a method that generates sets of materialized views that satisfy all the input
References (7)
- et al. Making aggregate views self-maintainable. J. Data and Knowledge Eng. (2000)
- Building the Data Warehouse (1992)
- Research problems in data warehousing
Sanjay Kumar Madria received his Ph.D. in Computer Science from the Indian Institute of Technology, Delhi, India in 1995. He is an Assistant Professor in the Department of Computer Science at the University of Missouri-Rolla, USA. Earlier he was a Visiting Assistant Professor in the Department of Computer Science, Purdue University, West Lafayette, USA. He has also held appointments at Nanyang Technological University in Singapore and University Sains Malaysia in Malaysia. His research interests are in the areas of web data management, mobile computing, and transaction processing. He has published more than 60 papers in international journals and conferences. He guest-edited the WWW Journal. He is a Program Chair for the EC&WEB 2001 conference to be held in Germany. He is a PC member of many international database conferences and a reviewer for many database journals. He is an IEEE Senior Member.