Active data warehouses: complementing OLAP with analysis rules

https://doi.org/10.1016/S0169-023X(01)00042-8

Abstract

Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms for analysing data multidimensionally and for making decisions. The resulting architecture is called an active data warehouse.

Introduction

Database systems are the core components of OLTP (on-line transaction processing) systems, which support operational business processes. Usually, data describing a single business process are stored in different special-purpose OLTP databases (e.g., stock-keeping, production, sales). These systems are typically designed using entity-relationship (ER) modeling or a more recent object-oriented design notation such as the Unified Modeling Language (UML) [26] to describe the requirements of on-line transaction processing. But OLTP systems are poorly suited for decision support: efforts to extract analytical information from OLTP databases result in complex queries, multiple joins, and long computation times. To support decision making for business analysis problems, OLAP (on-line analytical processing) is the appropriate technology. It is defined as “a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user” [9]. The core component of an OLAP system is the data warehouse (DW), a decision-support database that is periodically updated by extracting, transforming, and loading operational data from several OLTP databases. A data warehouse organizes data using the dimensional modeling approach [18], which classifies data into measures and dimensions. Measures are the basic units of interest for analyses, such as sales figures or costs. Dimensions are the different perspectives for viewing measures; examples are product, location, and time. Dimensions are organized hierarchically, which makes it possible to view measures at different dimension levels (e.g., location: store → city → region). Aggregating measures up to a certain dimension level creates a multidimensional view on the data, known as a data cube, or cube for short [1], [10], [15], [16]. To support interactive data analysis, OLAP tools provide basic navigation operations like slice and dice (i.e., select a subset of a cube's cells) and rollup and drilldown (i.e., aggregate facts from a fine-grained dimension level to a more coarse-grained dimension level and vice versa) [7].
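To make the cube abstraction and the rollup and slice operations concrete, the following minimal sketch (not part of the paper) uses Python with pandas over a hypothetical retail fact table; all table and column names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical fact table at the finest grain: one row per store, product, and day.
sales = pd.DataFrame({
    "store":    ["S1", "S1", "S2", "S3"],
    "city":     ["Linz", "Linz", "Linz", "Vienna"],
    "region":   ["Upper Austria", "Upper Austria", "Upper Austria", "Vienna"],
    "product":  ["Cola", "Juice", "Cola", "Cola"],
    "day":      ["2000-10-15"] * 4,
    "qty_sold": [120, 80, 95, 200],   # the measure
})

# Rollup: aggregate the measure from the store level to the city level of the
# location dimension (store → city → region), yielding a coarser-grained cube.
by_city = sales.groupby(["city", "product", "day"], as_index=False)["qty_sold"].sum()

# Slice: restrict the cube to the cells of a single product.
cola_by_city = by_city[by_city["product"] == "Cola"]
print(cola_by_city)
```

Drilling down would simply mean returning from the city-level aggregation to the finer store-level grouping.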

A data warehouse extracts, transforms, and loads (ETL) data from OLTP systems autonomously, without user interaction. After the data are loaded, analysts perform interactive analyses using OLAP tools to find solutions for different kinds of decision tasks. In the case of “non-routine decision tasks”, analysts query the data warehouse to create different scenarios for solving a business problem. Decision making is accomplished by choosing the “most effective” scenario, the scenario with the “fewest side-effects”, and so on. Such tasks do not occur regularly and/or there are no generally accepted decision criteria. Examples of non-routine decision tasks can be found in strategic business management (e.g., the decision to set up a new brand or the decision to change a business policy). In the case of “routine decision tasks”, analysts extract information from the data warehouse to solve well-structured problems by applying generally accepted decision procedures. Such decision tasks occur frequently at predictable timepoints. Examples can be found in the areas of product assortment (e.g., change a price, withdraw a product), customer relationship management (e.g., grant a discount to special customers), and many administrative areas (e.g., accept/reject a paper for a conference). Finally, “semi-routine decision tasks” typically occur when routine problems require non-routine treatment (e.g., if a paper receives contradictory ratings, it is discussed by the program committee before being accepted or rejected). Although today's OLAP and DW systems support ETL processes and interactive data analysis, they do not allow the automation of analysis processes for routine decision tasks and the routinizable elements of semi-routine decision tasks.

In this paper, we describe a novel architecture, the active data warehouse, which makes it possible to automate decision making for routine decision tasks and for the routinizable elements of semi-routine decision tasks. An active data warehouse can export decisions back to OLTP systems directly or after user confirmation. In doing so, such systems realize a closed-loop decision approach, which we call the active data warehouse cycle, or ADW cycle for short. Data are analysed with regard to executing (or not executing) a particular transaction for some entity in an OLTP database. To automate analysis and decision making we employ the idea of event/condition/action rules (ECA rules) from active database systems. ECA rules have been successfully used to automate recurring tasks of operational business processes (“business rules”) in OLTP databases, making these systems more autonomous and business rules easier to maintain. An ECA rule relies on the basic principle that when a certain event occurs (the E) and a specified condition holds (the C), then an action (the A) is executed [22], [23]. Once an ECA rule is defined, it can be executed automatically without explicit user interaction or application request. ECA rules that mimic the work of an analyst are called analysis rules. Analysis rules are modeled from the perspective of an analyst who specifies (1) the timepoints at which the analysis rule has to be “fired”, (2) the cubes that have to be analysed (the analysis graph) together with the criteria used for decision making (the decision steps), and (3) the transaction that will be executed for an entity in an OLTP system if a decision criterion is satisfied. Although analysis rules are founded on the same basic structure as ECA rules, there are some major differences from conventional ECA systems [8], [11], [12], [17], [23], [25], [31]:

1. Active (OLTP) database systems offer a wide range of events from different sources (e.g., insert/update/delete, behaviour invocation, program exceptions, clock) [6], [13], [23]. Automating decision tasks in an active data warehouse requires only a simple event model, which consists of absolute temporal events (e.g., October 15th, 2000), periodic temporal events (e.g., at the end of every quarter), and relative temporal events (e.g., two weeks after a product's price was changed). These events are used to determine the endpoints of “observation periods” during which the data needed for analyses are collected in the data warehouse.

2. The action part of an ECA rule in an active database system may be an arbitrary sequence of insert/update/delete statements, behaviour invocations, transactional statements, external calls, or some alternative statements using “do-instead” [23]. An analysis rule's action consists of a single statement, which is the directive to execute a transaction for some entity in an OLTP database.

3. The most important difference between ECA rules and analysis rules lies in the role of the rule's condition. The condition of an ECA rule is a simple boolean expression, which blocks execution of the rule's action if the condition is violated. An analysis rule utilizes a set of conditions to perform multidimensional analyses that were formerly carried out manually by analysts. These conditions are evaluated incrementally to mimic the “top-down” approach an analyst follows in analysing a problem: coarse-grained cubes are inspected first, and if a decision to execute the rule's action cannot be made after analysing a coarse-grained cube, more detailed cubes must be analysed. Analysing a cube means evaluating a condition against the cube. (A schematic sketch of this rule structure follows this list.)
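The sketch below renders the rule structure just described in Python; it is an illustration only, not the paper's rule syntax (which is given in [28]), and all names are hypothetical. Each decision step pairs a cube with a condition that may decide to execute the action, decide against it, or remain undecided and hand over to the next, more detailed cube.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class DecisionStep:
    """A cube together with the condition evaluated against it.

    The condition returns True (execute the action), False (do not execute it),
    or None (undecided: analyse the next, more detailed cube)."""
    cube: str
    condition: Callable[[str, str], Optional[bool]]

@dataclass
class AnalysisRule:
    event: str                        # temporal event, e.g. "end of every quarter"
    steps: List[DecisionStep]         # ordered from coarse-grained to fine-grained
    action: Callable[[str], None]     # single directive: execute an OLTP transaction

    def fire(self, entity: str) -> None:
        """Mimic the analyst's top-down analysis for one entity."""
        for step in self.steps:
            verdict = step.condition(step.cube, entity)
            if verdict is True:
                self.action(entity)   # e.g. reduce the price of this article
                return
            if verdict is False:
                return                # decision made: do not execute the action
        # All cubes analysed and still undecided: leave the case to the analyst.
```

For the price-reduction task of Example 3, for instance, the event could be a relative temporal event (twenty days after a product's launch), the steps would analyse increasingly detailed sales cubes, and the action would direct the OLTP system to change the price.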

The conceptual architecture of an active data warehouse is shown in Fig. 1. The (passive) data warehouse repository answers ad hoc OLAP requests and is also used to create and maintain the cubes that will be analysed by analysis rules. An event manager detects and processes event occurrences, which are used to “fire” rules. A separate meta data repository provides information about events, cubes, conditions, and actions. Automatically generated decisions are exported by a mechanism that invokes transactions in OLTP systems. Finally, analysts access the rule base using various ADW tools (e.g., cube designer, rule translator) to define or modify analysis rules, events, cubes, etc.
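As a minimal structural sketch (hypothetical interfaces, not the authors' design), the components of Fig. 1 could be wired roughly as follows in Python: the metadata repository records which rules belong to which events, the event manager fires them, and decisions are exported by invoking OLTP transactions.

```python
from typing import Callable, Dict, List

class DataWarehouseRepository:
    """Passive repository: answers ad hoc OLAP requests and maintains the cubes
    that analysis rules analyse."""

class MetaDataRepository:
    """Provides information about events, cubes, conditions, and actions."""
    def __init__(self) -> None:
        self.rules_by_event: Dict[str, List[Callable[[], None]]] = {}

    def register(self, event: str, rule: Callable[[], None]) -> None:
        self.rules_by_event.setdefault(event, []).append(rule)

class OLTPExport:
    """Exports automatically generated decisions by invoking OLTP transactions."""
    def invoke(self, transaction: str, entity: str) -> None:
        print(f"OLTP transaction: {transaction}({entity})")

class EventManager:
    """Detects event occurrences and fires the analysis rules registered for them."""
    def __init__(self, metadata: MetaDataRepository) -> None:
        self.metadata = metadata

    def signal(self, event: str) -> None:
        for rule in self.metadata.rules_by_event.get(event, []):
            rule()  # each rule analyses its cubes and may call OLTPExport.invoke

# Example wiring (illustrative only):
metadata, export = MetaDataRepository(), OLTPExport()
metadata.register("end_of_quarter", lambda: export.invoke("change_price", "Cola"))
EventManager(metadata).signal("end_of_quarter")
```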

Active data warehouses have several immediate impacts on the quality of decision-support activities. First, an analysis rule is specified once by the analyst and is then executed automatically by the data warehouse. In specifying an analysis rule, implicit knowledge about decision-making procedures is transformed into explicit rules, which are then available to a large community of data warehouse users. Thus, existing rules can easily be adapted to changing requirements and can further serve as patterns for solving new problems. Second, the active data warehouse is responsible for monitoring events and for choosing a time window for executing rules in which the overall system load is low (e.g., during the night). Finally, analysts can concentrate on more difficult problems (i.e., non-routine decision tasks) for which no decision procedures exist yet.

In this paper we concentrate on the key features of active data warehouses and make the following simplifying assumptions. (1) Refreshing the data warehouse is accomplished periodically at a constant interval, which also determines the finest time granularity at which events may occur (i.e., one chronon). For example, if the data warehouse is refreshed every 24 hours, then events are detected on a per-day basis and thus rules may be fired once per day. (2) While loading data from OLTP systems, rules may not be fired. (3) Whenever a rule is fired, the data warehouse contains the most current data that are needed by that rule for performing its analyses. (4) Analysing data is accomplished by using OLAP operators rollup, drilldown, slice, and intersection. More complex analyses that require drill-across are the subject of future work.
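The refresh-related assumptions (1)-(3) can be pictured with a small sketch (hypothetical names, not from the paper): timepoints are truncated to the one-day chronon at which events are detected, and rules are fired at most once per chronon and never while data are being loaded.

```python
from datetime import datetime

def to_chronon(ts: datetime) -> datetime:
    """Truncate a timestamp to the one-day chronon (assumption (1))."""
    return datetime(ts.year, ts.month, ts.day)

class RefreshScheduler:
    """Fires analysis rules at most once per chronon and never during a load."""

    def __init__(self) -> None:
        self.loading = False
        self.last_fired_chronon = None

    def nightly_load(self) -> None:
        self.loading = True
        # ... extract, transform, and load data from the OLTP systems ...
        self.loading = False  # assumption (3): the warehouse now holds current data

    def maybe_fire(self, now: datetime, due_rules) -> None:
        if self.loading:                        # assumption (2): no firing during loads
            return
        chronon = to_chronon(now)
        if chronon == self.last_fired_chronon:  # at most once per day
            return
        self.last_fired_chronon = chronon
        for rule in due_rules(chronon):         # rules whose events occur at this chronon
            rule.fire()
```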

The remainder of this paper is organized as follows. Section 2 briefly explains the multidimensional data model and introduces a running example. Section 3 provides an overview of analysis rules. Section 4 describes how the conventional ECA rule components are adopted by analysis rules. Section 5 describes an approach to specifying multidimensional analyses and proposes a model for decision making. Finally, Section 6 summarizes our achievements and gives an outline of future work. Due to space limitations, the syntax for specifying analysis rules is not included in this paper, but is given in [28].

Section snippets

Multidimensional data

This section explains the multidimensional data model and introduces a retail data warehouse application, which will be used in subsequent sections to exemplify the concepts and the functionality of analysis rules in an active data warehouse. Assume that our data warehouse configuration consists of an OLTP database and a data warehouse, which periodically (every day) loads new data from the OLTP database. First, we have a short look at the data model that describes the OLTP database. Although

An overview of analysis rules

The components of analysis rules are part of the metaknowledge of an active data warehouse, which comprises the additional information needed to automate routine decision tasks. This section provides an overview of these components. We first motivate this section by describing two typical routine decision tasks from our running example. Then we explain the metaknowledge of an active data warehouse from two different perspectives.

Example 3 Reducing the price of an article

Twenty days after a soft drink has been launched on a market,

Event, primary condition, and action

This section describes the conventional ECA rule parts – the event, the primary condition, and the action – for analysis rules. Note that these components treat data in the data warehouse from an object-oriented perspective, which has been discussed in Section 3. Following this perspective, analysts specify rules at a particular dimension level with the intention to analyse only the level instances of that dimension level.

Multidimensional analysis

When an analyst queries the data warehouse to make a decision whether to execute a transaction in an OLTP system, he follows an incremental top-down approach in creating and analysing cubes. First, the analyst creates a very “coarse-grained” cube that describes the objects for which a decision should be made (i.e., the level instances of the primary dimension level). Second, in carrying out multidimensional analyses the analyst compares, aggregates, transforms, etc. the cells of this cube. As a

Conclusion

In this paper we presented a novel approach to automate decision making in data warehouses which we call active data warehouse. The basic idea of active data warehouses is to extend conventional ECA rules with multidimensional features for analysing data and making decisions. These rules are called analysis rules since they mimic the work of an analyst. A key issue of analysis rules is that they are “fired” at some fixed points in time or as a reaction to recent happenings in the data


References (31)

• S. Chakravarthy et al., Snoop: an expressive event specification language for active databases, Data and Knowledge Engineering (1994)
• A. Datta et al., The cube data model: a conceptual model and algebra for on-line analytical processing in data warehouses, Decision Support Systems (1999)
• R. Agrawal et al., Modeling multidimensional databases
• A. Aiken et al., Static analysis techniques for predicting the behavior of active database rules, ACM Transactions on Database Systems (1995)
• E. Baralis et al., Compile-time and runtime analysis of active behaviors, IEEE Transactions on Knowledge and Data Engineering (1998)
• L. Cabibbo et al., A logical approach to multidimensional databases
• S. Castangia et al., Design and implementation for the active rule language of Chimera
• S. Chaudhuri et al., An overview of data warehousing and OLAP technology, ACM SIGMOD Record (1997)
• C. Collet et al., NAOS: efficient and modular reactive capabilities in an object-oriented database system
• OLAP Council, http://www.olapcouncil.org, ...
• A. Dinn et al., RAP: The ROCK & ROLL Active Programming System
• P. Fraternali et al., Chimera: a language for designing rule applications
• S. Gatziu et al., Events in an active object-oriented database system
• M. Golfarelli et al., Conceptual design of data warehouses from E/R schemes
• J. Gray et al., Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals, Data Mining and Knowledge Discovery (1997)

    Thomas Thalhammer is a doctoral student and a research/teaching assistant at the Department of Business Informatics, University of Linz, Austria. He received his Magister degree in Information Systems from Johannes Kepler University of Linz in 1997. T. Thalhammer was a visiting researcher at the Advanced Computing Research Center, University of South Australia, Adelaide, during several months in 1999 and in 2000. His research interests are active database systems, data warehouses, and database design for the WWW. T. Thalhammer is a member of ACM.

    Michael Schrefl received his Dipl.-Ing. degree and his Doctorate from Vienna University of Technology, Vienna, Austria, in 1983 and 1988, respectively. During 1983–1984 he studied at Vanderbilt University, USA, as a Fulbright scholar. From 1985 to 1992 he was with Vienna University of Technology. During 1987–1988, he was on leave at GMD IPSI, Darmstadt, where he worked on the integration of heterogeneous databases. He was appointed Professor of Information Systems at Johannes Kepler University of Linz, Austria, in 1992, and Professor in Computer and Information Science at University of South Australia in 1998. He currently leads the Data and Knowledge Engineering research group at the Department of Business Informatics, University of Linz, with projects in data warehousing, workflow management, and web engineering. Dr. Schrefl is a member of ACM and IEEE Computer Society.

    Mukesh Mohania received his Ph.D. in Computer Science and Engineering from Indian Institute of Technology, Mumbai, India, in 1995. Currently he is an Assistant Professor at Western Michigan University. He has published in the areas of distributed deductive databases, data warehousing, web databases, mobile databases, and data mining. He has received several research grants from the Australian Research Council, Defense Science and Technology Organization (Australia), and University of South Australia. He has been associated with many journals as reviewer and editorial board member.
