Authors:
Bruno Oliveira and Orlando Belo
Affiliation:
University of Minho, Portugal
Keyword(s):
Data Warehousing Systems, ETL Conceptual Modelling, Task Clustering, ETL Patterns, ETL Skeletons, BPMN Specification Models, and Kettle.
Related Ontology Subjects/Areas/Topics:
Biomedical Engineering; Business Analytics; Data Engineering; Data Integrity; Data Management and Quality; Data Warehouse Management; Databases and Data Security; Enterprise Information Systems; Health Information Systems; Information and Systems Security; Information Quality; Information Systems Analysis and Specification; Knowledge Management; Ontologies and the Semantic Web; Society, e-Business and e-Government; Web Information Systems and Technologies
Abstract:
Data warehouse populating processes are usually data-oriented workflows composed of dozens of
granular tasks that are responsible for integrating data coming from different data sources. Specific
subsets of these tasks can be grouped into a collection, together with their relationships, to form higher-level
constructs. Increasing task granularity allows for the generalization of processes, simplifying their
views and providing methods to carry expertise over to new applications. Well-proven practices can be used
to describe general solutions that use basic skeletons configured and instantiated according to a set of
specific integration requirements. Patterns can be applied to ETL processes not only to simplify their
conceptual representation but also to reduce the gap that often exists between two design
perspectives. In this paper, we demonstrate the feasibility and effectiveness of an ETL pattern-based
approach using task clustering, analyzing a real-world ETL scenario through the definition of two
commonly used clusters of tasks: a data lookup cluster and a data conciliation and integration cluster.