Abstract
Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. This paper extends relational algebra (RA) with update operations for specifying ETL processes at a logical level. In this approach, data tasks can be automatically translated into SQL queries to be executed over a DBMS. An extension of RA is presented, as well as a translation mechanism from BPMN to the RA specification. Throughout the paper, the TPC-DI benchmark is used for comparing both approaches. Experiments show the efficiency of the resulting ETL flow with respect to the Pentaho Data Integration tool.
Acknowledgments
Alejandro Vaisman was partially supported by PICT-2017 Project 1054 from the Argentinian Scientific Agency.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Awiti, J., Vaisman, A., Zimányi, E. (2019). From Conceptual to Logical ETL Design Using BPMN and Relational Algebra. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_21
DOI: https://doi.org/10.1007/978-3-030-27520-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27519-8
Online ISBN: 978-3-030-27520-4
eBook Packages: Computer Science, Computer Science (R0)