AScale: Auto-Scale in and out ETL+Q Framework

Martins, Pedro; Abbasi, Maryam; Furtado, Pedro

doi:10.1007/978-3-319-34099-9_24


Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

This study investigates how to provide automatic scalability and data freshness in data warehouses while handling high-rate data efficiently. Data freshness is generally not guaranteed in these contexts, because data loading, transformation and integration are heavy tasks that are performed only periodically.

Ideally, users developing data warehouses should concentrate solely on conceptual and logical design (business-driven requirements, logical warehouse schemas, workload and the ETL process), while physical details, including mechanisms for scalability, freshness and integration of high-rate data, are left to automated tools.

To this end, we propose a universal data warehouse parallelization system, an approach that enables automatic scalability and freshness of warehouses and ETL processes. We developed a general framework for implementing and testing the proposed system. The results show that the system scales to provide the desired processing speed and data freshness.

Acknowledgement

This project is part of a larger software prototype, partially financed by the CISUC research group of the University of Coimbra, Portugal, and by the Foundation for Science and Technology.

Author information

Authors and Affiliations

Department of Computer Sciences, University of Coimbra, Coimbra, Portugal
Pedro Martins, Maryam Abbasi & Pedro Furtado

Corresponding author

Correspondence to Pedro Martins.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

About this paper

Cite this paper

Martins, P., Abbasi, M., Furtado, P. (2016). AScale: Auto-Scale in and out ETL+Q Framework. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_24

DOI: https://doi.org/10.1007/978-3-319-34099-9_24
Published: 28 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)
