Abstract
ETL (Extract-Transform-Load) usually includes three phases: extraction, transformation, and loading. In building a data warehouse, it plays the role of data injection and is the most time-consuming activity, so improving ETL performance is necessary. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach uses virtual tables to carry out the transformation stage before the extraction and loading stages, without a data staging area or staging database that stores raw data extracted from each of the disparate source data systems. The TEL approach reduces the data transmission load and improves the performance of queries from the access layers. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
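The transform-before-extract idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ordinary SQL view can stand in for the virtual table, uses SQLite through Python's standard `sqlite3` module, and invents the schema (`raw_orders`, `v_orders`, `fact_orders`) and the function name `run_tel_sketch` purely for illustration. The view applies the transformation at the source, so only cleaned rows are extracted and loaded, and no staging table of raw data is created.

```python
import sqlite3


def run_tel_sketch():
    """Transform via a view at the source, then extract and load (hypothetical schema)."""
    # Source system holding raw rows. Table and column names are assumptions.
    source = sqlite3.connect(":memory:")
    source.executescript("""
        CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, country TEXT);
        INSERT INTO raw_orders VALUES
            (1, 1250, ' us '),
            (2, 300,  'DE'),
            (3, NULL, 'fr');

        -- Transform first: the view (virtual table) cleans and filters rows
        -- without materializing a copy of the raw data.
        CREATE VIEW v_orders AS
            SELECT id,
                   amount_cents / 100.0 AS amount,
                   UPPER(TRIM(country)) AS country
            FROM raw_orders
            WHERE amount_cents IS NOT NULL;
    """)

    # Extract: read already-transformed rows from the view.
    rows = source.execute(
        "SELECT id, amount, country FROM v_orders ORDER BY id"
    ).fetchall()

    # Load: write the rows straight into the warehouse table, with no staging area.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (id INTEGER, amount REAL, country TEXT)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()

    loaded = warehouse.execute(
        "SELECT id, amount, country FROM fact_orders ORDER BY id"
    ).fetchall()
    source.close()
    warehouse.close()
    return loaded


if __name__ == "__main__":
    print(run_tel_sketch())
```

In this sketch the row with a missing amount never leaves the source, which is one way the reduced transmission load described above could arise. A real deployment would presumably define virtual tables over several heterogeneous sources rather than a single SQLite database.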
Additional information
Special Section on Applications and Industry
The work was supported by the Guangdong Talents Program of China under Grant No. 201001D0104726115.
About this article
Cite this article
Guo, SS., Yuan, ZM., Sun, AB. et al. A New ETL Approach Based on Data Virtualization. J. Comput. Sci. Technol. 30, 311–323 (2015). https://doi.org/10.1007/s11390-015-1524-3
DOI: https://doi.org/10.1007/s11390-015-1524-3