A New ETL Approach Based on Data Virtualization

  • Regular Paper
Journal of Computer Science and Technology

Abstract

ETL (Extract-Transform-Load) usually comprises three phases: extraction, transformation, and loading. In building a data warehouse, it plays the role of data injection and is the most time-consuming activity, so improving ETL performance is necessary. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach uses virtual tables to perform the transformation stage before the extraction and loading stages, without the data staging area or staging database that would otherwise store raw data extracted from each of the disparate source data systems. The TEL approach reduces the data transmission load and improves the performance of queries from the access layers. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
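
The transform-before-extract idea can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a single SQLite source as a stand-in for the disparate source systems, and the table, view, and function names (`raw_sales`, `v_sales_clean`, `fact_sales`, `make_source`, `tel_load`) are hypothetical. A view plays the role of the virtual table, so cleaning and aggregation happen at the source, and only the transformed rows are extracted and loaded, with no staging copy of the raw data.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build a toy in-memory source system (hypothetical schema)."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE raw_sales (region TEXT, amount_cents INTEGER)")
    src.executemany(
        "INSERT INTO raw_sales VALUES (?, ?)",
        [(" north", 1000), ("North ", 2550), ("south", 500), ("south", None)],
    )
    src.commit()
    return src


def tel_load(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Transform via a view, then extract and load; returns rows loaded."""
    # Transform: a view acts as the virtual table. Nothing is copied here;
    # the cleaning and aggregation run when the view is queried.
    source.execute(
        """
        CREATE VIEW IF NOT EXISTS v_sales_clean AS
        SELECT UPPER(TRIM(region)) AS region,
               SUM(amount_cents) / 100.0 AS total_amount
        FROM raw_sales
        WHERE amount_cents IS NOT NULL
        GROUP BY UPPER(TRIM(region))
        """
    )
    # Extract: only the already-transformed rows leave the source.
    rows = source.execute(
        "SELECT region, total_amount FROM v_sales_clean ORDER BY region"
    ).fetchall()
    # Load: write straight into the warehouse table, with no staging area.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(region TEXT PRIMARY KEY, total_amount REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?)", rows)
    warehouse.commit()
    return len(rows)


if __name__ == "__main__":
    src = make_source()
    dwh = sqlite3.connect(":memory:")
    print(tel_load(src, dwh), "rows loaded")
    print(dwh.execute("SELECT * FROM fact_sales ORDER BY region").fetchall())
```

In this toy case four raw rows shrink to two aggregated rows before anything is transferred, which is the kind of reduction in transmission load the abstract describes. A real deployment would define the virtual tables through a data virtualization layer spanning several heterogeneous sources rather than a single database view.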

Author information

Correspondence to Shu-Sheng Guo.

Additional information

Special Section on Applications and Industry

The work was supported by the Guangdong Talents Program of China under Grant No. 201001D0104726115.

Cite this article

Guo, SS., Yuan, ZM., Sun, AB. et al. A New ETL Approach Based on Data Virtualization. J. Comput. Sci. Technol. 30, 311–323 (2015). https://doi.org/10.1007/s11390-015-1524-3

  • DOI: https://doi.org/10.1007/s11390-015-1524-3
