DOI: 10.1145/1982185.1982251
research-article

Efficiency evaluation of open source ETL tools

Published: 21 March 2011

ABSTRACT

Business intelligence (BI) is considered to have a high impact on businesses, and research activity has risen in recent years. An important part of BI systems is a well-performing implementation of the Extract, Transform, and Load (ETL) process. In typical BI projects, implementing the ETL process can be the task requiring the greatest effort. However, little work has been published on ETL applications, and in particular on open source ETL tools. We have analyzed open source ETL tools, especially with regard to their performance. In this paper we present the background of the analysis and highlight related work. We then sketch the test setup, show the detailed results for Talend Open Studio and Pentaho Data Integration, and discuss our observations. Finally, we draw a conclusion and point out future work.


Published in

SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
March 2011
1868 pages
ISBN: 9781450301138
DOI: 10.1145/1982185

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
