ABSTRACT
Business intelligence (BI) is considered to have a high impact on businesses, and research activity in the field has risen in recent years. An important part of BI systems is a well-performing implementation of the Extract, Transform, and Load (ETL) process. In typical BI projects, implementing the ETL process can be the most effort-intensive task. However, little work has been published on ETL applications, and in particular on open source ETL tools. We have analyzed open source ETL tools with particular regard to their performance. In this paper we present the background of the analysis and highlight related work. We then sketch the test setup, show the detailed results for Talend Open Studio and Pentaho Data Integration, and discuss our observations. Finally, we draw conclusions and point out future work.
Index Terms
- Efficiency evaluation of open source ETL tools