A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL)

  • Conference paper
Intelligent Computing

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 284)

Abstract

Data analytics plays a vital role in contemporary organizations: through analytics, organizations derive knowledge and intelligence from data to support strategic decisions. An important step in data analytics is data integration, during which historical data is gathered from various sources and integrated into a centralized repository called a data warehouse. Although there are various approaches to data integration, Extract, Transform and Load (ETL) has become one of the most efficient and popular. Over the decades, ETL has been applied to a wide range of domains, such as finance, health and telecom, to mention but a few. As the popularity and use of ETL grow, it becomes important to analyze and identify trends in ETL research and practice. In this paper, we perform a systematic literature review to identify and analyze: (1) the approaches used to implement existing ETL solutions; (2) the quality attributes to be considered when adopting any ETL approach; (3) the depth of coverage in ETL research and practice with regard to application domains, publication frequency and the geographical locations of papers; and (4) the prevailing challenges in developing ETL solutions. Furthermore, we discuss the implications of our findings for ETL researchers and practitioners.

Author information

Corresponding author

Correspondence to Joshua C. Nwokeji.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nwokeji, J.C., Matovu, R. (2021). A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL). In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 284. Springer, Cham. https://doi.org/10.1007/978-3-030-80126-7_24
