Enhancing Data Warehouse Efficiency by Optimizing ETL Processing in Near Real Time Data Integration Environment

  • Conference paper
Big Data Intelligence and Computing (DataCom 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13864))

Abstract

Data is perhaps the most important element in enhancing business processes. Successful use of collected data enables business organizations to visualize relationships between business processes, identify not only solutions to problems but also opportunities based on current trends, make effective decisions, and gain competitive advantage. The digital era has resulted in the continuous generation of large volumes of data from various sources such as IoT devices and social media APIs. These sources, combined with core business data sources such as Online Transaction Processing (OLTP) systems, are needed in real time for effective decision making. This study identifies techniques that optimize real-time data warehouse ETL processes to improve the overall efficiency of the data warehouses implemented by business organizations. Because data is complex in nature (structured, semi-structured and unstructured), this study proposes a unified model for efficient ELT processes which can be implemented for robust near-real-time data warehousing. Based on an extensive review of the literature, this paper identifies and recommends instances where ELT and pushdown processes work optimally in real time.

Author information

Corresponding author

Correspondence to Kunal Maharaj.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Maharaj, K., Kumar, K. (2023). Enhancing Data Warehouse Efficiency by Optimizing ETL Processing in Near Real Time Data Integration Environment. In: Hsu, CH., Xu, M., Cao, H., Baghban, H., Shawkat Ali, A.B.M. (eds) Big Data Intelligence and Computing. DataCom 2022. Lecture Notes in Computer Science, vol 13864. Springer, Singapore. https://doi.org/10.1007/978-981-99-2233-8_21

  • DOI: https://doi.org/10.1007/978-981-99-2233-8_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2232-1

  • Online ISBN: 978-981-99-2233-8

  • eBook Packages: Computer Science, Computer Science (R0)