Abstract
Data is perhaps the most important element in enhancing business processes. Effective use of collected data enables organizations to visualize relationships between business processes, identify both solutions to problems and opportunities based on current trends, make effective decisions, and gain competitive advantage. The digital era has resulted in the continuous generation of high-volume data from various sources such as IoT devices and social media APIs. These sources, combined with each business's core data sources such as Online Transaction Processing (OLTP) systems, are needed in real time for effective decision making. This study identifies techniques for optimizing real-time data warehouse ETL processes to improve the overall efficiency of the data warehouses implemented by business organizations. Because data is complex in nature (structured, semi-structured and unstructured), this study proposes a unified model for efficient ELT processes which can be implemented for robust near-real-time data warehousing. Based on an extensive review of the literature, this paper identifies and recommends instances where ELT and pushdown processes work optimally in real time.
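To make the ELT-with-pushdown idea mentioned in the abstract concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes an in-memory SQLite database standing in for the warehouse; the table names `staging_sales` and `fact_sales_by_region`, the function `run_elt_pushdown`, and the sample rows are all hypothetical. Raw rows are loaded untransformed, and the cleaning and aggregation are then expressed as SQL so the database engine performs the transformation rather than the application.

```python
import sqlite3


def run_elt_pushdown(rows):
    """Load raw rows first, then transform inside the database (ELT with pushdown).

    All table and column names here are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging_sales (region TEXT, amount_cents INTEGER)"
    )
    conn.execute(
        "CREATE TABLE fact_sales_by_region ("
        "region TEXT PRIMARY KEY, total_cents INTEGER)"
    )

    # Extract + Load: raw rows land in the staging table untransformed.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)

    # Transform (pushed down): the engine normalizes region names, drops
    # rows with missing amounts, and aggregates, all in a single statement.
    conn.execute(
        """
        INSERT INTO fact_sales_by_region (region, total_cents)
        SELECT UPPER(TRIM(region)), SUM(amount_cents)
        FROM staging_sales
        WHERE amount_cents IS NOT NULL
        GROUP BY UPPER(TRIM(region))
        """
    )

    result = conn.execute(
        "SELECT region, total_cents FROM fact_sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data with inconsistent casing, padding, and a NULL.
sample = [(" fiji ", 1200), ("Fiji", 800), ("nz", 500), ("NZ", None)]
totals = run_elt_pushdown(sample)
print(totals)
```

In a conventional ETL flow the same normalization and summing would run in application code before loading; pushing it down keeps the data inside the engine, which is the kind of trade-off the abstract refers to.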
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Maharaj, K., Kumar, K. (2023). Enhancing Data Warehouse Efficiency by Optimizing ETL Processing in Near Real Time Data Integration Environment. In: Hsu, CH., Xu, M., Cao, H., Baghban, H., Shawkat Ali, A.B.M. (eds) Big Data Intelligence and Computing. DataCom 2022. Lecture Notes in Computer Science, vol 13864. Springer, Singapore. https://doi.org/10.1007/978-981-99-2233-8_21
DOI: https://doi.org/10.1007/978-981-99-2233-8_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2232-1
Online ISBN: 978-981-99-2233-8
eBook Packages: Computer Science, Computer Science (R0)