Abstract
Industry 4.0 promotes a paradigm shift in the orchestration, oversight, and optimization of value chains across product and service life cycles. For instance, leveraging large-scale data from sensors and devices, coupled with Machine Learning techniques, can enhance decision-making and facilitate various improvements in industrial settings, including predictive maintenance. However, ensuring data quality remains a significant challenge. Sensor malfunctions or external factors such as electromagnetic interference can compromise data accuracy, thereby undermining confidence in the systems that depend on it. Neglecting data quality not only compromises system outputs but also contributes to the proliferation of bad data, such as duplicated, inconsistent, or inaccurate records. Addressing these problems is crucial to fully realizing the potential of data in Industry 4.0. This paper introduces an extensible system designed to ingest, organize, and monitor data generated by various sources, with a focus on industrial settings. This system can serve as a foundation for enhancing intelligent processes and optimizing operations in smart manufacturing environments.
Acknowledgements
This work has been supported by the European Union under the Next Generation EU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project PRODUTECH R3 - “Agenda Mobilizadora da Fileira das Tecnologias de Produção para a Reindustrialização”. Total project investment: 166.988.013,71 Euros; total grant: 97.111.730,27 Euros.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Oliveira, B., Oliveira, Ó., Peixoto, T., Ribeiro, F., Pereira, C. (2025). Extensible Data Ingestion System for Industry 4.0. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14969. Springer, Cham. https://doi.org/10.1007/978-3-031-73503-5_9
DOI: https://doi.org/10.1007/978-3-031-73503-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73502-8
Online ISBN: 978-3-031-73503-5
eBook Packages: Computer Science, Computer Science (R0)