Extensible Data Ingestion System for Industry 4.0

  • Conference paper
Progress in Artificial Intelligence (EPIA 2024)

Abstract

Industry 4.0 promotes a paradigm shift in the orchestration, oversight, and optimization of value chains across product and service life cycles. For instance, leveraging large-scale data from sensors and devices, coupled with Machine Learning techniques, can enhance decision-making and facilitate various improvements in industrial settings, including predictive maintenance. However, ensuring data quality remains a significant challenge. Sensor malfunctions or external factors such as electromagnetic interference can compromise data accuracy, thereby undermining confidence in the systems that depend on that data. Neglecting data quality not only compromises system outputs but also contributes to the proliferation of bad data, such as duplicated, inconsistent, or inaccurate records. Addressing these problems is crucial to fully exploiting the potential of data in Industry 4.0. This paper introduces an extensible system designed to ingest, organize, and monitor data generated by various sources, with a focus on industrial settings. This system can serve as a foundation for enhancing intelligent processes and optimizing operations in smart manufacturing environments.

Notes

  1. https://kafka.apache.org/
  2. https://avro.apache.org/
  3. https://www.mongodb.com/
  4. https://cassandra.apache.org/
  5. https://grafana.com/
  6. https://www.influxdata.com/

Acknowledgements

This work has been supported by the European Union under the Next Generation EU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project PRODUTECH R3 - “Agenda Mobilizadora da Fileira das Tecnologias de Produção para a Reindustrialização”, Total project investment: 166.988.013,71 Euros; Total Grant: 97.111.730,27 Euros.

Corresponding author

Correspondence to Bruno Oliveira.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oliveira, B., Oliveira, Ó., Peixoto, T., Ribeiro, F., Pereira, C. (2025). Extensible Data Ingestion System for Industry 4.0. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14969. Springer, Cham. https://doi.org/10.1007/978-3-031-73503-5_9

  • DOI: https://doi.org/10.1007/978-3-031-73503-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73502-8

  • Online ISBN: 978-3-031-73503-5

  • eBook Packages: Computer Science, Computer Science (R0)
