IoT streaming data integration from multiple sources

  • Regular Paper
  • Journal: Computing

Abstract

The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today’s interconnected world. With the rapid advancement of Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of massive volume, variety and velocity, which traditional databases and systems are unable to manage effectively. Many organizations need to deal with massive datasets that comprise different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semi-structured) coming from multiple sources. Several data integration mechanisms have been designed, but they mostly process static data and are unable to integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one important issue in particular: timing conflicts (alignment) among streaming data coming from multiple sources. We propose a generic window-based ISDI approach to deal with IoT data in different formats and develop algorithms to integrate IoT streaming data from multiple sources. In particular, we extend the basic windowing algorithm to support real-time data integration and to handle the timing alignment issue. We also introduce a de-duplication algorithm to deal with data redundancy and to present the useful fragments of the integrated data. We conduct several sets of experiments and quantify the performance of the proposed window-based approach. In particular, we compare our local experimental results with those from a real streaming-data setup using Apache Spark. The results of the experiments, which are performed on several IoT datasets, show the efficiency of the proposed solution in terms of processing time. The results are also used to provide an integrated data view to users.
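
As a rough illustration of the idea summarized above, the following minimal Python sketch groups readings from several sources into fixed-size (tumbling) time windows and drops duplicate readings within a window. It is not the algorithm from the paper: the reading shape (source, timestamp, payload), the function names window_key and integrate, the window size, and the rule that two readings are duplicates when they share a window and an identical payload are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical reading shape: (source name, timestamp in seconds, flat payload dict).
Reading = Tuple[str, float, dict]


def window_key(timestamp: float, window_size: float) -> int:
    """Return the index of the tumbling window that contains the timestamp."""
    return int(timestamp // window_size)


def integrate(readings: Iterable[Reading], window_size: float) -> Dict[int, List[dict]]:
    """Group readings from all sources by window and drop duplicates.

    Assumption: a duplicate is a reading whose payload is identical to one
    already kept in the same window, regardless of which source sent it.
    Payload values must be hashable for this fingerprint to work.
    """
    windows: Dict[int, List[dict]] = defaultdict(list)
    seen = set()
    for source, ts, payload in readings:
        key = window_key(ts, window_size)
        fingerprint = (key, tuple(sorted(payload.items())))
        if fingerprint in seen:
            continue  # redundant reading inside the same window
        seen.add(fingerprint)
        windows[key].append({"source": source, "timestamp": ts, **payload})
    return dict(windows)


# Two sources report with slightly different timestamps.
readings = [
    ("A", 0.4, {"sensor": "t1", "temp": 21.5}),
    ("B", 1.2, {"sensor": "t1", "temp": 21.5}),  # same window and payload as above
    ("B", 2.1, {"sensor": "t1", "temp": 21.7}),
    ("A", 3.9, {"sensor": "t2", "temp": 19.0}),
]
result = integrate(readings, window_size=2.0)
for index in sorted(result):
    print(index, result[index])
```

With a 2-second window, the first two readings fall into window 0 and the second is treated as redundant, while the last two fall into window 1. A fixed window is the simplest way to absorb small timing differences between sources; it does not handle late or out-of-order data, which a real streaming engine would need to address separately.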

References

  1. Tu DQ, Kayes A, Rahayu W, Nguyen K (2019) ISDI: a new window-based framework for integrating IoT streaming data from multiple sources. In: International conference on advanced information networking and applications. Springer, New York, pp 498–511

  2. Zikopoulos P, Eaton C et al (2011) Understanding big data: analytics for enterprise class Hadoop and streaming data. McGraw-Hill Osborne Media, New York

  3. Chen J, Chen Y, Du X, Li C, Lu J, Zhao S et al (2013) Big data challenge: a data management perspective. Front Comput Sci 7(2):157–164

  4. Harris GJ, Rago SA, Williams TH (2004) Distributed storage resource management in a storage area network. US Patent 6,826,580

  5. Chakravarthy SK, Sudhakar N, Reddy ES, Subramanian DV, Shankar P (2018) Dimension reduction and storage optimization techniques for distributed and big data cluster environment. In: Soft computing and medical bioinformatics. Springer, New York, pp 47–54

  6. McNeill N, Kardes H, Borthwick A (2012) Dynamic record blocking: efficient linking of massive databases in MapReduce. In: Proceedings of the 10th international workshop on quality in databases (QDB), pp 1–7

  7. Hassanzadeh O, Chiang F, Lee HC, Miller RJ (2009) Framework for evaluating clustering algorithms in duplicate detection. Proc VLDB Endow 2(1):1282–1293

  8. Marinier P, Anepu BM, Pelletier G, Olesen RL (2015) Maintaining time alignment with multiple uplink carriers. US Patent 8,934,459

  9. Qian Z, He Y, Su C, Wu Z, Zhu H, Zhang T et al (2013) TimeStream: reliable stream computation in the cloud. In: Proceedings of the 8th ACM European conference on computer systems. ACM, pp 1–14

  10. Cugola G, Margara A (2012) Processing flows of information: from data stream to complex event processing. ACM Comput Surv (CSUR) 44(3):15

  11. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 443–448

  12. Fürnkranz J (1998) Integrative windowing. J Artif Intell Res 8:129–164

  13. Sarma AD, Dong XL, Halevy AY (2011) Uncertainty in data integration and dataspace support platforms. In: Schema matching and mapping. Springer, New York, pp 75–108

  14. Bellahsène Z, Bonifati A, Rahm E (2011) Schema matching and mapping. Springer, New York

  15. Doan QT, Kayes A, Rahayu W, Nguyen K (2020) Integration of IoT streaming data with efficient indexing and storage optimization. IEEE Access 8:47456–47467

  16. Gudivada VN, Baeza-Yates RA, Raghavan VV (2015) Big data: promises and problems. IEEE Comput 48(3):20–23

  17. Storey VC, Song IY (2017) Big data technologies and management: what conceptual modeling can do. Data Knowl Eng 108:50–67

  18. Dong XL, Srivastava D (2013) Big data integration. In: 2013 IEEE 29th international conference on data engineering (ICDE). IEEE, pp 1245–1248

  19. Sagi T, Gal A, Barkol O, Bergman R, Avram A (2017) Multi-source uncertain entity resolution: transforming Holocaust victim reports into people. Inf Syst 65:124–136

  20. Doan A, Domingos PM, Levy AY (2000) Learning source description for data integration. In: WebDB (informal proceedings), pp 81–86

  21. Calbimonte JP, Corcho O, Gray AJ (2010) Enabling ontology-based access to streaming data sources. In: International semantic Web conference. Springer, New York, pp 96–111

  22. Daraio C, Lenzerini M, Leporelli C, Moed HF, Naggar P, Bonaccorsi A et al (2016a) Data integration for research and innovation policy: an ontology-based data management approach. Scientometrics 106(2):857–871

  23. Daraio C, Lenzerini M, Leporelli C, Naggar P, Bonaccorsi A, Bartolucci A (2016b) The advantages of an ontology-based data management approach: openness, interoperability and data quality. Scientometrics 108(1):441–455

  24. Gama J, Sebastião R, Rodrigues PP (2013) On evaluating stream learning algorithms. Mach Learn 90(3):317–346

  25. Pareek A, Khaladkar B, Sen R, Onat B, Nadimpalli V, Agarwal M et al (2017) Striim: a streaming analytics platform for real-time business decisions. In: Proceedings of the international workshop on real-time business intelligence and analytics. ACM, pp 1–8

  26. Pareek A, Khaladkar B, Sen R, Onat B, Nadimpalli V, Lakshminarayanan M (2018) Real-time ETL in Striim. In: Proceedings of the international workshop on real-time business intelligence and analytics. ACM, p 3

  27. Ahad MA, Biswas R (2018) Dynamic merging based small file storage (DM-SFS) architecture for efficiently storing small size files in Hadoop. Procedia Comput Sci 132:1626–1635

  28. Kayes A, Han J, Colman A (2012) ICAF: a context-aware framework for access control. In: Australasian conference on information security and privacy. Springer, New York, pp 442–449

  29. Kayes A, Han J, Colman A, Islam MS (2014) RelBOSS: a relationship-aware access control framework for software services. In: OTM confederated international conferences “On the Move to Meaningful Internet Systems”. Springer, New York, pp 258–276

  30. Kayes A, Rahayu W, Dillon T, Chang E, Han J (2017) Context-aware access control with imprecise context characterization through a combined fuzzy logic and ontology-based approach. In: OTM confederated international conferences “On the Move to Meaningful Internet Systems”. Springer, New York, pp 132–153

  31. Kayes A, Han J, Rahayu W, Dillon T, Islam S, Colman A (2018a) A policy model and framework for context-aware access control to information resources. Comput J. https://doi.org/10.1093/comjnl/bxy065

  32. Kayes A, Rahayu W, Dillon T, Chang E (2018b) Accessing data from multiple sources through context-aware access control. In: TrustCom. IEEE, pp 551–559

  33. Kayes A, Rahayu W, Dillon T, Chang E, Han J (2019) Context-aware access control with imprecise context characterization for cloud-based data resources. Future Gener Comput Syst 93:237–255

  34. Savaglio C, Gerace P, Di Fatta G, Fortino G (2019) Data mining at the IoT edge. In: 2019 28th international conference on computer communication and networks (ICCCN). IEEE, pp 1–6

  35. Belli L, Cirani S, Davoli L, Ferrari G, Melegari L, Montón M et al (2015) A scalable big stream cloud architecture for the Internet of Things. Int J Syst Serv-Orient Eng (IJSSOE) 5(4):26–53

  36. Hassan MM, Gumaei A, Alsanad A, Alrubaian M, Fortino G (2020) A hybrid deep learning model for efficient intrusion detection in big data environment. Inf Sci 513:386–396

  37. Li Q, Moon B, Lopez I (2004) Skyline index for time series data. IEEE Trans Knowl Data Eng 16(6):669–684

  38. Ma Y, Rao J, Hu W, Meng X, Han X, Zhang Y et al (2012) An efficient index for massive IoT data in cloud environment. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 2129–2133

Author information

Corresponding author

Correspondence to A. S. M. Kayes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An earlier version of this paper was presented at the 33rd International Conference on Advanced Information Networking and Applications (AINA-2019), where it received a best paper award [1].

About this article

Cite this article

Tu, D.Q., Kayes, A.S.M., Rahayu, W. et al. IoT streaming data integration from multiple sources. Computing 102, 2299–2329 (2020). https://doi.org/10.1007/s00607-020-00830-9

  • DOI: https://doi.org/10.1007/s00607-020-00830-9
