Abstract
The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today’s interconnected world. With the rapid advancement of Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of massive volume, variety and velocity, which traditional databases and systems are unable to manage effectively. Many organizations need to deal with massive datasets that comprise different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semi-structured) coming from multiple sources. Existing data integration mechanisms have mostly been designed to process static data, and they are unable to integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one important issue: the timing conflict (alignment) among streaming data coming from multiple sources. We propose a generic window-based ISDI approach to deal with IoT data in different formats, and we develop algorithms to integrate IoT streaming data from multiple sources. In particular, we extend the basic windowing algorithm to support real-time data integration and to deal with the timing alignment issue. We also introduce a de-duplication algorithm to deal with data redundancy and to demonstrate the useful fragments of the integrated data. We conduct several sets of experiments to quantify the performance of our proposed window-based approach. In particular, we compare our local experimental results with a real setup for streaming data using Apache Spark. The results of the experiments, which are performed on several IoT datasets, show the efficiency of our proposed solution in terms of processing time. The results are also used to provide an integrated data view to the users.
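To make the idea of window-based integration concrete, the following is a minimal Python sketch, not the algorithm developed in the paper. It assumes fixed-width (tumbling) windows, records of the form (source, timestamp, payload), and a simple duplicate rule under which two records from the same source with an identical payload inside the same window count as duplicates. The names `window_start` and `integrate`, and the record layout, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (source_id, timestamp_in_seconds, payload).
Record = Tuple[str, float, dict]


def window_start(ts: float, width: float) -> float:
    """Map a timestamp to the start of the tumbling window that contains it."""
    return (ts // width) * width


def integrate(records: Iterable[Record], width: float) -> Dict[float, List[dict]]:
    """Group records from all sources by window and drop duplicates.

    Assumption (not from the paper): a duplicate is a record with the same
    source and the same payload falling into the same window. Payload values
    must be hashable for this rule to work.
    """
    windows: Dict[float, List[dict]] = defaultdict(list)
    seen = set()
    for source, ts, payload in records:
        start = window_start(ts, width)
        key = (start, source, tuple(sorted(payload.items())))
        if key in seen:
            continue  # redundant reading inside this window
        seen.add(key)
        windows[start].append({"source": source, "timestamp": ts, **payload})
    # Return the windows in time order as a plain dict.
    return dict(sorted(windows.items()))


if __name__ == "__main__":
    stream = [
        ("s1", 0.4, {"temp": 20}),
        ("s2", 0.9, {"hum": 55}),
        ("s1", 0.7, {"temp": 20}),  # same source and payload, same window
        ("s1", 1.2, {"temp": 21}),
    ]
    for start, rows in integrate(stream, width=1.0).items():
        print(start, rows)
```

Aligning on the window start, rather than on exact timestamps, is what lets readings from sources with slightly different clocks or sampling rates end up in the same integrated view. A real deployment would also need a policy for late-arriving records, which this sketch leaves out.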














References
Tu DQ, Kayes A, Rahayu W, Nguyen K (2019) Isdi: A new window-based framework for integrating iot streaming data from multiple sources. In: International conference on advanced information networking and applications. Springer, New York, pp 498–511
Zikopoulos P, Eaton C et al (2011) Understanding big data: analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media, New York
Chen J, Chen Y, Du X, Li C, Lu J, Zhao S et al (2013) Big data challenge: a data management perspective. Front Comput Sci 7(2):157–164
Harris GJ, Rago SA, Williams TH (2004) Distributed storage resource management in a storage area network. US Patent 6,826,580
Chakravarthy SK, Sudhakar N, Reddy ES, Subramanian DV, Shankar P (2018) Dimension reduction and storage optimization techniques for distributed and big data cluster environment. In: Soft computing and medical bioinformatics. Springer, New York, pp 47–54
McNeill N, Kardes H, Borthwick A (2012) Dynamic record blocking: efficient linking of massive databases in mapreduce. In: Proceedings of the 10th international workshop on quality in databases (QDB), pp 1–7
Hassanzadeh O, Chiang F, Lee HC, Miller RJ (2009) Framework for evaluating clustering algorithms in duplicate detection. Proc VLDB Endow 2(1):1282–1293
Marinier P, Anepu BM, Pelletier G, Olesen RL (2015) Maintaining time alignment with multiple uplink carriers. US Patent 8,934,459
Qian Z, He Y, Su C, Wu Z, Zhu H, Zhang T et al (2013) Timestream: Reliable stream computation in the cloud. In: Proceedings of the 8th ACM European conference on computer systems. ACM, pp 1–14
Cugola G, Margara A (2012) Processing flows of information: from data stream to complex event processing. ACM Comput Surv (CSUR) 44(3):15
Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 443–448
Fürnkranz J (1998) Integrative windowing. J Artif Intell Res 8:129–164
Sarma AD, Dong XL, Halevy AY (2011) Uncertainty in data integration and dataspace support platforms. In: Schema matching and mapping. Springer, New York, pp 75–108
Bellahsène Z, Bonifati A, Rahm E (2011) Schema matching and mapping. Springer, New York
Doan QT, Kayes A, Rahayu W, Nguyen K (2020) Integration of iot streaming data with efficient indexing and storage optimization. IEEE Access 8:47456–47467
Gudivada VN, Baeza-Yates RA, Raghavan VV (2015) Big data: promises and problems. IEEE Comput 48(3):20–23
Storey VC, Song IY (2017) Big data technologies and management: what conceptual modeling can do. Data Knowl Eng 108:50–67
Dong XL, Srivastava D (2013) Big data integration. In: 2013 IEEE 29th international conference on data engineering (ICDE). IEEE, pp 1245–1248
Sagi T, Gal A, Barkol O, Bergman R, Avram A (2017) Multi-source uncertain entity resolution: transforming holocaust victim reports into people. Inf Syst 65:124–136
Doan A, Domingos PM, Levy AY (2000) Learning source description for data integration. In: WebDB (informal proceedings), pp 81–86
Calbimonte JP, Corcho O, Gray AJ (2010) Enabling ontology-based access to streaming data sources. In: International semantic Web conference. Springer, New York, pp 96–111
Daraio C, Lenzerini M, Leporelli C, Moed HF, Naggar P, Bonaccorsi A et al (2016a) Data integration for research and innovation policy: an ontology-based data management approach. Scientometrics 106(2):857–871
Daraio C, Lenzerini M, Leporelli C, Naggar P, Bonaccorsi A, Bartolucci A (2016b) The advantages of an ontology-based data management approach: openness, interoperability and data quality. Scientometrics 108(1):441–455
Gama J, Sebastião R, Rodrigues PP (2013) On evaluating stream learning algorithms. Mach Learn 90(3):317–346
Pareek A, Khaladkar B, Sen R, Onat B, Nadimpalli V, Agarwal M et al (2017) Striim: a streaming analytics platform for real-time business decisions. In: Proceedings of the international workshop on real-time business intelligence and analytics. ACM, pp 1–8
Pareek A, Khaladkar B, Sen R, Onat B, Nadimpalli V, Lakshminarayanan M (2018) Real-time etl in striim. In: Proceedings of the international workshop on real-time business intelligence and analytics. ACM, p 3
Ahad MA, Biswas R (2018) Dynamic merging based small file storage (dm-sfs) architecture for efficiently storing small size files in hadoop. Procedia Comput Sci 132:1626–1635
Kayes A, Han J, Colman A (2012) Icaf: A context-aware framework for access control. In: Australasian conference on information security and privacy. Springer, New York, pp 442–449
Kayes A, Han J, Colman A, Islam MS (2014) Relboss: a relationship-aware access control framework for software services. In: OTM confederated international conferences “On the Move to Meaningful Internet Systems”. Springer, New York, pp 258–276
Kayes A, Rahayu W, Dillon T, Chang E, Han J (2017) Context-aware access control with imprecise context characterization through a combined fuzzy logic and ontology-based approach. In: OTM confederated international conferences “On the Move to Meaningful Internet Systems”. Springer, New York, pp 132–153
Kayes A, Han J, Rahayu W, Dillon T, Islam S, Colman A (2018a) A policy model and framework for context-aware access control to information resources. Comput J. https://doi.org/10.1093/comjnl/bxy065
Kayes A, Rahayu W, Dillon T, Chang E (2018b) Accessing data from multiple sources through context-aware access control. In: TrustCom. IEEE, pp 551–559
Kayes A, Rahayu W, Dillon T, Chang E, Han J (2019) Context-aware access control with imprecise context characterization for cloud-based data resources. Future Gener Comput Syst 93:237–255
Savaglio C, Gerace P, Di Fatta G, Fortino G (2019) Data mining at the iot edge. In: 2019 28th international conference on computer communication and networks (ICCCN). IEEE, pp 1–6
Belli L, Cirani S, Davoli L, Ferrari G, Melegari L, Montón M et al (2015) A scalable big stream cloud architecture for the internet of things. Int J Syst Serv-Orient Eng (IJSSOE) 5(4):26–53
Hassan MM, Gumaei A, Alsanad A, Alrubaian M, Fortino G (2020) A hybrid deep learning model for efficient intrusion detection in big data environment. Inf Sci 513:386–396
Li Q, Moon B, Lopez I (2004) Skyline index for time series data. IEEE Trans Knowl Data Eng 16(6):669–684
Ma Y, Rao J, Hu W, Meng X, Han X, Zhang Y et al (2012) An efficient index for massive iot data in cloud environment. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 2129–2133
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
An earlier version of this paper was presented at the 33rd International Conference on Advanced Information Networking and Applications (AINA-2019), where it received a best paper award [1].
Cite this article
Tu, D.Q., Kayes, A.S.M., Rahayu, W. et al. IoT streaming data integration from multiple sources. Computing 102, 2299–2329 (2020). https://doi.org/10.1007/s00607-020-00830-9
DOI: https://doi.org/10.1007/s00607-020-00830-9