Abstract
The Internet of Things (IoT) allows smart devices to connect to anything, anywhere, at any time. The ubiquity of IoT devices generates a huge volume of data called Internet of Big data (IoBd). IoBd is generated in continuous streams at unprecedented speed, so rapid analysis of these streams is essential. Allocating the optimal number of cloud resources for real-time analysis of IoBd streams is, however, a challenging task. Most current methods allocate cloud nodes using data characteristics supplied by the user. For IoBd streams, these characteristics are usually unknown to the user because of the stochastic nature of IoT devices, which makes it difficult to select appropriate cloud resources. This paper proposes an efficient method to tackle this issue. The proposed method first predicts the data characteristics of an IoBd stream in terms of volume, velocity and variety (3Vs). These predicted values are then expressed as a triplet called Characterization of Stream (CoSt). In parallel, self-organizing maps are used to create dynamic clusters of cloud resources, and one of the clusters is allocated to the IoBd stream based on its CoSt. Experimental results show that the proposed method effectively boosted the performance of cloud resources and minimized the execution and waiting time of IoBd stream processing.
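The allocation step described in the abstract, matching a stream's CoSt triplet to one cluster of cloud resources, can be illustrated with a minimal sketch. The abstract does not specify the matching rule, so the sketch assumes a plain nearest-centroid lookup with Euclidean distance. The `CoSt` class, the `nearest_cluster` function, the cluster names, the centroid values, and the normalization of each V to the range 0 to 1 are all hypothetical. In the paper the clusters are produced by self-organizing maps and the triplet by a prediction step; here both are simply given as inputs.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class CoSt:
    """Hypothetical container for a stream's predicted 3Vs (each assumed in [0, 1])."""
    volume: float
    velocity: float
    variety: float

    def as_tuple(self):
        return (self.volume, self.velocity, self.variety)


def nearest_cluster(cost, centroids):
    """Return the name of the cluster whose centroid is closest to the CoSt.

    Assumption: Euclidean distance in the 3V space; the paper may use
    a different matching rule.
    """
    return min(centroids, key=lambda name: dist(cost.as_tuple(), centroids[name]))


# Hypothetical cluster centroids; in the paper these would come from a
# self-organizing map trained on cloud resource descriptions.
centroids = {
    "small": (0.1, 0.1, 0.1),
    "medium": (0.5, 0.5, 0.5),
    "large": (0.9, 0.9, 0.9),
}

# Hypothetical predicted characteristics of one IoBd stream.
predicted = CoSt(volume=0.8, velocity=0.7, variety=0.9)
chosen = nearest_cluster(predicted, centroids)
print(chosen)
```

With these illustrative values the stream is closest to the "large" centroid, so that cluster would be allocated to it.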
Cite this article
Kaur, N., Sood, S.K. & Verma, P. Cloud resource management using 3Vs of Internet of Big data streams. Computing 102, 1463–1485 (2020). https://doi.org/10.1007/s00607-019-00732-5
DOI: https://doi.org/10.1007/s00607-019-00732-5