Self-organized dynamic provisioning for big data

Erdil, D. Cenk

doi:10.1007/s10586-017-0822-7

Published: 28 March 2017

Cluster Computing, Volume 20, pages 2749–2762 (2017)

D. Cenk Erdil (ORCID: orcid.org/0000-0003-1380-3497)

Abstract

Recent rapid expansion of datasets in big data problems has resulted in data sizes that exceed the processing capabilities of available distributed computing power. In other words, we are producing more data than we can process. In addition, further analysis of a dataset's collective state may require duplicating, transferring, and distributing the data, increasing the scale of the problem. Orchestrating these steps in large-scale complex systems is non-trivial. One basic technique to help minimize the effects of data redistribution is to use dynamic resource provisioning environments. When node organization and structure are dynamic and eclectic, provisioning environments require up-to-date information about resource availability. Maintaining the freshness of available resource state in centralized or hierarchical scheduling systems imposes a network communication overhead. Centralization also introduces administrative barriers, limiting interoperability. One effective method to improve the extent of self-organization is to take feedback from the system. Based on this feedback, nodes can alter their behavior to better respond to changing characteristics in dynamic resource provisioning environments. In this article, we present a decentralized scheduling framework that takes feedback from the system and adjusts its behavior accordingly. Our framework presents an enabling mechanism for self-organization, where each cloud node adapts its behavior based on the feedback. Compared to the centralized resource provisioning solutions that exist in current cloud systems, this approach achieves comparable scheduling decisions with half the packet overhead. We show that by taking advantage of spatial locality with dynamic provisioning, and due to better scheduling decisions with our framework, the data processing overhead of big data problems can be reduced by at least 30% in general, and by up to 55% in particular resource distributions. This, in turn, results in efficient scheduling decisions that provision better resources for big data tasks.
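
The abstract does not spell out how a node turns feedback into a change of behavior, so the following is only a minimal sketch of the general idea, not the article's algorithm. It assumes a hypothetical `Node` that counts how many resource queries it could satisfy from its local view and then widens or narrows a dissemination `fanout`; the class, its fields, the target hit rate, and the doubling/decrement rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cloud node that adapts how widely it shares resource state."""

    fanout: int = 4                # peers contacted per round (assumed parameter)
    min_fanout: int = 1
    max_fanout: int = 16
    target_hit_rate: float = 0.8   # assumed goal for locally satisfied queries
    hits: int = 0
    queries: int = 0

    def record_query(self, satisfied: bool) -> None:
        """Collect feedback: was a resource query satisfied from local state?"""
        self.queries += 1
        if satisfied:
            self.hits += 1

    def adapt(self) -> int:
        """Adjust fanout from the feedback gathered since the last call."""
        if self.queries == 0:
            return self.fanout     # no feedback yet, keep current behavior
        hit_rate = self.hits / self.queries
        if hit_rate < self.target_hit_rate:
            # Local view looks stale: disseminate more aggressively.
            self.fanout = min(self.max_fanout, self.fanout * 2)
        else:
            # Local view is good enough: back off to save packets.
            self.fanout = max(self.min_fanout, self.fanout - 1)
        self.hits = self.queries = 0   # start a new feedback window
        return self.fanout


node = Node()
for i in range(10):
    node.record_query(satisfied=(i < 5))   # only half the queries succeed
print(node.adapt())
```

In this sketch each node acts only on feedback it observes locally, which is the property the abstract attributes to self-organization; any real policy would need to be taken from the full article.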

Notes

Except in the case where nodes self-organize into neighborhoods in a peer-to-peer fashion.
When Freshness is used as the ranking criterion.

Author information

Authors and Affiliations

School of Computing, Sacred Heart University, Fairfield, CT, 06825, USA
D. Cenk Erdil

Corresponding author

Correspondence to D. Cenk Erdil.

About this article

Cite this article

Erdil, D.C. Self-organized dynamic provisioning for big data. Cluster Comput 20, 2749–2762 (2017). https://doi.org/10.1007/s10586-017-0822-7

Received: 27 July 2016
Revised: 07 January 2017
Accepted: 14 March 2017
Published: 28 March 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10586-017-0822-7
