A Cluster-Based Data-Centric Model for Network-Aware Task Scheduling in Distributed Systems

Fiore, Ugo; Palmieri, Francesco; Castiglione, Aniello; De Santis, Alfredo

doi:10.1007/s10766-013-0289-y


Published: 30 October 2013

Volume 42, pages 755–775 (2014)

International Journal of Parallel Programming

782 Accesses
15 Citations

Abstract

Big Data processing architectures are now widely recognized as one of the most significant innovations in computing of the last decade. Their enormous potential for collecting and processing huge volumes of data scattered across the Internet is opening the door to a new generation of fully distributed applications that, by leveraging the large amount of resources available on the network, will be able to cope with very complex problems and achieve unprecedented performance. However, the Internet is known to have severe scalability limitations when moving very large quantities of data, and these limitations make it challenging to use the computing and storage resources available on the network efficiently enough for data-intensive applications to run effectively in such a complex distributed environment. This calls for resource scheduling decisions that drive the execution of tasks towards the data, taking network load and capacity into account in order to maximize data access performance and reduce queueing and processing delays as much as possible. Accordingly, this work presents a data-centric meta-scheduling scheme for fully distributed Big Data processing architectures, based on clustering techniques that aggregate tasks around storage repositories and driven by a new concept of “gravitational” attraction between tasks and their data of interest. The scheme relies on heuristic criteria based on network awareness and advance resource reservation to avoid long delays in data transfer operations, resulting in optimized use of data storage and runtime resources at the cost of limited (polynomial) computational complexity.
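The abstract describes the scheduling idea only at a high level. The Python sketch below illustrates one possible reading of "gravitational" task-to-data attraction combined with a network distance and a reservation limit. It is not the authors' algorithm: the scoring formula (data volume divided by the squared network distance, by loose analogy with Newton's law), the greedy heaviest-first placement, and the names `Task`, `symmetric`, `attraction` and `schedule` are all hypothetical. Advance reservation is reduced to a simple count of reserved execution slots per site.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # Hypothetical input profile: MB of data the task reads from each storage site.
    data_mb: dict[str, float]


def symmetric(sites, links):
    """Build a full (repo, site) -> network distance table from undirected links.

    Same-site distance is 0; unlisted pairs are treated as unreachable (inf).
    """
    d = {(a, b): (0.0 if a == b else float("inf")) for a in sites for b in sites}
    for (a, b), cost in links.items():
        d[(a, b)] = d[(b, a)] = cost
    return d


def attraction(task, site, distance):
    """Gravity-style pull of `site` on `task` (assumed formula, not the paper's).

    Each chunk of input data acts as a mass; its pull decays with the square of
    (1 + network distance), so data already stored at `site` counts in full.
    """
    return sum(mb / (1.0 + distance[(repo, site)]) ** 2
               for repo, mb in task.data_mb.items())


def schedule(tasks, slots, distance):
    """Greedy placement: heaviest tasks first, each to its most attractive free site.

    `slots` stands in for advance reservation: a site accepts a task only while
    it still has reserved execution slots left.
    """
    free = dict(slots)
    plan = {}
    for task in sorted(tasks, key=lambda t: sum(t.data_mb.values()), reverse=True):
        candidates = [s for s, n in free.items() if n > 0]
        if not candidates:
            raise RuntimeError(f"no free slot left for {task.name}")
        best = max(candidates, key=lambda s: attraction(task, s, distance))
        plan[task.name] = best
        free[best] -= 1
    return plan


sites = ["A", "B", "C"]
dist = symmetric(sites, {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 3.0})
tasks = [
    Task("t1", {"A": 500.0}),
    Task("t2", {"A": 400.0, "C": 50.0}),
    Task("t3", {"C": 300.0}),
]
plan = schedule(tasks, {"A": 1, "B": 2, "C": 1}, dist)
print(plan)  # {'t1': 'A', 't2': 'B', 't3': 'C'}
```

In this toy network, the heavier `t1` takes the single slot at A, so `t2` is drawn to B (one hop from most of its data, score 112.5) rather than C (three hops from it, score 75). The pass costs O(T log T + T·S·R) for T tasks, S sites and R data locations per task, which is polynomial in the spirit of the abstract's claim; the paper's own clustering procedure and complexity bound are not reproduced here.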

Author information

Authors and Affiliations

Information Services Center, University of Naples Federico II, Via Cinthia 5, 80126 , Napoli, Italy
Ugo Fiore
Department of Industrial and Information Engineering, Second University of Naples, Via Roma 29, 81031 , Aversa, Italy
Francesco Palmieri
Department of Computer Science, University of Salerno, Via Ponte don Melillo, 84084 , Fisciano (SA), Italy
Aniello Castiglione & Alfredo De Santis

Corresponding author

Correspondence to Ugo Fiore.

About this article

Cite this article

Fiore, U., Palmieri, F., Castiglione, A. et al. A Cluster-Based Data-Centric Model for Network-Aware Task Scheduling in Distributed Systems. Int J Parallel Prog 42, 755–775 (2014). https://doi.org/10.1007/s10766-013-0289-y

Received: 30 June 2013
Accepted: 09 October 2013
Published: 30 October 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10766-013-0289-y