ABSTRACT
Data-intensive high energy physics workflows executed on geographically distributed resources pose a tremendous challenge for the efficient use of computing resources. In this early-work paper, we present a hierarchical framework for the efficient allocation of resources and the energy-efficient assignment of tasks for a representative high energy physics application, the Belle II experiment. With an expected data rate of 25 petabytes per year from experimental data and Monte Carlo simulations, the Belle II experiment provides an ideal platform for algorithmic development. Building on an analogy with the unit commitment problem in electric power grids, we present a novel cost-efficient method for resource allocation that feeds into the energy-efficient assignment of tasks to resources using a novel semi-matching-based algorithm. We demonstrate that this approach is both computationally efficient and effective. We expect the methods developed in this work to benefit Belle II and other complex workflows executed on distributed resources.
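To make the semi-matching idea concrete: a semi-matching in a bipartite task-resource graph assigns every task to exactly one eligible resource, while a resource may receive several tasks, and the goal is to balance the resulting loads. The paper's actual algorithm is not reproduced here; the sketch below is only a minimal greedy illustration of the concept, in which each task (processed in decreasing order of cost) goes to the eligible resource with the smallest resulting load. The task names, eligibility sets, and cost values are hypothetical.

```python
def greedy_semi_matching(tasks, eligible, cost):
    """Illustrative greedy semi-matching (not the paper's algorithm).

    tasks:    iterable of task identifiers
    eligible: dict mapping task -> list of resources that can run it
    cost:     dict mapping task -> {resource: load the task adds there}
    Returns a dict mapping each task to its assigned resource.
    """
    load = {}        # current load on each resource
    assignment = {}
    # Place "heavier" tasks first so they have more placement freedom.
    for t in sorted(tasks, key=lambda t: -min(cost[t][r] for r in eligible[t])):
        # Pick the eligible resource whose load after adding t is smallest.
        r = min(eligible[t], key=lambda r: load.get(r, 0.0) + cost[t][r])
        assignment[t] = r
        load[r] = load.get(r, 0.0) + cost[t][r]
    return assignment

# Hypothetical example: two resources, one task restricted to resource "A".
tasks = ["t1", "t2", "t3"]
eligible = {"t1": ["A", "B"], "t2": ["A"], "t3": ["A", "B"]}
cost = {"t1": {"A": 2, "B": 2}, "t2": {"A": 3}, "t3": {"A": 1, "B": 1}}
print(greedy_semi_matching(tasks, eligible, cost))
```

Optimal semi-matching algorithms (e.g., Harvey et al.'s) improve on such a greedy pass by augmenting along alternating paths, but the greedy version already shows the structural constraint: tasks are matched uniquely, resources are not.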