ABSTRACT
Data-intensive high energy physics workflows executed on geographically distributed resources pose a tremendous challenge for the efficient use of computing resources. In this early-work paper, we present a hierarchical framework for the efficient allocation of resources and the energy-efficient assignment of tasks for a representative high energy physics application, the Belle II experiment. With an expected data rate of 25 petabytes per year from experimental data and Monte Carlo simulations, the Belle II experiment provides an ideal platform for algorithmic development. Building on an analogy with the unit commitment problem in electric power grids, we present a novel cost-efficient method for resource allocation that feeds into the energy-efficient assignment of tasks to resources using a novel semi-matching-based algorithm. We demonstrate that this approach is both computationally efficient and effective. We expect the methods developed in this work to benefit Belle II and other complex workflows executed on distributed resources.
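To make the semi-matching idea concrete: a semi-matching in a bipartite task-resource graph assigns every task to exactly one eligible resource, while a resource may receive several tasks, and the goal is to balance the resulting loads. The paper's actual algorithm is not reproduced here; the sketch below is only a minimal greedy illustration of the concept, in which each task (processed in decreasing order of cost) goes to the eligible resource with the smallest resulting load. The task names, eligibility sets, and cost values are hypothetical.

```python
def greedy_semi_matching(tasks, eligible, cost):
    """Illustrative greedy semi-matching (not the paper's algorithm).

    tasks:    iterable of task identifiers
    eligible: dict mapping task -> list of resources that can run it
    cost:     dict mapping task -> {resource: load the task adds there}
    Returns a dict mapping each task to its assigned resource.
    """
    load = {}        # current load on each resource
    assignment = {}
    # Place "heavier" tasks first so they have more placement freedom.
    for t in sorted(tasks, key=lambda t: -min(cost[t][r] for r in eligible[t])):
        # Pick the eligible resource whose load after adding t is smallest.
        r = min(eligible[t], key=lambda r: load.get(r, 0.0) + cost[t][r])
        assignment[t] = r
        load[r] = load.get(r, 0.0) + cost[t][r]
    return assignment

# Hypothetical example: two resources, one task restricted to resource "A".
tasks = ["t1", "t2", "t3"]
eligible = {"t1": ["A", "B"], "t2": ["A"], "t3": ["A", "B"]}
cost = {"t1": {"A": 2, "B": 2}, "t2": {"A": 3}, "t3": {"A": 1, "B": 1}}
print(greedy_semi_matching(tasks, eligible, cost))
```

Optimal semi-matching algorithms (e.g., Harvey et al.'s) improve on such a greedy pass by augmenting along alternating paths, but the greedy version already shows the structural constraint: tasks are matched uniquely, resources are not.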