Abstract
In high-performance computing, systems that incorporate reconfigurable devices, such as field-programmable gate arrays (FPGAs), are attracting growing interest. Such systems range from supercomputers with tightly coupled reconfigurable hardware to clusters with reconfigurable devices at each node. Using these architectures for scientific computing offers an alternative for computationally demanding problems and has advantages in metrics such as operating cost/performance and power/performance. However, optimizing the performance of these systems can be challenging even with knowledge of the system’s characteristics. Our analytical performance model includes parameters representing the reconfigurable hardware, application load imbalance across the nodes, background user load, basic message-passing communication, and processor heterogeneity. In this article, we provide an overview of the analytical model and demonstrate its application to the optimization and scheduling of high-performance reconfigurable computing (HPRC) resources. We examine cost functions for minimum runtime and for other optimization problems commonly found in shared computing resources. Finally, we discuss additional scheduling issues and other potential applications of the model.
Index Terms
- Optimization of Shared High-Performance Reconfigurable Computing Resources