Abstract
We explore the opportunities that current and forthcoming VLSI technologies offer for on-chip multiprocessing in Quantum Chromodynamics (QCD), a computational grand challenge for which more than half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD, both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise of delivering from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.
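As a rough plausibility check on the range quoted above, the sketch below computes how much more hardware fits on a fixed die area when the feature size shrinks from 90 nm to 20 nm. It assumes ideal area scaling (device density proportional to the inverse square of the feature size), which is a textbook simplification and not the model derived in the paper. The function name `density_gain` and the baseline of 100 flop per cycle are hypothetical, chosen only for illustration.

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal area-scaling factor: devices per unit area grow as (old/new)^2.

    Assumption: perfect geometric scaling; real processes deviate from this.
    """
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_nm / new_nm) ** 2


# Scaling range named in the abstract: 90 nm down to 20 nm.
gain = density_gain(90.0, 20.0)

# Hypothetical baseline throughput at 90 nm (illustrative, not from the paper).
baseline_flop_per_cycle = 100

print(f"{gain:.2f}x density, about {baseline_flop_per_cycle * gain:.0f} flop/cycle")
```

Under these assumptions the density gain is about a factor of 20, so a chip sustaining on the order of a hundred flop per cycle at 90 nm would reach the low thousands at 20 nm if throughput tracked area, which matches the order of magnitude stated in the abstract.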
This research was supported in part by MIUR of Italy under project “ALGO-NEXT: ALGOrithms for the NEXT generation Internet and the Web”, and by the University of Padova under Grant CPDA033838.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bilardi, G., Pietracaprina, A., Pucci, G., Schifano, F., Tripiccione, R. (2005). The Potential of On-Chip Multiprocessing for QCD Machines. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_41
DOI: https://doi.org/10.1007/11602569_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30936-9
Online ISBN: 978-3-540-32427-0
eBook Packages: Computer Science, Computer Science (R0)