A few bad ideas on the way to the triumph of parallel computing

https://doi.org/10.1016/j.jpdc.2013.10.006

Highlights

  • Parallel computing has been an enormous success.

  • This is despite many arguments against it.

  • Existing code has not been directly reusable.

  • Specialized hardware accelerators have had trouble remaining relevant.

Abstract

Parallelism has become mainstream, in the multicore chip, the GPU, and the internet datacenter running MapReduce. In my field, large-scale scientific computing, parallelism now reigns triumphant.

It was no simple, direct route that led to this triumph. Along the way, we were confused by ideas that, in retrospect, turned out to be distractions and errors. The thinking behind them was reasonable, but wrong. One can learn from a dissection of mistakes, so I will retell part of the story here.

Section snippets

Amdahl, the story of a law that gets broken all the time

What is Amdahl’s Law? If half of a computation cannot use even a second processor working in parallel with the first, then, no matter how many processors one employs, the work will take at least half the uniprocessor compute time. If the fraction of work that must be sequential, the Amdahl fraction, is f, then the speedup from parallelism cannot be more than 1/f.
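In the usual algebraic form, the speedup on p processors with sequential fraction f is S(p) = 1/(f + (1 − f)/p), which approaches the 1/f ceiling as p grows. The short Python sketch below is an illustration of that bound only (the code and sample values are mine, not from the article):

    # Amdahl's Law: speedup on p processors when a fraction f of the work
    # is inherently sequential; the ceiling 1/f is the bound stated above.
    def amdahl_speedup(f, p):
        return 1.0 / (f + (1.0 - f) / p)

    for f in (0.5, 0.1, 0.01):
        ceiling = 1.0 / f
        samples = ", ".join(f"p={p}: {amdahl_speedup(f, p):.2f}" for p in (2, 16, 1024))
        print(f"f={f}: bound={ceiling:.0f}, {samples}")

With f = 0.5, the half-sequential case used above, no number of processors pushes the speedup past 2.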

I do not think Amdahl wrote the law in a paper; rather he gave a talk at the 1967 AFIPS Spring Joint Computer Conference and it was …

The dusty deck

Big labs have a big problem: the million lines of industrial-strength Fortran that they use every day to do operational calculations. I imagine the following conversation between a computer vendor’s chief technologist and the head of software development at a lab or ISV:

Vendor: We are going to develop a computer one billion times more powerful than the one you are using, and it will not cost much more.

Software guy: Yeah, but will we need to rewrite our code?

You bet you will. The good news has …

Attached accelerators: Are they the JATO of parallel computing?

This one is less simple. Attached accelerators are not a bad idea. But they have not won big, either. Customization of computer hardware works. A special-purpose hardware device (an ASIC, or application-specific integrated circuit) beats any computer hands down in performance per unit cost or power, but it is not actually a computer. FPGA hardware is reconfigurable, and not as efficient as an ASIC, but still beats a general-purpose machine. Moving further, machines specialized for scientific …

Not-so-bad ideas that did not make it

I discarded a bunch of other ideas in thinking about and planning my talk and this chapter. They are not actually bad ideas, but they have not turned out to be particularly relevant to the job of advancing science through computation. They included the idea of harvesting the cycles of idle workstations — what I call supercomputing by accident; algorithms that are tuned to and exploit the characteristics of the interconnection topology of the machine (they are therefore parochial and are hard to …

Truth, good ideas, and why we need to identify the bad ones

We work in a complex field. High performance computing is a science, it is engineering, and it is a business. Businesses may not always tell the truth — or they engage in what Mark Twain called “stretchers”. When they are new, good ideas can look quite similar to bad ideas. David Bailey produced one of the classic papers in high performance computing when he surveyed the many ways that a new, different, possibly immature computing approach can be made to look much better than it really is when …
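One common way a result can be dressed up is to quote speedup against a weak serial baseline rather than the best available serial code. The small Python illustration below uses invented timings, not numbers from Bailey's paper:

    # Hypothetical timings (seconds) illustrating one classic distortion:
    # quoting speedup against an unoptimized serial code rather than the
    # best available serial code. All numbers below are invented.
    t_best_serial  = 100.0   # well-tuned serial implementation
    t_naive_serial = 800.0   # slow, unoptimized serial version
    t_parallel_64  = 25.0    # parallel code on 64 processors

    honest  = t_best_serial / t_parallel_64    # 4x over the real baseline
    claimed = t_naive_serial / t_parallel_64   # 32x over the weak baseline
    print(f"honest speedup: {honest:.0f}x   claimed speedup: {claimed:.0f}x")

The parallel code is the same in both cases; only the choice of baseline changes the headline number from 4x to 32x.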

Acknowledgments

Alan Karp and Keshav Pingali read drafts of this chapter and greatly improved it.

References (7)

  • Mark T. Jones et al., Scalable iterative solution of sparse linear systems, Parallel Computing (1994)

  • Gene Amdahl, Validity of the single processor approach to achieving large-scale computing capabilities, AFIPS Spring Joint Computer Conference (1967)

  • David H. Bailey, Twelve ways to fool the masses when giving performance results on parallel computers, Supercomputing Review (1991)
There are more references available in the full text version of this article.

Robert Schreiber is a Distinguished Technologist at Hewlett Packard Laboratories. Schreiber’s research spans sequential and parallel algorithms for matrix computation, compiler optimization for parallel languages, and high performance computer design. With Moler and Gilbert, he developed the sparse matrix extension of Matlab. He created the NAS CG parallel benchmark. He was a designer of the High Performance Fortran language. At HP, Rob led the development of PICO, a system for synthesis of custom hardware accelerators. His recent work concerns architectural uses of CMOS nanophotonic communication and nonvolatile memory architecture. He is an ACM Fellow, a SIAM Fellow, and was awarded, in 2012, the Career Prize from the SIAM Activity Group in Supercomputing.
