PCs can be delivered in various packages: desktops, laptops, desksides, blades (for rack mounting), raw motherboards for embedded systems, headless boxes for network management, and even wearable computers. A substantial number of PC blades can be packed into a rack (up to 48 per rack) and linked with a high-bandwidth communication network, so that a single very large computation can be laid out on such a distributed system.
About seven or eight years ago some people came to the conclusion that this would be the future of supercomputing, although initially they thought more in terms of commodity processors other than IA32, e.g., POWER, Alpha, SPARC or PA-RISC. Because important decision makers were amongst them, that view had dire ramifications for the US supercomputer industry, which in effect all but died overnight.
This made PC and UNIX workstation vendors deliriously happy, because now they had this whole end of the market to themselves (pity they knew so little about it) and they could sell a lot of rack-mounted PCs and workstations to universities, defense laboratories, exploration companies and design bureaus.
This made programmers and computer scientists very happy too, because they knew better than anybody else how difficult such systems would be to assemble, manage, use and program, and so they sniffed many lucrative life-long sinecure jobs for themselves.
The only people who were unhappy about this turn of events were supercomputer users, who were suddenly faced with the demise of highly efficient and easy-to-use machines, of various architectures, which they relied on to do first-class science without having to waste a lot of their own time on programming equilibristics.
Three Japanese companies, Fujitsu, NEC and Hitachi, continued to build real supercomputers, some based on vector CPUs and others on scalar CPUs, but all quite nicely balanced and with excellent IO and memory bandwidth. They were barred from the US market by legal means, but they have been selling their machines in other countries with considerable success.
And so things progressed at their own pace. Some applications were ported to this new distributed programming model, some were abandoned, and some new applications were developed too. Many people made a good living out of it, but, by and large, a lot of taxpayers' money and scientists' time got wasted in the process. Some of this money fed the dot-com boom of the late 90s.
And then the Japanese commissioned the Earth Simulator, a system that outperformed the ten most powerful US clusters put together, was very efficient and childishly easy to program, at least in comparison with American PC clusters. This caused a great stir and uneasiness amongst the aforementioned important decision makers, as they were finally forced to see for themselves what every computational scientist had been telling them all along: that PC clusters were no match for real, well-designed supercomputers.
They responded in their usual way: they assembled numerous committees to scrutinize the current context, reassess the US position in supercomputing and outline future directions. And so even more taxpayers' money got wasted. But eventually DARPA announced the High Productivity Computing Systems Program and awarded $50 million each to Cray, IBM and Sun for the development of novel supercomputer architectures that would scale to PFLOPS.
The systems proposed by these three companies are not PC clusters. They are very, very different.
Cray's Cascade system derives from the earlier PFLOPS architecture developed for Defense some five years ago as well as from the Cray MTA multi-threaded architecture machine, previously known as "Tera". Cascade is being developed jointly with researchers from JPL, Caltech, Stanford and Notre Dame. The Cascade machine will combine a lot of earlier ideas such as UMA and NUMA SMPs, hybrid electrical and optical interconnect, lightweight threads in memory and aggressively parallelizing compiler technology. But perhaps the greatest innovation is going to be the use of processor in memory (PIM) chips. Amongst the greatest handicaps of present day systems is the movement of data from memory to CPUs. This problem can be overcome by bringing CPUs to the data and computing directly in the memory. This is how PIM chips work.
IBM's proposed machine is called PERCS, which stands for Productive, Easy-to-use, Reliable Computing System. The system will reconfigure its hardware dynamically, on the fly, to suit the problem at hand. IBM has the ambition to make it into a new commercial computing architecture that may deal a blow to the PC and PC clusters. This system will use some off-the-shelf components, most notably IBM's POWER5 (or later) processors, but much of the rest is going to be quite new. They may use PIMs too.
Sun plans to base its machine on the quite revolutionary concept of clockless computing. In clockless computing, computation is still digital but unfolds at almost analog speed, because it is not held up by constant references to the system clock. Every computational process runs as fast as it possibly can and stops only when it needs to communicate with other processes.
But none of these three systems is going to be delivered any time soon, and when they eventually do get to see the light of day, they'll be restricted to defense facilities, national laboratories and, perhaps, national supercomputer centers. It is unlikely that you'll see any of these machines (perhaps with the notable exception of the IBM system) in your laboratory or on your desktop.
In the meantime the Earth Simulator remains the most powerful, the most computationally efficient and the easiest to program of all supercomputers on the planet. But guess what: it is a cluster too. It is a cluster of 640 supercomputers (NEC SX6), each of which is a 64 GFLOPS system. Because the nodes of the Earth Simulator are real supercomputers in their own right, not PCs, the system delivers a very high computational efficiency of between 40% and 60% on production applications. It is this very high efficiency, more than its peak performance, that made US computational scientists stop and think.
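To put these figures in perspective, here is a minimal sketch of the arithmetic they imply. It uses only the numbers quoted above (640 nodes, 64 GFLOPS per node, 40% to 60% efficiency); the function and variable names are made up for illustration, and "efficiency" is assumed to mean the fraction of aggregate peak performance that an application actually sustains.

```python
# Figures quoted in the text for the Earth Simulator.
NODES = 640                    # number of NEC SX6 nodes
PEAK_PER_NODE_GFLOPS = 64.0    # peak performance of one node


def peak_tflops(nodes, gflops_per_node):
    """Aggregate peak performance of the whole cluster, in TFLOPS."""
    return nodes * gflops_per_node / 1000.0


def sustained_tflops(peak, efficiency):
    """Sustained performance, assuming efficiency is a fraction of peak."""
    return peak * efficiency


peak = peak_tflops(NODES, PEAK_PER_NODE_GFLOPS)
low = sustained_tflops(peak, 0.40)    # lower end of the quoted range
high = sustained_tflops(peak, 0.60)   # upper end of the quoted range

print(f"peak:      {peak:.2f} TFLOPS")
print(f"sustained: {low:.1f} to {high:.1f} TFLOPS")
```

Under these assumptions the aggregate peak comes to 40.96 TFLOPS, and production applications would sustain roughly 16.4 to 24.6 TFLOPS.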
Yet the fact that the Earth Simulator is a cluster too is actually good news. It means that whatever you are going to learn in this course is going to be applicable not only to PC clusters, but also to systems such as the Earth Simulator. It is going to be applicable to the Cray X1 (and its successor, the Black Widow), to Cray Red Storm, to IBM SP, to clusters of IA64s and Opterons, and even to the IBM PERCS discussed above.
The computational methods and paradigms we are going to discuss in this course are universal enough to be applicable to a very broad range of systems. For this same reason we will stay away from any IA32 specifics.