The simplest parallel programming paradigm is data parallelism.
If you have, say, two one-dimensional arrays A and B, of equal size, say, N double precision floating point numbers each, the traditional way to add such arrays in the C-language would be
for (i = 0; i < n; i++) c[i] = a[i] + b[i];You'd do it similarly in Fortran-77 or Pascal or any other sequential programming language. But in Fortran-90 you can simply state:
C = A + Bwhich corresponds closely to how you would write it in mathematics:
c[i] = a[i] + b[i] independently.
If you run a Fortran 90
compiler on a Cray X1 or NEC SX6, the compiler
will automatically convert program lines such as C = A + B into
parallel operations that will execute c[i] = a[i] + b[i]
simultaneously (or almost simultaneously - depending on the architecture
of the machine) for all values of i.
A lot of scientific and engineering computing can be expressed efficiently in this data-parallel paradigm. All field problems like electrodynamics, fluid dynamics and chromodynamics fit in this category. Weather and climate modeling codes do too. Gene matching and searching codes can be expressed in terms of data parallelism as well. Basically anything that operates on very large arrays of data submits to data parallelization easily.
Data parallel languages provide various constructs for conditional differentiation of some operations on various parts of arrays, e.g., masking, and for shifting the content of the arrays in a way that is similar to shifting the content of the register, i.e., you can shift arrays left and right, you can rotate them, or shift with padding.
Data parallel programs are very easy to understand and to debug
because their logic is essentially sequential. But it is also
this sequential logic that results in some inefficiencies. Suppose
you want to set values in the
B array to zero
for all such is for which c[i] == 1. The Fortran-90
statement to do this is
where (c .eq. 1) b = 0 end whereIf
c[i] == 1 for just a handful of is, this operation
will be performed on only a handful of processors that are responsible
for the b[i] involved, whereas all the other processors will
idle for the duration of this operation. If the operation is very
involved, e.g., there may be some very complex and long computation
going on within the confines of the where statement, the
idling processors may end up doing nothing for a long time.
Nevertheless, there is a lot to be said in favor of data parallelism. It is much more universal than many computer scientists are prepared to admit, and because it is so easy to use, it should be provided to supercomputer users, especially the ones writing their own codes. Data parallelism is very highly scalable too. We will probably see the return of data parallel computing in context of petaflops systems. The PIM (Processor in Memory) architecture is very similar to the Connection Machine (see below).
Data parallel programs run best on vectors and massively parallel machines, like the the Connection Machine. They can run, although not as efficiently, on SMPs and clusters too, but then, not much will run on clusters efficiently anyway. There is a special variant of data parallel Fortran for clusters, called High Performance Fortran (HPF). The best HPF compilers can be bought from The Portland Group Compiler Technology.
For more information on HPF see, e.g., the High Performance Fortran page at the Rice University. Also see a brief tutorial about HPF, which was included in the P573 course.