The simplest parallel programming paradigm is data parallelism.
If you have, say, two one-dimensional arrays A and B of equal size, say N double precision floating point numbers each, the traditional way to add such arrays in the C language would be

    for (i = 0; i < n; i++) c[i] = a[i] + b[i];

You'd do it similarly in Fortran-77 or Pascal or any other sequential programming language. But in Fortran-90 you can simply state:
    C = A + B

which corresponds closely to how you would write it in mathematics:

    c_i = a_i + b_i   for every i

with each element computed independently. If you run a Fortran 90 compiler on a Cray X1 or NEC SX6, the compiler will automatically convert program lines such as

    C = A + B

into parallel operations that will execute

    c[i] = a[i] + b[i]

simultaneously (or almost simultaneously, depending on the architecture of the machine) for all values of i.
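The element-wise addition discussed above can be sketched as a small self-contained C routine; the function name and test values here are illustrative only, not taken from any particular code. The loop body is independent for every i, which is exactly the property a vectorizing or parallelizing compiler exploits.

```c
#include <stddef.h>

/* c = a + b, element by element. Each iteration is independent of
   all the others, so a data-parallel machine can, in principle,
   execute all n assignments at once. */
void array_add(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

A Fortran 90 compiler performs this kind of transformation automatically when it sees `C = A + B`; in C one would rely on the compiler's vectorizer to recognize the independent loop.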
A lot of scientific and engineering computing can be expressed efficiently in this data-parallel paradigm. All field problems, like electrodynamics, fluid dynamics, and chromodynamics, fit in this category. Weather and climate modeling codes do too. Gene matching and searching codes can be expressed in terms of data parallelism as well. Basically, anything that operates on very large arrays of data lends itself easily to data parallelization.
Data parallel languages provide various constructs for conditional differentiation of some operations on various parts of arrays, e.g., masking, and for shifting the content of arrays in a way that is similar to shifting the content of a register: you can shift arrays left and right, you can rotate them, or shift with padding.
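The two kinds of shift mentioned above correspond to Fortran 90's CSHIFT (circular shift, i.e., rotation) and EOSHIFT (end-off shift with padding) intrinsics. A minimal C sketch of their one-dimensional semantics, with function names of our own choosing, might look like this:

```c
#include <stddef.h>

/* Rotate a left by k positions: element i of dst gets a[(i+k) mod n],
   so values shifted off the left end reappear on the right.
   (Analogous to Fortran's CSHIFT.) */
void rotate_left(const double *a, double *dst, size_t n, size_t k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[(i + k) % n];
}

/* Shift a left by k positions, filling the vacated tail with pad.
   (Analogous to Fortran's EOSHIFT.) */
void shift_left_pad(const double *a, double *dst, size_t n, size_t k,
                    double pad)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (i + k < n) ? a[i + k] : pad;
}
```

On a data-parallel machine every element of the result can again be computed simultaneously, since no destination element depends on another.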
Data parallel programs are very easy to understand and to debug
because their logic is essentially sequential. But it is also
this sequential logic that results in some inefficiencies. Suppose
you want to set values in the B array to zero for all such i's for which c[i] == 1. The Fortran-90 statement to do this is

    where (c .eq. 1)
       b = 0
    end where

If c[i] == 1 for just a handful of i's, this operation will be performed on only the handful of processors that are responsible for the b[i] involved, whereas all the other processors will idle for the duration of this operation. If the operation is very involved, e.g., there may be some very complex and long computation going on within the confines of the where statement, the idling processors may end up doing nothing for a long time.
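In a sequential language the where construct amounts to a masked loop. A C sketch (function name is ours) makes the inefficiency easy to see: every processor would evaluate the mask, but only those holding elements with c[i] == 1 do useful work.

```c
#include <stddef.h>

/* Masked assignment: set b[i] to zero wherever c[i] == 1, the C
   equivalent of Fortran 90's
       where (c .eq. 1)
          b = 0
       end where
   On a data-parallel machine, processors whose c[i] != 1 sit idle
   while the others perform the assignment. */
void zero_where_one(const int *c, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i] == 1)
            b[i] = 0.0;
}
```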
Nevertheless, there is a lot to be said in favor of data parallelism. It is much more universal than many computer scientists are prepared to admit, and because it is so easy to use, it should be provided to supercomputer users, especially the ones writing their own codes. Data parallelism is very highly scalable too. We will probably see the return of data parallel computing in the context of petaflops systems. The PIM (Processor in Memory) architecture is very similar to the Connection Machine (see below).
Data parallel programs run best on vector and massively parallel machines, like the Connection Machine. They can run, although not as efficiently, on SMPs and clusters too, but then, not much will run on clusters efficiently anyway. There is a special variant of data parallel Fortran for clusters, called High Performance Fortran (HPF). The best HPF compilers can be bought from The Portland Group Compiler Technology.
For more information on HPF see, e.g., the High Performance Fortran page at Rice University. Also see a brief tutorial about HPF, which was included in the P573 course.