Dividing the Pie

The program we are going to dissect here is called `cpi`. Its source is distributed with MPICH2, and it lives in the directory `/N/hpc/mpich2/src/mpich2-0.94b1/examples`. The file name is `cpi.c`. For your convenience and because it is an open software program, I also quote it below:

```
gustav@bh1 $ pwd
/N/hpc/mpich2/src/mpich2-0.94b1/examples
gustav@bh1 $ cat cpi.c
#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double);

double f(double a)
{
    return (4.0 / (1.0 + a*a));
}

int main(int argc,char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);

    fprintf(stdout,"Process %d of %d is on %s\n",
            myid, numprocs, processor_name);
    fflush(stdout);

    n = 10000;                  /* default # of rectangles */

    if (myid == 0)
        startwtime = MPI_Wtime();

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h   = 1.0 / (double) n;
    sum = 0.0;
    /* A slightly better approach starts from large i and works back */
    for (i = myid + 1; i <= n; i += numprocs)
    {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime-startwtime);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
gustav@bh1 $
```

This program calculates π by using the formula

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \qquad (5.1)$$

The calculation is numerical, i.e., the segment [0,1] is divided into 10,000 equal fragments. Within each fragment we evaluate the integrand 4/(1 + x²) at the middle of the fragment (this gives the height of the rectangle) and multiply it by the width of the fragment to obtain the area of the rectangle. Then we add all the areas together, and this sum is our integral, which is also the approximate value of π.
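A quick sanity check of the formula and of what the program actually computes: the integral evaluates exactly to π, and the program approximates it with the midpoint (rectangle) rule,

$$\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan x\,\Big|_0^1 = 4\cdot\frac{\pi}{4} = \pi,
\qquad
\pi \;\approx\; h \sum_{i=1}^{n} \frac{4}{1+\bigl(h(i-\tfrac12)\bigr)^2},
\quad h = \frac{1}{n},$$

which is exactly the sum the code accumulates with `x = h * ((double)i - 0.5)` and `sum += f(x)`.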

The program begins with the usual incantations to `MPI_Init`, `MPI_Comm_size`, `MPI_Comm_rank` and `MPI_Get_processor_name`. Then each process writes on standard output its rank number (which is called `myid` here), the size of the communicator (which is called `numprocs` here) and the name of the processor it runs on.
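For reference, the four start-up calls have the following prototypes (this is the standard MPI C binding, not anything specific to `cpi.c`):

```c
int MPI_Init(int *argc, char ***argv);                   /* initialize the MPI environment          */
int MPI_Comm_size(MPI_Comm comm, int *size);             /* number of processes in the communicator */
int MPI_Comm_rank(MPI_Comm comm, int *rank);             /* rank of this process, 0 .. size-1       */
int MPI_Get_processor_name(char *name, int *resultlen);  /* name of the host and its length         */
```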

We have not talked about it so far, but this writing on standard output is far from trivial. Every process writes on its own standard output. In some early versions of MPI, e.g., in LAM MPI, these writes were simply lost. But the MPICH2 engine collects all standard outputs from its processes and combines them together on the so-called console node. This is always the node whose rank is zero, the node on which we run `mpdboot`. The output you see in the PBS output file is the output from the MPICH2 console node.

Now we hardwire the number of divisions of the segment [0,1] into `n`, and the console node begins the timing of the program by calling the function `MPI_Wtime`. This function returns wall-clock time as a double-precision number of seconds, counted from some arbitrary moment in the past; its resolution can be queried with `MPI_Wtick`.
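A minimal illustration of the timing idiom, with `MPI_Wtick` thrown in to show the timer's resolution (this little program is my sketch, not part of the MPICH2 distribution):

```c
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    double t0, t1;

    MPI_Init(&argc, &argv);

    t0 = MPI_Wtime();                 /* wall-clock time, in seconds       */
    /* ... work to be timed would go here ... */
    t1 = MPI_Wtime();

    printf("elapsed = %f s, timer resolution = %g s\n",
           t1 - t0, MPI_Wtick());     /* MPI_Wtick(): length of one tick   */

    MPI_Finalize();
    return 0;
}
```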
Now the value of `n` is broadcast to all processes. This is completely unnecessary in this particular program, because all nodes know what `n` is from the very beginning, but this program has been modified from its interactive predecessor, which used to read `n` from standard input. It was the console node that used to do the reading.
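That interactive pattern, reading on the console process and broadcasting to everybody else, would look roughly like this (a sketch of my own, not the actual predecessor of `cpi.c`):

```c
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int n, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Only the console process (rank 0) talks to the terminal. */
    if (myid == 0) {
        printf("Enter the number of intervals: ");
        fflush(stdout);
        scanf("%d", &n);
    }

    /* Every other process learns n through the broadcast from rank 0. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* ... the integration would follow here ... */

    MPI_Finalize();
    return 0;
}
```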

Now every process calculates the width of the rectangles:

```
h = 1.0 / (double) n;
```

Every process initializes its own local variable `sum` to zero, and then calls `f` on the rectangle that corresponds to its own rank number. Then it jumps to the rectangle that corresponds to its rank number plus the number of processes in the pool, and so on. The values of `f(x)` are added for all rectangles and only at the end are they multiplied by the width of the rectangle. This, of course, saves on unnecessary multiplications.
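In formulas (my notation: r stands for `myid` and P for `numprocs`), the process of rank r accumulates

$$\texttt{mypi}_r \;=\; h \sum_{\substack{1 \le i \le n\\ i \,\equiv\, r+1 \ (\mathrm{mod}\ P)}} f\!\left(h\bigl(i - \tfrac{1}{2}\bigr)\right),
\qquad
\pi \;\approx\; \sum_{r=0}^{P-1} \texttt{mypi}_r ,$$

so that all that remains is to add up the P partial results, which is what the next step does.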
Now we encounter a new operation, `MPI_Reduce`. This operation takes as its input the content of `mypi` for each process in the pool (the first argument). We tell it that there is one item (the third argument) of type `MPI_DOUBLE` (the fourth argument) in there. We then tell it that it should perform summation over all instances of `mypi` (the fifth argument, `MPI_SUM`) and that it should write the result of the operation on `pi` (the second argument) on the root process (the sixth argument), which in this case is the console process, i.e., the process of rank zero. The whole operation is performed within the `MPI_COMM_WORLD` communicator (the seventh argument).
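For reference, here is the C prototype of `MPI_Reduce` with each argument annotated, followed by the call as it appears in `cpi.c`:

```c
int MPI_Reduce(void *sendbuf,          /* data this process contributes: &mypi  */
               void *recvbuf,          /* where the combined result lands: &pi  */
               int count,              /* number of items: 1                    */
               MPI_Datatype datatype,  /* type of each item: MPI_DOUBLE         */
               MPI_Op op,              /* how to combine them: MPI_SUM          */
               int root,               /* rank that receives the result: 0      */
               MPI_Comm comm);         /* communicator: MPI_COMM_WORLD          */

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
```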

At the end of this operation only the root (console) process has the right answer in its location that corresponds to `pi`. But this is fine. In the next clause we instruct the console process to measure the wall clock time again, then print the value of π on standard output together with the estimate of the accuracy of the computation. Then the console process also writes how long it took to carry out the computation.

Here is the output of the program.

```
gustav@bh1 $ pwd
/N/B/gustav/PBS
gustav@bh1 $ cat mpi_out
Local MPD console on bc89
Process 0 of 8 is on bc89
Process 1 of 8 is on bc40
Process 2 of 8 is on bc42
Process 3 of 8 is on bc41
Process 4 of 8 is on bc44
Process 5 of 8 is on bc43
Process 6 of 8 is on bc88
Process 7 of 8 is on bc46
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.009541
gustav@bh1 $
```

`MPI_SUM` is not the only operation you can use with `MPI_Reduce`. MPI provides a number of predefined reduction operations that can be used in this context, and you can define your own operations too; a sketch of how to do that follows the table. The predefined ones are:

| Operation | Meaning |
|---|---|
| `MPI_MAX` | maximum |
| `MPI_MIN` | minimum |
| `MPI_SUM` | sum |
| `MPI_PROD` | product |
| `MPI_LAND` | logical AND |
| `MPI_BAND` | bitwise AND |
| `MPI_LOR` | logical OR |
| `MPI_BOR` | bitwise OR |
| `MPI_LXOR` | logical exclusive OR |
| `MPI_BXOR` | bitwise exclusive OR |
| `MPI_MAXLOC` | maximum value and its location |
| `MPI_MINLOC` | minimum value and its location |
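A user-defined operation is registered with `MPI_Op_create` and can then be passed to `MPI_Reduce` in place of `MPI_SUM`. The sketch below simply re-implements summation for `MPI_DOUBLE` data; the function and variable names are mine, not part of `cpi.c`:

```c
#include "mpi.h"
#include <stdio.h>

/* A user-defined reduction function must have this signature.
   It combines invec into inoutvec, element by element. */
void my_sum(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype)
{
    double *in    = (double *) invec;
    double *inout = (double *) inoutvec;
    int i;
    for (i = 0; i < *len; i++)
        inout[i] += in[i];
}

int main(int argc, char *argv[])
{
    int myid;
    double mypi = 0.5, pi;   /* a dummy contribution, for illustration only */
    MPI_Op op;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    MPI_Op_create(my_sum, 1, &op);   /* 1 means the operation is commutative */

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, op, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("reduced value: %f\n", pi);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}
```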