This program contains numerous new elements and MPI function calls, so this
section may be a little heavy. But it also illustrates where MPI differs from
earlier, much simpler, but also much more primitive message passing systems.
MPI is very powerful. It is not just *send* and *receive*.

The code begins innocently enough, like all the other MPI codes we have seen so far: MPI itself is initialised, then every process finds out the size of the process pool and its own rank within that pool.

```c
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &pool_size);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
```

And then, as usual, one of the processes assumes the role of the master:

```c
if (my_rank == MASTER_RANK) i_am_the_master = TRUE;
```

But the next operation is entirely new:

```c
MPI_Cart_create ( MPI_COMM_WORLD, PROCESS_DIMENSIONS, divisions, periods,
                  reorder, &cartesian_communicator );
```

What happens here is as follows. The processes create a new communicator which, unlike `MPI_COMM_WORLD`, has a special topology imposed on it. The new communicator, the place for which is passed to `MPI_Cart_create` as the last argument, is formed out of the old one, which is passed to `MPI_Cart_create` in the first argument. The four parameters passed in the middle, `PROCESS_DIMENSIONS`, `divisions`, `periods`, and `reorder`, specify the topology of the new communicator.
The value of `PROCESS_DIMENSIONS` is 2, which means that the processes are to be organised into a two-dimensional grid. `divisions` is an array of rank 1, whose entries specify the lengths of the process grid in those two dimensions, in this case:

```c
#define PROCESS_ROWS     4
...
#define PROCESS_COLUMNS  3
...
int divisions[PROCESS_DIMENSIONS] = {PROCESS_ROWS, PROCESS_COLUMNS};
```

that is, the processes are going to be organised into a rectangular grid with 4 rows and 3 columns.
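Incidentally, you do not have to work out a grid shape like this by hand. MPI can factor the pool size into a balanced grid for you with `MPI_Dims_create`. This is not what the program does; it is just a sketch of the alternative, reusing the names from the listing above:

```c
/* A sketch only: let MPI choose the grid shape instead of hard-coding 4 x 3.
   Entries left as 0 mean "you pick"; pool_size is the MPI_COMM_WORLD size. */
int divisions[PROCESS_DIMENSIONS] = {0, 0};
MPI_Dims_create(pool_size, PROCESS_DIMENSIONS, divisions);
/* For pool_size == 12 this yields a balanced factorisation such as {4, 3}. */
```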

The parameter `periods` specifies whether the topology is going to be a wrap-around topology or an open-ended topology, i.e., whether the processes that are on the borders of the region should have, as their over-the-border neighbours, the guys on the other side, or nobody. In this case we simply say that

```c
int periods[PROCESS_DIMENSIONS] = {0, 0};
```

which means that we don't want any wrapping.
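For contrast, if you did want the grid to wrap around in both dimensions, turning it into a torus, you would presumably write something like this (not part of this program):

```c
int periods[PROCESS_DIMENSIONS] = {1, 1};   /* wrap around in both dimensions */
```

in which case no process would be left without a neighbour on any side.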

The last parameter in this group, `reorder`, specifies whether processes should be reordered for efficiency. This implies that processes may be moved around between processors, or just renumbered, so as to better utilize any hardware architecture that a given machine may have. Examples of such machines are the Cray T3E, Cray X1, Fujitsu AP-3000, and NEC Cenju-3.

If you have asked for a Cartesian communicator and started with, say, 15 processes, then 3 of those 15 processes would have to be rejected from the new communicator, because there is only going to be enough space in it for 12 processes. The rejected processes will find that after `MPI_Cart_create` returns, the returned value of `cartesian_communicator` is `MPI_COMM_NULL`. What this means is that those processes didn't get the job.

They may hang about, if they choose, or they may go right for `MPI_Finalize`; this is up to the programmer. You may create more than one communicator with various properties, so that the processes that didn't make it into the `cartesian_communicator` will get jobs elsewhere.

For this reason the remainder of the program is in the form of one large if statement:

```c
   if (cartesian_communicator != MPI_COMM_NULL) {
      blah... blah... blah...
   }
   MPI_Finalize ();
   exit (0);
}
```

Now, assuming that you did get a job, as a process, with the `cartesian_communicator`, the first thing that you do is to find your new rank number within that new communicator:

```c
MPI_Comm_rank ( cartesian_communicator, &my_cartesian_rank );
```

And the reason for this is that your rank number in the `cartesian_communicator` may well be different from your rank number in the `MPI_COMM_WORLD` communicator. They're like a Tax File Number in Australia and a Social Security Number in the US.
But now, since the `cartesian_communicator` is a communicator with a topology, you can make inquiries about your neighbourhood:

```c
MPI_Cart_coords ( cartesian_communicator, my_cartesian_rank,
                  PROCESS_DIMENSIONS, my_coordinates );
MPI_Cart_shift ( cartesian_communicator, SIDEWAYS, RIGHT,
                 &left_neighbour, &right_neighbour );
MPI_Cart_shift ( cartesian_communicator, UPDOWN, UP,
                 &bottom_neighbour, &top_neighbour );
```

The first call, to `MPI_Cart_coords`, returns your Cartesian coordinates in an array `my_coordinates` of length `PROCESS_DIMENSIONS`. Naturally you have to tell `MPI_Cart_coords` who you are in order to obtain that information: you pass your new Cartesian rank number in the second slot of the function. The following two calls to `MPI_Cart_shift` tell you the rank numbers of your left and right neighbours and of your bottom and top neighbours. So the function `MPI_Cart_shift` doesn't really shift anything. It's more like looking up who your neighbours are. But the designers of MPI thought of it in terms of shifting the rank numbers of your neighbours from the left, from the right, from above and from below into those containers, which you call `left_neighbour`, `right_neighbour`, `bottom_neighbour`, and `top_neighbour`.
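The constants `SIDEWAYS`, `UPDOWN`, `RIGHT` and `UP` are not MPI symbols; they are defined elsewhere in the program and are not reproduced in this section. Judging from the calls above and from the output further down, they are presumably just small integers along the following lines, but treat this as an assumption rather than the program's actual text:

```c
/* Assumed definitions, not shown in the original listing: the second argument
   of MPI_Cart_shift is a dimension index, the third is a displacement.       */
#define UPDOWN    0      /* dimension 0 of the grid: along the rows    */
#define SIDEWAYS  1      /* dimension 1 of the grid: along the columns */
#define RIGHT     1      /* a displacement of +1 in that dimension     */
#define UP        1
```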

This stuff is worth displaying, so what we do now is to pack all that information into a message and send it to the master process.

```c
if (! i_am_the_master ) {
   sprintf(char_buffer, "process %2d, cartesian %2d, \
coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d",
           my_rank, my_cartesian_rank, my_coordinates[0], my_coordinates[1],
           left_neighbour, right_neighbour, top_neighbour, bottom_neighbour);
   MPI_Send(char_buffer, strlen(char_buffer) + 1, MPI_CHAR, MASTER_RANK,
            3003, MPI_COMM_WORLD);
} else {
   int number_of_c_procs, count;

   number_of_c_procs = divisions[0] * divisions[1];
   for (count = 0; count < number_of_c_procs - 1; count++) {
      MPI_Recv(char_buffer, BUFSIZ, MPI_CHAR, MPI_ANY_SOURCE, 3003,
               MPI_COMM_WORLD, &status);
      printf ("%s\n", char_buffer);
   }
   printf( "process %2d, cartesian %2d, \
coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d\n",
           my_rank, my_cartesian_rank, my_coordinates[0], my_coordinates[1],
           left_neighbour, right_neighbour, top_neighbour, bottom_neighbour);
}
```

Observe that this communication takes place within `MPI_COMM_WORLD`, not within the `cartesian_communicator`. The old communicator didn't go away when the new one was created, so it can still be used for various things. Observe also a subtle bug here: we are assuming that the master process is still going to hang around, in other words, that the master process was included in the `cartesian_communicator`.

What will happen if it is not included?

In our case we are going to run this job requesting exactly the right number of processes, so the master process is guaranteed to be included in the `cartesian_communicator`. Having said that, you may consider modifying the logic of the program so as to get rid of the bug.
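One way to do that, sketched below rather than taken from the actual program, is to perform this collection inside the `cartesian_communicator` itself and let whichever process holds cartesian rank 0 do the printing, so that it no longer matters whether the world master was admitted or not. The sketch assumes the same variables (`char_buffer`, `status`, `divisions`) are visible to every process in the communicator:

```c
/* A sketch of one possible fix, not the author's code: collect within the
   cartesian communicator, with its own rank 0 acting as the collector.   */
if (my_cartesian_rank != 0) {
   MPI_Send(char_buffer, strlen(char_buffer) + 1, MPI_CHAR,
            0, 3003, cartesian_communicator);
} else {
   int number_of_c_procs = divisions[0] * divisions[1];
   int count;
   for (count = 0; count < number_of_c_procs - 1; count++) {
      MPI_Recv(char_buffer, BUFSIZ, MPI_CHAR, MPI_ANY_SOURCE, 3003,
               cartesian_communicator, &status);
      printf("%s\n", char_buffer);
   }
   /* ...and print this process's own line here, as the original code does. */
}
```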

Here is what the output looks like, once the master process gets down to it:

```
process  1, cartesian  1, coords ( 0, 1), left  0, right  2, top  4, bottom -1
process 10, cartesian 10, coords ( 3, 1), left  9, right 11, top -1, bottom  7
process 11, cartesian 11, coords ( 3, 2), left 10, right -1, top -1, bottom  8
process  9, cartesian  9, coords ( 3, 0), left -1, right 10, top -1, bottom  6
process  8, cartesian  8, coords ( 2, 2), left  7, right -1, top 11, bottom  5
process  7, cartesian  7, coords ( 2, 1), left  6, right  8, top 10, bottom  4
process  6, cartesian  6, coords ( 2, 0), left -1, right  7, top  9, bottom  3
process  5, cartesian  5, coords ( 1, 2), left  4, right -1, top  8, bottom  2
process  2, cartesian  2, coords ( 0, 2), left  1, right -1, top  5, bottom -1
process  3, cartesian  3, coords ( 1, 0), left -1, right  4, top  6, bottom  0
process  4, cartesian  4, coords ( 1, 1), left  3, right  5, top  7, bottom  1
process  0, cartesian  0, coords ( 0, 0), left -1, right  1, top  3, bottom -1
```

Observe that the cartesian rank numbers here are the same as the world rank numbers. But you must not count on it. It just happens to be so here, on the AVIDD, for this particular program, and under this particular version of MPICH2. This is not an MPI requirement. The -1 entries, by the way, mark the neighbours that don't exist: in an open-ended topology `MPI_Cart_shift` returns `MPI_PROC_NULL` for an off-grid neighbour, and that constant happens to print as -1 on this system.
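If you ever need to relate the two numbering schemes explicitly, rather than hoping they coincide, MPI lets you translate ranks between the groups of the two communicators. The following is a sketch, not part of this program:

```c
/* A sketch: translate this process's MPI_COMM_WORLD rank into its rank in
   cartesian_communicator, instead of assuming the two are equal.          */
MPI_Group world_group, cart_group;
int world_rank, cart_rank;

MPI_Comm_group(MPI_COMM_WORLD, &world_group);
MPI_Comm_group(cartesian_communicator, &cart_group);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Group_translate_ranks(world_group, 1, &world_rank, cart_group, &cart_rank);
/* cart_rank now holds the rank in cartesian_communicator, or MPI_UNDEFINED
   if this process is not a member of that communicator.                   */
MPI_Group_free(&world_group);
MPI_Group_free(&cart_group);
```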

What happens next is this simple piece of code:

```c
for ( i = 0; i < ROWS; i++ ) {
   for ( j = 0; j < COLUMNS; j++ ) {
      matrix [i][j] = my_cartesian_rank;
   }
}
```

Every process simply initialises its own little matrix with its own cartesian rank number. The whole matrix is initialised, including the parts that are going to be dedicated to mirroring, along the borders of the matrix.

Once the matrices have been initialised, the processes send them to the master process, who diligently collects them and displays them on standard output:

```c
if (my_cartesian_rank != MASTER_RANK )
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 6006,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 6006 );
```

Function `collect_matrices` hides the tedium of collecting, arranging and displaying data received from other nodes. This time the communication takes place entirely within the `cartesian_communicator`, and the process that collects all matrices does not have to be the same process as the master process in the `MPI_COMM_WORLD` communicator.
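The body of `collect_matrices` is not shown in this section. Purely as an illustration of its receiving side, a collector of the following general shape would match the sends above; the real function also arranges the blocks into one global picture before printing, which this sketch does not attempt. It assumes `<string.h>` is included for `memcpy`:

```c
/* A rough sketch only, NOT the program's collect_matrices: receive every
   other process's ROWS x COLUMNS block of MPI_INTs and print it verbatim. */
void collect_matrices_sketch(MPI_Comm comm, int my_rank,
                             int matrix[ROWS][COLUMNS], int tag)
{
   int pool_size, source, i, j;
   int block[ROWS][COLUMNS];
   MPI_Status status;

   MPI_Comm_size(comm, &pool_size);
   for (source = 0; source < pool_size; source++) {
      if (source == my_rank)
         memcpy(block, matrix, sizeof(block));        /* my own block */
      else
         MPI_Recv(block, ROWS * COLUMNS, MPI_INT, source, tag, comm, &status);
      printf("block held by cartesian rank %d:\n", source);
      for (i = 0; i < ROWS; i++) {
         for (j = 0; j < COLUMNS; j++)
            printf("%3d ", block[i][j]);
         printf("\n");
      }
   }
}
```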

Now the processes in the `cartesian_communicator` have to exchange information with their neighbours:

```c
MPI_Sendrecv ( &matrix[ROWS - 2][0], COLUMNS, MPI_INT, top_neighbour, 4004,
               &matrix[0][0], COLUMNS, MPI_INT, bottom_neighbour, 4004,
               cartesian_communicator, &status );
MPI_Sendrecv ( &matrix[1][0], COLUMNS, MPI_INT, bottom_neighbour, 5005,
               &matrix[ROWS - 1][0], COLUMNS, MPI_INT, top_neighbour, 5005,
               cartesian_communicator, &status );
```

While sending this stuff up and down with the `MPI_Sendrecv` operation, we're making use of the fact that matrices in C are stored in row-major fashion, i.e., the other way round to Fortran. The first statement sends the whole second row from the top (I have numbered these matrices upside down, so as to stick to the usual convention) to the top neighbour. The send buffer begins at `&matrix[ROWS - 2][0]` and is `COLUMNS` items long. The items are of type `MPI_INT` and the corresponding message is going to have tag `4004`. At the same time, every process receives into its bottom row data sent by its bottom neighbour. The beginning of the receive buffer is `&matrix[0][0]`, the buffer contains items of type `MPI_INT`, and the number of those items is `COLUMNS`.

Of course this communication must take place within the `cartesian_communicator`, because it is only within this communicator that the notion of a bottom and top neighbour makes sense. Note also that for the processes on the edge of the grid the missing neighbour is `MPI_PROC_NULL`, and a send to or a receive from `MPI_PROC_NULL` completes immediately without transferring anything, so the very same `MPI_Sendrecv` calls work on the border too.

The second `MPI_Sendrecv` operation does the same, but in the opposite direction, i.e., the data is now sent down, whereas previously it went up.

Having done that, we display the state of the whole system again:

```c
if (my_cartesian_rank != MASTER_RANK )
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 6006,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 6006 );
```

The last part of the code is a little tricky. Here we have to exchange columns between neighbours, but columns are not stored contiguously in C.

MPI provides a very powerful apparatus for situations like this one.

First we define a strided data type:

```c
MPI_Type_vector (ROWS, 1, COLUMNS, MPI_INT, &column_type);
MPI_Type_commit (&column_type);
```

Function `MPI_Type_vector` constructs this special strided data type as follows: the data type comprises `ROWS` items of type `MPI_INT`. The items are collected by picking up 1 item every `COLUMNS` steps from contiguous storage. The description of this peculiar data type is placed in `column_type`. Once the type has been defined it must be committed by calling the function `MPI_Type_commit`. Here we give a parallel machine an opportunity to adjust its hardware or software logic in order to prepare for manipulating this new data type.
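One detail the listing does not show, though it is worth remembering: a committed datatype can be released once it is no longer needed, which in this program would be after the column exchanges are over. Something along these lines would do:

```c
/* Not shown in the original listing: release the derived datatype
   once the column exchanges that use it have completed.           */
MPI_Type_free (&column_type);
```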
Now we can exchange columns between neighbours in much the same way as we did rows before:

```c
MPI_Sendrecv ( &matrix[0][1], 1, column_type, left_neighbour, 7007,
               &matrix[0][COLUMNS - 1], 1, column_type, right_neighbour, 7007,
               cartesian_communicator, &status );
MPI_Sendrecv ( &matrix[0][COLUMNS - 2], 1, column_type, right_neighbour, 8008,
               &matrix[0][0], 1, column_type, left_neighbour, 8008,
               cartesian_communicator, &status );
```

The first operation sends one item of type `column_type`, stored at `&matrix[0][1]`, to the `left_neighbour`. The corresponding message has tag `7007`. At the same time another single item of `column_type` is received from the `right_neighbour` and placed in `&matrix[0][COLUMNS - 1]`. Again, observe that we are not sending the border columns: we are sending internal columns, and we're receiving them into the border columns. The second `MPI_Sendrecv` call does the same in the opposite direction, i.e., from left to right.
After the data has been exchanged, the state of the system is displayed again by calling:

```c
if (my_cartesian_rank != MASTER_RANK )
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 9009,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 9009 );
```

So, this is what the program does.

In spite of MPI's powerful auxiliary functions this is still messy, clumsy and, worst of all, error prone, compared with data parallel environments such as High Performance Fortran, which do it all automatically. The IBM and PGI HPF compilers, as a matter of fact, compile your HPF program to MPI code that does very much what this simple example illustrates. Using HPF in place of MPI, wherever applicable, will save you a lot of hard work. HPF has recently been installed on the AVIDD cluster and it lives in `/opt/pgi/linux86/bin`.