The code begins innocently enough, like all the other MPI codes that we have seen so far: MPI itself is initialised, then every process finds out the size of the process pool and its own rank within that pool.

```c
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &pool_size);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
```

And then, as usual, one of the processes assumes the role of the master:

```c
if (my_rank == MASTER_RANK) i_am_the_master = TRUE;
```

But the next operation is entirely new:

```c
MPI_Cart_create ( MPI_COMM_WORLD, PROCESS_DIMENSIONS, divisions, periods,
                  reorder, &cartesian_communicator );
```

What happens here is as follows. The processes create a new communicator, which, unlike `MPI_COMM_WORLD`, has a special topology imposed on it. The new communicator, which `MPI_Cart_create` returns through its last argument, is formed out of the old one, which is passed to `MPI_Cart_create` as the first argument.
The four parameters passed in the middle, `PROCESS_DIMENSIONS`, `divisions`, `periods`, and `reorder`, specify the topology of the new communicator.

The value of `PROCESS_DIMENSIONS` is 2, which means that the processes are to be organised into a 2-dimensional grid.
`divisions` is going to be an array of rank 1, whose entries specify the lengths of the process grid in those two dimensions, in this case:

```c
int divisions[PROCESS_DIMENSIONS] = {4, 3};
```

that is, the processes are going to be organised into a rectangular grid with 4 rows and 3 columns.

The parameter `periods` specifies whether the topology is going to be a wrap-around topology or an open-ended topology, i.e., whether the processes that are on the borders of the region should have as their *over the border* neighbours the guys on the other side, or nobody. In this case we simply say that

```c
int periods[PROCESS_DIMENSIONS] = {0, 0};
```

which means that we don't want any wrapping.

The last parameter in this group, `reorder`, specifies whether processes should be reordered for efficiency. This implies that processes may be moved around between processors, or just renumbered, so as to better utilize any hardware architecture that a given machine may have. Examples of such machines are the Cray T3E, Fujitsu AP-3000, and NEC Cenju-3.
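
Putting the pieces together, the declarations and the call might look something like this. This is a sketch only: the names are the ones used in the program, but the exact declarations, and the value of `reorder` in particular, are my reconstruction rather than the original source.

```c
#define PROCESS_DIMENSIONS 2

int divisions[PROCESS_DIMENSIONS] = {4, 3};  /* 4 rows and 3 columns of processes */
int periods[PROCESS_DIMENSIONS]   = {0, 0};  /* no wrap-around in either dimension */
int reorder = 1;                             /* allow MPI to renumber the processes */
MPI_Comm cartesian_communicator;

MPI_Cart_create ( MPI_COMM_WORLD, PROCESS_DIMENSIONS, divisions, periods,
                  reorder, &cartesian_communicator );
```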

If you have asked for a Cartesian communicator and started with, say, 15 processes, then 3 of those 15 processes would have to be rejected from the new communicator, because there is only going to be enough space in it for 12 processes. The rejected processes will find that after `MPI_Cart_create` returns, the returned value of `cartesian_communicator` is `MPI_COMM_NULL`. What this means is that those processes didn't get the job.

They may hang about, if they choose, or they may go right for `MPI_Finalize` - this is up to the programmer. You may create more than one communicator with various properties, so that the processes that didn't make it into the `cartesian_communicator` will get jobs elsewhere.
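
One way of arranging that, not done in this program, would be to split `MPI_COMM_WORLD` with `MPI_Comm_split`, so that the left-over processes end up with a small communicator of their own. A minimal sketch, reusing the variables of our program:

```c
/* Sketch only, not part of the original program. */
int colour = (cartesian_communicator == MPI_COMM_NULL) ? 1 : 0;
MPI_Comm spare_communicator;

MPI_Comm_split ( MPI_COMM_WORLD, colour, my_rank, &spare_communicator );
/* Processes with colour 1 can now be given some other job within
   spare_communicator, while the rest carry on with the Cartesian grid. */
```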

For this reason the remainder of the program is one large `if` statement:

```c
   if (cartesian_communicator != MPI_COMM_NULL) {
      blah... blah... blah...
   }
   MPI_Finalize ();
}
```
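
A small tidy-up which the program doesn't bother with, but which is good practice: a process that did obtain the Cartesian communicator may free it once it is finished with it, before calling `MPI_Finalize`. A sketch:

```c
   if (cartesian_communicator != MPI_COMM_NULL) {
      /* ... all the Cartesian work discussed below ... */
      MPI_Comm_free (&cartesian_communicator);   /* not in the original program */
   }
   MPI_Finalize ();
```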

Now, assuming that you did get a job, as a process, with the `cartesian_communicator`, the first thing that you do is to find your new rank number within that new communicator:

```c
MPI_Comm_rank ( cartesian_communicator, &my_cartesian_rank );
```

And the reason for this is that your rank number in the `cartesian_communicator` is an entirely different thing from your old rank number in the `MPI_COMM_WORLD`. They're like a tax file number in Australia and a tax file number in the US.
But now, since the `cartesian_communicator` is a communicator with a topology, you can make inquiries about your neighbourhood:

```c
MPI_Cart_coords ( cartesian_communicator, my_cartesian_rank,
                  PROCESS_DIMENSIONS, my_coordinates );
MPI_Cart_shift ( cartesian_communicator, SIDEWAYS, RIGHT,
                 &left_neighbour, &right_neighbour );
MPI_Cart_shift ( cartesian_communicator, UPDOWN, UP,
                 &bottom_neighbour, &top_neighbour );
```

The first call, to `MPI_Cart_coords`, returns your Cartesian coordinates in an array `my_coordinates` of length `PROCESS_DIMENSIONS`. Naturally you have to tell `MPI_Cart_coords` who you are, in order to obtain that information. You have to pass your new Cartesian rank number in the second slot of the function.
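
The mapping also works the other way round: given a set of Cartesian coordinates, `MPI_Cart_rank` tells you the rank of the process that sits there. The program doesn't use it, but it is the natural companion to `MPI_Cart_coords`; a small example, with hypothetical coordinates of my own choosing:

```c
/* Not used in this program: convert coordinates back into a rank. */
int some_coordinates[PROCESS_DIMENSIONS] = {2, 1};   /* row 2, column 1, say */
int that_process;

MPI_Cart_rank ( cartesian_communicator, some_coordinates, &that_process );
```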

The following two calls to `MPI_Cart_shift` tell you about the rank numbers of your left and right neighbours and of your bottom and top neighbours. So the function `MPI_Cart_shift` doesn't really *shift* anything. It's more like looking up who your neighbours are. But the designers of MPI thought of it in terms of *shifting* rank numbers of your neighbours from the left and from the right and from above and from below into those containers, which you call `left_neighbour`, `right_neighbour`, `bottom_neighbour`, and `top_neighbour`.
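
Because `periods` is all zeros, a process that sits on the border of the grid has no neighbour on one or more sides. In those directions `MPI_Cart_shift` returns `MPI_PROC_NULL`, and a send to or a receive from `MPI_PROC_NULL` is simply a no-op, which is what makes the `MPI_Sendrecv` calls further down safe on the border processes. The -3 entries in the output shown later are presumably this implementation's value of `MPI_PROC_NULL`. A process could check for the condition explicitly, for example:

```c
/* Not in the original program: border processes get MPI_PROC_NULL
   instead of a real neighbour rank. */
if (right_neighbour == MPI_PROC_NULL)
   printf ( "process %d sits on the right edge of the grid\n",
            my_cartesian_rank );
```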

This stuff is worth displaying, so what we do now is to pack all that information into a message and send it to the master process.

Now we can commence the communication that occurs within the `MPI_COMM_WORLD`:

```c
if (! i_am_the_master ) {
   sprintf(char_buffer, "process %2d, cartesian %2d, \
coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d",
           my_rank, my_cartesian_rank,
           my_coordinates[0], my_coordinates[1],
           left_neighbour, right_neighbour, top_neighbour, bottom_neighbour);
   MPI_Send(char_buffer, strlen(char_buffer) + 1, MPI_CHAR, MASTER_RANK,
            3003, MPI_COMM_WORLD);
}
else {
   int number_of_c_procs, count;

   number_of_c_procs = divisions[0] * divisions[1];
   for (count = 0; count < number_of_c_procs - 1; count++) {
      MPI_Recv(char_buffer, BUFSIZ, MPI_CHAR, MPI_ANY_SOURCE, 3003,
               MPI_COMM_WORLD, &status);
      printf ("%s\n", char_buffer);
   }
   printf( "process %2d, cartesian %2d, \
coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d\n",
          my_rank, my_cartesian_rank,
          my_coordinates[0], my_coordinates[1],
          left_neighbour, right_neighbour, top_neighbour, bottom_neighbour);
}
```

By now you should be able to figure out this part of the code for yourself, because we have already covered both `MPI_Send` and `MPI_Recv`.
Observe a subtle bug here: we are assuming that the master process is still going to hang around! In other words, that the master process was included in the `cartesian_communicator`.

What will happen if it is not included?

In our case we are going to run this job requesting exactly the right number of processes, so the master process is guaranteed to be included in the `cartesian_communicator`. Having said that, you may consider modifying the logic of the program so as to get rid of the bug.
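
One way of removing it, sketched here rather than taken from the program, is to do the collecting within the `cartesian_communicator` itself, at its rank 0, which is guaranteed to exist whenever the communicator does:

```c
/* Sketch of one possible fix, not the original code.  Every process has
   already filled char_buffer with its report, as in the code above. */
if (my_cartesian_rank != 0) {
   MPI_Send(char_buffer, strlen(char_buffer) + 1, MPI_CHAR, 0, 3003,
            cartesian_communicator);
}
else {
   int count;
   for (count = 0; count < divisions[0] * divisions[1] - 1; count++) {
      MPI_Recv(char_buffer, BUFSIZ, MPI_CHAR, MPI_ANY_SOURCE, 3003,
               cartesian_communicator, &status);
      printf ("%s\n", char_buffer);
   }
   /* ... and rank 0 would print its own line here, as the master does above. */
}
```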

Here is what the output looks like, once the master process gets down to it:

```
process  8, cartesian  8, coords ( 2, 2), left  7, right -3, top 11, bottom  5
process  4, cartesian  4, coords ( 1, 1), left  3, right  5, top  7, bottom  1
process  1, cartesian  1, coords ( 0, 1), left  0, right  2, top  4, bottom -3
process  5, cartesian  5, coords ( 1, 2), left  4, right -3, top  8, bottom  2
process 11, cartesian 11, coords ( 3, 2), left 10, right -3, top -3, bottom  8
process  9, cartesian  9, coords ( 3, 0), left -3, right 10, top -3, bottom  6
process  3, cartesian  3, coords ( 1, 0), left -3, right  4, top  6, bottom  0
process  7, cartesian  7, coords ( 2, 1), left  6, right  8, top 10, bottom  4
process  2, cartesian  2, coords ( 0, 2), left  1, right -3, top  5, bottom -3
process  6, cartesian  6, coords ( 2, 0), left -3, right  7, top  9, bottom  3
process 10, cartesian 10, coords ( 3, 1), left  9, right 11, top -3, bottom  7
process  0, cartesian  0, coords ( 0, 0), left -3, right  1, top  3, bottom -3
```

You should compare this with the matrix lay-out that I have printed above and you'll see that it all makes some sense.

Observe that cartesian rank numbers here are the same as the world rank numbers. But you must not count on it. It just happens to be so here, on the SP, for this particular program, and under this particular version of PSSP, MPI, etc. This is not an MPI requirement. Just the opposite, in fact.
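
If you want to make that assumption explicit rather than silent, a cheap check along these lines (my addition, not in the program) would do:

```c
/* Not in the original program: make any renumbering visible. */
if (my_rank != my_cartesian_rank)
   printf ( "process %d of MPI_COMM_WORLD became process %d "
            "of the cartesian_communicator\n",
            my_rank, my_cartesian_rank );
```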

What happens now is this simple piece of code:

```c
for ( i = 0; i < ROWS; i++ ) {
   for ( j = 0; j < COLUMNS; j++ ) {
      matrix [i][j] = my_cartesian_rank;
   }
}
```

Every process simply initializes its own little matrix with its own cartesian rank number. The whole matrix is initialized, including the parts that are going to be dedicated to mirroring, along the borders of the matrix.

Once the matrices have been initialized, the processes send them to the master process, who diligently collects them and displays them on standard output:

```c
if (my_cartesian_rank != MASTER_RANK )
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 3003,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 3003 );
```

But observe that this time the communication takes place entirely within the `cartesian_communicator`, and the process that collects all the matrices does not have to be the same process as the master process in the `MPI_COMM_WORLD` communicator.

Now the processes in the `cartesian_communicator` have to exchange information with their neighbours:

```c
MPI_Sendrecv ( &matrix[ROWS - 2][0], COLUMNS, MPI_INT, top_neighbour, 4004,
               &matrix[0][0], COLUMNS, MPI_INT, bottom_neighbour, 4004,
               cartesian_communicator, &status );
MPI_Sendrecv ( &matrix[1][0], COLUMNS, MPI_INT, bottom_neighbour, 5005,
               &matrix[ROWS - 1][0], COLUMNS, MPI_INT, top_neighbour, 5005,
               cartesian_communicator, &status );
```

While sending this stuff up and down, we're making use of the fact that matrices in C are stored row by row, so that every row occupies a contiguous chunk of memory. In the first operation every process sends its second row from the top to its top neighbour: the send buffer begins at `&matrix[ROWS - 2][0]` and is `COLUMNS` items long. The items are of type `MPI_INT` and the corresponding message is going to have tag 4004. At the same time, every process *receives* into its bottom row data sent by its bottom neighbour. The beginning of the receive buffer is `&matrix[0][0]`, the buffer contains items of type `MPI_INT`, and the number of those items is `COLUMNS`.

Of course this communication must take place within the `cartesian_communicator`, because it is only within this communicator that the notion of a bottom and top neighbour makes sense.

The second `MPI_Sendrecv` operation does the same, but in the opposite direction, i.e., the data is now sent down, whereas previously it went up.

Having done that we display the state of the whole system again:

```c
if (my_cartesian_rank != MASTER_RANK)
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 6006,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 6006 );
```

The last part of the code is a little tricky. Here we have to exchange columns between neighbours, but columns are not stored contiguously in C.

MPI provides a very powerful apparatus for situations like that.

First we define a strided data type:

```c
MPI_Type_vector (ROWS, 1, COLUMNS, MPI_INT, &column_type);
MPI_Type_commit (&column_type);
```

This data type is constructed as follows: it comprises `ROWS` items of type `MPI_INT`. The items are collected by picking up 1 item every `COLUMNS` steps from contiguous storage. The description of this peculiar data type is placed into `column_type`. Once the type has been defined, it must be committed with `MPI_Type_commit` before it can be used in communication.
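
To picture what the vector type picks up, here is a small illustration of my own, not part of the program; note also that a committed type can be released with `MPI_Type_free` once it is no longer needed:

```c
/* Illustration only.  With ROWS = 4 and COLUMNS = 3 the matrix sits in
   memory row after row:
      m[0][0] m[0][1] m[0][2] m[1][0] m[1][1] m[1][2] m[2][0] ...
   column_type describes ROWS blocks of 1 int each, the starts of
   consecutive blocks being COLUMNS ints apart, that is, exactly one
   column of the matrix. */
MPI_Datatype column_type;

MPI_Type_vector (ROWS, 1, COLUMNS, MPI_INT, &column_type);
MPI_Type_commit (&column_type);
/* ... use column_type in sends and receives ... */
MPI_Type_free (&column_type);   /* tidy-up, not done in the original program */
```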

Now we can exchange columns between neighbours in much the same way as we did rows before:

```c
MPI_Sendrecv ( &matrix[0][1], 1, column_type, left_neighbour, 7007,
               &matrix[0][COLUMNS - 1], 1, column_type, right_neighbour, 7007,
               cartesian_communicator, &status );
MPI_Sendrecv ( &matrix[0][COLUMNS - 2], 1, column_type, right_neighbour, 8008,
               &matrix[0][0], 1, column_type, left_neighbour, 8008,
               cartesian_communicator, &status );
```

The first operation sends a single item of `column_type`, stored at `&matrix[0][1]`, to the `left_neighbour`. The corresponding message has tag 7007. At the same time another single item of `column_type` is received from the `right_neighbour` and placed at `&matrix[0][COLUMNS - 1]`. Again, observe that we are not sending the border columns: we are sending internal columns, and we're receiving them into the border columns.

The second `MPI_Sendrecv` call does the same in the opposite direction, i.e., from left to right.

After the data has been exchanged, the state of the system is displayed again by calling:

```c
if (my_cartesian_rank != MASTER_RANK)
   MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 9009,
              cartesian_communicator );
else
   collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 9009 );
```

So, this is what the program does.

It is messy and clumsy compared to an HPF program, where you don't have to bother about any of that at all. The IBM HPF compiler, as a matter of fact, compiles your HPF program into an MPI program that does very much what this simple example code illustrates. Using HPF in place of MPI, wherever applicable, will save you a lot of hard work.