All message passing operations and file access operations we have discussed in this course so far are blocking. This means that when you issue a call to, e.g., MPI_Send, the call returns only after all the data has been copied out of the send buffer, meaning that it is then safe to perform other operations on the send buffer, e.g., you may write to it again. Similarly, when you issue a call to MPI_Recv, the call returns only after all the data you expect to receive has been written into the receive buffer, meaning that it is then safe to perform other operations on the receive buffer, e.g., it is safe to read it.
The file IO semantics are similar. The blocking IO operations do not return until all the data has been taken out of the write buffer, or all the data has been written into the read buffer, so that it is then safe to use or re-use either one.
Blocking of these function calls is local, i.e., they block only for as long as the send, write, receive, or read buffers are in use by the communication functions. There is another type of blocking, which is markedly more severe. It is called synchronous or global blocking. If you send a message with MPI_Ssend, the function returns only after a matching receive has been activated on the other side, and the receiving process has started reading the data into its receive buffer.
On the other hand we also have totally non-blocking operations such as MPI_Isend and MPI_Irecv, which merely initiate the send or the receive and return right away, even as their send or receive buffers are still being used by data transfer operations. Of course, while the transfer is under way, you must not touch the buffers.
What do we need such non-blocking operations for? The reason for their existence is that message passing and file access operations are extremely slow by computing standards. In the time it takes to read data from a file, or to send data to other processes, you may be able to perform thousands, even millions of arithmetic operations. So if every nanosecond counts, you want to be able to do just this: compute, while data transfer operations execute in the background.
But how are you going to know that a particular data transfer operation you have initiated has completed?
All non-blocking MPI functions take an additional argument of type MPI_Request. It is yet another opaque MPI data type. Once you have initiated a data transfer you get this request back, and you can then call MPI_Test, which takes your request as an argument and checks whether the corresponding communication operation has completed. The synopsis of MPI_Test is

int MPI_Test(MPI_Request *request, int *completed, MPI_Status *status)

The value of completed is set to TRUE when the communication operation pointed to by request has indeed completed, and to FALSE otherwise. Additionally, the status variable may be inspected for other details pertaining to the operation, e.g., the rank number of a sender process or the number of items received.
If you have finished all you wanted to do while the data is still being transferred, you can instead issue a call to MPI_Wait, the synopsis of which is

int MPI_Wait(MPI_Request *request, MPI_Status *status)

There is no completed argument here: this function returns only after the operation pointed to by the request has completed.
MPI-IO supports similar non-blocking functions for writing to and reading from files. The functions are MPI_File_iwrite and MPI_File_iread. Their respective synopses are:

int MPI_File_iwrite(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request)
int MPI_File_iread(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request)

Observe that unlike MPI_File_read, MPI_File_iread does not take status as an argument. You have to call MPI_Wait (or MPI_Test) to get hold of status in this case. The reason for this should be obvious: when MPI_File_iread returns, there is no status yet to read. It will only come into existence after the reading operation has completed.
There are no asynchronous versions of collective file access operations. You can actually express MPI_File_write and MPI_File_read as combinations of MPI_File_iwrite and MPI_File_iread with MPI_Wait. The following:
MPI_File_iwrite(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);

is equivalent to

MPI_File_write(fh, buf, count, datatype, &status);

and

MPI_File_iread(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);

is equivalent to

MPI_File_read(fh, buf, count, datatype, &status);