How are we going to use data types in order to tell processes how to partition a file?
This is how. Suppose the picture below represents a file. Each little square corresponds to a data item of some elementary type, and this type may be quite complex. It is elementary not because it is simple, but because this is what the file is made of.
Now, let the filetype for process of rank 1 be:
Similarly the filetype for process of rank 2 is:
Now process of rank 0 is going to call function MPI_File_set_view to establish its view of the file as follows:
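A minimal sketch of what that call might look like is given below; filetype_0 is a hypothetical name for the already constructed and committed datatype that describes the items belonging to process of rank 0, and etype stands for the elementary type the file is made of:
   /* sketch only: filetype_0 and etype are assumed to have been defined earlier */
   MPI_File_set_view(fh, 0, etype, filetype_0, "native", MPI_INFO_NULL);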
The arguments of this call have the following meaning:
fh is the file handle;
disp is the displacement from the beginning of the file in bytes of the place where this file view begins - a file may have different views associated with it in various places;
etype is the elementary data type;
filetype is the file data type, which must be defined in terms of elementary data types;
datarep is a string that defines the data representation - in this case it is ``native'';
info is the info structure - in this case it is MPI_INFO_NULL.
Once all processes have issued their calls, this is how the file is going to be partitioned:
When a process, say the one of rank 1, now calls MPI_File_read, it is going to read its own items only, i.e., the ones labeled with 1. Its own file pointer will be automatically advanced to the required location in the file.
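A read on process of rank 1 might then look roughly like this (buf and count are hypothetical, and etype is the elementary type used in the view):
   MPI_Status status;
   MPI_File_read(fh, buf, count, etype, &status);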
So this is how the file gets partitioned without us having to specify
separate file offsets for each process explicitly. But constructing
such different file views for each process may not be all that easy
either. Luckily MPI-2 provides us with a very powerful function,
MPI_Type_create_darray, that can generate process-dependent file views automatically.
But before I get to explain how this function works, let me go back to
MPI_File_set_view and explain in more detail the meaning of
its various arguments, as well as the behaviour of the function itself.
MPI_File_set_view is a collective function. All processes
that have opened the file have to participate in this call.
The file handle and the data representation string must be identical
for all processes. The extent of the elementary type, i.e.,
the distance between
its upper and its lower marker in bytes, must be
the same for all processes. But the processes may call this function
with different displacements, file types and infos. Note that
apart from differentiating the view with a process-specific file
type, you may use different initial displacements too.
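For instance, a sketch of a view that differs between processes only in the displacement, assuming a hypothetical per-process chunk of chunk_bytes bytes, could look like this:
   /* every process sees the same etype and filetype,
      but starts its view at a rank-dependent offset */
   MPI_Offset disp = (MPI_Offset) rank * chunk_bytes;
   MPI_File_set_view(fh, disp, etype, etype, "native", MPI_INFO_NULL);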
The data representation string specifies how the data that is passed to, e.g.,
MPI_File_write is going to be stored on the file itself.
The simplest way to write a file, especially under UNIX, is
to copy the bytes from memory to the disk without any further
processing. But under other operating systems files may have fancy
forks, format records and what not. Even under UNIX, Fortran files differ
from plain C-language files, because Fortran files may have record markers
embedded in them.
MPI defines three data representations and MPI implementations are free to add more. The three basic representations are:
``native'' - the data is stored in the file exactly as it is laid out in memory;
``internal'' - the data is stored in an implementation-specific format that can always be read back by the same MPI implementation;
``external32'' - the data is stored in a standardized, portable (big-endian IEEE) format, so that files can be exchanged between different systems.
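If a file is meant to be exchanged between machines with different native data layouts, one might, for example, request the portable representation instead of the native one (assuming the MPI implementation supports it):
   MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "external32", MPI_INFO_NULL);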
When the file gets opened with
MPI_File_open, you get
the default view, which is equivalent to the call:
MPI_File_set_view(fh, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
Now let us get to MPI_Type_create_darray, the function that is going to make our task of defining process dependent file views easier.
This function does a lot of very hard work and, at the same time, it is going to save the programmer a lot of very hard work too, but for this very reason it is a little complicated. Its synopsis is as follows:
int MPI_Type_create_darray(int size, int rank, int ndims,
      int array_of_gsizes[], int array_of_distribs[], int array_of_dargs[],
      int array_of_psizes[], int order, MPI_Datatype oldtype,
      MPI_Datatype *newtype)
When called, it is going to generate the datatypes corresponding to the distribution of an
ndims-dimensional array of
oldtype elements onto an
ndims-dimensional grid of logical processes.
Remember how we had a 2-dimensional grid of processes in
section 5.2.5 that talked about solving a diffusion problem.
There we also had a 2-dimensional array of integers,
which we have distributed manually
amongst the processes of the 2-dimensional grid, so that each process got
a small portion of it and then worked on it updating its edges by getting
values from its neighbours. Function MPI_Type_create_darray is
going to deliver such partitioning to us automatically.
The parameters of the function have the following meaning (a short sketch of a call follows the list):
size is the number of processes in the group amongst which the array is to be distributed;
rank is the rank of the calling process within that group;
ndims is the number of dimensions of the global array and of the process grid;
array_of_gsizes is an array of length ndims; each entry in the array tells us about the number of elements of type oldtype in the corresponding dimension of the global array;
array_of_distribs is an array of length ndims; each entry specifies how the array is to be distributed along the corresponding dimension - there are three MPI constants to choose from here:
MPI_DISTRIBUTE_BLOCK - which requests block distribution along the corresponding dimension,
MPI_DISTRIBUTE_CYCLIC - which requests cyclic distribution along the corresponding dimension, and
MPI_DISTRIBUTE_NONE - which requests no distribution along the corresponding dimension;
array_of_dargs is an array of length ndims; each entry in the array is the argument that further specifies how the distribution of the array should be done - there is one MPI constant provided here, MPI_DISTRIBUTE_DFLT_DARG, which lets MPI choose the default distribution argument for the corresponding dimension;
array_of_psizes is an array of length ndims; each entry in the array tells us about the number of processes in the corresponding dimension of the process grid;
order is the storage order of the array, either MPI_ORDER_C or MPI_ORDER_FORTRAN;
oldtype is the elementary data type the array is made of;
newtype returns the new datatype, which describes the portion of the global array that belongs to the calling process.
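As a quick sketch (the numbers here are made up), distributing a 100 x 100 global array of MPI_INT elements block-wise onto a 2 x 2 process grid, and then using the resulting datatype as a file type, might look like this; rank and fh are assumed to have been obtained earlier from MPI_Comm_rank and MPI_File_open:
   int gsizes[2]   = { 100, 100 };
   int distribs[2] = { MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK };
   int dargs[2]    = { MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG };
   int psizes[2]   = { 2, 2 };
   MPI_Datatype filetype;
   MPI_Type_create_darray(4, rank, 2, gsizes, distribs, dargs, psizes,
                          MPI_ORDER_C, MPI_INT, &filetype);
   MPI_Type_commit(&filetype);
   MPI_File_set_view(fh, 0, MPI_INT, filetype, "native", MPI_INFO_NULL);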
At this stage I feel that you need a programming example to make sense of all this. So here it is.