Recompile the program ``Bank Queue'' with the -g switch, and run it with tracing turned on. Then invoke the vt on the trace generated this way.
Open the Source Code window and the Interprocessor Communication window.
Now reset the trace replay to the beginning and step through the code manually by pressing the Step key in the main Visualization Tool panel. Observe that as you step forward in time, the Source Code window shows you which process executes which line: the little highlighted rectangles that correspond to the participating processes descend from the top bar onto the source code lines.
If you would rather scroll through the whole trace automatically, slow the trace replay down to the minimum, because this is a very short program. As you replay the trace at slow speed, stop it roughly in the middle. The Interprocessor Communication window looks rather crowded; you can stretch its time axis by changing the magnification in the main vt panel. Observe that, for this program and with these parameters, the participating processes spend most of their time waiting on MPI Blocking Receive. You can verify this by right-clicking on the pink fields in the Interprocessor Communication window. Process 0 is not very busy either: it spends most of its time on MPI Blocking Send (the blue fields).
It may happen that one or more of your processes are very slow compared to the others and take hardly any part in the communication and computation. On my display, for example, process 5 has done nothing until the very end, when every other process had already finished its job. Indeed, even after the whole computation has ended, process 0 still has to wait a very long time on MPI Blocking Receive for process 5, which owes it the product of the row of matrix A it received at the very beginning with vector b.
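To make concrete what each slave owes the master, here is a minimal sketch, in plain Python purely for illustration (the traced program itself is an MPI code, and the function name is my own invention): the result each slave returns is just the dot product of its assigned row of A with the vector b.

```python
def slave_task(row, b):
    """What each slave computes and sends back to the master:
    one element of y = A b, namely the dot product of the row
    of A it was handed with the vector b."""
    return sum(a_j * b_j for a_j, b_j in zip(row, b))

# A slave that received the row [1, 2, 3] and the vector [4, 5, 6]
# owes the master 1*4 + 2*5 + 3*6 = 32.
y0 = slave_task([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```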
Tools such as the vt are very enlightening. They show us how exorbitant the cost of interprocess communication is. This cost can be offset only by a very large amount of work dumped on every slave process by the master.
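A back-of-the-envelope model shows why only large per-slave workloads can offset the communication cost. The sketch below is a toy estimate of my own, not something produced by the vt, and all the timing constants in it are hypothetical; it simply assumes each row costs one send/receive pair in each direction, serialized through the master, while the row computations proceed in parallel.

```python
def speedup(n_rows, n_slaves, t_row, t_msg):
    """Estimated speedup for a master/slave distribution of n_rows
    rows of A among n_slaves workers; t_row is the compute time per
    row, t_msg the fixed cost of one message.  Each row incurs two
    messages (row out, result back) through the master, while the
    computation itself is shared among the slaves."""
    t_serial = n_rows * t_row
    t_parallel = 2 * n_rows * t_msg + (n_rows / n_slaves) * t_row
    return t_serial / t_parallel

# With tiny per-row work, messaging dominates and the "parallel"
# program is actually slower than the serial one (speedup < 1) ...
small_work = speedup(1000, 8, t_row=1e-6, t_msg=1e-4)

# ... whereas heavy per-row work amortizes the messaging cost and
# the speedup approaches the number of slaves.
large_work = speedup(1000, 8, t_row=1e-2, t_msg=1e-4)
```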
Now let us have a look at some other display windows. Click on the User Utilization button in the VT View Selector panel. By default you get a cumulative view. But if you invoke a display-specific menu by right-clicking on the User Utilization window, you can select Individual View, which shows you the history of the load to which you have subjected every CPU participating in the computation and the communication.
Looking at these diagrams, it may appear that the program has used the system very effectively, because User Utilization is high and Processor Idle is low. But remember that sending messages to other processes keeps the CPUs very busy too, so the CPUs work just as hard when they send and receive messages as when they do computations for you. If a CPU has to wait for a message, it spins, checking various handles and buffers every now and then. That is all work, and it shows up as such in these diagrams. But it is not useful work from our point of view, because no computation gets done while the CPU spins awaiting a message.
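The spinning just described can be sketched in a few lines. This is a crude caricature, in Python for illustration only, of what a blocking receive may do underneath: poll a readiness predicate over and over. Every poll burns CPU cycles, so a utilization profile counts the processor as busy even though nothing useful is accomplished.

```python
import itertools

def spin_wait(message_arrived, max_polls=1_000_000):
    """Busy-wait until message_arrived() returns True (or we give up).
    Returns the number of polls performed: each one is real CPU work
    that shows up as 'user utilization', yet computes nothing."""
    for polls in itertools.count(1):
        if message_arrived() or polls >= max_polls:
            return polls

# Simulate a message that only becomes ready on the 1000th check:
# the CPU performs 1000 polls' worth of useless-but-counted work.
state = {"checks": 0}
def ready():
    state["checks"] += 1
    return state["checks"] >= 1000

polls_burned = spin_wait(ready)
```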
As you scroll the VT View Selector window, you can see that the trace holds a great deal of information for you. But for the parallel programmer the most important information is contained in the Communication/Program group.