SUBROUTINE jacobi(a, d, v, nrot) USE nrtype; USE nrutil, ONLY : assert_eq, get_diag, nrerror, unit_matrix, & upper_triangle IMPLICIT NONE INTEGER(i4b), INTENT(out) :: nrot REAL(sp), DIMENSION(:), INTENT(out) :: d REAL(sp), DIMENSION(:,:), INTENT(inout) :: a REAL(sp), DIMENSION(:,:), INTENT(out) :: v INTEGER(i4b) :: i, ip, iq, n REAL(sp) :: c, g, h, s, sm, t, tau, theta, tresh REAL(sp), DIMENSION(SIZE(d)) :: b, z n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), & 'jacobi') CALL unit_matrix(v(:,:)) b(:) = get_diag(a(:,:)) d(:) = b(:) z(:) = 0.0 nrot = 0 DO i = 1, 50 sm = SUM(ABS(a), mask=upper_triangle(n,n)) IF(sm == 0.0) RETURN tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4) DO ip=1, n-1 DO iq=ip+1, n g=100.0_sp*ABS(a(ip,iq)) IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) & .AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN a(ip,iq)=0.0 ELSE IF (ABS(a(ip,iq)) > tresh) THEN h = d(iq)-d(ip) IF(ABS(h)+g == ABS(h)) THEN t = a(ip,iq)/h ELSE theta = 0.5_sp*h/a(ip,iq) t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2)) IF(theta < 0.0) t = -t END IF c = 1.0_sp / SQRT(1+t**2) s = t*c tau = s/(1.0_sp + c) h = t*a(ip, iq) z(ip) = z(ip) - h z(iq) = z(iq) + h d(ip) = d(ip) - h d(iq) = d(iq) + h a(ip, iq) = 0.0 CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq)) CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq)) CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n)) CALL jrotate(v(:,ip), v(:,iq)) nrot=nrot+1 END IF END DO END DO b(:) = b(:) + z(:) d(:) = b(:) z(:) = 0.0 END DO CALL nrerror('too many iterations in jacobi') CONTAINS SUBROUTINE jrotate(a1, a2) REAL(sp), DIMENSION(:), INTENT(inout) :: a1, a2 REAL(sp), DIMENSION(SIZE(a1)) : wk1 wk1(:) = a1(:) a1(:) = a1(:) - s*(a2(:) + a1(:) * tau) a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau) END SUBROUTINE jrotate END SUBROUTINE jacobi

Let me explain this code now in some detail.

The subroutine computes all eigenvalues and eigenvectors of
an
matrix `a`

. The elements of `a`

above
the diagonal are destroyed in the process. The eigenvalues
are returned on a vector of length *N*: `d`

, whereas
the eigenvectors are returned on `v`

, which a matrix
,
like `a`

. The total number of Jacobi rotations
is returned on `nrot`

.

The subroutine begins with a call to function `assert_eq`

, which
is a still to be defined function, whose purpose is to check
that `a`

is indeed a square matrix and that the sizes of
`d`

and `v`

match `a`

. If these conditions are
not met, the function exits with error message that contains
the word `'jacobi'`

, otherwise the common dimension *N*is returned and captured on `n`

:

n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), & 'jacobi')

Then we initialize `v`

to a unit matrix, extract the diagonal
from `a`

and copy it both on an auxiliary array `b`

and
on `d`

, which will ultimately return the eigenvalues. Another
auxiliary array, `z`

is set to zero. The subroutine could be
written without `b`

and `z`

- their use obscures the flow
of computation a little, and, I think that it is here only for the sake
of improving the numerical accuracy of the computational process, which
contains a lot of additions and subtractions:

CALL unit_matrix(v(:,:)) b(:) = get_diag(a(:,:)) d(:) = b(:) z(:) = 0.0

Now we initialize our Jacobi rotation counter to zero and commence
Jacobi *sweeps*. The procedure should normally converge in
no more than about 5 sweeps. If it takes more, something must be
really wrong. This main loop looks as follows:

nrot = 0 DO i = 1, 50 sm = SUM(ABS(a), mask=upper_triangle(n,n)) IF(sm == 0.0) RETURN tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4) DO ip=1, n-1 DO iq=ip+1, n blah... blah... blah... END DO END DO b(:) = b(:) + z(:) d(:) = b(:) z(:) = 0.0 END DO CALL nrerror('too many iterations in jacobi')At the very beginning we sum absolute values of all terms in the upper triangle of matrix

`a`

) and if that sum
ends up being 0 up to machine accuracy, we return:
No special cleaning is necessary at this stage, because all expected results are already in place.

The parameter `tresh`

is calculated by calling a Fortran intrinsic
`MERGE`

, which returns:

- for the first three sweeps, and
- 0 for the remaining sweeps.

We skip the discussion of what's inside the double loop that sweeps through
the upper triangular of
**A** and go right to what happens
after the sweep is done.

What you see there is what those two auxiliary vectors `b`

and
`z`

are for. Vector `z`

accumulates *t a*_{pq} with an
appropriate sign (- for *pp* and + for *qq*) for every diagonal
term that is stored on `b`

. All those accumulated contributions
are now added to the diagonal, and the new updated diagonal is
transferred to `d`

. Then vector `z`

is cleared and
made ready for the new sweep.

You will see that inside the sweep loop the diagonal, which is stored
on `d`

is updated all the time too, in a similar way to `z`

, so, in principle,
`d`

should contain the same numbers as `b(:) + z(:)`

, why
then do `d(:) = b(:)`

? It is here, I think, that the designers of
the code tried to minimize rounding errors that would accumulate
on `d(:)`

in the process.

If for some reason we've done 50 iterations and the process still hasn't converged, we exit the routine with an error message.

Now let us have a look at what's inside the double loop that sweeps
through the upper triangular of
**A**.

We begin by testing how large is the element *A*_{pq}, which is
to be annihilated. So first we simply take
,
and then we execute this elaborate `IF`

statement:

g=100.0_sp*ABS(a(ip,iq)) IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) & .AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN a(ip,iq)=0.0 ELSE IF (ABS(a(ip,iq)) > tresh) THEN blah... blah... blah... END IFThe first clause of

`IF`

test for the following condition:
- the sweep number is greater than 4,
*and* -
,
*and* - .

Otherwise, i.e., if the off-diagonal element needs to be rotated
away, then check if the element *A*_{pq} is greater than
the threashold we have evaluated earlier.

That threashold was different from zero only for the first three sweeps, and its value was a kind of an average value for the upper triangular: . So it is here that we implement Jacobi rotation for the off-diagonal elements that stand out during the first three sweeps.

By now we've done enough checking, and enough avoiding, and finally we have to get down to work and perform the actual rotation.

The first step is to evaluate

where the approximation holds for very large , and where

Remember that the diagonal terms

`d`

:h = d(iq)-d(ip) IF(ABS(h)+g == ABS(h)) THEN t = a(ip,iq)/h ELSE theta = 0.5_sp*h/a(ip,iq) t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2)) IF(theta < 0.0) t = -t END IF

Now, once we have *t*, we evaluate *c*, *s*, and :

c = 1.0_sp / SQRT(1+t**2) s = t*c tau = s/(1.0_sp + c)

The next step is to rotate the diagonal elements, because
they are easy to do:

h = t*a(ip, iq) z(ip) = z(ip) - h z(iq) = z(iq) + h d(ip) = d(ip) - h d(iq) = d(iq) + hObserve that it is here that we collect the total change to

`z`

, whereas the diagonal terms themselves are updated
for every sweep on `d`

.
At this stage we won't need *a*_{pq} any more, because it is no
longer used explicitly in rotating *a*_{rp} and *a*_{rq}, so we
annihilate it:

a(ip, iq) = 0.0

Finally we rotate *a*_{rp} and *a*_{rq}:

by calling an auxiliary subroutine

`jrotate`

:CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq)) CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq)) CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n))These three call correspond to three subsets of

The first call rotates the two vertical
lines: one that stretches from (*p*,*q*) upwards, and the other
one, parallel to it and slightly to the left, that stretches
from the diagonal upwards. Both are of equal length.

The second call rotates the two perpendicular lines between the point
(*p*,*q*) and the diagonal: one is horizontal and the other is vertical.
Again, they are of equal length.

Finall, the third call rotates the two horizontal lines: one that
stretches from (*p*,*q*) to the right boundary, and the other one slightly
below, that stretches from the right boundary to the diagonal. Both
are of equal length.

Having rotated *a*_{rp} and *a*_{rq} we rotate the matrix of
eigenvectors `v`

. Remember that they are rotated in a much
the same way as *a*_{rp} and *a*_{rq}, i.e.,

This is implemented simply by calling:

CALL jrotate(v(:,ip), v(:,iq))

After we have rotated matrices
**A** and
**V**,
we increment the Jacobi rotation counter `nrot`

:

nrot=nrot+1

The auxiliary subroutine `jrotate`

implements the equations:

Observe one small complication. Because the old

`jrotate`

performs the computation
in parallel as follows:wk1(:) = a1(:) a1(:) = a1(:) - s*(a2(:) + a1(:) * tau) a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau)

Observe that the parallelism is both in the horizontal and in the
vertical direction of matrix
**A**. There is a fair amount
of communication involved too. This algorithm will run well on a vector
processor or on an SMP, but not so well on an MPP.

In any case, the code presented in this section shows all parallel operations explicitly and clearly. It is now up to the compiler to make use of it.

Unfortunately, there are no machines on the market at present that can operate as effectively on stridden data as they can work with adjacent data, i.e., with non-stridden vectors. For this computer architectures would have to undergo a dramatic modifications: the memory would have to become multidimensional and the processor architecture would have to become multidimensional too.