Orthogonalization



Post by naromero

Francois,

Is the orthogonalization step the least scalable part of Qbox?

My understanding is that most (all?) calculations parallelize over bands and plane waves (PWs) simultaneously.
While the H*Psi products do not require communication between bands (and obviously PWs), the orthogonalization
step does. Looking at the source code, orthogonalization appears to be done with a Cholesky decomposition.
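
To make sure we are talking about the same thing, here is a minimal NumPy sketch of what I mean by Cholesky orthogonalization. It is a serial illustration only, not the Qbox implementation; the function name `cholesky_orthogonalize` and the array layout (plane waves along rows, bands along columns) are my own assumptions.

```python
import numpy as np

def cholesky_orthogonalize(psi):
    """Orthonormalize the columns (bands) of psi, shape (n_pw, n_bands).

    Hypothetical helper for illustration; assumes the bands are
    linearly independent so the overlap matrix is positive definite.
    """
    # Overlap matrix S_mn = <psi_m | psi_n>; this is the step that
    # couples all bands with each other.
    s = psi.conj().T @ psi
    # Cholesky factorization S = L L^H with L lower triangular.
    l = np.linalg.cholesky(s)
    # Orthonormal bands are psi L^{-H}. Solve L X = psi^H for X,
    # then take the conjugate transpose: X^H = psi L^{-H}.
    x = np.linalg.solve(l, psi.conj().T)
    return x.conj().T

# Small random test case: 200 plane-wave coefficients, 8 bands.
rng = np.random.default_rng(0)
psi = rng.standard_normal((200, 8)) + 1j * rng.standard_normal((200, 8))
phi = cholesky_orthogonalize(psi)

# Largest deviation of the new overlap matrix from the identity.
overlap_error = np.abs(phi.conj().T @ phi - np.eye(8)).max()
print(overlap_error)
```

In a distributed run both the overlap matrix product and the factorization would be spread across processes, which is where my scaling question comes from.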

Say you have N = 4000 bands; the overlap matrix (S_mn) is then a 4000-by-4000 matrix. Constructing this matrix parallelizes fairly well, even though it requires communication between bands. However, my guess is that the subsequent Cholesky decomposition could use at most

nprocs = (m/mb)*(n/nb)

where
m = number of rows = 4000
n = number of columns = 4000
mb = ScaLAPACK row block size ~ 32-64
nb = ScaLAPACK column block size ~ 32-64
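
To put numbers on this estimate, here is a short Python sketch that evaluates the formula for the example above. The helper name `max_block_processes` is made up for illustration, and rounding up to count partial blocks at the matrix edge is my assumption; the result is only the bound from my guess, not a measured ScaLAPACK scaling limit.

```python
import math

def max_block_processes(m, n, mb, nb):
    """Number of mb-by-nb blocks in an m-by-n matrix.

    Partial blocks at the edges are counted as full blocks
    (assumption), hence the ceil on each dimension.
    """
    return math.ceil(m / mb) * math.ceil(n / nb)

# Overlap matrix for N = 4000 bands, square blocks of size 32 and 64.
for block in (32, 64):
    print(block, max_block_processes(4000, 4000, block, block))
# prints:
# 32 15625
# 64 3969
```

So with these block sizes the bound would be somewhere between roughly 4,000 and 16,000 processes, with one block per process.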

Or is there a more efficient way to do this?

Bests,
Nick Romero