iterative solution methods


iterative solution methods

Post by naromero »

Hi,

I am trying to understand how Qbox achieves its massive parallelism and small
memory footprint. I have already read the IBM Qbox architecture paper, but there
are some technical details that I still do not understand.

There are two ways to solve the KS problem (both formulations are written out below):
1. Minimize the KS energy functional by either:
A. Molecular dynamics (i.e., Car-Parrinello)
B. Direct minimization (e.g., steepest descent, conjugate gradient, DIIS)
2. Solve the KS equations by iterative diagonalization + density mixing
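
In standard notation (my summary, not taken from the thread itself), the two formulations read:

\min_{\{\psi_n\}} E_{\mathrm{KS}}[\{\psi_n\}] \quad \text{subject to} \quad \langle\psi_m|\psi_n\rangle = \delta_{mn} \qquad \text{(category 1)}

H_{\mathrm{KS}}[\rho]\,\psi_n = \epsilon_n\,\psi_n, \qquad \rho(\mathbf{r}) = \sum_n f_n\,|\psi_n(\mathbf{r})|^2 \qquad \text{(category 2, iterated to self-consistency)}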

Qbox does 1A and 1B (particularly with steepest descent). Although it is conceptually
simple to distribute Psi_nk, the method used to solve the KS problem places
restrictions on how Psi_nk should be distributed. In particular, the orthogonalization
would make band parallelization undesirable due to the additional communication.

Is there an advantage w.r.t. parallelization for methods in category 1B over those in category 2?
It seems to me that the emphasis in the former is on solving a large linear system, while
the emphasis in the latter is on solving a large eigenvalue problem.

Nick Romero

Re: iterative solution methods

Post by fgygi »

Nick,
I agree with your analysis of the various approaches to solving the KS equations. Unfortunately there is no clear winner among the methods 1A, 1B and 2. The best approach depends on the system you simulate. For example, if you want to simulate an insulator (or more precisely a system that has a finite KS HOMO-LUMO gap), then 1A or 1B can be quite efficient. If the system is metallic (or you are not sure whether it can become metallic during the simulation), then it is usually safer to adopt method 2, in which the solutions are computed iteratively using several iterations before the charge density is updated.
Qbox tries to provide all three approaches:

1A (Car-Parrinello): use wf_dyn = MD, atoms_dyn = MD, run <niter>:
this runs <niter> steps of CP dynamics. Wavefunctions, ionic positions and density are updated at every step. This assumes that you start from a situation where the wavefunctions are in the ground state. To compute the initial ground state, you can use method 1B or 2. Note that wf_dyn=MD cannot be used to compute the initial ground state, since the wavefunctions would oscillate around it forever.

1B (direct minimization): use wf_dyn = SD, PSD, or PSDA, run 0 <nitscf>.
This runs <nitscf> self-consistent iterations in which the wavefunctions and the density are updated at every iteration.

2 (iterative diagonalization + mixing): use wf_dyn = SD, PSD, or PSDA, run 0 <nitscf> <nite>.
This runs <nitscf> self-consistent iterations in which the wavefunctions are updated <nite> times before the density is updated.
In this case, the density is updated by mixing with the previous density, according to the values of the variables charge_mix_coeff and charge_mix_rcut.
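
To make the three modes concrete, here is a minimal input sketch (the iteration counts and the mixing coefficient are placeholder values, not recommendations):

# modes 1B/2: compute the ground state with preconditioned SD + Anderson acceleration
set wf_dyn PSDA
set charge_mix_coeff 0.5
run 0 50 10

# mode 1A: Car-Parrinello dynamics, starting from the converged ground state
set wf_dyn MD
set atoms_dyn MD
run 1000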

Regarding the data distribution in Qbox, the Fourier coefficients of the wavefunctions are c(n,G), where n is the band index and G is a reciprocal lattice vector. The coefficients c(n,G) form a matrix which is block distributed among processors, following the approach of the ScaLAPACK library for parallel linear algebra. Thus the coefficients are distributed in both the n and G indices. Operations such as orthogonalization do indeed involve communication, but Qbox takes advantage of the ScaLAPACK library to perform them efficiently.

Irrespective of whether we follow 1A, 1B or 2, the basic ingredient of the KS solver is the calculation of H*psi given psi. This operation involves the application of the Laplacian to psi (which is efficient because the Laplacian is diagonal in G-space), and the multiplication of psi by the potential: psi -> v(r)*psi, which is efficient because the Fourier transforms of psi only involve communication between subsets of processors.
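
To illustrate why H*psi is cheap in a plane-wave code, here is a toy single-band, Gamma-point sketch in Python/NumPy (my own example, not Qbox code; all names are mine): the kinetic term is applied diagonally in G-space and the local potential in real space via FFTs.

import numpy as np

def apply_h(c_G, G2, v_r):
    # Toy H*psi for one band at the Gamma point (illustration only).
    # c_G : Fourier coefficients of psi on an n x n x n grid
    # G2  : |G|^2 on the same grid; the kinetic term is diagonal in G-space
    # v_r : local potential on the real-space grid
    kinetic = 0.5 * G2 * c_G            # -(1/2) Laplacian, applied in G-space
    psi_r = np.fft.ifftn(c_G)           # transform psi to real space
    vpsi_G = np.fft.fftn(v_r * psi_r)   # multiply by v(r), transform back
    return kinetic + vpsi_G

# usage with random data on an 8^3 grid of a unit cubic cell
n = 8
g = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # 1D reciprocal-space grid
GX, GY, GZ = np.meshgrid(g, g, g, indexing="ij")
G2 = GX**2 + GY**2 + GZ**2
c_G = np.random.rand(n, n, n) + 1j * np.random.rand(n, n, n)
v_r = np.random.rand(n, n, n)
hpsi_G = apply_h(c_G, G2, v_r)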
Hope this helps.
The Qbox data layout is also described in: Journal of Physics: Conference Series 46 (2006) 268–277, doi:10.1088/1742-6596/46/1/038
(to get the paper, just search "doi:10.1088/1742-6596/46/1/038" in Google Scholar).

FG

Re: iterative solution methods

Post by naromero »

FG,

Thank you for your detailed explanation and the reference to your paper; they were quite helpful.
I have two follow-up questions.

1. Iterative diagonalization
For simplicity, let us consider only methods in category 2, and a fixed Hamiltonian H. There are a number of methods
available for solving H*psi = e*psi. They include:
a. Davidson
b. CG
c. DIIS
and their variants. In particular, the Krylov subspace methods vary in complexity but often include subspace rotations,
subspace diagonalizations, and orthogonalization steps.

Qbox primarily uses steepest descent (SD). From my understanding, the one drawback of SD is that it never exactly reaches the ground
state, since that would require the iteration count k -> infinity. However, it would seem to me that its simplicity is its greatest advantage. For example, one does not
need to store psi^(k) for more than the previous step (in sharp contrast to the Krylov subspace methods).
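
For reference, the textbook SD update for a single band (my notation) is

\psi^{(k+1)} = \psi^{(k)} - \alpha \left( H - \langle \psi^{(k)} | H | \psi^{(k)} \rangle \right) \psi^{(k)},

followed by re-orthonormalization of the band set; only the current iterate has to be kept in memory.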

Is this the main advantage of SD over the Krylov subspace methods as far as scalability is concerned?

2. For PAW (and USPP), the eigenvalue equation is now
H*psi=e*S*psi

The overlap matrix S depends on the PAW projector functions, which are centered on the nuclei (R_I). Does this prove to be a major
algorithmic bottleneck for an implementation in Qbox? The H*psi products never explicitly construct H (only the products are formed),
but this is not so for S*psi. I would think S would have to be explicitly constructed, though it would be quite sparse, living in
a small box around each atom.
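
For reference, in Vanderbilt's ultrasoft formulation (which PAW shares in structure, as far as I understand) the overlap operator has the form

S = 1 + \sum_{I} \sum_{ij} q_{ij} \, |\beta_i^I\rangle\langle\beta_j^I|,

where the projectors \beta_i^I are localized in spheres around the atoms R_I; this locality is what makes S sparse.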

Nick Romero

Re: iterative solution methods

Post by fgygi »

Nick,
The SD iteration is indeed very simple to implement and uses only one copy of the wavefunctions. However, it can be very slow, especially in systems that have a small HOMO-LUMO gap, or no gap at all. In those cases, the number of iterations needed to converge the ground state energy is prohibitive. If the gap is finite, PSD (preconditioned SD) and PSDA (preconditioned SD with Anderson acceleration) are much faster. For metals, PSD works reasonably well (with empty states, a finite Fermi temperature, and multiple electronic iterations at each SCF iteration), but it is still not great.

If you consider the convergence of the non-self-consistent iteration (i.e. with the density fixed), the Krylov methods are generally more efficient, but as you point out, they can require considerably more memory. An interesting option is the LOBPCG (locally optimal block preconditioned conjugate gradient) method of Knyazev, described in SIAM J. Sci. Comput. 23, 517 (2001).
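
As a concrete (non-Qbox) illustration, SciPy ships an implementation of Knyazev's method; here is a toy sketch that computes the lowest few eigenpairs of a sparse symmetric stand-in for H:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n, nbands = 1000, 4
# toy sparse symmetric "Hamiltonian": 1D Laplacian plus a random diagonal potential
H = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) + sp.diags(np.random.rand(n))
X = np.random.rand(n, nbands)  # block of initial guess vectors, one column per band
# a preconditioner (e.g. an approximate inverse of H) would be passed as M=...
vals, vecs = lobpcg(H, X, largest=False, tol=1e-8, maxiter=500)
print(vals)  # the nbands lowest approximate eigenvalues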

For PAW, you would have to solve the linear system Sx=y to compute corrections to the wavefunctions. Presumably this can also be done iteratively.
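
Since S is symmetric positive definite, a plain conjugate-gradient solve should do; a toy SciPy sketch (again not Qbox code, and the matrix is only a stand-in):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 1000
# stand-in for the overlap operator: identity plus a small banded SPD correction
S = sp.identity(n) + 0.1 * sp.diags([1.0, 2.0, 1.0], [-1, 0, 1], shape=(n, n))
y = np.random.rand(n)
x, info = cg(S, y)  # info == 0 signals convergence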
We have not considered the implementation of PAW in Qbox so far, mostly because of the added complexity. Maybe in the future (?).
Francois

Re: iterative solution methods

Post by naromero »

Francois,

Thank you for elaborating on the limitations of the SD methods in Qbox.

I am trying to understand why a prohibitive number of SD iterations is needed for small-gap (or gapless) systems.

I can think of some culprits:
1. Mathematical: for some of these iterative eigensolvers, the number of iterations needed to converge the HOMO & LUMO scales like 1/sqrt(E_LUMO - E_HOMO), which diverges for gapless systems (see the estimate below).

2. Metallicity: for a finite electronic temperature, the Fermi occupation numbers should be incorporated into the minimization procedure on an equal footing with the other dynamical variables. See Marzari et al., PRL 79, 1337 (1997).

3. Decay properties of the density matrix (i.e., nearsightedness, or the lack thereof). (Least likely, but I thought I would mention it anyway.)
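
For what it is worth, the standard condition-number estimates (my summary, for a quadratic model) are

k_{\mathrm{SD}} \sim \kappa, \qquad k_{\mathrm{CG}} \sim \sqrt{\kappa}, \qquad \kappa \propto \frac{1}{E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}}},

so both iteration counts diverge as the gap closes, consistent with point 1.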

Nick Romero