NROWMAX
Posted: Wed Sep 24, 2008 2:40 am
by naromero
Hi,
I have Qbox installed on the BG/P at ANL. What is the physical significance of NROWMAX, and is there a way to determine
its optimum value a priori for a given system and partition dimension? For example, does it correspond to blocks of states?
Thanks,
Nichols A. Romero
Re: NROWMAX
Posted: Wed Sep 24, 2008 3:58 am
by fgygi
The nrowmax variable is used to determine the shape of the rectangular process grid used by Qbox. This process grid is the one used by the ScaLAPACK library. When Qbox starts, the ntasks MPI tasks are assigned to processes that are arranged in a rectangular array of dimensions nprow * npcol. The default value of nrowmax is 32. The plane-wave basis is divided among nprow blocks, and the electronic states are divided among npcol blocks.
The following algorithm is used by Qbox to determine the values of nprow and npcol:
1) The number of rows nprow is first set to nrowmax.
2) The value of nprow is then decremented until ntasks%nprow==0, i.e. nprow divides the total number of tasks.
3) The value of npcol is then given by ntasks/nprow.
This looks quite cryptic, but what the algorithm tries to achieve is actually quite simple: define a process grid of dimensions nrowmax*npcol, where npcol=ntasks/nrowmax. This is not always possible, in particular when ntasks%nrowmax != 0. This is why the second step of the algorithm decrements nprow until ntasks%nprow==0.
Note that with this algorithm, the value of nprow is never larger than nrowmax (hence the name).
This algorithm is implemented in Wavefunction::create_contexts() in file Wavefunction.C.
Examples:
ntasks=128, nrowmax=32 (default) => process grid 32 x 4
ntasks=48, nrowmax=32 (default) => process grid 24 x 2
ntasks=256, nrowmax=64 => process grid 64 x 4
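The three steps above can be sketched in a few lines of Python to check the examples by hand. This is only an illustration of the steps as described in this post, not the code in Wavefunction.C; the function name process_grid is made up for this sketch.

```python
def process_grid(ntasks, nrowmax=32):
    """Return (nprow, npcol) following the three steps described above.

    Hypothetical helper for illustration; not part of Qbox.
    """
    if ntasks < 1 or nrowmax < 1:
        raise ValueError("ntasks and nrowmax must be positive")
    # Step 1: start with nprow equal to nrowmax.
    nprow = nrowmax
    # Step 2: decrement nprow until it divides ntasks.
    # The loop always terminates, since nprow=1 divides any ntasks.
    while ntasks % nprow != 0:
        nprow -= 1
    # Step 3: the number of columns follows from the number of rows.
    npcol = ntasks // nprow
    return nprow, npcol


# Reproduce the three examples listed above.
for ntasks, nrowmax in [(128, 32), (48, 32), (256, 64)]:
    nprow, npcol = process_grid(ntasks, nrowmax)
    print(f"ntasks={ntasks}, nrowmax={nrowmax} => process grid {nprow} x {npcol}")
```

Running it prints the same three process grids as in the examples (32 x 4, 24 x 2 and 64 x 4).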
The shape of the process grid affects performance. In general, it is advantageous to have nprow as large as possible, but not larger than the size of the (fine) FFT grid in the z direction. For example, if the fine FFT grid (printed as np0v,np1v,np2v on output) is 110 x 110 x 110, the value of nrowmax should be 110. Note that other values of nrowmax also work, but performance is usually inferior. For example, one could use nrowmax=128 even if the grid is 110x110x110, but some of the processes will not be used optimally during FFTs.
Choosing the value of nrowmax is usually a trial-and-error process. Before running long simulations, it is good practice to run a few test jobs with different values of nrowmax and pick the value that gives the best performance.
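To decide which values are worth testing, one possible approach is to list the process grids that are actually reachable for a given number of tasks, keeping nprow no larger than the z dimension of the fine FFT grid. The Python sketch below does that. The function name candidate_grids and the inputs (256 tasks, np2v=110) are illustrative assumptions, and the script says nothing about which grid is fastest; that still has to be measured.

```python
def candidate_grids(ntasks, np2v):
    """List the (nprow, npcol) grids with nprow <= np2v, largest nprow first.

    Hypothetical helper for illustration; not part of Qbox.
    Each nprow in the result divides ntasks, so setting nrowmax to that
    value yields exactly that grid under the algorithm described above.
    """
    grids = []
    for nprow in range(min(ntasks, np2v), 0, -1):
        if ntasks % nprow == 0:
            grids.append((nprow, ntasks // nprow))
    return grids


# Illustrative inputs: 256 tasks, fine FFT grid 110 x 110 x 110.
for nprow, npcol in candidate_grids(256, 110):
    print(f"nrowmax={nprow} => process grid {nprow} x {npcol}")
```

For these inputs the first candidate is 64 x 4, which is also what nrowmax=110 gives, since 64 is the largest divisor of 256 that does not exceed 110.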