Qbox for effective polarizabilities of large systems

Posted: Fri Sep 27, 2024 1:59 am
by frankhu
Hello,

I am currently trying to use Qbox to compute the effective molecular polarizability for individual water molecules, this time in a periodic system of 1024 water molecules. Essentially, this boils down to computing the MLWF centers around each water molecule and taking a finite difference of the dipoles over the electric field perturbation, as per my previous issue: viewtopic.php?t=306&sid=2f7d50243097783 ... 5644765d51
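For readers following along, the finite-difference step described above can be sketched roughly as follows. This is a minimal illustration, not the actual workflow: it assumes each MLWF center carries two electrons (spin-unpolarized water), that the ions are held fixed between the +E and -E runs so their contribution cancels, that no center crosses a periodic boundary between the two runs, and that the field sign convention is the usual one (dipole = charge times position, electrons negative). The function name and the synthetic test data are hypothetical.

```python
def polarizability_tensor(centers_plus, centers_minus, e_amp, occupation=2.0):
    """Central-difference polarizability of one molecule, in atomic units.

    centers_plus[j] / centers_minus[j]: list of MLWF centers [x, y, z] (Bohr)
    assigned to the molecule, from the runs with field +e_amp / -e_amp
    applied along axis j. Returns alpha[i][j] = d(mu_i) / d(E_j).
    """
    alpha = [[0.0, 0.0, 0.0] for _ in range(3)]
    for j in range(3):        # direction of the applied field
        for i in range(3):    # component of the induced dipole
            # Electronic dipole change: charge -occupation per Wannier center.
            d_mu = sum(-occupation * (rp[i] - rm[i])
                       for rp, rm in zip(centers_plus[j], centers_minus[j]))
            alpha[i][j] = d_mu / (2.0 * e_amp)
    return alpha


# Synthetic data (assumption): four centers that each shift by 1.25 * E
# against the field, which should give an isotropic alpha of 10 a.u.
e_amp = 1.0e-4
base = [[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
shift = 1.25 * e_amp


def displaced(axis, sign):
    # Move every center by -sign * shift along the given axis.
    return [[c[k] - sign * shift if k == axis else c[k] for k in range(3)]
            for c in base]


plus = [displaced(j, +1) for j in range(3)]
minus = [displaced(j, -1) for j in range(3)]
alpha = polarizability_tensor(plus, minus, e_amp)
```

In a real calculation the centers would be parsed from the Qbox output and assigned to molecules (for example, by nearest oxygen), which is not shown here.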

I am currently running my workflow on DOE NERSC Perlmutter, but a single frame of the 1024-molecule system is taking a very long time (over 19 hours and still running). Below is the job script I am using to run the calculation:

Code:

#!/bin/bash -l
#SBATCH --time=24:00:00
#SBATCH --constraint cpu
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --tasks-per-node=128
#SBATCH --cpus-per-task=2
#SBATCH -A m4026
#SBATCH -J Qbox_W1024

export OMP_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# default job name is job script file name
# To change job name, submit with: sbatch -n 64 -J jobname file.job
module load PrgEnv-gnu cray-fftw
PROJECTDIR=/global/cfs/projectdirs/qbox
exe=$PROJECTDIR/bin/qbox-1.76.3_prl
export XERCES_C_DIR=$PROJECTDIR/software/xerces/xerces-c-3.1.4_gnu
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$XERCES_C_DIR/src/.libs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PROJECTDIR/software/AMD/amd-libflame/lib/LP64:$PROJECTDIR/software/AMD/amd-blis/lib/LP64

export QBOX_OPTS='-nstb 2'


#cd /pscratch/sd/f/frankhu/qbox_calcs/082624_REDO_monomer_tst/qbox_inp_W64_dset
#echo $PWD
#infile=inp_0.i
#outfile=inp_0.r
#srun --cpu_bind=cores $exe $QBOX_OPTS $infile > $outfile
#
#
#
#
#cd /pscratch/sd/f/frankhu/qbox_calcs/082624_REDO_monomer_tst/qbox_inp_RPBE
#echo $PWD
#infile=inp_0.i
#outfile=inp_0.r
#srun --cpu_bind=cores $exe $QBOX_OPTS $infile > $outfile


cd /pscratch/sd/f/frankhu/qbox_calcs/092524_W1024_test_frame/W1024_single_frame_test
echo $PWD
infile=inp_0.i
outfile=inp_0.r
srun --cpu_bind=cores $exe $QBOX_OPTS $infile > $outfile

This job requests 128 MPI tasks with one physical core per task, essentially consuming an entire Perlmutter CPU node. Looking at the output file, the value of np2v is 360, so I have set the value of nstb to 2 for now.

Besides running timing tests to try optimizing the value of the nstb parameter, are there any other suggestions for accelerating the calculation? Ideally I would like to use Qbox to compute the MLWFs (and consequently the polarizabilities) of many such frames, not just a single one.

Any help would be greatly appreciated, and thank you in advance!

Re: Qbox for effective polarizabilities of large systems

Posted: Sat Sep 28, 2024 2:05 am
by fgygi
Hello,
1024 water molecules is a large system. This is definitely a case where you want to use more than one Perlmutter node. I am trying some tests to figure out what the best parameters would be (e.g. -nstb).
Having said that, I am curious about the reasons for using such a large sample. Did you consider computing the polarizability in cells of increasing size, starting with a smaller cell of e.g. 128 molecules? Note also that the 1024-molecule cell should come from a well-equilibrated simulation (possibly classical MD using TIP5P or similar) so that the distribution and average number of hydrogen bonds are representative of equilibrium.
Would you be able to paste your input file here (without atomic positions, of course)?
Best,
Francois

Re: Qbox for effective polarizabilities of large systems

Posted: Sat Sep 28, 2024 9:21 pm
by frankhu
Hi Francois,

Thanks as always for the prompt response!

My input file for the W1024 calculation is as follows. It is the same as my input for the W64 system, just with a different number of atoms and a bigger periodic box (31.33 Angstroms, so 59.21 Bohr):

Code:

# Frame 0 polarizability
species oxygen   O_ONCV_PBE-1.2.xml
species hydrogen H_ONCV_PBE-1.2.xml
atom O1    oxygen      11.136586  51.988171  44.827309
atom H1    hydrogen    12.557728  51.325756  45.819710
atom H2    hydrogen    11.549368  51.835365  42.991081
...
# More atom lines for all the other water molecules #
set cell 59.205243 0 0 0 59.205243 0 0 0 59.205243
set ecut 85.000000
set wf_dyn PSDA
set xc PBE
set scf_tol 1.e-8
randomize_wf 0.01
run 0 40 5
set polarization MLWF_REF
set e_field   0.000100   0.000000   0.000000
run 0 40 5
set e_field  -0.000100   0.000000   0.000000
run 0 40 5
set e_field   0.000000   0.000100   0.000000
run 0 40 5
set e_field   0.000000  -0.000100   0.000000
run 0 40 5
set e_field   0.000000   0.000000   0.000100
run 0 40 5
set e_field   0.000000   0.000000  -0.000100
run 0 40 5

I am using such a large system because this step is part of a larger workflow: we want to model a relatively novel form of spectroscopy for isotropic condensed-phase systems, which involves optically inducing anisotropy in the system with a Raman pulse and tracking the decay of that anisotropy through time-resolved X-ray scattering. The paper detailing the theoretical approach for modeling this spectroscopy can be found here: https://journals.aps.org/prl/abstract/1 ... 129.056001. Rather than using the box polarizability of the entire system to compute the correlation function that leads to the signal, I would like to simulate the signal using effective molecular polarizabilities, arriving at a more localized description. This is where Qbox comes in, as it provides a way of getting the effective molecular polarizabilities of all the molecules in a system.

As you correctly surmised, this frame of 1024 water molecules comes from a classical MD simulation using a neural network potential parameterized using DFT-revPBE. We need such a large size because the reciprocal-space resolution depends on the box length, and we would like a resolution of around 2pi/30 Angstroms^{-1}. Of course, we will not be able to run Qbox on every frame of our trajectories, as we have ~8 ns of trajectory with frames 2 fs apart. Instead, we would first compute the effective molecular polarizabilities for each molecule in a smaller subset of frames using Qbox and then use those calculations to parameterize another neural network for computing the final correlation functions required for the signal.
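As a quick sanity check of the numbers quoted in this post, the arithmetic can be sketched as below. It assumes the smallest nonzero wavevector of a cubic periodic box of length L is 2pi/L, uses the CODATA value of the Bohr radius, and treats the "~8 ns" trajectory length as exactly 8 ns for the frame count; variable names are illustrative only.

```python
import math

BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in Angstrom

box_angstrom = 31.33                         # box length quoted in the post
box_bohr = box_angstrom / BOHR_IN_ANGSTROM   # length in Bohr, as used in 'set cell'

# Smallest nonzero wavevector (reciprocal-space resolution) of a cubic box.
dq = 2.0 * math.pi / box_angstrom            # in Angstrom^-1
target_dq = 2.0 * math.pi / 30.0             # desired resolution, Angstrom^-1

# Number of frames in 8 ns sampled every 2 fs (integer arithmetic in fs).
trajectory_fs = 8_000_000
frame_spacing_fs = 2
n_frames = trajectory_fs // frame_spacing_fs

print(f"box = {box_bohr:.3f} Bohr, dq = {dq:.4f} 1/Angstrom, frames = {n_frames}")
```

The 31.33 Angstrom box therefore gives a resolution slightly finer than the 2pi/30 target, and the full trajectory contains about four million frames, which is why only a subset can be treated with Qbox.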

Any help with finding the optimal resource configuration on NERSC for a calculation of this scale would be greatly appreciated.

Thank you very much!

Re: Qbox for effective polarizabilities of large systems

Posted: Mon Sep 30, 2024 4:19 am
by fgygi
Hi Frank,

I ran a test on 16 Perlmutter nodes with h2o1024, computing the polarizability tensor (which provides all MLWF centers). This seemed to run ok, but ran out of time after 12 hours (completed ~ 5/6 of the calculation). I am running it again, and expect it should complete in 14-15 hrs. I have put the incomplete result on Perlmutter in /global/cfs/cdirs/qbox/share/h2opol/h2o1024 (file names pol1.sh, pol1.i, pol1.r). I will copy the full run there when it completes. A test calculation on h2o256 can also be found there.

The question arises as to how accurate you want the MLWF positions to be, especially if you are using the center positions in finite difference expressions to compute the molecular polarizabilities. I assume from your previous posts that you have already tested that. Using polarization=MLWF may be sufficient, but MLWF_REF would be more accurate. If you are only interested in relative values of the molecular polarizability, MLWF is probably ok. If you want more accurate absolute values (which may be inaccurate anyway because of PBE) you may have to use MLWF_REF. Using MLWF_REF is significantly more costly than MLWF. You may also want to test the robustness of the results w.r.t. the value of the E field amplitude.
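One way to frame the robustness test with respect to the field amplitude is sketched below. It relies on the standard result that a central difference cancels even-order terms, so its leading error grows as E^2 (through the third-order response), while numerical noise in the dipoles is amplified by 1/(2E) at very small fields. The model dipole and its coefficients are hypothetical placeholders, not water values; in practice each dipole would come from a separate Qbox run.

```python
def central_difference(dipole, e_amp):
    """Polarizability estimate from dipoles at +e_amp and -e_amp."""
    return (dipole(e_amp) - dipole(-e_amp)) / (2.0 * e_amp)


def amplitude_scan(dipole, amplitudes):
    """Return (amplitude, estimate) pairs for a list of field amplitudes."""
    return [(e, central_difference(dipole, e)) for e in amplitudes]


# Hypothetical model: mu(E) = mu0 + alpha*E + beta*E^2/2 + gamma*E^3/6
MU0, ALPHA, BETA, GAMMA = 0.73, 10.0, -20.0, 1500.0


def model_dipole(e):
    return MU0 + ALPHA * e + BETA * e**2 / 2.0 + GAMMA * e**3 / 6.0


scan = amplitude_scan(model_dipole, [1.0e-4, 1.0e-3, 1.0e-2])
errors = [abs(estimate - ALPHA) for _, estimate in scan]
for (e, estimate), err in zip(scan, errors):
    print(f"E = {e:.0e}  alpha = {estimate:.6f}  error = {err:.2e}")
```

If the estimates at two or three amplitudes agree to within the accuracy needed, the chosen field is in the linear regime; a drift that scales as E^2 points to higher-order response, while scatter at small E points to convergence noise.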

I will read the Montoya-Castillo PRL to understand better the context of the calculation (and will likely have some questions :-) )

Best,
Francois

Re: Qbox for effective polarizabilities of large systems

Posted: Tue Oct 01, 2024 3:42 pm
by fgygi
Hi Frank,
I uploaded the full calculation of the polarizability for the h2o1024 sample (file pol2.r) to /global/cfs/cdirs/qbox/share/h2opol/h2o1024 . It took 12.5 hours to complete, so repeating it for multiple frames will have a significant cost.

Regarding the initial ground state calculation, I found that the cost can be reduced somewhat by using a feature of Qbox that allows for starting a calculation with a low cutoff and increasing the cutoff later to get full accuracy. An example is the files gs3.i / gs3.r .

If you plan to run this calculation for multiple configurations, you can combine the ground state and polarizability calculations in a single run, so you don't have to store the ground state restart file.
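For multiple configurations, a combined ground-state and polarizability input could be generated per frame along the following lines. This is a sketch under assumptions: it reuses only the commands that appear in the input posted earlier in this thread, and the optional low-cutoff warm-up assumes that the feature is driven by a second `set ecut` command between runs; the files gs3.i / gs3.r are the authoritative example and should be checked. The function name and the 40 Ry warm-up value are hypothetical.

```python
def qbox_polarizability_input(atom_lines, cell_bohr, ecut=85.0,
                              e_amp=1.0e-4, ecut_warmup=None):
    """Build a Qbox input: ground state, then +/- fields along x, y and z."""
    lines = ["species oxygen   O_ONCV_PBE-1.2.xml",
             "species hydrogen H_ONCV_PBE-1.2.xml"]
    lines += list(atom_lines)
    a = f"{cell_bohr:.6f}"
    lines.append(f"set cell {a} 0 0 0 {a} 0 0 0 {a}")
    # Start at the warm-up cutoff if one is given (assumed mechanism).
    first_ecut = ecut if ecut_warmup is None else ecut_warmup
    lines += [f"set ecut {first_ecut:.6f}",
              "set wf_dyn PSDA",
              "set xc PBE",
              "set scf_tol 1.e-8",
              "randomize_wf 0.01",
              "run 0 40 5"]
    if ecut_warmup is not None:
        # Raise the cutoff to its final value and reconverge.
        lines += [f"set ecut {ecut:.6f}", "run 0 40 5"]
    lines.append("set polarization MLWF_REF")
    for axis in range(3):
        for sign in (+1.0, -1.0):
            field = [0.0, 0.0, 0.0]
            field[axis] = sign * e_amp
            lines.append("set e_field " + " ".join(f"{f:10.6f}" for f in field))
            lines.append("run 0 40 5")
    return "\n".join(lines) + "\n"


# Usage with the first molecule of the posted input (other atoms omitted).
atoms = ["atom O1    oxygen      11.136586  51.988171  44.827309",
         "atom H1    hydrogen    12.557728  51.325756  45.819710",
         "atom H2    hydrogen    11.549368  51.835365  42.991081"]
text = qbox_polarizability_input(atoms, 59.205243)
warm = qbox_polarizability_input(atoms, 59.205243, ecut_warmup=40.0)
```

Writing one such file per frame keeps everything in a single Qbox run, so no ground-state restart file has to be stored between steps.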

Note that the sample used in pol2.r is not representative of a well equilibrated water simulation. However, a quick look at the histogram of molecular polarizabilities along x, y and z (computed using changes in the x, y and z coordinates of the MLWF centers) shows a two-peaked distribution, which I believe is what should be expected.

Reading the PRL paper, I wonder if you could recommend a more detailed review paper explaining the ISRS process, in particular how the initial impulses create the anisotropy.
Best,
Francois

Re: Qbox for effective polarizabilities of large systems

Posted: Tue Oct 01, 2024 9:55 pm
by frankhu
Hi Francois,

Thanks for taking the time to run the calculation! That timing estimate is very helpful.

In terms of review articles for the theory behind INXS, I think that the best overview is actually given in the supplemental information of that paper, which walks through the derivation of the theory. Regarding how the initial pulses create the anisotropy, section B of the supplemental information goes into detail about how the Raman interaction is modeled.

The theory behind this experiment is relatively new, so I am not sure there is a better review for the physical process behind this spectroscopy besides the supplemental information and the sources contained within it.

Thanks as always for your help and guidance!