Qbox hangs when running more than 8 mpi processes.

jlow
Posts: 5
Joined: Thu Jan 31, 2013 5:19 pm

Qbox hangs when running more than 8 mpi processes.

Post by jlow »

Qbox Wizards,

I have built Qbox-1.56.2 on a Sandy Bridge cluster with an InfiniBand interconnect. I used the Intel compilers (13.1), MKL (11.0), fftw-2.1.5, mvapich2 (1.9) and xerces-2.8. My makefile is included in the attached zip file.

I was able to run all of the tests provided with the software on more than 8 mpi processes.

However, when I try to run my case on more than 8 mpi processes, Qbox hangs while reading the pseudopotential files. Qbox runs fine on 8 mpi processes.

The input for my case is in the attached zip file. Any help in resolving this issue would be appreciated.

Thanks,

John J. Low
Math and Computer Science
Argonne National Laboratory
Argonne, Illinois
Attachments
blues_icc.zip
The input file that causes Qbox to hang and the makefile used to build Qbox are in this zip file.
(1.17 KiB) Downloaded 616 times
fgygi
Site Admin
Posts: 151
Joined: Tue Jun 17, 2008 7:03 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by fgygi »

It seems that the input file "cristobolite.i" has CRLF line terminators. This is likely to confuse the Qbox line interpreter, which expects Unix ASCII text (although this may just be the result of cutting and pasting on a non-Unix machine).
It also seems that the name of the Na potential file has a typo in it.

After fixing these errors, I was able to run that script with 8 MPI tasks. It uses about 245 MB per task.
I could also run it on 16 tasks on an AMD cluster with Infiniband (4 tasks/node). The input and output files are attached (4 iterations only).

Could you attach the output file up to and including the point where it hangs?
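
For readers who want to check an input file for the CRLF line terminators mentioned above, here is a minimal Python sketch. It is not part of Qbox; the helper names `count_crlf`, `to_unix_newlines` and `fix_file` are hypothetical, and the script works on raw bytes so that no newline translation happens behind the scenes.

```python
from pathlib import Path


def count_crlf(data: bytes) -> int:
    """Return the number of CRLF line terminators in raw file bytes."""
    return data.count(b"\r\n")


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF terminators to LF; lines that already end in LF are unchanged."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: str) -> int:
    """Rewrite a file in place with Unix line endings.

    Returns the number of CRLF terminators that were replaced
    (0 means the file was left untouched).
    """
    p = Path(path)
    raw = p.read_bytes()
    n = count_crlf(raw)
    if n:
        p.write_bytes(to_unix_newlines(raw))
    return n
```

Calling `fix_file("cristobolite.i")` on a copy of the input would report how many terminators were converted. A result of 0 suggests the file on disk was already in Unix format and that any CRLFs were introduced during transfer.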
Attachments
gs1.tgz
Unix gzipped tar file containing input file gs1.i and output file gs1.r
(3.64 KiB) Downloaded 610 times
jlow
Posts: 5
Joined: Thu Jan 31, 2013 5:19 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by jlow »

fgygi,

Something must have gotten corrupted between the server, my Windows desktop and the Qbox list. I don't see any CRLF (carriage return plus line feed) terminators in my input files on the server.

My original input data is essentially the same as yours. I have a comment card in my input which is missing in your input.

I get the same error with your input when I run on more than 8 MPI processes. This case runs on 8 or fewer MPI processes.

I have attached all the files from a test with the input attached in your previous post.

Thanks for your help.

John
Attachments
fgygi_test.tar.gz
This contains the input and output files generated by fgygi's input.
(4.34 KiB) Downloaded 640 times
fgygi
Site Admin
Posts: 151
Joined: Tue Jun 17, 2008 7:03 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by fgygi »

I see that all 16 tasks are running on the same node in your test. What is the memory available on that node? It could be a problem with this calculation since it uses a large plane wave cutoff. However I would expect that this might cause a problem later in the execution, not when defining the species.

It appears that the hang occurs where Qbox uses the Xerces XML parser to read the species file. However, I don't see how this could fail with more than 8 tasks yet work properly with 8 or fewer.

Could you attach the output you get in the case where it works (with 8 tasks)?
jlow
Posts: 5
Joined: Thu Jan 31, 2013 5:19 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by jlow »

Fgygi,

Each node has 16 cores (two eight-core Sandy Bridge processors) and 62 gigabytes of memory.

Are you suggesting I try to run eight cores per node on more than one node?

I have attached the output from a run which completed on 8 cores on one node.

John
Attachments
test.log.gz
gzipped output from a successful run on 8 cores.
(502 Bytes) Downloaded 606 times
fgygi
Site Admin
Posts: 151
Joined: Tue Jun 17, 2008 7:03 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by fgygi »

It seems that the attached file contains the output of the unsuccessful test on 16 cores.

Regarding memory usage, 62 GB is more than enough for this run (by a lot!).
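
As a rough check of that statement, the figures quoted in this thread can be combined in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the per-task footprint measured with 8 tasks (about 245 MB) is an upper bound when 16 tasks share one node, which the thread does not confirm.

```python
# Figures taken from the thread; the combination below is an estimate only.
MB_PER_TASK = 245      # reported memory per task for the 8-task run
TASKS_PER_NODE = 16    # the failing test ran all 16 tasks on one node
NODE_MEMORY_GB = 62    # memory per node reported for the cluster


def estimated_node_usage_gb(mb_per_task: float, tasks: int) -> float:
    """Estimated total footprint on one node, in GB (taking 1 GB = 1024 MB)."""
    return mb_per_task * tasks / 1024


usage_gb = estimated_node_usage_gb(MB_PER_TASK, TASKS_PER_NODE)
fraction = usage_gb / NODE_MEMORY_GB
print(f"estimated usage: {usage_gb:.2f} GB ({fraction:.1%} of node memory)")
```

Under that assumption the estimate comes to a few GB, a small fraction of the node's memory, so running out of memory is an unlikely explanation for the hang.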
jlow
Posts: 5
Joined: Thu Jan 31, 2013 5:19 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by jlow »

Fgygi,

The attached file contains the output from a successful 8 core run.

I did not include the huge "sample" xml file because it takes too long to upload.

John
Attachments
8proc.tar.gz
(4.96 KiB) Downloaded 604 times
fgygi
Site Admin
Posts: 151
Joined: Tue Jun 17, 2008 7:03 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by fgygi »

John,
Thanks. I looked at the output and I can't see anything wrong with it. At this point I can only think that there could be a problem with the way Qbox was compiled. I enclose the makefile for my cluster, where I built Qbox with Intel icc and the MKL libraries, in case it helps identify a problem.

Francois

#-------------------------------------------------------------------------------
#
#  pencil.mk
#
#-------------------------------------------------------------------------------
#
 PLT=x86_64
#-------------------------------------------------------------------------------
 MPIDIR=/usr/mpi/qlogic
 XERCESCDIR=$(HOME)/software/xerces/xerces-c-src_2_8_0
 PLTOBJECTS = readTSC.o

 CXX=icc
 LD=$(CXX)

 PLTFLAGS += -DIA32 -DUSE_FFTW -D_LARGEFILE_SOURCE \
             -D_FILE_OFFSET_BITS=64 -DUSE_MPI -DSCALAPACK -DADD_ \
             -DAPP_NO_THREADS -DXML_USE_NO_THREADS -DUSE_XERCES

 FFTWDIR=$(HOME)/software/fftw/x86_64/fftw-2.1.5/fftw

 INCLUDE = -I$(MPIDIR)/include -I$(FFTWDIR) -I$(XERCESCDIR)/include

 CXXFLAGS=  -g -O3 -vec-report1 -D$(PLT) $(INCLUDE) $(PLTFLAGS) $(DFLAGS)

 LIBPATH = -L$(MPIDIR)/lib64 -L$(FFTWDIR)/.libs -L$(XERCESCDIR)/lib

 LIBS =  $(PLIBS) \
         -lmkl_intel_lp64 \
         -lmkl_lapack95_lp64 -lmkl_sequential -lmkl_core \
         -lirc -lifcore -lsvml \
         -lmpich -lfftw -luuid $(XERCESCDIR)/lib/libxerces-c.a -lpthread

# Parallel libraries
 PLIBS = -lmkl_scalapack_lp64 -lmkl_blacs_lp64

 LDFLAGS = $(LIBPATH) $(LIBS)
#-------------------------------------------------------------------------------
jlow
Posts: 5
Joined: Thu Jan 31, 2013 5:19 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by jlow »

Francois,

I get the same behavior whether I use your .mk file or my makefile. Qbox will run to completion on 8 or fewer processors.

On more than eight processors, Qbox appears to be in an infinite loop while creating the first species and runs (generating no output) until I stop it with a <ctrl-c>.

The energies for this test (on 8 cores) computed on my server differ from yours.

Could you tell me which versions of the Intel compilers and MKL you are using?

I have attached results for my cristobalite test from qbox built with pencil.mk (your makefile).

John
Attachments
log.tar.gz
(3.62 KiB) Downloaded 591 times
fgygi
Site Admin
Posts: 151
Joined: Tue Jun 17, 2008 7:03 pm

Re: Qbox hangs when running more than 8 mpi processes.

Post by fgygi »

John,

The results in file gs1.r were obtained using 16 MPI tasks. The energies differ from your results obtained on 8 MPI tasks because the random initialization of the wave functions depends on the number of tasks, and after only 4 iterations the energy is far from converged. I have rerun the same input on 8 MPI tasks and got the exact same energies as in your run 8proc.log (see attached file gs5.tar). Of course, all energies, when converged to the ground state, are independent of the number of tasks.
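
A comparison like the one described above can be automated. The following Python sketch assumes that each iteration's total energy appears in the Qbox output inside an `<etotal>` element; check your own output file to confirm this before relying on it. The function names and the tolerance are hypothetical choices for illustration.

```python
import re
from typing import List

# Assumption: total energies appear as "<etotal> value </etotal>" in the output.
ETOTAL_RE = re.compile(
    r"<etotal>\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*</etotal>"
)


def etotal_values(output_text: str) -> List[float]:
    """Return every total energy found in the output, in order of appearance."""
    return [float(v) for v in ETOTAL_RE.findall(output_text)]


def same_energies(output_a: str, output_b: str, tol: float = 1e-8) -> bool:
    """True if both outputs list the same number of energies, equal within tol."""
    ea, eb = etotal_values(output_a), etotal_values(output_b)
    return len(ea) == len(eb) and all(abs(x - y) <= tol for x, y in zip(ea, eb))
```

Two runs with the same input and the same number of tasks should pass this check. Runs with different task counts are expected to differ until the calculation is converged, for the reason given above.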

As a side comment, I note that this test uses PBE pseudopotentials but the input file does not specify the xc functional, which is therefore by default LDA. In order to get consistent physical quantities, make sure to add "set xc PBE" to the input file. Conversely, if you want to use LDA, you should use the LDA version of the pseudopotentials, and use the default xc value (LDA).
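
To catch a missing functional setting before a long run, an input script can be scanned for a `set xc` command. This is a minimal sketch, not a Qbox tool: the function name `xc_setting` is hypothetical, and it assumes that `#` starts a comment in the input file and that the last `set xc` command is the one that takes effect.

```python
from typing import Optional


def xc_setting(input_text: str) -> Optional[str]:
    """Return the value of the last `set xc` command, or None if there is none.

    None means the input relies on the default functional, which is LDA
    according to the thread.
    """
    value = None
    for line in input_text.splitlines():
        # Assumption: '#' starts a comment; drop it before parsing the command.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 3 and parts[0] == "set" and parts[1] == "xc":
            value = parts[2]
    return value
```

If `xc_setting` returns `None` for an input that loads PBE pseudopotentials, the run would mix PBE potentials with the LDA default, which is the inconsistency described above.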

I am wondering about the possibility that there is a problem with your MPI setup. Which flavor of MPI do you use? Is there a file defining the nodes on which the program can run (i.e. "machinefile"), and possibly where a maximum number of tasks is defined?
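
One quick way to check for such a limit is to count the task slots a machinefile provides. The sketch below assumes a common hostfile layout, with one `hostname` or `hostname:N` entry per line and `#` for comments. Launchers differ in their formats, so consult the documentation of the MPI installation in use; the function name `total_slots` is hypothetical.

```python
def total_slots(machinefile_text: str) -> int:
    """Sum the task slots listed in a hostfile.

    Assumed format: one entry per line, either "hostname" (one slot)
    or "hostname:N" (N slots). Blank lines and '#' comments are ignored.
    """
    total = 0
    for line in machinefile_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        _host, _, count = entry.partition(":")
        total += int(count) if count else 1
    return total
```

If the result is smaller than the number of MPI tasks requested, the machinefile itself may be what limits or oversubscribes the run.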

I use icc 12.1.3 and MKL 10.3 update 9.

Francois
Attachments
gs5.tar
output using 8 MPI tasks
(17.5 KiB) Downloaded 620 times