
Re: qbox core dumps in MPI on BG/Q

Posted: Sat Feb 23, 2013 5:36 pm
by naromero
Francois,

I submitted the original test case for efix 9. In that test case, communicators were being created but never destroyed. When the communicators were exhausted, there was a hang but no warning message; the efix simply adds the warning message.

Please let me know if the Intrepid data agrees. I can also find out from the MPICH team whether the communicator limit was different on BG/P vs. BG/Q.

Re: qbox core dumps in MPI on BG/Q

Posted: Sat Feb 23, 2013 8:27 pm
by fgygi
I ran a few tests on Vesta using Qbox 1.56.2 on a 512-water problem.
When using a 1024-node partition in c16 mode (16k tasks), the problem of running out of communicators only occurs if nrowmax <= 64. When using nrowmax >= 128, the program runs normally.

With 16k tasks, the low value of nrowmax (default=32) results in a large number of process columns, and a large number of creations and deletions of communicators in the SlaterDet constructor (see my comment in previous post on this topic). This apparently hits the limit of communicators. I will continue to investigate whether communicators are appropriately deleted in Context.C.
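To give a sense of scale (assuming the grid is roughly nrowmax rows by ntasks/nrowmax columns): with 16384 tasks, nrowmax=32 gives a 32x512 process grid, i.e. 512 process columns, whereas nrowmax=256 gives a 256x64 grid with only 64 columns.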

This should not be a problem in actual applications, since small values of nrowmax lead to poor performance in most ScaLAPACK functions. I would recommend nrowmax=256 for this problem.
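That is, in the input file:

set nrowmax 256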

Another solution is to use multiple threads (mode=c8, OMP_NUM_THREADS=2, or mode=c4, OMP_NUM_THREADS=4). It even runs faster :)
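For example, on Vesta a c8 run with two OpenMP threads per task can be submitted with something like the following (Cobalt flags quoted from memory, and the executable and input file names are just placeholders, so adjust as needed):

qsub -n 1024 -t 60 --mode c8 --env OMP_NUM_THREADS=2 ./qb water512.i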

Re: qbox core dumps in MPI on BG/Q

Posted: Sun Feb 24, 2013 9:04 pm
by naromero
Francois,

The problem also occurs on Intrepid; see
/intrepid-fs0/users/naromero/persistent/qbox/H2O-2048/gs_pbe_4096_vn_ZYXT

I wonder if this issue is related to these other bugs:
http://fpmd.ucdavis.edu/bugs/show_bug.cgi?id=9
http://fpmd.ucdavis.edu/bugs/show_bug.cgi?id=12
http://fpmd.ucdavis.edu/bugs/show_bug.cgi?id=14
http://fpmd.ucdavis.edu/bugs/show_bug.cgi?id=19

Let me know if there is any other way that I can assist.

Re: qbox core dumps in MPI on BG/Q

Posted: Sat Mar 02, 2013 1:54 pm
by naromero
Hi Francois,

Just checking in to see if you have had any luck tracking this issue down. It seems that a judicious choice of nrowmax can circumvent the problem, but this may be problematic on BG/Q, since communicators there consume more memory than in a plain vanilla MPI implementation.

Re: qbox core dumps in MPI on BG/Q

Posted: Sun Mar 03, 2013 12:52 am
by fgygi
A few runs using gdb on small problems show that (in all cases tested) Qbox releases every MPI communicator it creates by appropriately calling MPI_Comm_free(). It seems, though, that the MPI implementation I used (mpich 1.2.7p1) does not reuse a freed communicator handle immediately, but cycles through a few values (although it apparently releases the underlying resources). This can be seen by printing the value of the handle during a repeated cycle of MPI_Comm_create/MPI_Comm_free calls. I suppose that this behavior may vary depending on the implementation. If there is a maximum value of the handle, this could lead to the problem we see on BG/Q. I am not sure what else can be done to release the resources of a communicator other than calling MPI_Comm_free. I will try to run the same test on BG/P and BG/Q.
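A stripped-down version of the kind of cycle I looked at is below (this is just a sketch, not the actual Qbox code; the file name and loop counts are arbitrary):

// comm_cycle.cpp: repeatedly create and free a communicator and print the
// value of the handle to see whether the implementation recycles it.
// Build with: mpicxx comm_cycle.cpp -o comm_cycle
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Group world_group;
  MPI_Comm_group(MPI_COMM_WORLD, &world_group);

  for ( int i = 0; i < 100000; i++ )
  {
    MPI_Comm comm;
    // create a communicator over the whole group, then release it
    MPI_Comm_create(MPI_COMM_WORLD, world_group, &comm);
    if ( rank == 0 && i % 10000 == 0 )
      printf("iter %d  handle = %ld\n", i, (long) comm); // handle is an int in mpich
    if ( comm != MPI_COMM_NULL )
      MPI_Comm_free(&comm);
  }

  MPI_Group_free(&world_group);
  MPI_Finalize();
  return 0;
}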

Note also that tracking the allocation of MPI communicators with mpiP is not reliable, since it only records the number of calls to MPI_Comm_create and MPI_Comm_free. However, MPI_Comm_create is often called in a situation where it returns MPI_COMM_NULL (for example, when the calling task is not part of the MPI_Group defining the new communicator). These calls must obviously not be matched by corresponding calls to MPI_Comm_free, since no MPI_Comm was allocated. Therefore, counting the calls to MPI_Comm_create and MPI_Comm_free is not a reliable way to track possible leaks of MPI communicators.
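Schematically, the correct matching rule is the following (subgroup here is just a placeholder for a previously built MPI_Group):

MPI_Comm subcomm;
MPI_Comm_create(MPI_COMM_WORLD, subgroup, &subcomm);
// ... only the tasks belonging to subgroup receive a valid communicator ...
if ( subcomm != MPI_COMM_NULL )
  MPI_Comm_free(&subcomm); // free only if a communicator was actually created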

Re: qbox core dumps in MPI on BG/Q

Posted: Mon Mar 04, 2013 7:34 pm
by naromero
Francois,

Thanks for the analysis. I will pass this information along to the MPICH developers. The version of MPICH used on Blue Gene/Q is MPICH2 1.5. If you can reproduce this problem either on top of ScaLAPACK or, better yet, with pure MPI, it would make it easier for the MPICH developers to create a bug fix.

I tried creating a pure-MPI reduced test case, but did not manage to reproduce the problem, most likely because it did not exercise MPI in the right way.

Re: qbox core dumps in MPI on BG/Q

Posted: Thu Mar 14, 2013 12:53 am
by naromero
Francois,

I briefly mentioned this issue to an MPICH developer and they basically said that the version of MPICH that you mention is too old to draw any conclusion. I have been quite busy lately due to a workshop and other things, but I will get Qbox running on my Ubuntu Linux desktop and try to dig deeper.

Best,
Nichols A. Romero

Re: qbox core dumps in MPI on BG/Q

Posted: Thu Mar 21, 2013 9:42 pm
by naromero
Francois,

I had a meeting with an MPICH developer, and they basically said there are two scenarios that produce the "too many communicators" error:
1. The application is not calling MPI_Comm_free when it should (not on MPI_COMM_NULL, of course)
2. Context ID exhaustion

Looking over the mpiP data, I notice that there were 6 calls to MPI_Comm_split and 6 calls to MPI_Comm_free. Does that sound like the right number of communicators? Looking at the blacs_gridmap call: if MPI_Comm_create returns MPI_COMM_NULL, then blacs_gridmap returns immediately. Otherwise, it calls MPI_Comm_dup once and MPI_Comm_split twice (see the sketch below).
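Schematically, the call pattern as I read it is the following (this is only my paraphrase, not the actual BLACS source; the variable names are made up):

MPI_Comm grid_comm, blacs_comm, row_comm, col_comm;
MPI_Comm_create(parent_comm, grid_group, &grid_comm);
if ( grid_comm == MPI_COMM_NULL )
  return;                                             // task is not part of the grid
MPI_Comm_dup(grid_comm, &blacs_comm);                 // communicator for the whole grid
MPI_Comm_split(blacs_comm, myrow, mycol, &row_comm);  // one communicator per process row
MPI_Comm_split(blacs_comm, mycol, myrow, &col_comm);  // one communicator per process column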

So right now, it looks like the issue is not in Qbox, but in MPICH. We will need to run a bunch more tests to debug it, but I will keep you posted.

Re: qbox core dumps in MPI on BG/Q

Posted: Fri Mar 29, 2013 6:14 pm
by fgygi
In a possibly related issue, it was found that there is a bug in mvapich2 and that it causes BLACS to fail in one of its test programs.
See this related topic.

Re: qbox core dumps in MPI on BG/Q

Posted: Thu Apr 04, 2013 8:33 pm
by naromero
Thanks for the info. It may well be related; unfortunately, they don't post more details. I will attempt to find out more. In the meantime, an MPICH developer is working on a tool that will help us debug this further. It is similar to mpiP, except that it will be able to distinguish between calls that return MPI_COMM_NULL and those that return real communicators.
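Roughly, I expect it to work like a PMPI wrapper along these lines (my own sketch of the idea, not the actual tool):

// commcount.cpp: intercept MPI_Comm_create and MPI_Comm_free via the PMPI
// profiling interface, counting calls that return MPI_COMM_NULL separately
// from calls that return a real communicator.
#include <mpi.h>
#include <cstdio>

static long n_null = 0; // MPI_Comm_create calls that returned MPI_COMM_NULL
static long n_real = 0; // MPI_Comm_create calls that returned a real communicator
static long n_free = 0; // calls to MPI_Comm_free

extern "C" int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm* newcomm)
{
  int rc = PMPI_Comm_create(comm, group, newcomm);
  if ( *newcomm == MPI_COMM_NULL ) n_null++; else n_real++;
  return rc;
}

extern "C" int MPI_Comm_free(MPI_Comm* comm)
{
  n_free++;
  return PMPI_Comm_free(comm);
}

extern "C" int MPI_Finalize(void)
{
  int rank;
  PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if ( rank == 0 )
    printf("MPI_Comm_create: %ld real, %ld null; MPI_Comm_free: %ld\n",
           n_real, n_null, n_free);
  return PMPI_Finalize();
}

A real tool would of course also have to wrap MPI_Comm_split and MPI_Comm_dup, but the counters above already show the idea of separating the NULL returns from the real ones.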

Thanks,