
Inefficient use of FFTW

Posted: Mon Oct 06, 2014 8:28 am
by espetrov
Dear Qbox Developers,

Qbox seems to use the FFTW2/FFTW3 interfaces inefficiently.
For example, the loop at line 884 of FourierTransform.C (Qbox 1.60.4) could be replaced by the following single, more efficient call.
How do I submit a patch for review?

Thank you.
Evgueni.


      fftw_threads(nthreads, bwplan1,np0_,(FFTW_COMPLEX*)&val[ibase],np0_,one,
                     (FFTW_COMPLEX*)0,0,0);
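The call above relies on FFTW2's batched convention: with `howmany`, `istride` and `idist`, transform t reads the elements at `in[t*idist + j*istride]`, so `stride = np0_` and `dist = 1` walk the columns of a row-major array. The sketch below illustrates only that addressing, under stated assumptions: it uses a naive O(n^2) DFT in place of FFTW, a square n x n array for simplicity (in Qbox the transform length is fixed by the plan `bwplan1`, which the post does not show), and hypothetical helper names (`dft_strided`, `dft_many`, `columns_loop`, `columns_batched`). It is not Qbox or FFTW code.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Naive O(n^2) unnormalized DFT with a +i exponent (the sign FFTW uses for
// backward transforms), applied in place to the n elements data[j*stride].
void dft_strided(cplx* data, int n, int stride) {
  const double pi = std::acos(-1.0);
  std::vector<cplx> out(n);
  for (int k = 0; k < n; ++k) {
    cplx sum(0.0, 0.0);
    for (int j = 0; j < n; ++j)
      sum += data[j * stride] * std::polar(1.0, 2.0 * pi * j * k / n);
    out[k] = sum;
  }
  for (int k = 0; k < n; ++k) data[k * stride] = out[k];
}

// Hypothetical batched interface following the FFTW2 (howmany, stride, dist)
// convention: transform t acts on data[t*dist + j*stride], j = 0..n-1.
void dft_many(cplx* data, int n, int howmany, int stride, int dist) {
  for (int t = 0; t < howmany; ++t)
    dft_strided(data + t * dist, n, stride);
}

// Loop form: one call per column of an n x n row-major array.
void columns_loop(cplx* a, int n) {
  for (int col = 0; col < n; ++col)
    dft_strided(a + col, n, n);
}

// Batched form: the same column transforms as a single call with
// howmany = n, stride = n, dist = 1, mirroring the fftw_threads call above.
void columns_batched(cplx* a, int n) {
  dft_many(a, n, n, n, 1);
}
```

In this toy version both forms do identical arithmetic. With a real FFT library the gain presumably comes from handing the whole batch to the library in one call, so that it can organize the strided accesses (and any internal copies to contiguous buffers) across all transforms rather than once per call.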

Re: Inefficient use of FFTW

Posted: Tue Oct 07, 2014 9:16 pm
by fgygi
Hi Evgueni,
Thanks for your post. Could you post some timing information (e.g., from the examples in the test directory) showing the change in performance?
Thanks.
Francois

Re: Inefficient use of FFTW

Posted: Mon Oct 27, 2014 9:27 am
by espetrov
Hi Francois,

According to our runs, Qbox linked to Intel MKL spends 12-13% of its CPU cycles in the FFT copy routines.
Replacing the loop with the single call is expected to halve the number of CPU cycles spent in these copy routines.
The expected overall speedup is about 5% when Qbox is linked to Intel MKL.
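A rough consistency check of these figures via Amdahl's law, assuming CPU cycles map directly to run time and nothing outside the copy routines changes: if a fraction f of the run time is spent in the copy routines and that part is halved, then

```latex
S = \frac{T_{\mathrm{old}}}{T_{\mathrm{new}}}
  = \frac{1}{(1 - f) + f/2}
  = \frac{1}{1 - f/2},
\qquad
0.12 \le f \le 0.13 \;\Rightarrow\; 1.064 \lesssim S \lesssim 1.070
```

This idealized bound corresponds to a run-time saving of about 6%; the figure of about 5% quoted above is slightly more conservative, which is plausible since not every cycle saved in the copy routines need translate into wall-clock time.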

Thank you.
Evgueni.