Scalable pseudo-spectral solvers

A new method [1] for solving time-dependent problems proposed to apply the Cartesian domain decomposition technique currently used with low-order FDTD solvers to very high-order pseudo-spectral solvers. As in the case of low-order schemes, the simulation domain is divided into several subdomains (see Fig. 1 left) and Maxwell's equations are solved locally on each subdomain, either using local convolutions or local FFTs when the order p is very large.

[Figure: pseudo_spectral_solver]
Fig. 1 – Parallelization strategies of current Maxwell solvers. (Left) Domain decomposition technique. The sketch represents a 2D layout of the Cartesian domain decomposition currently used in most low-order FDTD Maxwell solvers. The whole simulation domain is split into multiple subdomains (red areas, one per CPU) surrounded by guard regions into which copies of the fields from adjacent subdomains are stored (white regions). Fields are advanced independently using Maxwell's equations on each subdomain. Fields from adjacent subdomains (areas delimited by black dashed lines) are then copied into the guard regions to ensure field continuity between subdomains (black arrows). The black curves at subdomain boundaries represent the stencil used to compute the electromagnetic fields. For low-order solvers, for which p/2 < nguards, the stencil is not truncated at the boundaries. However, for high-order and pseudo-spectral solvers, for which p/2 >> nguards, the stencil is truncated at subdomain boundaries (red cross). (Right) Sketch of the parallelization strategy used in current pseudo-spectral solvers. The domain is also split into multiple subdomains (one per CPU), but the fields are not advanced independently on each subdomain. Instead, a global FFT is performed on the entire computational domain, and each subdomain needs to exchange all its data with every other subdomain to perform the global FFT. This "all-to-all" operation is computationally much more expensive than the simple Cartesian domain decomposition technique and has prevented current pseudo-spectral solvers from scaling to hundreds of thousands to millions of cores.
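
To make the decomposition concrete, here is a minimal, hypothetical 1D sketch of the idea (it is not the WARP/PXR implementation; all names and parameters are made up, and a scalar advection equation stands in for Maxwell's equations, since the argument only relies on signals propagating at the finite speed c). Each subdomain advances its field with a local FFT over its owned cells plus guard cells, and the guard cells are then refreshed from the neighbours:

```python
import numpy as np

c, dt, dx = 1.0, 0.005, 0.01              # illustrative: c*dt = half a cell per step
n_sub, n_own, nguards = 4, 256, 12        # 4 subdomains, 12 guard cells per side
n_tot = n_sub * n_own
x = np.arange(n_tot) * dx

def split_with_guards(u):
    """Pad each subdomain with guard cells taken from its (periodic) neighbours."""
    idx = np.arange(-nguards, n_own + nguards)
    return [u.take(idx + s * n_own, mode='wrap') for s in range(n_sub)]

def local_fft_step(u_loc):
    """One advection step computed with a *local* FFT on the padded subdomain."""
    k = 2.0 * np.pi * np.fft.fftfreq(u_loc.size, d=dx)
    return np.fft.ifft(np.exp(-1j * k * c * dt) * np.fft.fft(u_loc)).real

# Gaussian pulse centred inside subdomain 1, well away from its boundaries,
# so that the periodic local FFTs are benign (point 1 below covers the
# general case, where fields reach the boundaries).
u = np.exp(-((x - x[384]) / (15 * dx)) ** 2)
u0 = u.copy()

for _ in range(10):
    subs = [local_fft_step(s) for s in split_with_guards(u)]   # independent advance
    u = np.concatenate([s[nguards:-nguards] for s in subs])    # keep owned cells
    # split_with_guards() in the next iteration plays the role of the MPI exchange.

# Reference: the same 10 steps with a single global FFT (no decomposition).
k = 2.0 * np.pi * np.fft.fftfreq(n_tot, d=dx)
u_ref = np.fft.ifft(np.exp(-1j * k * c * 10 * dt) * np.fft.fft(u0)).real

print("max |decomposed - global FFT| =", np.abs(u - u_ref).max())  # ~1e-15
```

In this benign configuration the decomposed and global solutions agree to roundoff, because the spurious signal created at subdomain edges never reaches the owned cells between two guard exchanges.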

However, two challenges still had to be addressed before this technique could be used in production simulations:

  1. First, this technique induces small truncation errors that can affect simulation results and that needed to be characterized. The fundamental argument legitimating this decomposition method is that physical information cannot travel faster than the speed of light. Choosing large enough guard regions should therefore ensure that spurious signals coming from stencil truncations at subdomain boundaries remain in the guard regions and do not enter the simulation domain (see Fig. 1 left) after one time step. Truncation errors were recently characterized extensively, and an analytical model [2] was developed with which the amount of truncation error in a simulation can now be predicted as a function of the numerical parameters used (solver order, guard region size, mesh size). In particular, this model shows that one should use very high finite-order-p solvers instead of infinite-order solvers: truncation errors with finite very high-order solvers decrease much faster with the number of guard cells, while such solvers still bring pseudo-spectral accuracy on almost the whole range of frequencies accessible in the simulation (see the stencil-coefficient sketch after this list). In practice, the truncation error model shows that ten to twelve guard cells are sufficient to lower truncation errors below machine precision with Maxwell solvers of order up to 128, for which pseudo-spectral precision is obtained on almost the whole frequency domain. This model will be especially crucial to determine the number of guard cells to use for a given solver order so as to minimize truncation errors in production simulations.
  2. Then, this technique needs an efficient shared-memory implementation of the PIC algorithm. Even though the number of guard cells required (nguards = 10) seems low, it can still affect the scaling of the PIC code to a large number of processors. One solution is therefore to enable good shared-memory parallelization of the PIC loop so as to decrease the total number of MPI processes (and thus the volume of MPI exchanges) and improve the scaling of the pseudo-spectral code to a large number of cores. Shared-memory implementations of the particle routines and standard solvers were addressed in the previous section of this chapter. For FFTs, we use the FFTW library, which provides a rather good threaded implementation (see the threaded-FFT sketch after this list). On Xeon Phi architectures, one might also consider using the Intel MKL library, which has been designed to fully exploit the hardware features of many-core architectures.
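
To illustrate the argument of point 1 (this is a toy illustration of why finite very high order wins, not the quantitative model of [2]), one can compare how fast the first-derivative stencil coefficients decay with the distance j (in cells) from the point where the derivative is evaluated. In the infinite-order (purely spectral) limit the coefficients decay only as 1/j, whereas for a finite order p = 2m they drop extremely fast beyond a few cells, so far less stencil weight is truncated at the edge of the guard region. The closed-form coefficients below are the standard centred (unstaggered) ones; actual FDTD Maxwell solvers use staggered stencils, but the same qualitative contrast holds:

```python
from fractions import Fraction
from math import factorial

def centered_coeff(m, j):
    """Order p = 2m centred first-derivative coefficient a_j (Fornberg closed form):
       f'(x) ~ (1/dx) * sum_j a_j * [f(x + j*dx) - f(x - j*dx)],  j = 1..m,
       with a_j = (-1)**(j+1) * (m!)**2 / (j * (m-j)! * (m+j)!)."""
    return Fraction((-1) ** (j + 1) * factorial(m) ** 2,
                    j * factorial(m - j) * factorial(m + j))

m = 64                                   # order p = 128 solver
print(" j   |a_j|, order 128   |a_j|, infinite order")
for j in (4, 8, 12, 16, 24, 32):
    finite = abs(float(centered_coeff(m, j)))
    spectral = 1.0 / j                   # m -> infinity limit of |a_j|
    print(f"{j:2d}   {finite:.2e}           {spectral:.2e}")
```

Running this shows the order-128 coefficients falling many orders of magnitude below the spectral ones beyond a dozen cells, which is the mechanism behind the fast decay of truncation errors with nguards; the quantitative machine-precision predictions are those of the model in [2].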

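For point 2, the snippet below sketches what a threaded local FFT could look like. It uses pyFFTW (Python bindings to the FFTW library mentioned above) purely for illustration; the production code drives FFTW from compiled code, and the array shape, thread count, and the k-space placeholder are all made up:

```python
import multiprocessing
import numpy as np
import pyfftw  # Python bindings to FFTW

nthreads = multiprocessing.cpu_count()

# Plan once on aligned arrays (FFTW_MEASURE), then execute many times per step.
field = pyfftw.empty_aligned((512, 512), dtype='complex128')
fft2d = pyfftw.builders.fft2(field, threads=nthreads,
                             planner_effort='FFTW_MEASURE')
ifft2d = pyfftw.builders.ifft2(field, threads=nthreads,
                               planner_effort='FFTW_MEASURE')

field[:] = np.random.rand(512, 512)   # stand-in for one subdomain's fields
spectrum = fft2d()                    # threaded forward FFT
spectrum *= 1.0                       # <- k-space field advance would go here
field[:] = ifft2d(spectrum)           # threaded inverse FFT (normalised)
```

Trading MPI processes for threads in this way reduces both the number of guard-region exchanges and the memory duplicated in guard cells on each node.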
Thanks to this recent work, an efficient parallel implementation of pseudo-spectral solvers is now available in WARP/PXR. It was recently tested at large scale on the MIRA cluster [3] at Argonne National Laboratory and benchmarked against standard solvers for relativistic laser-plasma mirror interactions. Scaling tests on MIRA are presented in the performance section and show efficient scaling of pseudo-spectral solvers on up to 800,000 cores. Benchmarks against standard solvers show significant improvements in terms of time-to-solution and memory footprint for a given accuracy [4].

References

[1] – Jean-Luc Vay, Irving Haber, and Brendan B. Godfrey. A domain decomposition method for pseudo-spectral electromagnetic simulations of plasmas. Journal of Computational Physics, 243:260-268, June 2013.

[2] – H. Vincenti and J.-L. Vay. Detailed analysis of the effects of stencil spatial variations with arbitrary high-order finite-difference Maxwell solver. Computer Physics Communications, 200:147-167, March 2016.

[3] – Mira team. Mira supercomputer. https://www.alcf.anl.gov/mira. Accessed: 2017-02-07.

[4] – G. Blaclard, H. Vincenti, R. Lehe, and J.-L. Vay. Pseudospectral Maxwell solvers for an accurate modeling of Doppler harmonic generation on plasma mirrors with particle-in-cell codes. arXiv preprint arXiv:1608.05739, 2016.
