Multi-node performance on MIRA

As part of the Director’s Discretionary Allocation obtained on June 2, 2016, a series of scaling tests has been performed on the Mira supercomputer at Argonne. Mira is a 10-petaflops IBM Blue Gene/Q system [1] composed of 49,152 nodes, each with 16 PowerPC A2 cores running at 1600 MHz, interconnected by a 5D torus network.

The case of a homogeneous thermalized plasma with a thermal velocity of 0.1c is considered again, with a domain of 1024 × 1024 × 3072 grid cells and 20 macro-particles per cell. These dimensions allow the grid cells to be divided equally among the MPI domains in all the tests.
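The arithmetic behind the equal split can be checked with a short script. This is a minimal sketch: the process grid (64, 64, 192), which gives one MPI process per core on all 49,152 nodes, is an illustrative assumption and not necessarily the decomposition used in the PICSAR runs. The helper names `local_domain` and `summarize` are hypothetical.

```python
# Sketch: check that a hypothetical MPI process grid divides the
# 1024 x 1024 x 3072 domain evenly, and count cells and macro-particles.
# The process grid used below is an assumption for illustration only.

GLOBAL_CELLS = (1024, 1024, 3072)   # grid cells along x, y, z
PARTICLES_PER_CELL = 20             # macro-particles per cell


def local_domain(global_cells, proc_grid):
    """Return the per-process subdomain size, or raise if the split is uneven."""
    local = []
    for n, p in zip(global_cells, proc_grid):
        if n % p != 0:
            raise ValueError(f"{n} cells cannot be split evenly over {p} processes")
        local.append(n // p)
    return tuple(local)


def summarize(proc_grid):
    """Process count, per-process block, and global cell/particle totals."""
    nprocs = proc_grid[0] * proc_grid[1] * proc_grid[2]
    local = local_domain(GLOBAL_CELLS, proc_grid)
    total_cells = GLOBAL_CELLS[0] * GLOBAL_CELLS[1] * GLOBAL_CELLS[2]
    return {
        "processes": nprocs,
        "local_cells": local,
        "total_cells": total_cells,
        "total_particles": total_cells * PARTICLES_PER_CELL,
    }


if __name__ == "__main__":
    # 64 * 64 * 192 = 786,432 processes (assumed split, one per core)
    print(summarize((64, 64, 192)))
```

With this assumed split each process owns a 16 × 16 × 16 block, and the whole run holds about 3.2 billion cells and 6.4 × 10^10 macro-particles. A grid such as (64, 64, 100) raises `ValueError`, since 3072 is not divisible by 100.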

First, a strong scaling test is performed with MPI only (one OpenMP thread per MPI process), with the number of cores ranging from 20,000 to 800,000. Two Maxwell solvers are compared: the finite-difference time-domain (FDTD) solver with a stencil of order 2 and 2 guard cells per MPI domain, and the Pseudo-Spectral Analytic Time-Domain (PSATD) solver with a pseudo-stencil of order 128 and 12 guard cells per MPI domain. The results are presented in Fig. 1. The low-order FDTD solver (circle markers) scales very well up to the full machine, with an efficiency of 98% on approximately 800,000 cores. The high-order PSATD solver (square markers) also scales well, with an efficiency of 83% on half of the MIRA machine. The lower efficiency of the high-order PSATD solver compared with the low-order FDTD solver is mainly due to the larger number of guard cells that must be exchanged between MPI domains.
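To give a feel for why 12 guard cells cost so much more than 2, the sketch below compares the number of guard cells to the number of owned (interior) cells for a cubic subdomain. It assumes a 16 × 16 × 16 block per process and a guard layer on every face; both are illustrative assumptions, and the ratio is only a rough proxy for halo-exchange volume, not a model of actual communication time. The function name `guard_cell_ratio` is hypothetical.

```python
# Sketch: guard cells per interior cell for a cubic subdomain, as a rough
# proxy for the data exchanged between MPI domains at each step.
# Cubic blocks and guard layers on all faces are assumptions.

def guard_cell_ratio(local_n, n_guard):
    """Guard cells per interior cell for an n^3 block with n_guard layers per side."""
    interior = local_n ** 3
    with_guards = (local_n + 2 * n_guard) ** 3
    return (with_guards - interior) / interior


if __name__ == "__main__":
    local_n = 16  # assumed 16^3 cells owned by each process
    for n_guard, label in [(2, "FDTD, order 2"), (12, "PSATD, order 128")]:
        ratio = guard_cell_ratio(local_n, n_guard)
        print(f"{label:18s} {n_guard:2d} guard cells -> ratio {ratio:.2f}")
```

Under these assumptions the order-2 FDTD case carries roughly 0.95 guard cells per owned cell, while the 12-guard-cell PSATD case carries about 14.6, which is consistent with the communication overhead growing markedly with the stencil extent.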

Fig. 1 – MPI strong scaling tests of PICSAR on MIRA. The circle markers correspond to scaling data with the order-2 FDTD Maxwell solver and 2 guard cells. The square markers correspond to the PSATD spectral solver at order 128 with 12 guard cells, as used in plasma harmonic simulations.
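The efficiencies quoted for Fig. 1 are strong-scaling efficiencies, i.e. the global problem size is fixed while the core count grows. A common definition, assumed here rather than taken from PICSAR, compares the core-seconds of a run with those of a reference run. The timings in the usage example are placeholders, not measured values, and `strong_scaling_efficiency` is a hypothetical helper.

```python
# Sketch: strong-scaling efficiency relative to a reference run at fixed
# global problem size. Timings in the example are placeholders.

def strong_scaling_efficiency(n_ref, t_ref, n, t):
    """Efficiency of a run on n cores relative to a reference on n_ref cores.

    Ideal strong scaling gives t = t_ref * n_ref / n, i.e. an efficiency of 1.
    """
    return (t_ref * n_ref) / (t * n)


if __name__ == "__main__":
    # Hypothetical: 32x more cores, time per step drops from 32.0 s to 1.25 s
    print(strong_scaling_efficiency(24576, 32.0, 786432, 1.25))
```

Here a 32-fold increase in cores yields only a 25.6-fold reduction in time, which corresponds to an efficiency of 0.8.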

Strong OpenMP scaling tests of the particle and field routines were then performed with a fixed number of 49,152 MPI processes. They show that a speedup of 11 is achieved with 16 OpenMP threads per MPI process, i.e. a parallel efficiency of about 69% (11/16).
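The measured OpenMP speedup can be turned into a parallel efficiency and, if one assumes Amdahl's law, into an effective serial fraction. This is a minimal sketch. Attributing the entire shortfall to serial work is a simplification, because memory bandwidth, load imbalance and threading overheads also limit thread scaling. The function names are hypothetical.

```python
# Sketch: parallel efficiency and Amdahl's-law serial fraction from a
# measured speedup. Amdahl's law is an assumed model, not a PICSAR result.

def parallel_efficiency(speedup, threads):
    """Fraction of the ideal speedup actually obtained."""
    return speedup / threads


def amdahl_serial_fraction(speedup, threads):
    """Solve S = 1 / (f + (1 - f) / p) for the serial fraction f."""
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


if __name__ == "__main__":
    s, p = 11.0, 16  # speedup reported above with 16 OpenMP threads
    print(f"efficiency      = {parallel_efficiency(s, p):.3f}")
    print(f"serial fraction = {amdahl_serial_fraction(s, p):.3f}")
```

For a speedup of 11 on 16 threads, the efficiency is 0.6875, and the Amdahl model gives f = 1/33, about 3%. Under that model, even a small non-threaded fraction would noticeably limit the gain from adding threads.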

Fig. 2 – OpenMP scaling of particle and field routines on MIRA (fixed number of MPI processes). The solid line corresponds to the ideal speedup and the markers to the measured speedup of the routines.


[1] – Mira team. Mira super-computer. Accessed:

Henri Vincenti, Mathieu Lobet. Last update: February 6, 2017
