This section presents how to use PICSAR with the Intel® Software Development Emulator (SDE). Intel SDE is a platform that emulates AVX512 architectures (available on Knights Landing) and future instruction sets. You can refer to the Intel SDE presentation page for a detailed software description and user instructions. SDE can be run on any platform, including Ivy Bridge and Haswell, and provides useful reports on how the code behaves on new architectures. PICSAR currently uses Intel SDE mainly to determine the number of flops performed by the code or by specific subroutines.
To use SDE with your application, use the sde command followed by its arguments and your application name at the end:
sde -arch arguments -- ./application
-arch selects the processor architecture that SDE emulates. For instance, -hsw has to be used for Haswell and -knl for KNL.
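As an illustration of the command layout described above, here is a minimal sketch that builds the SDE command line from an architecture name. The functions sde_arch_flag and sde_command are hypothetical helpers, not part of SDE or PICSAR, and only the two flags quoted in the text (-hsw and -knl) are handled. The command is printed rather than executed, so the sketch can be tried on a machine without SDE.

```shell
#!/bin/sh
# Hypothetical helper (not part of PICSAR or SDE): map a short architecture
# name to the SDE flag quoted in the text.
sde_arch_flag() {
    case "$1" in
        haswell|hsw) echo "-hsw" ;;
        knl)         echo "-knl" ;;
        *)           echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the SDE command line for an architecture and a binary.
# Other SDE arguments would go between the flag and the "--" separator.
sde_command() {
    flag=$(sde_arch_flag "$1") || return 1
    echo "sde $flag -- $2"
}

sde_command knl ./application
```

Printing the command first makes it easy to check the argument order (SDE arguments, then --, then the application) before submitting a job.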
This is a list of references for SDE:
- NERSC: This page contains a section on how to use SDE with Edison.
Compiling PICSAR for SDE
First of all, you have to compile PICSAR with the right flags for SDE. We recommend creating a new PICSAR installation in your SCRATCH directory. We also recommend using the Intel compiler and the corresponding libraries for KNL:
module unload craype-haswell
module load craype-mic-knl
module unload PrgEnv-gnu
module load PrgEnv-intel
To set up your environment quickly, you can create a file to source, for instance with source ~/.knl_sde_config. This file can look like this:
if [ "$NERSC_HOST" == "cori" ]
then
    # Modules
    module unload craype-haswell
    module unload PrgEnv-gnu
    module load PrgEnv-intel
    module load craype-mic-knl

    # Required at execution
    module load SDE

    # Path to PICSAR compiled with Intel for KNL and Advisor
    PICSAR=$SCRATCH/Codes/install_intel_knl_sde/picsar/

    # PICSAR paths
    export PYTHONPATH="$PICSAR/python_bin/:$PYTHONPATH"
    export PYTHONPATH="$PICSAR/example_scripts_python/:$PYTHONPATH"
    export PYTHONPATH="$PICSAR/python_libs/:$PYTHONPATH"
    export PYTHONPATH="$PICSAR/postproc_python_script/:$PYTHONPATH"
fi
In this script, the PICSAR variable holds the installation path. Change it to match your installation directory. Instead of sourcing this file, you can also set up your environment manually.
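Whether the environment is sourced or set up by hand, a quick sanity check before compiling or running can save a failed job. The following is a minimal sketch under stated assumptions: check_sde_env is a hypothetical helper (not shipped with PICSAR), it assumes that loading the SDE module puts an executable named sde on the PATH, and it takes the PICSAR installation directory as its only argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of PICSAR): report what is missing from the
# environment before running SDE. Argument: the PICSAR installation directory.
check_sde_env() {
    rc=0
    # Assumption: "module load SDE" makes an executable named "sde" available.
    if ! command -v sde >/dev/null 2>&1; then
        echo "sde not found in PATH" >&2
        rc=1
    fi
    # The installation directory must be set and must exist.
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        echo "PICSAR directory not found: '$1'" >&2
        rc=1
    fi
    return "$rc"
}

# Usage: PICSAR is the variable set in the sourced configuration file.
if check_sde_env "${PICSAR:-}"; then
    echo "environment looks ready for SDE"
else
    echo "environment is incomplete"
fi
```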
You can easily compile PICSAR for SDE using our Makefile:
make SYS=cori2 MODE=sde
The compilation will generate a binary file called
The compilation is made with the following flags:
-D SDE=1: this flag tells PICSAR to activate specific subroutines for SDE. SDE uses markers that can be put in the code to profile specific sections. PICSAR has subroutines with these markers to profile, for instance, only the kernel without initialization (purpose of
-g: Produces symbolic debug information in the object file (required).
-O3 -xMIC-AVX512 -qopenmp: enable OpenMP, optimization and vectorization on KNL
-debug inline-debug-info: more debug information
-align array64byte: data alignment for better vectorization efficiency.
In this section, we have presented how to compile PICSAR for SDE on KNL. Running on KNL is not the most efficient option, particularly for big runs: SDE will run faster on Ivy Bridge or Haswell. In this case, you can compile for Haswell by doing:
make SYS=cori1 MODE=sde
Or for Edison:
make SYS=edison MODE=sde
Running PICSAR with SDE
In the following script, PICSAR is run with SDE on a single KNL node with 4 MPI ranks and 32 OpenMP threads per MPI rank.
#!/bin/bash -l
#SBATCH --job-name=sde_analysis
#SBATCH --time=01:00:00
#SBATCH -N 1
##SBATCH -S 4
#SBATCH -p knl
#SBATCH -C knl,quad,cache
#SBATCH -e sde_analysis.err
#SBATCH -o sde_analysis.out

export OMP_NUM_THREADS=32
export OMP_STACKSIZE=128M
export OMP_DISPLAY_ENV=true
export OMP_PROC_BIND=spread
export OMP_PLACES=cores"(16)"
export OMP_SCHEDULE=dynamic

source ~/.knl_sde_config

cp $PICSAR/fortran_bin/picsar_cori2_sde .

# Add the following "sbcast" line here for jobs larger than 1500 MPI tasks:
# sbcast ./mycode.exe /tmp/mycode.exe

numactl -H

srun -n 4 -c 64 sde -knl -d -iform 1 -omix my_mix.out -i -global_region -start_ssc_mark 111:repeat -stop_ssc_mark 222:repeat -- ./picsar_cori2_sde
Here the following arguments are used:
-start_ssc_mark 111:repeat: this argument specifies that when the code reaches marker 111, the profiling should start.
-stop_ssc_mark 222:repeat: this argument specifies that when the code reaches marker 222, the profiling should stop.
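The rank and thread counts in the batch script are linked: the -c 64 and cores"(16)" values follow from sharing the node among 4 MPI ranks. The sketch below reproduces that arithmetic so that the srun line can be adapted to another number of ranks. It assumes that the job uses 64 cores of the KNL node with 4 hardware threads per core (these two numbers are assumptions, not stated in the text), and the functions cpus_per_rank, cores_per_rank and sde_srun_command are hypothetical helpers. The command is printed, not executed.

```shell
#!/bin/sh
# Assumptions (not stated in the text): the job uses 64 cores of the KNL node
# and each core exposes 4 hardware threads, i.e. 256 logical CPUs in total.
KNL_CORES_USED=64
KNL_HW_THREADS_PER_CORE=4

# Physical cores per MPI rank: the count used in OMP_PLACES=cores"(N)".
cores_per_rank() {
    echo $(( $KNL_CORES_USED / $1 ))
}

# Logical CPUs per MPI rank: the value passed to "srun -c".
cpus_per_rank() {
    echo $(( $KNL_CORES_USED * $KNL_HW_THREADS_PER_CORE / $1 ))
}

# Print (do not run) the srun line for a number of ranks and a binary,
# reusing the SDE arguments of the batch script unchanged.
sde_srun_command() {
    echo "srun -n $1 -c $(cpus_per_rank "$1") sde -knl -d -iform 1 -omix my_mix.out -i -global_region -start_ssc_mark 111:repeat -stop_ssc_mark 222:repeat -- $2"
}

echo "cores per rank: $(cores_per_rank 4)"
sde_srun_command 4 ./picsar_cori2_sde
```

With 4 ranks this gives 16 cores and 64 logical CPUs per rank, which matches the values used in the batch script.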
Mathieu Lobet, last update: January 26, 2017