Performance tests

Please put the results of your performance tests here.

Let's use a common format for reporting performance tests:

Give a short description of what you are trying to benchmark and the result
List any configuration files used
State the allocation on Cori
List the command(s) used
List relevant environment variables
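
One way to keep entries uniform is to fill in the same fields every time. The following is only a minimal sketch, not a tool that exists in the repository: the class `BenchmarkReport`, its field names and the placeholder command string are hypothetical and simply mirror the checklist above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BenchmarkReport:
    """Hypothetical container with one field per checklist item."""
    description: str
    result: str
    config_files: List[str] = field(default_factory=list)
    allocation: str = ""
    commands: List[str] = field(default_factory=list)
    environment: Dict[str, str] = field(default_factory=dict)

    def to_text(self) -> str:
        """Render the report as plain text, one item per line."""
        lines = [
            f"Description: {self.description}",
            f"Result: {self.result}",
            "Configuration files: " + ", ".join(self.config_files),
            f"Allocation: {self.allocation}",
        ]
        lines += [f"Command: {cmd}" for cmd in self.commands]
        lines += [f"{name}={value}" for name, value in self.environment.items()]
        return "\n".join(lines)


report = BenchmarkReport(
    description="Analysis speed on Haswell, 20 time-steps",
    result="wall-time 254 s",
    config_files=["configs/config_all.json"],
    allocation="6 Haswell nodes in interactive queue",
    commands=["<srun command used for the run>"],  # placeholder, fill in
    environment={"OMP_NUM_THREADS": "32"},
)
print(report.to_text())
```

Pasting the printed text into this page would give every entry the same order of fields.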

RK: Analysis speed on Haswell

I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and the mongodb backend. I performed two runs:

20 time-steps: wall-time 254 s (about 4 minutes)
500 time-steps: wall-time 4821 s (about 80 minutes)
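
To compare the two runs, it helps to convert the wall-times into a cost per time-step. The sketch below does that arithmetic and, as an additional assumption that was not measured, fits a linear model `wall = overhead + rate * steps` through the two points. With only two data points the fit is exact by construction, so the overhead and rate are rough estimates at best.

```python
# (number of time-steps, wall-time in seconds) for the two Haswell runs
runs = [(20, 254.0), (500, 4821.0)]


def seconds_per_step(n_steps, wall_s):
    """Average wall-time per time-step, startup cost included."""
    return wall_s / n_steps


def linear_fit(run_a, run_b):
    """Assumed model wall = overhead + rate * steps, solved from two runs."""
    (n_a, t_a), (n_b, t_b) = run_a, run_b
    rate = (t_b - t_a) / (n_b - n_a)
    overhead = t_a - rate * n_a
    return overhead, rate


for n_steps, wall_s in runs:
    print(f"{n_steps} steps: {seconds_per_step(n_steps, wall_s):.2f} s/step")

overhead, rate = linear_fit(runs[0], runs[1])
print(f"assumed linear model: overhead {overhead:.1f} s, rate {rate:.2f} s/step")
```

The averages come out at 12.70 s/step for the short run and 9.64 s/step for the long run. Under the linear assumption this would correspond to roughly 64 s of fixed overhead and about 9.5 s per time-step.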

Configuration file:

{"datapath":  "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
 "shotnr": 18431,
 "storage": 
 {
   "backend": "mongo",
   "username": "XXXXX",
   "password": "nice-try-fbi"
 },
 "ECEI_cfg": {"TriggerTime": [-0.12, 61.2, 60],
              "t_norm": [-0.119, -0.109],
              "SampleRate": 500,
              "TFcurrent": 23000.0,
              "Mode": "O",
              "LoFreq": 81,
              "LensFocus": 80,
              "LensZoom": 340},
 "fft_params" : {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
 "task_list": [{
                 "task_description" : "cross_phase", 
                 "analysis": "cross_phase",
                 "channel_chunk_size": 32768, 
                 "ref_channels" : "L0101-2408",
                 "cmp_channels" : "L0101-2408"
               },
               {
                  "task_description" : "cross_power",
                  "analysis": "cross_power",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "coherence",
                  "analysis": "coherence",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "cross_correlation",
                  "analysis": "cross_correlation",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                }]   
}
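
Before submitting a benchmark it can be useful to check which tasks a configuration will run. The following sketch is not part of processor_mpi.py: the helper `summarize_tasks` and the constant `REQUIRED_TASK_KEYS` are hypothetical, and the embedded JSON is a trimmed copy of the configuration above (two of the four tasks, no credentials).

```python
import json

# Trimmed copy of the configuration above: two of the four tasks only.
config_text = """
{
  "shotnr": 18431,
  "task_list": [
    {"task_description": "cross_phase", "analysis": "cross_phase",
     "channel_chunk_size": 32768,
     "ref_channels": "L0101-2408", "cmp_channels": "L0101-2408"},
    {"task_description": "cross_power", "analysis": "cross_power",
     "channel_chunk_size": 32768,
     "ref_channels": "L0101-2408", "cmp_channels": "L0101-2408"}
  ]
}
"""

# Keys that every task entry in the configuration above carries.
REQUIRED_TASK_KEYS = {"task_description", "analysis", "channel_chunk_size",
                      "ref_channels", "cmp_channels"}


def summarize_tasks(cfg):
    """Return (analysis, chunk size) per task; raise if a task lacks a key."""
    summary = []
    for task in cfg["task_list"]:
        missing = REQUIRED_TASK_KEYS - set(task)
        if missing:
            raise KeyError(f"task is missing keys: {sorted(missing)}")
        summary.append((task["analysis"], task["channel_chunk_size"]))
    return summary


config = json.loads(config_text)
for analysis, chunk_size in summarize_tasks(config):
    print(f"{analysis}: channel_chunk_size={chunk_size}")
```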

Allocation: 6 Haswell nodes in interactive queue
Command: srun -n 12 python -m mpi4py.futures processor_mpi.py --config configs/config_all.json --benchmark
Environment: OMP_NUM_THREADS=32

RK: Analysis speed on KNL

I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and using the mongodb backend for 20 time-steps. The wall-time for the run was 1681 seconds.

Configuration file:

{"datapath":  "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
 "shotnr": 18431,
 "storage": 
 {
   "backend": "mongo",
   "username": "XXXXX",
   "password": "nice-try-fbi"
 },
 "ECEI_cfg": {"TriggerTime": [-0.12, 61.2, 60],
              "t_norm": [-0.119, -0.109],
              "SampleRate": 500,
              "TFcurrent": 23000.0,
              "Mode": "O",
              "LoFreq": 81,
              "LensFocus": 80,
              "LensZoom": 340},
 "fft_params" : {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
 "task_list": [{
                 "task_description" : "cross_phase", 
                 "analysis": "cross_phase",
                 "channel_chunk_size": 32768, 
                 "ref_channels" : "L0101-2408",
                 "cmp_channels" : "L0101-2408"
               },
               {
                  "task_description" : "cross_power",
                  "analysis": "cross_power",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "coherence",
                  "analysis": "coherence",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "cross_correlation",
                  "analysis": "cross_correlation",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                }]   
}

Allocation: 6 KNL nodes in interactive queue
Command: srun -n 6 python -m mpi4py.futures processor_mpi.py --config configs/config_all.json --benchmark
Environment: OMP_NUM_THREADS=272
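
The two 20-time-step runs on this page can be put side by side with a little arithmetic. The sketch below uses only the numbers reported here (wall-time, the rank count passed to `srun -n`, `OMP_NUM_THREADS`, and 6 nodes). The threads-per-node figure assumes that every MPI rank starts `OMP_NUM_THREADS` threads and that ranks are spread evenly over the nodes, which was not verified.

```python
# Figures taken from the two 20-time-step runs reported on this page.
runs = {
    "Haswell": {"wall_s": 254.0, "ranks": 12, "omp_threads": 32, "nodes": 6},
    "KNL": {"wall_s": 1681.0, "ranks": 6, "omp_threads": 272, "nodes": 6},
}


def threads_per_node(run):
    """Assumes each rank starts omp_threads threads, ranks spread evenly."""
    return run["ranks"] * run["omp_threads"] / run["nodes"]


def slowdown(slow, fast):
    """Ratio of wall-times, slow run relative to fast run."""
    return slow["wall_s"] / fast["wall_s"]


for name, run in runs.items():
    print(f"{name}: {run['wall_s']:.0f} s, "
          f"{threads_per_node(run):.0f} threads per node (assumed)")

ratio = slowdown(runs["KNL"], runs["Haswell"])
print(f"KNL wall-time / Haswell wall-time = {ratio:.1f}")
```

For these numbers the KNL run takes about 6.6 times as long as the Haswell run for the same 20 time-steps.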
