Skip to content

Performance tests

rkube edited this page Jan 30, 2020 · 8 revisions

Please put results of your performance in here.

Let's try to find a common way of giving performance tests.

  1. Give a short description of what you are trying to benchmark and the result
  2. List any used configuration files
  3. The allocation on cori
  4. List the used command(s)
  5. List relevant environment variables

RK: Analysis speed on Haswell

I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and using the mongodb backend. Performed 2 runs:

  • 20 time steps: wall-time 254s (4 min)
  • 500 time-steps: wall-time 4821s (80 minutes)

for 20 time-steps. The wall-time for the run was 254 seconds.

Configuration file:

{"datapath":  "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
 "shotnr": 18431,
 "storage": 
 {
   "backend": "mongo",
   "username": "XXXXX",
   "password": "nice-try-fbi"
 },
 "ECEI_cfg": {"TriggerTime": [-0.12, 61.2, 60],
              "t_norm": [-0.119, -0.109],
              "SampleRate": 500,
              "TFcurrent": 23000.0,
              "Mode": "O",
              "LoFreq": 81,
              "LensFocus": 80,
              "LensZoom": 340},
 "fft_params" : {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
 "task_list": [{
                 "task_description" : "cross_phase", 
                 "analysis": "cross_phase",
                 "channel_chunk_size": 32768, 
                 "ref_channels" : "L0101-2408",
                 "cmp_channels" : "L0101-2408"
               },
               {
                  "task_description" : "cross_power",
                  "analysis": "cross_power",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "coherence",
                  "analysis": "coherence",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "cross_correlation",
                  "analysis": "cross_correlation",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                }]   
}
  • Allocation: 6 Haswell nodes in interactive queue
  • srun -n 12 -m mpi4py.futures python processor_mpi.py --config configs/config_all.json --benchmark
  • OMP_NUM_THREADS=32

RK Analysis speed on KNL

I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and using the mongodb backend for 20 time-steps. The wall-time for the run was 1681 seconds.

Configuration file:

{"datapath":  "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
 "shotnr": 18431,
 "storage": 
 {
   "backend": "mongo",
   "username": "XXXXX",
   "password": "nice-try-fbi"
 },
 "ECEI_cfg": {"TriggerTime": [-0.12, 61.2, 60],
              "t_norm": [-0.119, -0.109],
              "SampleRate": 500,
              "TFcurrent": 23000.0,
              "Mode": "O",
              "LoFreq": 81,
              "LensFocus": 80,
              "LensZoom": 340},
 "fft_params" : {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
 "task_list": [{
                 "task_description" : "cross_phase", 
                 "analysis": "cross_phase",
                 "channel_chunk_size": 32768, 
                 "ref_channels" : "L0101-2408",
                 "cmp_channels" : "L0101-2408"
               },
               {
                  "task_description" : "cross_power",
                  "analysis": "cross_power",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "coherence",
                  "analysis": "coherence",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                },
                {
                  "task_description" : "cross_correlation",
                  "analysis": "cross_correlation",
                  "channel_chunk_size": 32768,
                  "ref_channels" : "L0101-2408",
                  "cmp_channels" : "L0101-2408"
                }]   
}
  • Allocation: 6 Haswell nodes in interactive queue
  • srun -n 6 -m mpi4py.futures python processor_mpi.py --config configs/config_all.json --benchmark
  • OMP_NUM_THREADS=272
Clone this wiki locally