Performance tests
rkube edited this page Jan 30, 2020 · 8 revisions
Please put the results of your performance tests here.
Let's agree on a common way of reporting performance tests:
- Give a short description of what you are trying to benchmark and the result
- List any used configuration files
- The allocation on Cori
- List the used command(s)
- List relevant environment variables
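The items in the checklist above could be captured by a small script so that reports stay comparable. The following is a minimal sketch and not part of the repository: the function name `collect_report` and the selection of variables in `RELEVANT_VARS` are assumptions made for illustration. `OMP_NUM_THREADS` is taken from the runs reported on this page, and the `SLURM_*` names are standard Slurm job variables.

```python
import os
import shlex
import sys

# Environment variables worth recording. This selection is an assumption;
# extend it with whatever is relevant for your benchmark.
RELEVANT_VARS = (
    "OMP_NUM_THREADS",
    "SLURM_JOB_ID",
    "SLURM_JOB_NUM_NODES",
    "SLURM_NTASKS",
)


def collect_report(description, config_files, env=None, argv=None):
    """Return a dict holding the items the checklist asks for."""
    env = os.environ if env is None else env
    argv = sys.argv if argv is None else argv
    return {
        "description": description,
        "config_files": list(config_files),
        # Quote each argument so the command can be pasted back into a shell
        "command": " ".join(shlex.quote(arg) for arg in argv),
        # Unset variables are recorded explicitly rather than silently dropped
        "environment": {name: env.get(name, "<unset>") for name in RELEVANT_VARS},
    }


if __name__ == "__main__":
    report = collect_report(
        "Analysis speed test (hypothetical example)",
        ["configs/config_all.json"],
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

The allocation itself (node type and queue) is not visible from inside the job in a uniform way, so it still has to be noted by hand.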
RK: Analysis speed on Haswell
I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and the MongoDB backend. I performed two runs:
- 20 time steps: wall-time 254 s (about 4 minutes)
- 500 time steps: wall-time 4821 s (about 80 minutes)
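To compare the two runs, it helps to convert the wall-times into a cost per time step. The sketch below only does arithmetic on the numbers reported above. The split into a fixed overhead and a per-step cost is an assumption (a straight line through two data points), so treat it as a rough estimate rather than a measurement.

```python
# Wall-times reported above: (number of time steps, wall-time in seconds)
runs = [(20, 254.0), (500, 4821.0)]


def seconds_per_step(n_steps, wall_time_s):
    """Average wall-time per time step for one run."""
    return wall_time_s / n_steps


per_step = [seconds_per_step(n, t) for n, t in runs]
for (n, t), s in zip(runs, per_step):
    print(f"{n:4d} steps: {t:6.0f} s total, {s:.2f} s per step")

# Assumed model: wall_time = overhead + slope * n_steps.
# With only two runs this is an exact fit through both points.
(n0, t0), (n1, t1) = runs
slope = (t1 - t0) / (n1 - n0)
overhead = t0 - slope * n0
print(f"marginal cost: {slope:.2f} s per step, fixed overhead: {overhead:.1f} s")
```

The average cost drops from about 12.7 s per step in the short run to about 9.6 s per step in the long run, which would be consistent with a fixed startup cost of roughly a minute under the assumed linear model.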
Configuration file:
{
  "datapath": "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
  "shotnr": 18431,
  "storage": {
    "backend": "mongo",
    "username": "XXXXX",
    "password": "nice-try-fbi"
  },
  "ECEI_cfg": {
    "TriggerTime": [-0.12, 61.2, 60],
    "t_norm": [-0.119, -0.109],
    "SampleRate": 500,
    "TFcurrent": 23000.0,
    "Mode": "O",
    "LoFreq": 81,
    "LensFocus": 80,
    "LensZoom": 340
  },
  "fft_params": {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
  "task_list": [
    {
      "task_description": "cross_phase",
      "analysis": "cross_phase",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "cross_power",
      "analysis": "cross_power",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "coherence",
      "analysis": "coherence",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "cross_correlation",
      "analysis": "cross_correlation",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    }
  ]
}
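When reporting a benchmark, a one-line summary of the configuration is often easier to compare than the full file. The following is a minimal sketch, assuming only the key names visible in the configuration above; the helper `summarize_config` is hypothetical and not part of the repository, and the inline example is an abbreviated copy of the configuration used for illustration.

```python
import json


def summarize_config(cfg):
    """Return (storage backend, nfft, list of analysis names) from a config dict."""
    backend = cfg["storage"]["backend"]
    nfft = cfg["fft_params"]["nfft"]
    analyses = [task["analysis"] for task in cfg["task_list"]]
    return backend, nfft, analyses


# Abbreviated copy of the configuration above, for illustration only.
# In practice one would load the real file, e.g. with
# json.load(open("configs/config_all.json")).
example = json.loads("""
{
  "storage": {"backend": "mongo"},
  "fft_params": {"nfft": 512, "window": "hann", "overlap": 0.5},
  "task_list": [
    {"analysis": "cross_phase"},
    {"analysis": "cross_power"},
    {"analysis": "coherence"},
    {"analysis": "cross_correlation"}
  ]
}
""")

backend, nfft, analyses = summarize_config(example)
print(f"backend={backend}, nfft={nfft}, tasks={', '.join(analyses)}")
```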
- Allocation: 6 Haswell nodes in interactive queue
- srun -n 12 python -m mpi4py.futures processor_mpi.py --config configs/config_all.json --benchmark
- OMP_NUM_THREADS=32
RK: Analysis speed on KNL
I benchmarked processor_mpi.py, which uses the tasks_mpi interface, with a full analysis workload and the MongoDB backend for 20 time steps. The wall-time for the run was 1681 seconds (about 28 minutes).
Configuration file:
{
  "datapath": "/global/cscratch1/sd/rkube/KSTAR/kstar_streaming/018431/",
  "shotnr": 18431,
  "storage": {
    "backend": "mongo",
    "username": "XXXXX",
    "password": "nice-try-fbi"
  },
  "ECEI_cfg": {
    "TriggerTime": [-0.12, 61.2, 60],
    "t_norm": [-0.119, -0.109],
    "SampleRate": 500,
    "TFcurrent": 23000.0,
    "Mode": "O",
    "LoFreq": 81,
    "LensFocus": 80,
    "LensZoom": 340
  },
  "fft_params": {"nfft": 512, "window": "hann", "overlap": 0.5, "detrend": "constant", "full": true},
  "task_list": [
    {
      "task_description": "cross_phase",
      "analysis": "cross_phase",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "cross_power",
      "analysis": "cross_power",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "coherence",
      "analysis": "coherence",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    },
    {
      "task_description": "cross_correlation",
      "analysis": "cross_correlation",
      "channel_chunk_size": 32768,
      "ref_channels": "L0101-2408",
      "cmp_channels": "L0101-2408"
    }
  ]
}
- Allocation: 6 KNL nodes in interactive queue
- srun -n 6 python -m mpi4py.futures processor_mpi.py --config configs/config_all.json --benchmark
- OMP_NUM_THREADS=272
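The two 20-step runs can be put side by side with a short calculation. This sketch uses only the wall-times reported on this page. Note that the runs differ in the number of MPI ranks (12 versus 6) and in OMP_NUM_THREADS (32 versus 272), so the ratio is not a like-for-like hardware comparison.

```python
# Wall-times for the 20-time-step runs reported on this page, in seconds
haswell_s = 254.0
knl_s = 1681.0
n_steps = 20

# How much longer the KNL run took than the Haswell run
slowdown = knl_s / haswell_s

print(f"Haswell: {haswell_s / n_steps:.2f} s per step")
print(f"KNL:     {knl_s / n_steps:.2f} s per step")
print(f"KNL / Haswell wall-time ratio: {slowdown:.1f}")
```

With these numbers the KNL run took between 6 and 7 times as long as the Haswell run for the same 20 time steps.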