ch4/shm: fix performance degradation on Sapphire Rapids with Intel Compiler #7150

Fix the performance degradation on Intel Sapphire Rapids that appeared after topo-aware SHM was introduced. The problem only occurs when building with the Intel compiler. The cause is that topo-aware SHM defaulted to disabled, so inter-NUMA messages are copied with regular memcpy, whereas v4.2.2 used non-temporal copy. It was disabled by default because non-temporal copy results in higher latency for small messages. After more testing on different CPUs (Broadwell, Skylake, Cascade Lake, Ice Lake, Milan), it seems that only Skylake, Cascade Lake, and Ice Lake have this issue with small messages. It is probably OK to enable topo-aware SHM by default.
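For readers unfamiliar with the distinction, the following is a minimal sketch of a regular versus a non-temporal copy. It is not the MPICH implementation: the names `copy_nontemporal`, `use_nontemporal`, and `shm_copy` are hypothetical, and the selection rule (non-temporal copy only when topo-aware is enabled and the peer is on another NUMA node) is an assumption inferred from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes with non-temporal (streaming) stores
 * when SSE2 is enabled at compile time; otherwise fall back to memcpy. */
static void copy_nontemporal(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Streaming stores need a 16-byte-aligned destination, so copy the
     * unaligned head with a regular memcpy first. */
    size_t head = (16 - ((uintptr_t) d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Main loop: unaligned load, store that bypasses the cache. */
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *) s);
        _mm_stream_si128((__m128i *) d, v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();

    /* Remaining tail of fewer than 16 bytes. */
    memcpy(d, s, n);
#else
    memcpy(dst, src, n);
#endif
}

/* Assumed policy (not taken from the MPICH source): use the non-temporal
 * path only for inter-NUMA copies when topo-aware SHM is enabled. */
static int use_nontemporal(int topo_aware_enabled, int inter_numa)
{
    return topo_aware_enabled && inter_numa;
}

static void shm_copy(void *dst, const void *src, size_t n,
                     int topo_aware_enabled, int inter_numa)
{
    if (use_nontemporal(topo_aware_enabled, inter_numa))
        copy_nontemporal(dst, src, n);
    else
        memcpy(dst, src, n);
}
```

Streaming stores avoid filling the sender's cache with data that only the remote reader needs, which tends to help large inter-NUMA transfers. The extra fence and alignment handling are one plausible source of the small-message latency cost mentioned above.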

The previous PR #7074 consolidated the SSE2- and AVX-related optimization options into MPL's configure because only MPL explicitly uses them. That change showed no performance degradation with the GNU compiler, but with Intel compilers it does result in some performance degradation. Therefore, we should add them back to the main configure. Currently, the main configure checks for the availability of SSE2, AVX, and AVX512F and adds them to CFLAGS. The MPL configure further checks for the specific instructions that are used in MPL.
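One way to see which of these extensions a given translation unit was built with is to test the predefined macros that GCC, Clang, and the Intel compilers set when the corresponding options are in effect. The sketch below is illustrative only; `compiled_simd_level` is a hypothetical helper and not part of MPICH or MPL.

```c
/* Hypothetical helper: report the widest SIMD extension that was enabled
 * for this translation unit at compile time. */
static const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none";
#endif
}
```

Code compiled without the flags in its CFLAGS reports a lower level than code compiled with them, even on the same machine.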

Commits on Sep 24, 2024
