Replies: 2 comments
Obtaining optimal performance is not easy. The biggest question is whether the code around the SLEEF calls is properly vectorized. This is usually the hardest and most time-consuming part of development. If the code other than the math functions is not vectorized at all, you can't expect much of a performance gain from SLEEF, and the compiler's autovectorizer works effectively only in limited cases. Also, for gaming applications, do you really need u10 accuracy? You mentioned a plugin, but how SLEEF is linked to the non-SLEEF code can have a serious impact on performance. It is a bit hard to tell from the write-up, but if you really want to increase performance, the inline header version is better than the LTO version, though the inline header version is difficult to use from a language other than C/C++. In any case, I recommend disassembling the binary and looking at the code the CPU actually executes.
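To make the vectorization point concrete, here is a minimal sketch in C of two ways a plugin could expose a math function to a scripting language. The names `plugin_sin`, `plugin_sin_array`, and `placeholder_sin` are hypothetical and do not come from SLEEF or the thread. The kernel is a stand-in polynomial so that the sketch builds without SLEEF; in a real build it would presumably be replaced by a call such as `Sleef_sin_u10`, or by a vector variant such as `Sleef_sind2_u10` processing two doubles per call.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder kernel (an assumption for this sketch, NOT an accurate sine):
   stands in for Sleef_sin_u10(x) so the example compiles without SLEEF. */
static double placeholder_sin(double x) {
    return x - (x * x * x) / 6.0;
}

/* Hypothetical per-element binding: the script calls this once per value.
   Each call crosses the script/native boundary, and nothing can be
   vectorized across separate calls. */
double plugin_sin(double x) {
    return placeholder_sin(x);
}

/* Hypothetical array binding: the script crosses the boundary once and the
   whole loop runs natively. This loop is the place where a SLEEF vector
   function could process several elements per iteration. */
void plugin_sin_array(const double *in, double *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = placeholder_sin(in[i]);
    }
}
```

With the per-element binding, the script-to-native call overhead is likely to dominate whatever the math library saves. The array binding at least gives the native side a loop to vectorize, and whether that actually happened can be checked in the disassembly.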
Thanks for the information! :) People will call Sleef from a scripting language (angelscript), and I'm not sure how it translates script calls into actual function calls, beyond using the CDECL calling convention and such. The plugin is loaded as a DLL, and Sleef is statically linked into that plug-in DLL (that is, the game engine dynamically loads the plug-in DLL, which in turn has Sleef linked in statically). I wasn't sure how much accuracy I wanted, so I picked the variants with the smallest ULP error.
The FAQ is a bit unclear about this. I'm trying to figure out the answer to this question: when is using Sleef preferable over libm?
My use case is this: I'm writing a plugin for a game engine that uses angelscript, and I'm trying to provide "fast" alternatives for some use cases (e.g. running algorithms over large arrays, or offloading some tasks to a GPU or other onboard accelerator). I'm uncertain whether big mainstream game engines do this kind of acceleration, but I thought I'd try anyway.
Right now, I'm exposing the `Sleef_op_uxx` functions to the scripting language, e.g. `sin` maps to `Sleef_sin_u10`. If I understand the questions/answers right, this should, in theory, dispatch automatically to the fastest function given the CPU's capabilities at run-time. However, I don't want to get developers' hopes up that if they just start using these "fast" functions they'll instantly see a speedup. I've tried searching to figure out when this would be appropriate but am still uncertain. So, what are some performance guidelines/tips for squeezing maximum performance out of Sleef? (As an aside, I am linking with LTO and am building Sleef before building the plug-in that exposes its functions, so everything gets a nice dose of LTO and optimizations.) What are some use-cases/examples?