forked from stream1972/gfnsieve_ocl
ibethune/gfnsieve_ocl
1. Introduction
-----------------

This is an OpenCL port of the Generalized Fermat Number (GFN) sieving program.
The original program was written by Anand Nair. OpenCL port by Roman Trunov.

The original program was written for NVIDIA CUDA only. This port can run on
any OpenCL-enabled GPU.

Additional improvements over the original version are:

- Better performance (less CPU usage) for low-N sieves (GFN-15, -16, -17);
- A 64-bit version of the program, which uses even less CPU;
- An additional benchmark/simulation utility that can be used to find the
  maximum possible performance of your CPU for low-N sieves (GFN-15, -16, -17),
  which are usually limited by the CPU and need several CPU cores to feed the GPU.

Although it can run on any OpenCL device, NVIDIA cards may still have better
throughput because they use optimized GPU assembly code, while other cards have
to rely on the optimization of plain C code done by their OpenCL compiler (driver).

For NVIDIA, a card with compute capability 2.0 or better is mandatory.

The high CPU usage problem on NVIDIA has been worked around (note: you must
specify a command-line option to enable this workaround). This problem should
not exist for other vendors.

2. Command-line options
-------------------------

All command-line options are compatible with the original program.

A new command-line option "W<n>" has been added to work around the 100% CPU
usage problem. By default the workaround is disabled; you must enable it manually.

To find out whether you need this workaround, run a test:

    gfnsvocl_xxx.exe 21 2000 2001

If your driver is affected, the sieve program (actually, the NVIDIA driver) will
consume a full CPU core. In that case, enable the workaround by running:

    gfnsvocl_xxx.exe 21 2000 2001 W1

CPU usage should now be close to zero. Note that this test must be run for
GFN-21 or GFN-22, because low-N sieves like GFN-15, -16, -17 will consume up to
a full CPU core even when the workaround is enabled. This is normal because a
lot of CPU work is required on these sieves.
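The two test commands above differ only in the trailing "W1" option. The sketch below shows one way to script that comparison. The helper names and the 0.5 threshold are hypothetical and are not part of the program. The three positional values are passed through exactly as they appear in the commands above, and CPU and elapsed times are expected to come from whatever process monitor you already use.

```python
def build_command(exe, n, p_start, p_end, workaround=None):
    """Return an argument list shaped like the README's test commands.

    The positional values are passed through unchanged; this helper
    does not interpret them.
    """
    args = [exe, str(n), str(p_start), str(p_end)]
    if workaround is not None:
        args.append(f"W{workaround}")  # the README's "W<n>" option
    return args


def cpu_fraction(cpu_seconds, wall_seconds):
    """Fraction of one CPU core used: CPU time divided by elapsed time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def workaround_recommended(cpu_seconds, wall_seconds, threshold=0.5):
    """Hypothetical rule of thumb: suggest W1 when the GFN-21 test run
    keeps most of a core busy. The threshold is an arbitrary assumption."""
    return cpu_fraction(cpu_seconds, wall_seconds) >= threshold


plain = build_command("gfnsvocl_xxx.exe", 21, 2000, 2001)
with_w1 = build_command("gfnsvocl_xxx.exe", 21, 2000, 2001, workaround=1)
print(" ".join(plain))
print(" ".join(with_w1))
```

Measure the first command for a minute or so, pass the observed CPU time and elapsed time to `workaround_recommended`, and repeat with the second command to confirm that CPU usage has dropped.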
Note that using the workaround will reduce your GPU load (and P/day) a bit.
To avoid this, use high 'B' values if possible (this reduces the relative amount
of GPU time lost in the workaround) or run several instances of the program.

3. CPU Benchmarks
-------------------

The following is important for low-N sieves like GFN-15, -16, -17.

Small GFN-"N" sieves require a lot of CPU power. If you're planning to run these
sieves, be aware that even the fastest CPU cannot feed even an average GPU with
a single process. You must run at least two processes on different ranges,
adding more until you reach 100% GPU usage.

For example, a GTX750ti GPU can do 55 P/day on its fastest core (see below for
cores). Let's compare this value with the speed of a CPU (i7-4770K at 4000 MHz)
sieving the range 10000P-10001P (lower ranges are a bit faster). Values are in
P/day:

-------------+----------------------------------------------------------------
Program      |  GFN-15      16      17      18      19      20      21      22
-------------+----------------------------------------------------------------
gfnsvsim_w32 |    13.9    27.8    51.7   102.4   204.2   401.0   774.0  1453.0
gfnsvsim_w64 |    20.9    41.8    76.8   152.7   301.6   591.0  1145.0  2157.0
-------------+----------------------------------------------------------------

As can be seen from the table, to fully utilize this quite average GPU at
GFN-15, four instances of the 32-bit program or three instances of the 64-bit
program must be run. GFN-18 and above can be served by a single instance.
It's also clear that the 64-bit version of the program must be used whenever
possible.

To find the maximum CPU throughput of your computer, use the benchmark/simulation
program "gfnsvsim_xxx.exe". It runs only the CPU part of the sieve, so the numbers
it reports are the maximum throughput that could be obtained by a single CPU
core for the given sieve range. Note: it cannot find any factors (so it will not
write a checkpoint file, to avoid accidentally skipping a subrange).

4. GPU Benchmarks
-------------------

For NVIDIA cards, speed should be the same as or a little better than the
original program, probably due to better GPU utilization or better optimization
by the recent compiler/driver.

Technically, the program consists of 3 separate cores, applicable to different
sieve ranges:

"63-bit" core - 0P to 9223P
"64-bit" core - 9223P to about 18446P
"79-bit" core - 18446P and above

where the 63-bit core is the fastest and the 79-bit core is the slowest. The
program automatically chooses (or changes) the core for the range being sieved.

Average benchmarks for NVIDIA GTX750ti, GFN-21, B10, single instance (P/day):

Core            | GFNSvCUDA   | gfnsvocl
----------------+-------------+--------------
63-bit NVIDIA   | 47.0        | 55.1
64-bit NVIDIA   | 40.0        | 48.9
64-bit C [1]    | n/a         | 42.3
79-bit NVIDIA   | 27.3        | 30.8
79-bit C [1]    | n/a         | 23.5
----------------+-------------+--------------

[1] A "Plain C" core will be used on all non-NVIDIA GPUs. It has also been
manually benchmarked on NVIDIA to estimate the slowdown compared to
hand-written NVIDIA assembly.

GPU throughput does not change between 32-bit and 64-bit applications.
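
The instance counts quoted in section 3 and the core ranges listed above can be reproduced with a short calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the handling of a value sitting exactly on a range boundary is a guess (the README gives the limits only approximately), the 55 P/day GPU figure is applied to every N as in the README's own example, and CPU throughput is assumed to add up linearly across instances.

```python
import math

# Upper limits of the first two cores, in the README's "P" units.
# Which core handles a value exactly on a boundary is an assumption.
CORE_LIMITS = [(9223, "63-bit"), (18446, "64-bit")]


def select_core(p):
    """Return the core name for a sieve position p (given in P)."""
    if p < 0:
        raise ValueError("p must be non-negative")
    for limit, name in CORE_LIMITS:
        if p < limit:
            return name
    return "79-bit"


# Single-core CPU throughput in P/day, copied from the table in section 3
# (i7-4770K at 4000 MHz, range 10000P-10001P).
CPU_P_PER_DAY = {
    "w32": {15: 13.9, 16: 27.8, 17: 51.7, 18: 102.4,
            19: 204.2, 20: 401.0, 21: 774.0, 22: 1453.0},
    "w64": {15: 20.9, 16: 41.8, 17: 76.8, 18: 152.7,
            19: 301.6, 20: 591.0, 21: 1145.0, 22: 2157.0},
}


def instances_needed(gpu_p_per_day, cpu_p_per_day):
    """Smallest number of instances whose combined CPU rate covers the GPU."""
    if gpu_p_per_day <= 0 or cpu_p_per_day <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(gpu_p_per_day / cpu_p_per_day)


GPU_RATE = 55.0  # GTX750ti on its fastest core, from the README

for n in range(15, 23):
    w32 = instances_needed(GPU_RATE, CPU_P_PER_DAY["w32"][n])
    w64 = instances_needed(GPU_RATE, CPU_P_PER_DAY["w64"][n])
    print(f"GFN-{n}: w32 x{w32}, w64 x{w64}")

print(select_core(2000), select_core(10000), select_core(20000))
```

For GFN-15 this gives four 32-bit or three 64-bit instances, and a single instance from GFN-18 upward, matching the figures in section 3. Substitute the numbers reported by "gfnsvsim_xxx.exe" on your own machine to get an estimate for your hardware.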