We explore temporal locality using statistics to optimize the original TBN prefetching (TBNp) policy implemented by the NVIDIA CUDA driver.
We measure performance with the Rodinia benchmark suite: http://www.cs.virginia.edu/rodinia/doku.php
For SRAD, we observed a speedup of 15 percent in the unlimited-memory case and a speedup of 12 percent in the limited-memory case, compared to the original TBNp. For the other test cases, there was no improvement.
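As a minimal sketch of how a speedup percentage like the ones above could be derived from two runtimes: the helper name `speedup_percent`, the formula (baseline time divided by optimized time, minus one), and the sample runtimes are all assumptions for illustration. They are not the project's measurement script or its measured data.

```python
def speedup_percent(baseline_seconds: float, optimized_seconds: float) -> float:
    """Relative speedup of the optimized run over the baseline, in percent.

    Assumes speedup is defined as (baseline / optimized - 1) * 100;
    the original text does not state which definition was used.
    """
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return (baseline_seconds / optimized_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical runtimes chosen for illustration only, not measured values.
    tbnp_seconds = 11.5       # baseline: original TBNp
    modified_seconds = 10.0   # modified prefetching policy
    print(f"{speedup_percent(tbnp_seconds, modified_seconds):.1f}% speedup")
```

Under this definition, equal runtimes give 0 percent, and a slower modified policy gives a negative value.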
Yeojin Kim, Josiah Blaisdell, SangYeon Lee