cuStreamComp

Efficient CUDA Stream Compaction Library

Based on the following works:

Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures
D. M. Hughes et al. InK-Compact: In-Kernel Stream Compaction and Its Application to Multi-Kernel Data Visualization on GPGPU
Darius Bakunas-Milanowski et al. Efficient Algorithms for Stream Compaction on GPUs

It is an efficient CUDA implementation of the stream compaction algorithm, based on the warp ballot intrinsic.
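To make the ballot idea concrete, here is a minimal host-side C++ sketch that emulates it on the CPU. It is an illustration under stated assumptions, not the library's kernels: each group of 32 elements plays the role of a warp, the per-group bit mask stands in for the result of CUDA's `__ballot_sync`, and `popcount32` stands in for `__popc`. The names `compactByBallot`, `popcount32`, and `kWarpSize` are hypothetical and do not come from this repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWarpSize = 32;  // lanes per CUDA warp

// Portable population count; stands in for CUDA's __popc.
inline int popcount32(std::uint32_t m) {
    int n = 0;
    for (; m != 0; m &= m - 1) ++n;  // clear the lowest set bit each step
    return n;
}

// Hypothetical helper: compacts `in` by treating every 32 elements as one warp.
template <typename T, typename Pred>
std::vector<T> compactByBallot(const std::vector<T>& in, Pred pred) {
    const std::size_t groups = (in.size() + kWarpSize - 1) / kWarpSize;

    // Phase 1: one "ballot" per group; bit i is set when lane i holds a valid element.
    std::vector<std::uint32_t> masks(groups, 0u);
    for (std::size_t i = 0; i < in.size(); ++i)
        if (pred(in[i]))
            masks[i / kWarpSize] |= std::uint32_t{1} << (i % kWarpSize);

    // Phase 2: exclusive prefix sum of the per-group counts gives each group's output offset.
    std::vector<std::size_t> offsets(groups, 0);
    std::size_t total = 0;
    for (std::size_t g = 0; g < groups; ++g) {
        offsets[g] = total;
        total += static_cast<std::size_t>(popcount32(masks[g]));
    }

    // Phase 3: scatter. A valid lane's rank inside its group is the number of
    // set bits below its own bit in the ballot mask.
    std::vector<T> out(total);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::size_t g = i / kWarpSize;
        const std::uint32_t laneBit = std::uint32_t{1} << (i % kWarpSize);
        if (masks[g] & laneBit) {
            const std::uint32_t lower = masks[g] & (laneBit - 1u);
            out[offsets[g] + static_cast<std::size_t>(popcount32(lower))] = in[i];
        }
    }
    return out;
}
```

On a GPU, phase 1 and the rank computation in phase 3 each cost one intrinsic per warp, which is why the ballot approach avoids a full per-element scan. Calling `compactByBallot(values, predicate)` with any callable predicate returns the valid elements in their original order.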

How to use it

Its usage is straightforward:

Create a predicate functor to decide whether an element is valid or not.

struct predicate
{
	// Keep only strictly positive elements.
	__host__ __device__
	bool operator()(const int x) const
	{
		return x > 0;
	}
};

Call one of the compact procedures to obtain the compacted array d_output.

cuCompactor::compact<int>(d_data,d_output,length,predicate(),blockSize);
cuCompactor::compactHybrid<int>(d_data,d_output,length,predicate(),blockSize);
cuCompactor::compactThrust<int>(d_data,d_output,length,predicate());

Note that both the input array d_data and the output array d_output have to be allocated on the device.
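When checking results, it can help to compare the contents of d_output (copied back to the host, for example with `cudaMemcpy`) against a simple CPU reference. The sketch below builds such a reference with `std::copy_if`. It assumes the compaction keeps the input order, as stream compaction conventionally does. `hostPredicate` and `referenceCompact` are hypothetical names introduced for illustration, not part of the library.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Host-only copy of the predicate shown earlier; the __host__ __device__
// qualifiers are dropped so this file builds without nvcc.
struct hostPredicate {
    bool operator()(const int x) const { return x > 0; }
};

// Hypothetical helper: the order-preserving result that a compaction of
// `data` is expected to match.
template <typename T, typename Pred>
std::vector<T> referenceCompact(const std::vector<T>& data, Pred pred) {
    std::vector<T> kept;
    std::copy_if(data.begin(), data.end(), std::back_inserter(kept), pred);
    return kept;
}
```

Comparing the first `kept.size()` elements of the copied-back output with `kept` then gives a quick correctness check for any of the three procedures.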

Performance

Thrust (T) outperforms both the Billeter (B) and Bakunas-Milanowski (H) implementations for all stream sizes up to 1024x2^19 elements. Streams of 1024x2^20 elements or more are too large for Thrust to compact on my Quadro RTX 5000 Mobile / Max-Q with 16 GiB of VRAM. B and H are more memory-frugal than Thrust and can compact a stream of at least 1024x2^20 elements.

GregorySchwing/cuStreamComp
