tGeMM

General Matrix Multiplication using NVIDIA Tensor Cores.

Two custom data structures, MatrixFP16 and MatrixFP32, are defined in src to make working with matrices easy. Supported features:

Define a half-precision n x n matrix A_FP16 in RAM (host memory):

MatrixFP16 A_FP16 = MatrixFP16(n, n, false);
Define a half-precision n x n matrix d_A_FP16 in VRAM (device global memory):

MatrixFP16 d_A_FP16 = MatrixFP16(n, n, true);
Define a single-precision n x n matrix A_FP32 in RAM (host memory):

MatrixFP32 A_FP32 = MatrixFP32(n, n, false);
Define a single-precision n x n matrix d_A_FP32 in VRAM (device global memory):

MatrixFP32 d_A_FP32 = MatrixFP32(n, n, true);
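
To make the storage model concrete, the following is a minimal host-only sketch of what a type like MatrixFP32 might hold. It assumes a row-major layout and uses std::vector in place of raw host or device pointers. The name HostMatrixFP32 and its at() accessor are hypothetical and not part of the repository's API, and device allocation (the true flag) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for MatrixFP32 (NOT the repository's type):
// an n_rows x n_cols matrix stored contiguously in row-major order.
struct HostMatrixFP32 {
    int n_rows;
    int n_cols;
    std::vector<float> data;  // owns the host buffer; zero-initialized

    HostMatrixFP32(int rows, int cols)
        : n_rows(rows), n_cols(cols), data(checked_size(rows, cols), 0.0f) {}

    // Element (r, c) lives at flat offset r * n_cols + c.
    float& at(int r, int c) {
        assert(r >= 0 && r < n_rows && c >= 0 && c < n_cols);
        return data[static_cast<std::size_t>(r) * n_cols + c];
    }

private:
    // Validates the shape before the buffer is allocated.
    static std::size_t checked_size(int rows, int cols) {
        assert(rows > 0 && cols > 0);
        return static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
    }
};
```

For example, in a 4 x 4 matrix, at(1, 2) refers to data[6]. The repository's actual types may use a different layout or different member names.
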
Randomly initialize FP16 or FP32 matrices:

random_init_mat(A_FP16, -10, 10); // Random Initialization between -10 and 10

random_init_mat(A_FP32, -10, 10); // Random Initialization between -10 and 10
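
A possible implementation of random initialization is sketched below, assuming values are drawn uniformly as floats. The function name random_init_sketch, its seed parameter, and the choice of std::mt19937 are hypothetical. The repository's random_init_mat may work differently; for MatrixFP16 it would also need to convert each value to half precision, which is omitted here.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical sketch of a random_init_mat-style helper (the repository's
// implementation may differ): fills every element with a value drawn
// uniformly between min_val and max_val.
void random_init_sketch(std::vector<float>& data, float min_val, float max_val,
                        unsigned seed = 42) {
    assert(min_val < max_val);
    std::mt19937 gen(seed);  // fixed seed keeps runs reproducible
    std::uniform_real_distribution<float> dist(min_val, max_val);
    for (float& x : data) {
        x = dist(gen);
    }
}
```

Seeding the generator explicitly means two runs with the same seed produce identical matrices, which is convenient when comparing a custom kernel against cuBLAS.
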
Copy matrix data from RAM to VRAM:

A_FP16.copy_to_device(d_A_FP16);
Copy matrix data from VRAM to RAM:

d_A_FP16.copy_to_host(A_FP16);
Free host/device memory:

A_FP16.free_mat();

d_A_FP16.free_mat();

cuBLAS vs Custom Matrix Multiplication using Tensor Cores

For the cuBLAS version, run: make 00_benchmark_cuBLAS.out
For the custom version, run: make 01_benchmark_naive.out

The naive (custom) version is noticeably slower than cuBLAS. However, my goal (for now) is to show how Tensor Cores can be programmed, so I've kept everything as simple as possible to make their workings easy to understand.
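
To illustrate what programming Tensor Cores means in terms of work decomposition, the following CPU-only sketch mimics the structure of a WMMA-style kernel in plain C++. It assumes square row-major FP32 matrices with n a multiple of 16, and it uses neither FP16 nor any CUDA API. The function name tiled_gemm is hypothetical and this is not the repository's kernel. On the GPU, each 16 x 16 tile of C would typically be owned by one warp, and each inner 16 x 16 x 16 step would correspond to one nvcuda::wmma::mma_sync call, with load_matrix_sync and store_matrix_sync moving tiles in and out of fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 16;  // common WMMA FP16 fragment shape: 16 x 16 x 16

// CPU-only sketch (not the repository's kernel) of how a tensor-core GEMM
// splits C = A * B into work units. Matrices are n x n, row-major, and n is
// assumed to be a multiple of TILE.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int n) {
    assert(n > 0 && n % TILE == 0);
    const std::size_t N = static_cast<std::size_t>(n);
    assert(A.size() == N * N && B.size() == N * N && C.size() == N * N);

    for (std::size_t tr = 0; tr < N; tr += TILE) {        // tile row of C
        for (std::size_t tc = 0; tc < N; tc += TILE) {    // tile column of C
            // FP32 accumulator tile (an accumulator fragment on the GPU).
            float acc[TILE][TILE] = {};
            for (std::size_t tk = 0; tk < N; tk += TILE) {  // walk along K
                // One 16 x 16 x 16 multiply-accumulate step:
                // acc += A[tr:tr+16, tk:tk+16] * B[tk:tk+16, tc:tc+16]
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t j = 0; j < TILE; ++j)
                        for (std::size_t k = 0; k < TILE; ++k)
                            acc[i][j] += A[(tr + i) * N + tk + k] *
                                         B[(tk + k) * N + tc + j];
            }
            // Write the finished tile back to C.
            for (std::size_t i = 0; i < TILE; ++i)
                for (std::size_t j = 0; j < TILE; ++j)
                    C[(tr + i) * N + tc + j] = acc[i][j];
        }
    }
}
```

The result matches a plain triple-loop GEMM. Only the grouping of the work changes, and that grouping is what lets the hardware execute each 16 x 16 x 16 block as a single warp-wide operation.
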

Repository: tgautam03/tGeMM