Virtual GPU Support

Graphistry is designed to run on virtual GPUs using Nvidia's vGPU capabilities. This may simplify administration, enable additional privacy modes, and improve isolation. It is tested on Nutanix, VMware, and the virtualization platforms used by the major cloud providers.

If you are installing on such a system, we are happy to help, so please reach out.

In general, running Graphistry under a hypervisor requires either no special configuration or, with a vGPU, a 1-line setting in data/config/custom.env. However, installing in a virtual environment can take considerably more planning and effort if you are unfamiliar with it, particularly around virtualization planning, passthrough/vGPU provisioning, vGPU licensing, vGPU driver installation, and the surrounding testing and troubleshooting.

Architectural considerations: You may not want vGPUs

We recommend deciding whether you want a baremetal OS, a hypervisor with a passthrough GPU driver, or a hypervisor with a vGPU driver.

A baremetal OS (no hypervisor) or passthrough driver (hypervisor with non-vGPU driver) configuration may be sufficient, meaning you can skip the complexity of using vGPU drivers and licensing:

  • Hypervisors: If you run only one dedicated GPU application, or multiple applications in the same guest OS, you can expose the GPU in passthrough mode
  • Scaling users through multiple GPUs:
    • Graphistry already automatically uses all GPUs exposed to it, primarily for scaling to more user sessions
    • New APIs are starting to use multiple GPUs for acceleration as well
  • Multiple Graphistry installs
    • You can launch concurrent instances of Graphistry using docker: docker-compose -p my_unique_namespace_123 up
    • You can configure docker to use different GPUs or share the same ones (see the sketch after this list)
  • Isolate Graphistry from other GPU software
    • Docker allows picking which GPUs and CPUs are used, for both sharing and isolation
  • Nvidia vGPUs do not yet support unified CPU/GPU memory sharing, which causes RAPIDS processes to crash on bigger-than-GPU-memory workloads instead of transparently spilling
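
For reference, a minimal sketch of running two namespaced instances side by side using the docker-compose project flag mentioned above. The folder paths and project names are illustrative; pinning each instance to specific GPUs is typically done by setting NVIDIA_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES) for the GPU services, and the exact wiring depends on your release's docker-compose.yml:

```bash
# Instance A: its own compose project namespace keeps containers,
# networks, and volumes separate (folder paths and names are hypothetical)
cd /opt/graphistry_a && docker-compose -p graphistry_a up -d

# Instance B: a second release folder under a different project name
cd /opt/graphistry_b && docker-compose -p graphistry_b up -d
```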

Longer-term, Graphistry is aiming to push most/all GPU use to Dask, which adds even more flexibility for resource sharing.

GPU Virtualization configuration planning

Plan licensing and driver versions first, as mistakes may require starting over.

Most likely, you'll want an 8Q vGPU profile with vGPU 11.0+ drivers (11.5+ for AI):

GPU Driver

You will install a hypervisor GPU driver in the hypervisor and a guest OS GPU driver in the guest OS.

Virtualization features: C (vCS)

Nvidia vGPUs have different labels (A, B, C, Q, ...) that correspond with enabled features.

Only C (vCS) and Q (Quadro vDWS) support OpenCL/CUDA. The compute profile (C) is intended for compute workloads like Graphistry. See official NVIDIA CUDA Toolkit and OpenCL Support on NVIDIA vGPU Software.

Virtualization type: Logical (not MIG)

Only one kind of vGPU virtualization currently works with Nvidia RAPIDS:

  1. Supported - Logical time and memory slicing: Nvidia RAPIDS-compatible vGPUs require vGPU software 11.0+ (11.5+ for AI). Each vGPU gets time-sliced access to the GPU and a fixed maximum amount of memory.

  2. Unsupported - MIG (physical virtualization): vGPU 11.1+ supports truer physical isolation of GPU cores and memory, but Nvidia RAPIDS does not yet officially work on MIG instances.

Size: 2 or 8

Each GPU can be split into homogeneous sizes. Ex: 1 x 8Q = 8Q and 4 x 2Q = 8Q. You may be able to oversubscribe, e.g., 4 x 4Q = 16Q.

In the case of multi-GPU systems, you can use different partition sizes on different GPUs. Ex: (1 x 8Q) + (4 x 2Q).

vGPU Software setup

Supporting vGPUs means setting up an Nvidia license manager server, a GPU-capable hypervisor, and a GPU-capable guest OS, and making them work together. Almost all other environment configuration is the same as a regular self-hosted Graphistry setup.

There is also a 1-line change for making Graphistry work with vGPUs.

Licensing server and license

Set up an Nvidia licensing server and use it to generate and download a license. The server must stay on while you use your vGPU, as the license is checked dynamically.

Unlicensed GPUs will still appear under nvidia-smi and be marked as unlicensed in nvidia-smi -q, but will fail when used with CUDA.

If you have a license server but cannot use it to generate a license, you may be able to receive a time/account-limited evaluation license at nvidia.com.

Hypervisor

Install the hypervisor GPU driver that ships with your vGPU software release

Guest OS

Install the guest OS GPU driver that ships with the same vGPU software release
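
A rough sketch of the usual Linux guest install flow, assuming a .run-style GRID guest driver taken from your vGPU software bundle; the exact filename, prerequisites, and steps depend on your distribution and vGPU release:

```bash
# Kernel-module build prerequisites (Ubuntu example; adjust per distribution)
sudo apt-get update && sudo apt-get install -y build-essential linux-headers-$(uname -r)

# Install the GRID guest driver shipped with your vGPU software release
chmod +x NVIDIA-Linux-x86_64-<version>-grid.run
sudo ./NVIDIA-Linux-x86_64-<version>-grid.run

# Confirm the driver loaded and reports the expected version
nvidia-smi
```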

Configure a local license file at /etc/nvidia/gridd.conf. Take care to specify the feature type matching your vGPU profile, vCS (C) or Quadro vDWS (Q), and the right license manager server. Restart the local daemon via sudo service nvidia-gridd restart.
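
For reference, a minimal gridd.conf sketch, assuming a legacy Nvidia license server; the hostname is hypothetical, and the FeatureType values are version-dependent, so confirm them against the gridd.conf template shipped with your guest driver:

```ini
# /etc/nvidia/gridd.conf (sketch; hostname is hypothetical)
ServerAddress=gridlicense.example.internal
ServerPort=7070           # default legacy license server port
FeatureType=4             # e.g., 4 = vCS (C), 2 = Quadro vDWS (Q) in recent releases
```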

To confirm license activation, check the license status in nvidia-smi -q and the debug logs (sudo grep gridd /var/log/messages).
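
A quick check from inside the guest, assuming a guest driver that reports vGPU license state:

```bash
# Expect a "Licensed" status once the server grants a license
nvidia-smi -q | grep -i -A 3 license

# License daemon activity and errors
sudo grep gridd /var/log/messages
```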

Graphistry

Set RMM_ALLOCATOR=default in your data/config/custom.env to avoid relying on CUDA Unified Memory for handling bigger-than-memory workloads, which Nvidia vGPUs do not currently support.
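
That is, the one-line vGPU change in the release's environment file looks like the following (the comment lines are illustrative):

```ini
# data/config/custom.env
# Nvidia vGPUs lack CUDA Unified Memory, so avoid the managed-memory allocator
RMM_ALLOCATOR=default
```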

Hypervisor-specific options

See official docs and support forums for guidance specific to your hypervisor.

Nutanix

Follow the standard Graphistry + vGPU instructions

Helpful links and configurations:

Testing

Script

In your configured VM, after loading the Graphistry containers, go to your Graphistry release folder and run ./test-gpu.sh. For more information on what this does, see quick testing and test-gpu.

Hypervisor

  • nvidia-smi should report a GPU of the expected driver version

Guest OS

  • GPU: nvidia-smi should report a GPU of the expected driver version
  • Nvidia license server approval: sudo grep gridd /var/log/messages
  • See above test-gpu.sh

Docker

  • See above test-gpu.sh

Common errors

  • No GPU driver in hypervisor / guest VM
  • Mismatched GPU drivers between hypervisor and guest VM; use a matched pair from the same vGPU software release
  • RAPIDS-incompatible GPU driver version
  • Unlicensed GPU: No license, license server down, ...
  • Inappropriate license: CUDA-incapable vGPU profile/license; need C or Q

In addition, the usual GPU setup surprises may apply:

  • No Docker GPU runtime
  • Docker GPU runtime not set as default (docker info and /etc/docker/daemon.json; see the sketch after this list)
  • See above test-gpu.sh discussions
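
For the default-runtime check in particular, a typical /etc/docker/daemon.json that makes the Nvidia runtime the default looks roughly like the sketch below (standard nvidia-container-runtime path; adjust to your install, then restart Docker, e.g. sudo systemctl restart docker):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```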