diff --git a/demos/gfql/GPU_memory_consumption_tutorial.ipynb b/demos/gfql/GPU_memory_consumption_tutorial.ipynb
index 9fa4c2ac0..c79fe460f 100644
--- a/demos/gfql/GPU_memory_consumption_tutorial.ipynb
+++ b/demos/gfql/GPU_memory_consumption_tutorial.ipynb
@@ -6,14 +6,17 @@
     "id": "AnX7LqWMBnpu"
    },
    "source": [
-    "# GPU analytics memory size planning & profiling\n",
+    "# How much GPU RAM do you need and how much data fits into a GPU task?\n",
     "\n",
-    "## Accelerating far beyond Pandas at a fraction of the size using Parquet, Apache Arrow, RAPIDS/cuDF, and Graphistry/GFQL\n",
+    "## GPU memory size planning & data ratios for Parquet, Arrow, RAPIDS/cuDF, and Graphistry/GFQL\n",
     "\n",
+    "Put too much data into a GPU or use a GPU without enough memory and things fall apart. Whatever GPU you pick, you may then want to partition your data to make sure it fits, but make partitions too small and you risk getting only a fraction of the available GPU speedups.\n",
     "\n",
-    "Understanding and managing GPU memory consumption is central to achieving high performance with your GPUs. Go too big and things fall apart, but go to small and you risk going too slow.\n",
+    "Achieving high performance with your GPUs often starts with navigating these questions.\n",
     "\n",
-    "To build an intuition for staying within your GPU memory budget, we will explore a reprentative activity logs dataset and see how much memory different analytics steps consume on it:\n",
+    "It is surprisingly simple in practice to stay within your GPU memory budget once you understand some common data ratios that occur at basic data pipeline phases.\n",
+    "\n",
+    "Using a representative activity logs dataset, we will work through a typical GPU ETL & analytics pipeline that starts all the way from disk:\n",
     "\n",
     "* Parquet (disk, compressed): 0.1-0.5X\n",
     "* Arrow (CPU, in-memory): 0.2-1X\n",
@@ -21,10 +24,9 @@
     "* cuDF (GPU, in-memory): 0.2-1X\n",
     "* **GPU compute operations (GPU): 0.2-1X <-- includes cuDF tabular queries and GFQL graph queries**\n",
     "* Overall Peak Usage: 1-2X\n",
+    "* Variants: **Multi-GPU**, **multi-node**, and **AI+ML**\n",
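+    "\n",
+    "As a preview, the kind of measurement we do at each phase below can be sketched as follows (the file name is illustrative; the real dataset is built later in this notebook):\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "import pandas as pd\n",
+    "import pyarrow.parquet as pq\n",
+    "import cudf\n",
+    "\n",
+    "path = 'activity_logs.parquet'  # hypothetical file for illustration\n",
+    "disk_mb = os.path.getsize(path) / 1e6  # Parquet on disk (compressed)\n",
+    "arrow_mb = pq.read_table(path).nbytes / 1e6  # Arrow table in CPU RAM\n",
+    "pandas_mb = pd.read_parquet(path).memory_usage(deep=True).sum() / 1e6  # Pandas in CPU RAM\n",
+    "cudf_mb = cudf.read_parquet(path).memory_usage(deep=True).sum() / 1e6  # cuDF in GPU RAM\n",
+    "print(disk_mb, arrow_mb, pandas_mb, cudf_mb)\n",
+    "```\n",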
     "\n",
-    "At the end, we'll also touch on **multi-GPU**, **multi-node**, and **AI+ML** workload estimation.\n",
-    "\n",
-    "Even before we begin, we already see that GPU libraries typically consume a small fraction of the memory required by traditional CPU-based libraries like Pandas. Taming memory is a first step for GPU processing, both in the libraries and how to use them, and we see that play out here already."
+    "Even before we begin, note that the above ratios already show GPU libraries typically consume a small fraction of the memory required by popular CPU-based libraries like Pandas: they are designed for memory efficiency in general, not just for GPU processing."
   ]
  },
  {
@@ -35,11 +37,11 @@
    "source": [
     "# Phase 1: Setup and Data Creation\n",
     "\n",
-    "RAPIDS (GPU) + PyGraphistry\n",
+    "(Skip ahead to **The data** if you're just skimming)\n",
     "\n",
-    "We assume standards pandas stack\n",
+    "## Installs & imports\n",
     "\n",
-    "## Installs & imports"
+    "Pandas (CPU), RAPIDS cuDF (GPU), PyGraphistry"
   ]
  },
  {
@@ -561,15 +563,11 @@
     "\n",
     "### 4X GPU compaction with cuDF\n",
     "\n",
-    "`cuDF` is an open source GPU-based dataframe library that matches the Pandas API. Note that cuDF is Arrow-native, so the estimated GPU memory consumption matches Apache Arrow, maintaining the 4X improvement over Pandas even without doing any compute\n",
+    "`cuDF` is an open source GPU-based dataframe library that matches the Pandas API. Note that cuDF is Arrow-native, so the estimated GPU memory consumption exactly matches Apache Arrow. It maintains the 4X improvement over Pandas even without doing any compute.\n",
     "\n",
     "### 4X GPU compaction with PyGraphistry\n",
     "\n",
-    "Graph users can automate transfering a graph's tables to the GPU via [g2 = g1.to_cudf()](https://pygraphistry.readthedocs.io/en/latest/api/compute.html#graphistry.compute.ComputeMixin.ComputeMixin.to_cudf)\n",
-    "\n",
-    "### Pack in 10X more data for real workloads\n",
-    "\n",
-    "It is convenient to move the entire dataframe to the GPU when there is a lot of space. However, 10X+ bigger workloads can often be easily handled on the same GPU just by mindful of which columns to use at the beginning. For example, `df2 = cudf.from_pandas(df[['src_ip', 'dst_ip']])`.\n"
+    "Graph users can automate transferring a graph's tables to the GPU via [g2 = g1.to_cudf()](https://pygraphistry.readthedocs.io/en/latest/api/compute.html#graphistry.compute.ComputeMixin.ComputeMixin.to_cudf), reaping the same benefits over a Pandas-based approach."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
@@ -601,6 +599,36 @@
     "print(f\"Total gdf size in memory: {gdf_size_mb:.2f} MB\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Pack in 10X+ more data for real workloads with GPU projections and higher CPU RAM\n",
+    "\n",
+    "It is convenient to move the entire dataframe to the GPU when there is a lot of room, so we recommend doing that during prototyping.\n",
+    "\n",
+    "However, 10X+ bigger workloads can often be easily handled on the same GPU just by being mindful of which columns to use at the beginning:\n",
+    "\n",
+    "```python\n",
+    "    # Only transfer 2 columns from df to the GPU\n",
+    "    df2 = cudf.from_pandas(df[['src_ip', 'dst_ip']])\n",
+    "```\n",
+    "\n",
+    "CPU RAM is often cheaper than GPU RAM, so you may want your CPU to have 1-4X more RAM than your GPUs.\n",
+    "\n",
+    "### Off-GPU IO Speeds\n",
+    "\n",
+    "To handle bigger-than-memory datasets, it helps to pair your GPU RAM with even more (cheaper) CPU RAM or disk, and to keep in mind that data moves at different speeds at each hop from disk to GPU:\n",
+    "\n",
+    "* Individual SSDs can do 1-5 GB/s, and arrays of them can do 100+ GB/s\n",
+    "* Consumer speeds for disk->CPU and CPU->GPU are around 32 GB/s per 1-2 GPUs via PCIe 4.0\n",
+    "* Server-grade setups are often PCIe 5.0 at 64 GB/s per 1-2 GPUs\n",
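+    "\n",
+    "One practical pattern for such bigger-than-memory data is to stream it through the GPU in batches, so only one chunk occupies GPU RAM at a time. Here is a minimal sketch, assuming a local Parquet file (the file name, batch size, and columns are illustrative):\n",
+    "\n",
+    "```python\n",
+    "import pyarrow as pa\n",
+    "import pyarrow.parquet as pq\n",
+    "import cudf\n",
+    "\n",
+    "pf = pq.ParquetFile('activity_logs.parquet')  # hypothetical dataset path\n",
+    "for batch in pf.iter_batches(batch_size=5_000_000, columns=['src_ip', 'dst_ip']):\n",
+    "    gdf = cudf.DataFrame.from_arrow(pa.Table.from_batches([batch]))\n",
+    "    # ... run a cuDF or GFQL step on gdf; letting gdf go out of scope each loop caps GPU usage ...\n",
+    "```\n",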
+    "\n",
+    "For advanced setups, such as going at 100 GB/s on 1-2 GPUs, see our recorded Dask Summit talk on [100GB/s GPU Log Analytics at Graphistry](https://www.youtube.com/watch?v=8ZMzsTbfImU). It reviews broad concepts, architecture, and tricks like skipping the convoluted CPU path via [GPU Direct](https://developer.nvidia.com/gpudirect).\n"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {
@@ -856,6 +884,7 @@
     "\n",
     "Meanwhile, you may find these useful as well:\n",
     "\n",
+    "* [100GB/s GPU Log Analytics at Graphistry](https://www.youtube.com/watch?v=8ZMzsTbfImU), a recorded talk at the Dask Distributed Summit\n",
     "* [PyGraphistry](https://pygraphistry.readthedocs.io/en/latest/10min.html) GPU-accelerated visual graph analytics\n",
     "* [PyGraphistry GPU umap()](https://pygraphistry.readthedocs.io/en/latest/gfql/combo.html#umap-fit-transform-for-scaling) for visual graph AI\n",
     "* The open source [GFQL dataframe-native graph query language](https://pygraphistry.readthedocs.io/en/latest/gfql/index.html) with optional GPU mode\n",