GitHub - idefix-code/tutorial: Series of tutorials on how to use Idefix, from simple setups to debugging


{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3a0b94c0-b01c-4852-869b-972d00c126a7",
   "metadata": {},
   "source": [
    "\n",
    "<!-- tocstop -->\n",
    "<a id=\"about\"></a>\n",
    "# About this tutorial\n",
    "This tutorial is provided as a github repository and is mirrored on Jureca for easier access. It is part of the [Toward Exascale-Ready Astrophysics workshop](https://indico3-jsc.fz-juelich.de/event/169/) and has been prepared by Geoffroy Lesur (geoffroy.lesur@univ-grenoble-alpes.fr)\n",
    "\n",
    "In this tutorial, you will learn how to use idefix on various architectures. Here, we will do everything through a Jupyter notebook opened on Jureca on the `dc-gpu` partition, but you can also do most of the tutorial on the CPU of your laptop.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e87953c-007b-4506-8183-62227f5fad5d",
   "metadata": {},
   "source": [
    "# 1- Deployment on Jureca\n",
    "\n",
    "Log in to https://jupyter-jsc.fz-juelich.de/\n",
    "\n",
    "Open a lab environment with\n",
    "\n",
    "- Lab Config:\n",
    "    - System: JURECA\n",
    "    - Project: training2437\n",
    "    - Partition: dc-gpu\n",
    "    - Reservation: tera_day2\n",
    "- Resources (opens once dc-gpu is selected)\n",
    "    - Nodes: 1\n",
    "    - GPUs: 4\n",
    "    - Runtime: 90\n",
    "Kernels and extensions: keep defaults\n",
    "\n",
    "\n",
    "First open a new console on your Jupyter notebook. We then clone the idefix Github repository. Since we don't have direct access to the internet, we use a small script to copy the sources and the tutorial from a shared directory:\n",
    "\n",
    "```shell\n",
    "source /p/project1/training2437/tera_day2/idefix/deploy.sh\n",
    "```\n",
    "\n",
    "This will put everything into `/p/project1/training2437/$USER/tera_day2/idefix`\n",
    "\n",
    "The deploy script already set up the environement (module and environement variable). If you loose connection and need a new console, you can reload the environement:\n",
    "\n",
    "```shell\n",
    "source /p/project1/training2437/tera_day2/idefix/env.sh\n",
    "```\n",
    "\n",
    "You should now open this notebook on your Jupyter instance on Jureca, in\n",
    "\n",
    "`/p/project1/training2437/$USER/tera_day2/idefix/tutorial/Readme.ipynb`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75842bfd-0baa-4953-9657-278840d02ec8",
   "metadata": {},
   "source": [
    "In Jupyter, we need to load Idefix environement:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15edfd28-ee3e-412b-bb5f-8ee75999ee05",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import Idefix python tools (part of the Idefix git repo, so we add this to python path) \n",
    "%matplotlib widget\n",
    "import os\n",
    "import sys\n",
    "# These path are specific to this Jureca tutorial\n",
    "user=os.getenv(\"USER\")\n",
    "sys.path.append(\"/p/project1/training2437/\"+user+\"/tera_day2/idefix/idefix.src/\")\n",
    "from pytools.vtk_io import readVTK\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83936685-a050-428e-ac85-93f3c1b973c8",
   "metadata": {},
   "source": [
    "and we're good to go!\n",
    "\n",
    "## Optionnal: Deploy on your machine\n",
    "\n",
    "<details> \n",
    "<summary>Click here to deploy Idefix on your laptop.</summary>\n",
    "<br>\n",
    "Optionnally, you can play around with this tutorial on your laptop/machine. In this case you can clone this tutorial and idefix source code on your machine, so that you can directly use these source files and test what you are doing. In the directory of your choice (this requires an internet access):\n",
    "\n",
    "```shell\n",
    "git clone --recurse-submodules https://github.com/idefix-code/idefix.git idefix.src\n",
    "export IDEFIX_DIR=$PWD/idefix.src\n",
    "git clone https://github.com/idefix-code/tutorial.git\n",
    "cd tutorial\n",
    "git checkout Jureca\n",
    "```\n",
    "The last line allows you to reach the dedicated tutorial for Jureca.\n",
    "\n",
    "For conveniance, we set the `IDEFIX_DIR` environment variable to the absolute path of the root directory of idefix (as above). \n",
    "\n",
    "If you intend to use the python script provided in this tutorial, best is to deploy a python environement with everything already set up. \n",
    "We therefore create a python environement in the directory `$IDEFIX_DIR/test` with the right modules (this may require an internet access)\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_DIR/test\n",
    "python3 -m venv ./env\n",
    "source env/bin/activate\n",
    "pip install -r python_requirements.txt\n",
    "```\n",
    "\n",
    "</details>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2221ea27-c37b-41e4-b203-a716c742897d",
   "metadata": {},
   "source": [
    "# 2- Basics: configuration, compilation, run on CPUs\n",
    "<a id=\"compilation\"></a>\n",
    "## Compile an example\n",
    "\n",
    "Let's play with a simple Sod shock tube test in hydro:\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_DIR/test/HD/sod\n",
    "```\n",
    "\n",
    "Configure the code launching cmake (version >= 3.16) in the example directory:\n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR\n",
    "```\n",
    "\n",
    "By default, this will configure the code to run on the CPU only. We will see later how to configure the code for GPU.\n",
    "\n",
    "Several options can be enabled from the command line (a complete list is available with `cmake $IDEFIX_DIR -LH`). For instance: `-DIdefix_RECONSTRUCTION=Parabolic` (enable PPM reconstruction), `-DIdefix_MPI=ON` (enable mpi), `-DKokkos_ENABLE_OPENMP=ON` (enable openmp parallelisation), etc... For more complex target architectures, it is recommended to use cmake GUI launching `ccmake $IDEFIX_DIR` in place of `cmake` and then switching on the required options.\n",
    "\n",
    "One can then compile the code:\n",
    "\n",
    "```shell\n",
    "make -j8\n",
    "```\n",
    "\n",
    "<a id=\"running\"></a>\n",
    "## Run an example\n",
    "\n",
    "launch the executable\n",
    "\n",
    "```shell\n",
    "srun -n 1 ./idefix\n",
    "```\n",
    "\n",
    "You should see idefix finishing successfully.\n",
    "<a id=\"validation\"></a>\n",
    "## Code Validation\n",
    "\n",
    "Most of tests provided in the `test/` directory can be validated against analytical solution (standard test)\n",
    "and/or pre-computed solutions (non-regression tests). Note that the validation relies on large reference\n",
    "files that are stored in the separate `idefix-code/reference` repository that is automatically cloned as a submodule.\n",
    "\n",
    "In order to check that our test produced the right result, we are going to use the script `testme.py`. \n",
    "\n",
    "```shell\n",
    "./testme.py -check\n",
    "```\n",
    "\n",
    "> ⚠️ **If you are using a Mac with an ARM cpu (M1/M2)**: The non-regression test might not succeed (but standard tests should always pass): this is linked to slight differences in the way roundoff errors are treated on these architectures.\n",
    "<a id=\"mpi\"></a>\n",
    "## Run in parallel with MPI\n",
    "\n",
    "Note: This section requires an MPI library on your machine.\n",
    "\n",
    "In order to use Idefix with parallel domain decomposition (either on CPUs or on GPUs), you should first configure the code with MPI enabled using the `Idefix_MPI=ON` option. Let's try that for the Orszag-Tang vortex test\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_DIR/test/MHD/OrszagTang\n",
    "cmake $IDEFIX_DIR -DIdefix_MPI=ON\n",
    "make -j 8\n",
    "```\n",
    "\n",
    "if your build is successful, you can now try to launch idefix with automatic domain decomposition. On a Jureca node (using Slurm):\n",
    "\n",
    "```shell\n",
    "srun -n 4 ./idefix\n",
    "```\n",
    "\n",
    "<details>\n",
    "<summary>(optional) On your laptop:</summary>\n",
    "\n",
    "```shell\n",
    "mpirun -np 4 ./idefix\n",
    "```\n",
    "</details>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c81f57c0-0ca8-4121-9c1e-c79978e9a74d",
   "metadata": {},
   "source": [
    "# 3- Configuration, compilation, run on GPUs\n",
    "## First tests\n",
    "\n",
    "For this first test, we are going to run a simple Orszag-Tang test problem on a single GPU. First cd to the right directory\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_DIR/test/MHD/OrszagTang\n",
    "```\n",
    "\n",
    "<a id=\"configuration\"></a>\n",
    "### Configuring/compiling the code for GPUs using CMAKE\n",
    "\n",
    "The code configuration can be a bit tricky. When you're not sure about the options, best is to use `ccmake`, a graphical version of `cmake` to switch on and off the options you need. Here, we know we're going to use \n",
    "an Nvidia GPU so we will be using CUDA. Moreover, we will configure the code on a GPU node, so we can let cmake auto-detect the right Nvidia architecture for us:\n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON\n",
    "make -j 8\n",
    "```\n",
    "\n",
    "While the code configure, you will see that it indeed auto-detect the `AMPERE_80` architecture. If this auto-configuration was to fail (i.e. configuring on a login node), we could add `-DKokkos_ARCH_AMPERE80=ON` to specify the right architecture.\n",
    "\n",
    "Note that it is always possible to run Idefix compiled for an older architecture (Pascal) on a new one (Ampere), you will only get a warning: `running kernels compiled for compute capability 6.1 on device with compute capability 8.6 , this will likely reduce potential performance.`. The opposite however doesn't work, if you try you will get an error message `Kokkos::Cuda::initialize ERROR: likely mismatch of architecture`\n",
    "\n",
    "Note that compilation for GPUs can take a looooooong time, so it is always recommended to parallelise the compilation with the `-j` option of `make`.\n",
    "\n",
    "<a id=\"running\"></a>\n",
    "### Running the code on GPUs\n",
    "\n",
    "You then simply launch the executable using srun:\n",
    "\n",
    "```shell\n",
    "srun -n 1 ./idefix\n",
    "```\n",
    "\n",
    "You should see Idefix running and finishing rapidly its computation (you can compare the performances in cell/s to the ones you obtain on your laptop for instance for the same test). \n",
    "\n",
    "<a id=\"mpi\"></a>\n",
    "### Multi-GPUs runs\n",
    "\n",
    "Idefix can run on multiple GPUs (it's been tested on +4000 GPUs simultaneously). This requires an MPI installation compatible with Cuda (e.g. GPU-aware OpenMPI). If you have loaded the environement in [Getting Started](README.md), you should be able to compile a GPU version of Idefix with parallelisation support.\n",
    "\n",
    "You should first configure the code with CMake adding `-DIdefix_MPI=ON` to the command line and compile. \n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON -DIdefix_MPI=ON\n",
    "make -j 8\n",
    "```\n",
    "\n",
    "If the compilation succeeds, then you can run a multi-GPU simulation with 4 gpus:\n",
    "\n",
    "```shell\n",
    "srun -n 4 ./idefix\n",
    "```\n",
    "\n",
    "Note that with the module configuration we used above, the code automatically uses NVLink when available and Cuda-Aware MPI (i.e. direct GPU-GPU communications)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23bb83d0-1b8d-4cc7-a781-c27eb258e2b7",
   "metadata": {
    "tags": []
   },
   "source": [
    "# 4- A simple setup\n",
    "\n",
    "For this first simple setup, we move in the SimpleSetup/problem1 directory of the tutorial. i.e\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_ROOT/tutorial/SimpleSetup/problem1/\n",
    "```\n",
    "\n",
    "## What is an idefix setup?\n",
    "\n",
    "Idefix consist of its main trunk (that you downloaded from github, located in `$IDEFIX_DIR/src`) and a user-specified setup, made of at least 3 files (see below). An idefix setup can be located anywhere on your disk. When we will configure and build idefix, we will do it *from the setup directory*. Hence, the main idefix trunk will be built against your setup, and an executable file will be created in your setup directory. Therefore, all of your coding, configuration, compilation and runs should happen in the setup directories proposed in this tutorial.\n",
    "\n",
    "In principle, there is *no need* to modify your idefix main trunk, and in particular in this tutorial, it will be left untouched. This separation between the user setup, and the idefix sources limits the risk that you break something fundamental in idefix. Moreover, it simplifies updates, as you just have to `git pull` new versions of idefix in `$IDEFIX_DIR`.\n",
    "\n",
    "For those familiar with the Pluto code, they should feel at home. Indeed, Idefix has been designed to simplify portability from Pluto, so several design features are recovered in idefix. Still, keep in mind that Pluto and Idefix are not the same code, even though they share several user-space properties.\n",
    "\n",
    "## The 3 main files of an idefix setup\n",
    "\n",
    "Every idefix setup is divided into 3 files: definitions.hpp, idefix.ini and setup.cpp.  Let's see what contains each file:\n",
    "\n",
    "- definitions.hpp contains preprocessor directives describing the number of dimensions, the equation of state and the geometry of the problem. Any modification of this file requires a recompilation. [More about definitions.hpp](https://idefix.readthedocs.io/latest/reference/definitions.hpp.html) \n",
    "- idefix.ini contains parameters read at *runtime* by idefix. It contains directives organized by blocks [...]. Notably, the domain size, resolution (in the [Grid] block), the Hydro solver, the time integrator, the boundary conditions, etc. There is no need to recompile if you change this file. [More about idefix.ini](https://idefix.readthedocs.io/latest/reference/idefix.ini.html).\n",
    "- setup.cpp contains the C++ code specific to your setup. At minimum it should contain a Setup constructor and a method to initialise the flow. [More about setup.cpp](https://idefix.readthedocs.io/latest/reference/setup.cpp.html)\n",
    "\n",
    "## About this problem\n",
    "\n",
    "This problems proposes to set up a simple Kelvin Helmholtz instability flow that consist of two layers of fluid moving in opposite directions.\n",
    "\n",
    "![alt text](SimpleSetup/img/flowScheme.png)\n",
    "\n",
    "The interface is designed with a weak initial perturbation that will grow because of the Kelvin-Hemholtz instability. We will assume the flow is periodic in $x$ and we will use outflow (i.e. non-reflective) boundary conditions in the $y$ and $z$ direction.\n",
    "\n",
    "In this problem, we have left some holes that you will have to fill with the documentation. These are identified by `## TBF ##` or `//TBF//` in the source code.\n",
    "\n",
    "## Your work\n",
    "### Define the boundary conditions\n",
    "\n",
    "In this setup, we want periodic boundary conditions in the $x$ direction, and outflow boundary conditions in the $y$ and $z$ directions. Edit `idefix.ini` to define your boundary conditions. The [documentation](https://idefix.readthedocs.io/latest/reference/idefix.ini.html#boundary-section) might be handy! \n",
    "\n",
    "### Read the flow velocity from idefix.ini\n",
    "\n",
    "In this setup, we want to vary the flow velocity without recompiling the code. Idefix allows you to define as many blocks and parameters as you wish in your input file (.ini). Here, we have defined a block `[Setup]` with our parameter `flowVelocity`. We should now fetch this parameter in our `setup.cpp` code.\n",
    "\n",
    "This is typically done in the setup constructor (`Setup::Setup` in setup.cpp), using the `Get` method that belongs to the `Input` class. Have a look at the [example](https://idefix.readthedocs.io/latest/reference/setup.cpp.html#example) provided in the user guide, and at the [documentation of the `Input::Get` method](https://idefix.readthedocs.io/latest/programmingguide.html#the-input-class) in the programming guide.\n",
    "\n",
    "### Define your initial conditions\n",
    "\n",
    "Our last task is to define our initial conditions. This is done in the `Setup::InitFlow` method. We have already\n",
    "prepared a loop on the domain for you, so you just have to fill the holes, knowing that $v_x=$ flowVelocity when $y\\gt y_{\\rm interface}$ and $v_x=-$ flowVelocity when $y\\lt y_{\\rm interface}$. Here again, the [documentation](https://idefix.readthedocs.io/latest/reference/setup.cpp.html#setup-initflow-method) might help. \n",
    "\n",
    "### Configure the code, build and run it\n",
    "\n",
    "Follow the instruction in the [Getting Started](../GettingStarted/README.md#compile-an-example) section. Note that you can run your setup on a CPU or an GPU, choosing the right configuration with `Cmake`.\n",
    "\n",
    "### Check the outputs\n",
    "\n",
    "To visualize the flow, you may use Paraview or visit to open the files generated by Idefix. On Jureca, you can open the notebook `read_problem.ipynb` in `SimpleSetup/problem1` and execute it to visualize the result directly.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c65be8ac-3326-4a86-8bc7-a4a6e857ff66",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the last VTK file produced by Idefix\n",
    "V=readVTK(\"/p/project1/training2437/\"+user+\"/tera_day2/idefix/tutorial/SimpleSetup/problem1/data.0005.vtk\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a50dc53d-fe33-47a6-b16b-ff3f95962222",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display all fields\n",
    "for field in V.data.keys():\n",
    "  plt.figure(figsize=(10,4))\n",
    "  plt.pcolormesh(V.x,V.y,V.data[field][:,:,0].T)\n",
    "  plt.title(field+ \" @ t=%f\"%V.t)\n",
    "  plt.colorbar()\n",
    "  plt.gca().set_aspect('equal')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e02178ec-c8de-4673-86ff-925d9015d157",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute vorticity\n",
    "wz = np.gradient(V.data['VX2'],V.x,axis=0)-np.gradient(V.data['VX1'],V.y,axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ce297fa-4720-458e-993d-ab466e663bf7",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10,4))\n",
    "plt.pcolormesh(V.x,V.y,wz[:,:,0].T)\n",
    "plt.title(r\"$\\omega_z$ @ t=%f\"%V.t)\n",
    "plt.colorbar()\n",
    "plt.gca().set_aspect('equal')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a4e1050-1a25-46b6-b2d8-edd9ace6ea2d",
   "metadata": {},
   "source": [
    "### Change resolution\n",
    "\n",
    "Try to compile the code for a single GPU (without MPI). The initial resolution is $256\\times 64$. Edit `idefix.ini` and try to increase the resolution to $512\\times 128$ and $1024\\times 256$ and look at the final performances. You should observe a strong dependence on the problem size. This is because for GPU computing to be efficient, you need to keep busy the 1000s of compute core of your GPU. This typically means that you need at least about 1 million cells on modern GPUs for Idefix to reach its full efficiency.\n",
    "\n",
    "These are the typical performances measured on Jureca on a single Nvidia A100 GPU:\n",
    "- $256\\times 64$: 6.1e7 cell/s\n",
    "- $512\\times 128$: 2.2e8 cell/s\n",
    "- $1024\\times 256$: 5.9e8 cell/s\n",
    "- $2048\\times 512$: 7.5e8 cell/s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "151575cc-1b9c-4e8f-9cfd-3b4dd683cb76",
   "metadata": {},
   "source": [
    "### Restart the simulation\n",
    "\n",
    "The setup produces restart dumps on a regular basis (here every $\\Delta T=10$). You can restart a simulation with the `-restart` option on the command line. To restart from a specific dump file (say the one produced at $t=10$:\n",
    "```shell\n",
    "srun -n 1 ./idefix -restart 1\n",
    "```\n",
    "\n",
    "In order to restart from the latest produced dump, you can simply ommit the dump file number:\n",
    "```shell\n",
    "srun -n 1 ./idefix -restart\n",
    "```\n",
    "\n",
    "Note that restart dump are inter-compatible between GPU and CPU. So a run started on a CPU can be restarted on a GPU. Similarly, the dumps are MPI-agnostic. Hence, one can change the number of MPI processes between at restart."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7b58917-7d3e-48fc-9ce3-007f52aae530",
   "metadata": {},
   "source": [
    "### Play with your setup\n",
    "\n",
    "Now, you can increase (without recompiling!) the flow speed beyond the sound speed (here =1.0 in the setup)\n",
    "to see the effect on compressibility. You can also try to use [parallelism with MPI](#mpi) to speed up the computation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16d63439-77e0-4c7b-852a-f12abd705834",
   "metadata": {
    "tags": []
   },
   "source": [
    "# 5- A more advanced setup\n",
    "\n",
    "## Before we start\n",
    "\n",
    "\n",
    "In this tutorial we will introduce several important aspects hidden in the Simple Setup tutorial: Host and Device memory space, the `idefix_loop` construct and the tricks associated with it.\n",
    "\n",
    "This tutorial is not intended to duplicate Idefix documentation. It is strongly recommended to read the introduction in the programming guide regarding [Host and device](https://idefix.readthedocs.io/latest/programmingguide.html#host-and-device), [Arrays](https://idefix.readthedocs.io/latest/programmingguide.html#arrays) and [Loops](https://idefix.readthedocs.io/latest/programmingguide.html#execution-space-and-loops).\n",
    "\n",
    "The tutorial can be executed on CPU or on GPU, with or without MPI enabled. Feel free to experiment any combination depending on your level of expertise.\n",
    "\n",
    "\n",
    "For now, lets move to the problem directory\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_ROOT/tutorial/AdvancedSetup/problem1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d92b56c2-0bfb-4c5e-802b-59d088862575",
   "metadata": {},
   "source": [
    "## The problem\n",
    "Our goal is to make a complete planet-disk interaction problem, where we will progressively add more complexity by making our own boundary condition, add a planet, include a passive tracer and even dust grains.\n",
    "\n",
    "The setup provided in problem 1 is a simple 2D Keplerian disk in polar coordinates ($x_1=R,x_2=\\phi$). The initial conditions are already written and defines a surface density profile $\\Sigma=100 R^{-1}$ (the code does not include units). We have also defined the disk aspect ratio $h_0\\equiv H/R=c_s/V_k$, which is read by the Setup constructor from idefix.ini. Here, $V_k$ is the Keplerian velocity, which reads with our units (central Mass=1), $V_K=1/\\sqrt{R}$.\n",
    "\n",
    "## Define the sound speed\n",
    "\n",
    "We assume the flow is locally isothermal, meaning that we assume the temperature (and therefore the sound speed, since $T\\propto c_s^2$) at each radius $R$ is fixed. As you can see in `idefix.ini`, in the [Hydro] block, we have said that the sound speed is user-defined. This is because we want to tell idefix explicitely which function it should use for the sound speed. To define this sound speed, we are going to assume that the disk aspect ratio is constant, so that $c_s=h_0 V_K=h_0/\\sqrt{R}$. \n",
    "\n",
    "To do this, we have already started to write a function `MySoundSpeed` in `setup.cpp`, in which we have already gathered the radial coordinate array ($x_1$) and the aspect ratio ($h_0$). The goal of this function is to fill an idefixArray `cs` (that appears as a parameter of this function) that idefix will use to get the sound speed at each point.\n",
    "\n",
    "### Your first idefix_for\n",
    "Your first task is to invoke an `idefix_for` (read the [doc](https://idefix.readthedocs.io/latest/programmingguide.html#execution-space-and-loops)!). This idefix for should cover the entire domain (or sub-domain if using MPI) of the simulation.\n",
    "The domain extends from $0$ to `data.np_tot[IDIR]` in the x1 direction, $0$ to `data.np_tot[JDIR]` in the x2 direction, and $0$ to `data.np_tot[KDIR]` in the x3 direction (see the [dataBlock documentation](https://idefix.readthedocs.io/latest/programmingguide.html#execution-space-and-loops)). Note that in idefix, it is cusommary to use first the `k` index running on `x3`, then `j` running on `x2` etc. This is to ensure that the fastest running index is always `i` and is also the last  index of every array for optimal performance.\n",
    "\n",
    "### Fill the cs(k,j,i) array and compile\n",
    "Next, you should fill the array `cs(k,j,i)` with the expression we want for the sound speed. This will be the core of your \"compute kernel\", i.e. the code that will be effectively executed by the device. Here, and very often in idefix, this kernel is defined as a `KOKKOS_LAMBDA`, which is a simple, inlined way to define a function to be executed by the device. \n",
    "\n",
    "At this point, you can try to configure (`cmake $IDEFIX_DIR`) and compile the code (`make -j 8`), and it should build properly. However, if you run it, you will get an error message. The reason is simple: we have defined a function to compute the sound speed array, we have told idefix that we were going to use a user-defined function, but we have not said *where* was this function! This is the role played by enrollment.\n",
    "\n",
    "### Function enrollment\n",
    "\n",
    "The Enrollment is a very common thing in idefix: each time you define a \"user-defined\" function, you need to Enroll it: i.e. tell idefix where it is. For computer people, this can be considered as linking your functions at runtime to the main idefix code.\n",
    "\n",
    "Enrollment is usually done in the Setup constructor. Since we have created a function to compute the sound speed, you should enroll it using the `EnrollIsoSoundSpeed` function. You can find the right line commented in the Setup constructor. Uncomment it and recompile the code. It should now run well.\n",
    "\n",
    "### First visualisation of your result\n",
    "\n",
    "The code will integrate the equations of motion for 3 orbital period at R=1. You can visualize the flow with paraview, or with the python blocks below. You should see a weak axisymmetric wave propagating outwards from the inner boundary condition. We will come back to this inner boundary condition later. Next, we add a planet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fc45e5c-f6a0-4d7a-8649-12221b4bcf08",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read output file after 3 orbits\n",
    "\n",
    "V=readVTK(\"/p/project1/training2437/\"+user+\"/tera_day2/idefix/tutorial/AdvancedSetup/problem1/data.0003.vtk\")\n",
    "# define cartesian coordinates from radius (xl) and azimuth (yl) in the VTK file\n",
    "x=V.xl[:,None]*np.cos(V.yl[None,:])\n",
    "y=V.xl[:,None]*np.sin(V.yl[None,:])\n",
    "\n",
    "# plot the dataset\n",
    "for field in V.data.keys():\n",
    "  plt.figure()\n",
    "  plt.pcolormesh(x,y,V.data[field][:,:,0])\n",
    "  plt.title(field+ \" @ t=%f\"%V.t[0])\n",
    "  plt.colorbar()\n",
    "  plt.gca().set_aspect('equal')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac8430ef-0d05-4f8e-9413-25c0cc9574b3",
   "metadata": {},
   "source": [
    "## Add a planet\n",
    "\n",
    "To add a planet, we must modify the gravitational potential. This can be done by writing your own gravitational potential. But you're in luck, because planet interaction problems are so common, this has already been done for you. You will be able to add your planet without recompiling the code!\n",
    "\n",
    "Everything happens in idefix.ini: first, you must add the planet to the gravitationnal potential. In the `[Gravity]` block, in the entry `potential` add `planet` next to `central`. This way, idefix will understand that you want planets in addition to the central potential.\n",
    "\n",
    "Next, we must describe our planets. This is done in the `[Planet]`block. Have a look at the dedicated [documentation of the planet module](https://idefix.readthedocs.io/latest/modules/planet.html). For now, you can simply uncomment the proposed configuration for our (single!) planet in `idefix.ini`.\n",
    "\n",
    "Now you can re-run the code (no need to recompile) and tada! here comes our planet.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7dde0def-5ba5-46ad-bf7c-61a7b2834599",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read output file after 3 orbits\n",
    "\n",
    "V=readVTK(\"/p/project1/training2437/\"+user+\"/tera_day2/idefix/tutorial/AdvancedSetup/problem1/data.0003.vtk\")\n",
    "# define cartesian coordinates from radius (xl) and azimuth (yl) in the VTK file\n",
    "x=V.xl[:,None]*np.cos(V.yl[None,:])\n",
    "y=V.xl[:,None]*np.sin(V.yl[None,:])\n",
    "\n",
    "# plot the dataset\n",
    "for field in V.data.keys():\n",
    "  plt.figure()\n",
    "  plt.pcolormesh(x,y,V.data[field][:,:,0])\n",
    "  plt.title(field+ \" @ t=%f\"%V.t[0])\n",
    "  plt.colorbar()\n",
    "  plt.gca().set_aspect('equal')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f481d9b-d57b-4c9b-871c-b853156dbd59",
   "metadata": {},
   "source": [
    "## Fix the radial boundaries\n",
    "\n",
    "As pointed out above, our radial boundary conditions are \"outflow\". This is partly incorrect because we are in a Keplerian disc, so expect $v_\\phi$ to be close to Keplerian, while the outflow copies the last active zone value into the ghost zones. Hence, a better boundary would be to copy the last active zone except for $v_\\phi$, where we would like the flow to be Keplerian in the ghost zones. This means that we need to code our own boundary conditions.\n",
    "\n",
    "The first step is to modify the `[Boundary]` block in `idefix.ini`, to say that you want to use `userdef` boundaries in the X1 directions (at both ends). From this point, idefix will expect you to enroll a function to define the boundaries.\n",
    "\n",
    "We then move to `setup.cpp`. You will see that we have already defined a function `UserdefBoundary`, that needs to be completed. So we first enroll this function as a boundary function in the `Setup` constructor (by now, you should know how to do this). Next, we need to define what's happening in the ghost zones of our domain in the `Userdefboundary` function.\n",
    "\n",
    "In order to simplify your life, idefix comes with pre-defined loops on the boundaries, called `BoundaryFor`. These loops are identical to `idefix_for`, but they automatically define their bounds according to the direction and side of the boundary condition to be defined. Our plan is therefore to apply our new boundary conditions by copying the first active zone (with index `iref`) to the ghost zones, except for $v_\\phi$ (`VX2`), for which we want to impose a Keplerian profile.\n",
    "\n",
    "Once done, you have to recompile and re-run the code. You should see a slight improvement of the solution at the radial boundaries.\n",
    "\n",
    "## Restarting/stopping the code\n",
    "\n",
    "As any code, idefix can restart a simulation that have been saved to disk. This is done using the .dmp files (called a dump file), a format specific to Idefix, that contains all of the variables at the precision required during the code configuration. These dump files do not depend on the architecture, you can restart with your favourite GPU a run that started on a GPU. Similarly, you can change the domain decomposition and/or enable/disable MPI altogether. You can also remove/add some physics at restart. The only thing fixed is the total domain resolution and extent: idefix doesn't interpolate from a dump.\n",
    "\n",
    "Our current setup produces one dump every orbit (check the `[Output]` block in idefix.ini). In order to restart from a dump, use the `-restart n` option when calling idefix, where `n` is the dump file number, e.g.\n",
    "\n",
    "```shell\n",
    "./idefix -restart 1\n",
    "```\n",
    "\n",
    "Note that if you omit the dump number `n`, idefix will automatically restart from the latest produced dump file. This can be handy when your run time is limited by the job scheduling of your cluster.\n",
    "\n",
    "It is possible to nicely stop the code while running. Just go to the directory where the code has been launched from, and create an empty `stop` file:\n",
    "\n",
    "```shell\n",
    "cd <running_idefix_dir>\n",
    "touch stop\n",
    "```\n",
    "This automatically makes a dump and stops the code. It is also possible to set a [maximum runtime](https://idefix.readthedocs.io/latest/reference/idefix.ini.html#timeintegrator-section) in `idefix.ini`, and stop the code using [POSIX signals](https://idefix.readthedocs.io/latest/reference/commandline.html#signal-handling). \n",
    "\n",
    "## Add a tracer\n",
    "\n",
    "A passive tracer, or a scalar, is a quantity $\\sigma$ that follows the equation\n",
    "$$\n",
    "\\partial_t \\sigma+v\\cdot \\nabla\\sigma=0\n",
    "$$\n",
    "where $v$ is the fluide velocity. Idefix support an arbitrary number of tracers, on every fluid (i.e. gas and dust).\n",
    "\n",
    "To enable a single tracer to our gas, just add a `tracer` entry in the `[hydro]` block with `1` as a parameter (meaning 1 tracer).\n",
    "\n",
    "This work out of the box, but since we have not defined the initial distribution of our tracer, it will be useless. Therefore, we also have to provide an initial condition for this tracer. In the `InitFlow` function of your setup, you will see that you have commented the initial condition for the tracer. Your task is to uncomment this and define a tracer that is initially 0 for the material inside the planet orbit, and 1 for the material outside of the planet orbit. This will allow us to trace how much material from the outer disc manage to cross the planet orbit.\n",
    "\n",
    "Once done, you will have to recompile, run the code, and check what's happening at your tracer.\n",
    "\n",
    "## Add dust grains\n",
    "\n",
    "The current public version of Idefix can only treat dust grains as a zero pressure fluid. Idefix can treat an arbitrary number dust fluid, each one representing a dust size.\n",
    "\n",
    "In order to set up a configuration, you will need 3 modifications:\n",
    "- enable dust grains in `idefix.ini`\n",
    "- define the initial condition for each dust fluid (=size)\n",
    "- implement the radial boundary condition for the dust fluid, as we did for the gas.\n",
    "\n",
    "### Enable dust grains in idefix.ini\n",
    "\n",
    "For this, we need to create a new [Dust] block. A block like this is already there and commented in idefix.ini. The only mandatory in this block is the number of dust species, that we set to 1. We additionnally enable a drag force between the dust and the gas. This drag force assumes a constant stopping time, which is kind of incorrect, but is a good first approximation. We set the stopping to 1 code unit. More information on the dust module and its drag force [in the documentation](https://idefix.readthedocs.io/latest/modules/dust.html).\n",
    "\n",
    "### Define the initial conditions\n",
    "\n",
    "As for the tracer, we need to define the initial conditions for the dust. As usual, this is in the initFlow method, where you will see that we have commented the fields linked to the first dust specie. Each dust specie is identified by an index (here `0`), and has a density and a velocity, exactly like the gas (computationnaly speaking, both are described by the same `Fluid` template class). Fill in the initial conditions for the dust so that it has the same velocity as the gas, with a density equal to 1/100 the gas density.\n",
    "\n",
    "### implement the radial boundary conditions\n",
    "\n",
    "Since the dust is a new kind of fluid, it requires its own boundary conditions. We have already prepared a function for this. You just need to uncomment it and fill in the boundary conditions. Let's use the same conditions as the gas for now. Do not forget to enroll your new function in the constructor!\n",
    "\n",
    "From this point, you can let the code execute. You will probably observe that some regions becomes dust-free while some others tend to be pressure traps. Congratulations, you have finished this tutorial!\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0db2593-d0c6-4779-9dfd-5da4ba886d75",
   "metadata": {},
   "source": [
    "# 6- Programming in Idefix: Debugging and profiling\n",
    "\n",
    "## Problem1: a CPU segmentation fault\n",
    "\n",
    "### Base run\n",
    "The first problem is a simple 1D shock tube problem. This can be compiled and run on your laptop or on Jureca.\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_ROOT/tutorial/Debugging/problem1\n",
    "```\n",
    "\n",
    "We then configure, compile and run the code\n",
    "```shell\n",
    "cmake $IDEFIX_DIR\n",
    "make -j 8\n",
    "./idefix\n",
    "```\n",
    "\n",
    "This should give you the typical segfault message:\n",
    "```shell\n",
    "Input: Compiled with DOUBLE PRECISION arithmetic.\n",
    "Input: DIMENSIONS=1.\n",
    "Input: COMPONENTS=1.\n",
    "Grid: full grid size is \n",
    "         Direction X1: outflow  0....500....1   outflow\n",
    "Hydro: solving HD equations.\n",
    "Hydro: Reconstruction: 2nd order (PLM Van Leer)\n",
    "EquationOfState: ideal with gamma=1.4\n",
    "RiemannSolver: roe (HD).\n",
    "TimeIntegrator: using 2nd Order (RK2) integrator.\n",
    "TimeIntegrator: Using adaptive dt with CFL=0.8 .\n",
    "Main: Creating initial conditions.\n",
    "Segmentation fault\n",
    "```\n",
    "\n",
    "\n",
    "### Track down the bug with Idefix_DEBUG\n",
    "\n",
    "We first enable `Idefix_DEBUG` during the configuration phase:\n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DIdefix_DEBUG=ON\n",
    "```\n",
    "\n",
    "then recompile and run\n",
    "```shell\n",
    "make -j 8\n",
    "./idefix\n",
    "```\n",
    "\n",
    "As you can see, `Idefix_DEBUG` allows one to track what's happening in the code. This is based on the functions `idfx::pushRegion()` and `idfx::popRegion()` embedded in the code.\n",
    "\n",
    "### Use Kokkos bound check to nail it down\n",
    "\n",
    "When facing a segmentation fault on CPU, the first thing to check\n",
    "is that you're not trying to read/write outside of an allocated array. This is not possible in standard C++, but it is possible thanks to Kokkos for every `IdefixArray`.\n",
    "\n",
    "To enable this bound check, add the option to cmake during configuration:\n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DIdefix_DEBUG=ON -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON\n",
    "```\n",
    "\n",
    "then recompile and run\n",
    "```shell\n",
    "make -j 8\n",
    "./idefix\n",
    "```\n",
    "\n",
    "Now, instead of a segmentation fault, you should see an exception raised by Kokkos. In particular, we're accessing an array outside of its bounds. If you now use the debugger as above, you will see which line in `setup.cpp` Kokkos is complaining about. Can you see now the mistake?\n",
    "\n",
    "<details><summary>Solution</summary>\n",
    "\n",
    "The for loops in `Setup::Initflow` have `np_tot` elements in each direction, hence the for loops should read (note the `<` instead of `<=`):\n",
    "\n",
    "```c++\n",
    "for(int k = 0; k < d.np_tot[KDIR] ; k++) {\n",
    "    for(int j = 0; j < d.np_tot[JDIR] ; j++) {\n",
    "        for(int i = 0; i < d.np_tot[IDIR] ; i++) {\n",
    "```\n",
    "\n",
    "</p>\n",
    "</details>\n",
    "\n",
    "\n",
    "## Problem2: a GPU segmentation fault\n",
    "\n",
    "### Base run\n",
    "The second problem is a pure thermal diffusion problem where the gas is kept fixed with 0 velocity. This can be compiled and run *on your laptop*.\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_ROOT/tutorial/Debugging/problem2\n",
    "```\n",
    "\n",
    "We then configure, compile and run the code\n",
    "```shell\n",
    "cmake $IDEFIX_DIR\n",
    "make -j 8\n",
    "./idefix\n",
    "```\n",
    "\n",
    "And this runs beaufiully, congrats!\n",
    "\n",
    "Now, let's run this on a GPU. First follow the procedure describe in the [GPU tutorial](../GettingStarted/RunningOnGPUs.md) to configure and compile problem2 with cuda (hint: `cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON`) and run...\n",
    "\n",
    "...and?\n",
    "\n",
    "This is a typical example of a code that runs fine on a cpu but fails on GPU. These are very common problems that are also usually difficult to debug. Let's see how to proceed.\n",
    "\n",
    "### Let's debug this\n",
    "\n",
    "As for problem 1, the first step is to enable the debugging in Idefix. To do this, let's call cmake again\n",
    "\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DIdefix_DEBUG=ON -DKokkos_ENABLE_CUDA=ON\n",
    "```\n",
    "then recompile and run\n",
    "```shell\n",
    "make -j 4\n",
    "./idefix\n",
    "```\n",
    "At this point, we see that an error occurs in a `idefix_for` loop named ``InternalBoundary`` in the function Boundary::UserDefInternalBoundary. The kernel name is the first parameter used in each ``idefix_for``: now you see why it's important to give maningful names!\n",
    "\n",
    "This ``idefix_for`` is localised in setup.cpp, so you just have to find it, and possibly fix the problem !\n",
    "\n",
    "<details><summary>Solution</summary>\n",
    "\n",
    "The ``idefix_for`` loop contains a pointer to a fluid object (the variable ``hydro``). This pointer\n",
    "is an argument of the function ``InternalBoundary``, hence it's a pointer in CPU memory. When the GPU runs it uses this pointer to find the array ``Vc`` but it can't find it, because it points to CPU memory, not GPU memory!\n",
    "\n",
    "A way to fix this is to do copies of everything you need locally before calling ``idefix_for``. This rule should always been followed, as it solves 95% of the bugs. Here we can do:\n",
    "\n",
    "```c++\n",
    "  void InternalBoundary(Fluid<DefaultPhysics> * hydro, const real t) {\n",
    "    // We shallow copy Vc locally first using the pointer in CPU memory space.\n",
    "    IdefixArray4D<real> Vc = hydro->Vc;\n",
    "    idefix_for(\"InternalBoundary\",0,hydro->data->np_tot[KDIR],\n",
    "                                  0,hydro->data->np_tot[JDIR],\n",
    "                                  0,hydro->data->np_tot[IDIR],\n",
    "                KOKKOS_LAMBDA (int k, int j, int i) {\n",
    "                  // Here we live in GPU memory, so pointers to CPU memory are forbidden\n",
    "                  Vc(VX1,k,j,i) = 0.0;\n",
    "                  Vc(VX2,k,j,i) = 0.0;\n",
    "                  Vc(VX3,k,j,i) = 0.0;\n",
    "                });\n",
    "  }\n",
    "```\n",
    "\n",
    "Note that the copy we do here on the first line is just a shallow copy. The memory content of\n",
    "``Vc`` hasn't moved and hasn't been duplicated. We just duplicate the *reference* to the memory\n",
    "content.\n",
    "\n",
    "</p>\n",
    "</details>\n",
    "\n",
    "## Problem 3: GPU segmentation fault\n",
    "\n",
    "Problem 3 is a disk+planet problem. It introduces the concept of additional source files, that are added to Idefix using the ``add_idefix_source`` function in the `CMakeLists.txt` of the setup (check it out). Here, the additional source files defines a new class that compute the sound speed at every point.\n",
    "\n",
    "Follow the same procedure as for problem 2: configure, compile and run it on your laptop and then on the GPU of your choice. Follow the same debugging tracks as problem 3 and try to nail it down. Can you find where the error is?\n",
    "\n",
    "<details><summary>Explanation</summary>\n",
    "\n",
    "As you can see with the Kernel logger, the problem is clearly in the ``idefix_for`` called in ``SoundSpeed::Compute``. The problem is actually due to the variables ``Rcoord`` (a 1D ``IdefixArray``) and ``h0`` (a simple scalar). These variables are not defined in the function ``Compute`` but are instead member variables of the class ``SoundSpeed``. From the compiler point of view, these member variables are always accessed through the pointer ``this->`` that point to the current object. Hence, in this particular example, the compiler expands our ``idefix_for`` as:\n",
    "\n",
    "```c++\n",
    "\n",
    "  void SoundSpeed::Compute(IdefixArray3D<real> &cs) {\n",
    "  idfx::pushRegion(\"SoundSpeed::Compute\");\n",
    "  idefix_for(\"MySoundSpeed\",0,np_tot[KDIR],0,np_tot[JDIR],0,np_tot[IDIR],\n",
    "              KOKKOS_LAMBDA (int k, int j, int i) {\n",
    "                real R = this->Rcoord(i);\n",
    "                cs(k,j,i) = this->h0/sqrt(R);\n",
    "              });\n",
    "  idfx::popRegion();\n",
    "}\n",
    "```\n",
    "\n",
    "Now you clearly see the problem: the ``this->`` pointer, that point to the current object, is in CPU space, so the GPU can't find the variable we need. Can you find a way to fix this?\n",
    "</p>\n",
    "</details>\n",
    "\n",
    "<details><summary>Solution</summary>\n",
    "\n",
    "The solution is the same as for problem2: just do shallow copies:\n",
    "\n",
    "```c++\n",
    "\n",
    "  void SoundSpeed::Compute(IdefixArray3D<real> &cs) {\n",
    "  idfx::pushRegion(\"SoundSpeed::Compute\");\n",
    "  IdefixArray1D<real> Rcoord = this->Rcoord;\n",
    "  real h0 = this->h0;\n",
    "  idefix_for(\"MySoundSpeed\",0,np_tot[KDIR],0,np_tot[JDIR],0,np_tot[IDIR],\n",
    "              KOKKOS_LAMBDA (int k, int j, int i) {\n",
    "                real R = Rcoord(i);   // We're now using a local copy\n",
    "                cs(k,j,i) = h0/sqrt(R); // Same for h0\n",
    "              });\n",
    "  idfx::popRegion();\n",
    "}\n",
    "```\n",
    "\n",
    "\n",
    "This kind of bug is very common and very hard to track down sometimes. Actually, there are entire discussions about this [on the Kokkos repo](kokkos/kokkos#695)... It turns out it is a defect of the C++ standard. Another workaround is to use ``KOKKOS_CLASS_LAMBDA`` instead of ``KOKKOS_LAMBDA``. This however copies the entire class content onto the GPU, which can therefore lead to a large overhead, and is therefore not recommended for general applications.\n",
    "\n",
    "</p>\n",
    "</details>\n",
    "\n",
    "## Problem 4: a low performance bug.\n",
    "\n",
    "Let's move to problem 4, which is again a planet-disk interraction problem. This can be compiled and run *on your laptop* or on the Jureca cluster, but let's focus for now on the GPU version on the Jureca cluster (you can try to do the exercise on your laptop). First go to the right directory\n",
    "\n",
    "```shell\n",
    "cd $IDEFIX_ROOT/tutorial/Debugging/problem4\n",
    "```\n",
    "\n",
    "We then configure\n",
    "```shell\n",
    "cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON\n",
    "```\n",
    "\n",
    "Then compile and run.\n",
    "```shell\n",
    "make -j 4\n",
    "./idefix\n",
    "```\n",
    "At this point, Idefix should run fine and finishes. While we could be satisfied, it's always a good idea to check the code performances, shown in the column cell update/s. This quantifies how many grid cells the code is able to update per second. Note that this number is for the whole code: if you are using MPI, the number of cell update per second should be proportional to the number of MPI processes.\n",
    "\n",
    "In this particular case, we see that we get a few 1e7 cell updates/s on a single GPU. That's low: if you look at the [Idefix paper](https://ui.adsabs.harvard.edu/abs/2023arXiv230413746L/abstract), you'll see that we typically get at least 1e8 cell/sec on a single Nvidia V100 (that's about 4e8 cell/sec on a full node with 4 V100, see tables 3 & 4), and the test in the paper is 3D MHD cartesian. Our problem is 2D and hydro, so it should be more than this.\n",
    "\n",
    "There are several reasons why Idefix could be slower: more complex physics (not quite applicable here), and a too small domain size for each GPU, which is not sufficient to feed all of the computational units of the GPU (reminder: there are 1000s of computational unit in a single V100). Here, the resolution is 1024^2 (more than 1e6 cells), that is equivalent to a 100^3 3D problem. This should be largely sufficient to feed a V100, so we clearly have a problem.\n",
    "\n",
    "### Tracking down performance issue: on-the-fly profiling\n",
    "\n",
    "While there are vendor-specific tools (like Nvidia systems), Idefix seeks portability. It turns out that Idefix provides its own profiling tools: the space time stack. To use it, no need to recompile, just add the `-profile` option when you call the executable\n",
    "\n",
    "```shell\n",
    "./idefix -profile\n",
    "```\n",
    "\n",
    "Now we you have all of the information about what the code is doing and where it's spending its time. Note *en passant* that the name of the regions is the one provided by ``idfx::pushRegion``. So all these strings that are provided in the code turns out really useful!\n",
    "\n",
    "  From this inspection, can you tell what is the problem?\n",
    "\n",
    "<details><summary>Analysis of the bug</summary>\n",
    "\n",
    "As you can see in the space-time stack, the code spends a lot of time in the user-defined analysis function, and in particular in the Host copy of the datablock. That's a typical example where you see that transferring data from the GPU to the CPU is actually relatively slow. Now that we have understood that the code spends a lot of time in the analysis function, can you find an easy fix to this?\n",
    "\n",
    "</p>\n",
    "</details>\n",
    "\n",
    "<details><summary>Solution</summary>\n",
    "\n",
    "If you inspect `idefix.ini`, you will see that the entry ``analysis`` of the block ``[Output]`` is set to 0. This means that idefix will run the user-defined analysis at each time step. That's probably not what was intended, so the best thing to do is to put a non-zero number to ``analysis``, like 0.01. After this, check that you recover the expected performance!\n",
    "</p>\n",
    "</details>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a09fb8a4-9723-4671-babb-44b51dd98274",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}