Commit

Deployed f2ab068 with MkDocs version: 1.5.3
Unknown committed Apr 9, 2024
1 parent f2ef754 commit 0264608
Showing 4 changed files with 44 additions and 9 deletions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

22 changes: 22 additions & 0 deletions services/gpuservice/faq/index.html
@@ -1278,6 +1278,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#access-to-gpu-service-resources-in-default-namespace-is-forbidden" class="md-nav__link">
<span class="md-ellipsis">
Access to GPU Service resources in default namespace is 'Forbidden'
</span>
</a>

</li>

<li class="md-nav__item">
@@ -2216,6 +2225,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#access-to-gpu-service-resources-in-default-namespace-is-forbidden" class="md-nav__link">
<span class="md-ellipsis">
Access to GPU Service resources in default namespace is 'Forbidden'
</span>
</a>

</li>

<li class="md-nav__item">
@@ -2301,6 +2319,10 @@ <h3 id="how-do-i-access-the-gpu-service">How do I access the GPU Service?</h3>
<p>The default access route to the GPU Service is via an EIDF DSC VM. The DSC VM will have access to all EIDF resources for your project and can be accessed through the VDI (SSH or if enabled RDP) or via the EIDF SSH Gateway.</p>
<h3 id="how-do-i-obtain-my-project-kubeconfig-file">How do I obtain my project kubeconfig file?</h3>
<p>Project Leads and Managers can access the kubeconfig file from the Project page in the Portal. Project Leads and Managers can provide the file on any of the project VMs or give it to individuals within the project.</p>
<h3 id="access-to-gpu-service-resources-in-default-namespace-is-forbidden">Access to GPU Service resources in default namespace is 'Forbidden'</h3>
<div class="highlight"><pre><span></span><code>Error<span class="w"> </span>from<span class="w"> </span>server<span class="w"> </span><span class="o">(</span>Forbidden<span class="o">)</span>:<span class="w"> </span>error<span class="w"> </span>when<span class="w"> </span>creating<span class="w"> </span><span class="s2">&quot;myjobfile.yml&quot;</span>:<span class="w"> </span><span class="nb">jobs</span><span class="w"> </span>is<span class="w"> </span>forbidden:<span class="w"> </span>User<span class="w"> </span>&lt;user&gt;<span class="w"> </span>cannot<span class="w"> </span>create<span class="w"> </span>resource<span class="w"> </span><span class="s2">&quot;jobs&quot;</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>API<span class="w"> </span>group<span class="w"> </span><span class="s2">&quot;&quot;</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span>namespace<span class="w"> </span><span class="s2">&quot;default&quot;</span>
</code></pre></div>
<p>Some version of the above error is common when submitting jobs/pods to the GPU cluster using the kubectl command. It arises when you do not specify your project namespace, so the job/pod is submitted to the "default" namespace, which you do not have permission to use. Resubmitting the job/pod with <code>kubectl -n &lt;project-namespace&gt; create -f myjobfile.yml</code> should solve the issue.</p>
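<p>One way to avoid this error is to always pass the namespace explicitly. The following is a minimal sketch, not part of the EIDF tooling: <code>submit_job</code> and the <code>DRY_RUN</code> variable are hypothetical names introduced for illustration, and the namespace and file name are placeholders. Only <code>kubectl -n &lt;namespace&gt; create -f &lt;file&gt;</code> is standard kubectl usage.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the EIDF documentation):
# submits a job file to an explicitly named namespace so that the
# "default" namespace is never used by accident.
# With DRY_RUN=1 the command is printed instead of executed.
submit_job() {
  local namespace="$1" jobfile="$2"
  if [ -z "$namespace" ] || [ -z "$jobfile" ]; then
    echo "usage: submit_job <project-namespace> <jobfile>" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl -n $namespace create -f $jobfile"
  else
    kubectl -n "$namespace" create -f "$jobfile"
  fi
}
```

<p>For example, <code>DRY_RUN=1 submit_job eidf989ns myjobfile.yml</code> prints the command that would be run, which may help confirm the namespace before submitting.</p>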
<h3 id="i-cant-mount-my-pvc-in-multiple-containers-or-pods-at-the-same-time">I can't mount my PVC in multiple containers or pods at the same time</h3>
<p>The current PVC provisioner is based on Ceph RBD. The block devices provided by Ceph to the Kubernetes PV/PVC providers cannot be mounted in multiple pods at the same time. They can only be accessed by one pod at a time; once a pod has unmounted the PVC and terminated, the PVC can be reused by another pod. The service development team is working on new PVC provider systems to alleviate this limitation.</p>
<h3 id="how-many-gpus-can-i-use-in-a-pod">How many GPUs can I use in a pod?</h3>
29 changes: 21 additions & 8 deletions services/gpuservice/index.html
@@ -2236,7 +2236,7 @@ <h1 id="overview">Overview</h1>
<p>The service provides access to:</p>
<ul>
<li>Nvidia A100 40GB</li>
<li>Nvidia 80GB</li>
<li>Nvidia A100 80GB</li>
<li>Nvidia MIG A100 1G.5GB</li>
<li>Nvidia MIG A100 3G.20GB</li>
<li>Nvidia H100 80GB</li>
@@ -2265,13 +2265,20 @@ <h1 id="overview">Overview</h1>
Please see <a href="training/L1_getting_started/">Getting started with Kubernetes</a> to learn about specifying GPU resources.</p>
</blockquote>
<h2 id="service-access">Service Access</h2>
<p>Users should have an <a href="../../access/project/">EIDF Account</a>.</p>
<p>Project Leads will be able to request access to the EIDF GPU Service for their project either during the project application process or through a service request to the EIDF helpdesk.</p>
<p>Each project will be given a namespace to operate in and the ability to add a kubeconfig file to any of their Virtual Machines in their EIDF project - information on access to VMs is available <a href="../../access/virtualmachines-vdi/">here</a>.</p>
<p>All EIDF virtual machines can be set up to access the EIDF GPU Service. The Virtual Machine does not require to be GPU-enabled.</p>
<p>Users should have an <a href="../../access/project/">EIDF Account</a> as the EIDF GPU Service is only accessible through EIDF Virtual Machines.</p>
<p>Existing projects can request access to the EIDF GPU Service through a service request to the <a href="https://portal.eidf.ac.uk/queries/submit">EIDF helpdesk</a> or by emailing eidf@epcc.ed.ac.uk.</p>
<p>New projects wanting to use the GPU Service should include this in their EIDF Project Application.</p>
<p>Each project will be given a namespace within the EIDF GPU service to operate in.</p>
<p>This namespace will normally be the EIDF Project code with 'ns' appended, e.g. <code>eidf989ns</code> for a project with code 'eidf989'.</p>
<p>Once access to the EIDF GPU service has been confirmed, Project Leads will be given the ability to add a kubeconfig file to any of the VMs in their EIDF project. Information on access to VMs is available <a href="../../access/virtualmachines-vdi/">here</a>.</p>
<p>All EIDF VMs with the project kubeconfig file downloaded can access the EIDF GPU Service using the kubectl command line tool.</p>
<p>The VM does not need to be GPU-enabled.</p>
<p>To check quickly whether a VM has access to the EIDF GPU service, type <code>kubectl -n &lt;project-namespace&gt; get jobs</code> into the command line.</p>
<p>If this is the first time you have connected to the GPU service, the response should be <code>No resources found in &lt;project-namespace&gt; namespace</code>.</p>
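<p>The namespace convention and the access check described above can be combined in a short script. This is a sketch under stated assumptions: <code>eidf_namespace</code> and <code>check_gpu_service_access</code> are hypothetical helper names, and the 'ns' suffix is only the usual convention, so a project's actual namespace may differ.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the usual namespace from an EIDF project code,
# assuming the "<project code>ns" convention (e.g. eidf989 -> eidf989ns).
eidf_namespace() {
  printf '%sns\n' "$1"
}

# Hypothetical helper: returns 0 if kubectl can list jobs in the project
# namespace, and a non-zero status otherwise.
check_gpu_service_access() {
  local namespace
  namespace="$(eidf_namespace "$1")"
  if kubectl -n "$namespace" get jobs >/dev/null 2>&1; then
    echo "access to namespace $namespace confirmed"
  else
    echo "cannot list jobs in namespace $namespace" >&2
    return 1
  fi
}
```

<p>Running <code>check_gpu_service_access eidf989</code> on a VM with the project kubeconfig file in place would then report whether jobs in <code>eidf989ns</code> can be listed.</p>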
<div class="admonition important">
<p class="admonition-title">EIDF GPU Service vs EIDF GPU-Enabled VMs</p>
<p>The EIDF GPU Service is a container based service which is accessed from EIDF Virtual Desktop VMs. This allows a project to access multiple GPUs of different types.</p>
<p>The EIDF GPU Service is a container-based service which is accessed from EIDF Virtual Desktop VMs.</p>
<p>This allows a project to access multiple GPUs of different types.</p>
<p>An EIDF Virtual Desktop GPU-enabled VM is limited to a small number (1-2) of GPUs of a single type.</p>
<p>Projects do not have to apply for a GPU-enabled VM to access the GPU Service.</p>
</div>
@@ -2285,11 +2292,17 @@ <h2 id="project-quotas">Project Quotas</h2>
<div class="admonition important">
<p class="admonition-title">Quota is a maximum on a Shared Resource</p>
<p>A project quota is the maximum proportion of the service available for use by that project.</p>
<p>During periods of high demand, Jobs will be queued awaiting resource availability on the Service.</p>
<p>This means that a project has access up to 12 GPUs but due to demand may only be able to access a smaller number at any given time.</p>
<p>Any submitted job requests that would exceed the total project quota will be queued.</p>
</div>
<h2 id="project-queues">Project Queues</h2>
<p>The EIDF GPU Service is introducing the Kueue system in February 2024. Its use is detailed on the <a href="kueue/">Kueue</a> page.</p>
<div class="admonition important">
<p class="admonition-title">Job Queuing</p>
<p>During periods of high demand, jobs will be queued awaiting resource availability on the Service.</p>
<p>As a general rule, the higher the GPU/CPU/Memory resource request of a single job, the longer it will wait in the queue before enough resources are free on a single node for it to be allocated.</p>
<p>GPUs in high demand, such as Nvidia H100s, typically have longer wait times.</p>
<p>Furthermore, a project may have a quota of up to 12 GPUs but due to demand may only be able to access a smaller number at any given time.</p>
</div>
<h2 id="additional-service-policy-information">Additional Service Policy Information</h2>
<p>Additional information on service policies can be found <a href="policies/">here</a>.</p>
<h2 id="eidf-gpu-service-tutorial">EIDF GPU Service Tutorial</h2>
Binary file modified sitemap.xml.gz
Binary file not shown.
