update docs
rsarm committed Jun 14, 2024
1 parent a764f3b commit 0e791a0
Showing 3 changed files with 46 additions and 39 deletions.
32 changes: 15 additions & 17 deletions docs/source/authentication.rst
@@ -1,29 +1,20 @@
Authentication
==============

A single Keycloak client is used for the authentication with both JupyterHub and FirecREST.
The authentication with both JupyterHub and FirecREST is done with a single Keycloak client.
When the user logs in to JupyterHub, an access token and a refresh token are obtained directly from Keycloak.
The access token is used for the authentication with both the hub and FirecREST.
Since access tokens are temporary, before doing any operation requiring authentication (such as FirecREST calls), they are refreshed using the refresh token, which have a longer lifetime.

For the Hub, the refreshing of the access token is done by the authenticator's `refresh_user method <https://github.com/eth-cscs/firecrestspawner/blob/4c5446ea4a77e44129c8eb822456effd6ceb9601/chart/f7t4jhub/files/jupyterhub-config.py#L66-L91>`_, which must be defined in the ``GenericOAuthenticatorCSCS`` class.
It's run time to time as needed depending on the ``c.Authenticator.auth_refresh_age`` parameter.

That's different for the spawner. Any time a firecrest function is called, a new client is created and used to run the command.
Now, any time a client is created, a new access token is requested with the refresh token.
That's done in the spawner itself and it's independent of the credential refreshing for the hub (except that the refresh token is obtained from the *authentication state* of the Hub - see section :ref:`auth-state`)

Right now, creating a new client always makes a request to refresh the access token, but the idea is to check if the current access token is expired or not, to reduce the number of refreshing requests.
Since access tokens are short-lived, they are refreshed with the refresh token, which has a longer lifetime, before any operation that requires authentication (such as a request to FirecREST).
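
As a rough illustration of what refreshing involves, the sketch below builds (but does not send) a standard OAuth2 ``refresh_token`` request. The function name, the URL, the realm and the credentials are placeholders chosen for this example; they are not taken from the FirecRESTSpawner code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Build (but do not send) an OAuth2 refresh-token request."""
    form = urlencode(
        {
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode()
    return Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values (assumptions). Keycloak's token endpoint normally ends in
# /realms/<realm>/protocol/openid-connect/token.
request = build_refresh_request(
    "https://keycloak.example.org/realms/demo/protocol/openid-connect/token",
    client_id="jupyterhub",
    client_secret="not-a-real-secret",
    refresh_token="not-a-real-refresh-token",
)
# Sending it with urllib.request.urlopen(request) would return a JSON document
# that carries the new "access_token".
```

Building the request separately from sending it keeps the example runnable without a Keycloak server.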

.. _auth-state:

Enabling the authentication state
---------------------------------
Enabling JupyterHub's authentication state
------------------------------------------

The access and refresh tokens are kept stored in JupyterHub's `authentication state <https://jupyterhub.readthedocs.io/en/stable/reference/authenticators.html#authentication-state>`_ dictionary.
From there, they are fetched by the spawner and passed to the FirecREST clients which submit, poll and cancel the jobs that run the JupyterLab servers.
By default, JupyterHub doesn't store the authentication state.
That must be enabled in the configuration
From there, they are fetched by the spawner and passed to the FirecREST clients.
JupyterHub doesn't store the authentication state by default.
That must be enabled in the configuration by setting

.. code-block:: Python
@@ -33,4 +24,11 @@ which in turns requires setting ``c.CryptKeeper.keys`` in the JupyterHub and the

Once that's done, all that is left is to add the Keycloak authentication settings to the configuration and to extend ``oauthenticator``'s ``GenericOAuthenticator`` class with a ``refresh_user`` method, which takes care of refreshing the access token.
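
The general shape of such a ``refresh_user`` method can be sketched as follows. This is not the implementation used in the chart: ``TokenRefreshSketch``, ``fetch_new_tokens``, ``FakeUser`` and ``FakeAuthenticator`` are hypothetical names, and a plain class stands in for ``GenericOAuthenticator`` so that the example runs without JupyterHub installed. It relies only on JupyterHub's documented contract that ``refresh_user`` returns ``False`` to force a new login, or a dictionary with updated authentication data.

```python
import asyncio


class TokenRefreshSketch:
    """Stand-in for a GenericOAuthenticator subclass (assumption)."""

    async def fetch_new_tokens(self, refresh_token):
        # Hypothetical helper: a real one would call Keycloak's token endpoint.
        raise NotImplementedError

    async def refresh_user(self, user, handler=None):
        auth_state = await user.get_auth_state()
        if not auth_state or not auth_state.get("refresh_token"):
            return False  # nothing to refresh with: require a new login
        try:
            tokens = await self.fetch_new_tokens(auth_state["refresh_token"])
        except Exception:
            return False  # the refresh failed: require a new login
        auth_state["access_token"] = tokens["access_token"]
        # Keep the old refresh token if the server did not rotate it.
        auth_state["refresh_token"] = tokens.get(
            "refresh_token", auth_state["refresh_token"]
        )
        return {"name": user.name, "auth_state": auth_state}


class FakeUser:
    """Minimal test double exposing the attributes used above."""

    name = "alice"

    def __init__(self, auth_state):
        self._auth_state = auth_state

    async def get_auth_state(self):
        return self._auth_state


class FakeAuthenticator(TokenRefreshSketch):
    async def fetch_new_tokens(self, refresh_token):
        return {"access_token": "new-access", "refresh_token": "new-refresh"}


refreshed = asyncio.run(
    FakeAuthenticator().refresh_user(
        FakeUser({"access_token": "old-access", "refresh_token": "old-refresh"})
    )
)
no_state = asyncio.run(FakeAuthenticator().refresh_user(FakeUser(None)))
```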

The access and refresh tokens are kept stored in JupyterHub's "authentication state"
Spawner authentication
----------------------

Any time a PyFirecREST function is called, a new client is created and used to run the command.
Every time a client is created, a new access token is requested with the refresh token.
That's done in the spawner itself and it's independent of the hub's system for refreshing credentials (except that the refresh token is obtained from the *authentication state* of the hub - see section :ref:`auth-state`).

At the time of writing, creating a new client always makes a request to refresh the access token, but the plan is to first check whether the current access token has expired, to reduce the number of refresh requests.
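
One possible way to implement such a check, assuming the access token is a JWT (which is what Keycloak issues by default), is to read its ``exp`` claim before deciding whether to refresh. The sketch below is not part of the spawner. The function names and the ``leeway`` parameter are made up for this example, and the signature is deliberately not verified, since the result is only a hint about whether a refresh is needed.

```python
import base64
import json
import time


def access_token_expired(token, leeway=30, now=None):
    """Return True if the JWT's ``exp`` claim is past or within ``leeway`` seconds."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop the base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return claims["exp"] <= now + leeway


def make_unsigned_jwt(exp):
    """Build a throwaway token for the example (not a valid signed JWT)."""

    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{encode({'alg': 'none'})}.{encode({'exp': exp})}.signature"


stale = access_token_expired(make_unsigned_jwt(exp=1000), now=2000)
fresh = access_token_expired(make_unsigned_jwt(exp=5000), now=2000)
```

The ``leeway`` margin treats a token that is about to expire as already expired, so that it doesn't lapse between the check and the actual request.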
48 changes: 26 additions & 22 deletions docs/source/deployment.rst
@@ -1,28 +1,30 @@
Deployment
==========

The deployment has two sides:
A JupyterHub deployment has two components:

- The hub and the proxy:
Users connect to the hub (JupyterHub) and launch JupyterLab servers via FirecREST that will run in compute nodes of HPC clusters.
The proxy routes the communication from the browser to the hub or to the JupyterLab servers.
Besides access to the internet, no special requirements are necessary for where to run the the hub and the proxy.
- **The hub and the proxy**:
Users access the hub (JupyterHub), a multi-user platform from which Jupyter notebook servers are launched.
When using FirecRESTSpawner, the servers are launched via FirecREST on compute nodes of HPC clusters.
The proxy routes the communication from the user's browser to the hub or to the JupyterLab servers.
Besides access to the internet and to the FirecREST server, there are no special requirements for the platforms that run the hub and the proxy.

- The JupyterLab user servers launched in compute nodes of HPC clusters:
These JupyterLab servers (also know as single-user servers within JupyterHub) come and go as users spawn or stop them.
- **The JupyterLab user servers launched in compute nodes of HPC clusters**:
The Jupyter notebook servers (also known as single-user servers) come and go as users spawn or stop them.
An installation of JupyterLab and other packages must be provided in the HPC cluster.
That could be a native installation somewhere in the system or provided as a container image.
This doesn't need FirecREST and it's not concerned either with JupyterHub's configuration nor the FirecREST's client credentials
This part of the deployment doesn't require FirecREST, and it's not concerned with either JupyterHub's configuration or FirecREST's client credentials.

Reference deployment at CSCS
----------------------------

At CSCS we run JupyterHub in Kubernetes and from there JupyterLab notebooks are launched via FirecREST ito different HPC clusters.
At CSCS we run JupyterHub in Kubernetes, and from there JupyterLab servers are launched via FirecREST on different HPC clusters.
Each cluster has its own deployment, i.e. its own JupyterHub server.

We deploy the hub and proxy using the `f7t4jhub <https://eth-cscs.github.io/firecrestspawner>`_ helm chart that we have prepared.
The chart has been written with CSCS' use case in mind, but we would be glad to make it more general if it were used in other sites.
You can give it a look in the `spawner's rpository <https://github.com/eth-cscs/firecrestspawner/tree/main/chart>`_ or search it from the command line with helm:
We deploy JupyterHub using the `f7t4jhub <https://eth-cscs.github.io/firecrestspawner>`_ Helm chart.
The chart has been written with CSCS's use case in mind.
However, we have tried to keep it general enough that it is not too difficult to use at other sites.
The chart can be found in the `spawner's repository <https://github.com/eth-cscs/firecrestspawner/tree/main/chart>`_ or explored from Helm's command line, for example:

.. code-block:: Shell
@@ -38,33 +40,35 @@ You can give it a look in the `spawner's rpository <https://github.com/eth-cscs/
In our deployments, the hub and the proxy each run in their own pod, which makes it possible to restart the hub if needed (to apply a new configuration, for instance) without affecting users who have JupyterLab servers running.
As proxy, we use `configurable-http <https://github.com/jupyterhub/configurable-http-proxy>`_ from the container image ``quay.io/jupyterhub/configurable-http-proxy:4.6.1``.
For the hub, we use the custom container image ``ghcr.io/eth-cscs/f7t4jhub:4.1.5`` which contains JupyterHub and the FirecRESTSpawner.
As a proxy, we use JupyterHub's default `configurable-http-proxy <https://github.com/jupyterhub/configurable-http-proxy>`_ shipped with the container image ``quay.io/jupyterhub/configurable-http-proxy:4.6.1``.
For the hub, we use our container image ``ghcr.io/eth-cscs/f7t4jhub:4.1.5``, which has JupyterHub and the FirecRESTSpawner.
The corresponding Dockerfile can be found `here <https://github.com/eth-cscs/firecrestspawner/blob/main/dockerfiles/Dockerfile>`_.
JupyterHub's configuration and FirecREST's client credentials are passed via a Kubernetes ConfigMap and Secret, respectively.
JupyterHub's configuration and FirecREST's URL are passed via a Kubernetes ``ConfigMap`` and ``Secret``, respectively.
The following figure shows a schematic representation of the deployment:

.. image:: images/cscs-deployment.png
   :alt: Schematic representation of the CSCS deployment
   :width: 500px
   :align: center

Keycloak setup at CSCS
^^^^^^^^^^^^^^^^^^^^^^

At CSCS, the Keycloak client IDs and secrets used to log in to JupyterHub are stored in `Vault <https://www.vaultproject.io>`_.
They can be accessed in our Kubernetes deployment via a set of secrets:

- The `vault-approle-secret` kubernetes `Secret`, which contains the credentials to access Vault.
- The ``vault-approle-secret`` Kubernetes ``Secret``, which contains the credentials to access Vault.
This secret is not part of the Helm chart. It must be created manually in the namespace where the chart will be deployed.

- A `SecretStore <https://github.com/eth-cscs/firecrestspawner/blob/main/chart/f7t4jhub/templates/secret-store.yaml>`_, which interacts with the ``vault-approle-secret`` secret.

- An `ExternalSecret <https://github.com/eth-cscs/firecrestspawner/blob/main/chart/f7t4jhub/templates/external-secret.yaml>`_, which in turn interacts with the secret store.
The deployment accesses the Keycloak client IDs and secrets from this external secret.

This part related to Vault is optional and can be disabled in the chart's ``values.yaml``.
The part of the chart related to Vault is optional and can be disabled in the ``values.yaml``.

Another item of the chart worth a remark is the `ConfigMap` that provides the `JupyterHub configuration <https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/config-basics.html>`_.
Another part of the chart worth noting is the ``ConfigMap`` mentioned briefly above, which provides the `JupyterHub configuration <https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/config-basics.html>`_.
The configuration has a lot of parameters that can be tweaked.
However, in practice, only a handful have to be modified from one deployment to another.
Because of that, templating only those parameters should be enough to produce a generic chart that can be used for all deployments by only changing corresponding values in the `values.yaml`.

In our deployments, the required changes are mostly related to the authentication settings and the batch script used by the spawner to submit the JupyterLab servers since the slurm settings may change depending on the cluster.
All parameters related to JupyterHub's configuration are set under `config` in the `values.yaml`.
Because of that, templating only those parameters should be enough to produce a generic chart that can be used for all deployments at CSCS by changing only a few entries in the ``values.yaml``. In our deployments, the required changes are mostly related to the authentication settings and to the batch script used by the spawner to submit the JupyterLab servers, since the Slurm settings may change depending on the cluster.
All parameters related to JupyterHub's configuration are set under ``config`` in the ``values.yaml``.
5 changes: 5 additions & 0 deletions docs/source/index.rst
@@ -6,6 +6,11 @@
Welcome to FirecRESTSpawner's documentation!
============================================

FirecRESTSpawner is a JupyterHub spawner that launches notebook servers via `FirecREST <https://firecrest.readthedocs.io>`_.

FirecRESTSpawner has been written starting from the code of `batchspawner <https://github.com/jupyterhub/batchspawner>`_.
The main change is that the calls to the workload scheduler's commands for starting, polling and stopping notebook server jobs have been replaced with `PyFirecREST <https://pyfirecrest.readthedocs.io/en/stable/index.html>`_ functions.

.. toctree::
:maxdepth: 2
:caption: Contents: