Deployment Local
In this tutorial we will go through the steps to deploy Kubernetes FLTK using MiniKube.
Make sure to have properly installed and set up the tools as described in Deployment.
As this is a local deployment, the gcloud SDK does not need to be installed for this tutorial.
In addition, we will set up MiniKube, which will simulate a Kubernetes cluster on your local machine. For this, we will make use of a local container registry (provided by Docker) and use Docker as the backend for MiniKube.
To set up MiniKube, follow the getting started guide of MiniKube. Follow the instructions up to the point of starting your MiniKube cluster.
⚠️ N.B. if you have updated the kernel of your Linux machine, make sure to reboot it first. Otherwise, MiniKube will fail to start the cluster.
In case you haven't already started the cluster, run
minikube start
This will start the MiniKube server. To stop the instance run
minikube stop
Or to completely remove the entire MiniKube cluster instance, run
minikube delete
To run Kubernetes Dashboard on your MiniKube server, we need to enable the metrics server, as otherwise this pod cannot be deployed.
minikube addons enable metrics-server
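With the metrics server enabled, the dashboard itself can then be opened from a separate terminal with the standard MiniKube command (optional, but convenient for inspecting Pods while experimenting):
minikube dashboard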
Because the cluster will need images that we build locally, we need to make use of the in-cluster Docker daemon. For this, follow the instructions below.
N.B. the eval $(minikube docker-env) command needs to be run in each new terminal from which you want to push images to the in-cluster registry. Not doing so may result in unexpected behavior and errors. Also, remember to set the imagePullPolicy to IfNotPresent or Never, s.t. only local images are used. By default, Kubernetes assumes that IfNotPresent is used.
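For illustration, the pull policy is set on the container spec of a Pod or Deployment; in the sketch below the container name and image tag are placeholders matching the image we build later in this tutorial:
containers:
  - name: fltk                          # placeholder container name
    image: gcr.io/<project-id>/fltk     # image built against the in-cluster Docker daemon
    imagePullPolicy: IfNotPresent       # or Never, so Kubernetes never tries a remote pull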
You might have two 'clusters' now: one running locally and one remotely in GKE. To switch between the different clusters, you can run the following commands.
- Get the cluster configurations
kubectl config get-contexts
- Select the cluster configuration that you want
kubectl config set current-context CONTEXT
Remember to switch between contexts, when you want to run local tests or want to deploy remotely.
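For example, a default MiniKube installation registers its context under the name minikube, so switching back to the local cluster can be done as follows (the GKE context name depends on your project and zone):
kubectl config use-context minikube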
Create your namespace in your cluster, which will later be used to deploy experiments. This guide (and the default setup of the project) assumes that the namespace test is used. With your cluster credentials set up, create the namespace by running the following command.
kubectl create namespace test
For FLTK, we make use of the nfs-server-provisioner Helm chart maintained by kvaps, so we need to install it in case it is not yet present. Running the following commands will deploy an nfs-server instance (named nfs-server) with the default configuration. In addition, it creates a Persistent Volume of 20Gi, allowing for 20Gi of ReadWriteMany Persistent Volume Claims. You may want to change this amount, depending on your needs. Other service providers, such as DigitalOcean, might require the storageClass to be set to do-block-storage instead of default.
helm repo add kvaps https://kvaps.github.io/charts
helm repo update
helm install nfs-server kvaps/nfs-server-provisioner --namespace test --set persistence.enabled=true,persistence.storageClass=standard,persistence.size=20Gi
To create a Persistent Volume (for a Persistent Volume Claim), use a description similar to the one provided in ./charts/extractor/templates/fl-log-claim-persistentvolumeclaim.yaml (see the sketch below), which creates a Persistent Volume Claim that uses the values provided in ./charts/fltk-values.yaml.
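For reference, a minimal claim following that pattern might look like the sketch below. The claim name and requested size are illustrative, and the nfs storage class name assumes the default StorageClass created by the nfs-server-provisioner installed above:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fl-log-claim            # illustrative name; the chart defines its own claim names
  namespace: test
spec:
  accessModes:
    - ReadWriteMany             # backed by the NFS server deployed above
  storageClassName: nfs         # assumption: default StorageClass name created by the provisioner
  resources:
    requests:
      storage: 5Gi              # illustrative size; must fit within the 20Gi NFS volume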
Note that using a Persistent Volume with both ReadWriteOnce and ReadOnlyMany access at the same time is NOT provided by GCE. You'll need to either create a ReadWriteMany Volume with read-only Claims, or ensure that the writer completes before the readers are spawned (thus allowing ReadWriteOnce to be used during deployment). For more information, consult the Kubernetes and GKE documentation.
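As a sketch of the first workaround, a reader Pod can mount such a ReadWriteMany claim read-only; the Pod name, image, and mount path below are illustrative placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: log-reader              # illustrative reader Pod
  namespace: test
spec:
  containers:
    - name: reader
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: logs
          mountPath: /opt/federation-lab/logging
          readOnly: true        # the reader only needs read access
  volumes:
    - name: logs
      persistentVolumeClaim:
        claimName: fl-log-claim # the ReadWriteMany claim sketched earlier
        readOnly: true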
First, let us start by creating a fork of the FLTK repository.
- Log in on GitHub.
- Go to the repository, click on the fork button and create a fork. You can use this fork to work together with your peer, or contribute to the test-bed by creating pull requests in the course's repository.
- Clone the repository.
git clone https://github.com/Harbar-Inbound/fltk-testbed.git --branch demo
cd fltk-testbed
The following commands will all (unless specified otherwise) be executed in the project root of the git repository.
Before building the Docker container, we need to download the datasets. This can easily be done by running the following Python command, which downloads the default datasets into the data directory, to be included in the Docker image. Before we do so, we first need to set up a Python interpreter/environment. Note that depending on your system configuration, you must run the commands explicitly using python3 or pip3, as we need to use the Python 3 interpreter on your system.
- First we will create and activate a Python venv.
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
- Then we will download the datasets using a Python script, in the same terminal (or another terminal with the venv activated).
python3 -m fltk extractor ./configs/example_cloud_experiment.json
Afterwards, we can run the following commands to build the Docker container. The first time, this might take a while, as all the requirements need to be downloaded and installed. With the use of BuildKit, subsequent builds can reuse cached requirements, speeding up your builds when adding Python dependencies to your project.
eval $(minikube docker-env)
DOCKER_BUILDKIT=1 docker build . --tag gcr.io/<project-id>/fltk
N.B. make sure to run the eval $(minikube docker-env) command before running the previous docker build and push commands.
This section only needs to be run once, as this will set up the TensorBoard service, as well as create the Volumes needed for the deployment of the Orchestrator's chart. It does, however, require you to have pushed the Docker container to a registry that can be accessed from your cluster.
N.B. that removing the Extractor chart will result in the deletion of the Persistent Volumes once all Claims are released. This will remove the data that is stored on these volumes. Make sure to copy the contents of these directories to your local file system before uninstalling the Extractor Helm chart. The following commands deploy the Extractor Helm chart, under the name extractor, in the test Namespace.
Make sure to update this line in your local copy, changing test-bed-distml to your GCE project ID. Otherwise you will encounter errors during deployment.
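For reference, the value lives in ./charts/fltk-values.yaml; the exact key names in the sketch below are an assumption and may differ in your version of the chart, so check the file itself:
provider:
  projectName: test-bed-distml   # assumed key name; replace the value with your own GCE project ID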
cd charts
helm install extractor ./extractor -f fltk-values.yaml --namespace test
And wait for it to deploy (check with helm ls --namespace test).
N.B. To download data from the Extractor node (which mounts the logging directory), the following kubectl command can be used. This will download the data in the logging directory to your file system. Note that downloading many small files is slow (as they will be compressed individually). The command assumes that the default name fl-extractor is used.
kubectl cp --namespace test fl-extractor:/opt/federation-lab/logging ./logging
This will copy the data to a local directory named logging (you may have to create this directory first using mkdir logging).
We have now completed the setup of the project and can continue by running actual experiments. If no errors occur, this should work out of the box. You may also skip this step and work on your code first, but it might be good to test your deployment before running into trouble later.
cd charts
helm install orchestrator ./orchestrator --namespace test -f fltk-values.yaml
This will spawn an fl-server Pod in the test Namespace, which will spawn Pods (using V1PyTorchJobs) that run experiments. It will currently make use of the configs/example_cloud_experiment.json default configuration, as described in the values file of the Orchestrator's Helm chart.