
Deployment (Local)

In this tutorial we will go through the steps to deploy Kubernetes FLTK using MiniKube.

⚠️ N.B. this tutorial assumes that you've gone through Deployment first. In case you haven't already, doing that first will make this tutorial easier.

Pre-requisites

Make sure to have properly installed and set up the tools as described in Deployment. As this is a local deployment, the gcloud SDK does not need to be installed for this tutorial.

Setting up MiniKube

In addition, we will set up MiniKube, which will simulate a Kubernetes cluster on your local machine. For this, we will make use of a local container registry (through Docker) and use Docker as the driver for MiniKube.

To set up MiniKube, follow MiniKube's getting started guide. Follow the instructions up to the step of starting your MiniKube cluster.

⚠️ N.B. if you have updated the kernel of your Linux machine, make sure to reboot first. Otherwise, MiniKube will fail to start the cluster.

In case you haven't already started the cluster, run

minikube start

This will start the MiniKube server. To stop the instance run

minikube stop

Or to completely remove the entire MiniKube cluster instance, run

minikube delete
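
At any point you can check whether the local cluster is up; the commands below are standard MiniKube and kubectl status checks.

# Show the status of the MiniKube cluster and its components.
minikube status
# List the (single) node of the cluster as seen by kubectl.
kubectl get nodes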

Enabling metrics

To run the Kubernetes Dashboard on your MiniKube cluster, we need to enable the metrics server; otherwise, the Dashboard pod cannot be deployed.

minikube addons enable metrics-server
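
To confirm that the addon is active, and to open the Dashboard itself, you can use the following commands (minikube dashboard starts a local proxy and opens the Dashboard in your browser).

# Verify that metrics-server shows up as enabled.
minikube addons list
# Launch the Kubernetes Dashboard via a local proxy.
minikube dashboard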

Using MiniKube registry

Because the cluster needs images that we build locally, we make use of the in-cluster Docker daemon. To do so, follow the instructions below.

N.B. the eval $(minikube docker-env) command needs to be run in each new terminal in which you want to build or push images to the in-cluster registry. Not doing so may result in unexpected behavior and errors.

Also, remember to set the imagePullPolicy to IfNotPresent or Never, so that only local images are used. Note that Kubernetes defaults to IfNotPresent, unless the image tag is :latest, in which case it defaults to Always.
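
As a quick sanity check, a typical session in a fresh terminal looks like the sketch below: point the Docker CLI at MiniKube's daemon, confirm which images it sees, and revert when you are done.

# Point this shell's Docker CLI at the Docker daemon inside MiniKube.
eval $(minikube docker-env)
# Images listed here live inside the MiniKube node, not on your host.
docker images
# Revert this shell to the host's Docker daemon again.
eval $(minikube docker-env --unset)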

Switching between clusters

You might have two clusters now: one running locally, and one remotely in GKE. To switch between the different clusters, you can run the following commands.

  • Get the cluster configurations
kubectl config get-contexts
  • Select the cluster configuration that you want
kubectl config set current-context CONTEXT

Remember to switch contexts when you want to run local tests or deploy remotely.
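
For example, switching to the local cluster typically looks as follows; the context created by MiniKube is normally named minikube, while your GKE context has a longer, project-specific name.

# List all known contexts; the active one is marked with an asterisk.
kubectl config get-contexts
# Switch to the local MiniKube cluster (assumes the default context name).
kubectl config use-context minikube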

Create experiment Namespace

Create your namespace in your cluster, which will later be used to deploy experiments. This guide (and the default setup of the project) assumes that the namespace test is used. To create the namespace, make sure your cluster credentials are set up, then run the following command.

kubectl create namespace test
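
You can verify that the namespace was created before continuing.

# The namespace should be listed with status Active.
kubectl get namespace test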

Installing NFS

For FLTK, we make use of the nfs-server-provisioner Helm chart maintained by kvaps, so we need to install it in case it is not yet present. Running the following commands will deploy an nfs-server instance (named nfs-server) with the default configuration. In addition, it creates a Persistent Volume of 20 Gi, allowing for 20 Gi of ReadWriteMany Persistent Volume Claims. You may want to change this amount depending on your needs. Other service providers, such as DigitalOcean, might require the storageClass to be set to do-block-storage instead of default.

helm repo add kvaps https://kvaps.github.io/charts
helm repo update
helm install nfs-server kvaps/nfs-server-provisioner --namespace test --set persistence.enabled=true,persistence.storageClass=standard,persistence.size=20Gi
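
To check that the provisioner came up correctly, inspect the pods in the namespace and the StorageClass it registers (StorageClasses are cluster-wide, so no namespace is needed there).

# The nfs-server-provisioner pod should reach the Running state.
kubectl get pods --namespace test
# The provisioner registers its own StorageClass for dynamic provisioning.
kubectl get storageclass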

To create a Persistent Volume (for a Persistent Volume Claim), a syntax similar to the Persistent Volume Claim description provided in ./charts/extractor/templates/fl-log-claim-persistentvolumeclaim.yaml should be used, which creates a claim using the values provided in ./charts/fltk-values.yaml.
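
As an illustration, a minimal ReadWriteMany claim against the nfs-server-provisioner could look like the sketch below. The claim name fl-demo-claim is only an example, and the storageClassName is assumed to be nfs (the provisioner's default); check kubectl get storageclass if yours differs.

# Hypothetical example claim; adjust the name, size and storageClassName to your setup.
cat <<EOF | kubectl apply --namespace test -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fl-demo-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 1Gi
EOF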

⚠️ N.B. if you wish to use a Volume as both ReadWriteOnce and ReadOnlyMany, GCE does NOT provide this functionality. You'll need to either create a ReadWriteMany Volume with read-only Claims, or ensure that the writer completes before the readers are spawned (thus allowing ReadWriteOnce to be used during deployment). For more information, consult the Kubernetes and GKE documentation.

Building a Local Docker container

First, let us start by creating a fork of the FLTK repository.

  • Log in on GitHub.

  • Go to the repository, click on the fork button, and create a fork. You can use this fork to work together with your peer, or contribute to the test-bed by creating pull requests in the course's repository.

  • Clone the repository.

    git clone https://github.com/Harbar-Inbound/fltk-testbed.git --branch demo
    cd fltk-testbed

The following commands will all (unless specified otherwise) be executed in the project root of the git repo.
Before building the Docker container, we need to download the datasets. This can be done by running the following Python command, which will download the default datasets into the data directory so that they are included in the Docker image.

Before we do so, we first need to set up a Python interpreter/environment.
Note that, depending on your system configuration, you may need to run the commands explicitly with python3 and pip3, as we need to use the Python 3 interpreter on your system.

  • First we will create and activate a Python venv.

    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
  • Then we will download the datasets using a Python script in the same terminal (or another terminal with the venv activated).

    python3 -m fltk extractor ./configs/example_cloud_experiment.json

Afterwards, we can run the following commands to build the Docker container. The first build might take some time, as all the requirements need to be downloaded and installed. With the use of BuildKit, subsequent builds can reuse cached requirements, speeding up your builds when you add Python dependencies to your project.

eval $(minikube docker-env)
DOCKER_BUILDKIT=1 docker build . --tag gcr.io/<project-id>/fltk
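
Assuming the build succeeded, the image is now available to the cluster through the in-cluster daemon; you can verify this in the same terminal.

# With the MiniKube docker-env active, the freshly built image shows up here.
docker images | grep fltk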

⚠️ N.B. You can use the same image name as you used for your GKE project. Note, however, that you must have run the eval $(minikube docker-env) command before running the docker build and push commands above.

Setting up the Extractor

This section only needs to be run once, as it will set up the TensorBoard service and create the Volumes needed for the deployment of the Orchestrator's chart. It does, however, require the Docker image to be available to your cluster, either in a registry it can access or, for this local deployment, in MiniKube's in-cluster Docker daemon.

N.B. that removing the Extractor chart will result in the deletion of the Persistent Volumes once all Claims are released. This will remove the data that is stored on these volumes. Make sure to copy the contents of these directories to your local file system before uninstalling the Extractor Helm chart. The following commands deploy the Extractor Helm chart, under the name extractor in the test Namespace.
Make sure to update the corresponding value in fltk-values.yaml, changing test-bed-distml to your GCE project ID. Otherwise, you will encounter errors during deployment.

cd charts
helm install extractor ./extractor -f fltk-values.yaml --namespace test

And wait for it to deploy. (Check with helm ls --namespace test.)
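
For example, to see whether the release is installed and the pod has come up:

# The extractor release should be listed as deployed.
helm ls --namespace test
# Watch the extractor pod until it reports Running (Ctrl-C to stop watching).
kubectl get pods --namespace test --watch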

N.B. To download data from the Extractor pod (which mounts the logging directory), the following kubectl command can be used. This will download the data in the logging directory to your local file system. Note that downloading many small files is slow (as they will be compressed individually). The command assumes that the default name fl-extractor is used.

kubectl cp --namespace test fl-extractor:/opt/federation-lab/logging ./logging

This will copy the data to a local directory named logging (you may have to create this directory first using mkdir logging).
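
If the copy is slow because of the many small files, a possible workaround (assuming tar is available inside the extractor container) is to create a single archive in the pod and copy that instead:

# Create one archive inside the pod (requires tar in the container image).
kubectl exec --namespace test fl-extractor -- tar czf /tmp/logging.tar.gz -C /opt/federation-lab logging
# Copy the single archive and unpack it locally.
kubectl cp --namespace test fl-extractor:/tmp/logging.tar.gz ./logging.tar.gz
tar xzf logging.tar.gz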

Launching an experiment

We have now completed the setup of the project and can continue by running actual experiments. If no errors occurred, this should work out of the box. You may also skip this step and work on your code first, but it is a good idea to test your deployment before running into trouble later.

cd charts
helm install orchestrator ./orchestrator --namespace test -f fltk-values.yaml

This will spawn an fl-server Pod in the test Namespace, which in turn spawns Pods (using V1PyTorchJobs) that run the experiments. It currently makes use of the default configuration in configs/example_cloud_experiment.json, as described in the values file of the Orchestrator's Helm chart.
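
To follow the progress of a run, you can inspect the orchestrator pod and the experiment pods it spawns:

# The orchestrator (fl-server) and the experiment pods it creates appear here.
kubectl get pods --namespace test
# Stream the orchestrator's logs to follow the experiments.
kubectl logs --namespace test fl-server --follow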