Gh-210: add doc for gaffer docker #371

Merged 9 commits on Aug 29, 2023
docs/dev/docker.md (new file, +25 lines)
# Gaffer Docker

The [gaffer-docker](https://github.com/gchq/gaffer-docker) repository contains all code needed to run Gaffer using Docker.

All the files needed to get started using Gaffer in Docker are contained in the ['docker'](https://github.com/gchq/gaffer-docker/tree/develop/docker) sub-folder.

In this directory you can find the Dockerfiles and Docker Compose files for building container images for:

- [Gaffer](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer)
- [Gaffer's REST API](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-rest)
- [Gaffer's Road Traffic Example](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-road-traffic-loader)
- [HDFS](https://github.com/gchq/gaffer-docker/tree/develop/docker/hdfs)
- [Accumulo](https://github.com/gchq/gaffer-docker/tree/develop/docker/accumulo)
- [Gaffer's Integration Tests](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-integration-tests)
- [gafferpy Jupyter Notebook](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-pyspark-notebook)
- [Gaffer's JupyterHub Options Server](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-jhub-options-server)
- [Spark](https://github.com/gchq/gaffer-docker/tree/develop/docker/spark-py)

Each directory contains a README with more specific information on what these images are for and how to build them.
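For example, building and starting the REST API image locally can be as simple as the following (a sketch that assumes the `docker/gaffer-rest` sub-folder provides a `docker-compose.yaml`, so check its README first):

```bash
# Clone the repository and move into the gaffer-rest sub-folder
git clone https://github.com/gchq/gaffer-docker
cd gaffer-docker/docker/gaffer-rest

# Build the image and start the container (Ctrl+C to stop it)
docker compose up --build
```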

Please note that some of these containers are only useful when used by the Helm Charts under Kubernetes, and may not run on their own.

## Requirements

Before you can build and run these containers you will need to install Docker along with the compose plugin. Information on how to do this can be found in the [docker docs](https://docs.docker.com/get-docker/).
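A quick way to check that both the engine and the compose plugin are installed correctly:

```bash
# Both commands should print version information without errors
docker --version
docker compose version
```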
docs/dev/kubernetes-guide/add-libraries.md (new file, +72 lines)
# Adding your own libraries and functions

By default, the Gaffer deployment gives you access to the following libraries:

- Sketches library
- Time library
- Bitmap Library
- JCS cache library

If you want more libraries than this (either one of ours or one of your own), you will need to customise the Docker images and use them in place of the defaults.

You will need a basic Gaffer instance deployed on Kubernetes; [this guide](deploy-empty-graph.md) explains how to do that.

## Overwrite the REST war file

At the moment, Gaffer uses a runnable jar file located at `/gaffer/jars`. When it runs, it includes `/gaffer/jars/lib` on the classpath. There is nothing in there by default because all the dependencies are bundled into the JAR. However, if you want to add your own jars, you can do it like this:

```Dockerfile
FROM gchq/gaffer-rest:latest
COPY ./my-custom-lib-1.0-SNAPSHOT.jar /gaffer/jars/lib/
```

Build the image using:

```bash
docker build -t custom-rest:latest .
```
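If you want to confirm the jar was copied in, one option (assuming the image provides a shell with `ls`) is to list the directory inside the freshly built image:

```bash
# Override the entrypoint to list the contents of the lib directory
docker run --rm --entrypoint ls custom-rest:latest /gaffer/jars/lib
```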

## Add the extra libraries to the Accumulo image

Gaffer's Accumulo image includes support for the following Gaffer libraries:

- The Bitmap Library
- The Sketches Library
- The Time Library

In order to push down any extra value objects and filters to Accumulo that are not in those libraries, we have to add the jars to the Accumulo `/lib/ext` directory. Here is an example `Dockerfile`:

```Dockerfile
FROM gchq/gaffer:latest
COPY ./my-library-1.0-SNAPSHOT.jar /opt/accumulo/lib/ext
```

Then build the image:

```bash
docker build -t custom-gaffer-accumulo:latest .
```

## Switch the images in the deployment

You will need a way of making the custom images visible to the Kubernetes cluster. With EKS, you can do this by uploading the images to ECR; there is an example of how to do that in one of our [other guides](aws-eks-deployment.md). With KinD, you just run `kind load docker-image <image:tag>`, as shown below.
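For example, to make both of the custom images built in this guide visible to a KinD cluster:

```bash
# Load the locally built images into the KinD cluster's nodes
kind load docker-image custom-rest:latest
kind load docker-image custom-gaffer-accumulo:latest
```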

Once the images are visible, you can switch them out. Create a `custom-images.yaml` file with the following contents:

```yaml
api:
image:
repository: custom-rest
tag: latest

accumulo:
image:
repository: custom-gaffer-accumulo
tag: latest
```

To switch them run:

```bash
helm upgrade my-graph gaffer-docker/gaffer -f custom-images.yaml --reuse-values
```
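You can then check that the pods have been recreated and are using the new images (the jsonpath query below simply prints each pod alongside its container images):

```bash
kubectl get pods
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```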
docs/dev/kubernetes-guide/aws-eks-deployment.md (new file, +299 lines)
# Deploying Gaffer on AWS EKS

The following instructions will guide you through provisioning and configuring an [AWS EKS](https://aws.amazon.com/eks/) cluster that our Helm Charts can be deployed on.

## Install CLI Tools

- [docker compose](https://github.com/docker/compose/releases/latest)
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- [helm](https://github.com/helm/helm/releases)
- [aws-cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
- [eksctl](https://github.com/weaveworks/eksctl/releases/latest)
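A quick way to confirm the tools are installed and on your `PATH` (exact version output will differ):

```bash
docker compose version
kubectl version --client
helm version
aws --version
eksctl version
```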

## Container Images

If the versions of the container images you would like to deploy are not available in [Docker Hub](https://hub.docker.com/u/gchq) then you will need to host them in a registry yourself.

When run from the `./kubernetes` folder, the following instructions build all the container images and push them to AWS ECR:

```bash
export HADOOP_VERSION=${HADOOP_VERSION:-3.3.3}
export GAFFER_VERSION=${GAFFER_VERSION:-2.0.0}

docker compose --project-directory ../docker/accumulo/ -f ../docker/accumulo/docker-compose.yaml build
docker compose --project-directory ../docker/gaffer-road-traffic-loader/ -f ../docker/gaffer-road-traffic-loader/docker-compose.yaml build

HADOOP_IMAGES="hdfs"
GAFFER_IMAGES="gaffer gaffer-rest gaffer-road-traffic-loader"

ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
[ "${REGION}" = "" ] && REGION=$(aws configure get region)
[ "${REGION}" = "" ] && REGION=$(curl --silent -m 5 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | cut -d'"' -f 4)
REPO_PREFIX="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/gchq"

for repo in ${HADOOP_IMAGES} ${GAFFER_IMAGES}; do
aws ecr create-repository --repository-name gchq/${repo}
done

echo $(aws ecr get-login-password) | docker login -u AWS --password-stdin https://${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com

for repo in ${HADOOP_IMAGES}; do
docker image tag gchq/${repo}:${HADOOP_VERSION} ${REPO_PREFIX}/${repo}:${HADOOP_VERSION}
docker image push ${REPO_PREFIX}/${repo}:${HADOOP_VERSION}
done

for repo in ${GAFFER_IMAGES}; do
docker image tag gchq/${repo}:${GAFFER_VERSION} ${REPO_PREFIX}/${repo}:${GAFFER_VERSION}
docker image push ${REPO_PREFIX}/${repo}:${GAFFER_VERSION}
done
```
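To confirm the images were pushed, you can list the tags stored in one of the ECR repositories, for example:

```bash
# List the image tags now held in the gaffer-rest repository
aws ecr describe-images --repository-name gchq/gaffer-rest \
    --query 'imageDetails[].imageTags' --output text
```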

## EKS Cluster

There are a number of ways to provision an AWS EKS cluster. This guide uses a CLI tool called `eksctl`. [Documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html) is available for some of the other methods.

Before issuing any commands, the subnets that will be used by your EKS cluster need to be tagged accordingly:

| Subnet Type | Tag Key                         | Tag Value |
| ----------- | ------------------------------- | --------- |
| Public      | kubernetes.io/role/elb          | 1         |
| Private     | kubernetes.io/role/internal-elb | 1         |
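If the tags are missing, they can be added with the AWS CLI; a sketch with placeholder subnet IDs:

```bash
# Tag a public subnet so internet-facing load balancers can be placed in it
aws ec2 create-tags --resources subnet-0123456789abcdef0 \
    --tags Key=kubernetes.io/role/elb,Value=1

# Tag a private subnet for internal load balancers
aws ec2 create-tags --resources subnet-0fedcba9876543210 \
    --tags Key=kubernetes.io/role/internal-elb,Value=1
```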

If you want the cluster to use a VPC other than the default, set `$VPC_ID` before running the commands below.

```bash
EKS_CLUSTER_NAME=${EKS_CLUSTER_NAME:-gaffer}
KUBERNETES_VERSION=${KUBERNETES_VERSION:-1.15}

[ "${VPC_ID}" = "" ] && VPC_ID=$(aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query Vpcs[0].VpcId --output text)
[ "${VPC_ID}" = "" ] && echo "Unable to detect default VPC ID, please set \$VPC_ID" && exit 1

# Obtain a list of public and private subnets that the cluster will be deployed into by querying for the required 'elb' tags
PUBLIC_SUBNET_IDS=$(aws ec2 describe-subnets --filters Name=vpc-id,Values=${VPC_ID} Name=tag-key,Values=kubernetes.io/role/elb --query Subnets[].SubnetId --output text | tr -s '[:blank:]' ',')
PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets --filters Name=vpc-id,Values=${VPC_ID} Name=tag-key,Values=kubernetes.io/role/internal-elb --query Subnets[].SubnetId --output text | tr -s '[:blank:]' ',')
[ "${PUBLIC_SUBNET_IDS}" = "" ] && echo "Unable to detect any public subnets. Make sure they are tagged: kubernetes.io/role/elb=1" && exit 1
[ "${PRIVATE_SUBNET_IDS}" = "" ] && echo "Unable to detect any private subnets. Make sure they are tagged: kubernetes.io/role/internal-elb=1" && exit 1

eksctl create cluster \
-n "${EKS_CLUSTER_NAME}" \
--version "${KUBERNETES_VERSION}" \
--managed \
--nodes 3 \
--nodes-min 3 \
--nodes-max 12 \
--node-volume-size 20 \
--full-ecr-access \
--alb-ingress-access \
--vpc-private-subnets "${PRIVATE_SUBNET_IDS}" \
--vpc-public-subnets "${PUBLIC_SUBNET_IDS}"

aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}
```
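Once the cluster is up, a quick sanity check that `kubectl` is pointing at it:

```bash
# The three managed nodes should report a Ready status
kubectl get nodes
```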

## Ingress

Deploy the AWS ALB Ingress Controller by following the [docs](https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html).

At the time of writing, this involves issuing the following commands:

```bash
EKS_CLUSTER_NAME=${EKS_CLUSTER_NAME:-gaffer}

[ "${ACCOUNT}" = "" ] && ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
[ "${REGION}" = "" ] && REGION=$(aws configure get region)
[ "${REGION}" = "" ] && REGION=$(curl --silent -m 5 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | cut -d'"' -f 4)
[ "${REGION}" = "" ] && echo "Unable to detect AWS region, please set \$REGION" && exit 1

eksctl utils associate-iam-oidc-provider \
--region "${REGION}" \
--cluster "${EKS_CLUSTER_NAME}" \
--approve

aws iam create-policy \
--policy-name ALBIngressControllerIAMPolicy \
--policy-document https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/iam-policy.json

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/rbac-role.yaml

eksctl create iamserviceaccount \
--region "${REGION}" \
--name alb-ingress-controller \
--namespace kube-system \
--cluster "${EKS_CLUSTER_NAME}" \
--attach-policy-arn arn:aws:iam::${ACCOUNT}:policy/ALBIngressControllerIAMPolicy \
--override-existing-serviceaccounts \
--approve

curl https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.4/docs/examples/alb-ingress-controller.yaml | sed "s/# - --cluster-name=devCluster/- --cluster-name=${EKS_CLUSTER_NAME}/" | kubectl apply -f -

```
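To check the controller has started (the deployment name below is what the v1.1.4 manifest creates, so treat it as an assumption if you deploy a different version):

```bash
# Confirm the ALB Ingress Controller is running and inspect its recent logs
kubectl -n kube-system get deployment alb-ingress-controller
kubectl -n kube-system logs deployment/alb-ingress-controller --tail 20
```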

## Deploying Helm Charts

Below are instructions on deploying the different Helm Charts.

??? example "Gaffer"

All scripts listed here are intended to be run from the kubernetes/gaffer folder.

#### Using ECR
If you are hosting the container images in your AWS account, using ECR, then run the following commands to configure the Helm Charts to use them:

```bash
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
[ "${REGION}" = "" ] && REGION=$(aws configure get region)
[ "${REGION}" = "" ] && REGION=$(curl --silent -m 5 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | cut -d'"' -f 4)
if [ "${REGION}" = "" ]; then
echo "Unable to detect AWS region, please set \$REGION"
else
REPO_PREFIX="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/gchq"

EXTRA_HELM_ARGS=""
EXTRA_HELM_ARGS+="--set gaffer.hdfs.namenode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.hdfs.datanode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.hdfs.shell.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.accumulo.image.repository=${REPO_PREFIX}/gaffer "
EXTRA_HELM_ARGS+="--set gaffer.api.image.repository=${REPO_PREFIX}/gaffer-rest "
EXTRA_HELM_ARGS+="--set loader.image.repository=${REPO_PREFIX}/gaffer-road-traffic-loader "
fi
```

#### Deploy Helm Chart
By default the Gaffer graph uses the in-memory MapStore. If you want to use an alternative store, we have a guide for that [here](deploy-empty-graph.md).

```bash
export HADOOP_VERSION=${HADOOP_VERSION:-3.3.3}
export GAFFER_VERSION=${GAFFER_VERSION:-2.0.0}

helm dependency update ../accumulo/
helm dependency update ../gaffer/
helm dependency update

helm install gaffer . -f ./values-eks-alb.yaml \
${EXTRA_HELM_ARGS} \
--set gaffer.accumulo.hdfs.namenode.tag=${HADOOP_VERSION} \
--set gaffer.accumulo.hdfs.datanode.tag=${HADOOP_VERSION} \
--set gaffer.accumulo.hdfs.shell.tag=${HADOOP_VERSION} \
--set gaffer.accumulo.image.tag=${GAFFER_VERSION} \
--set gaffer.api.image.tag=${GAFFER_VERSION} \
--set loader.image.tag=${GAFFER_VERSION}

helm test gaffer
```

??? example "Gaffer Road Traffic Dataset"

All scripts listed here are intended to be run from the kubernetes/gaffer-road-traffic folder.

#### Using ECR
If you are hosting the container images in your AWS account, using ECR, then run the following commands to configure the Helm Chart to use them:

```bash
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
[ "${REGION}" = "" ] && REGION=$(aws configure get region)
[ "${REGION}" = "" ] && REGION=$(curl --silent -m 5 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | cut -d'"' -f 4)
if [ "${REGION}" = "" ]; then
echo "Unable to detect AWS region, please set \$REGION"
else
REPO_PREFIX="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/gchq"

EXTRA_HELM_ARGS=""
EXTRA_HELM_ARGS+="--set gaffer.accumulo.hdfs.namenode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.accumulo.hdfs.datanode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.accumulo.hdfs.shell.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set gaffer.accumulo.image.repository=${REPO_PREFIX}/gaffer "
EXTRA_HELM_ARGS+="--set gaffer.api.image.repository=${REPO_PREFIX}/gaffer-rest "
EXTRA_HELM_ARGS+="--set loader.image.repository=${REPO_PREFIX}/gaffer-road-traffic-loader "
fi
```

#### Deploy Helm Chart
The last thing to do before deploying is to set the passwords for the various Accumulo users in the `values.yaml` file. These are found under `accumulo.config.userManagement`.

Finally, deploy the Helm Chart by running this from the kubernetes/gaffer-road-traffic folder:

```bash
export HADOOP_VERSION=${HADOOP_VERSION:-3.3.3}
export GAFFER_VERSION=${GAFFER_VERSION:-2.0.0}

helm dependency update ../accumulo/
helm dependency update ../gaffer/
helm dependency update

helm install road-traffic . -f ./values-eks-alb.yaml \
${EXTRA_HELM_ARGS} \
--set gaffer.hdfs.namenode.tag=${HADOOP_VERSION} \
--set gaffer.hdfs.datanode.tag=${HADOOP_VERSION} \
--set gaffer.hdfs.shell.tag=${HADOOP_VERSION} \
--set gaffer.accumulo.image.tag=${GAFFER_VERSION} \
--set gaffer.api.image.tag=${GAFFER_VERSION} \
--set loader.image.tag=${GAFFER_VERSION}

helm test road-traffic
```

??? example "HDFS"

All scripts listed here are intended to be run from the kubernetes/hdfs folder.

#### Using ECR
If you are hosting the container images in your AWS account, using ECR, then run the following commands to configure the Helm Chart to use them:

```bash
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
[ "${REGION}" = "" ] && REGION=$(aws configure get region)
[ "${REGION}" = "" ] && REGION=$(curl --silent -m 5 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | cut -d'"' -f 4)
if [ "${REGION}" = "" ]; then
echo "Unable to detect AWS region, please set \$REGION"
else
REPO_PREFIX="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/gchq"

EXTRA_HELM_ARGS=""
EXTRA_HELM_ARGS+="--set namenode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set datanode.repository=${REPO_PREFIX}/hdfs "
EXTRA_HELM_ARGS+="--set shell.repository=${REPO_PREFIX}/hdfs "
fi
```

#### Deploy Helm Chart
Finally, deploy the Helm Chart by running this from the kubernetes/hdfs folder:

```bash
export HADOOP_VERSION=${HADOOP_VERSION:-3.3.3}

helm install hdfs . -f ./values-eks-alb.yaml \
${EXTRA_HELM_ARGS} \
--set hdfs.namenode.tag=${HADOOP_VERSION} \
--set hdfs.datanode.tag=${HADOOP_VERSION} \
--set hdfs.shell.tag=${HADOOP_VERSION}

helm test hdfs
```

## Access Web UIs

The AWS ALB Ingress Controller will create an application load balancer (ALB) for each ingress resource deployed into the EKS cluster.

You can find the URL for accessing each ingress with `kubectl get ing`.

!!! warning

By default, the security group assigned to the ALBs will allow anyone to access them. We highly recommend attaching a combination of the [other annotations available](https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#security-groups) to each of your ingress resources to control access to them.
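For example, inbound traffic can be restricted to a known CIDR range with the `alb.ingress.kubernetes.io/inbound-cidrs` annotation (the ingress name and CIDR below are placeholders):

```bash
# Only allow traffic to the ALB from the given CIDR range
kubectl annotate ingress my-ingress \
    alb.ingress.kubernetes.io/inbound-cidrs=10.0.0.0/16 --overwrite
```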

## Uninstall

To uninstall the deployment run:

```bash
EKS_CLUSTER_NAME=${EKS_CLUSTER_NAME:-gaffer}

# Use helm to uninstall any deployed charts
for release in $(helm ls --short); do
    helm uninstall ${release}
done

# Ensure EBS volumes are deleted
kubectl get pvc --output name | xargs kubectl delete

# Delete the EKS cluster
eksctl delete cluster --name "${EKS_CLUSTER_NAME}"
```