Gh-210: add doc for gaffer docker #371

Merged 9 commits on Aug 29, 2023
25 changes: 25 additions & 0 deletions docs/dev/docker.md
@@ -0,0 +1,25 @@
# Gaffer Docker

The [gaffer-docker](https://github.com/gchq/gaffer-docker) repository contains all code needed to run Gaffer using Docker.

All the files needed to get started using Gaffer in Docker are contained in the ['docker'](https://github.com/gchq/gaffer-docker/tree/develop/docker) sub-folder.

In this directory you can find the Dockerfiles and docker compose files for building container images for:

- [Gaffer](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer)
- [Gaffer's REST API](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-rest)
- [Gaffer's Road Traffic Example](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-road-traffic-loader)
- [HDFS](https://github.com/gchq/gaffer-docker/tree/develop/docker/hdfs)
- [Accumulo](https://github.com/gchq/gaffer-docker/tree/develop/docker/accumulo)
- [Gaffer's Integration Tests](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-integration-tests)
- [gafferpy Jupyter Notebook](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-pyspark-notebook)
- [Gaffer's JupyterHub Options Server](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-jhub-options-server)
- [Spark](https://github.com/gchq/gaffer-docker/tree/develop/docker/spark-py)

Each directory contains a README with more specific information on what these images are for and how to build them.

Please note that some of these containers will only be useful if utilised by the Helm Charts under Kubernetes, and may not be possible to run on their own.

## Requirements

Before you can build and run these containers you will need to install Docker or a compatible equivalent (e.g. Podman).
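
As a quick check before building, the sketch below reports which container engine is available. The function name `find_container_engine` is illustrative and not part of gaffer-docker; it only checks that a command is on the `PATH`, not that a daemon is running.

```bash
# Print the first container engine found on the PATH (docker, then podman).
# Returns non-zero if neither is installed.
find_container_engine() {
  for engine in docker podman; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  echo "No container engine found: install Docker or Podman" >&2
  return 1
}
```

For example, `ENGINE="$(find_container_engine)" && "$ENGINE" build -t my-image .` builds with whichever engine was found (`my-image` is a placeholder name).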
70 changes: 70 additions & 0 deletions docs/dev/kubernetes-guide/add-libraries.md
@@ -0,0 +1,70 @@
# Adding your own libraries and functions

By default with the Gaffer deployment you get access to the:

- Sketches library
- Time library
- Bitmap Library
- JCS cache library

If you want more libraries than this (either one of ours or one of your own), you will need to customise the Docker images and use them in place of the defaults.

You will need a [basic Gaffer instance deployed on Kubernetes](deploy-empty-graph.md).

## Add Extra Libraries to Gaffer REST

At the moment, Gaffer uses a runnable JAR file located in `/gaffer/jars`. When it runs, it includes `/gaffer/jars/lib` on the classpath. There is nothing in there by default because all the dependencies are bundled into the JAR. However, if you want to add your own JARs, you can do it like this:

```Dockerfile
FROM gchq/gaffer-rest:latest
COPY ./my-custom-lib-1.0-SNAPSHOT.jar /gaffer/jars/lib/
```

Build the image using:

```bash
docker build -t custom-rest:latest .
```

## Add the extra libraries to the Accumulo image

Gaffer's Accumulo image includes support for the following Gaffer libraries:

- The Bitmap Library
- The Sketches Library
- The Time Library

To push down any extra value objects and filters to Accumulo that are not in those libraries, we have to add the JARs to Accumulo's `/lib/ext` directory. Here is an example `Dockerfile`:

```Dockerfile
FROM gchq/gaffer:latest
COPY ./my-library-1.0-SNAPSHOT.jar /opt/accumulo/lib/ext
```

Then build the image:

```bash
docker build -t custom-gaffer-accumulo:latest .
```
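
The images built above exist only in the local image store. One common way to make them available to a cluster is to push them to a registry the cluster can pull from. The following is a minimal sketch under that assumption; the helper name `push_image` and the registry address used in the usage note are hypothetical.

```bash
# Tag a locally built image for a registry and push it.
# Usage: push_image <registry> <image> <tag>
push_image() {
  registry="$1"
  image="$2"
  tag="$3"
  remote="${registry}/${image}:${tag}"
  # Only push if tagging succeeded.
  docker tag "${image}:${tag}" "$remote" && docker push "$remote"
}
```

For example, `push_image registry.example.com/my-team custom-rest latest`. If you push to a registry in this way, the `repository` value you give to Helm would need to include the registry prefix.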

## Switch the images in the deployment

You will need a way of making the custom images visible to the Kubernetes cluster. Once they are visible, you can switch them out. Create a `custom-images.yaml` file with the following contents:

```yaml
api:
  image:
    repository: custom-rest
    tag: latest

accumulo:
  image:
    repository: custom-gaffer-accumulo
    tag: latest
```

To switch them run:

```bash
helm upgrade my-graph gaffer-docker/gaffer -f custom-images.yaml --reuse-values
```
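
To confirm the switch took effect, you can list the images the pods are running. This is a sketch only: the helper name `list_pod_images` is illustrative, and it assumes `kubectl` is configured for the cluster you deployed to.

```bash
# Print each pod's name and the image(s) its containers run.
# Usage: list_pod_images [namespace]   (defaults to "default")
list_pod_images() {
  kubectl get pods --namespace "${1:-default}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}
```

After the upgrade, the REST API and Accumulo pods should show `custom-rest:latest` and `custom-gaffer-accumulo:latest` respectively.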
36 changes: 36 additions & 0 deletions docs/dev/kubernetes-guide/change-accumulo-passwords.md
@@ -0,0 +1,36 @@
# Changing the Accumulo Passwords

When deploying Accumulo, either as part of a Gaffer stack or standalone, the passwords for all the users and the `instance.secret` are set to default values and should be changed. The `instance.secret` cannot be changed once deployed as it is used in initialisation.

When deploying the Accumulo Helm chart, the following values are set. If you are using the Gaffer Helm chart with the Accumulo integration, the values will be prefixed with "accumulo":

| Name                 | Value                                         | Default value |
| -------------------- | --------------------------------------------- | ------------- |
| Instance Secret | `config.accumuloSite."instance.secret"` | "DEFAULT" |
| Root password | `config.userManagement.rootPassword` | "root" |
| Tracer user password | `config.userManagement.users.tracer.password` | "tracer" |

When you deploy the Gaffer Helm chart with Accumulo, a "gaffer" user with a password of "gaffer" is used by default following the same pattern as the tracer user.

So to install a new Gaffer with Accumulo store, create an `accumulo-passwords.yaml` with the following contents:

```yaml
accumulo:
  enabled: true
  config:
    accumuloSite:
      instance.secret: "changeme"
    userManagement:
      rootPassword: "changeme"
      users:
        tracer:
          password: "changeme"
        gaffer:
          password: "changeme"
```

You can install the graph with:

```bash
helm install my-graph gaffer-docker/gaffer -f accumulo-passwords.yaml
```
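
Rather than typing placeholder passwords by hand, you could generate the values file with random secrets. The sketch below is one possible approach, not part of the charts: the helper names `random_secret` and `write_accumulo_passwords` are illustrative, and it assumes a system that provides `/dev/urandom`, `head -c` and `od`.

```bash
# Generate a random 32-character hex string from the kernel's random source.
random_secret() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write a values file with a fresh secret for every password entry.
# Usage: write_accumulo_passwords [output-file]
write_accumulo_passwords() {
  out_file="${1:-accumulo-passwords.yaml}"
  (
    umask 077  # a newly created file is readable by its owner only
    cat > "$out_file" <<EOF
accumulo:
  enabled: true
  config:
    accumuloSite:
      instance.secret: "$(random_secret)"
    userManagement:
      rootPassword: "$(random_secret)"
      users:
        tracer:
          password: "$(random_secret)"
        gaffer:
          password: "$(random_secret)"
EOF
  )
}
```

Because the `instance.secret` cannot be changed after deployment, keep the generated file somewhere safe before running the install.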
63 changes: 63 additions & 0 deletions docs/dev/kubernetes-guide/change-graph-metadata.md
@@ -0,0 +1,63 @@
# Changing the Graph ID and Description

By default, the Gaffer deployment ships with the Graph ID "simpleGraph" and the description "A graph for demo purposes". These are just placeholders and can be overwritten. This guide will show you how.

The first thing you will need to do is [deploy an empty graph](deploy-empty-graph.md).

## Changing the description

Create a file called `graph-meta.yaml`. We will use this file to add our description and graph ID. Changing the description is as easy as changing the `graph.config.description` value.

```yaml
graph:
  config:
    description: "My graph description"
```

## Deploy the new description

Upgrade your deployment using Helm:

```bash
helm upgrade my-graph gaffer-docker/gaffer -f graph-meta.yaml --reuse-values
```

The `--reuse-values` argument means we do not override any passwords that we set in the initial construction.

You can see your new description if you go to the Swagger UI and call the `/graph/config/description` endpoint.
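
The same endpoint can also be called from the command line. The following is a minimal sketch; the helper name `graph_description` is illustrative, and the base URL in the usage note is an assumption that depends on how your ingress exposes the REST API.

```bash
# Fetch the graph description from a Gaffer REST API.
# Usage: graph_description <rest-base-url>
graph_description() {
  # ${1%/} drops a trailing slash so the path is not doubled up.
  curl -fsS "${1%/}/graph/config/description"
}
```

For example, `graph_description http://localhost:8080/rest`, where the host, port and `/rest` prefix are placeholders for your own deployment.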

## Updating the Graph ID

This may be simple or complicated depending on your store type. If you are using the Map or Federated store, you can just set the `graph.config.graphId` value in the same way. Note that if you are using a MapStore, the graph will be emptied as a result.

However, if you are using the Accumulo store, updating the Graph ID is a little more complicated since the Graph ID corresponds to an Accumulo table. We have to change the gaffer user's permissions so that it can read and write to that table. To do that, update the `graph-meta.yaml` file with the following contents:

```yaml
graph:
  config:
    graphId: "MyGraph"
    description: "My Graph description"

accumulo:
  config:
    userManagement:
      users:
        gaffer:
          permissions:
            table:
              MyGraph:
                - READ
                - WRITE
                - BULK_IMPORT
                - ALTER_TABLE
```

## Deploy your changes

Upgrade your deployment using Helm:

```bash
helm upgrade my-graph gaffer-docker/gaffer -f graph-meta.yaml --reuse-values
```

If you take a look at the Accumulo monitor, you will see your new Accumulo table.
73 changes: 73 additions & 0 deletions docs/dev/kubernetes-guide/deploy-empty-graph.md
@@ -0,0 +1,73 @@
# How to deploy a simple graph

This guide will describe how to deploy a simple empty graph with the minimum configuration.

You will need:

- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- [helm](https://github.com/helm/helm/releases)
- A Kubernetes cluster (local or remote)
- An ingress controller running (for accessing UIs)

## Add the Gaffer Docker repo

To start with, you should add the Gaffer Docker repo to your Helm repos. This saves you from having to clone the gaffer-docker Git repository. If you have already done this, you can skip this step.

```bash
helm repo add gaffer-docker https://gchq.github.io/gaffer-docker
```
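
If you are unsure whether the repo has already been added, the step can be made safe to repeat. The sketch below is one way to do that; the helper name `ensure_gaffer_repo` is illustrative and not part of gaffer-docker.

```bash
# Add the gaffer-docker Helm repo only if it is not already configured,
# then refresh the local chart index.
ensure_gaffer_repo() {
  if helm repo list 2>/dev/null | grep -q '^gaffer-docker[[:space:]]'; then
    echo "gaffer-docker repo already added"
  else
    helm repo add gaffer-docker https://gchq.github.io/gaffer-docker
  fi
  # Pick up the latest published chart versions.
  helm repo update
}
```

Running `ensure_gaffer_repo` a second time skips the `helm repo add` call and only refreshes the index.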

## Choose the store

Gaffer's store can be backed by a number of different technologies. Which one you want depends on the use case, but as a rule of thumb:

- If you just want something to spin up quickly at small scale and are not worried about persistence, use the MapStore.
- If you want to back it with a key value datastore, you can deploy the Accumulo Store.
- If you want to join two or more graphs together to query them as one, you will want to use the Federated Store.

### Deploy the MapStore

The MapStore is just an in-memory store that can be used for demos or if you need something small scale short-term. It is our default store so there is no need for any extra configuration.

You can install a MapStore by just running:

```bash
helm install my-graph gaffer-docker/gaffer
```

### Deploy the Accumulo Store

If you want to deploy an Accumulo Store with your graph, it is relatively easy to do so with some small additional configuration. Create a file called `accumulo.yaml` and add the following:

```yaml
accumulo:
  enabled: true
```

By default, the Gaffer user is created with a password of "gaffer", the CREATE_TABLE system permission, and full access to the simpleGraph table, which is coupled to the graphId. All the default Accumulo passwords are in place, so if you were to deploy this in production, you should consider changing the [default Accumulo passwords](change-accumulo-passwords.md).

You can stand up the Accumulo store by running:

```bash
helm install my-graph gaffer-docker/gaffer -f accumulo.yaml
```

### Deploy the Federated Store

If you want to deploy the Federated Store, all that you really need to do is set the `store.properties`. To do this add the following to a `federated.yaml` file:

```yaml
graph:
  storeProperties:
    gaffer.store.class: uk.gov.gchq.gaffer.federatedstore.FederatedStore
    gaffer.store.properties.class: uk.gov.gchq.gaffer.federatedstore.FederatedStoreProperties
    gaffer.serialiser.json.modules: uk.gov.gchq.gaffer.sketches.serialisation.json.SketchesJsonModules
```

The `SketchesJsonModules` entry is there to ensure that, if the FederatedStore connects to a store which uses sketches, they are rendered nicely in JSON.

We can create the graph with:

```bash
helm install federated gaffer-docker/gaffer -f federated.yaml
```
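
Whichever store you chose, you can check that the release came up before moving on. This is a sketch only: the helper name `check_release` is illustrative, and it assumes both `helm` and `kubectl` point at the cluster you installed into.

```bash
# Show the Helm release status, then the pods and ingresses in the namespace.
# Usage: check_release <release-name> [namespace]   (namespace defaults to "default")
check_release() {
  release="$1"
  namespace="${2:-default}"
  helm status "$release" --namespace "$namespace" &&
    kubectl get pods --namespace "$namespace" &&
    kubectl get ingress --namespace "$namespace"
}
```

For example, `check_release my-graph` for the MapStore or Accumulo installs above, or `check_release federated` for the Federated Store.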
68 changes: 68 additions & 0 deletions docs/dev/kubernetes-guide/deploy-schema.md
@@ -0,0 +1,68 @@
# How to deploy your own schema

Gaffer uses schema files to describe the data contained in a Graph. This guide will tell you how to deploy your own schemas with a Gaffer Graph.

You will first need [a basic Gaffer instance deployed on Kubernetes](deploy-empty-graph.md).

Once you have that deployed we can change the schema.

## Edit the schema

If you run a GetSchema operation against the graph, you will notice that the count property is of type `java.lang.Integer` - change that property to be of type `java.lang.Long`.

The easiest way to deploy a schema file is to use Helm's `--set-file` option, which lets you set a value from the contents of a file.

??? example "Example of a `schema.json` file"

    ```json
    {
        "edges": {
            "BasicEdge": {
                "source": "vertex",
                "destination": "vertex",
                "directed": "true",
                "properties": {
                    "count": "count"
                }
            }
        },
        "entities": {
            "BasicEntity": {
                "vertex": "vertex",
                "properties": {
                    "count": "count"
                }
            }
        },
        "types": {
            "vertex": {
                "class": "java.lang.String"
            },
            "count": {
                "class": "java.lang.Long",
                "aggregateFunction": {
                    "class": "uk.gov.gchq.koryphe.impl.binaryoperator.Sum"
                }
            },
            "true": {
                "description": "A simple boolean that must always be true.",
                "class": "java.lang.Boolean",
                "validateFunctions": [
                    { "class": "uk.gov.gchq.koryphe.impl.predicate.IsTrue" }
                ]
            }
        }
    }
    ```

## Update deployment with the new schema

For our deployment to pick up the changes, we need to run a Helm upgrade:

```bash
helm upgrade my-graph gaffer-docker/gaffer --set-file graph.schema."schema\.json"=./schema.json --reuse-values
```

The `--reuse-values` argument tells Helm to re-use the passwords that we defined earlier.

Now if you inspect the schema, you will see that the `count` property has changed to a `Long`.
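
If you deploy schemas often, the upgrade can be wrapped in a small helper that refuses to run when the schema file is missing or empty. This is a sketch under that assumption; the helper name `deploy_schema` is illustrative, and it uses the same `graph.schema."schema\.json"` key as the command above.

```bash
# Upgrade a release with a schema file, refusing to run if the file is
# missing or empty.
# Usage: deploy_schema <release-name> <schema-file>
deploy_schema() {
  release="$1"
  schema_file="$2"
  if [ ! -s "$schema_file" ]; then
    echo "Schema file '$schema_file' is missing or empty" >&2
    return 1
  fi
  # The backslash keeps Helm from treating the dot in "schema.json"
  # as a key separator.
  helm upgrade "$release" gaffer-docker/gaffer \
    --set-file "graph.schema.schema\.json=${schema_file}" \
    --reuse-values
}
```

For example, `deploy_schema my-graph ./schema.json`.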
40 changes: 40 additions & 0 deletions docs/dev/kubernetes-guide/kubernetes.md
@@ -0,0 +1,40 @@
# Gaffer in Kubernetes

The [gaffer-docker](https://github.com/gchq/gaffer-docker) repository contains all code needed to run Gaffer using Docker and Kubernetes.

All the files needed to get started using Gaffer in Kubernetes are contained in the ['kubernetes'](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes) sub-folder of the [gaffer-docker](https://github.com/gchq/gaffer-docker) repository.
In this directory you can find the Helm charts required to deploy various applications onto Kubernetes clusters.

The Helm charts and associated information for each application can be found in the following places:

- [Gaffer](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes/gaffer)
- [Example Gaffer Graph of Road Traffic Data](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes/gaffer-road-traffic)
- [JupyterHub with Gaffer Integrations](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes/gaffer-jhub)
- [HDFS](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes/hdfs)
- [Accumulo](https://github.com/gchq/gaffer-docker/tree/develop/kubernetes/accumulo)

These charts can be accessed by cloning our repository or by using our Helm repo hosted on our [GitHub Pages site](https://gchq.github.io/gaffer-docker/).

## Requirements

To deploy these applications, you'll need access to a suitable Kubernetes distribution.

You will also need to install a container management engine, for example Docker or Podman, to build, run and manage your containers.

## Adding this repo to Helm

To add the gaffer-docker repo to Helm, run:

```bash
helm repo add gaffer-docker https://gchq.github.io/gaffer-docker
```

## How-to Guides

There are a number of guides to help you deploy Gaffer on Kubernetes. It is important you look at these before you get started as they provide the initial steps for running these applications.

* [Deploy a simple empty graph](deploy-empty-graph.md)
* [Add your schema](deploy-schema.md)
* [Change the graph ID and description](change-graph-metadata.md)
* [Adding your own libraries and functions](add-libraries.md)
* [Changing passwords for the Accumulo store](change-accumulo-passwords.md)