new integrations docs (#609)
* Adds new contributor docs getting started with writing a custom integration
* Begins the migration of the BaseIntegrationImplementation from grai-client to grai-schemas
ieaves authored Aug 22, 2023
1 parent bb470ef commit 28ac2d9
Showing 7 changed files with 233 additions and 3 deletions.
3 changes: 2 additions & 1 deletion docs/pages/community/_meta.json
@@ -2,5 +2,6 @@
   "slack": "Slack",
   "contributor-guidelines": "Contributor Guidelines",
   "clone-the-repo": "Clone the Repository",
-  "deployment": "Deployment & Testing"
+  "deployment": "Deployment & Testing",
+  "new-integrations": "New Integrations"
 }
154 changes: 154 additions & 0 deletions docs/pages/community/new-integrations.mdx
@@ -0,0 +1,154 @@
---
title: "Building new integrations"
description: A guide to building new tool integrations with Grai
---

import { Steps } from "nextra-theme-docs"

# Getting Started

If you're comfortable with Python, the quickest way to start contributing a new integration to Grai is to create
a new Python module for it.
You can copy one of the existing integrations from the [source repository](https://github.com/grai-io/grai-core/tree/master/grai-integrations)
or create a new one from scratch.

## Integration Rules


<Steps>

Although the specifics of each integration will vary, there are three basic rules that must be followed:

### Package Structure

#### Tools

* We use [poetry](https://python-poetry.org/) to manage our Python dependencies and build our packages.
* [mypy](https://mypy.readthedocs.io/en/stable/) is used for type checking.
* [black](https://github.com/psf/black) is used for code formatting, with a 120-character line length rather than the default 88.

Because we use poetry, please use a `pyproject.toml` file rather than `setup.py`.

#### Project Structure

The basic project structure should look like this:

```
grai_source_{integration}/
|-- pyproject.toml
|-- poetry.lock
|-- README.md
|-- .gitignore
|-- src/
|   |-- grai_source_{integration}/
|   |   |-- __init__.py
|   |   |-- py.typed
|   |   |-- base.py
|   |   |-- package_definitions.py
|   |   |-- ...
|-- tests/
|   |-- __init__.py
|   |-- ...
```
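
If you would rather script this layout than create it by hand, the following is a minimal sketch. `scaffold_integration` is a hypothetical helper and not part of Grai; it only creates empty placeholder files, and it leaves out `poetry.lock` because poetry generates that file itself.

```python
import tempfile
from pathlib import Path
from typing import List


def scaffold_integration(root: Path, integration: str) -> List[Path]:
    """Create empty placeholder files for a new integration package (hypothetical helper)."""
    package = f"grai_source_{integration}"
    project = root / package
    files = [
        project / "pyproject.toml",
        project / "README.md",
        project / ".gitignore",
        project / "src" / package / "__init__.py",
        project / "src" / package / "py.typed",
        project / "src" / package / "base.py",
        project / "src" / package / "package_definitions.py",
        project / "tests" / "__init__.py",
    ]
    for path in files:
        # Create any missing parent directories, then an empty file.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return files


# Demonstrate in a throwaway directory so nothing is written to the working tree.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_integration(Path(tmp), "example")
    relative = sorted(path.relative_to(tmp).as_posix() for path in created)
```

To use it for real, call `scaffold_integration(Path("."), "your_tool")` from the directory that should contain the new package, then fill in `pyproject.toml` with poetry.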

##### `package_definitions.py`

Every integration should have a root level module called `package_definitions` with an implementation of `PackageConfig`
from `grai_schemas`.
For example, the Postgres integration has a `package_definitions.py` file with the following contents:

```python
from grai_schemas.generics import PackageConfig


class Config(PackageConfig):
    """Package configuration for the Postgres integration."""

    integration_name = "grai-source-postgres"
    metadata_id = "grai_source_postgres"


config = Config()
```

As you can see, you'll need to provide two pieces of information:

1. **metadata_id** - A unique identifier for the integration. It denotes where, within the metadata field of your nodes and edges, context from the integration is stored.
2. **integration_name** - The name of the integration, in the format `<integration_name>` (for example, `grai-source-postgres`).
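
To make the role of `metadata_id` concrete, here is a minimal sketch using plain dictionaries. `ExampleConfig` and `attach_context` are hypothetical stand-ins (not Grai APIs), and the exact shape of the metadata field, including the `grai` entry shown, is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in for a PackageConfig subclass."""

    integration_name: str = "grai-source-example"
    metadata_id: str = "grai_source_example"


def attach_context(metadata: Dict[str, Any], config: ExampleConfig, context: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `metadata` with the integration's context stored under its metadata_id."""
    updated = dict(metadata)
    updated[config.metadata_id] = context
    return updated


config = ExampleConfig()
# The existing "grai" entry is illustrative; it shows that other keys are left untouched.
node_metadata = attach_context({"grai": {"node_type": "Table"}}, config, {"schema": "public"})
```

Because each integration writes under its own `metadata_id`, context from different integrations can sit side by side on the same node without overwriting each other.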


##### `base.py`

Every integration should have a root-level module called `base` with an implementation of `GraiIntegrationImplementation` from `grai_schemas.integrations.base`.
Taking the Metabase integration as an example:

```python
from typing import Dict, List, Optional, Union

# Note: this example still imports the base classes from grai_client; they are
# in the process of being migrated to grai_schemas.integrations.base.
from grai_client.integrations.base import (
    GraiIntegrationImplementation,
    SeparateNodesAndEdgesMixin,
)
from grai_schemas.base import SourcedEdge, SourcedNode
from grai_schemas.v1.source import SourceV1

from grai_source_metabase.adapters import adapt_to_client
from grai_source_metabase.loader import MetabaseConnector


class MetabaseIntegration(SeparateNodesAndEdgesMixin, GraiIntegrationImplementation):
    def __init__(
        self,
        source: SourceV1,
        version: Optional[str] = None,
        metabase_namespace: Optional[str] = None,
        namespace_map: Optional[Union[str, Dict[int, str]]] = None,
        endpoint: Optional[str] = None,
        username: Optional[str] = None,
        password: Optional[str] = None,
    ):
        super().__init__(source, version)

        self.connector = MetabaseConnector(
            metabase_namespace=metabase_namespace,
            namespace_map=namespace_map,
            username=username,
            password=password,
            endpoint=endpoint,
        )

    def ready(self) -> bool:
        # Authenticating confirms the connection details are usable.
        self.connector.authenticate()
        return True

    def nodes(self) -> List[SourcedNode]:
        nodes = adapt_to_client(self.connector.get_nodes(), self.source, self.version)
        return nodes

    def edges(self) -> List[SourcedEdge]:
        edges = adapt_to_client(self.connector.get_edges(), self.source, self.version)
        return edges
```

As you can see, there are three methods that need to be implemented for every integration:

1. **nodes** - This method should return a list of `SourcedNode` objects.
2. **edges** - This method should return a list of `SourcedEdge` objects.
3. **ready** - This method should return a boolean indicating whether the integration is ready to be run. This covers
things like checking that the connection to the data source is working and authenticated.
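
The three methods above can be sketched without any Grai dependencies. In this minimal example, `ExampleIntegrationBase`, `InMemoryIntegration`, and the dictionary-based `Node` and `Edge` types are hypothetical stand-ins; a real integration subclasses `GraiIntegrationImplementation` and returns `SourcedNode` and `SourcedEdge` objects instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical stand-ins: real integrations return SourcedNode / SourcedEdge objects.
Node = Dict[str, str]
Edge = Dict[str, str]


class ExampleIntegrationBase(ABC):
    """Stand-in for the abstract base class, reduced to the three required methods."""

    @abstractmethod
    def ready(self) -> bool: ...

    @abstractmethod
    def nodes(self) -> List[Node]: ...

    @abstractmethod
    def edges(self) -> List[Edge]: ...


class InMemoryIntegration(ExampleIntegrationBase):
    """Toy integration that serves a fixed table-to-table lineage."""

    def __init__(self, lineage: Dict[str, List[str]]):
        # Maps an upstream table name to the tables derived from it.
        self.lineage = lineage

    def ready(self) -> bool:
        # A real integration would check its connection and credentials here.
        return True

    def nodes(self) -> List[Node]:
        # Collect every table that appears as an upstream or a downstream.
        names = set(self.lineage)
        for targets in self.lineage.values():
            names.update(targets)
        return [{"name": name} for name in sorted(names)]

    def edges(self) -> List[Edge]:
        return [
            {"source": source, "destination": destination}
            for source, targets in self.lineage.items()
            for destination in targets
        ]


integration = InMemoryIntegration({"orders": ["daily_revenue"], "customers": ["daily_revenue"]})
```

Calling `integration.ready()` before `integration.nodes()` and `integration.edges()` mirrors the intended order: confirm the source is reachable, then extract.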

`SourcedNode` and `SourcedEdge` are structured objects used to represent nodes and edges in Grai.
You can find implementation details for `SourcedNode` [here](https://github.com/grai-io/grai-core/blob/bb470ef5888ec8cf4b208487ca6f988094802033/grai-schemas/src/grai_schemas/v1/node.py#L70)
and for `SourcedEdge` [here](https://github.com/grai-io/grai-core/blob/bb470ef5888ec8cf4b208487ca6f988094802033/grai-schemas/src/grai_schemas/v1/edge.py#L70).

#### Other Caveats

* Edges require a source and a destination node, so make sure that every source and destination also
exists in the nodes list.
* The user will need to provide a namespace for the running integration. However, if the integration references other
data sources, such as a BI tool which queries an external database, you'll need some mechanism for the user to identify
which namespace the external database belongs to and to associate the relevant nodes/edges with that namespace.
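
Both caveats can be checked with a few lines of plain Python. The following is a sketch under stated assumptions: nodes are identified here by a `(namespace, name)` tuple, and `find_dangling_edges` and `resolve_namespace` are hypothetical helpers rather than Grai functions. The integer-keyed map mirrors the `Dict[int, str]` form of the Metabase integration's `namespace_map` argument, but how Metabase interprets those keys is not specified here.

```python
from typing import Dict, Iterable, List, Set, Tuple

NodeId = Tuple[str, str]  # (namespace, name); an assumed identifier shape
EdgeIds = Tuple[NodeId, NodeId]  # (source, destination)


def find_dangling_edges(node_ids: Iterable[NodeId], edges: Iterable[EdgeIds]) -> List[EdgeIds]:
    """Return the edges whose source or destination is missing from the node list."""
    known: Set[NodeId] = set(node_ids)
    return [edge for edge in edges if edge[0] not in known or edge[1] not in known]


def resolve_namespace(database_id: int, namespace_map: Dict[int, str], default: str) -> str:
    """Pick the namespace for an external database, falling back to the integration's own."""
    return namespace_map.get(database_id, default)


# An external database (id 1) is mapped to the "prod" namespace by the user.
namespace_map = {1: "prod"}
table = (resolve_namespace(1, namespace_map, default="metabase"), "public.orders")
question = (resolve_namespace(99, namespace_map, default="metabase"), "Revenue by day")

node_ids = [table, question]
edges = [
    (table, question),
    (("prod", "public.missing"), question),  # source was never emitted as a node
]
dangling = find_dangling_edges(node_ids, edges)
```

Running a check like this in your tests catches edges that point at nodes the integration never emitted.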


</Steps>
Empty file.
2 changes: 1 addition & 1 deletion grai-schemas/pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "grai_schemas"
-version = "0.2.1"
+version = "0.2.2"
description = ""
authors = ["Ian Eaves <ian@grai.io>", "Edward Louth <edward@grai.io>"]
license = "Elastic-2.0"
3 changes: 2 additions & 1 deletion grai-schemas/src/grai_schemas/__init__.py
@@ -2,6 +2,7 @@
     base,
     generics,
     human_ids,
+    integrations,
     package_definitions,
     schema,
     serializers,
@@ -10,4 +11,4 @@
)
from grai_schemas.package_definitions import config

-__version__ = "0.2.1"
+__version__ = "0.2.2"
1 change: 1 addition & 0 deletions grai-schemas/src/grai_schemas/integrations/__init__.py
@@ -0,0 +1 @@
from grai_schemas.integrations import base
73 changes: 73 additions & 0 deletions grai-schemas/src/grai_schemas/integrations/base.py
@@ -0,0 +1,73 @@
import sys
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple, Union

from grai_schemas.base import Event, SourcedEdge, SourcedNode
from grai_schemas.v1.source import SourceSpec, SourceV1

if sys.version_info < (3, 10):
    from typing_extensions import ParamSpec
else:
    from typing import ParamSpec


P = ParamSpec("P")


class GraiIntegrationImplementation(ABC):
    source: SourceV1
    version: str

    def __init__(
        self,
        source: Union[SourceV1, SourceSpec],
        version: Optional[str] = None,
    ):
        self.source = source if isinstance(source, SourceV1) else SourceV1.from_spec(source)
        self.version = version if version else "v1"

    @abstractmethod
    def nodes(self) -> List[SourcedNode]:
        pass

    @abstractmethod
    def edges(self) -> List[SourcedEdge]:
        pass

    @abstractmethod
    def get_nodes_and_edges(self) -> Tuple[List[SourcedNode], List[SourcedEdge]]:
        pass

    @abstractmethod
    def ready(self) -> bool:
        pass


class SeparateNodesAndEdgesMixin:
    def get_nodes_and_edges(self) -> Tuple[List[SourcedNode], List[SourcedEdge]]:
        return self.nodes(), self.edges()


class CombinedNodesAndEdgesMixin:
    def nodes(self) -> List[SourcedNode]:
        nodes, edges = self.get_nodes_and_edges()
        return nodes

    def edges(self) -> List[SourcedEdge]:
        nodes, edges = self.get_nodes_and_edges()
        return edges


class ConnectorMixin(CombinedNodesAndEdgesMixin):
    def get_nodes_and_edges(self) -> Tuple[List[SourcedNode], List[SourcedEdge]]:
        with self.connector.connect() as conn:
            nodes, edges = conn.get_nodes_and_edges()

        nodes = self.adapt_to_client(nodes)
        edges = self.adapt_to_client(edges)
        return nodes, edges

    def ready(self) -> bool:
        with self.connector.connect() as _:
            pass
        return True
