new: adbdgl refactor #30

Merged 37 commits on Oct 26, 2023
Changes from 34 commits
Commits
921f538
initial commit (WIP)
aMahanna Jul 31, 2022
b008fcc
checkpoint
aMahanna Aug 1, 2022
399dc47
Update adapter.py
aMahanna Aug 2, 2022
8056639
checkpoint 2
aMahanna Aug 2, 2022
51b883c
cleanup
aMahanna Aug 2, 2022
857810f
checkpoint
aMahanna Aug 4, 2022
1388906
checkpoint
aMahanna Aug 5, 2022
6ed9fc0
cleanup: `valid_meta`
aMahanna Aug 5, 2022
ac86cb2
mvp: #29
aMahanna Aug 5, 2022
21d7ded
fix: black
aMahanna Aug 5, 2022
7d252a2
fix: flake8
aMahanna Aug 5, 2022
e6476be
Update setup.py
aMahanna Aug 5, 2022
1c2af50
temp: try for 3.10
aMahanna Aug 5, 2022
6e36e2f
new: 3.10 support
aMahanna Aug 5, 2022
720c6c2
cleanup: progress bars
aMahanna Aug 5, 2022
ded2c8b
update: documentation
aMahanna Aug 5, 2022
b125b69
Update README.md
aMahanna Aug 5, 2022
582da57
new: adbdgl 3.0.0 notebook
aMahanna Aug 5, 2022
a40896b
new: address comments
aMahanna Oct 19, 2022
7070d18
revive PR
aMahanna Jul 21, 2023
bca8d5a
swap python 3.7 support for 3.11
aMahanna Jul 21, 2023
b919e67
fix: PyG typos
aMahanna Jul 21, 2023
ba9ecbc
cleanup: udf behaviour (dgl to arangodb)
aMahanna Jul 21, 2023
4f3a861
fix: rich progress style
aMahanna Aug 8, 2023
2fe1d6e
lock python-arango version
aMahanna Aug 8, 2023
111f295
new: notebook output file
aMahanna Oct 5, 2023
6b36a50
code cleanup
aMahanna Oct 5, 2023
11442b0
fix: explicit_metagraph
aMahanna Oct 5, 2023
4c7ab3b
cleanup function order
aMahanna Oct 5, 2023
f9ba7de
more cleanup
aMahanna Oct 6, 2023
6963014
address comments
aMahanna Oct 11, 2023
272e92f
fix: PyG typos
aMahanna Oct 11, 2023
fd787bc
Update README.md
aMahanna Oct 11, 2023
b470e85
fix: typo
aMahanna Oct 11, 2023
2178ac0
DGL Refactor Updates (#31)
aMahanna Oct 26, 2023
ca4000e
Update README.md
aMahanna Oct 26, 2023
6ed6092
Update README.md
aMahanna Oct 26, 2023
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -13,7 +13,7 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: ["3.7", "3.8", "3.9"]
        python: ["3.8", "3.9", "3.10", "3.11"]
    name: Python ${{ matrix.python }}
    steps:
      - uses: actions/checkout@v2
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -11,7 +11,7 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: ["3.7", "3.8", "3.9"]
        python: ["3.8", "3.9", "3.10", "3.11"]
    name: Python ${{ matrix.python }}
    steps:
      - uses: actions/checkout@v2
189 changes: 166 additions & 23 deletions README.md
@@ -10,14 +10,15 @@

[![License](https://img.shields.io/github/license/arangoml/dgl-adapter?color=9E2165&style=for-the-badge)](https://github.com/arangoml/dgl-adapter/blob/master/LICENSE)
[![Code style: black](https://img.shields.io/static/v1?style=for-the-badge&label=code%20style&message=black&color=black)](https://github.com/psf/black)
[![Downloads](https://img.shields.io/badge/dynamic/json?style=for-the-badge&color=282661&label=Downloads&query=total_downloads&url=https://api.pepy.tech/api/projects/adbdgl-adapter)](https://pepy.tech/project/adbdgl-adapter)
[![Downloads](https://img.shields.io/badge/dynamic/json?style=for-the-badge&color=282661&label=Downloads&query=total_downloads&url=https://api.pepy.tech/api/v2/projects/adbdgl-adapter)](https://pepy.tech/project/adbdgl-adapter)


<a href="https://www.arangodb.com/" rel="arangodb.com">![](https://raw.githubusercontent.com/arangoml/dgl-adapter/master/examples/assets/adb_logo.png)</a>
<a href="https://www.dgl.ai/" rel="dgl.ai"><img src="https://raw.githubusercontent.com/arangoml/dgl-adapter/master/examples/assets/dgl_logo.png" width=40% /></a>

The ArangoDB-DGL Adapter exports graphs from ArangoDB, the multi-model database for graph & beyond, into Deep Graph Library (DGL), a Python package for graph neural networks, and vice-versa.

Note: The ArangoDB-DGL Adapter currently supports only PyTorch as the [DGL backend](https://docs.dgl.ai/en/0.8.x/install/#backends). Support for MXNet and TensorFlow will be added in the future.

## About DGL

@@ -45,44 +46,186 @@ pip install git+https://github.com/arangoml/dgl-adapter.git
Also available as an ArangoDB Lunch & Learn session: [Graph & Beyond Course #2.8](https://www.arangodb.com/resources/lunch-sessions/graph-beyond-lunch-break-2-8-dgl-adapter/)

```py
import pandas
import torch
import dgl

from arango import ArangoClient # Python-Arango driver
from dgl.data import KarateClubDataset # Sample graph from DGL

from adbdgl_adapter import ADBDGL_Adapter
from adbdgl_adapter import ADBDGL_Adapter, ADBDGL_Controller
from adbdgl_adapter.encoders import IdentityEncoder, CategoricalEncoder

# Let's assume that the ArangoDB "fraud detection" dataset is imported to this endpoint
# Let's assume that the ArangoDB "IMDB" dataset is imported to this endpoint
db = ArangoClient(hosts="http://localhost:8529").db("_system", username="root", password="")

fake_hetero = dgl.heterograph({
    ("user", "follows", "user"): (torch.tensor([0, 1]), torch.tensor([1, 2])),
    ("user", "follows", "topic"): (torch.tensor([1, 1]), torch.tensor([1, 2])),
    ("user", "plays", "game"): (torch.tensor([0, 3]), torch.tensor([3, 4])),
})
fake_hetero.nodes["user"].data["features"] = torch.tensor([21, 44, 16, 25])
fake_hetero.nodes["user"].data["label"] = torch.tensor([1, 2, 0, 1])
fake_hetero.nodes["game"].data["features"] = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1]])
fake_hetero.edges[("user", "plays", "game")].data["features"] = torch.tensor([[6, 1], [1000, 0]])

adbdgl_adapter = ADBDGL_Adapter(db)
```
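The feature tensors in the setup above have 4 rows for `user` and 5 rows for `game` because, when no explicit node counts are passed, DGL infers the number of nodes of each type from the largest node ID that appears in the edge lists (largest ID plus one). A minimal stdlib-only sketch of that inference follows; the edge lists are copied from `fake_hetero`, and `infer_num_nodes` is a hypothetical helper, not part of DGL or the adapter.

```python
# Edge lists copied from the fake_hetero example, as plain Python lists.
edges = {
    ("user", "follows", "user"): ([0, 1], [1, 2]),
    ("user", "follows", "topic"): ([1, 1], [1, 2]),
    ("user", "plays", "game"): ([0, 3], [3, 4]),
}


def infer_num_nodes(edges):
    """Hypothetical helper: node count per type = largest ID seen + 1."""
    counts = {}
    for (src_type, _, dst_type), (src_ids, dst_ids) in edges.items():
        for ntype, ids in ((src_type, src_ids), (dst_type, dst_ids)):
            counts[ntype] = max(counts.get(ntype, 0), max(ids) + 1)
    return counts


num_nodes = infer_num_nodes(edges)
print(num_nodes)
```

Under this rule, any tensor assigned to `fake_hetero.nodes[...]` must have as many rows as the inferred node count for that type.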

### DGL to ArangoDB
```py
# 1.1: DGL to ArangoDB
adb_g = adbdgl_adapter.dgl_to_arangodb("FakeHetero", fake_hetero)

# Use Case 1.1: ArangoDB to DGL via Graph name
dgl_fraud_graph = adbdgl_adapter.arangodb_graph_to_dgl("fraud-detection")
# 1.2: DGL to ArangoDB with a (completely optional) metagraph for customized adapter behaviour
def label_tensor_to_2_column_dataframe(dgl_tensor, adb_df):
    """
    A user-defined function to create two
    ArangoDB attributes out of the 'user' label tensor

    :param dgl_tensor: The DGL Tensor containing the data
    :type dgl_tensor: torch.Tensor
    :param adb_df: The ArangoDB DataFrame to populate, whose
        size is preset to the length of **dgl_tensor**.
    :type adb_df: pandas.DataFrame

    NOTE: user-defined functions must return the modified **adb_df**
    """
    label_map = {0: "Class A", 1: "Class B", 2: "Class C"}

    adb_df["label_num"] = dgl_tensor.tolist()
    adb_df["label_str"] = adb_df["label_num"].map(label_map)

    return adb_df

# Use Case 1.2: ArangoDB to DGL via Collection names
dgl_fraud_graph_2 = adbdgl_adapter.arangodb_collections_to_dgl(
    "fraud-detection",
    {"account", "Class", "customer"},  # Vertex collections
    {"accountHolder", "Relationship", "transaction"},  # Edge collections
)

# Use Case 1.3: ArangoDB to DGL via Metagraph
metagraph = {
    "nodeTypes": {
        "user": {
            "features": "user_age",  # 1) you can specify a string value for attribute renaming
            "label": label_tensor_to_2_column_dataframe,  # 2) you can specify a function for user-defined handling, as long as the function returns a Pandas DataFrame
        },
        # 3) you can specify a set of strings if you want to preserve the same DGL attribute names for the node/edge type
        "game": {"features"}  # this is equivalent to {"features": "features"}
    },
    "edgeTypes": {
        ("user", "plays", "game"): {
            # 4) you can specify a list of strings for tensor disassembly (if you know the number of node/edge features in advance)
            "features": ["hours_played", "is_satisfied_with_game"]
        },
    },
}


adb_g = adbdgl_adapter.dgl_to_arangodb("FakeHetero", fake_hetero, metagraph, explicit_metagraph=False)

# 1.3: DGL to ArangoDB with the same (optional) metagraph, but with `explicit_metagraph=True`
# With `explicit_metagraph=True`, the node & edge types omitted from the metagraph will NOT be converted to ArangoDB.
# Only 'user', 'game', and ('user', 'plays', 'game') will be brought over (i.e., 'topic', ('user', 'follows', 'user'), ... are ignored)
adb_g = adbdgl_adapter.dgl_to_arangodb("FakeHetero", fake_hetero, metagraph, explicit_metagraph=True)

# 1.4: DGL to ArangoDB with a Custom Controller (more user-defined behavior)
class Custom_ADBDGL_Controller(ADBDGL_Controller):
    def _prepare_dgl_node(self, dgl_node: dict, node_type: str) -> dict:
        """Optionally modify a DGL node object before it gets inserted into its designated ArangoDB collection.

        :param dgl_node: The DGL node object to (optionally) modify.
        :param node_type: The DGL Node Type of the node.
        :return: The DGL Node object
        """
        dgl_node["foo"] = "bar"
        return dgl_node

    def _prepare_dgl_edge(self, dgl_edge: dict, edge_type: tuple) -> dict:
        """Optionally modify a DGL edge object before it gets inserted into its designated ArangoDB collection.

        :param dgl_edge: The DGL edge object to (optionally) modify.
        :param edge_type: The Edge Type of the DGL edge. Formatted
            as (from_collection, edge_collection, to_collection)
        :return: The DGL Edge object
        """
        dgl_edge["bar"] = "foo"
        return dgl_edge


adb_g = ADBDGL_Adapter(db, Custom_ADBDGL_Controller()).dgl_to_arangodb("FakeHetero", fake_hetero)
```
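The effect of `explicit_metagraph` described in 1.3 can be pictured as a filter over the graph's node and edge types. The sketch below is stdlib-only and is not the adapter's implementation: `select_types` is a hypothetical helper, and the behaviour shown for `explicit_metagraph=False` (every type is converted, with the metagraph only customizing the types it lists) is an assumption drawn from the comments in the example above.

```python
# Node and edge types of the fake_hetero example, as plain Python values.
ntypes = ["user", "topic", "game"]
etypes = [
    ("user", "follows", "user"),
    ("user", "follows", "topic"),
    ("user", "plays", "game"),
]

# Same shape as the README metagraph; the values are irrelevant for filtering.
metagraph = {
    "nodeTypes": {"user": {"features": "user_age"}, "game": {"features"}},
    "edgeTypes": {("user", "plays", "game"): {"features": ["hours_played"]}},
}


def select_types(all_ntypes, all_etypes, metagraph, explicit_metagraph):
    """Hypothetical helper: decide which types would be converted."""
    if not explicit_metagraph:
        # Assumption: without the flag, every type is converted.
        return list(all_ntypes), list(all_etypes)
    wanted_nodes = metagraph.get("nodeTypes", {})
    wanted_edges = metagraph.get("edgeTypes", {})
    kept_nodes = [n for n in all_ntypes if n in wanted_nodes]
    kept_edges = [e for e in all_etypes if e in wanted_edges]
    return kept_nodes, kept_edges


kept_nodes, kept_edges = select_types(ntypes, etypes, metagraph, True)
print(kept_nodes, kept_edges)
```

With the flag set, `topic` and both `follows` edge types drop out, which matches the list given in the 1.3 comment.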

### ArangoDB to DGL
```py
# Start from scratch!
db.delete_graph("FakeHetero", drop_collections=True, ignore_missing=True)
adbdgl_adapter.dgl_to_arangodb("FakeHetero", fake_hetero)

# 2.1: ArangoDB to DGL via Graph name (does not transfer attributes)
dgl_g = adbdgl_adapter.arangodb_graph_to_dgl("FakeHetero")

# 2.2: ArangoDB to DGL via Collection names (does not transfer attributes)
dgl_g = adbdgl_adapter.arangodb_collections_to_dgl("FakeHetero", v_cols={"user", "game"}, e_cols={"plays"})

# 2.3: ArangoDB to DGL via Metagraph v1 (transfer attributes "as is", meaning they are already formatted to DGL data standards)
# Learn more about the DGL Data Standards here: https://docs.dgl.ai/guide/graph.html#guide-graph
metagraph_v1 = {
    "vertexCollections": {
        "account": {"Balance", "rank"},
        "customer": {"rank"},
        "Class": {},
        # Move the "features" & "label" ArangoDB attributes to DGL as "features" & "label" Tensors
        "user": {"features", "label"},  # equivalent to {"features": "features", "label": "label"}
        "game": {"dgl_game_features": "features"},
        "topic": {},
    },
    "edgeCollections": {
        "transaction": {"transaction_amt", "sender_bank_id", "receiver_bank_id"},
        "accountHolder": {},
        "Relationship": {},
        "plays": {"dgl_plays_features": "features"},
        "follows": {}
    },
}
dgl_g = adbdgl_adapter.arangodb_to_dgl("FakeHetero", metagraph_v1)

# 2.4: ArangoDB to DGL via Metagraph v2 (transfer attributes via user-defined encoders)
# For more info on user-defined encoders, see https://pytorch-geometric.readthedocs.io/en/latest/notes/load_csv.html
metagraph_v2 = {
    "vertexCollections": {
        "Movies": {
            "features": {  # Build a feature matrix from the "Action" & "Drama" document attributes
                "Action": IdentityEncoder(dtype=torch.long),
                "Drama": IdentityEncoder(dtype=torch.long),
            },
            "label": "Comedy",
        },
        "Users": {
            "features": {
                "Gender": CategoricalEncoder(),  # CategoricalEncoder(mapping={"M": 0, "F": 1}),
                "Age": IdentityEncoder(dtype=torch.long),
            }
        },
    },
    "edgeCollections": {"Ratings": {"weight": "Rating"}},
}
dgl_fraud_graph_3 = adbdgl_adapter.arangodb_to_dgl("fraud-detection", metagraph)
dgl_g = adbdgl_adapter.arangodb_to_dgl("IMDB", metagraph_v2)

# 2.5: ArangoDB to DGL via Metagraph v3 (transfer attributes via user-defined functions)
def udf_user_features(user_df):
    # process the user_df Pandas DataFrame to return a feature matrix in a tensor
    # user_df["features"] = ...
    return torch.tensor(user_df["features"].to_list())


def udf_game_features(game_df):
    # process the game_df Pandas DataFrame to return a feature matrix in a tensor
    # game_df["features"] = ...
    return torch.tensor(game_df["features"].to_list())


# Use Case 2: DGL to ArangoDB
dgl_karate_graph = KarateClubDataset()[0]
adb_karate_graph = adbdgl_adapter.dgl_to_arangodb("Karate", dgl_karate_graph)
metagraph_v3 = {
    "vertexCollections": {
        "user": {
            "features": udf_user_features,  # supports named functions
            "label": lambda df: torch.tensor(df["label"].to_list()),  # also supports lambda functions
        },
        "game": {"features": udf_game_features},
    },
    "edgeCollections": {
        "plays": {"features": (lambda df: torch.tensor(df["features"].to_list()))},
    },
}
dgl_g = adbdgl_adapter.arangodb_to_dgl("FakeHetero", metagraph_v3)
```
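The encoders in 2.4 turn a column of document attributes into numbers. As a rough illustration of the categorical case only, the stdlib-only sketch below maps each distinct value to an integer index, optionally using a caller-supplied mapping such as `{"M": 0, "F": 1}`. It is not the adapter's `CategoricalEncoder`, which operates on DataFrames and returns tensors; `categorical_encode` is a hypothetical helper, and assigning indices in order of first appearance is an assumption made for this sketch.

```python
def categorical_encode(values, mapping=None):
    """Hypothetical helper: replace each value by an integer index.

    If no mapping is given, indices are assigned in order of first
    appearance (an assumption of this sketch).
    """
    if mapping is None:
        mapping = {}
        for value in values:
            # len(mapping) is the next free index for an unseen value.
            mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


genders = ["M", "F", "F", "M"]
codes, inferred_mapping = categorical_encode(genders)
print(codes, inferred_mapping)

# A fixed mapping keeps the indices stable across datasets.
fixed_codes, _ = categorical_encode(["F", "M"], mapping={"M": 0, "F": 1})
print(fixed_codes)
```

Passing an explicit mapping, as in the commented-out alternative in `metagraph_v2`, avoids indices that depend on the order in which documents happen to be read.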

## Development & Testing
40 changes: 8 additions & 32 deletions adbdgl_adapter/abc.py
@@ -2,22 +2,20 @@
# -*- coding: utf-8 -*-

from abc import ABC
from typing import Any, List, Set, Union
from typing import Any, Set, Union

from arango.graph import Graph as ArangoDBGraph
from dgl import DGLGraph
from dgl.heterograph import DGLHeteroGraph
from torch import Tensor
from dgl import DGLGraph, DGLHeteroGraph

from .typings import ArangoMetagraph, DGLCanonicalEType, Json
from .typings import ADBMetagraph, DGLCanonicalEType, DGLMetagraph, Json


class Abstract_ADBDGL_Adapter(ABC):
    def __init__(self) -> None:
        raise NotImplementedError  # pragma: no cover

    def arangodb_to_dgl(
        self, name: str, metagraph: ArangoMetagraph, **query_options: Any
        self, name: str, metagraph: ADBMetagraph, **query_options: Any
    ) -> DGLHeteroGraph:
        raise NotImplementedError  # pragma: no cover

@@ -33,39 +31,17 @@ def dgl_to_arangodb(
        self,
        name: str,
        dgl_g: Union[DGLGraph, DGLHeteroGraph],
        metagraph: DGLMetagraph = {},
        explicit_metagraph: bool = True,
        overwrite_graph: bool = False,
        **import_options: Any,
    ) -> ArangoDBGraph:
        raise NotImplementedError  # pragma: no cover

    def etypes_to_edefinitions(
        self, canonical_etypes: List[DGLCanonicalEType]
    ) -> List[Json]:
        raise NotImplementedError  # pragma: no cover

    def __prepare_dgl_features(self) -> None:
        raise NotImplementedError  # pragma: no cover

    def __insert_dgl_features(self) -> None:
        raise NotImplementedError  # pragma: no cover

    def __prepare_adb_attributes(self) -> None:
        raise NotImplementedError  # pragma: no cover

    def __fetch_adb_docs(self) -> None:
        raise NotImplementedError  # pragma: no cover

    def __insert_adb_docs(self) -> None:
        raise NotImplementedError  # pragma: no cover

    @property
    def DEFAULT_CANONICAL_ETYPE(self) -> List[DGLCanonicalEType]:
        return [("_N", "_E", "_N")]


class Abstract_ADBDGL_Controller(ABC):
    def _adb_attribute_to_dgl_feature(self, key: str, col: str, val: Any) -> Any:
    def _prepare_dgl_node(self, dgl_node: Json, node_type: str) -> Json:
        raise NotImplementedError  # pragma: no cover

    def _dgl_feature_to_adb_attribute(self, key: str, col: str, val: Tensor) -> Any:
    def _prepare_dgl_edge(self, dgl_edge: Json, edge_type: DGLCanonicalEType) -> Json:
        raise NotImplementedError  # pragma: no cover
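
The controller interface above replaces the per-attribute hooks with two per-document hooks, `_prepare_dgl_node` and `_prepare_dgl_edge`. How the adapter calls them is not visible in this diff, so the following is only a sketch of the general hook pattern using stand-in classes: `SketchController`, `TaggingController`, and `prepare_nodes` are hypothetical names, and the pass-through default is an assumption about what a concrete controller might do.

```python
class SketchController:
    """Stand-in for a concrete controller; the default hooks change nothing."""

    def _prepare_dgl_node(self, dgl_node, node_type):
        return dgl_node

    def _prepare_dgl_edge(self, dgl_edge, edge_type):
        return dgl_edge


class TaggingController(SketchController):
    """Overrides the node hook, mirroring the README's custom controller."""

    def _prepare_dgl_node(self, dgl_node, node_type):
        dgl_node["foo"] = "bar"
        return dgl_node


def prepare_nodes(controller, docs, node_type):
    """Hypothetical call site: run the node hook on a copy of each document."""
    return [controller._prepare_dgl_node(dict(doc), node_type) for doc in docs]


docs = [{"_key": "0"}, {"_key": "1"}]
print(prepare_nodes(SketchController(), docs, "user"))
print(prepare_nodes(TaggingController(), docs, "user"))
```

Keeping the hooks at document granularity means a subclass can add, drop, or rewrite attributes in one place without touching the conversion logic.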