documentation updates #19

Merged
merged 70 commits into from
Jan 23, 2024
Changes from 60 commits
5d3d844
cleanup hash logic, make `rdf_graph` private
aMahanna Dec 24, 2023
7853794
checkpoint
aMahanna Jan 2, 2024
32d4ac3
more cleanup
aMahanna Jan 2, 2024
1352680
update action versions
aMahanna Jan 12, 2024
5e8ed2b
checkpoint
aMahanna Jan 12, 2024
50fec4e
update tests
aMahanna Jan 12, 2024
f30a5ea
checkpoint: green tests
aMahanna Jan 13, 2024
13a5c3b
checkpoint: cases `14_2` and `15` are failing
aMahanna Jan 15, 2024
5873809
fix: make `__hash()` public
aMahanna Jan 15, 2024
19e7446
cleanup `conftest`
aMahanna Jan 15, 2024
6ed0903
fix `v_count/e_count` assertions, more test cleanup
aMahanna Jan 15, 2024
0d5fedb
new: case 15_1, 15_2, 15_3
aMahanna Jan 15, 2024
5cde61f
cleanup case 13
aMahanna Jan 15, 2024
156cfea
checkpoing: all tests green (except 1 flaky)
aMahanna Jan 15, 2024
c378bdb
update case 14_1
aMahanna Jan 16, 2024
4dc4948
cleanup tests
aMahanna Jan 16, 2024
fe9dd0f
checkpoint: `__process_subject_predicate_object`
aMahanna Jan 16, 2024
e06a174
fix lint
aMahanna Jan 16, 2024
1239a4c
fix conftest
aMahanna Jan 16, 2024
1357d8f
set `continue-on-error`
aMahanna Jan 16, 2024
26ce461
update case `15_2` (still flaky)
aMahanna Jan 16, 2024
cf63e74
fix: `type` instead of `isinstance`
aMahanna Jan 16, 2024
ca859f1
update case `14_1`
aMahanna Jan 17, 2024
0eb7859
update tests
aMahanna Jan 17, 2024
9125a6e
update case `container.ttl`
aMahanna Jan 17, 2024
01c31df
update `test_main`
aMahanna Jan 17, 2024
28dff29
new: `pgt_remove_blacklisted_statements`, `pgt_parse_literal_statemen…
aMahanna Jan 17, 2024
1ecea19
fix flake
aMahanna Jan 17, 2024
02f2dc5
new: case 13_1 and 13_2
aMahanna Jan 18, 2024
61f82b1
new: case 14_3
aMahanna Jan 18, 2024
bba0da4
new: case 15_4
aMahanna Jan 18, 2024
82ac843
update tests
aMahanna Jan 18, 2024
6bb7c62
fix: rdf namespacing
aMahanna Jan 18, 2024
8dc36b5
fix case 7 and native graph
aMahanna Jan 18, 2024
c80159e
new: `adb_col_statements`, `write_adb_col_statements` (pgt)
aMahanna Jan 18, 2024
db0c586
update 13_2, 15_2, 14_3
aMahanna Jan 19, 2024
3641415
new test cases, use `pytest.xfail` on flaky assertions
aMahanna Jan 19, 2024
4697679
flake ignore
aMahanna Jan 19, 2024
285f9b6
new: `explicit_metagraph`, optimize `fetch_adb_docs`
aMahanna Jan 19, 2024
ff82fd8
fix typo
aMahanna Jan 19, 2024
e1c87fb
cleanup: `**adb_kwargs`
aMahanna Jan 19, 2024
540ba48
doc cleanup
aMahanna Jan 19, 2024
6cd5ed1
cleanup
aMahanna Jan 19, 2024
e510e12
cleanup: `flatten_reified_triples`
aMahanna Jan 19, 2024
d28049d
cleanup: progress/spinner bars via `rich`
aMahanna Jan 19, 2024
6bfadd8
more `rich` cleanup
aMahanna Jan 19, 2024
393fa22
update cases 10, 14.3, 15_4
aMahanna Jan 20, 2024
59b1f42
final main checkpoint: `arango_rdf`
aMahanna Jan 20, 2024
55607ab
minor cleanup
aMahanna Jan 20, 2024
fa2742d
new: `__pgt_process_rdf_literal`
aMahanna Jan 20, 2024
f26944f
new: `serialize` as a conversion mode
aMahanna Jan 20, 2024
ef05135
new: `test_open_intelligence_graph`
aMahanna Jan 20, 2024
40f9441
fix lint
aMahanna Jan 20, 2024
4b9fd95
fix: `statements`, `rdf_graph` ref
aMahanna Jan 21, 2024
8028554
cleanup: `write_adb_col_statements`
aMahanna Jan 22, 2024
8f23305
initial commit
aMahanna Jan 22, 2024
7ffcca9
Merge branch 'housekeeping' into docs
aMahanna Jan 22, 2024
97cc772
fix lint
aMahanna Jan 22, 2024
b22dee4
update notebook
aMahanna Jan 23, 2024
a9a0c4d
checkpoint
aMahanna Jan 23, 2024
fb0d5d4
fix lint
aMahanna Jan 23, 2024
f6697f9
Create .readthedocs.yaml
aMahanna Jan 23, 2024
3bcbdda
Merge branch 'main' into docs
aMahanna Jan 23, 2024
9f59b96
Update README.md
aMahanna Jan 23, 2024
737b858
Update requirements.txt
aMahanna Jan 23, 2024
ddc2edc
fix: code block warning
aMahanna Jan 23, 2024
c8a5e05
cleanup
aMahanna Jan 23, 2024
498e150
nit
aMahanna Jan 23, 2024
7cb23e3
fix hyperlinks
aMahanna Jan 23, 2024
53cb251
fix docstring
aMahanna Jan 23, 2024
2 changes: 1 addition & 1 deletion .github/workflows/analyze.yml
@@ -37,7 +37,7 @@ jobs:

steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
5 changes: 3 additions & 2 deletions .github/workflows/build.yml
@@ -10,14 +10,15 @@ env:
jobs:
build:
runs-on: ubuntu-latest
continue-on-error: true
strategy:
matrix:
python: ["3.8", "3.9", "3.10", "3.11", "3.12"]
name: Python ${{ matrix.python }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Setup Python ${{ matrix.python }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python }}
- name: Set up ArangoDB Instance via Docker
29 changes: 29 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,29 @@
name: Docs

on:
pull_request:
workflow_dispatch:

jobs:
docs:
runs-on: ubuntu-latest

name: Docs

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Fetch all tags and branches
run: git fetch --prune --unshallow

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install dependencies
run: pip install .[dev] && pip install -r docs/requirements.txt

- name: Generate Sphinx HTML
run: cd docs && make html
4 changes: 2 additions & 2 deletions .github/workflows/release.yml
@@ -19,10 +19,10 @@ jobs:
python-version: "3.10"

- name: Install release packages
run: pip install setuptools wheel twine setuptools-scm[toml]
run: pip install build twine

- name: Build distribution
run: python setup.py sdist bdist_wheel
run: python -m build

- name: Publish to Test PyPi
env:
162 changes: 44 additions & 118 deletions README.md
@@ -29,6 +29,7 @@ Resources to get started:
* [RDF Primer](https://www.w3.org/TR/rdf11-concepts/)
* [RDFLib (Python)](https://pypi.org/project/rdflib/)
* [One Example for Modeling RDF as ArangoDB Graphs](https://www.arangodb.com/docs/stable/data-modeling-graphs-from-rdf.html)

## Installation

#### Latest Release
@@ -41,69 +42,67 @@ pip install git+https://github.com/ArangoDB-Community/ArangoRDF
```

## Quickstart
Run the full version with Google Colab: <a href="https://colab.research.google.com/github/ArangoDB-Community/ArangoRDF/blob/main/examples/ArangoRDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<a href="https://colab.research.google.com/github/ArangoDB-Community/ArangoRDF/blob/main/examples/ArangoRDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```py
from rdflib import Graph
from arango import ArangoClient
from arango_rdf import ArangoRDF

db = ArangoClient(hosts="http://localhost:8529").db("_system_", username="root", password="")
db = ArangoClient().db()

adbrdf = ArangoRDF(db)

g = Graph()
g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")

# RDF to ArangoDB
###################################################################################

# 1.1: RDF-Topology Preserving Transformation (RPT)
adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, overwrite_graph=True)

# 1.2: Property Graph Transformation (PGT)
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)

g = adbrdf.load_meta_ontology(g)
def beatles():
g = Graph()
g.parse("https://raw.githubusercontent.com/ArangoDB-Community/ArangoRDF/main/tests/data/rdf/beatles.ttl", format="ttl")
return g
```

# 1.3: RPT w/ Graph Contextualization
adbrdf.rdf_to_arangodb_by_rpt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
### RDF to ArangoDB

# 1.4: PGT w/ Graph Contextualization
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, contextualize_graph=True, overwrite_graph=True)
```py
# 1. RDF-Topology Preserving Transformation (RPT)
adbrdf.rdf_to_arangodb_by_rpt(name="BeatlesRPT", rdf_graph=beatles(), overwrite_graph=True)

# 1.5: PGT w/ ArangoDB Document-to-Collection Mapping Exposed
adb_mapping = adbrdf.build_adb_mapping_for_pgt(g)
print(adb_mapping.serialize())
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, adb_mapping, contextualize_graph=True, overwrite_graph=True)
# 2. Property Graph Transformation (PGT)
adbrdf.rdf_to_arangodb_by_pgt(name="BeatlesPGT", rdf_graph=beatles(), overwrite_graph=True)
```

# ArangoDB to RDF
###################################################################################
### ArangoDB to RDF

# Start from scratch!
g = Graph()
g.parse("https://raw.githubusercontent.com/stardog-union/stardog-tutorials/master/music/beatles.ttl")
adbrdf.rdf_to_arangodb_by_pgt("Beatles", g, overwrite_graph=True)
**Note**: RDF-to-ArangoDB functionality has been implemented using concepts described in the paper
*[Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches](https://arxiv.org/abs/2210.05781)*. Accordingly, `ArangoRDF` offers two RDF-to-ArangoDB transformation approaches:

# 2.1: Via Graph Name
g2, adb_mapping_2 = adbrdf.arangodb_graph_to_rdf("Beatles", Graph())
1. RDF-Topology Preserving Transformation (RPT)
2. Property Graph Transformation (PGT)

# 2.2: Via Collection Names
g3, adb_mapping_3 = adbrdf.arangodb_collections_to_rdf(
"Beatles",
Graph(),
v_cols={"Album", "Band", "Class", "Property", "SoloArtist", "Song"},
e_cols={"artist", "member", "track", "type", "writer"},
```py
# 1. Graph to RDF
rdf_graph = adbrdf.arangodb_graph_to_rdf(name="BeatlesRPT", rdf_graph=Graph())

# 2. Collections to RDF
rdf_graph_2 = adbrdf.arangodb_collections_to_rdf(
name="BeatlesRPT",
rdf_graph=Graph(),
v_cols={"BeatlesRPT_URIRef", "BeatlesRPT_Literal", "BeatlesRPT_BNode"},
e_cols={"BeatlesRPT_Statement"}
)

print(len(g2), len(adb_mapping_2))
print(len(g3), len(adb_mapping_3))

print('--------------------')
print(g2.serialize())
print('--------------------')
print(adb_mapping_2.serialize())
print('--------------------')
# 3. Metagraph to RDF
rdf_graph_3 = adbrdf.arangodb_to_rdf(
name="BeatlesPGT",
rdf_graph=Graph(),
metagraph={
"vertexCollections": {
"Album": {"name", "date"},
"SoloArtist": {}
},
"edgeCollections": {
"artist": {}
}
}
)
```
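
The metagraph passed to `arangodb_to_rdf` above is a plain dictionary keyed by `vertexCollections` and `edgeCollections`, each mapping a collection name to a set of attribute names. The following is a minimal stand-alone sketch of how such a dictionary relates to the `v_cols`/`e_cols` sets used by `arangodb_collections_to_rdf`. The helper names `metagraph_from_collections` and `collections_in` are hypothetical and are not part of `arango_rdf`; how the library interprets an empty attribute set is not asserted here.

```python
def metagraph_from_collections(v_cols, e_cols):
    """Build a metagraph-shaped dict that names collections without listing attributes.

    Hypothetical helper for illustration only; not an arango_rdf function.
    """
    return {
        "vertexCollections": {col: set() for col in sorted(v_cols)},
        "edgeCollections": {col: set() for col in sorted(e_cols)},
    }


def collections_in(metagraph):
    """Return the (vertex, edge) collection names mentioned in a metagraph-shaped dict."""
    v_cols = set(metagraph.get("vertexCollections", {}))
    e_cols = set(metagraph.get("edgeCollections", {}))
    return v_cols, e_cols


if __name__ == "__main__":
    metagraph = metagraph_from_collections({"Album", "SoloArtist"}, {"artist"})
    print(metagraph)
    print(collections_in(metagraph))
```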

## Development & Testing
@@ -123,76 +122,3 @@ def pytest_addoption(parser):
parser.addoption("--username", action="store", default="root")
parser.addoption("--password", action="store", default="")
```

## Additional Info: RDF to ArangoDB

RDF-to-ArangoDB functionality has been implemented using concepts described in the paper *[Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches](https://arxiv.org/abs/2210.05781)*.

In other words, `ArangoRDF` offers 2 RDF-to-ArangoDB transformation methods:
1. RDF-topology Preserving Transformation (RPT): `ArangoRDF.rdf_to_arangodb_by_rpt()`
2. Property Graph Transformation (PGT): `ArangoRDF.rdf_to_arangodb_by_pgt()`

RPT preserves the RDF Graph structure by transforming each RDF Statement into an ArangoDB Edge.

PGT, on the other hand, ensures that Datatype Property Statements are mapped to ArangoDB Document Properties.

```ttl
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:book ex:publish_date "1963-03-22"^^xsd:date .
ex:book ex:pages "100"^^xsd:integer .
ex:book ex:cover 20 .
ex:book ex:index 55 .
```

| RPT | PGT |
|:-------------------------:|:-------------------------:|
| ![image](https://user-images.githubusercontent.com/43019056/232347662-ab48ebfb-e215-4aff-af28-a5915414a8fd.png) | ![image](https://user-images.githubusercontent.com/43019056/232347681-c899ef09-53c7-44de-861e-6a98d448b473.png) |
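
To make the difference concrete in text form, here is a minimal sketch that applies both ideas to the four `ex:book` statements from the Turtle snippet above. It uses plain Python tuples and dictionaries rather than `rdflib` or ArangoDB, and the function names `rpt_sketch` and `pgt_sketch` are hypothetical. It illustrates the two approaches only and is not how `ArangoRDF` is implemented.

```python
# Simplified statements: (subject, predicate, object, object_is_literal).
TRIPLES = [
    ("ex:book", "ex:publish_date", "1963-03-22", True),
    ("ex:book", "ex:pages", 100, True),
    ("ex:book", "ex:cover", 20, True),
    ("ex:book", "ex:index", 55, True),
]


def rpt_sketch(triples):
    """RPT-style: every statement becomes an edge, and literals become vertices."""
    vertices, edges = set(), []
    for s, p, o, _is_literal in triples:
        vertices.update([s, o])
        edges.append({"_from": s, "_to": o, "predicate": p})
    return vertices, edges


def pgt_sketch(triples):
    """PGT-style: literal-valued statements become properties of the subject document."""
    docs, edges = {}, []
    for s, p, o, is_literal in triples:
        doc = docs.setdefault(s, {})
        if is_literal:
            # Use the part after the prefix as the property name (a simplification).
            doc[p.split(":")[-1]] = o
        else:
            docs.setdefault(o, {})
            edges.append({"_from": s, "_to": o, "predicate": p})
    return docs, edges


if __name__ == "__main__":
    print(rpt_sketch(TRIPLES))
    print(pgt_sketch(TRIPLES))
```

In this sketch, RPT yields five vertices and four edges, whereas PGT yields a single `ex:book` document with four properties and no edges.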

--------------------
### RPT


The `ArangoRDF.rdf_to_arangodb_by_rpt` method will store the RDF Resources of your RDF Graph under the following ArangoDB Collections:

- {graph_name}_URIRef: The Document collection for `rdflib.term.URIRef` resources.
- {graph_name}_BNode: The Document collection for `rdflib.term.BNode` resources.
- {graph_name}_Literal: The Document collection for `rdflib.term.Literal` resources.
- {graph_name}_Statement: The Edge collection for all triples/quads.
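
The naming scheme above can be sketched as follows. `Term`, `rpt_vertex_collection`, and `rpt_edge_collection` are hypothetical names introduced for illustration; the sketch only mirrors the `{graph_name}_<kind>` convention listed above and does not call `arango_rdf` or `rdflib`.

```python
from dataclasses import dataclass

# Kinds mirror the rdflib.term class names listed above.
VERTEX_KINDS = {"URIRef", "BNode", "Literal"}


@dataclass(frozen=True)
class Term:
    kind: str   # "URIRef", "BNode" or "Literal"
    value: str


def rpt_vertex_collection(graph_name: str, term: Term) -> str:
    """Return the document collection name for a term under the RPT naming scheme."""
    if term.kind not in VERTEX_KINDS:
        raise ValueError(f"unknown term kind: {term.kind!r}")
    return f"{graph_name}_{term.kind}"


def rpt_edge_collection(graph_name: str) -> str:
    """Return the single edge collection name that holds all statements."""
    return f"{graph_name}_Statement"


if __name__ == "__main__":
    bob = Term("URIRef", "http://example.com/Bob")
    print(rpt_vertex_collection("Beatles", bob))
    print(rpt_edge_collection("Beatles"))
```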

--------------------
### PGT

In contrast to RPT, the `ArangoRDF.rdf_to_arangodb_by_pgt` method will rely on the nature of the RDF Resource/Statement to determine which ArangoDB Collection it belongs to. This is referred to as the **ArangoDB Collection Mapping Process**. This process relies on 2 fundamental URIs:

1) `<http://www.arangodb.com/collection>` (adb:collection)
- Any RDF Statement of the form `<http://example.com/Bob> <adb:collection> "Person"` will map the Subject to the ArangoDB "Person" document collection.

2) `<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>` (rdf:type)
- This strategy is divided into 3 cases:

1. If an RDF Resource only has one `rdf:type` statement,
then the local name of the RDF Object is used as the ArangoDB
Document Collection name. For example,
`<http://example.com/Bob> <rdf:type> <http://example.com/Person>`
would create a JSON Document for `<http://example.com/Bob>`,
and place it under the `Person` Document Collection.
NOTE: The RDF Object will also have its own JSON Document
created, and will be placed under the "Class"
Document Collection.

2. If an RDF Resource has multiple `rdf:type` statements,
with some (or all) of the RDF Objects of those statements
belonging in an `rdfs:subClassOf` Taxonomy, then the
local name of the "most specific" Class within the Taxonomy is
used (i.e., the Class with the greatest depth). If there is a
tie between 2+ Classes, then the URIs are alphabetically
sorted & the first one is picked.

3. If an RDF Resource has multiple `rdf:type` statements, with none
of the RDF Objects of those statements belonging in an
`rdfs:subClassOf` Taxonomy, then the URIs are
alphabetically sorted & the first one is picked. The local
name of the selected URI will be designated as the Document
collection for that Resource.
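
The selection rules above can be sketched in a few lines of stand-alone Python. This is an illustration under stated assumptions, not the library's implementation. The names `local_name`, `class_depth`, and `pick_collection` are hypothetical, and the `rdfs:subClassOf` taxonomy is simplified to a `parent_of` dictionary with at most one superclass per class. Letting an explicit `adb:collection` value take precedence over `rdf:type` is also an assumption.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return uri.rstrip("/#").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def class_depth(cls: str, parent_of: dict) -> int:
    """Count how many subClassOf steps lie between a class and the top of its taxonomy."""
    depth, seen = 0, {cls}
    while cls in parent_of:
        cls = parent_of[cls]
        if cls in seen:  # guard against cycles in the taxonomy
            break
        seen.add(cls)
        depth += 1
    return depth


def pick_collection(types: set, parent_of: dict, explicit: str = None) -> str:
    """Choose a document collection name for a resource from its rdf:type objects."""
    if explicit is not None:
        # adb:collection statement (precedence over rdf:type is assumed here).
        return explicit
    if not types:
        # Resources without any rdf:type are outside the rules sketched here.
        raise ValueError("no rdf:type statements to choose from")
    if len(types) == 1:
        return local_name(next(iter(types)))  # case 1
    taxonomy = set(parent_of) | set(parent_of.values())
    in_taxonomy = [t for t in types if t in taxonomy]
    if in_taxonomy:
        # Case 2: keep the most specific (deepest) classes.
        deepest = max(class_depth(t, parent_of) for t in in_taxonomy)
        candidates = [t for t in in_taxonomy if class_depth(t, parent_of) == deepest]
    else:
        candidates = list(types)  # case 3
    # Ties are broken by sorting the URIs alphabetically and taking the first.
    return local_name(sorted(candidates)[0])


if __name__ == "__main__":
    parent_of = {
        "http://example.com/Singer": "http://example.com/Artist",
        "http://example.com/Artist": "http://example.com/Person",
    }
    types = {"http://example.com/Person", "http://example.com/Singer"}
    print(pick_collection(types, parent_of))
```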
--------------------
1 change: 1 addition & 0 deletions arango_rdf/__init__.py
@@ -1 +1,2 @@
from arango_rdf.main import ArangoRDF # noqa: F401
from arango_rdf.controller import ArangoRDFController # noqa: F401
34 changes: 23 additions & 11 deletions arango_rdf/abc.py
@@ -22,23 +22,25 @@ def rdf_to_arangodb_by_rpt(
name: str,
rdf_graph: RDFGraph,
contextualize_graph: bool,
flatten_reified_triples: bool,
use_hashed_literals_as_keys: bool,
overwrite_graph: bool,
use_async: bool,
batch_size: Optional[int],
**import_options: Any,
**adb_import_kwargs: Any,
) -> ADBGraph:
raise NotImplementedError # pragma: no cover

def rdf_to_arangodb_by_pgt(
self,
name: str,
rdf_graph: RDFGraph,
adb_col_statements: Optional[RDFGraph],
write_adb_col_statements: bool,
contextualize_graph: bool,
flatten_reified_triples: bool,
overwrite_graph: bool,
use_async: bool,
batch_size: Optional[int],
adb_mapping: Optional[RDFGraph],
**import_options: Any,
**adb_import_kwargs: Any,
) -> ADBGraph:
raise NotImplementedError # pragma: no cover

@@ -47,10 +49,14 @@ def arangodb_to_rdf(
name: str,
rdf_graph: RDFGraph,
metagraph: ADBMetagraph,
explicit_metagraph: bool,
list_conversion_mode: str,
dict_conversion_mode: str,
infer_type_from_adb_v_col: bool,
include_adb_key_statements: bool,
**export_options: Any,
include_adb_v_col_statements: bool,
include_adb_v_key_statements: bool,
include_adb_e_key_statements: bool,
**adb_export_kwargs: Any,
) -> Tuple[RDFGraph, RDFGraph]:
raise NotImplementedError # pragma: no cover

@@ -61,9 +67,12 @@ def arangodb_collections_to_rdf(
v_cols: Set[str],
e_cols: Set[str],
list_conversion_mode: str,
dict_conversion_mode: str,
infer_type_from_adb_v_col: bool,
include_adb_key_statements: bool,
**export_options: Any,
include_adb_v_col_statements: bool,
include_adb_v_key_statements: bool,
include_adb_e_key_statements: bool,
**adb_export_kwargs: Any,
) -> Tuple[RDFGraph, RDFGraph]:
raise NotImplementedError # pragma: no cover

@@ -72,9 +81,12 @@ def arangodb_graph_to_rdf(
name: str,
rdf_graph: RDFGraph,
list_conversion_mode: str,
dict_conversion_mode: str,
infer_type_from_adb_v_col: bool,
include_adb_key_statements: bool,
**export_options: Any,
include_adb_v_col_statements: bool,
include_adb_v_key_statements: bool,
include_adb_e_key_statements: bool,
**adb_export_kwargs: Any,
) -> Tuple[RDFGraph, RDFGraph]:
raise NotImplementedError # pragma: no cover
