forked from gertvv/jena-es

Implementation of event sourcing on top of Apache Jena and Fuseki

Jena Event Sourcing

Defines a mechanism for versioning of RDF datasets, which are collections of graphs. It is an implementation of the Event Sourcing pattern for the Apache Jena triple store. For more information on all components of the drugis project, please refer to the OVERALL-README.md in the root folder of the addis-core project. The versioning model is designed to be compatible with the Memento RFC, but does not implement that RFC; instead, it provides the semantics that allow a client to implement the RFC on top of this. Other sources of inspiration were Datomic and Git. There is no authentication or authorization: any valid request will be executed. All URIs are generated by the versioning store itself and are not meant to have a particular meaning to the end user.

Running

Set the URI prefix for the store, e.g.:

export EVENT_SOURCE_URI_PREFIX=http://localhost:8080

Then run the application directly through Maven:

mvn spring-boot:run

Or package it (mvn install spring-boot:repackage) and then run it using java -jar.

Or, after packaging it as above, run it in a docker container. The Dockerfile is in the docker/ subdirectory. Copy the built jar file there.

Example run command (using a volume to make the database available externally):

docker run -d \
  -p 3030:8080 \
  -e EVENT_SOURCE_URI_PREFIX=https://localdocker.com:3030 \
  --name="jena-es" \
  -v /home/username/jena-es/docker/DB:/DB \
  jena-es

Note that on macOS or Windows, problems could arise if the host volume is in a Users directory; see e.g. stain/jena-docker#1.

Data model

The data model is described in RDF, using a vocabulary under the prefix http://drugis.org/eventSourcing/es#, abbreviated as es:. Use is also made of Dublin Core Terms, abbreviated as dcterms:. All datasets served by the event sourcing RDF store are stored in a single underlying RDF dataset. The meta-data are stored in the default graph, and named graphs are used to store differences between graph versions.

Datasets match the following pattern:

?dataset a es:Dataset ;
  dcterms:date ?creationDate ;
  dcterms:creator ?creatorUri ;
  es:head ?version .

Of these triples, only the ?dataset es:head ?version triple will ever change. The ?creatorUri has no specific meaning to the event sourcing system, but is assumed to be meaningful to the client. The ?version is further defined as:

?version a es:DatasetVersion ;
  dcterms:date ?versionDate ;
  dcterms:creator ?creatorUri ;
  es:dataset ?dataset .

Here, the es:dataset predicate identifies the dataset for which this version was originally created. Most versions will also point to their predecessor using

?version es:previous ?previousVersion .

Each version enumerates the graphs it contains, by pointing to specific graph revisions:

?version es:graph_revision [
    es:graph ?graphUri ;
    es:revision ?revision
  ] .
?version es:default_graph_revision [
    es:revision ?revision
  ] .

A revision is a changeset applied to a single graph, and is represented as follows:

?revision a es:Revision ;
  es:previous ?previousRevision ;
  es:assertions ?assertionsGraph ;
  es:retractions ?retractionsGraph .

The contents of a dataset version can be constructed by constructing the contents of each revision. These, in turn, can be constructed by replaying the series of changesets given by the es:previous property on each revision, from oldest to newest.
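The replay described above can be sketched in a few lines of Python. This is only an illustration, not the store's actual implementation: the `Revision` record, the `materialize` function, and the short revision IDs are hypothetical names, and triples are modelled as plain tuples of strings.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Revision(NamedTuple):
    """Hypothetical in-memory stand-in for an es:Revision resource."""
    previous: Optional[str]         # es:previous, or None for the first revision
    assertions: FrozenSet[Triple]   # contents of the es:assertions graph
    retractions: FrozenSet[Triple]  # contents of the es:retractions graph


def materialize(revisions: Dict[str, Revision], head: str) -> Set[Triple]:
    """Rebuild a graph by replaying changesets from oldest to newest."""
    # Follow es:previous from the requested revision back to the first one.
    chain = []
    current: Optional[str] = head
    while current is not None:
        chain.append(revisions[current])
        current = revisions[current].previous

    # Replay in reverse order of discovery, i.e. oldest first.
    graph: Set[Triple] = set()
    for revision in reversed(chain):
        graph -= revision.retractions
        graph |= revision.assertions
    return graph


PP = "http://example.com/PeterParker"
revisions = {
    "rev-1": Revision(
        previous=None,
        assertions=frozenset({(PP, "foaf:name", "Peter Parker"),
                              (PP, "foaf:name", "Spiderman")}),
        retractions=frozenset(),
    ),
    "rev-2": Revision(
        previous="rev-1",
        assertions=frozenset({(PP, "foaf:homepage", "http://example.com/home")}),
        retractions=frozenset({(PP, "foaf:name", "Spiderman")}),
    ),
}
graph_at_rev_2 = materialize(revisions, "rev-2")
graph_at_rev_1 = materialize(revisions, "rev-1")
```

Because older revisions are never modified, materializing `rev-1` still yields the original contents after `rev-2` has been added.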

URIs for datasets, versions, revisions, assertions, and retractions are generated by the event sourcing store, and are expected to resolve properly, as described below.

Merging

Merging of changes made to different datasets will not be supported directly by the event sourcing store, but can be indicated in meta-data for specific commits. For example, when a dataset is initially created as a copy of another dataset, its meta-data would make use of the es:merged property to indicate the copied version. This is an especially simple type of merge, which we indicate using ?version es:mergeType es:MergeCopyTheirs.

These semantics will be extended in future versions to allow for other merge strategies and to enable the client to specify merge meta-data.

Algorithm for changesets

Interaction through the SPARQL graph store or the SPARQL protocol endpoints may alter the contents of a graph. If G is the graph prior to these changes and H is the graph after, then we define the additions A = H - G and the retractions R = G - H. A changeset then is the pair (A, R). To apply a changeset (A, R) to a graph G, compute H = (G - R) ∪ A. To clarify, '-' refers to the set difference of triples contained in each graph, and '∪' to union of the sets of triples.
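As a minimal sketch of these definitions, assuming graphs are modelled as Python sets of triples (the function names `compute_changeset` and `apply_changeset` are illustrative, not part of the store):

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Changeset = Tuple[Set[Triple], Set[Triple]]  # (A, R)


def compute_changeset(g: Set[Triple], h: Set[Triple]) -> Changeset:
    """Return (A, R) with A = H - G and R = G - H."""
    return h - g, g - h


def apply_changeset(g: Set[Triple], changeset: Changeset) -> Set[Triple]:
    """Return H = (G - R) ∪ A."""
    additions, retractions = changeset
    return (g - retractions) | additions


PP = "http://example.com/PeterParker"
G = {(PP, "foaf:name", "Peter Parker"), (PP, "foaf:name", "Spiderman")}
H = {(PP, "foaf:name", "Peter Parker"), (PP, "foaf:homepage", "http://example.com/home")}

A, R = compute_changeset(G, H)
restored = apply_changeset(G, (A, R))
```

Applying the changeset computed from G and H to G yields H again, which is the property the versioning model relies on.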

When transferred over HTTP, changesets will be serialized using TurtlePatch, a strict subset of SPARQL that represents the additions/retractions model, and is also very close to plain Turtle (See LDP PATCH proposals for alternatives).

If an interaction alters one or more graphs, then the above algorithm is used to compute changesets for each graph. A new revision of each such graph is created, and a new version of the dataset is instantiated that refers to the new revisions for those graphs, and to existing revisions of all other graphs. An exception is when a graph is explicitly deleted (using the HTTP DELETE method), or when all triples in a graph are deleted. In that case, the new version of the dataset will not make reference to that graph (i.e. the es:graph_revision triple will be deleted as well). In case a new graph is created (explicitly through an HTTP POST of graph content, or implicitly through a SPARQL query), this will be represented through a graph revision with no es:previous.
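The rules in this paragraph can be illustrated with a small Python sketch. It is not the store's implementation: `next_version`, the dictionary layout, and the `rev-N` identifiers are assumptions made for the example, and graphs are modelled as sets of triples.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graphs = Dict[str, Set[Triple]]


def next_version(head_revs: Dict[str, str], before: Graphs, after: Graphs,
                 new_revision_id: Callable[[], str]):
    """Return (graph -> revision map for the new version, newly created revisions)."""
    graph_revs: Dict[str, str] = {}
    created: Dict[str, dict] = {}
    for graph_uri, triples in after.items():
        if not triples:
            continue  # emptied graph: the new version no longer references it
        old = before.get(graph_uri, set())
        if triples == old:
            graph_revs[graph_uri] = head_revs[graph_uri]  # unchanged: reuse revision
            continue
        rev_id = new_revision_id()
        created[rev_id] = {
            "previous": head_revs.get(graph_uri),  # None for a newly created graph
            "assertions": triples - old,
            "retractions": old - triples,
        }
        graph_revs[graph_uri] = rev_id
    # Graphs present in `before` but absent from `after` were deleted and are
    # therefore not referenced by the new version.
    return graph_revs, created


PP = "http://example.com/PeterParker"
SM = "http://example.com/Spiderman"
before = {PP: {(PP, "foaf:name", "Peter Parker"), (PP, "foaf:name", "Spiderman")}}
after = {
    PP: {(PP, "foaf:name", "Peter Parker")},
    SM: {(SM, "foaf:name", "Spiderman")},
}
ids = iter(["rev-2", "rev-3"])
graph_revs, created = next_version({PP: "rev-1"}, before, after, lambda: next(ids))
```

Here the modified graph gets a revision whose `previous` is the old revision, while the newly created graph gets a revision without a predecessor.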

Blank nodes are handled using skolemization (W3C guidance, discussion), i.e. replacing blank nodes (which are essentially existential operators) with specific instances (URIs). Skolemization is important to ensure that changesets containing blank nodes are unambiguous even if they are transferred between systems. Skolem URIs must be "fresh" (globally unique) and they must be recognizable as skolems, so they can be mapped back to blank nodes if necessary. Freshness is achieved by generating 128-bit Flake IDs, and a recognizable URI is derived by appending a Base 64 Encoding with URL and Filename Safe Alphabet without padding to the /.well-known/skolem/ prefix. Note that skolemization must happen before changesets are calculated, and that if clients create new blank nodes, they must retrieve the updated graph to learn the skolem URIs of those blank nodes.
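To illustrate the URI derivation only, here is a hedged Python sketch. It assumes the 128-bit ID is already available as 16 bytes; the random bytes used below are merely a stand-in and are not a Flake ID. The names `skolem_uri` and `is_skolem`, and the use of the store's URI prefix as the base, are assumptions for the example.

```python
import base64
import secrets

SKOLEM_PATH = "/.well-known/skolem/"


def skolem_uri(uri_prefix: str, id_bytes: bytes) -> str:
    """Derive a skolem URI from a 128-bit ID using unpadded URL-safe Base64."""
    if len(id_bytes) != 16:
        raise ValueError("expected a 128-bit (16 byte) identifier")
    token = base64.urlsafe_b64encode(id_bytes).rstrip(b"=").decode("ascii")
    return uri_prefix.rstrip("/") + SKOLEM_PATH + token


def is_skolem(uri: str) -> bool:
    """Recognize skolem URIs so they can be mapped back to blank nodes."""
    return SKOLEM_PATH in uri


# Stand-in for a real Flake ID generator (assumption: any unique 16 bytes will do here).
example = skolem_uri("http://localhost:8080", secrets.token_bytes(16))
zero = skolem_uri("http://localhost:8080", bytes(16))
```

Sixteen bytes always encode to 22 Base64 characters once the two padding characters are stripped, so the resulting URIs have a fixed length.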

TODO: should there be a route for de-skolemized versions of graphs? Other solutions to blank nodes in changesets are possible, but complicate the way differences are computed - they are NP-complete.

Creating datasets

A new dataset is created through a POST request to /datasets. The X-EventSource-Creator header can be used to specify a URI identifying the author, and the X-EventSource-Title and X-EventSource-Description headers can be used to specify a commit message for the initial version. The request body may be empty, and in that case no Content-Type should be declared:

POST /datasets HTTP/1.1
Host: example.com
X-EventSource-Creator: http://example.com/GreenGoblin
X-EventSource-Title: SW5pdGlhbCB2ZXJzaW9u

Alternatively, an RDF payload may be sent in the request body, and in that case the appropriate Content-Type header needs to be set. This will be the initial content of the default graph of the dataset:

POST /datasets HTTP/1.1
Host: example.com
X-EventSource-Creator: http://example.com/GreenGoblin
X-EventSource-Title: SW5pdGlhbCB2ZXJzaW9u
Content-Type: text/turtle

<http://example.com/GreenGoblin> a <http://example.com/GreatGuy> .

This will result in a 201 Created response indicating both the location of the new dataset and the initial version ID.

HTTP/1.1 201 Created
Location: http://example.com/datasets/ubb245f8sz
X-EventSource-Version: http://example.com/versions/3ucq3j5c7u

A special route is available for copying (cloning) datasets:

POST /datasets?copyOf=http://example.com/versions/7wi4xglx1c HTTP/1.1
Host: example.com
X-EventSource-Creator: http://example.com/PeterParker
X-EventSource-Title: Q29weSBHcmVlbkdvYmxpbi9TcGlkZXJtYW4=

The response contract is identical, but in this case special meta-data will be attached to the initial version of this dataset to indicate it is a copy of another dataset.

Updating and querying datasets

Under /datasets/:dataset-id reside a SPARQL protocol query (./query) and update (./update) endpoint, as well as a SPARQL graph store (./data) using the indirect graph identification pattern. The protocol is modified in three ways:

  1. All requests can specify the X-Accept-EventSource-Version header, and the server will Vary: X-Accept-EventSource-Version. For requests that do not alter the state (i.e. requests to the query endpoint and GET requests to the graph store), the header indicates that the contents of a previous version are being queried. For requests that do alter state, the header indicates that the request expects the latest version to be the version specified in the header. If it is not, the request fails with a 409 Conflict response. If the header is not specified, the action is executed against the latest version.

  2. All write requests can specify the X-EventSource-Creator header to set the creator attribute of the new version, and X-EventSource-Title and X-EventSource-Description to provide a short and long description respectively. The X-EventSource-Title and X-EventSource-Description headers must be encoded in UTF-8 using Base 64 Encoding.

  3. All requests return an X-EventSource-Version header. For read requests this indicates the version being returned, while for write requests it indicates the new version created.
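As an illustration of the header conventions listed above, a client might assemble the headers for a write request roughly as follows. This is a sketch: the helper name `write_headers` and its parameters are hypothetical, and only the header names and the Base64/UTF-8 encoding come from the description above.

```python
import base64
from typing import Dict, Optional


def b64_utf8(text: str) -> str:
    """Encode text as UTF-8 and then as standard Base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def write_headers(creator: str, title: str,
                  description: Optional[str] = None,
                  expected_version: Optional[str] = None) -> Dict[str, str]:
    """Build the event sourcing headers for a state-changing request."""
    headers = {
        "X-EventSource-Creator": creator,
        "X-EventSource-Title": b64_utf8(title),
    }
    if description is not None:
        headers["X-EventSource-Description"] = b64_utf8(description)
    if expected_version is not None:
        # The server answers 409 Conflict if this is not the latest version.
        headers["X-Accept-EventSource-Version"] = expected_version
    return headers


headers = write_headers(
    creator="http://example.com/GreenGoblin",
    title="Initial version",
    expected_version="http://example.com/versions/3ucq3j5c7u",
)
```

Omitting `expected_version` corresponds to executing the action against whatever the latest version is at that moment.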

TODO: should (some of) the request headers be mandatory?

Copying of graphs is supported on the graph store route, using the copyOf parameter:

POST /datasets/:dataset-id/data?graph=:graphUri&copyOf=:revision HTTP/1.1

Similarly to copying of datasets, this is represented in event meta-data through the es:merged and es:mergeType predicates.

Investigating history

GET /datasets/:dataset-id HTTP/1.1
Host: example.com

Returns a basic description of the dataset, including its most recent version and the revisions contained therein. This can be resolved using the following SPARQL query (where $dataset needs to be replaced):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX es: <http://drugis.org/eventSourcing/es#>

CONSTRUCT {
 ?s ?p ?o
} WHERE {
  $dataset owl:sameAs{0} ?dataset .

  ?dataset es:head ?version .
  {
    ?version es:graph_revision ?graphRev .
  } UNION {
    ?version es:default_graph_revision ?graphRev .
  }
  ?graphRev es:revision/es:previous* ?revision .

  {
    ?dataset owl:sameAs{0} ?s .
  } UNION {
    ?version owl:sameAs{0} ?s .
  } UNION {
    ?graphRev owl:sameAs{0} ?s .
  } UNION {
    ?revision owl:sameAs{0} ?s .
  }

  ?s ?p ?o .
}

GET /datasets/:dataset-id/history HTTP/1.1
Host: example.com

Should return the full history for the given dataset, including the dataset definition itself. This can be resolved using the following SPARQL query (where $dataset needs to be replaced):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX es: <http://drugis.org/eventSourcing/es#>

CONSTRUCT {
 ?s ?p ?o
} WHERE {
  $dataset owl:sameAs{0} ?dataset .

  ?dataset es:head ?head .
  ?head es:previous* ?version .
  ?version es:merged ?mergedVersion .
  {
    ?version es:graph_revision ?graphRev .
  } UNION {
    ?version es:default_graph_revision ?graphRev .
  }
  ?graphRev es:revision/es:previous* ?revision .
  ?revision es:version ?referencedVersion .

  {
    ?dataset owl:sameAs{0} ?s .
  } UNION {
    ?version owl:sameAs{0} ?s .
  } UNION {
    ?mergedVersion owl:sameAs{0} ?s .
  } UNION {
    ?graphRev owl:sameAs{0} ?s .
  } UNION {
    ?revision owl:sameAs{0} ?s .
  } UNION {
    ?referencedVersion owl:sameAs{0} ?s .
  }

  ?s ?p ?o .
}

Each version will return a description of itself, including any revisions and the dataset. Each revision will return its contents. Each assertions or retractions graph will return its contents.

TODO: viewing history of a revision?

TODO: retrieving TurtlePatch?

More complicated queries regarding history can be answered by the /query SPARQL protocol endpoint, which exposes the underlying dataset as is.

Example

Note: this example uses a variety of different ID generation schemes. IDs as shown here should not be seen as indicative of the actual ID generation scheme to be used.

Say the Green Goblin creates a dataset to state that, in fact, Peter Parker goes by the names "Peter Parker" and "Spiderman".

To do this, we first create the new dataset:

POST /datasets HTTP/1.1
Host: example.com
X-EventSource-Creator: http://example.com/GreenGoblin
X-EventSource-Title: SW5pdGlhbCB2ZXJzaW9u

And the server responds with a newly created dataset, and indicates the initial version ID:

HTTP/1.1 201 Created
Location: http://example.com/datasets/ubb245f8sz
X-EventSource-Version: http://example.com/versions/3ucq3j5c7u

After this, the triple store looks like:

@prefix dcterms:   <http://purl.org/dc/elements/1.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@prefix es: <http://drugis.org/eventSourcing/es#> .
@prefix dataset: <http://example.com/datasets/> .
@prefix version: <http://example.com/versions/> .

dataset:ubb245f8sz a es:Dataset ;
  dcterms:date "2014-09-24T12:40:03+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/GreenGoblin> ;
  es:head version:3ucq3j5c7u .

version:3ucq3j5c7u a es:DatasetVersion ;
  dcterms:date "2014-09-24T12:40:03+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/GreenGoblin> ;
  dcterms:title "Initial version" .

Then, we post new contents for the graph describing Peter Parker:

POST /datasets/ubb245f8sz/data?graph=http://example.com/PeterParker HTTP/1.1
Host: example.com
Content-Type: text/turtle
X-EventSource-Creator: http://example.com/GreenGoblin
X-EventSource-Title: UGV0ZXIgUGFya2VyIGlzIFNwaWRlcm1hbg==
X-EventSource-Description: SXQgaXMgdGltZSB0aGUgd29ybGQga25ldy4uLg0KVGhhdCBQZXRlciBQYXJrZXIgaXMgU3BpZGVybWFuIQ==
X-Accept-EventSource-Version: http://example.com/versions/3ucq3j5c7u

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.com/PeterParker> a foaf:Person ;
  foaf:name "Peter Parker", "Spiderman" .

The response has no content, and contains a header indicating the created version:

HTTP/1.1 204 No Content
X-EventSource-Version: http://example.com/versions/7wi4xglx1c

Now, Peter Parker wants to dispute this claim made by the Green Goblin, and starts by creating a copy of his dataset:

POST /datasets?copyOf=http://example.com/versions/7wi4xglx1c HTTP/1.1
Host: example.com
X-EventSource-Creator: http://example.com/PeterParker
X-EventSource-Title: Q29weSBHcmVlbkdvYmxpbi9TcGlkZXJtYW4=

Notice that a title is provided for the initial version of the copy, but no description. The response, again, indicates the location and version of the newly created dataset:

HTTP/1.1 201 Created
Location: http://example.com/datasets/qmk2x16nz1
X-EventSource-Version: http://example.com/versions/f98gj2sgsn

Then Peter proceeds to run a SPARQL update query to set the record straight:

POST /datasets/qmk2x16nz1/update HTTP/1.1
Host: example.com
Content-Type: application/sparql-update
X-EventSource-Creator: http://example.com/PeterParker
X-EventSource-Title: VGhlIEdyZWVuIEdvYmxpbiBpcyBhIGxpYXIh
X-Accept-EventSource-Version: http://example.com/versions/f98gj2sgsn

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

DELETE DATA {
  GRAPH <http://example.com/PeterParker> {
    <http://example.com/PeterParker> foaf:name "Spiderman"
  }
};
INSERT DATA {
  GRAPH <http://example.com/Spiderman> {
    <http://example.com/Spiderman> a foaf:Person ;
      foaf:name "Spiderman" .
  }
  GRAPH <http://example.com/PeterParker> {
    <http://example.com/PeterParker> foaf:homepage <http://www.okcupid.com/profile/PeterParker> .
  }
}

Which results in a newly created version:

HTTP/1.1 204 No Content
X-EventSource-Version: http://example.com/versions/g05ri5qvvq

Complete turtle of final store

@prefix dcterms:   <http://purl.org/dc/elements/1.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@prefix es: <http://drugis.org/eventSourcing/es#> .

@prefix dataset: <http://example.com/datasets/> .
@prefix version: <http://example.com/versions/> .
@prefix revision: <http://example.com/revisions/> .
@prefix assert: <http://example.com/assertions/> .
@prefix retract: <http://example.com/retractions/> .

dataset:qmk2x16nz1 a es:Dataset ;
  dcterms:date "2014-09-24T12:49:18+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/Spiderman> ;
  es:head version:g05ri5qvvq .

dataset:ubb245f8sz a es:Dataset ;
  dcterms:date "2014-09-24T12:40:03+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/GreenGoblin> ;
  es:head version:7wi4xglx1c .

version:g05ri5qvvq a es:DatasetVersion ;
  dcterms:date "2014-09-24T12:58:16,835290832+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/Spiderman> ;
  dcterms:title "The Green Goblin is a liar!" ;
  es:previous version:f98gj2sgsn ;
  es:graph_revision [
    es:graph <http://example.com/Spiderman> ;
    es:revision revision:302431f4-43e8-11e4-8745-c72e64fa66b1
  ] ;
  es:graph_revision [
    es:graph <http://example.com/PeterParker> ;
    es:revision revision:44ea0618-43e8-11e4-bcfb-bba47531d497
  ] .

version:f98gj2sgsn a es:DatasetVersion ;
  dcterms:date "2014-09-24T12:56:27+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/Spiderman> ;
  dcterms:title "Copy GreenGoblin/Spiderman" ;
  es:merged version:7wi4xglx1c ;
  es:mergeType es:MergeCopyTheirs ;
  es:graph_revision [
    es:graph <http://example.com/PeterParker> ;
    es:revision revision:38fc1de7a-43ea-11e4-a12c-3314171ce0bb
  ] .

version:7wi4xglx1c a es:DatasetVersion ;
  dcterms:date "2014-09-24T12:45:25,048366032+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/GreenGoblin> ;
  dcterms:title "Peter Parker is Spiderman" ;
  dcterms:description "It is time the world knew...\nThat Peter Parker is Spiderman!" ;
  es:previous version:3ucq3j5c7u ;
  es:graph_revision [
    es:graph <http://example.com/PeterParker> ;
    es:revision revision:38fc1de7a-43ea-11e4-a12c-3314171ce0bb
  ] .

version:3ucq3j5c7u a es:DatasetVersion ;
  dcterms:date "2014-09-24T12:40:03+0000"^^xsd:dateTime ;
  dcterms:creator <http://example.com/GreenGoblin> ;
  dcterms:title "Initial version" .

revision:302431f4-43e8-11e4-8745-c72e64fa66b1 a es:Revision ;
  es:version version:g05ri5qvvq ;
  es:assertions assert:302431f4-43e8-11e4-8745-c72e64fa66b1 .

revision:44ea0618-43e8-11e4-bcfb-bba47531d497 a es:Revision ;
  es:version version:g05ri5qvvq ;
  es:previous revision:38fc1de7a-43ea-11e4-a12c-3314171ce0bb ;
  es:assertions assert:44ea0618-43e8-11e4-bcfb-bba47531d497 ;
  es:retractions retract:44ea0618-43e8-11e4-bcfb-bba47531d497 .

revision:38fc1de7a-43ea-11e4-a12c-3314171ce0bb a es:Revision ;
  es:version version:7wi4xglx1c ;
  es:assertions assert:844908ec-43eb-11e4-ac51-6b523949084e .

assert:844908ec-43eb-11e4-ac51-6b523949084e {

  <http://example.com/PeterParker> a foaf:Person ;
    foaf:name "Peter Parker", "Spiderman" .

}

retract:44ea0618-43e8-11e4-bcfb-bba47531d497 {

  <http://example.com/PeterParker> foaf:name "Spiderman" .

}

assert:44ea0618-43e8-11e4-bcfb-bba47531d497 {

  <http://example.com/PeterParker> foaf:homepage <http://www.okcupid.com/profile/PeterParker> .

}

assert:302431f4-43e8-11e4-8745-c72e64fa66b1 {

  <http://example.com/Spiderman> a foaf:Person;
    foaf:name "Spiderman" .

}
