Commit
elaborating and restructuring the docs as suggested in #35
1 parent
9d3f2a5
commit 40eec63
Showing 4 changed files with 253 additions and 61 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,15 @@ | ||
# py-SUbyT | ||
|
||
Py Semantic Uplifting by Template - A python module for Linked Data production (aka semantic uplifting) through Templating | ||
A <u>Py</u>thon library for <u>S</u>emantic <u>U</u>plifting <u>by</u> <u>T</u>emplates. | ||
|
||
### Usage | ||
Please check out the Py-SUByt [user guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-usage.md) and [style guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-style.md)! | ||
An easy way (through python) to produce Linked Data | ||
(aka semantic uplifting) | ||
from classic data files (CSV, XML, JSON) into triples (RDF, turtle) | ||
through jinja-Templating | ||
|
||
### Usage and further reading | ||
Please check out the Py-SUbyt documentation. Namely: | ||
- the [user guide](./docs/cli-usage.md) | ||
- the supported [features](./docs/features.md) | ||
- the [examples](./docs/examples.md) | ||
- and the [style guide](./docs/cli-style.md)! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# pySUbyT Examples | ||
|
||
The source code comes with a number of tests, found in the `./tests` folder. They are run automatically to ensure that changes to the code do not break any of the guaranteed features of the package. | ||
|
||
They also serve as a great way to learn about how you could be using pySUbyT for your own projects. | ||
|
||
The `./tests` folder largely contains: | ||
| subfolder path | holding | for | ||
|-----------------------|-------------------|---------- | ||
| `./tests/in` | data input files | that are injected into the context so the templates can access them. `data.csv` provides the core input made available as `_`, while the various `data-name` sections provide auxiliary sets available as `sets['name']` | ||
| `./tests/templates` | template files | showing off the various features; their names also hold the modifiers to be applied when executing them | ||
| `./tests/out` | resulting output | matching the names of the template files, these contain the expected outcome of correctly executing the templates with the given inputs | ||
|
||
Running these tests yourself assumes: | ||
* a source-code checkout of the repository | ||
* a nicely set up virtualenv | ||
|
||
All tests are executed automatically by running `make test`, but you can also run each example yourself through the provided CLI command. | ||
|
||
Below we describe each test-template and cross-reference the [features](./styleguide.html#a-supported-features) they are made to highlight. | ||
|
||
## 01-basic.ttl | ||
> see [tests/templates/01-basic.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl) | ||
``` bash | ||
$ pysubyt --templates tests/templates --input tests/in/data.csv --name 01-basic.ttl | ||
``` | ||
|
||
This straightforward template just converts each row in the provided data.csv into a bunch of templated triples in `text/turtle` format. | ||
|
||
In doing so it uses a number of basic helpful techniques provided by pySUbyT: | ||
- [uritemplate-expansion](./features.md#uritemplate-expansion) | ||
- [regex-replacements](./features.md#regex-replacements) | ||
- [turtle formatting](./features.md#turtle-formatting) | ||
- [process control indicators](./features.md#process-control-indicators) | ||
|
||
|
||
|
||
TODO follow the above pattern and further describe the essence of all Examples | ||
|
||
## 02-collection_no-it.ttl | ||
> see [tests/templates/02-collection_no-it.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection_no-it.ttl) | ||
``` bash | ||
``` | ||
|
||
## 02-collection.ttl | ||
``` bash | ||
``` | ||
|
||
## 03-demo-j2_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 04-json-team_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 05-jsonify_no-it.json | ||
``` bash | ||
``` | ||
|
||
## 06-singlejson_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 07-folderinput_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 08-singlexml_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 09-mixedxml_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 10-csv-experiment_no-it.ttl | ||
``` bash | ||
``` | ||
|
||
## 11-schemadriven.ttl | ||
``` bash | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# Supported features | ||
|
||
Below is a description of the main features that are currently supported in PySUByT together with links to [examples](./examples.md) of how to use them. | ||
|
||
If you would like to have features added, please check out the already reported issues and [create a new one](https://github.com/vliz-be-opsci/pysubyt/issues/new) if your specific idea isn't listed yet. | ||
|
||
## uritemplate-expansion: | ||
|
||
We recommend using the built-in function `uritexpand(uritemplate, context)` to produce valid URIs from the data variables available in a context object, rather than through cumbersome, non-standard, and often brittle string-concatenation techniques. | ||
|
||
This implementation follows the [URITemplates - RFC6570](https://datatracker.ietf.org/doc/html/rfc6570) standard. | ||
|
||
Assuming a context object `_` that looks like: | ||
```json | ||
{ | ||
"id": "01472", | ||
"type": "boat" | ||
} | ||
``` | ||
|
||
Then a basic template containing: | ||
```jinja | ||
<{{uritexpand("https://ex.org{/type}{#id}",_)}}> a ex:{{_.type}} . | ||
``` | ||
|
||
Will produce: | ||
```turtle | ||
<https://ex.org/boat#01472> a ex:boat . | ||
``` | ||
|
||
> see [example-01](./examples.md#01-basic.ttl), and basically every other one in fact, because this is just the way to go! | ||
|
||
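To make the two expansion operators used above concrete, here is a minimal pure-Python sketch. It is not pySUbyT's `uritexpand`: `expand_sketch` is a hypothetical helper that covers only a small subset of RFC 6570 (`{var}`, `{/var}` and `{#var}`, one variable per expression), and it assumes the `id` is available as a string so that its leading zero survives.

```python
import re
from urllib.parse import quote

# Characters left unencoded by fragment expansion ({#var}): reserved + unreserved.
_RESERVED_SAFE = ":/?#[]@!$&'()*+,;=-._~"


def expand_sketch(template, context):
    """Hypothetical helper: expands {var}, {/var} and {#var} only."""

    def replace(match):
        operator, name = match.group(1), match.group(2)
        value = context.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        if operator == "#":
            return "#" + quote(str(value), safe=_RESERVED_SAFE)
        # Simple and path-segment expansion percent-encode everything
        # except unreserved characters.
        encoded = quote(str(value), safe="-._~")
        return "/" + encoded if operator == "/" else encoded

    return re.sub(r"\{([/#]?)(\w+)\}", replace, template)


row = {"id": "01472", "type": "boat"}
uri = expand_sketch("https://ex.org{/type}{#id}", row)
print(f"<{uri}> a ex:{row['type']} .")
```

The point of going through a template rather than string concatenation is that values such as `sail boat` get percent-encoded instead of silently producing an invalid URI.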
## regex-replacements: | ||
|
||
For basic string reformatting we provide the `regexreplace(regexmatch, replacement, original)` function. | ||
|
||
So that e.g. | ||
```jinja | ||
{{regexreplace('^[^:]*:', '', 'all-after-colon:is-kept')}} | ||
``` | ||
|
||
Will simply throw away all text up to and including the first colon. | ||
|
||
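The same replacement can be tried outside a template. The sketch below is a hypothetical stand-in that only mirrors the argument order documented above and delegates to Python's standard `re.sub`; whether pySUbyT's own function is implemented this way is an assumption.

```python
import re


def regexreplace_sketch(regexmatch, replacement, original):
    """Hypothetical stand-in using the documented argument order."""
    return re.sub(regexmatch, replacement, original)


# '^[^:]*:' matches everything from the start up to and including the first colon.
result = regexreplace_sketch("^[^:]*:", "", "all-after-colon:is-kept")
print(result)  # is-kept
```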
> see [example-01](./examples.md#01-basic.ttl) | ||
## turtle formatting: | ||
|
||
We consider `text/turtle` to be, at present, the triple format that best balances ease of reading and writing, at least for humans. More machine-friendly RDF formats can easily be generated from it. Following that thought, we assume templates will be easiest to read and write when targeting that output format. | ||
|
||
While the underlying Jinja engine still allows generating any text-based output, we have made a special effort to facilitate typical turtle-based string formatting. | ||
|
||
The provided jinja filter that deals with this is called `| ttl(type, quote_char)` and uses a type-argument to indicate the intended effect. | ||
|
||
The `quote_char` to be used can be either `'` or `"` and is optional (defaulting to `'`). Any escaping of the nested content is applied automatically. Additionally, the triple-quote variant is applied when the content is spread over multiple lines, an often overlooked [nuance of the turtle spec](https://www.w3.org/TR/turtle/#h4_turtle-literals). | ||
|
||
Supported types are: | ||
|
||
| type-string | effect | | ||
|-------------|------------| | ||
| @langcode | adds @language-code suffix for translated strings, e.g. "Dit is nederlands"@nl | ||
| xsd:string | adds `^^xsd:string` | ||
| xsd:integer | verifies the content is a valid int (or a string that can be unambiguously converted to one) and adds `^^xsd:integer` | ||
| xsd:double | verifies the content is a valid double (or a string that can be unambiguously converted to one) and adds `^^xsd:double` | ||
| xsd:date | verifies the content is a valid date (or a string that can be unambiguously converted to one), formats using ISO_8601 and adds `^^xsd:date` | ||
| xsd:datetime | verifies the content is a valid datetime (or a string that can be unambiguously converted to one), formats using ISO_8601 and adds `^^xsd:datetime` | ||
| xsd:boolean | converts any of `'', 'no', 'false', 'off', '0', 0, False` to `false` and everything else to `true`, and adds `^^xsd:boolean` | ||
| xsd:anyURI | adds `^^xsd:anyURI` | ||
|
||
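The quoting rules described above can be illustrated with a small sketch. `ttl_literal_sketch` is a hypothetical function, not the actual `ttl` filter: it covers only `@langcode`, `xsd:string` and `xsd:boolean`, and the exact text the real filter emits (for example how booleans are quoted) may differ.

```python
def ttl_literal_sketch(value, type_name, quote_char="'"):
    """Hypothetical stand-in for the `ttl` filter, limited to a few type-strings."""
    if type_name == "xsd:boolean":
        # The values listed in the table count as false, everything else as true.
        is_false = value in ("", "no", "false", "off", "0", 0, False)
        text = "false" if is_false else "true"
    else:
        text = str(value)

    # Escape backslashes first, then the chosen quote character.
    text = text.replace("\\", "\\\\").replace(quote_char, "\\" + quote_char)
    # Content spread over multiple lines needs the triple-quote variant.
    quotes = quote_char * 3 if "\n" in text else quote_char

    # Language codes are appended directly, datatypes after '^^'.
    suffix = type_name if type_name.startswith("@") else "^^" + type_name
    return f"{quotes}{text}{quotes}{suffix}"


print(ttl_literal_sketch("Dit is nederlands", "@nl", '"'))
print(ttl_literal_sketch("it's", "xsd:string"))
print(ttl_literal_sketch("line 1\nline 2", "xsd:string"))
print(ttl_literal_sketch("off", "xsd:boolean"))
```

In a template, the corresponding filter call would follow the `| ttl(type, quote_char)` form described above.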
> see [example-01](./examples.md#01-basic.ttl) | ||
## file input from various sources: | ||
|
||
TODO describe how | ||
* data-access of main set via `_` | ||
* data-access of additional sets via `sets[«name»]` | ||
|
||
### csv input: | ||
|
||
TODO obvious but useful to mention that fieldnames come from the column names (a line with those is assumed) | ||
|
||
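As an illustration of field names coming from the header line, the sketch below uses Python's standard `csv.DictReader`. How pySUbyT reads CSV internally is not shown here, so treat this as an analogy only; the sample data is made up for the example.

```python
import csv
import io

# The first line holds the column names, as assumed for CSV input.
csv_text = "id,type\n01472,boat\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
for _ in rows:
    # Each row is a mapping keyed by column name; values stay strings,
    # so the leading zero of the id is preserved.
    print(_["id"], _["type"])
```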
> see [example-01](./examples.md#01-basic.ttl) | ||
### xml input: | ||
|
||
TODO mention xmlasdict + mixed content model support | ||
|
||
> see [example-08](./examples.md#08-singlexml_no-it.ttl), [example-09](./examples.md#09-mixedxml_no-it.ttl) | ||
### json input: | ||
|
||
TODO give a simple example | ||
|
||
> see [example-04](./examples.md#04-json-team_no-it.ttl), [example-06](./examples.md#06-singlejson_no-it.ttl) | ||
### folder input: | ||
|
||
TODO point out the mixed-format support | ||
|
||
> see [example-07](./examples.md#07-folderinput_no-it.ttl) | ||
## mapping: | ||
|
||
TODO the why and how | ||
|
||
> see [example-02](./examples.md#02-collection.ttl), and [example-02-no-it](./examples.md#02-collection_no-it.ttl) | ||
## various execution mode settings: | ||
see the [client documentation](./cli.md) | ||
|
||
## template management features provided by Jinja2: | ||
> see [example-03](./examples.md#03-demo-j2_no-it.ttl) and [example-05](./examples.md#05-jsonify_no-it.json). | ||
Please check out the [Jinja documentation](https://jinja.palletsprojects.com/en/3.0.x/) as well. | ||
|
||
## process control indicators | ||
|
||
TODO explain {{ctrl.isFirst}} {{ctrl.isLast}} {{ctrl.index}} | ||
|
||
> see [example-01](./examples.md#01-basic.ttl) |