elaborating and restructuring the docs as suggested in #35
marc-portier committed Jun 10, 2022
1 parent 9d3f2a5 commit 40eec63
Showing 4 changed files with 253 additions and 61 deletions.
15 changes: 12 additions & 3 deletions README.md
@@ -1,6 +1,15 @@
# py-SUbyT

Py Semantic Uplifting by Template - A python module for Linked Data production (aka semantic uplifting) through Templating
A <u>Py</u>thon library for <u>S</u>emantic <u>U</u>plifting <u>by</u> <u>T</u>emplates.

### Usage
Please check out the Py-SUByt [user guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-usage.md) and [style guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-style.md)!
An easy way (through python) to produce Linked Data
(aka semantic uplifting)
from classic data files (CSV, XML, JSON) into triples (RDF, turtle)
through jinja-Templating

### Usage and further reading
Please check out the Py-SUbyt documentation. Namely:
- the [user guide](./docs/cli-usage.md)
- the supported [features](./docs/features.md)
- the [examples](./docs/examples.md)
- and the [style guide](./docs/cli-style.md)!
85 changes: 85 additions & 0 deletions docs/examples.md
@@ -0,0 +1,85 @@
# pySUbyT Examples

The source code comes with a number of tests found in the `./tests` folder. These are used to automatically ensure that changes to the code do not break any of the guaranteed features of the package.

They also serve as a great way to learn about how you could be using pySUbyT for your own projects.

The `./tests` folder largely contains:

| subfolder path | holding | for |
|-----------------------|-------------------|----------|
| `./tests/in` | data input files | that are injected into the context so the templates can access them. `data.csv` provides the core input made available as `_`, while the various `data-name` sections provide auxiliary sets available as `sets['name']` |
| `./tests/templates` | template files | showing off the various features; their names also hold the modifiers to be applied when executing them |
| `./tests/out` | resulting output | matching the names of the template files, these contain the expected outcome of correctly executing the templates with the given inputs |

Running these tests yourself assumes:
* a source-code checkout of the repository
* a nicely set up virtualenv

All tests are executed automatically by running `make test`, but you can also run each example yourself through the provided CLI command.

Below we describe each test template and cross-reference the [features](./features.md) they are made to highlight.

## 01-basic.ttl
> see [tests/templates/01-basic.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
``` bash
$ pysubyt --templates tests/templates --input tests/in/data.csv --name 01-basic.ttl
```

This straightforward template converts each row of the provided `data.csv` into a set of templated triples in `text/turtle` format.

In doing so it uses a number of basic helpful techniques provided by pySUbyT:
- [uritemplate-expansion](./features.md#uritemplate-expansion)
- [regex-replacements](./features.md#regex-replacements)
- [turtle formatting](./features.md#turtle-formatting)
- [process control indicators](./features.md#process-control-indicators)



TODO follow the above pattern and further describe the essence of all Examples

## 02-collection_no-it.ttl
> see [tests/templates/02-collection_no-it.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection_no-it.ttl)
``` bash
```

## 02-collection.ttl
``` bash
```

## 03-demo-j2_no-it.ttl
``` bash
```

## 04-json-team_no-it.ttl
``` bash
```

## 05-jsonify_no-it.json
``` bash
```

## 06-singlejson_no-it.ttl
``` bash
```

## 07-folderinput_no-it.ttl
``` bash
```

## 08-singlexml_no-it.ttl
``` bash
```

## 09-mixedxml_no-it.ttl
``` bash
```

## 10-csv-experiment_no-it.ttl
``` bash
```

## 11-schemadriven.ttl
``` bash
```
120 changes: 120 additions & 0 deletions docs/features.md
@@ -0,0 +1,120 @@
# Supported features

Below is a description of the main features that are currently supported in PySUByT together with links to [examples](./examples.md) of how to use them.

If you would like to have features added, please check out the already reported issues and [create a new one](https://github.com/vliz-be-opsci/pysubyt/issues/new) if your specific idea isn't listed yet.

## uritemplate-expansion:

We recommend using the built-in function `uritexpand(uritemplate, context)` to produce valid URIs from the data variables available in a context object, rather than through cumbersome, non-standard, and often brittle string concatenation.

This implementation follows the [URI Template - RFC 6570](https://datatracker.ietf.org/doc/html/rfc6570) standard.

Assuming a context object `_` that looks like:
```json
{
  "id": "01472",
  "type": "boat"
}
```

Then a basic template containing:
```jinja
<{{uritexpand("https://ex.org{/type}{#id}",_)}}> a ex:{{_.type}} .
```

Will produce:
```turtle
<https://ex.org/boat#01472> a ex:boat .
```
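
To make the expansion rules concrete, here is a minimal Python sketch of what RFC 6570 prescribes for the two operators used in this template (`{/var}` for path segments and `{#var}` for fragments). It is a simplified illustration only, not the pySUbyT implementation; the function name `expand_sketch` is hypothetical.

```python
import re
from urllib.parse import quote

# Simplified illustration of RFC 6570 expansion. It only handles
# single-variable expressions with no operator, "/" or "#".
_EXPRESSION = re.compile(r"\{([/#]?)([A-Za-z0-9_]+)\}")


def expand_sketch(template, context):
    """Hypothetical helper: expand {var}, {/var} and {#var} expressions."""

    def _substitute(match):
        operator, name = match.group(1), match.group(2)
        value = context.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        # Fragment expansion keeps reserved characters; the others encode them.
        safe = "/:?#[]@!$&'()*+,;=" if operator == "#" else ""
        return operator + quote(str(value), safe=safe)

    return _EXPRESSION.sub(_substitute, template)


result = expand_sketch("https://ex.org{/type}{#id}", {"id": "01472", "type": "boat"})
print(f"<{result}> a ex:boat .")
```

Note how values are percent-encoded where needed, which is exactly what plain string concatenation tends to forget.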

> see [example-01](./examples.md#01-basic.ttl), and basically every other one in fact, because this is just the way to go!

## regex-replacements:

For basic string reformatting we provide the `regexreplace(regexmatch, replacement, original)` function.

So that e.g.
```jinja
{{regexreplace('^[^:]*:', '', 'all-after-colon:is-kept')}}
```

Will simply throw away all text before and including the first colon.
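
For readers who want to try the pattern outside a template, the same replacement can be reproduced with Python's standard `re.sub`. This assumes the template function behaves like a regular-expression substitution with the same argument order (pattern, replacement, original); that equivalence is an assumption, not something taken from the pySUbyT source.

```python
import re

original = "all-after-colon:is-kept"

# '^[^:]*:' matches everything from the start up to and including the first colon.
result = re.sub(r"^[^:]*:", "", original)

print(result)  # is-kept
```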

> see [example-01](./examples.md#01-basic.ttl)
## turtle formatting:

We consider `text/turtle` to be, at present, the triple format that best balances ease of reading and writing, at least for humans. More machine-friendly RDF formats can easily be generated from it. Following that thought, we assume templates will be easiest to read and write when targeting that output format.

While the underlying Jinja engine can still generate any text-based output, we have made a special effort to facilitate typical turtle-based string formatting.

The provided jinja filter that deals with this is called `| ttl(type, quote_char)` and uses a type-argument to indicate the intended effect.

The `quote_char` to be used can be either `'` or `"` and is optional (defaulting to `'`). Any escaping of the nested content is applied automatically. Additionally, the triple-quote variant is applied when the content is spread over multiple lines (an often overlooked [nuance of the turtle spec](https://www.w3.org/TR/turtle/#h4_turtle-literals)).

Supported types are:

| type-string | effect |
|-------------|------------|
| @langcode | adds @language-code suffix for translated strings, e.g. "Dit is nederlands"@nl
| xsd:string | adds `^^xsd:string`
| xsd:integer | verifies the content is a valid int (or a string that can be unambiguously converted to one) and adds `^^xsd:integer`
| xsd:double | verifies the content is a valid double (or a string that can be unambiguously converted to one) and adds `^^xsd:double`
| xsd:date | verifies the content is a valid date (or a string that can be unambiguously converted to one), formats using ISO 8601 and adds `^^xsd:date`
| xsd:datetime | verifies the content is a valid datetime (or a string that can be unambiguously converted to one), formats using ISO 8601 and adds `^^xsd:datetime`
| xsd:boolean | converts any of `'', 'no', 'false', 'off', '0', 0, False` to `false` and everything else to `true`, adds the `^^xsd:boolean`
| xsd:anyURI | adds `^^xsd:anyURI`
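
The rules above (quote escaping, triple quotes for multi-line content, and the type suffix) can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the filter shipped with pySUbyT: the name `ttl_literal_sketch` is hypothetical, only a few of the listed types are covered, and the exact escaping the real filter applies may differ.

```python
def ttl_literal_sketch(value, type_name="xsd:string", quote_char="'"):
    """Hypothetical stand-in for the described `ttl` filter (not pySUbyT's code)."""
    if type_name == "xsd:boolean":
        # The listed values map to false, everything else to true.
        text = "false" if value in ("", "no", "false", "off", "0", 0, False) else "true"
    elif type_name == "xsd:integer":
        text = str(int(value))  # raises ValueError for strings that are not integers
    else:
        text = str(value)
    # Escape backslashes first, then the chosen quote character.
    text = text.replace("\\", "\\\\").replace(quote_char, "\\" + quote_char)
    # Multi-line content needs the triple-quoted (long string) form in turtle.
    quotes = quote_char * 3 if "\n" in text else quote_char
    # Language tags are appended directly, datatypes after '^^'.
    suffix = type_name if type_name.startswith("@") else "^^" + type_name
    return f"{quotes}{text}{quotes}{suffix}"


print(ttl_literal_sketch("Dit is nederlands", "@nl", '"'))
print(ttl_literal_sketch("off", "xsd:boolean"))
print(ttl_literal_sketch("line one\nline two"))
```

In a template, the equivalent would be applied as a filter on an expression, with the type string as its first argument.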

> see [example-01](./examples.md#01-basic.ttl)
## file input from various sources:

TODO describe how
* data-access of main set via `_`
* data-access of additional sets via `sets[«name»]`

### csv input:

TODO obvious but useful to mention that fieldnames come from the column names (a line with those is assumed)
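
As an illustration of the header-row convention, Python's standard `csv.DictReader` treats the first line as column names and yields one dict per data row. Whether pySUbyT reads CSV files exactly this way is an assumption here; the sample data is made up for the example.

```python
import csv
import io

# Hypothetical input; the first line holds the column names.
raw = "id,type\n01472,boat\n01473,buoy\n"

# Each data row becomes a dict keyed by the column names.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["id"], row["type"])
```

In a template, those same keys would then be the field names available on the row object.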

> see [example-01](./examples.md#01-basic.ttl)
### xml input:

TODO mention xmladict + mixed content model support

> see [example-08](./examples.md#08-singlexml_no-it.ttl), [example-09](./examples.md#09-mixedxml_no-it.ttl)
### json input:

TODO give a simple example

> see [example-04](./examples.md#04-json-team_no-it.ttl), [example-06](./examples.md#06-singlejson_no-it.ttl)
### folder input:

TODO point out the mixed-format support

> see [example-07](./examples.md#07-folderinput_no-it.ttl)
## mapping:

TODO the why and how

> see [example-02](./examples.md#02-collection.ttl), and [example-02-no-it](./examples.md#02-collection_no-it.ttl)
## various execution mode settings:
see [client docu](./cli.md)

## template management features provided by Jinja2:
> see [example-03](./examples.md#03-demo-j2_no-it.ttl), and [example-05](./examples.md#05-jsonify_no-it.json).

Please check out the [Jinja documentation](https://jinja.palletsprojects.com/en/3.0.x/) as well.

## process control indicators

TODO explain {{ctrl.isFirst}} {{ctrl.isLast}} {{ctrl.index}}

> see [example-01](./examples.md#01-basic.ttl)
94 changes: 36 additions & 58 deletions docs/styleguide.md
@@ -1,49 +1,14 @@
# pySUbyT template documentation:
Here you can find template documentation related to PySUByT which consists of:
- [Supported features](#a-supported-features)
- [Style guide](#b-style-guide)
# Style Guide


## A. Supported features
Below is a list of features that are currently supported in PySUByT together with examples of how to use. If you would like to have features added, please check out the issues and [create a new one](https://github.com/vliz-be-opsci/pysubyt/issues/new) if it isn't listed.

- uritemplate-expansion:
see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
- regex-replacements:
see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
- turtle formatting:
see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)

- file input from various sources:

- csv: see [test-10](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/10-csv-experiment_no-it.ttl)
- xml: see [test-08](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/08-singlexml_no-it.ttl) and [test-09](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/09-mixedxml_no-it.ttl)
- json: see [test-04](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/04-json-team_no-it.ttl) and [test-06](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/06-singlejson_no-it.ttl)

- folder input:
see [test-07](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/07-folderinput_no-it.ttl)

- mapping:
see [test-02](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection.ttl) (and [test-02-no-it](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection_no-it.ttl))

- various mode settings: see [client docu](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli.md)

- template management features provided by Jinja2
see [test-03](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/03-demo-j2_no-it.ttl) and [test-05](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/05-jsonify_no-it.json), (please check out the [Jinja documentation](https://jinja.palletsprojects.com/en/3.0.x/) as well).

All examples of how PySUByT can be used, including all input data, can be found in [/pysubyt/tests/templates](https://github.com/vliz-be-opsci/pysubyt/tree/main/tests/templates).

___

## B. Style Guide
Here we provide a style guide for PySUByT templates in order to make your templates more FAIR. It provides general guidelines on how to structure template folders and how to style the templates themselves, together with some tips and tricks:
This style guide for PySUByT templates is provided to make your templates more FAIR. It provides general guidelines on how to structure template folders and how to style the templates themselves, together with some tips and tricks:

1. [folder structure](#1-folder-structure)
2. [template structure](#2-template-structure)

### 1. Folder structure
## 1. Folder structure
To improve readability within templates it is easiest to have 'include' and 'macro' folders in the same directory as your template. We suggest the following folder structure:

```
template/
    include/
        include_ex.ldt
@@ -55,21 +20,31 @@ To improve readability within templates it is easiest to have 'include' and 'mac
    template_1.ldt
    template_2.ldt
    ...
```

Depending on your own preference (and the syntax-highlighting support of your editor) you can apply the following file extensions to the template files:
* `*.ldt` (for linked-data-template), making them stand out and easily recognisable
* `*.ttl` (or any other targeted extension), which makes them naturally fit the format of the results they should produce
* `*.ttl.j2`, showing you recognise their Jinja2 origins
* anything else

All of the above will just work. If you work on these in a team, we strongly suggest agreeing on these kinds of details to avoid confusion, surprise, and worse: wasted time.


There are several (non-mutually exclusive) ways to construct and structure the include-, macro- and template-files:
1. following certain profiles or frameworks, such as DCAT-APs, INSPIRE, I-ADOPT, ...
2. following certain ontology models, when working with templates to generate triples, such as SSN/SOSA, PROV-O, ...
3. following recurring data structure(s), for example sensor observations, samplings, measured parameters with unit information, provenance information, ...
4. ...

Following one or more of these methods to construct your files will increase their reusability.

**Tip:** Use logical and meaningful names for your macro-, include- and template-files that indicate what they are about. For example, include the names of the frameworks or ontologies you follow in those filenames.

**Tip:** Follow standard practices within your data files as well (e.g. column names, description of column names, ...); this allows you to construct more reusable macros and templates.


### 2. Template structure
## 2. Template structure
PySUByT makes use of the **Jinja templating engine**, so for the basics of template design we refer to the excellent [Template Designer Documentation](https://jinja.palletsprojects.com/en/3.0.x/templates/). Please check it out!

In order to further improve the readability of templates we propose to follow this general structure of statements within your template:
@@ -79,52 +54,54 @@ In order to further improve the readability of templates we propose to follow th
3. Macro statements
4. Other template statements

**Header**
### Header
Each template should provide a comment header whose fields make the template self-describing. This header should include at least the following fields:
- **Name** - the name of the template,
- **Description** - a short description of the context,
- **Author** - name of the person who made the template,
- **Date** - the date on which the template was created/updated,
- **Inputs** - all input files used in the template; depending on the use case these consist of:
  - input-file,
  - sets-file(s),
  - include-file(s),
  - macro-file(s)

Depending on the context, additional fields can be added, such as:
- **Target vocabs** - listing the vocabs used in the template; this can also reference a file
- **Mode** - when settings diverge from the default
- ...

**Indentation**
### Indentation
{% Statements %} should be indented.
{{ Expressions }} should not have any indentation.
{# Comments #} can occur throughout the template.


**An example:**
### An example

- _Folder structure:_

```
template/
    include/
        prefixes.ldt
    macro/
        csvw_TableSchema.ldt
    ARMS_Samples_IJI_template_macro.ldt

```

- _Template structure:_

```jinja
{# Template 'ARMS_Samples_IJI_template_macrotest.ldt'
Description: 'Template to generate triples from ARMS Samples IJI data.'
Author: Laurian Van Maldeghem
Date: 01/03/2022
Target vocabs: (see prefixes.ldt)
Inputs:
- input-file: ARMS_Samples_IJI.csv
- set-file(s): ARMS_Samples_IJI_description.csv as tableSchema
- mode: (default)
#}
{%- include 'include/prefixes.ldt' -%}
@@ -141,3 +118,4 @@ Depending on the context, additional fields can be added such as:
{%- endif %}
{%- endfor -%}
.
```
