diff --git a/README.md b/README.md
index 6080daa..c261775 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,15 @@
 # py-SUbyT
-Py Semantic Uplifting by Template - A python module for Linked Data production (aka semantic uplifting) through Templating
+ A Python library for Semantic Uplifting by Templates.
-### Usage
-Please check out the Py-SUByt [user guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-usage.md) and [style guide](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli-style.md)!
+ An easy way (through Python) to produce Linked Data
+ (aka semantic uplifting)
+ from classic data files (CSV, XML, JSON) into triples (RDF, turtle)
+ through Jinja templating
+
+### Usage and further reading
+Please check out the Py-SUbyt documentation. Namely:
+- the [user guide](./docs/cli-usage.md)
+- the supported [features](./docs/features.md)
+- the [examples](./docs/examples.md)
+- and the [style guide](./docs/cli-style.md)!
diff --git a/docs/examples.md b/docs/examples.md
new file mode 100644
index 0000000..3aef291
--- /dev/null
+++ b/docs/examples.md
@@ -0,0 +1,85 @@
+# pySUbyT Examples
+
+The source code comes with a number of included tests found in the `./tests` folder. These are used to automatically ensure that changes to the code do not break any of the guaranteed features of the package.
+
+They also serve as a great way to learn about how you could be using pySUbyT for your own projects.
+
+The `./tests` folder largely contains:
+| subfolder path | holding | for
+|-----------------------|-------------------|----------
+| `./tests/in` | data input files | that are injected into the context so the templates can access them.
`data.csv` makes up the core input made available as `_`, while the various `data-name` sections provide auxiliary sets available as `sets['name']`
+| `./tests/templates` | template files | showing off the various features; their names also hold the modifiers to be applied when executing them
+| `./tests/out` | resulting output | matching the names of the template files, these contain the expected outcome of correctly executing the templates with the given inputs
+
+Running these tests yourself assumes:
+* a source-code checkout of the repository
+* a nicely set up virtualenv
+
+All tests are executed automatically by running `make test`, but you can run each example yourself through the provided CLI command.
+
+Below we describe each test-template and cross-reference the [features](./features.md) they are made to highlight.
+
+## 01-basic.ttl
+> see [tests/templates/01-basic.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
+
+``` bash
+$ pysubyt --templates tests/templates --input tests/in/data.csv --name 01-basic.ttl
+```
+
+This straightforward template just converts each row in the provided data.csv into a bunch of templated triples in `text/turtle` format.
+
+In doing so it uses a number of basic helpful techniques provided by pySUbyT:
+- [uritemplate-expansion](./features.md#uritemplate-expansion)
+- [regex-replacements](./features.md#regex-replacements)
+- [turtle formatting](./features.md#turtle-formatting)
+- [process control indicators](./features.md#process-control-indicators)
+
+
+
+TODO follow the above pattern and further describe the essence of all Examples
+
+## 02-collection_no-it.ttl
+> see [tests/templates/02-collection_no-it.ttl](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection_no-it.tt)
+
+``` bash
+```
+
+## 02-collection.ttl
+``` bash
+```
+
+## 03-demo-j2_no-it.ttl
+``` bash
+```
+
+## 04-json-team_no-it.ttl
+``` bash
+```
+
+## 05-jsonify_no-it.json
+``` bash
+```
+
+## 06-singlejson_no-it.ttl
+``` bash
+```
+
+## 07-folderinput_no-it.ttl
+``` bash
+```
+
+## 08-singlexml_no-it.ttl
+``` bash
+```
+
+## 09-mixedxml_no-it.ttl
+``` bash
+```
+
+## 10-csv-experiment_no-it.ttl
+``` bash
+```
+
+## 11-schemadriven.ttl
+``` bash
+```
diff --git a/docs/features.md b/docs/features.md
new file mode 100644
index 0000000..e7830c8
--- /dev/null
+++ b/docs/features.md
@@ -0,0 +1,120 @@
+# Supported features
+
+Below is a description of the main features that are currently supported in PySUByT together with links to [examples](./examples.md) of how to use them.
+
+If you would like to have features added, please check out the already reported issues and [create a new one](https://github.com/vliz-be-opsci/pysubyt/issues/new) if your specific idea isn't listed yet.
+
+## uritemplate-expansion:
+
+We recommend using the built-in function `uritexpand(uritemplate, context)` to produce valid URIs based on data variables available in a context object, rather than through cumbersome, non-standard, and often brittle string-concatenation techniques.
+
+This implementation follows the [URITemplates - RFC6570](https://www.rfc-editor.org/rfc/rfc6570) standard.
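
As a rough illustration of what RFC 6570 expansion does (this is not pySUbyT's `uritexpand` implementation), the Python sketch below handles only single-variable `{var}`, `{/var}` and `{#var}` expressions with the standard library. The name `expand_subset` is hypothetical and introduced here for illustration only; a full implementation covers many more operators and modifiers.

```python
import re
from urllib.parse import quote

# Characters that stay unescaped (hypothetical helper constants).
UNRESERVED = "-._~"
RESERVED = ":/?#[]@!$&'()*+,;="


def expand_subset(template: str, context: dict) -> str:
    """Expand {var}, {/var} and {#var} expressions (one variable each).

    Assumption: only this small subset of RFC 6570 is supported.
    """
    def substitute(match: re.Match) -> str:
        operator, name = match.group(1), match.group(2)
        value = context.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        if operator == "#":
            # fragment expansion keeps reserved characters as they are
            return "#" + quote(str(value), safe=UNRESERVED + RESERVED)
        # simple and path-segment expansion escape everything but unreserved
        return operator + quote(str(value), safe=UNRESERVED)

    return re.sub(r"\{([/#]?)(\w+)\}", substitute, template)


print(expand_subset("https://ex.org{/type}{#id}", {"id": "01472", "type": "boat"}))
# → https://ex.org/boat#01472
```

Because each operator adds its own prefix and escaping, the template stays readable while the resulting URI remains valid even when a value contains spaces or other special characters.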
+
+Assuming a context object `_` that looks like:
+```json
+{
+    "id": "01472",
+    "type": "boat"
+}
+```
+
+Then a basic template containing:
+```jinja
+<{{uritexpand("https://ex.org{/type}{#id}",_)}}> a ex:{{_.type}} .
+```
+
+Will produce:
+```turtle
+<https://ex.org/boat#01472> a ex:boat .
+```
+
+> see [example-01](./examples.md#01-basic.ttl), and basically every other one in fact, because this is just the way to go!
+
+
+## regex-replacements:
+
+For basic string-reformatting we provide the `regexreplace(regexmatch, replacement, original)` function.
+
+So that e.g.
+```jinja
+{{regexreplace('^[^:]*:', '', 'all-after-colon:is-kept')}}
+```
+
+Will simply throw away all text before and including the first colon.
+
+> see [example-01](./examples.md#01-basic.ttl)
+
+## turtle formatting:
+
+We consider `text/turtle` to be, at present, the triple format that best balances ease of reading and writing, by humans at least. More machine-friendly RDF formats can easily be generated from it. Following that thought, we assume templates will be easiest to read and write when targeting that output format.
+
+While the underlying Jinja engine still allows generating any text-based output, we have currently made a special effort to facilitate typical turtle-based string formatting.
+
+The provided Jinja filter that deals with this is called `| ttl(type, quote_char)` and uses a type-argument to indicate the intended effect.
+
+The quote_char to be used can be either `'` or `"` and is optional (defaulting to `'`). Any escaping of the nested content is applied automatically. Additionally, the triple-quote variant is applied when the content is spread over multiple lines (an often overlooked [nuance of the turtle spec](https://www.w3.org/TR/turtle/#h4_turtle-literals)).
+
+Supported types are:
+
+| type-string | effect |
+|-------------|------------|
+| @langcode | adds @language-code suffix for translated strings, e.g.
"Dit is nederlands"@nl +| xsd:string | adds `^^xsd:string` +| xsd:integer | verifies the content is a valid int (or a string that can be unambiguously converted to one) and adds `^^xsd:integer` +| xsd:double | verifies the content is a valid double (or a string that can be unambiguously converted to one) and adds `^^xsd:double` +| xsd:date | verifies the content is a valid date (or a string that can be unambiguously converted to one), formats using ISO_8601 and adds `^^xsd:date` +| xsd:datetime | verifies the content is a valid date (or a string that can be unambiguously converted to one), formats using ISO_8601 and adds `^^xsd:datetime` +| xsd:boolean | converts any of `'', 'no, 'false', 'off', '0', 0, False` to `false` and everything else to `true`, adds the `^^xsd:boolean` +| xsd:anyURI | adds `^^xsd:anyURI` + +> see [example-01](./examples.md#01-basic.ttl) + +## file input from various sources: + +TODO describe how +* data-access of main set via `_` +* data-access of additional sets via `sets[«name»]` + +### csv input: + +TODO obvious but useful to mention that fieldnames come from the column names (a line with those is assumed) + +> see [example-01](./examples.md#01-basic.ttl) + +### xml input: + +TODO mention xmladict + mixed content model support + +> see [example-08](./examples.md#08-singlexml_no-it.ttl), [example-09](./examples.md#09-mixedxml_no-it.ttl) + +### json input: + +TODO give a simple exmaple + +> see [example-04](./example.md#04-json-team_no-it.ttl), [example-06](./examples.md#06-singlejson_no-it.ttl) + +### folder input: + +TODO point out the mixed-format support + +> see [example-07](./examples.md#07-folderinput_no-it.ttl) + +## mapping: + +TODO the why and how + +> see [example-02](./examples.md#02-collection.ttl), and [example-02-no-it](./examples.md#02-collection_no-it.ttl)) + +## various execution mode settings: +see [client docu](./cli.md) + +## template management features provided by Jinja2: +> see 
[example-03](./examples.md#03-demo-j2_no-it.ttl), and [example-05](./examples.md#05-jsonify_no-it.json)
+
+Please check out the [Jinja documentation](https://jinja.palletsprojects.com/en/3.0.x/) as well.
+
+## process control indicators
+
+TODO explain {{ctrl.isFirst}} {{ctrl.isLast}} {{ctrl.index}}
+
+> see [example-01](./examples.md#01-basic.ttl)
diff --git a/docs/styleguide.md b/docs/styleguide.md
index 9242cdb..93e0e9b 100644
--- a/docs/styleguide.md
+++ b/docs/styleguide.md
@@ -1,49 +1,14 @@
-# pySUbyT template documentation:
-Here you can find template documentation related to PySUByT which consists of:
-- [Supported features](#a-supported-features)
-- [Style guide](#b-style-guide)
+# Style Guide
-
-## A. Supported features
-Below is a list of features that are currently supported in PySUByT together with examples of how to use. If you would like to have features added, please check out the issues and [create a new one](https://github.com/vliz-be-opsci/pysubyt/issues/new) if it isn't listed.
-
-- uritemplate-expansion:
-see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
-- regex-replacements:
-see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
-- turtle formatting:
-see [test-01](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/01-basic.ttl)
-
-- file input from various sources:
-
-  - csv: see [test-10](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/10-csv-experiment_no-it.ttl)
-  - xml: see [test-08](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/08-singlexml_no-it.ttl) and [test-09](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/09-mixedxml_no-it.ttl)
-  - json: see [test-04](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/04-json-team_no-it.ttl) and [test-06](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/06-singlejson_no-it.ttl)
-
-- folder input:
-see [test-07](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/07-folderinput_no-it.ttl)
-
-- mapping:
-see [test-02](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection.ttl) (and [test-02-no-it](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/02-collection_no-it.ttl))
-
-- various mode settings: see [client docu](https://github.com/vliz-be-opsci/pysubyt/blob/main/docs/cli.md)
-
-- template management features provided by Jinja2
-see [test-03](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/03-demo-j2_no-it.ttl) and [test-05](https://github.com/vliz-be-opsci/pysubyt/blob/main/tests/templates/05-jsonify_no-it.json), (please check out the [Jinja documentation](https://jinja.palletsprojects.com/en/3.0.x/) as well).
-
-All examples of how to PySUByT can be used, including all input data, can be found in [/pysubyt/tests/templates](https://github.com/vliz-be-opsci/pysubyt/tree/main/tests/templates).
-
-___
-
-## B.
Style Guide
-Here we provide a style guide for PySUByT templates in order to make your templates more FAIR. It provides general guidelines on how to structure template folders and how to style the templates themselves, together with some tips and tricks:
+This style guide for PySUByT templates is provided to make your templates more FAIR. It provides general guidelines on how to structure template folders and how to style the templates themselves, together with some tips and tricks:
 1. [folder structure](#1-folder-structure)
 2. [template structure](#2-template-structure)
-### 1. Folder structure
+## 1. Folder structure
 To improve readability within templates it is easiest to have 'include' and 'macro' folders in the same directory as your template. We suggest the following folder structure:
+```
     template/
       include/
          include_ex.ldt
@@ -55,21 +20,31 @@ To improve readability within templates it is easiest to have 'include' and 'mac
       template_1.ldt
       template_2.ldt
       ...
+```
+
+Depending on your own preference (and the syntax-highlighting support of your editor) you can apply the following file-extensions to the template files:
+ * `*.ldt` (for linked-data-template) making them stand out and easily recognisable
+ * `*.ttl` (or any other targeted extension) which makes them naturally fit the format of the results they should produce
+ * `*.ttl.j2` showing you recognise their Jinja2 origins
+ * anything else
+
+ All of the above will just work. If you work on these in a team, we strongly suggest agreeing on these kinds of details to avoid confusion, surprise, and worse: wasted time.
+
 There are several (non-mutually exclusive) ways to construct and structure the include-, macro- and template-files:
 1. following certain profiles or frameworks, such as DCAT-APs, INSPIRE, I-ADOPT, ...
- 2. following certain ontology models, when working with template to generate triples, such as SSN/SOSA, PROV-O, ....
- 3.
following recurring data structure(s), for example sensor observation, samplings, measured parameters with unit information, provenance information, ...
+ 2. following certain ontology models, when working with templates to generate triples, such as SSN/SOSA, PROV-O, ....
+ 3. following recurring data structure(s), for example sensor observation, samplings, measured parameters with unit information, provenance information, ...
 4. ...
- Following one or more of these methods to contruct your files will increase their re-useability.
+ Following one or more of these methods to construct your files will increase their reusability.
 **Tip:** Use logical and meaningful names for your marco-, include and template-files that indicate what they're about. So include names of for example followed frameworks, ontologies, etc. in those filenames.
 **Tip:** Follow standard practices within your data files as well (e.g. column names, description of column names, ...); this allows to construct more re-useable macro's/templates.
-### 2. Template structure
+## 2. Template structure
 PySUByT makes use of the **Jinja templating engine**, so for basic template design of templates themselves we would like to refer to the excellent [Template Designer Documentation](https://jinja.palletsprojects.com/en/3.0.x/templates/). Please check it out!
 In order to further improve the readability of templates we propose to follow this general structure of statements within your template:
@@ -79,11 +54,11 @@ In order to further improve the readability of templates we propose to follow th
 3. Macro statements
 4.
Other template statements
-**Header**
-Each template should provide a comment header that includes a set of fields in order to make the templates self-describing.This header should include a minimal set of fields:
-- **Name** - the name of the template,
+### Header
+Each template should provide a comment header that includes a set of fields in order to make the templates self-describing. This header should include a minimal set of fields:
+- **Name** - the name of the template,
 - **Description** - a short description of the context,
-- **Author** - name of person that made the template,
+- **Author** - name of person that made the template,
 - **Date** - the date on which the template was created/updated,
 - **Inputs** - All input files used in the template; depending on the use-case these consist of:
   - input-file,
@@ -91,40 +66,42 @@ Each template should provide a comment header that includes a set of fields in o
   - include-file(s),
   - macro-file(s)
-Depending on the context, additional fields can be added such as:
+Depending on the context, additional fields can be added such as:
 - **Target vocabs** - listing the vocabs used in the template; this can also reference a file
 - **Mode** - when settings diverge from the default
 - ...
-**Identation**
+### Indentation
 {% Statements %} should be indented.
 {{ Expressions }} should not have any indentation.
-{# Comments #} can occur through the template.
+{# Comments #} can occur throughout the template.
-**An example:**
+### An example
 - _Folder structure:_
-
+```
     template/
       include/
          prefixes.ldt
       macro/
          csvw_TableSchema.ldt
       ARMS_Samples_IJI_template_macro.ldt
-
+```
+
 - _Template structure:_
-  ```jinja
-  {# Template 'ARMS_Samples_IJI_template_macrotest.ldt'
-  Description: 'Template to generate triples from ARMS Samples IJI data.'
+
+```jinja
+ {# Template 'ARMS_Samples_IJI_template_macrotest.ldt'
+ Description: 'Template to generate triples from ARMS Samples IJI data.'
   Author: Laurian Van Maldeghem
   Date: 01/03/2022
   Target vocabs: (see prefixes.ldt)
-  Inputs:
+  Inputs:
     - input-file: ARMS_Samples_IJI.csv
     - set-file(s): ARMS_Samples_IJI_description.csv as tableSchema
     - mode: (default)
-  #}
+  #}
   {%- include 'include/prefixes.ldt' -%}
@@ -141,3 +118,4 @@ Depending on the context, additional fields can be added such as:
   {%- endif %}
   {%- endfor -%}
   .
+```