Feature/130 integration queries #138

Merged
merged 21 commits into from
Sep 7, 2022
SvenLieber
Collaborator

This feature replaces the implementation of the data integration based on already serialized SPARQL queries with an implementation based on dynamically generated SPARQL queries.

This will ease the integration of new data sources: instead of manually creating several new SPARQL queries (or copying and adapting existing ones), which is error-prone, only a lightweight CSV configuration needs to be provided, and the queries to integrate a data source are created automatically on demand.
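To illustrate the idea of configuration-driven query generation, here is a minimal sketch under stated assumptions. The CSV columns (`source`, `source_property`, `target_property`), the helper names `load_config` and `build_queries`, and all `example.org` URIs are hypothetical; the actual configuration format and query builder in this PR are not shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical configuration: one row per property that should be copied
# from a data source's named graph into the integrated graph.
EXAMPLE_CSV = """source,source_property,target_property
source_a,http://example.org/a/title,http://schema.org/name
source_a,http://example.org/a/author,http://schema.org/author
source_b,http://example.org/b/label,http://schema.org/name
"""


def load_config(csv_text):
    """Group (source_property, target_property) pairs by data source."""
    mappings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        mappings[row["source"]].append((row["source_property"], row["target_property"]))
    return dict(mappings)


def build_queries(pairs, source_graph, target_graph):
    """Generate one SPARQL UPDATE per configured property mapping, on demand."""
    queries = []
    for source_prop, target_prop in pairs:
        queries.append(
            f"INSERT {{ GRAPH <{target_graph}> {{ ?s <{target_prop}> ?o }} }}\n"
            f"WHERE {{ GRAPH <{source_graph}> {{ ?s <{source_prop}> ?o }} }}"
        )
    return queries


config = load_config(EXAMPLE_CSV)
queries = build_queries(
    config["source_a"],
    "http://example.org/graph/source_a",
    "http://example.org/graph/integrated",
)
print(len(queries))  # 2
print(queries[0])
```

Adding a new source in this sketch means adding CSV rows rather than writing new query files, which is the maintenance benefit described above.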

  • Test cases for the data integration were added, and both the old and the new implementation pass all tests.
  • Based on statistics and a comparison of corpus versions, we verified that the new implementation produces the same results for the integration of book data.
    With the new implementation, more contributors are listed in the corpus Excel file. This is because the old implementation apparently could not fetch all contributor label properties properly: many translations contained contributors with "missing name", which resulted in fewer entries in the actual person contributors list.
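A comparison of two corpus versions like the one described above could be scripted roughly as follows. This is a hypothetical sketch: the row structure (dicts with `translation_id` and `contributor_name`), the helper names, and the toy data are assumptions for illustration; the real corpus is exported to an Excel file whose columns are not shown in this PR.

```python
MISSING = "missing name"


def corpus_stats(rows):
    """Summarize one corpus version given as (hypothetical) row dicts."""
    translations = {row["translation_id"] for row in rows}
    named = {row["contributor_name"] for row in rows if row["contributor_name"] != MISSING}
    missing = sum(1 for row in rows if row["contributor_name"] == MISSING)
    return {"translations": len(translations), "named_contributors": len(named), "missing_name": missing}


def same_translations(old_rows, new_rows):
    """True if both corpus versions contain exactly the same translation IDs."""
    return {r["translation_id"] for r in old_rows} == {r["translation_id"] for r in new_rows}


# Toy data (not real corpus data): the old version lost one contributor label.
old_rows = [
    {"translation_id": "t1", "contributor_name": "Author A"},
    {"translation_id": "t2", "contributor_name": MISSING},
]
new_rows = [
    {"translation_id": "t1", "contributor_name": "Author A"},
    {"translation_id": "t2", "contributor_name": "Translator B"},
]

print(same_translations(old_rows, new_rows))  # True
print(corpus_stats(old_rows))  # {'translations': 2, 'named_contributors': 1, 'missing_name': 1}
print(corpus_stats(new_rows))  # {'translations': 2, 'named_contributors': 2, 'missing_name': 0}
```

Checking that the translation sets are identical, while the "missing name" count drops and the named-contributor count rises, mirrors the kind of evidence reported for this PR.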

We noticed that the test coverage is not yet satisfactory; this will be tackled in a later issue: #137

…plementations and renamed integration queries directory

due to renaming, the query_builder is now importable
…sed data integration

This implementation uses a query builder to generate SPARQL UPDATE queries on the fly based on a configuration, instead of reading already serialized SPARQL queries from disk.
The only things still missing are the handling of each data source's own record ID via dcterms:identifier and adding the binding using the UUID function.
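The two pieces noted as missing, linking each record to its data-source-specific ID via `dcterms:identifier` and minting a new resource with SPARQL's `UUID()` function, could be expressed roughly as below. This is a hypothetical sketch: the function name, its parameters, the graph layout, and the `FILTER NOT EXISTS` de-duplication step are assumptions, not the PR's query builder.

```python
DCTERMS = "http://purl.org/dc/terms/"


def build_record_creation_query(source_graph, target_graph, record_class):
    """Return a SPARQL UPDATE that mints one UUID-based resource per source record.

    Hypothetical sketch: each source record is assumed to carry exactly one
    dcterms:identifier, so the WHERE clause yields one solution (and thus one
    fresh UUID() IRI) per record. Identifiers already present in the target
    graph are skipped, so re-running the update does not create duplicates.
    """
    return f"""PREFIX dcterms: <{DCTERMS}>
INSERT {{
  GRAPH <{target_graph}> {{
    ?newRecord a <{record_class}> ;
               dcterms:identifier ?localId .
  }}
}}
WHERE {{
  GRAPH <{source_graph}> {{
    ?record a <{record_class}> ;
            dcterms:identifier ?localId .
  }}
  FILTER NOT EXISTS {{
    GRAPH <{target_graph}> {{ ?existing dcterms:identifier ?localId . }}
  }}
  BIND(UUID() AS ?newRecord)
}}"""


query = build_record_creation_query(
    "http://example.org/graph/source_a",
    "http://example.org/graph/integrated",
    "http://schema.org/CreativeWork",
)
print(query)
```

In SPARQL 1.1, `UUID()` returns a fresh IRI for each solution; if a string-based URI under a project namespace were preferred instead, `IRI(CONCAT(..., STRUUID()))` would be the usual alternative.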
…n of records

Syntax errors in the generated SPARQL queries were fixed; however, the queries still don't seem to work, which requires further investigation.
…QL queries works

The number of translations remained the same. There are more contributors, but this is because we now have fewer 'missing name' contributors (20 instead of 280).