Feature/130 integration queries #138

SvenLieber · 2022-09-07T07:51:25Z

This feature replaces the implementation of the data integration based on already serialized SPARQL queries with an implementation based on dynamically generated SPARQL queries.

This eases the integration of new data sources: instead of manually writing several new SPARQL queries (or copying and adapting existing ones), which is error-prone, only a lightweight CSV configuration is needed, and the queries to integrate a data source are generated automatically on demand.
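To illustrate the idea, here is a minimal sketch of generating one SPARQL UPDATE query per data source from a CSV configuration. It is not the code of this PR: the column names (`sourceName`, `sourceGraph`), the graph and resource URIs, the query shape, and the function names `load_sources` and `build_create_query` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical configuration; column names and graph URIs are illustrative only.
CONFIG_CSV = """sourceName,sourceGraph
KBR,http://example.org/graph/kbr
BnF,http://example.org/graph/bnf
"""

# Assumed name of the graph that holds the integrated records.
TARGET_GRAPH = "http://example.org/graph/integrated"

# Literal SPARQL braces are doubled so that str.format leaves them in place.
QUERY_TEMPLATE = """PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>

INSERT {{
  GRAPH <{target_graph}> {{
    ?integrated a schema:Person ;
                dcterms:identifier ?uuid ;
                schema:sameAs ?local .
  }}
}}
WHERE {{
  GRAPH <{source_graph}> {{
    ?local a schema:Person .
  }}
  BIND(STRUUID() AS ?uuid)
  BIND(IRI(CONCAT("http://example.org/person/", ?uuid)) AS ?integrated)
}}
"""


def load_sources(config_text):
    """Parse the CSV configuration into one dict per data source."""
    return list(csv.DictReader(io.StringIO(config_text)))


def build_create_query(source, target_graph=TARGET_GRAPH):
    """Fill the template with the named graphs of a single data source."""
    return QUERY_TEMPLATE.format(
        target_graph=target_graph,
        source_graph=source["sourceGraph"],
    )


# One generated query per configured source, keyed by source name.
queries = {s["sourceName"]: build_create_query(s) for s in load_sources(CONFIG_CSV)}
```

Under these assumptions, adding a data source means adding one CSV row rather than copying and editing a query file.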

Test cases for the data integration were added, and both the old and the new implementation pass all of them.
Based on statistics and a comparison of corpus versions, we verified that the new implementation produces the same results for the integration of book data.
With the new implementation, more contributors are listed in the corpus Excel file. This is because the old implementation apparently could not fetch all contributor label properties properly: many translations contained contributors with "missing name", which resulted in fewer entries in the list of actual person contributors.
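Running the same test cases against both implementations can be arranged with a shared base class, roughly as sketched below. This is a hypothetical illustration, not the test code of this PR: the class names, the sample records, and the in-memory stand-in `integrate_by_isni` are assumptions, and a real test would load data into a triple store and execute the SPARQL UPDATE queries.

```python
import unittest


def integrate_by_isni(records):
    """In-memory stand-in for the integration: group local IDs by ISNI."""
    merged = {}
    for _source, local_id, isni in records:
        merged.setdefault(isni, []).append(local_id)
    return merged


class ContributorIntegrationCases:
    """Shared test cases; each subclass supplies its own `integrate`."""

    # Hypothetical test records: (data source, local ID, ISNI).
    RECORDS = [
        ("KBR", "k1", "0000000000000001"),
        ("BnF", "b1", "0000000000000001"),
        ("KBR", "k2", "0000000000000002"),
    ]

    def integrate(self, records):
        raise NotImplementedError

    def test_same_isni_is_merged(self):
        result = self.integrate(self.RECORDS)
        self.assertEqual(len(result), 2)

    def test_every_local_id_is_kept(self):
        result = self.integrate(self.RECORDS)
        local_ids = sorted(i for ids in result.values() for i in ids)
        self.assertEqual(local_ids, ["b1", "k1", "k2"])


class TestFileBasedQueries(ContributorIntegrationCases, unittest.TestCase):
    """Placeholder for the implementation that reads queries from disk."""

    def integrate(self, records):
        return integrate_by_isni(records)


class TestGeneratedQueries(ContributorIntegrationCases, unittest.TestCase):
    """Placeholder for the implementation that generates queries on the fly."""

    def integrate(self, records):
        return integrate_by_isni(records)
```

Because the shared cases live in a plain mixin rather than a `unittest.TestCase`, the test runner collects them only through the concrete subclasses, so a further implementation needs just one more subclass.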

We noticed that the test coverage is not yet satisfactory, but this can be addressed in a later issue: #137

SvenLieber added 21 commits August 22, 2022 10:31

added overview of resources for different test cases

4298838

added a description of needed testdata for our SPARQL-based data integration #130

ad3ce6d

first draft of manifestation test data

344b152

added contributor test data for data integration test

50f16d4

added script to create RDF out of CSV test data

50a60e0

first succeeding test cases to verify that the contributor integration works

31a300a

added integration test for integration of manifestations via ISBN10/ISBN13

282b630

first draft of script to generate the contributor update queries #130

79cca50

contributor-integration integration-tests abstracted for different implementations and renamed integration queries directory. Due to renaming, the query_builder is now importable

5f28acc

first draft of a different implementation to execute SPARQL UPDATE-based data integration. This implementation uses a query builder to generate SPARQL UPDATE queries on the fly based on a configuration instead of reading already serialized SPARQL queries from disk

001e18f

The query builder for Contributor creation is now almost complete. The only things missing are the handling of the respective own data source ID via dcterms:identifier and adding the binding using the UUID function

29d971c

new integration implementation also generates queries for the creation of records. Syntax errors in the generated SPARQL queries were fixed; the queries still do not seem to work, which requires further investigation

e09a924

Successfully tested new contributor data integration implementation #130

74730b9

Successfully generating creation queries for organizations #130

1c1fca4

first draft of query generation for manifestations

bd98832

successfully tested query generation for manifestation integration #130

a506044

changed inheritance to ease later addition of tests for other implementations

724e08f

fix: upload a file by name

6b66729

fixed issues and use SPARQL query generation in the integration script #130

0cf0da9

changed order of contributor integration: after ISNI first the high quality library authorities

fcf35ea

New statistics to verify that the integration based on generated SPARQL queries works. The number of translations remained the same; there are more contributors, but this is because we have fewer 'missing name' contributors (20 instead of 280)

9e77689

SvenLieber merged commit 5f34c73 into main Sep 7, 2022

SvenLieber mentioned this pull request Sep 7, 2022

Update data integration SPARQL queries to be configurable instead of containing hard coded named graphs #130

Closed
