Add forge methods to access/query datasets from external sources #367
Draft
crisely09 wants to merge 50 commits into master from dev-datasets
Conversation
* load local configuration when configuration tests * keep commons imports together * fix duplicate keyword argument issue * use context path from default config in test_to_resource test * rm extra store_config.pop * refactor store config
* pass view when sparql, elastic call, todo search * rm unused constants from store * turn view to endpoint * endpoint to view param * rename param * rename param2 * keyword param from forge call * missing underscore * git status * make endpoint refac * edit querying notebook to showcase feature, todo set up better view * refac notebook edit * change view creation mapping * check filters not provided as keyword arg * fix querying notebook, retrieve using resource url * test make endpoint function * use *filters for the store interface and implementations
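A minimal sketch of what these commits describe: filters are passed positionally (`*filters`) to the store, and SPARQL/Elastic calls can target a specific view. The configuration file, filter values, and the `view` keyword/value are assumptions taken from the commit messages, so the final names may differ.

```python
from kgforge.core import KnowledgeGraphForge

# Assumes a local "config.yml" pointing at a forge configuration; adjust to your setup.
forge = KnowledgeGraphForge("config.yml")

# Filters are forwarded positionally (*filters) to the store implementation.
results = forge.search({"type": "Dataset"}, limit=10)

# Per this PR, SPARQL (and Elastic) calls can be directed at a specific view;
# the view value below is purely illustrative.
rows = forge.sparql(
    "SELECT ?id WHERE { ?id a <https://neuroshapes.org/Dataset> } LIMIT 5",
    view="datasets_view",  # assumed parameter value, per the commits above
)
```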
* added timeouts to every requests call * centralise default request timeout * rm import * use constant in file get
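The timeout change amounts to passing a shared constant to every `requests` call; a sketch, with an assumed constant name and value:

```python
import requests

# The commits above centralise a default timeout for every requests call;
# the constant name and value here are illustrative.
DEFAULT_REQUEST_TIMEOUT = 300  # seconds (assumed value)

response = requests.get(
    "https://example.org/resource",  # placeholder URL
    timeout=DEFAULT_REQUEST_TIMEOUT,  # prevents the call from hanging indefinitely
)
response.raise_for_status()
```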
…ething it raises an error
* change signatures to allow for boolean change_schema on update * change_schema implemented for single resource update * refactor batch actions * lint * began batch update schema * change schema many * progress * rm / join * rm useless change * fix * change schema to update schema * update instead of change * example notebook for schema method, todo update * notebook example with update * lint * improve notebook * fix one test * keep unconstrained schema only for update endpoint else _ * same url building in one and many * add timeout * schema id optional in update * rename local parse * rename keep_unconstrained to use_unconstrained_id * rm extra docstring param
* change signatures to allow for boolean change_schema on update * change_schema implemented for single resource update * refactor batch actions * lint * began batch update schema * change schema many * progress * rm / join * rm useless change * fix * change schema to update schema * update instead of change * example notebook for schema method, todo update * notebook example with update * lint * improve notebook * fix one test * keep unconstrained schema only for update endpoint else _ * same url building in one and many * add timeout * schema id optional in update * rename local parse * rm second request for metadata * add query param annotate only if retrieve source is true * retrieval error if issue creating resource from response * rm cross bucket check with source * add todo * separate metadata call if cross bucket and source * refac * fixes and notebook update * check deployment endpoint self, may need to be checking multiple values * revert to self.endpoint * updated notebook to show retrieval * fix response name * add query param annotate only if retrieve source is true * separate metadata call if cross bucket and source * rename keep_unconstrained to use_unconstrained_id * rm extra docstring * better comments * clarify comment * comment fix * improve markdown
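A sketch of the schema-aware update described in the two commit sequences above, assuming a configured forge instance; the `schema_id` keyword follows the commit messages and may differ in the final API.

```python
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # assumed local configuration

resource = forge.retrieve("https://example.org/ids/my-dataset")  # hypothetical id
resource.description = "Updated description"

# Per this PR, update can also move the resource to another (or an unconstrained) schema;
# the schema id below is illustrative.
forge.update(resource, schema_id="https://neuroshapes.org/dash/dataset")
```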
* re-do metadata fetch until endpoint is fixed * better notebook * rename variables * code style * fix replace in comments * update comments
…False (#382) * return resource as_json optionally when forge.elastic * as_resource instead of as_json, default True * skeleton to enable building resources from different values in the es response payload * example of forge.elastic as_resource = False in getting started Querying notebook
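A short sketch of the `as_resource` flag from #382, assuming a configured forge and an illustrative Elasticsearch query:

```python
import json
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # assumed local configuration

query = json.dumps({"query": {"term": {"@type": "Dataset"}}})  # placeholder query

resources = forge.elastic(query, limit=5)                    # default: Resource objects
raw_hits = forge.elastic(query, limit=5, as_resource=False)  # raw ES payload, per #382
```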
* Set jsonld context version to 1.1 * this enables non IRI-delimiting characters not present in rdflib.plugins.shared.jsonld.context.URI_GEN_DELIMS (e.g. '_' in "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_") to be used when defining a jsonld context prefix
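A minimal example of such a context, using the NCBITaxon prefix from the commit message; the other terms are illustrative.

```python
# With "@version": 1.1, a prefix whose IRI ends in a character that is not an
# IRI gen-delim (here "_") can still be declared and used for term expansion.
context = {
    "@context": {
        "@version": 1.1,
        "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
        "schema": "http://schema.org/",
    }
}
```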
Currently pySHACL throws a ReportableRuntimeError("Evaluation path too deep!\n{}".format(path_str)) exception when evaluating a shape if the length of the transitive closure of its sh:node property is greater than or equal to 30. Given a node shape, this PR addresses this by: * Fixing the pySHACL deep node shape path evaluation error: first recursively collect all the property shapes directly defined (through sh:property) by the node shape or inherited from its parents, then link those collected property shapes to the node shape through sh:property, and finally remove the node shape <-> parent shape relationships * Aligning the expected data model for a SHACL shape between the RDF StoreService and DirectoryService
* Fixed a couple of issues: * forge.prefixes() was raising the pandas.DataFrame error "If using all scalar values, you must pass an index" * fixed forge.types() to properly collect types from rdfservice.class_to_shape * fixed forge.template() when using an RDF model based on a store service: "Unable to generate template: 'tuple' object has no attribute 'traverse'"
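The three fixed calls, for reference; a configured forge with an RDF model is assumed, and "Dataset" is only an illustrative type.

```python
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # assumed configuration with an RDF model

forge.prefixes()           # was raising "If using all scalar values, you must pass an index"
forge.types()              # collects types from the model's class-to-shape mapping
forge.template("Dataset")  # was failing with the 'traverse' error for store-based models
```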
* Added support for inference when using the RdfModel: * support for importing ontologies from schemas using owl:imports * use forge.validate(resource, inference="inference_value", type_='AType') with inference_value as in https://github.com/RDFLib/pySHACL/blob/v0.25.0/pyshacl/validate.py#L81. inference_value="rdfs" seems to be enough to extend the resource with the transitive closures of type subClassOf and/or property subPropertyOf relations as per the RDFS entailment rules (https://www.w3.org/TR/rdf-mt/). * Validation now fails when a type not in the resource is provided as value of the type_ argument unless inference is enabled (with inference='rdfs' for example) and the resource type is a subClassOf of type_
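A sketch of the inference-enabled validation described above; the configuration and the "Entity" type are assumptions.

```python
from kgforge.core import KnowledgeGraphForge, Resource

forge = KnowledgeGraphForge("config.yml")  # assumed configuration using the RdfModel

resource = Resource(type="Entity", name="An entity")  # "Entity" is an illustrative type

# inference="rdfs" expands the resource with the RDFS entailment closures
# (subClassOf / subPropertyOf) before validating against the shape targeted by type_.
forge.validate(resource, type_="Entity", inference="rdfs")
print(resource._last_action)
```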
…cursive_resolve when resolving a str jsonld context (#402)
* Add alternateName to the agent resolver * Also added the property to the agent-to-resource mapping
Resolving with target "species" returns only species, and target "strain" returns only strains.
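A sketch of both resolver changes, assuming resolvers are configured; the scope names, targets, and query strings are illustrative and depend on your resolver configuration.

```python
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # assumed configuration with agent and ontology resolvers

# Agent resolution now also matches on alternateName.
agent = forge.resolve("J. Doe", scope="agent")  # hypothetical text and scope

# With the ontology resolver, the target narrows the results.
species = forge.resolve("Mus musculus", scope="ontology", target="species")  # only species
strain = forge.resolve("Wistar Han", scope="ontology", target="strain")      # only strains
```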
…able (#406) * Add method when initializing forge to export environment variable * Remove addition in setup.py and try os.environ instead of os.system
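The design choice here, in a small sketch with a hypothetical variable name: os.system("export ...") only affects a short-lived child shell, while os.environ sets the variable for the current Python process and any processes it spawns afterwards.

```python
import os

# os.environ modifies the environment of the running process (and of future children),
# unlike os.system("export VAR=value"), whose effect ends with the child shell.
os.environ["SOME_FORGE_VARIABLE"] = "value"  # hypothetical variable name

assert os.environ.get("SOME_FORGE_VARIABLE") == "value"
```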
…-file-content-length` to header (#403) * Remove nexus_sdk from nexus store when uploading files * Change content length header to be
* split file get call and property access * split prepare download one * lint --------- Co-authored-by: Leonardo Cristella <leonardo.cristella@epfl.ch>
* rm sdk usage from bluebrainexus store file * rm sdk usage from utils * rm sdk usage from service * rm nexussdk from test * lint * fix patch * change usage of project_fetch function for successful patching in tests * rename module of sdk methods * remove leftover image from store * missing file * restore config * remove s from file name * missing the file, again --------- Co-authored-by: Cristina E. González-Espinoza <crisbeth46@gmail.com>
* Added batched retrieval
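A sketch of batched retrieval; the exact call shape (a list of identifiers passed to retrieve) is an assumption based on the commit message, and the ids are hypothetical.

```python
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # assumed local configuration

ids = [
    "https://example.org/ids/resource-1",  # hypothetical identifiers
    "https://example.org/ids/resource-2",
]

# Retrieval is batched instead of issuing one call per identifier.
resources = forge.retrieve(ids)  # assumed call shape, per the commit above
```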
* Check for schema id to use schema endpoint * Add example in unit test * Add missing parameter inside _register_one. Add notebook example of schema handling * check if resource_id is given
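A sketch of schema-constrained registration as described above; the configuration and schema id are illustrative.

```python
from kgforge.core import KnowledgeGraphForge, Resource

forge = KnowledgeGraphForge("config.yml")  # assumed local configuration

dataset = Resource(type="Dataset", name="A dataset")

# When schema_id is provided, registration goes through the schema endpoint so the
# new resource is constrained by that schema; the schema id below is illustrative.
forge.register(dataset, schema_id="https://neuroshapes.org/dash/dataset")
```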
This is not complete; there are inconsistencies to be fixed, but I didn't manage to do it in time.
Basically, IT DOESN'T WORK YET.
I leave it open for now.