EVA-3564 - Simplify metadata conversion and validation #55
@@ -134,6 +134,10 @@ def clean_read(ifile):

```python
if line.startswith('Validation failed with following error(s):'):
    collect = True
else:
    while line and not line.startswith('/'):
        # Sometimes there are multiple (possibly redundant) errors listed under a single property,
        # we only report the first
        line = clean_read(open_file)
    line2 = clean_read(open_file)
    if line is None or line2 is None:
        break  # EOF
```

Review comment on lines +138 to +139:
Comment: We probably should report that to
Reply: Good point, I'll make an issue for them.
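The new `while` loop skips any extra error lines that biovalidator prints under one property path before the next (property, error) pair is read. A minimal self-contained sketch of the same idea, assuming the output format seen in the expected-output file of this PR (a line starting with `/` followed by one or more error lines). `first_error_per_property` is a hypothetical name and this `clean_read` is a stand-in, not the repository's implementation:

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def clean_read(ifile: TextIO) -> Optional[str]:
    # Hypothetical stand-in for the repository's clean_read:
    # return the next line without surrounding whitespace, or None at EOF.
    line = ifile.readline()
    return line.strip() if line else None


def first_error_per_property(ifile: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (property_path, first_error) pairs from biovalidator-style text.

    Assumes a property path line starts with '/' and is followed by one or
    more error lines; only the first error under each path is kept.
    """
    line = clean_read(ifile)
    while line is not None:
        if not line.startswith('/'):
            # Not a property path (e.g. a header or stray line): skip it
            line = clean_read(ifile)
            continue
        error = clean_read(ifile)
        if error is None:
            return  # EOF straight after a property path
        yield line, error
        # Skip further (possibly redundant) errors until the next property path
        line = clean_read(ifile)
        while line is not None and not line.startswith('/'):
            line = clean_read(ifile)


if __name__ == '__main__':
    sample = io.StringIO(
        "/sample/3/bioSampleObject/name\n"
        "must have required property 'name'\n"
        "must have required property 'name'\n"
        "/sample/3/bioSampleObject/characteristics\n"
        "must match a schema in anyOf\n"
    )
    for path, error in first_error_per_property(sample):
        print(path, '->', error)
```

Writing it as a generator keeps the "first error only" rule in one place, so the caller never sees the duplicated lines.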
@@ -164,6 +168,9 @@ def convert_metadata_attribute(sheet, json_attribute, xls2json_conf):

```python
attributes_dict = {}
attributes_dict.update(xls2json_conf[sheet].get('required', {}))
attributes_dict.update(xls2json_conf[sheet].get('optional', {}))
attributes_dict['Scientific Name'] = 'species'
attributes_dict['BioSample Name'] = 'name'

for attribute in attributes_dict:
    if attributes_dict[attribute] == json_attribute:
        return attribute
```
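The two added entries let the reverse lookup resolve `species` and `name`, which the sample error paths can report, back to spreadsheet columns. Because `convert_metadata_attribute` rescans the merged dict on every call, one alternative is to invert it once per sheet. A minimal sketch; `EXAMPLE_CONF`, `build_reverse_lookup` and the column names inside the config are hypothetical, and only the two extra pairs come from the diff:

```python
from typing import Dict

# Hypothetical configuration, shaped like the lookups in the diff:
# per sheet, 'required' and 'optional' map spreadsheet column -> JSON attribute.
EXAMPLE_CONF = {
    'Sample': {
        'required': {'Sample Name': 'sampleName'},
        'optional': {'Collection Date': 'collection_date'},
    },
}

# The two pairs hard-coded in convert_metadata_attribute by this PR
EXTRA_ATTRIBUTES = {'Scientific Name': 'species', 'BioSample Name': 'name'}


def build_reverse_lookup(sheet: str, xls2json_conf: Dict) -> Dict[str, str]:
    """Return a JSON attribute -> spreadsheet column mapping for one sheet."""
    attributes_dict: Dict[str, str] = {}
    attributes_dict.update(xls2json_conf[sheet].get('required', {}))
    attributes_dict.update(xls2json_conf[sheet].get('optional', {}))
    attributes_dict.update(EXTRA_ATTRIBUTES)
    reverse: Dict[str, str] = {}
    for column, attribute in attributes_dict.items():
        # setdefault keeps the first column seen, like the early return
        # in the linear scan of convert_metadata_attribute
        reverse.setdefault(attribute, column)
    return reverse


lookup = build_reverse_lookup('Sample', EXAMPLE_CONF)
print(lookup.get('species'))  # Scientific Name
print(lookup.get('name'))     # BioSample Name
```

Inverting with `setdefault` preserves the first-match behaviour of the original loop if two columns ever map to the same attribute.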
@@ -185,7 +192,12 @@ def parse_metadata_property(property_str):

```python
def parse_sample_metadata_property(property_str):
    # Check characteristics
    match = re.match(r'/sample/(\d+)/bioSampleObject/characteristics/(\w+)', property_str)
    if match:
        return 'sample', match.group(1), match.group(2)
    # Check name
    match = re.match(r'/sample/(\d+)/bioSampleObject/name', property_str)
    if match:
        return 'sample', match.group(1), 'name'
    return None, None, None
```
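Tracing `parse_sample_metadata_property` on property paths from the biovalidator output in this PR shows which errors it can attribute to a sample column and which it cannot. The function body is copied from the diff; only the example paths and the loop are added. The sample index comes back as a string, and paths that stop at `characteristics` (the `anyOf` error) or at `/sample/0` fall through to `(None, None, None)`:

```python
import re
from typing import Optional, Tuple


def parse_sample_metadata_property(
        property_str: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    # Same logic as the function added in the diff
    match = re.match(r'/sample/(\d+)/bioSampleObject/characteristics/(\w+)', property_str)
    if match:
        return 'sample', match.group(1), match.group(2)
    match = re.match(r'/sample/(\d+)/bioSampleObject/name', property_str)
    if match:
        return 'sample', match.group(1), 'name'
    return None, None, None


# Property paths taken from the expected biovalidator output
for path in [
    '/sample/3/bioSampleObject/characteristics/organism',  # ('sample', '3', 'organism')
    '/sample/3/bioSampleObject/name',                       # ('sample', '3', 'name')
    '/sample/3/bioSampleObject/characteristics',            # (None, None, None)
    '/sample/0',                                            # (None, None, None)
]:
    print(path, '->', parse_sample_metadata_property(path))
```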
@@ -1,4 +1,4 @@

```diff
-- column: Tax ID
-  description: Worksheet Project is missing required header Tax ID
+- column: ''
+  description: 'Error loading problem.xlsx: Exception()'
   row: ''
-  sheet: Project
+  sheet: ''
```
@@ -26,4 +26,19 @@

```text
should have required property 'bioSampleObject'
/sample/0
should match exactly one schema in oneOf
/sample/3/bioSampleObject/name
must have required property 'name'
must have required property 'name'
must have required property 'name'
```

Comment: Not sure what's going on here, but this is the output I get from biovalidator...

```text
/sample/3/bioSampleObject/characteristics/organism
must have required property 'organism'
must have required property 'organism'
/sample/3/bioSampleObject/characteristics/Organism
must have required property 'Organism'
/sample/3/bioSampleObject/characteristics/species
must have required property 'species'
/sample/3/bioSampleObject/characteristics/Species
must have required property 'Species'
/sample/3/bioSampleObject/characteristics
must match a schema in anyOf
[0m
```
Comment: Is it worth having a mechanism that converts an Excel spreadsheet column into more than one BioSample field? We could then provide this in the spreadsheet2json_conf.yaml as

Reply: Could do, but as far as I know it would only be used for this case: the name field sits outside of characteristics, so it has to be handled differently. I probably won't add it to this PR, but we should keep it in mind.
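The suggestion could look roughly like the following, assuming a configuration in which a column lists several targets and a leading `/` marks a field outside `characteristics` (the distinction the reply raises for `name`). The mapping format, `COLUMN_TARGETS`, `row_to_biosample`, the choice of `species` plus `organism` for `Scientific Name`, and the `[{'text': value}]` characteristic layout are all assumptions for illustration, not the real spreadsheet2json_conf.yaml format:

```python
from typing import Any, Dict, List, Union

# Hypothetical mapping format (NOT the real spreadsheet2json_conf.yaml):
# a column maps to one target or a list of targets. A target starting with
# '/' is a top-level bioSampleObject field such as '/name'; any other target
# is a key under 'characteristics'.
COLUMN_TARGETS: Dict[str, Union[str, List[str]]] = {
    'BioSample Name': '/name',
    'Scientific Name': ['species', 'organism'],
}


def row_to_biosample(row: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one spreadsheet row (column -> value) into a BioSample-like dict."""
    biosample: Dict[str, Any] = {'characteristics': {}}
    for column, value in row.items():
        targets = COLUMN_TARGETS.get(column)
        if targets is None:
            continue  # column not mapped: ignored in this sketch
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            if target.startswith('/'):
                # Field outside characteristics, e.g. the sample name
                biosample[target[1:]] = value
            else:
                # Assumed characteristic layout: list of {'text': value}
                biosample['characteristics'][target] = [{'text': value}]
    return biosample


print(row_to_biosample({'BioSample Name': 'S1', 'Scientific Name': 'Homo sapiens'}))
```

Encoding the "outside characteristics" case in the target string keeps a single code path for both kinds of field, at the cost of a small convention that the config would have to document.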