adding testing check_data function to load_data #458
Changing back to WIP to add: (1) a check that there is a column of data corresponding to everything mentioned in the data mapping, and (2) a data report that tells the shape of the data and what columns are included.
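A minimal sketch of the two additions described above, using only the standard library so it runs on its own. The function name `report_and_check_columns`, the example column names, and the assumption that the data mapping is keyed by dataset column name are all hypothetical; the PR's actual implementation is not shown in this thread.

```python
import csv
import io


def report_and_check_columns(csv_text, data_mapping):
    """Return a small data report and raise if a mapped column is missing.

    Assumption for this sketch: `data_mapping` keys are dataset column
    names and its values are the model observables they correspond to.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    # (1) Every column named in the data mapping must exist in the dataset.
    missing = [col for col in data_mapping if col not in header]
    if missing:
        raise ValueError(
            f"Data mapping refers to columns not in the dataset: {missing}"
        )

    # (2) Data report: shape as (rows, columns) plus the column names.
    return {"shape": (len(body), len(header)), "columns": header}


example = "Timestamp,case,hosp\n0.0,10,1\n1.0,12,2\n"
report = report_and_check_columns(
    example, {"case": "infected", "hosp": "hospitalized"}
)
print(report)
```

Raising on the first inconsistency keeps the failure close to where the data is loaded, rather than letting a missing column surface later during calibration.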
@sabinala, what is the status of this PR?
@SamWitty the status of this PR is that it's still in progress. I need to convert the
…er-inside-of-load_data merging main into this branch
…er-inside-of-load_data merging main into this branch
…er-inside-of-load_data Merging main into this branch
@djinnome let me know if you have questions on this. I added a
@djinnome the only issue is that I was not able to create a dataset with "NaN" values in
@djinnome let's talk about this during our meeting later...it looks like some tests aren't passing, but the problems are coming from tests for visuals and interruptions? When I run
…er-inside-of-load_data Merging main into this branch.
…/github.com/ciemss/pyciemss into 454-create-data-checker-inside-of-load_data Merging local and remote.
This PR is ready to go, but currently blocked by #481. (Still getting this issue after having merged main into this branch and running
I'm confused as to why it appears this PR is failing on linting...when I run
I think it is because local flake8 is out of sync with the flake8 on the CI. Should we upgrade local flake8 or downgrade the CI flake8, @SamWitty?
Please update local flake8. Thanks for checking!
Cleaning up data report
@SamWitty @djinnome I updated local flake8 with
===================================== FAILURES ======================================
_______________ test_export_PNG[schema_file2-ref_file2-trajectories] ________________

schema_file = PosixPath('/Users/altu809/Projects/pyciemss/pyciemss/visuals/schemas/trajectories.vg.json')
ref_file = PosixPath('/Users/altu809/Projects/pyciemss/tests/visuals/reference_images/trajectories.png')
name = 'trajectories'

    @pytest.mark.parametrize("schema_file, ref_file, name", schemas(ref_ext="png"))
    def test_export_PNG(schema_file, ref_file, name):
        """
        Test all default schema files against the reference files for PNG files
        schema_file: default schema files saved within the visuals module
        ref_file: compare the created png to this reference file
        name: stem name of reference file
        """
        with open(schema_file) as f:
            schema = json.load(f)
        image = plots.ipy_display(schema, format="PNG", dpi=72).data
        save_result(image, name, "png")
        test_threshold = 0.04
        JS_boolean, JS_score = png_matches(image, ref_file, test_threshold)
>       assert (
            JS_boolean
        ), f"{name}: PNG Histogram divergence: Shannon Jansen value {JS_score} > {test_threshold} "
E       AssertionError: trajectories: PNG Histogram divergence: Shannon Jansen value 0.1562242136437859 > 0.04
E       assert False

tests/visuals/test_schemas.py:148: AssertionError
This is not a linting error. The test itself is failing. This happens sometimes, as the tests are randomized. Rerunning failing tests should work most of the time.
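For context on the failing assertion, here is a rough sketch of a histogram comparison based on Jensen-Shannon divergence with a threshold. It is not the project's `png_matches` helper, whose internals are not shown here; the names `js_divergence` and `histograms_match`, the base-2 logarithm, and the "score at or below threshold passes" rule are assumptions made for illustration.

```python
import math


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two equal-length histograms."""
    if len(p_counts) != len(q_counts):
        raise ValueError("Histograms must have the same number of bins.")
    p_total, q_total = sum(p_counts), sum(q_counts)
    if p_total == 0 or q_total == 0:
        raise ValueError("Histograms must not be empty.")

    # Normalize counts to probability distributions.
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    # Mixture distribution used by the Jensen-Shannon definition.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Bins where a_i == 0 contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histograms_match(p_counts, q_counts, threshold=0.04):
    """Return (matches, score); a match means the score is within the threshold."""
    score = js_divergence(p_counts, q_counts)
    return score <= threshold, score


print(histograms_match([5, 5], [5, 5]))
print(histograms_match([10, 0], [0, 10]))
```

With base-2 logarithms the divergence lies between 0 (identical histograms) and 1 (no overlap), so a rendered plot that varies from run to run can land above a tight threshold such as 0.04 on some runs and below it on others.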
"The first column must be named 'Timestamp' and contain the time corresponding to each row of data."
)

# Check that there are no NaN values or empty entries
Is this a constraint that we want to impose? It might be better if we could handle ragged data, yes?
@djinnome It would be better, but I'm not sure where to go with that. How would that propagate to calibrate? I think it's probably best to throw an error message for now, and create a new issue to handle ragged data in the future.
Approving. Perhaps we should create an issue for a feature request to relax the missing data constraint.
This PR adds a function to check for formatting errors in a dataset within the `load_data` function that is called whenever `sample` is used. I've also included a notebook where these errors are produced.

Closes #454
Closes #290