Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

15 provide automation of csv yaml conversion and validation #16

Merged

Conversation

turbomam
Copy link
Collaborator

@turbomam turbomam commented Jun 26, 2024

some people would say that I included too many generated files in this PR. It makes it harder to review and will become problematic with respect to the size of the repo if the number of size of the example data files grows significantly. We could add these directories to .gitignore:

  • examples/output/
  • project/
  • src/mifc/datamodel/

Thanks to my contributions, your build process (make all-all) deviates from the make all offered in the initial cookiecutter. Please especially review

  • Makefile
  • project.Makefile

make all-all now includes examples-all, so any TSV file in src/data/examples/TSV/ will be included in the subsequent examples validation. Please note that the names of the files in src/data/examples/TSV/ must follow the this pattern:

{target class name}-{index slot name}-{anything else}.tsv

@turbomam turbomam requested a review from kaiiam June 26, 2024 13:48
@turbomam turbomam linked an issue Jun 26, 2024 that may be closed by this pull request
@turbomam
Copy link
Collaborator Author

I haven't included an automated YAML to TSV converter yet. Would you be interested in trying that based on tsv-data-to-yaml and src/data/examples/valid/%.yaml ?

@turbomam
Copy link
Collaborator Author

Did you make some changes to .github/workflows/deploy-docs.yaml ?

@turbomam
Copy link
Collaborator Author

I am going to make another commit in which I change the last command in .github/workflows/main.yaml from run: make test to make all-all. In the future, if you have many more or much larger examples data files, running that as part of every pull request check might be slower than you would like. (Although the build and check process will get slower as your schema gets more sophisticated anyway).

@turbomam
Copy link
Collaborator Author

I think we should scrutinize all of the GitHub actions in .github/workflows at some point. We could even bring in another LinkML developer to suggest changes to us, or to integrate some of the decisions we've made back into the cookiecutter.

@turbomam
Copy link
Collaborator Author

my last commit starts ignoring

  • examples/output/
  • project/
  • src/mifc/datamodel/

but that doesn't make this PR any smaller

which I added for tracking doc gen sucess
@kaiiam
Copy link
Owner

kaiiam commented Jun 26, 2024

Thanks @turbomam!

I included too many generated files in this PR

Maybe but I do appreciate the effort and I can still review it.

Did you make some changes to .github/workflows/deploy-docs.yaml ?

I might have but it's been a while the initial docs didn't work right away when I made it with the cookie cutter originally. I had to adjust some settings I though I'd documented it in an issue somewhere but not sure at the moment.

In the future, if you have many more or much larger examples data files ...

I guess the plan for the test data is to not have too many rows of data in the tsvs or to have too long of containers, but it's probably best to have a a bunch of tests for all the different attributes with small valid and invalid examples.

scrutinize all of the GitHub actions in .github/workflows

Are there potentially issues? Or is this to suggest to add to the cookie cutter? Happy for this repo to be a usecase for testing and improving LinkML as we go. Up to you @turbomam if you think it's worth brining in other LinkML people. I'm trusting your expertise.

@kaiiam
Copy link
Owner

kaiiam commented Jun 26, 2024

@turbomam I'm also curious why src/mifc/datamodel/mifc.py got deleted?

@kaiiam
Copy link
Owner

kaiiam commented Jun 26, 2024

When I ran make all-all it was fine, but I'm still not sure why src/mifc/datamodel/mifc.py seems to be deleted in the files changed? After running make all all I see it perhaps it's in the gitnore or something or I'm missing something in the files changes as it's quite a few for this PR.

Copy link
Owner

@kaiiam kaiiam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said above make all-all works and I'm trusting that the changes are in line with what linkML should be or it is something people can adapt to. I didn't mean for the MIFC repo to deviate from any standard practices if possible, although I do want to have the tsv as validation data. Thanks again @turbomam and I'll merge now.

@kaiiam kaiiam merged commit 7a241ee into main Jun 26, 2024
2 checks passed
@kaiiam kaiiam deleted the 15-provide-automation-of-csv-yaml-conversion-and-validation branch June 26, 2024 15:45
@kaiiam kaiiam added help wanted Extra attention is needed and removed help wanted Extra attention is needed labels Jun 26, 2024
@turbomam
Copy link
Collaborator Author

This PR removes src/mifc/datamodel/mifc.py as a policy of .gitignore, on the grounds that its derived from the schema. Its purely a judgement call. Feel free to revert that ignore rule or any of the other ones I did this morning.

I was thinking we could check Patrick Kalita's work in nmdc-schema regarding what to push to the main branch of thsi repo vs. what to bundle into a GH release or a PyPI release if you're going to make that.

@kaiiam
Copy link
Owner

kaiiam commented Jun 26, 2024

Thanks yes, I'd like to make releases starting here and again later once we reboot or move this repo to a new organization. I'd be good to know what best practices for linkML releases should include. I'd like for example the output MIFC excel sheet so people can use it already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide automation of CSV <-> YAML conversion and validation
2 participants